NAME
Muldis::D::Core::Text - Muldis D character string operators
VERSION
This document is Muldis::D::Core::Text version 0.135.0.
PREFACE
This document is part of the Muldis D language specification, whose root document is Muldis::D; you should read that root document before you read this one, which provides subservient details. Moreover, you should read the Muldis::D::Core document before this current document, as that forms its own tree beneath a root document branch.
DESCRIPTION
This document describes essentially all of the core Muldis D operators that are specific to the core data type Text, essentially all the generic ones that a typical programming language should have.
This documentation is pending.
FUNCTIONS IMPLEMENTING VIRTUAL ORDERED FUNCTIONS
sys.std.Core.Text.order
function order (Order <-- topic : Text, other : Text, misc_args? : Tuple, is_reverse_order? : Bool) implements sys.std.Core.Ordered.order {...}
This is a (total) order-determination function specific to Text. TODO: What (optional) misc_args does this support?
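Because the collation used by order is not yet specified here, the following is only a sketch under stated assumptions. It assumes Text values are compared lexicographically by their NFD codepoint sequences, that an Increase result means topic sorts before other, and that is_reverse_order simply swaps Increase and Decrease. The Order enum is a hypothetical Python stand-in for the Muldis D Order type, and misc_args is omitted because its meaning is still TODO.

```python
import unicodedata
from enum import Enum


class Order(Enum):
    # Hypothetical stand-in for the three values of the Muldis D Order type.
    INCREASE = -1
    SAME = 0
    DECREASE = 1


def order(topic: str, other: str, is_reverse_order: bool = False) -> Order:
    # Assumption: plain codepoint-wise comparison of the NFD forms.
    a = unicodedata.normalize("NFD", topic)
    b = unicodedata.normalize("NFD", other)
    if a == b:
        result = Order.SAME
    elif a < b:  # Python compares str values codepoint by codepoint
        result = Order.INCREASE
    else:
        result = Order.DECREASE
    # Assumption: reversing the order swaps Increase and Decrease.
    if is_reverse_order and result is not Order.SAME:
        result = Order.DECREASE if result is Order.INCREASE else Order.INCREASE
    return result
```

Comparing NFD forms makes a precomposed letter and its decomposed spelling compare as Same, which fits the document's statement that Muldis D works at the NFD codepoint level.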
FUNCTIONS IMPLEMENTING VIRTUAL STRINGY FUNCTIONS
sys.std.Core.Text.catenation
function catenation (Text <-- topic? : array_of.Text) implements sys.std.Core.Stringy.catenation {...}
This function results in the catenation of the N element values of its argument; it is a reduction operator that recursively takes each consecutive pair of input values and catenates them together (catenation is associative) until just one is left, which is the result. If topic has zero values, then catenation results in the empty string value, which is the identity value for catenation.
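As a rough Python model of the reduction described above (not the Muldis D implementation), catenation can be written as a fold that starts from the empty string, so that an empty topic yields the identity value.

```python
from functools import reduce
from typing import Sequence


def catenation(topic: Sequence[str] = ()) -> str:
    # Fold consecutive pairs together; "" is the identity value for
    # catenation, so an empty (or omitted) topic results in "".
    return reduce(lambda left, right: left + right, topic, "")
```

Since catenation is associative, the grouping of the pairwise steps does not affect the result; a left fold is just one valid grouping.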
sys.std.Core.Text.replication
function replication (Text <-- topic : Text, count : NNInt) implements sys.std.Core.Stringy.replication {...}
This function results in the catenation of count instances of topic.
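A minimal Python sketch of replication, written as a loop to mirror the definition "the catenation of count instances of topic"; the ValueError for a negative count is an assumption standing in for the NNInt parameter type.

```python
def replication(topic: str, count: int) -> str:
    # count models an NNInt, so negative values are rejected here.
    if count < 0:
        raise ValueError("count must be a non-negative integer")
    result = ""
    for _ in range(count):
        result += topic  # catenate one more instance of topic
    return result
```

In idiomatic Python the same result is simply topic * count; a count of zero gives the empty string.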
GENERIC FUNCTIONS FOR TEXTS
These functions implement commonly used character string operations.
sys.std.Core.Text.cat_with_sep
function cat_with_sep (Text <-- topic : array_of.Text, sep : Text) {...}
This function results in the catenation of the N element values of its topic argument such that an instance of its sep argument is catenated between each pair of consecutive topic elements.
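A short Python sketch of cat_with_sep, illustrating that sep appears only between consecutive elements, so a topic with zero or one elements contains no separator at all.

```python
from typing import Sequence


def cat_with_sep(topic: Sequence[str], sep: str) -> str:
    result = ""
    for i, element in enumerate(topic):
        if i > 0:
            result += sep  # only between a pair of consecutive elements
        result += element
    return result
```

This is equivalent to Python's sep.join(topic).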
sys.std.Core.Text.len_in_nfd_codes
function len_in_nfd_codes (NNInt <-- topic : Text) {...}
This function results in the length of its argument in Unicode canonical decomposed normal form (NFD) abstract codepoints; in other words, this is the actual length of the argument, since Muldis D explicitly works natively at the abstract codepoint level.
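In Python terms, this length can be sketched with the standard unicodedata module: normalize to NFD, then count codepoints. A precomposed letter such as U+00E9 therefore counts as 2, the same as its decomposed spelling.

```python
import unicodedata


def len_in_nfd_codes(topic: str) -> int:
    # len() of a Python str counts codepoints, so normalize to NFD first.
    return len(unicodedata.normalize("NFD", topic))
```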
sys.std.Core.Text.len_in_graphs
function len_in_graphs (NNInt <-- topic : Text) {...}
This function results in the length of its argument in language-independent graphemes.
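Full grapheme segmentation follows Unicode Standard Annex #29, which Python's standard library does not implement. The following is therefore only a rough approximation, not a conforming implementation: it treats each NFD character with canonical combining class 0 as starting a new grapheme, so a base letter plus its following accents counts as one. It miscounts cases such as Hangul jamo sequences, emoji ZWJ sequences, and regional-indicator flags.

```python
import unicodedata


def len_in_graphs(topic: str) -> int:
    # Approximation only: a new "grapheme" starts at the first character
    # or at any character whose canonical combining class is 0.
    count = 0
    for i, ch in enumerate(unicodedata.normalize("NFD", topic)):
        if i == 0 or unicodedata.combining(ch) == 0:
            count += 1
    return count
```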
sys.std.Core.Text.has_substr
function has_substr (Bool <-- look_in : Text, look_for : Text, fixed_start? : Bool, fixed_end? : Bool) {...}
This function results in Bool:True iff its look_for argument is a substring of its look_in argument as per the optional fixed_start and fixed_end constraints, and Bool:False otherwise. If fixed_start or fixed_end are Bool:True, then look_for must occur right at the start or end, respectively, of look_in in order for has_substr to result in Bool:True; if either flag is Bool:False, its additional constraint doesn't apply. Each of the fixed_[start|end] parameters is optional and defaults to Bool:False if no explicit argument is given to it. Note that has_substr will handle the common special cases of SQL's "LIKE" operator for patterns like ['foo', '%foo', 'foo%', '%foo%'], but see also the is_like function, which provides the full generality of SQL's "LIKE", such as 'foo%bar%baz'.
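A Python sketch of has_substr, with comments mapping each flag combination to the SQL "LIKE" pattern it corresponds to. Treating both flags together as whole-string equality is this sketch's reading of the 'foo' case, where a pattern without wildcards must match the entire string.

```python
def has_substr(look_in: str, look_for: str,
               fixed_start: bool = False, fixed_end: bool = False) -> bool:
    if fixed_start and fixed_end:
        # Like SQL 'foo': look_for must span all of look_in.
        return look_in == look_for
    if fixed_start:
        return look_in.startswith(look_for)  # like SQL 'foo%'
    if fixed_end:
        return look_in.endswith(look_for)    # like SQL '%foo'
    return look_for in look_in               # like SQL '%foo%'
```

Under this reading, has_substr("abab", "ab", True, True) is false even though "ab" occurs at both the start and the end.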
sys.std.Core.Text.has_not_substr
function has_not_substr (Bool <-- look_in : Text, look_for : Text, fixed_start? : Bool, fixed_end? : Bool) {...}
This function is exactly the same as sys.std.Core.Text.has_substr except that it results in the opposite boolean value when given the same arguments.
FUNCTIONS FOR TEXT NORMALIZATION
These functions implement commonly used text normalization operations which are relatively simple or whose details are fully specified by the Unicode standard; examples are folding letters to lower or upper case, removing combining characters like accent marks and other diacritics from base letters, removing or normalizing whitespace, or converting text from a larger to a smaller character repertoire, such as to ASCII. By contrast, operations such as stemming, removing common words, or expanding abbreviations are not done by these functions and are best implemented by a third party language extension or library. You can use these functions as a basis for making comparison, ranking, or collation operators that ignore some distinctions between values, such as their case or marks, in order to do case-insensitive, mark-insensitive, or whitespace-insensitive matching, indexing, or sorting. The actual system-defined matching operators are still sensitive to case et al, but you can pretend they're not by having them work with the results of these normalization functions rather than with the inputs to these functions. This is useful when you want to emulate, over Muldis D, the semantics of systems that are insensitive to such distinctions though possibly preserving them.
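The technique described above, comparing normalized results while keeping the original values, can be sketched in Python as a sort or comparison key. The helper names fold_key and insensitive_equal are hypothetical, and Python's str.lower and str.strip only approximate the Muldis D lower and trim functions.

```python
import unicodedata


def fold_key(text: str) -> str:
    # Hypothetical helper: trim, lower-case, then strip combining marks.
    decomposed = unicodedata.normalize("NFD", text.strip().lower())
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)


def insensitive_equal(a: str, b: str) -> bool:
    # The comparison itself stays exact; only its inputs are normalized.
    return fold_key(a) == fold_key(b)
```

Used as a sort key, as in sorted(words, key=fold_key), the original strings are returned unchanged, which gives insensitive but preserving behaviour.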
sys.std.Core.Text.upper
function upper (Text <-- topic : Text) {...}
This function results in the normalization of its argument where any letters considered to be (small) lowercase are folded to (capital) uppercase.
sys.std.Core.Text.lower
function lower (Text <-- topic : Text) {...}
This function results in the normalization of its argument where any letters considered to be (capital) uppercase are folded to (small) lowercase.
sys.std.Core.Text.marks_stripped
function marks_stripped (Text <-- topic : Text) {...}
This function results in the normalization of its argument where any accent marks or diacritics are removed from letters, leaving just the primary letters.
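A common way to sketch this in Python is to decompose to NFD and drop the combining characters. This approximation uses the canonical combining class, so letters with no canonical decomposition, such as "ø", are left unchanged, and the result stays in NFD.

```python
import unicodedata


def marks_stripped(topic: str) -> str:
    decomposed = unicodedata.normalize("NFD", topic)
    # Keep only characters whose canonical combining class is 0.
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```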
sys.std.Core.Text.ASCII
function ASCII (Text <-- topic : Text, mark? : Text) {...}
This function results in the normalization of its topic argument where each character not in the 7-bit ASCII repertoire is replaced with the same ASCII character string, specified by its mark argument; if mark is the empty string, then the non-ASCII characters are simply stripped out. This function is quite simple and does not do a smart replace with sequences of similar looking ASCII characters. The mark parameter is optional and defaults to the empty string if no explicit argument is given to it.
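A minimal Python sketch of this function, named to_ascii here as a hypothetical stand-in for sys.std.Core.Text.ASCII. It assumes the input is first viewed in NFD, consistent with the document's statement that Muldis D works at the NFD codepoint level. Under that assumption an accented letter keeps its ASCII base letter and only the combining mark is replaced.

```python
import unicodedata


def to_ascii(topic: str, mark: str = "") -> str:
    # Assumption: operate on NFD codepoints, so "é" is "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", topic)
    # Every non-ASCII codepoint becomes the same mark string (default "").
    return "".join(ch if ord(ch) < 128 else mark for ch in decomposed)
```

Applied without the NFD step, a precomposed "é" would be replaced as a whole instead.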
sys.std.Core.Text.trim
function trim (Text <-- topic : Text) {...}
This function results in the normalization of its argument where any leading or trailing whitespace characters are trimmed, but no other changes are made, including to any whitespace bounded by non-whitespace characters.
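A short Python sketch of trim that makes explicit that only the leading and trailing runs are removed. It assumes Python's str.isspace as the definition of whitespace, which the document does not specify.

```python
def trim(topic: str) -> str:
    start, end = 0, len(topic)
    # Skip whitespace at the front, then at the back.
    while start < end and topic[start].isspace():
        start += 1
    while end > start and topic[end - 1].isspace():
        end -= 1
    # Interior whitespace between non-whitespace characters is untouched.
    return topic[start:end]
```

With this definition of whitespace, the result matches Python's topic.strip().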
FUNCTIONS FOR PATTERN MATCHING AND TRANSLITERATION
These functions implement commonly used operations for matching text against a pattern or performing substitutions of characters for others; included are both the functionality of SQL's simple "LIKE" pattern matching operator and support for Perl 5's regular expressions and Perl 6's rules. All of these functions are case-sensitive et al as per is_identical unless explicitly given flags to do otherwise, where applicable; or just use them to search the results of normalization functions if you need to. Note that Perl 5.10+ is also an inspiration, in that its regular expression feature is algorithm-agnostic: new algorithms can be plugged in, and multiple system-defined ones can coexist. Note that a lot of this section is still TODO, with several useful functions missing, or more complicated parts like the Perl pattern matching may be separated off into their own language extensions later. ACTUALLY, EACH NON-TRIVIAL PATTERN-MATCHING FACILITY WILL BE ITS OWN OPTIONAL EXTENSION: ONE FOR PERL 6 RULES, ONE FOR PERL 5 REGEX, ONE PER OTHER REGEX KIND, ETC. CORE KEEPS THE TRIVIALLY SIMPLE 'LIKE' OF SQL.
sys.std.Core.Text.is_like
function is_like (Bool <-- look_in : Text, look_for : Text, escape? : Text) {...}
This function results in Bool:True iff its look_in argument is matched by the pattern given in its look_for argument, and Bool:False otherwise. This function implements the full generalization of SQL's simple "LIKE" pattern matching operator. Any characters in look_for are matched literally except for the 2 wildcard characters _ (match any single character) and % (match any string of 0..N characters); the preceding assumes that the escape argument is the empty string (or is missing). If escape is a character, then that character is also special: like the 2 wildcard characters, a lone occurrence of it in look_for no longer matches itself; rather, it is used in look_for to indicate when the pattern wishes to match a literal _ or % or the escape character itself. For example, if \ is used as the escape character, then you use \_, \%, and \\ to match the literal wildcard characters or itself, respectively. Note that this operation is also known as "is match using like" or "like".
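One way to sketch is_like in Python is to translate the "LIKE" pattern into an anchored regular expression with the standard re module: _ becomes ".", % becomes ".*", and everything else is escaped. Rejecting an escape character followed by anything other than _, %, or itself is an assumption borrowed from SQL's behaviour, not something stated above.

```python
import re


def is_like(look_in: str, look_for: str, escape: str = "") -> bool:
    if len(escape) > 1:
        raise ValueError("escape must be empty or a single character")
    parts = []
    i = 0
    while i < len(look_for):
        ch = look_for[i]
        if escape and ch == escape:
            # The escape makes the next _, %, or escape character literal.
            nxt = look_for[i + 1] if i + 1 < len(look_for) else None
            if nxt not in ("_", "%", escape):
                raise ValueError("invalid escape sequence in pattern")
            parts.append(re.escape(nxt))
            i += 2
            continue
        if ch == "_":
            parts.append(".")    # any single character
        elif ch == "%":
            parts.append(".*")   # any string of 0..N characters
        else:
            parts.append(re.escape(ch))
        i += 1
    # fullmatch anchors both ends; DOTALL lets wildcards match newlines.
    return re.fullmatch("".join(parts), look_in, flags=re.DOTALL) is not None
```

For example, is_like("50%", "50\\%", "\\") is true, while is_like("500", "50\\%", "\\") is false because the escaped % only matches a literal percent sign.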
sys.std.Core.Text.is_not_like
function is_not_like (Bool <-- look_in : Text, look_for : Text, escape? : Text) {...}
This function is exactly the same as sys.std.Core.Text.is_like except that it results in the opposite boolean value when given the same arguments; it implements SQL's "NOT LIKE". Note that this operation is also known as "is not match using like", "!like", or "not-like".
SEE ALSO
Go to Muldis::D for the majority of distribution-internal references, and Muldis::D::SeeAlso for the majority of distribution-external references.
AUTHOR
Darren Duncan (darren@DarrenDuncan.net)
LICENSE AND COPYRIGHT
This file is part of the formal specification of the Muldis D language.
Muldis D is Copyright © 2002-2010, Muldis Data Systems, Inc.
See the LICENSE AND COPYRIGHT of Muldis::D for details.
TRADEMARK POLICY
The TRADEMARK POLICY in Muldis::D applies to this file too.
ACKNOWLEDGEMENTS
The ACKNOWLEDGEMENTS in Muldis::D apply to this file too.