NAME

Muldis::D::Ext::Text - Muldis D extension for character string data types and operators

VERSION

This document is Muldis::D::Ext::Text version 0.55.0.

PREFACE

This document is part of the Muldis D language specification, whose root document is Muldis::D; you should read that root document before you read this one, which provides subservient details.

DESCRIPTION

Muldis D has a mandatory core set of system-defined (eternally available) entities, which is referred to as the Muldis D core or the core; they are the minimal entities that all Muldis D implementations need to provide; they are mutually self-describing and are used to bootstrap the language; any entities outside the core, called Muldis D extensions, are non-mandatory and are defined in terms of the core or each other, but the reverse isn't true.

This current Text document describes the system-defined Muldis D Text Extension, which consists of character string data types and operators: essentially all the generic ones that a typical programming language should have, except for the bare minimum needed for bootstrapping Muldis D, which are defined in the language core instead.

This current document does not describe the polymorphic operators that all types, or some types including core types, have defined over them; said operators are defined once for all types in Muldis::D::Core.

This documentation is pending.

Maybe TODO: Add proper subtypes of Text specific to those values in each Unicode Normal Form; such would be the result type of a folded_to_NFC etc. function; or maybe not, as then we might want to add ASCII etc. subtypes too, which all told is too much complexity for too little benefit.

SYSTEM-DEFINED TEXT-CONCERNING FUNCTIONS

These functions implement commonly used character string operations.

function sys.std.Text.catenation result Text params { topic(array_of.Text) }: This function results in the catenation of the N element values of its argument; it is a reduction operator that recursively takes each consecutive pair of input values and catenates them together (catenation is associative) until just one is left, which is the result. If topic has zero values, then catenation results in the empty string value, which is the identity value for catenation.
function sys.std.Text.repeat result Text params { topic(Text), count(NNInt) }: This function results in the catenation of count instances of topic.
function sys.std.Text.length_in_codepoints result NNInt params { topic(Text) }: This function results in the length of its argument in codepoints, or in other words, in the actual length of the argument since Muldis D explicitly works natively at the codepoint abstraction level.
function sys.std.Text.length_in_graphemes result NNInt params { topic(Text) }: This function results in the length of its argument in language-independent graphemes.
function sys.std.Text.is_substr result Bool params { look_in(Text), look_for(Text), fixed_start(Bool)?, fixed_end(Bool)? }: This function results in Bool:true iff its look_for argument is a substring of its look_in argument as per the optional fixed_start and fixed_end constraints, and Bool:false otherwise. If fixed_start or fixed_end is Bool:true, then look_for must occur right at the start or end, respectively, of look_in in order for is_substr to result in Bool:true; if either flag is Bool:false, its additional constraint doesn't apply. Each of the fixed_(start|end) parameters is optional and defaults to Bool:false if no explicit argument is given to it. Note that is_substr will handle the common special cases of SQL's "LIKE" operator for patterns like ['foo', '%foo', 'foo%', '%foo%'], but see also the is_match_using_like function, which provides the full generality of SQL's "LIKE", such as 'foo%bar%baz'.
function sys.std.Text.is_not_substr result Bool params { look_in(Text), look_for(Text), fixed_start(Bool)?, fixed_end(Bool)? }: This function is exactly the same as sys.std.Text.is_substr except that it results in the opposite boolean value when given the same arguments.
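
The semantics described above can be approximated in Python as a minimal sketch; this is not Muldis D code and not a reference implementation. The Python function names simply mirror the Muldis D ones. Two points are assumptions of this sketch rather than statements of the specification: a negative count is rejected with an exception (standing in for the NNInt constraint), and when both fixed_start and fixed_end are true the match is read as a single occurrence anchored at both ends, that is, equality, which corresponds to the LIKE 'foo' case. length_in_graphemes is left out of this sketch.

```python
from functools import reduce


def catenation(topic):
    """Reduce a list of strings pairwise; "" is the identity value."""
    return reduce(lambda left, right: left + right, topic, "")


def repeat(topic, count):
    """Catenate `count` instances of `topic`."""
    if count < 0:
        # Assumption: stands in for the NNInt (non-negative integer) type.
        raise ValueError("count must be a non-negative integer")
    return catenation([topic] * count)


def length_in_codepoints(topic):
    """Python 3 strings are sequences of code points, so len() suffices."""
    return len(topic)


def is_substr(look_in, look_for, fixed_start=False, fixed_end=False):
    """Substring test with optional anchoring at either end."""
    if fixed_start and fixed_end:
        # Assumption: one occurrence anchored at both ends (LIKE 'foo').
        return look_in == look_for
    if fixed_start:
        return look_in.startswith(look_for)  # LIKE 'foo%'
    if fixed_end:
        return look_in.endswith(look_for)    # LIKE '%foo'
    return look_for in look_in               # LIKE '%foo%'


def is_not_substr(look_in, look_for, fixed_start=False, fixed_end=False):
    """Logical negation of is_substr for the same arguments."""
    return not is_substr(look_in, look_for, fixed_start, fixed_end)
```

For instance, under these assumptions catenation([]) gives the empty string, and is_substr("foobar", "bar", fixed_end=True) is true while is_substr("foobar", "bar", fixed_start=True) is false.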

FUNCTIONS FOR TEXT NORMALIZATION

These functions implement commonly used text normalization operations which are relatively simple or whose details are fully specified by the Unicode standard; examples are folding letters to lower or upper case, removing combining characters like accent marks and other diacritics from base letters, removing or normalizing whitespace, or converting text from a larger to a smaller character repertoire such as ASCII. By contrast, operations such as stemming, removing common words, or expanding abbreviations are not done by these functions and are best implemented by a third party language extension or library. You can use these functions as a basis for making comparison, ranking, or collation operators that ignore some distinctions between values such as their case or accents, such as to do case-insensitive, accent-insensitive, or whitespace-insensitive matching, indexing, or sorting; the actual system-defined matching operators are still sensitive to case et al, but you can pretend they're not by having them work with the results of these normalization functions rather than on the inputs to these functions. This is useful when you want to emulate the semantics of insensitive though possibly preserving systems over Muldis D.
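
As a rough illustration of that last idea, the following Python sketch (not Muldis D) makes an ordinary case-sensitive equality test behave insensitively by comparing normalized copies of both operands. The helper names fold_for_comparison and equal_ignoring_case are hypothetical, and the choice to apply Normal Form C before lowercasing is an assumption of this sketch, not something this document mandates.

```python
import unicodedata


def fold_for_comparison(text):
    """Normalize to NFC, then fold letters to lowercase (hypothetical helper)."""
    return unicodedata.normalize("NFC", text).lower()


def equal_ignoring_case(left, right):
    """Case-sensitive equality applied to the normalized operands."""
    return fold_for_comparison(left) == fold_for_comparison(right)
```

The inputs themselves are never modified, so a system built this way can still store and return values with their original case intact.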

function sys.std.Text.folded_to_NF(C|D) result Text params { topic(Text) }: This function results in the normalization of its argument into Unicode Normal Form C|D. TODO: Generalize this to handle the other normal forms, such as with an extra enum argument, or add extra functions. Also to do eventually, and definitely with the extra argument version, add normalization specific to locales, such as to handle language-specific graphemes right.
function sys.std.Text.case_folded_to_upper result Text params { topic(Text) }: This function results in the normalization of its argument where any letters considered to be (small) lowercase are folded to (capital) uppercase.
function sys.std.Text.case_folded_to_lower result Text params { topic(Text) }: This function results in the normalization of its argument where any letters considered to be (capital) uppercase are folded to (small) lowercase.
function sys.std.Text.accents_stripped result Text params { topic(Text) }: This function results in the normalization of its argument where any accent marks or diacritics are removed from letters, leaving just the primary letters.
function sys.std.Text.ASCII result Text params { topic(Text), mark(Text)? }: This function results in the normalization of its topic argument where each character not in the 7-bit ASCII repertoire is replaced with the ASCII character string specified by its mark argument; if mark is the empty string, then the non-ASCII characters are simply stripped. This function is quite simple and does not do a smart replace with sequences of similar looking ASCII characters. The mark parameter is optional and defaults to the empty string if no explicit argument is given to it.
function sys.std.Text.whitespace_trimmed result Text params { topic(Text) }: This function results in the normalization of its argument where any leading or trailing whitespace characters are trimmed.
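
The normalization functions above might be approximated in Python with the standard unicodedata module, as in the sketch below. This is illustrative only and is not Muldis D code. The Python names are adaptations of the Muldis D ones (ascii_only stands in for sys.std.Text.ASCII), and two behaviours are assumptions of the sketch: accents are detected as characters with a non-zero canonical combining class after decomposition, and "whitespace" is whatever Python's str.strip treats as whitespace.

```python
import unicodedata


def folded_to_nfc(topic):
    """Normalize into Unicode Normal Form C."""
    return unicodedata.normalize("NFC", topic)


def folded_to_nfd(topic):
    """Normalize into Unicode Normal Form D."""
    return unicodedata.normalize("NFD", topic)


def accents_stripped(topic):
    """Decompose, drop combining marks, then recompose what is left."""
    decomposed = unicodedata.normalize("NFD", topic)
    # Assumption: a non-zero combining class identifies an accent mark.
    base_only = "".join(
        ch for ch in decomposed if unicodedata.combining(ch) == 0
    )
    return unicodedata.normalize("NFC", base_only)


def ascii_only(topic, mark=""):
    """Replace each non-ASCII code point with `mark` (stripped if empty)."""
    return "".join(ch if ord(ch) < 128 else mark for ch in topic)


def whitespace_trimmed(topic):
    """Trim leading and trailing whitespace as defined by str.strip."""
    return topic.strip()
```

Note that ascii_only, like the function it imitates, makes no attempt at a smart replacement: a precomposed accented letter is replaced by the mark rather than by its unaccented base letter, unless accents_stripped is applied first.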

FUNCTIONS FOR PATTERN MATCHING AND TRANSLITERATION

These functions implement commonly used operations for matching text against a pattern or performing substitutions of characters for others; included are both the functionality of SQL's simple "LIKE" pattern matching operator and support for Perl 5's regular expressions and Perl 6's rules. All of these functions are case-sensitive et al as per is_identical unless explicitly given flags to do otherwise, where applicable; or just use them to search results of normalization functions if you need to. Note that Perl 5.10+ is also an inspiration, in that its regular expression feature is algorithm-agnostic and can both be extended with new plug-in algorithms and have multiple system-defined ones. Note that a lot of this section is still TODO, with several useful functions missing; also, more complicated parts like the Perl pattern matching may be separated off into their own language extensions later.

function sys.std.Text.is_match_using_like result Bool params { look_in(Text), look_for(Text), escape(Text) }: This function results in Bool:true iff its look_in argument is matched by the pattern given in its look_for argument, and Bool:false otherwise. This function implements the full generalization of SQL's simple "LIKE" pattern matching operator. Any characters in look_for are matched literally except for the 2 wildcard characters _ (match any single character) and % (match any string of 0..N characters); the preceding assumes that the escape argument is the empty string. If escape is a character, then that character is also special and its lone occurrence in look_for will no longer match itself, just as with the 2 wildcard characters; rather it will be used in look_for to indicate when the pattern wishes to match a literal _ or % or the escape character itself. For example, if \ is used as the escape character, then you use \_, \%, \\ to match the literal wildcard characters or the escape character itself, respectively.
function sys.std.Text.is_not_match_using_like result Bool params { look_in(Text), look_for(Text), escape(Text) }: This function is exactly the same as sys.std.Text.is_match_using_like except that it results in the opposite boolean value when given the same arguments; it implements SQL's "NOT LIKE".
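
One way to picture the LIKE semantics described above is to translate the pattern into a regular expression, as in the following Python sketch. It is illustrative only and is not how any Muldis D implementation is known to work. The helper name like_to_regex is hypothetical. Several details are assumptions of the sketch: escape defaults to the empty string for convenience, an escape longer than one character is rejected, and an escape character that is not followed by _, % or itself raises an error, since the text above does not say what should happen in that case.

```python
import re


def like_to_regex(look_for, escape=""):
    """Translate a LIKE pattern into a compiled regex (hypothetical helper)."""
    if len(escape) > 1:
        raise ValueError("escape must be empty or a single character")
    parts = []
    i = 0
    while i < len(look_for):
        ch = look_for[i]
        if escape and ch == escape:
            i += 1  # look at the character being escaped
            # Assumption: only _, % and the escape itself may be escaped.
            if i >= len(look_for) or look_for[i] not in ("_", "%", escape):
                raise ValueError("invalid escape sequence in pattern")
            parts.append(re.escape(look_for[i]))
        elif ch == "_":
            parts.append(".")   # any single character
        elif ch == "%":
            parts.append(".*")  # any string of 0..N characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
        i += 1
    # DOTALL lets the wildcards match newline characters as well.
    return re.compile("".join(parts), re.DOTALL)


def is_match_using_like(look_in, look_for, escape=""):
    """True iff the whole of look_in is matched by the LIKE pattern."""
    return like_to_regex(look_for, escape).fullmatch(look_in) is not None


def is_not_match_using_like(look_in, look_for, escape=""):
    """Logical negation of is_match_using_like (SQL's NOT LIKE)."""
    return not is_match_using_like(look_in, look_for, escape)
```

With a backslash as the escape character, the pattern 100\% then matches only the literal string 100%, while without an escape the same % would match any trailing text.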

AUTHOR

Darren Duncan (perl@DarrenDuncan.net)

LICENSE AND COPYRIGHT

This file is part of the formal specification of the Muldis D language.

See the LICENSE AND COPYRIGHT of Muldis::D for details.

TRADEMARK POLICY

The TRADEMARK POLICY in Muldis::D applies to this file too.

ACKNOWLEDGEMENTS

The ACKNOWLEDGEMENTS in Muldis::D apply to this file too.
