NAME

Muldis::D::Dialect::PTMD_Tiny - How to format Plain Text Muldis D

VERSION

This document is Muldis::D::Dialect::PTMD_Tiny version 0.67.0.

PREFACE

This document is part of the Muldis D language specification, whose root document is Muldis::D; you should read that root document before you read this one, which provides subservient details.

DESCRIPTION

This document outlines the grammar of the Plain Text Muldis D dialect named PTMD_Tiny. The fully-qualified name of this Muldis D dialect, in combination with the base language spec it is bundled with, is Muldis_D:"http://muldis.com":"N.N.N":PTMD_Tiny (when the bundled base language version is substituted for the N.N.N).

This dialect is designed to exactly match the Muldis D system catalog (the possible representation of Muldis D code that is visible to or updateable by Muldis D programs at runtime) as to what non-critical meta-data it explicitly stores; so code in the PTMD_Tiny dialect should be round-trippable with the system catalog with the result maintaining all the details that were started with. Since it matches the system catalog, this dialect should be able to exactly represent all possible Muldis D base language code (and probably all extensions too), rather than a subset of it. That said, the PTMD_Tiny dialect does provide a choice of multiple syntax options for writing Muldis D value literals and DBMS entity (eg type and routine) declarations, so several very distinct PTMD_Tiny code artifacts may parse into the same system catalog entries. There is even a considerable level of abstraction in some cases, so that it is easier for programmers to write and understand typical PTMD_Tiny code, and so that this code isn't absurdly verbose.

This dialect is designed to be as small as possible while meeting the above criteria, and is designed such that a parser that handles all of this dialect can be tiny, hence the dialect's Tiny name. Likewise, a code generator for this dialect from the system catalog can be tiny.

A significant quality of the PTMD_Tiny dialect is that it is designed to work easily for a single-pass parser, or at least a single-pass lexer; all the context that one needs to know for how to parse or lex any arbitrary substring of code is provided by prior code, or any required lookahead is just by a few characters in general. Therefore, a PTMD_Tiny parser can easily work on a streaming input like a file-handle where you can't go back earlier in the stream. Often this means a parser can work with little RAM.

Also the dialect is designed that any amount of whitespace can be added or omitted next to most non-alphanumeric characters (which happen to be next to alphanumeric tokens) without that affecting the meaning of the code at all, except obviously for within character string literals. And long binary or character or numeric or identifier strings can be split into arbitrary-size substrings, without affecting the meaning. And many elements are identified by name rather than ordinal position, so to some degree the order they appear has no effect on the meaning. So programmers can easily format (separate, indent, linewrap, order) code how they like, and making an automated code reformatter shouldn't be difficult. Often, named elements can also be omitted entirely for brevity, in which case the parser would use context to supply default values for those elements.

Given that plain text is (more or less) universally unambiguously portable between all general purpose languages that could be used to implement a DBMS, it is expected that every single Muldis D implementation will natively accept input in the PTMD_Tiny dialect, which isn't dependent on any specific host language and should be easy enough to process, so it should be considered the safest official Muldis D dialect to write in by default, when you don't have a specific reason to use some other dialect.

See also the dialects HDMD_Perl6_Tiny and HDMD_Perl5_Tiny, which are derived directly from PTMD_Tiny, and represent possible Perl 6 and 5 concrete syntax trees for it; in fact, most of the details in common with those other dialects are described just in the current file, for all 3 dialects.

GENERAL STRUCTURE

A PTMD_Tiny Muldis D code file consists just of a full or partial Muldis D bootloader routine definition, which begins with a language name declaration, and otherwise is simply an ordered sequence of imperative routine calls, where earlier routine calls are to system-defined data-definition routines (their arguments are values to put in the system catalog), and later ones are then to user-defined routines that the earlier statements either loaded or defined. This is conceptually what a PTMD_Tiny file is, and it can even be that literally, but PTMD_Tiny provides a canonical further abstraction which should be used when doing data-definition. And so you typically use syntax resembling routine and type declarations in a general purpose programming language, where simply declaring such an entity will cause it to be written into the system catalog for subsequent use.

The grammar in this file is formatted as a Perl 6 grammar (see http://perlcabal.org/syn/S05.html for details) which could be used to parse it, but it is only meant to be illustrative, and would need further additions or changes to actually function in Perl 6. The grammar consists mainly of named tokens which define matching rules. A token's name is declared as a bareword following the keyword token and it is subsequently referenced with '<' and '>' delimiters. Any other bareword in a token definition consisting of alphanumerics is matched literally, and all non-quoted whitespace is not significant. Any pairs of parenthesis ('(' and ')') in token definitions are capturing groups, and each parser match by a pair corresponds to a capture node or node element in the concrete syntax tree resulting from the parse.

The root grammar token for the entire dialect is bootloader.

BOOTLOADER

Grammar:

token bootloader {
    (
        <language_name>
        [
              <value>
            | <depot>
            | <boot_stmt>+
        ]?
    )
}

A bootloader node has 1..N ordered elements where the first element is a language_name node and then either: 1. there is exactly one (second) element that is a value node or a depot node; 2. there are 1..N ordered elements where each is a boot_stmt node; 3. there are no other elements, making the bootloader a no-op.

See the pod sections in this file named "LANGUAGE NAME", "VALUE LITERALS AND SELECTORS", "DEPOT DECLARATION", and "BOOTLOADER STATEMENT" for more details about the aforementioned tokens/nodes.

When Muldis D is being compiled and invoked piecemeal, such as because the Muldis D implementing virtual machine (VM) is attached to an interactive user terminal, or the VM is embedded in a host language where code in the host language invokes Muldis D code at various times, the conceptual bootloader is usually split up, and so not every Muldis D code fragment would then have its own language_name. Usually a language_name would be supplied to the Muldis D VM just once as a VM configuration step, which provides a context for further interaction with the VM that just involves Muldis D code that isn't itself qualified with a language_name.

LANGUAGE NAME

Grammar:

token language_name {
    (
        <ln_base_name>
        <value_node_elem_sep>
        <ln_base_authority>
        <value_node_elem_sep>
        <ln_base_version_number>
        <value_node_elem_sep>
        <ln_dialect>
        [
            <value_node_elem_sep>
            <ln_extensions>
        ]?
    )
}

token ln_base_name { (Muldis_D) }

token ln_base_authority { <Name_payload> }

token ln_base_version_number { <Name_payload> }

token ln_dialect { (PTMD_Tiny) }

token ln_extensions { <QTuple_payload> }

As per the VERSIONING pod section of Muldis::D, code written in Muldis D must start by declaring the fully-qualified Muldis D language name it is written in. The PTMD_Tiny dialect formats this name as a language_name node having 4-5 ordered elements:

ln_base_name

This is the Muldis D language base name; it is simply the bareword character string Muldis_D.

ln_base_authority

This is the base authority; it is a character string formatted as per a specific-context Name value literal; it is typically the delimited character string http://muldis.com.

ln_base_version_number

This is the base version number; it is a character string formatted as per ln_base_authority; it is typically a character string like 1.2.3.

ln_dialect

This is the dialect name; it is simply the bareword character string PTMD_Tiny.

ln_extensions

Optional; this is a set of chosen pragma/parser-config options as per a Tuple SCVL; see the "MULDIS D TINY DIALECT PRAGMAS" pod section for more details.

Examples:

Muldis_D:"http://muldis.com":"1.2.3":PTMD_Tiny

Muldis_D:"http://muldis.com":"1.2.3":PTMD_Tiny:{
    auto_add_attrs          => true,
    auto_unabbrev_std_names => true,
    auto_chains_from_names  => true
}

VALUE LITERALS AND SELECTORS

Grammar:

token value {
      <opaque_value_literal>
    | <coll_value_selector>
}

token opaque_value_literal {
      <Bool>
    | <Order>
    | <RatRoundMeth>
    | <Int>
    | <Rat>
    | <Blob>
    | <Text>
    | <Name>
    | <NameChain>
    | <DeclNameChain>
    | <Comment>
    | <Instant>
    | <Duration>
    | <UTCInstant>
    | <FloatInstant>
    | <UTCDuration>
    | <String>
}

token coll_value_selector {
      <QScalar>
    | <QTuple>
    | <QRelation>
    | <QSet>
    | <QMaybe>
    | <QArray>
    | <QBag>
}

A value node is a Muldis D value literal, which is a common special case of a Muldis D value selector.

Unlike value selectors in general, which must be composed beneath a depot or boot_stmt because they actually represent a Muldis D value expression tree of a function or updater or type definition, a value node does not represent an expression tree, but rather a value constant; by definition, a value can be completely evaluated at compile time. A bootloader consisting directly of a value is hence just a serialized Muldis D value.

The PTMD_Tiny grammar subsection for value literals (having the root grammar token value) is completely self-defined and can be used in isolation from the wider grammar as a Muldis D sub-language; for example, a hosted-data Muldis D implementation may have an object representing a Muldis D value, which is initialized using code written in that sub-language.

Every grammar token, and corresponding capture node, representing a Muldis D value literal is similarly formatted and has 1-3 elements; the following pod section "Value Literal Common Elements" describes the similarities once for all of them, in terms of an alternate value token definition which is called x_value. And then the other pod sections specific to each kind of value literal then just focus on describing their unique aspects, namely their payloads.

An opaque_value_literal node represents a conceptually opaque Muldis D value, such that every one of these values is defined with its own literal syntax that is compact and doesn't look like a collection of other nodes; this includes the basic numeric and string literals.

A coll_value_selector node represents a conceptually transparent Muldis D value, such that every one of these values is defined visibly in terms of a collection of other nodes; this includes the basic tuple and relation selectors.

Value Literal Common Elements

A generic context value literal (or GCVL) is a value literal that can be properly interpreted in a context that is expecting a value but has no expectation that said value belongs to a specific data type; in the general case, a GCVL includes explicit value kind meta-data (such as, "this is an Int" or "this is a Name"); but with a few specific data types (see the value_kind node description for details) that meta-data may be omitted for brevity because the main literal has mutually uniquely identifying characteristics. For example, each element of a generic Muldis D collection value, such as a member of an array or tuple, could potentially have any type at all. In contrast, a specific context value literal (or SCVL) is a value literal that does not include explicit value kind meta-data, even when the main literal doesn't have uniquely identifying characteristics, because the context of its use supplies said meta-data. For example, in a tuple value literal it is assumed that a value literal in an attribute name position must denote a Name. The grammar token value|x_value denotes a GCVL, as do most short-named grammar tokens, like Int or Name; in contrast, a grammar token containing payload denotes a SCVL, like Int_payload or Name_payload.

Every GCVL has 1-3 elements, illustrated by this grammar:

token x_value {
    (
        [
            <value_kind> <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <payload>
    )
}

token value_node_elem_sep { \s* ':' \s* }

token value_kind {
    (
          Bool
        | Order
        | RatRoundMeth
        | Int | NNInt | PInt
        | Rat | NNRat | PRat
        | Blob | OctetBlob
        | Text
        | Name
        | NameChain
        | DeclNameChain
        | Comment
        | Instant
        | Duration
        | UTC [Instant | DateTime | Date | Time]
        | Float [Instant | DateTime | Date | Time]
        | UTCDuration
        | String | BString | OString | UCPString
        | Q? Scalar
        | Q? Tuple | Database
        | Q? Relation
        | Q? Set
        | Q? [Maybe | Single]
        | Q? Array
        | Q? Bag
    )
}

token type_name { <NameChain_payload> }

token payload {
      <Bool_payload>
    | <Order_payload>
    | <RatRoundMeth_payload>
    | <Int_payload>
    | <Rat_payload>
    | <Blob_payload>
    | <Text_payload>
    | <Name_payload>
    | <NameChain_payload>
    | <DeclNameChain_payload>
    | <Comment_payload>
    | <Instant_payload>
    | <Duration_payload>
    | <UTCInstant_payload>
    | <FloatInstant_payload>
    | <UTCDuration_payload>
    | <String_payload>
    | <QScalar_payload>
    | <QTuple_payload>
    | <QRelation_payload>
    | <QSet_payload>
    | <QMaybe_payload>
    | <QArray_payload>
    | <QBag_payload>
}

So a x_value|value node has 1-3 elements in general:

value_kind

This is a character string of the format <[A..Z]> <[ a..z A..Z ]>+; it identifies the data type of the value literal in broad terms and is the only external meta-data of payload generally necessary to interpret the latter; what grammars are valid for payload depend just on value_kind.

For all values of just the 8 data types [Bool, Order, RatRoundMeth, Int, Rat, Blob, Text, Comment], the value_kind portion of a GCVL may be omitted for brevity, but the code parser should still be able to infer it easily by examining the first few characters of the payload, which for each of said 8 data types has a mutually uniquely identifying format, which is also distinct from all possible value_kind. Note that omission of value_kind is only allowed when the GCVL doesn't include a type_name element.

For just these certain special values of other data types, the same option of omitting the value_kind (and type_name) applies: Maybe:nothing.

type_name

This is a Muldis D data type name, for example sys.std.Core.Type.Int; it identifies a specific subtype of the generic type denoted by value_kind, and serves as an assertion that the Muldis D value denoted by payload is a member of the named subtype. Iff value_kind is [Q|]Scalar then type_name is mandatory; otherwise, type_name is optional for all value, except that type_name must be omitted when value_kind is one of the 2 [Bool, Order]; this isn't because those 2 types can't be subtyped, but because in practice doing so isn't useful.

payload

This is mandatory for all value.

For GCVL and SCVL examples, see the subsequent documentation sections.

OPAQUE VALUE LITERALS

Boolean Literals

Grammar:

token Bool {
    (
        [(Bool) <value_node_elem_sep>]?
        <Bool_payload>
    )
}

token Bool_payload {
    (false | true)
}

A Bool node represents a logical boolean value. It is interpreted as a Muldis D sys.std.Core.Type.Bool value as follows: The Bool_payload is a bareword character string formatted as per a Name SCVL, and it maps directly to the name possrep of the Bool type.

Examples:

Bool:true

false

Order-Determination Literals

Grammar:

token Order {
    (
        [(Order) <value_node_elem_sep>]?
        <Order_payload>
    )
}

token Order_payload {
    (increase | same | decrease)
}

An Order node represents an order-determination. It is interpreted as a Muldis D sys.std.Core.Type.Cat.Order value as follows: The Order_payload is a bareword character string formatted as per a Name SCVL, and it maps directly to the name possrep of the Order type.

Examples:

Order:same

decrease

Rational Rounding Method Literals

Grammar:

token RatRoundMeth {
    (
        [
            (RatRoundMeth) <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <RatRoundMeth_payload>
    )
}

token RatRoundMeth_payload {
    (
          half_down | half_up
        | half_even
        | to_floor | to_ceiling
        | to_zero | to_inf
    )
}

A RatRoundMeth node represents a rational rounding method. It is interpreted as a Muldis D sys.std.Core.Type.Cat.RatRoundMeth value as follows: The RatRoundMeth_payload is a bareword character string formatted as per a Name SCVL, and it maps directly to the only possrep of the RatRoundMeth type.

Examples:

RatRoundMeth:half_up

to_zero

General Purpose Integer Numeric Literals

Grammar:

token Int {
    (
        [
            (Int | NNInt | PInt) <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <Int_payload>
    )
}

token Int_payload {
    (
          (<num_max_col_val>) <value_payload_elem_sep> (<int_body>)
        | <d_int_body>
    )
}

token value_payload_elem_sep { \s* ';' \s* }

token num_max_col_val { <pint_head> }

token int_body { [0 | \-?<pint_body>] }

token pint_body { <pint_head> <pint_tail>? }

token pint_head { <[ 1..9 A..Z ]> }

token pint_tail { [[_?<[ 0..9 A..Z ]>+]+] ** <segment_sep> }

token segment_sep { \s* '~' \s* }

token d_int_body { [0 | \-?<d_pint_body>] }

token d_pint_body { <d_pint_head> <d_pint_tail>? }

token d_pint_head { <[ 1..9 ]> }

token d_pint_tail { [[_?<[ 0..9 ]>+]+] ** <segment_sep> }

An Int node represents an integer numeric value. It is interpreted as a Muldis D sys.std.Core.Type.Int value as follows:

If the Int_payload is composed of a num_max_col_val plus int_body, then the int_body is interpreted as a base-N integer where N might be between 2 and 36, and the num_max_col_val says which possible value of N to use. Assuming all int_body column values are between zero and N-minus-one, the num_max_col_val contains that N-minus-one. So to specify, eg, bases [2,8,10,16], use num_max_col_val of [1,7,9,F].

If the Int_payload is a d_int_body, then it is interpreted as a base 10 integer.

Fundamentally the body part of an Int node consists of a string of digits and uppercased (but not lowercased) letters, where each digit (0..9) represents its own number and each letter (A..Z) represents a number in [10..35]. A body may optionally contain underscore characters (_), which exist just to help with visual formatting, such as for 10_000_000, and these are ignored/stripped by the parser. A body may optionally be split into 1..N segments where each segment is separated by a tilde token (~); this segmenting ability is provided to support code that contains very long numeric literals while still being well formatted (no extra long lines); the tilde tokens are also ignored/stripped by the parser, and the body is interpreted as if all its alphanumeric characters were contiguous.

If the value_kind of a value node is NNInt or PInt rather than Int, then the value node is interpreted simply as an Int node whose type_name is NNInt or PInt, and the allowed body is appropriately further restricted.

Examples:

Int:1;11001001 # binary #

7;0 # octal #

7;644 # octal #

-34 # decimal #

42 # decimal #

F;DEADBEEF # hexadecimal #

Z;-HELLOWORLD # base-36 #

3;301 # base-4 #

B;A09B # base-12 #

General Purpose Rational Numeric Literals

Grammar:

token Rat {
    (
        [
            (Rat | NNRat | PRat) <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <Rat_payload>
    )
}

token Rat_payload {
    (
          (<num_max_col_val>) <value_payload_elem_sep> (<rat_body>)
        | <d_rat_body>
    )
}

token rat_body {
      <int_body>\.<pint_tail>
    | (<int_body>) \s* \/ \s* (<pint_body>)
    | (<int_body>) \s* \* \s* (<pint_body>) \s* \^ \s* (<int_body>)
}

token d_rat_body {
      <d_int_body>\.<d_pint_tail>
    | (<d_int_body>) \s* \/ \s* (<d_pint_body>)
    | (<d_int_body>) \s* \* \s* (<d_pint_body>)
        \s* \^ \s* (<d_int_body>)
}

A Rat node represents a rational numeric value. It is interpreted as a Muldis D sys.std.Core.Type.Rat value as follows:

Fundamentally a Rat node is formatted and interpreted like an Int node, and any similarities won't be repeated here. The differences of interpreting a Rat_payload being composed of a num_max_col_val plus rat_body versus the Rat_payload being a d_rat_body are as per the corresponding differences of interpreting an Int_payload. Also interpreting a NNRat or PRat is as per a NNInt or PInt.

If the body part of a Rat node contains a radix point (.), then it is interpreted as is usual for a programming language with such a literal.

If the body part of a Rat node contains a solidus (/), then the rational's value is interpreted as the leading integer (a numerator) divided by the trailing positive integer (a denominator); that is, the two integers collectively map to the ratio possrep of the Rat type.

If the body part of a Rat node contains a asterisk (*) plus a circumflex accent (^), then the rational's value is interpreted as the leading integer (a mantissa) multiplied by the result of the middle positive integer (a radix) taken to the power of the trailing integer (an exponent); that is, the three integers collectively map to the float possrep of the Rat type.

Examples:

Rat:1;-1.1

-1.5 # same val as prev #

3.14159

A;0.0

F;DEADBEEF.FACE

Z;0.000AZE

Rat:6;500001/1000

B;A09B/A

Rat:1;1011101101*10^-11011

45207196*10^37

1/43

314159*10^-5

General Purpose Binary String Literals

Grammar:

token Blob {
    (
        [
            (Blob | OctetBlob) <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <Blob_payload>
    )
}

token Blob_payload {
    ((<blob_max_col_val>) <value_payload_elem_sep> (<blob_body>))
}

token blob_max_col_val { <[137F]> }

token blob_body {
    [
        <[']>
            (<[ 0..9 A..F ]>*)
        <[']>
    ] ** <segment_sep>
}

A Blob node represents a general purpose bit string. It is interpreted as a Muldis D sys.std.Core.Type.Blob value as follows: Fundamentally the body part of a Blob node consists of a delimited string of digits and uppercased (but not lowercased) letters, where each digit (0..9) represents its own number and each letter (A..F) represents a number in [10..15]; this string is qualified with a blob_max_col_val character ([137F]), similarly to how an int_body is qualified by a num_max_col_val. Each character of the delimited string specifies a sequence of one of [1,2,3,4] bits, depending on whether blob_max_col_val is [1,3,7,F]. If the value_kind of a value node is OctetBlob rather than Blob, then the value node is interpreted simply as an Blob node whose type_name is OctetBlob, and the delimited string is appropriately further restricted.

Examples:

Blob:1;'00101110100010' # binary #

3;''

F;'A705E' # hexadecimal #

7;'523504376'

General Purpose Character String Literals

Grammar:

token Text {
    (
        [
            (Text) <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <Text_payload>
    )
}

token Text_payload {
    (
        [
            <[']>
                ([ <-[\\\'\t\n\f\r]> | <escaped_char>]*)
            <[']>
        ] ** <segment_sep>
    )
}

token escaped_char {
      '\b' | '\a' | '\q' | '\h'
    | '\s'
    | '\t' | '\n' | '\f' | '\r'
    | '\c<' [
          [<[ A..Z ]>+] ** ' '
        | [0 | <[ 1..9 ]> <[ 0..9 ]>*]
        | <[ 1..9 A..Z ]> ';' [0 | <[ 1..9 A..Z ]> <[ 0..9 A..Z ]>*]
      ] '>'
}

A Text node represents a general purpose character string. It is interpreted as a Muldis D sys.std.Core.Type.Text value as follows:

The Text_payload is interpreted generally as is usual for a programming language with such a delimited character string literal.

A Text_payload may optionally be split into 1..N segments where each segment is delimited by apostrophes/single-quotes (') and separated by a tilde token (~); this segmenting ability is provided to support code that contains long string literals while still being well formatted (no extra long lines); the tilde tokens and adjoining string delimiters are ignored/stripped by the parser, and the Text_payload is interpreted as if it just consisted of a single delimited string.

All Muldis D delimited character string literals (generally the 3 Text, Name, Comment) may contain some characters denoted with escape sequences rather than literally. The Muldis D parser would substitute the escape sequences with the characters they represent, so the resulting character string values don't contain those escape sequences. Currently there are 2 classes of escape sequences, called simple and complex.

The meanings of the simple escape sequences are:

Esc | Unicode   | Unicode         | Chr | Literal character used
Seq | Codepoint | Character Name  | Lit | for when not escaped
----+-----------+-----------------+-----+------------------------------
\b  | F;5C      | REVERSE SOLIDUS | \   | esc seq lead (aka backslash)
\a  | F;27      | APOSTROPHE      | '   | delim Text literals
\q  | F;22      | QUOTATION MARK  | "   | delim quoted Name literals
\h  | F;23      | NUMBER SIGN     | #   | delim Comment lit (aka hash)
\s  | F;20      | SPACE           |     | space char
\t  | F;9       | CHAR... TAB...  |     | control char horizontal tab
\n  | F;A       | LINE FEED (LF)  |     | ctrl char line feed / newline
\f  | F;C       | FORM FEED (FF)  |     | control char form feed
\r  | F;D       | CARR. RET. (CR) |     | control char carriage return

One design decision of PTMD_Tiny that is distinct from typical other languages is that an escape sequence for any character used as a delimiter never contains that literal character. For example, while in SQL or Perl character strings delimited by ', they typically escape literal apostrophes/single-quotes as '' or \'; while this is unambiguous, the task of parsing such code is considerably more difficult than it could be. In contrast, while in PTMD_Tiny character strings delimited by ', a literal of the same is escaped with \a; so parsing such code is an order of magnitude easier because the parser doesn't have to understand the internals of any character string literal in order to separate out the character string from its surrounding code.

Another design decision of PTMD_Tiny that is distinct at least from Perl is that non-"space" whitespace characters in character string literals must never appear literally, but must instead be denoted with escape sequences. The main reason for this is to ensure that the actual values being selected by the string literals were not variable per the kind of linebreaks used to format the Muldis D source code itself.

There is currently just one complex escape sequence, of the format \c<...>, that supports specifying characters in terms of their Unicode codepoint name or number. If the ... consists of just uppercased (not lowercased) letters and the space character, then the ... is interpreted as a Unicode character name. If the ... looks like an Int_payload, sans that underscores and tilde segmentation aren't allowed here, then the ... is interpreted as a Unicode codepoint number. One reason for this feature is to empower specifying exactly which sequence of codepoints you want for a particular grapheme (which text editors tend to abstract away when you write characters literally), and also it is to empower more elegant passing of Unicode-savvy PTMD_Tiny source code through a communications channel that is more limited, such as to 7-bit ASCII.

Examples:

Text:'Ceres'

'サンプル'

''

'Perl'

'\c<LATIN SMALL LETTER OU>\c<F;263A>\c<65>'

DBMS Entity Name Literals

Grammar:

token Name {
    (
        (Name) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <Name_payload>
    )
}

token Name_payload {
    (<nonquoted_name_str> | <quoted_name_str>)
}

token nonquoted_name_str { <[ a..z A..Z _ ]><[ a..z A..Z 0..9 _ - ]>* }

token quoted_name_str {
    [
        <["]>
            ([ <-[\\\"\t\n\f\r]> | <escaped_char>]*)
        <["]>
    ] ** <segment_sep>
}

token NameChain {
    (
        (NameChain) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <NameChain_payload>
    )
}

token NameChain_payload {
    (<Name_payload> [<name_chain_elem_sep> <Name_payload>]+)
}

token name_chain_elem_sep { \s* '.' \s* }

token DeclNameChain {
    (
        (DeclNameChain) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <DeclNameChain_payload>
    )
}

token DeclNameChain_payload {
    (
          <Name_payload> ** <name_chain_elem_sep>
        | <ord_list_open> <ord_list_close>
    )
}

A Name node represents a canonical short name for any kind of DBMS entity when declaring it; it is a character string type, that is disjoint from Text. It is interpreted as a Muldis D sys.std.Core.Type.Cat.Name value as follows:

Fundamentally a Name node is formatted and interpreted like a Text node, and any similarities won't be repeated here. Unlike a Text_payload literal which must always be delimited, a Name_payload has 2 variants, one delimited (quoted_name_str) and one not (nonquoted_name_str). The delimited Name_payload form differs from Text_payload only in that each string segment is delimited by double-quotes rather than apostrophes/single-quotes.

A nonquoted_name_str is composed of a single alphabetic or underscore character followed by zero or more characters that are each alphanumeric or underscore or hyphen. It can not be segmented, so you will have to use the quoted_name_str equivalent if you want a segmented string. The definitions of alphabetic and numeric should include appropriate Unicode characters, but at the moment this isn't reflected in the grammar; TODO: fix this.

A NameChain node represents a canonical long name for invoking a DBMS entity in some contexts; it is conceptually a sequence of entity short names. This node is interpreted as a Muldis D sys.std.Core.Type.Cat.NameChain value as follows: A NameChain_payload consists of a sequence of 2 or more Name_payload where the elements of the sequence are separated by name_chain_elem_sep tokens (.); each element of the sequence, in order, defines an element of the array possrep's attribute of the result NameChain value.

A DeclNameChain node represents a canonical long name for declaring a DBMS entity in N-depth contexts; the format and interpretation of a DeclNameChain_payload (but as a sys.std.Core.Type.Cat.DeclNameChain value) is the same as a NameChain_payload but that the chain may have as few as zero parts rather than as few as 2, and a zero-element chain is represented by the special DeclNameChain_payload syntax of [].

Examples:

Name:login_pass

Name:"First Name"

NameChain:fed.data.the_db.gene.sorted_person_name

NameChain:fed.data.the_db.stats."samples by order"

DeclNameChain:gene.sorted_person_name

DeclNameChain:stats."samples by order"

DeclNameChain:[]

Code Comment Literals

Grammar:

token Comment {
    (
        [
            (Comment) <value_node_elem_sep>
            [<type_name> <value_node_elem_sep>]?
        ]?
        <Comment_payload>
    )
}

token Comment_payload {
    (
        [
            '#' ' '*
                ([ <-[\\\#\t\n\f\r]> | <escaped_char>]*)
            ' '* '#'
        ] ** \s*
    )
}

A Comment node represents the text of a Muldis D code comment; it is a character string type, that is disjoint from both Text and Name. It is interpreted as a Muldis D sys.std.Core.Type.Cat.Comment value as follows:

Fundamentally a Name node is formatted and interpreted like a Text node, and any similarities won't be repeated here. The Comment_payload differs from Text_payload only in that each string segment is delimited by number-signs/hash-marks rather than apostrophes/single-quotes, and also that:

Note that any leading or trailing space (F;20) characters inside the # delimiters of a Comment_payload are also part of the delimiters, and are not part of the selected Comment value; if you want to denote a Comment value with leading or trailing space chars, you must write those space chars in an escaped form such as with \s.

Examples:

Comment:# This does something. #

# So does this. #

TAI Temporal Literals

Grammar:

token Instant {
    (
        (Instant) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <Instant_payload>
    )
}

token Instant_payload {
    <Rat_payload>
}

token Duration {
    (
        (Duration) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <Duration_payload>
    )
}

token Duration_payload {
    <Rat_payload>
}

An Instant node represents a single point in time which is specified in terms of of atomic seconds; it is a rational numeric type, that is disjoint from both Rat and Duration. This node is interpreted as a Muldis D sys.std.Core.Type.Instant value as follows: An Instant_payload is formatted and interpreted in the same way as a Rat_payload.

A Duration node represents a single amount of time (the difference between two instants) which is specified in terms of of atomic seconds; it is a rational numeric type, that is disjoint from both Rat and Instant. This node is interpreted as a Muldis D sys.std.Core.Type.Duration value as follows: A Duration_payload is formatted and interpreted in the same way as a Rat_payload.

Examples:

Instant:1235556432.0

Instant:854309115.0

Duration:3600.0

Duration:-50.0

Duration:3.14159

Duration:1;1011101101*10^-11011

Duration:1/43

UTC and Float Temporal Literals

Grammar:

token UTCInstant {
    (
        (UTC [Instant | DateTime | Date | Time]) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <UTCInstant_payload>
    )
}

token UTCInstant_payload {
    <UTCDuration_payload>
}

token FloatInstant {
    (
        (Float [Instant | DateTime | Date | Time])
            <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <FloatInstant_payload>
    )
}

token FloatInstant_payload {
    <UTCDuration_payload>
}

token UTCDuration {
    (
        (UTCDuration) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <UTCDuration_payload>
    )
}

token UTCDuration_payload {
    (
          (<num_max_col_val>) <value_payload_elem_sep>
                (<utc_duration_body>)
        | <d_utc_duration_body>
    )
}

token utc_duration_body {
    <ord_list_open>
        [(<int_body>)? <list_elem_sep>] ** 5
        (<rat_body>)
    <ord_list_close>
}

token d_utc_duration_body {
    <ord_list_open>
        [(<d_int_body>)? <list_elem_sep>] ** 5
        (<d_rat_body>)
    <ord_list_close>
}

A UTCInstant node represents an "instant"/"datetime" value that is affiliated with the UTC time-zone. This node is interpreted as a Muldis D sys.std.Temporal.Type.UTCInstant value whose instant possrep attribute values are defined as follows:

A UTCInstant_payload consists mainly of a bracket-delimited sequence of 6 comma-separated elements, where each element is either a valid numeric literal or is completely absent. The 6 elements correspond in order to the 6 attributes: year, month day, hour, minute, second. For each element that is absent or defined, the corresponding attribute has the nothing or a Single value, respectively. For each of the first 5 elements, when it is defined, it must qualify as a valid body part of an Int node; for the 6th element, when it is defined, it must qualify as a valid body part of a Rat node.

Fundamentally each UTCInstant node element is formatted and interpreted like an Int or Rat node, and any similarities won't be repeated here.

A defined year may be any integer, each of [month, day] must be a positive integer, each of [hour, minute] must be a non-negative integer, and second must be a non-negative rational number. If all 6 attributes are defined, then the new UTCInstant value is also a UTCDateTime; if just the first 3 or last 3 are defined, then the value is not a UTCDateTime but rather a UTCDate or UTCTime, respectively; if any other combination of attributes are defined, then the value is just a UTCInstant and not of any of the other 3 subtypes.

If the value_kind of a value node is UTCDateTime or UTCDate or UTCTime rather than UTCInstant, then the value node is interpreted simply as a UTCInstant node whose type_name is UTCDateTime or UTCDate or UTCTime, and the allowed body is appropriately further restricted.

A FloatInstant node represents an "instant"/"datetime" value that is "floating" / not affiliated with any time-zone. This node is interpreted as a Muldis D sys.std.Temporal.Type.FloatInstant value in an identical fashion to how a UTCInstant node is interpreted, whose format it completely shares. Likewise regarding Float[DateTime|Date|Time].

A UTCDuration node represents a duration value, an amount of time, which is not fixed to any instant in time. This node is interpreted as a Muldis D sys.std.Temporal.Type.UTCDuration value whose duration possrep attribute values are defined as follows:

A UTCDuration_payload consists mainly of a bracket-delimited sequence of 6 comma-separated elements, where each element is either a valid numeric literal or is completely absent. The 6 elements correspond in order to the 6 attributes: years, months days, hours, minutes, seconds. For each element that is absent or defined, the corresponding attribute has the nothing or a Single value, respectively. For each of the first 5 elements, when it is defined, it must qualify as a valid body part of an Int node; for the 6th element, when it is defined, it must qualify as a valid body part of a Rat node.

Mostly a UTCDuration is formatted and interpreted like a UTCInstant node, and any similarities won't be repeated here.

A defined [years, months, days, hours, minutes] may be any integer, and seconds may be any rational number. Currently, UTCDuration has no system-defined subtypes, but that may change later.

Examples:

UTCInstant:[1964,10,16,16,12,47.5] # a UTCDateTime #

UTCInstant:[2002,12,16,,,] # a UTCDate #

UTCInstant:[,,,14,2,29.0] # a UTCTime #

FloatInstant:[2003,4,5,2,,] # min,sec unknown or N/A #

FloatInstant:[1407,,,,,] # just know its sometime in 1407 #

UTCDuration:[3,5,1,6,15,45.000012]

Low Level Integer String Literals

Grammar:

token String {
    (
        (String | BString | OString | UCPString) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <String_payload>
    )
}

token String_payload {
    (
          (<num_max_col_val>) <value_payload_elem_sep> (<string_body>)
        | <d_string_body>
    )
}

token string_body {
    <ord_list_open>
        ([(<int_body>) ** <list_elem_sep>]?)
    <ord_list_close>
}

token d_string_body {
    <ord_list_open>
        ([(<d_int_body>) ** <list_elem_sep>]?)
    <ord_list_close>
}

token ord_list_open { \s* '[' \s* }

token ord_list_close { \s* ']' \s* }

token list_elem_sep { \s* ',' \s* }

A String node represents an integer string value. It is interpreted as a Muldis D sys.std.Core.Type.Cat.String value as follows: A String_payload consists mainly of a bracket-delimited sequence of 0..N elements, where each element must qualify as a valid body part of a Int node, and the new String is conceptually that sequence of integers. Fundamentally each String node element is formatted and interpreted like an Int node, and any similarities won't be repeated here.

Examples:

String:F;[50,65,72,6C] # Unicode codepoints = 'Perl' #

String:[80,101,114,109] # same thing #

COLLECTION VALUE SELECTORS

Note that, with each of the main value selector nodes documented in this main POD section (members of coll_value_selector etc), any occurrences of child expr nodes should be read as being value nodes instead in contexts where instances of the main nodes are being composed beneath value nodes. That is, any expr node options beyond what value options exist are only valid within a depot node or boot_stmt node.

Scalar Selectors

Grammar:

token QScalar {
    (
        (Q? Scalar) <value_node_elem_sep>
        <type_name> <value_node_elem_sep>
        <QScalar_payload>
    )
}

token QScalar_payload {
    (<possrep_name> <value_payload_elem_sep> <possrep_attrs>)
}

token possrep_name { <Name_payload> }

token possrep_attrs { <QTuple_payload> }

A QScalar node represents a literal or selector invocation for a quasi-scalar subtype value. It is interpreted as a Muldis D sys.std.Core.Type.QScalar subtype value whose declared type is specified by the node's (mandatory for QScalar) type_name and whose attributes are defined by the QScalar_payload. The possrep_attrs is interpreted specifically as attributes of the declared type's possrep which is specified by the possrep_name. Each name+expr pair of the possrep_attrs defines a named possrep attribute of the new quasi-scalar; the pair's name and expr specify, respectively, the possrep attribute name, and the possrep attribute value. If the value_kind of a value node is Scalar rather than QScalar, then the value node is interpreted simply as an QScalar node that is appropriately further restricted; the type_name must name a Scalar subtype, and the possrep_attrs must not specify any quasi- typed attribute values.

See also the definition of the catalog data type sys.std.Core.Type.Cat.ScaPRSelExprNodeSet, which is what in general a QScalar node distills to when it is beneath the context of a depot or boot_stmt node, as it describes some semantics.

Examples:

Scalar:sys.std.Core.Type.Rat:float;{
    mantissa => 45207196,
    radix    => 10,
    exponent => 37
}

Scalar:sys.std.Temporal.Type.UTCDateTime:datetime;{
    year   => 2003,
    month  => 10,
    day    => 26,
    hour   => 1,
    minute => 30,
    second => 0.0
}

Scalar:fed.lib.the_db.WeekDay:name;{
    "" => "monday"
}

Scalar:fed.lib.the_db.WeekDay:number;{
    "" => 5
}

Tuple and Database Selectors

Grammar:

token QTuple {
    (
        (Q? Tuple | Database) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <QTuple_payload>
    )
}

token QTuple_payload {
    <nonord_list_open>
        ([(<Name_payload> <pair_elem_sep> <expr>) ** <list_elem_sep>]?)
    <nonord_list_close>
}

token nonord_list_open { \s* '{' \s* }

token nonord_list_close { \s* '}' \s* }

token pair_elem_sep { \s* '=>' \s* }

A QTuple node represents a literal or selector invocation for a quasi-tuple value. It is interpreted as a Muldis D sys.std.Core.Type.QTuple value whose attributes are defined by the QTuple_payload. Each name+expr pair of the QTuple_payload defines a named attribute of the new quasi-tuple; the pair's name and expr specify, respectively, the attribute name, and the attribute value. If the value_kind of a value node is Tuple rather than QTuple, then the value node is interpreted simply as an QTuple node that is appropriately further restricted; the QTuple_payload must not specify any quasi- typed attribute values. If the value_kind is instead Database, then then only Relation attribute values may be specified.

See also the definition of the catalog data type sys.std.Core.Type.Cat.TupSelExprNodeSet, which is what in general a QTuple node distills to when it is beneath the context of a depot or boot_stmt node, as it describes some semantics.

Examples:

Tuple:{}

Tuple:type.tuple_from.var.fed.data.the_db.account.users:{
    login_name => 'hartmark',
    login_pass => 'letmein',
    is_special => true
}

Tuple:{
    name => 'Michelle',
    age  => 17
}

Relation Selectors

Grammar:

token QRelation {
    (
        (Q? Relation) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <QRelation_payload>
    )
}

token QRelation_payload {
      <r_empty_qbody_payload>
    | <r_nonordered_qattr_payload>
    | <r_ordered_qattr_payload>
}

token r_empty_qbody_payload {
    <nonord_list_open>
        ([<Name_payload> ** <list_elem_sep>]?)
    <nonord_list_close>
}

token r_nonordered_qattr_payload {
    <nonord_list_open>
        ([<QTuple_payload> ** <list_elem_sep>]?)
    <nonord_list_close>
}

token r_ordered_qattr_payload {
    (
        <ord_list_open>
            ([<Name_payload> ** <list_elem_sep>]?)
        <ord_list_close>
        <value_payload_elem_sep>
        <nonord_list_open>
            ([<ordered_qtuple_qattrs> ** <list_elem_sep>]?)
        <nonord_list_close>
    )
}

token ordered_qtuple_qattrs {
    <ord_list_open>
        ([<expr> ** <list_elem_sep>]?)
    <ord_list_close>
}

A QRelation node represents a literal or selector invocation for a quasi-relation value. It is interpreted as a Muldis D sys.std.Core.Type.QRelation value whose attributes and tuples are defined by the QRelation_payload, which is interpreted as follows:

Iff the QRelation_payload is composed of just a nonord_list_[open|close] pair with zero elements between them, then it defines the only relation value having zero attributes and zero tuples.

Iff the QRelation_payload is a r_empty_qbody_payload with at least one Name_payload element, then it defines the attribute names of a relation having zero tuples.

Iff the QRelation_payload is a r_nonordered_qattr_payload with at least one <QTuple_payload> element, then each element defines a quasi-tuple of the new quasi-relation; every <QTuple_payload> must define a quasi-tuple of the same degree and have the same attribute names as its sibling <QTuple_payload>; these are the degree and attribute names of the quasi-relation as a whole, which is its heading for the current purposes.

Iff the QRelation_payload is a r_ordered_qattr_payload, then: The new quasi-relation value's attribute names are defined by the Name_payload elements, and the relation body's tuples' attribute values are defined by the ordered_qtuple_qattrs elements. This format is meant to be the most compact of the generic relation selector formats, as the attribute names only appear once for the relation rather than repeating for each tuple. As a trade-off, the attribute values per tuple from all of the ordered_qtuple_attrs elements must appear in the same order as their corresponding attribute names appear in the collection of Name_payload elements, as the names and values in the relation literal are matched up by ordinal position here.

If the value_kind of a value node is Relation rather than QRelation, then the value node is interpreted simply as an QRelation node that is appropriately further restricted; the QRelation_payload must not specify any quasi- typed attribute values.

See also the definition of the catalog data type sys.std.Core.Type.Cat.RelSelExprNodeSet, which is what in general a QRelation node distills to when it is beneath the context of a depot or boot_stmt node, as it describes some semantics.

Examples:

Relation:{}  # zero attrs + zero tuples #

Relation:{ x, y, z }  # 3 attrs + zero tuples #

Relation:{ {} }  # zero attrs + 1 tuple #

Relation:{
    {
        login_name => 'hartmark',
        login_pass => 'letmein',
        is_special => true
    }
}  # 3 attrs + 1 tuple #

Relation:fed.lib.the_db.gene.Person:[ name, age ];{
    [ 'Michelle', 17 ]
}  # 2 attrs + 1 tuple #

Set Selectors

Grammar:

token QSet {
    (
        (Q? Set) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <QSet_payload>
    )
}

token QSet_payload {
    <nonord_list_open>
        ([<expr> ** <list_elem_sep>]?)
    <nonord_list_close>
}

A QSet node represents a literal or selector invocation for a quasi-set value. It is interpreted as a Muldis D sys.std.Core.Type.QSet value whose elements are defined by the QSet_payload. Each expr of the QSet_payload defines a unary quasi-tuple of the new quasi-set; each expr defines the value attribute of the quasi-tuple. If the value_kind of a value node is Set rather than QSet, then the value node is further restricted.

See also the definition of the catalog data type sys.std.Core.Type.Cat.SetSelExprNodeSet, which is what in general a QSet node distills to when it is beneath the context of a depot or boot_stmt node, as it describes some semantics.

Examples:

Set:fed.lib.the_db.account.Country_Names:{
    'Canada',
    'Spain',
    'Jordan',
    'Thailand'
}

Set:{
    3,
    16,
    85
}

Maybe Selectors

Grammar:

token QMaybe {
    (
        (Q? (Maybe | Single)) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <QMaybe_payload>
    )
}

token QMaybe_payload {
    (<maybe_list> | <maybe_nothing>)
}

token maybe_list {
    <nonord_list_open> <expr> <nonord_list_close>
}

token maybe_nothing {
    nothing
}

A QMaybe node represents a literal or selector invocation for a quasi-maybe value. It is interpreted as a Muldis D sys.std.Core.Type.QMaybe value whose elements are defined by the QMaybe_payload.

Iff the QMaybe_payload is a maybe_list then it defines either zero or one expr; in the case of one, the expr defines the unary quasi-tuple of the new quasi-maybe, which is a quasi-'single'; the expr defines the value attribute of the quasi-tuple. If the value_kind of a value node is Maybe or [Q|]Single rather than QMaybe, then the value node is further restricted, either to not having any quasi- resulting expr or to having exactly one expr, as appropriate.

Iff the QMaybe_payload is a maybe_nothing then the QMaybe node is interpreted as the special value Maybe:nothing, aka nothing, which is the only QMaybe value with zero elements. Note that this is just an alternative syntax, as set_expr_list can select that value too. As a further restriction, the value_kind must be just one of [Q|]Maybe when the QMaybe_payload is a maybe_nothing.

See also the definition of the catalog data type sys.std.Core.Type.Cat.SetSelExprNodeSet, which is what in general a QMaybe node distills to same as when QSet does.

Examples:

Maybe:{ 'I know this one!' }

Maybe:nothing

nothing

Array Selectors

Grammar:

token QArray {
    (
        (Q? Array) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <QArray_payload>
    )
}

token QArray_payload {
    <ord_list_open>
        ([<expr> ** <list_elem_sep>]?)
    <ord_list_close>
}

A QArray node represents a literal or selector invocation for a quasi-array value. It is interpreted as a Muldis D sys.std.Core.Type.QArray value whose elements are defined by the QArray_payload. Each expr of the QArray_payload defines a binary quasi-tuple of the new sequence; the expr defines the value attribute of the quasi-tuple, and the index attribute of the quasi-tuple is generated such that the first expr gets an index of zero and subsequent ones get consecutive higher integer values. If the value_kind of a value node is Array rather than QArray, then the value node is further restricted.

See also the definition of the catalog data type sys.std.Core.Type.Cat.ArySelExprNodeSet, which is what in general a QArray node distills to when it is beneath the context of a depot or boot_stmt node, as it describes some semantics.

Examples:

Array:[
    'Alphonse',
    'Edward',
    'Winry'
]

Array:fed.lib.the_db.stats.Samples_By_Order:[
    57,
    45,
    63,
    61
]

Bag Selectors

Grammar:

token QBag {
    (
        (Q? Bag) <value_node_elem_sep>
        [<type_name> <value_node_elem_sep>]?
        <QBag_payload>
    )
}

token QBag_payload {
      <qbag_payload_counted_values>
    | <qbag_payload_repeated_values>
}

token qbag_payload_counted_values {
    <nonord_list_open>
        ([(<expr> <pair_elem_sep> <count>) ** <list_elem_sep>]?)
    <nonord_list_close>
}

token count {
    (
          (<num_max_col_val>) <value_payload_elem_sep> (<pint_body>)
        | <d_pint_body>
    )
}

token qbag_payload_repeated_values {
    <nonord_list_open>
        ([<expr> ** <list_elem_sep>]?)
    <nonord_list_close>
}

A QBag node represents a literal or selector invocation for a quasi-bag value. It is interpreted as a Muldis D sys.std.Core.Type.QBag value whose elements are defined by the QBag_payload, which is interpreted as follows:

Iff the QBag_payload is composed of just a nonord_list_[open|close] pair with zero elements between them, then it defines the only bag value having zero elements.

Iff the QBag_payload is a qbag_payload_counted_values with at least one expr/count-pair element, then each pair defines a binary quasi-tuple of the new quasi-bag; the expr defines the value attribute of the quasi-tuple, and the count defines the count attribute.

Iff the QBag_payload is a qbag_payload_repeated_values with at least one <expr> element, then each expr contributes to a binary quasi-tuple of the new quasi-bag; the expr defines the value attribute of the quasi-tuple. The quasi-bag has 1 quasi-tuple for every distinct (after normalization or evaluation) expr and expr-derived value in the QBag_payload, and the count attribute of that quasi-tuple says how many instances of said value there were.

See also the definition of the catalog data type sys.std.Core.Type.Cat.BagSelExprNodeSet, which is what in general a QBag node distills to when it is beneath the context of a depot or boot_stmt node, as it describes some semantics.

Examples:

Bag:fed.lib.the_db.inventory.Fruit:{
    'Apple'  => 500,
    'Orange' => 300,
    'Banana' => 400
}

Bag:{
    'Foo',
    'Quux',
    'Foo',
    'Bar',
    'Baz',
    'Baz'
}

DEPOT DECLARATION

TODO: ALL OF THIS HERE MAIN POD SECTION!

Grammar:

token depot {
    (depot)
    <nonord_list_open>
        <depot_catalog>
        <depot_data>?
    <nonord_list_close>
}

BOOTLOADER STATEMENT

TODO/REDO: ALL OF THIS HERE MAIN POD SECTION!

Grammar:

token boot_stmt {
    (
          <generic_boot_stmt>
        | ...
    )
}

token generic_boot_stmt
    (boot_stmt)
    <value_node_elem_sep>
    <imperative_routine_name>
    <value_node_elem_sep>
    <imperative_routine_upd_args>
    <value_node_elem_sep>
    <imperative_routine_ro_args>
}

token imperative_routine_name { <NameChain_payload> }

token imperative_routine_upd_args {
    <nonord_list_open>
        [[<Name_payload> <pair_elem_sep> <NameChain_payload>]
            ** <list_elem_sep>]?
    <nonord_list_close>
}

token imperative_routine_ro_args { <QTuple_payload> }

Examples:

boot_stmt:sys.std.Core.Cat.create_depot_procedure:{}:{ ... }

MULDIS D TINY DIALECT PRAGMAS

TODO/REDO: ALL OF THIS HERE MAIN POD SECTION!

All of the following pragmas apply to both the PTMD_Tiny and HDMD_Perl[6|5]_Tiny dialects, and have the same semantics with both.

auto_add_attrs

All Muldis D values are defined in terms of a collection of attribute values, and there is no such thing as an attribute being undefined; normally when one selects a value of a particular attribute-based type, they must supply values for all of its attributes; this is true with values comprising the system catalog as with any other values. Code written in the Muldis D PTMD_Tiny or HDMD_Perl[6|5]_Tiny dialect is comprised almost entirely of value literals, and by default all of the attribute values of said values must be explicitly given in the literals as sub-literals, even in the common case where some attributes just have the default values for their type.

While this fact allows for parsers to be very simple and for sub-literals to be compilable into values without knowing the context they're compiled into, it means that programmers would have to write maybe about twice as much code as they otherwise would if they could simply not write out the default-valued attributes.

If the 5th Extensions portion of the fully-qualified Muldis D language name contains a name+value pair of auto_add_attrs + Bool:true, then this activates the optional auto_add_attrs pragma, which provides one kind of automatic code completion. When auto_add_attrs is active, programmers may omit any literal attributes that they want, and those attributes will be automatically defined by the parser to have the default values for their type. Or more specifically, the wider literal whose attributes are missing will be extended to become the default value of the type of the wider literal but that those attributes of its that were explicitly given will override the default's values for those attributes. The actual behaviour is essentially what the sys.std.QTuple.subst_in_default function does.

But the auto_add_attrs pragma is not simply an automatically invoked pre-processing Muldis D function, because it also serves the common case where one is defining relation literals that have different attributes specified per tuple; such a thing by itself isn't even valid as a generic relation, so it certainly can't be given to a Muldis D function; so the pragma has at least that advantage unique to itself.

Note that the lexer is exactly the same regardless of whether the auto_add_attrs pragma is turned on or off, because the matters of missing attributes were never tested or enforced at the lexical level in the first place; rather the pragma only affects the parsing stage that follows the lexing. In other words, the actual syntax or grammar is identical regardless of the setting of this pragma.

Now one consequence of using the auto_add_attrs pragma is that in general the parser must be more complicated, and read type definitions from the DBMS information schema so that it knows what attributes each literal is supposed to have, and their declared types, and also sub-literals can no longer in general be fully converted to values in isolation; now the parent-most literal must be evaluated first, because its declared type generally determines the declared types of its attributes, and then their attributes recursively. For nonscalar types, the initial declared type being looked at is the declared type of the bootloader-invoked routine's parameter that the literal is being given to as an argument.

Now if the declared type of said parameter is just a generic type, such as Relation or Array, then often no information can be gleaned from this context for what attributes should exist, and so you will need to make the arg literal include treat-as-type metadata that explicitly provides the specific type information needed; otherwise, auto_add_attrs won't help you and you must then fully define relation values with the same attributes per tuple. But fortunately for brevity, a lot of the places where auto_add_attrs would help you the most is when the bootloader is invoking system-defined data-defining procedures, and their parameters are all of attribute-specifying types, and it is in such data definition that you may be most likely to face a large number of default-valued attributes, such as comment.

Note that the reason the auto_add_attrs behaviour is turned off by default is twofold. First, the parser can be a lot simpler / more tiny with it off. Second, requiring users to explicitly define even default-valued attributes can make the code more self-documenting and can help users avoid some kinds of bugs due to action from unseen values, or due to some default values "silently" changing between language versions. So then essentially, turning on auto_add_attrs means the programmer is telling the parser "I know what I'm doing" by explicitly asking for potentially less-safe behaviour. Of course, even with auto_add_attrs turned on, you can still explicitly define attribute values that are their type's default values, so it is possible to compromise such as you like.

Also note that it should be trivial for a Muldis D implementation to let users input code written with auto_add_attrs turned on, and then output the version of that code for their perusal with it turned off, so they can see what extra values were filled in without having to manually write said.

auto_unabbrev_std_names

Normally when one is specifying a NameChain literal that is a reference to a standard system-defined type or routine, they must write out the name in full, starting with sys.std and so on through the unique part of the entity name. While this allows for clearly self-documenting code, as well as for relatively simple parsers, it can also be added tedium to programmers that would prefer to write out the names in a less verbose manner, especially since to a point, a slightly more complicated parser could still unambiguously resolve a much shorter substring of the name.

If the 5th Extensions portion of the fully-qualified Muldis D language name contains a name+value pair of auto_unabbrev_std_names + Bool:true, then this activates the optional auto_unabbrev_std_names pragma, which provides one kind of automatic code completion. When auto_unabbrev_std_names is active, programmers may omit any number of consecutive leading chain elements from such a NameChain literal, so long as the remaining unqualified chain is distinct among all standard system-defined (sys.std-prefix) DBMS entities (but that as an exception, a non-distinct abbreviation is allowed iff exactly 1 of the candidate entities is in the language core, sys.std.Core-prefix, in which case that 1 is unambiguously the entity that is resolved to). This feature has no effect on the namespace prefixes like tuple_from or array_of; one still writes those as normal prepended to the otherwise shortened chains.

So for example, one can just write Int rather than sys.std.Core.Type.Int, is_identical rather than sys.std.Core.Universal.is_identical, QTuple.attr rather than sys.std.Core.QTuple.attr, min rather than sys.std.Ordered.min, array_of.Rat rather than array_of.sys.std.Core.Type.Rat, and so on.

The auto_unabbrev_std_names pragma intentionally does not empower auto un-abbreviations of any namespaces other than sys.std, to keep things simple for users to predict and for systems to implement; it does not affect sys.[imp|cat], nor any other top-level namespace. When one is referencing either any system-defined implementation-specific (non-standard) types or routines, or any user-defined types or routines, or any dbvars or constraints or whatever, their names can not be written abbreviated due to the auto_unabbrev_std_names pragma.

Note that the lexer is exactly the same regardless of whether the auto_unabbrev_std_names pragma is turned on or off, as per the auto_add_attrs pragma. Many other comments about the other pragma also apply to this one.

auto_chains_from_names

Iff both the auto_add_attrs and auto_unabbrev_std_names pragmas are active, then the optional auto_chains_from_names dependent pragma may be activated in the same manner (as an Extensions name+value pair with Bool:true. When auto_chains_from_names is active, programmers may write an otherwise abbreviated-to-one-chain-element NameChain literal as a plain Name literal; this can chop the literal down to a third or fourth of its otherwise-length such as in the case of a reference to the Int type. When the parent literal of such a faux-Name literal is examined for missing attributes, or examined that existing attributes are of the correct type, any attributes whose declared type says they are supposed to be NameChain but that have an explicitly defined Name child literal will have that literal mapped to and replaced with a single element NameChain literal, which can be subsequently un-abbreviated into a standard system-defined type or routine name. The auto_chains_from_names pragma will not work when the declared type being applied to a faux-Name is not a NameChain subtype, and such literals will then be taken as actual Name; where such declared type information is missing, you will need to write out the abbreviated chain as an actual NameChain literal. Note that the auto_chains_from_names pragma has no effect on NameChain literal bodies that don't comprise the payload portion of their parent literal, such as with the imperative routine name composed into a boot_stmt literal; literal bodies in those positions will always be interpreted according to NameChain literal body syntax.

SEE ALSO

Go to Muldis::D for the majority of distribution-internal references, and Muldis::D::SeeAlso for the majority of distribution-external references.

AUTHOR

Darren Duncan (perl@DarrenDuncan.net)

LICENSE AND COPYRIGHT

This file is part of the formal specification of the Muldis D language.

Muldis D is Copyright © 2002-2009, Muldis Data Systems, Inc.

See the LICENSE AND COPYRIGHT of Muldis::D for details.

TRADEMARK POLICY

The TRADEMARK POLICY in Muldis::D applies to this file too.

ACKNOWLEDGEMENTS

The ACKNOWLEDGEMENTS in Muldis::D apply to this file too.