NAME
Muldis::D::Dialect::PTMD_Tiny - How to format Plain Text Muldis D
VERSION
This document is Muldis::D::Dialect::PTMD_Tiny version 0.66.0.
PREFACE
This document is part of the Muldis D language specification, whose root document is Muldis::D; you should read that root document before you read this one, which provides subservient details.
DESCRIPTION
This document outlines the grammar of the Plain Text Muldis D dialect named PTMD_Tiny
. The fully-qualified name of this Muldis D dialect, in combination with the base language spec it is bundled with, is Muldis_D:"http://muldis.com":"N.N.N":PTMD_Tiny
(when the bundled base language version is substituted for the N.N.N
).
This dialect is designed to exactly match the Muldis D system catalog (the possible representation of Muldis D code that is visible to or updateable by Muldis D programs at runtime) as to what non-critical meta-data it explicitly stores; so code in the PTMD_Tiny
dialect should be round-trippable with the system catalog with the result maintaining all the details that were started with. Since it matches the system catalog, this dialect should be able to exactly represent all possible Muldis D base language code (and probably all extensions too), rather than a subset of it. That said, the PTMD_Tiny
dialect does provide a choice of multiple syntax options for writing Muldis D value literals and DBMS entity (eg type and routine) declarations, so several very distinct PTMD_Tiny
code artifacts may parse into the same system catalog entries. There is even a considerable level of abstraction in some cases, so that it is easier for programmers to write and understand typical PTMD_Tiny
code, and so that this code isn't absurdly verbose.
This dialect is designed to be as small as possible while meeting the above criteria, and is designed such that a parser that handles all of this dialect can be tiny, hence the dialect's Tiny
name. Likewise, a code generator for this dialect from the system catalog can be tiny.
A significant quality of the PTMD_Tiny
dialect is that it is designed to work easily for a single-pass parser, or at least a single-pass lexer; all the context that one needs to know for how to parse or lex any arbitrary substring of code is provided by prior code, or any required lookahead is just by a few characters in general. Therefore, a PTMD_Tiny
parser can easily work on a streaming input like a file-handle where you can't go back earlier in the stream. Often this means a parser can work with little RAM.
Also the dialect is designed that any amount of whitespace can be added or omitted next to most non-alphanumeric characters (which happen to be next to alphanumeric tokens) without that affecting the meaning of the code at all, except obviously for within character string literals. And long binary or character or numeric or identifier strings can be split into arbitrary-size substrings, without affecting the meaning. And many elements are identified by name rather than ordinal position, so to some degree the order they appear has no effect on the meaning. So programmers can easily format (separate, indent, linewrap, order) code how they like, and making an automated code reformatter shouldn't be difficult. Often, named elements can also be omitted entirely for brevity, in which case the parser would use context to supply default values for those elements.
Given that plain text is (more or less) universally unambiguously portable between all general purpose languages that could be used to implement a DBMS, it is expected that every single Muldis D implementation will natively accept input in the PTMD_Tiny
dialect, which isn't dependent on any specific host language and should be easy enough to process, so it should be considered the safest official Muldis D dialect to write in by default, when you don't have a specific reason to use some other dialect.
See also the dialects HDMD_Perl6_Tiny and HDMD_Perl5_Tiny, which are derived directly from PTMD_Tiny
, and represent possible Perl 6 and 5 concrete syntax trees for it; in fact, most of the details in common with those other dialects are described just in the current file, for all 3 dialects.
GENERAL STRUCTURE
A PTMD_Tiny
Muldis D code file consists just of a full or partial Muldis D bootloader
routine definition, which begins with a language name declaration, and otherwise is simply an ordered sequence of imperative routine calls, where earlier routine calls are to system-defined data-definition routines (their arguments are values to put in the system catalog), and later ones are then to user-defined routines that the earlier statements either loaded or defined. This is conceptually what a PTMD_Tiny
file is, and it can even be that literally, but PTMD_Tiny
provides a canonical further abstraction which should be used when doing data-definition. And so you typically use syntax resembling routine and type declarations in a general purpose programming language, where simply declaring such an entity will cause it to be written into the system catalog for subsequent use.
The grammar in this file is formatted as a Perl 6 grammar (see http://perlcabal.org/syn/S05.html for details) which could be used to parse it, but it is only meant to be illustrative, and would need further additions or changes to actually function in Perl 6. The grammar consists mainly of named tokens which define matching rules. A token's name is declared as a bareword following the keyword token
and it is subsequently referenced with '<' and '>'
delimiters. Any other bareword in a token definition consisting of alphanumerics is matched literally, and all non-quoted whitespace is not significant. Any pairs of parenthesis ('(' and ')'
) in token definitions are capturing groups, and each parser match by a pair corresponds to a capture node or node element in the concrete syntax tree resulting from the parse.
The root grammar token for the entire dialect is bootloader
.
BOOTLOADER
Grammar:
token bootloader {
(
<language_name>
[
<value>
| <depot>
| <boot_stmt>+
]?
)
}
A bootloader
node has 1..N ordered elements where the first element is a language_name
node and then either: 1. there is exactly one (second) element that is a value
node or a depot
node; 2. there are 1..N ordered elements where each is a boot_stmt
node; 3. there are no other elements, making the bootloader a no-op.
See the pod sections in this file named "LANGUAGE NAME", "VALUE LITERALS AND SELECTORS", "DEPOT DECLARATION", and "BOOTLOADER STATEMENT" for more details about the aforementioned tokens/nodes.
When Muldis D is being compiled and invoked piecemeal, such as because the Muldis D implementing virtual machine (VM) is attached to an interactive user terminal, or the VM is embedded in a host language where code in the host language invokes Muldis D code at various times, the conceptual bootloader
is usually split up, and so not every Muldis D code fragment would then have its own language_name
. Usually a language_name
would be supplied to the Muldis D VM just once as a VM configuration step, which provides a context for further interaction with the VM that just involves Muldis D code that isn't itself qualified with a language_name
.
LANGUAGE NAME
Grammar:
token language_name {
(
<ln_base_name>
<value_node_elem_sep>
<ln_base_authority>
<value_node_elem_sep>
<ln_base_version_number>
<value_node_elem_sep>
<ln_dialect>
<value_node_elem_sep>
<ln_extensions>
)
}
token ln_base_name { (Muldis_D) }
token ln_base_authority { <Name_payload> }
token ln_base_version_number { <Name_payload> }
token ln_dialect { (PTMD_Tiny) }
token ln_extensions { <QTuple_payload> }
As per the VERSIONING pod section of Muldis::D, code written in Muldis D must start by declaring the fully-qualified Muldis D language name it is written in. The PTMD_Tiny
dialect formats this name as a language_name
node having 5 ordered elements:
ln_base_name
-
This is the Muldis D language base name; it is simply the bareword character string
Muldis_D
. -
This is the base authority; it is a character string formatted as per a specific-context
Name
value literal; it is typically the delimited character stringhttp://muldis.com
. ln_base_version_number
-
This is the base version number; it is a character string formatted as per
ln_base_authority
; it is typically a character string like1.2.3
. ln_dialect
-
This is the dialect name; it is simply the bareword character string
PTMD_Tiny
. ln_extensions
-
This is a set of chosen pragma/parser-config options as per a
Tuple
SCVL; see the "MULDIS D TINY DIALECT PRAGMAS" pod section for more details.
Examples:
Muldis_D:"http://muldis.com":"1.2.3":PTMD_Tiny:{}
Muldis_D:"http://muldis.com":"1.2.3":PTMD_Tiny:{
auto_add_attrs => true,
auto_unabbrev_std_names => true,
auto_chains_from_names => true
}
VALUE LITERALS AND SELECTORS
Grammar:
token value {
<opaque_value_literal>
| <coll_value_selector>
}
token opaque_value_literal {
<Nothing>
| <Bool>
| <Order>
| <RatRoundMeth>
| <Int>
| <Rat>
| <Blob>
| <Text>
| <Name>
| <NameChain>
| <DeclNameChain>
| <Comment>
| <Instant>
| <Duration>
| <UTCInstant>
| <FloatInstant>
| <UTCDuration>
| <String>
}
token coll_value_selector {
<QScalar>
| <QTuple>
| <QRelation>
| <QSet>
| <QSingle>
| <QArray>
| <QBag>
}
A value
node is a Muldis D value literal, which is a common special case of a Muldis D value selector.
Unlike value selectors in general, which must be composed beneath a depot
or boot_stmt
because they actually represent a Muldis D value expression tree of a function or updater or type definition, a value
node does not represent an expression tree, but rather a value constant; by definition, a value
can be completely evaluated at compile time. A bootloader
consisting directly of a value
is hence just a serialized Muldis D value.
The PTMD_Tiny grammar subsection for value literals (having the root grammar token value
) is completely self-defined and can be used in isolation from the wider grammar as a Muldis D sub-language; for example, a hosted-data Muldis D implementation may have an object representing a Muldis D value, which is initialized using code written in that sub-language.
Every grammar token, and corresponding capture node, representing a Muldis D value literal is similarly formatted and has 1-3 elements; the following pod section "Value Literal Common Elements" describes the similarities once for all of them, in terms of an alternate value
token definition which is called x_value
. And then the other pod sections specific to each kind of value literal then just focus on describing their unique aspects, namely their payloads.
An opaque_value_literal
node represents a conceptually opaque Muldis D value, such that every one of these values is defined with its own literal syntax that is compact and doesn't look like a collection of other nodes; this includes the basic numeric and string literals.
A coll_value_selector
node represents a conceptually transparent Muldis D value, such that every one of these values is defined visibly in terms of a collection of other nodes; this includes the basic tuple and relation selectors.
Value Literal Common Elements
A generic context value literal (or GCVL) is a value literal that can be properly interpreted in a context that is expecting a value but has no expectation that said value belongs to a specific data type; in the general case, a GCVL includes explicit value kind meta-data (such as, "this is an Int
" or "this is a Name
"); but with a few specific data types (see the value_kind
node description for details) that meta-data may be omitted for brevity because the main literal has mutually uniquely identifying characteristics. For example, each element of a generic Muldis D collection value, such as a member of an array or tuple, could potentially have any type at all. In contrast, a specific context value literal (or SCVL) is a value literal that does not include explicit value kind meta-data, even when the main literal doesn't have uniquely identifying characteristics, because the context of its use supplies said meta-data. For example, in a tuple value literal it is assumed that a value literal in an attribute name position must denote a Name
. The grammar token value
|x_value
denotes a GCVL, as do most short-named grammar tokens, like Int
or Name
; in contrast, a grammar token containing payload
denotes a SCVL, like Int_payload
or Name_payload
.
Every GCVL has 1-3 elements, illustrated by this grammar:
token x_value {
(
<value_kind>
[
<value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<payload>
]?
| <payload>
)
}
token value_node_elem_sep { \s* ':' \s* }
token value_kind {
(
Nothing
| Bool
| Order
| RatRoundMeth
| Int | NNInt | PInt
| Rat | NNRat | PRat
| Blob | OctetBlob
| Text
| Name
| NameChain
| DeclNameChain
| Comment
| Instant
| Duration
| UTC [Instant | DateTime | Date | Time]
| Float [Instant | DateTime | Date | Time]
| UTCDuration
| String | BString | OString | UCPString
| Q? Scalar
| Q? Tuple | Database
| Q? Relation
| Q? Set
| Q? Single
| Q? Array
| Q? Bag
)
}
token type_name { <NameChain_payload> }
token payload {
<Bool_payload>
| <Order_payload>
| <RatRoundMeth_payload>
| <Int_payload>
| <Rat_payload>
| <Blob_payload>
| <Text_payload>
| <Name_payload>
| <NameChain_payload>
| <DeclNameChain_payload>
| <Comment_payload>
| <Instant_payload>
| <Duration_payload>
| <UTCInstant_payload>
| <FloatInstant_payload>
| <UTCDuration_payload>
| <String_payload>
| <QScalar_payload>
| <QTuple_payload>
| <QRelation_payload>
| <QSet_payload>
| <QSingle_payload>
| <QArray_payload>
| <QBag_payload>
}
So a x_value
|value
node has 1-3 elements in general:
value_kind
-
This is a character string of the format
<[A..Z]> <[ a..z A..Z ]>+
; it identifies the data type of the value literal in broad terms and is the only external meta-data ofpayload
generally necessary to interpret the latter; what grammars are valid forpayload
depend just onvalue_kind
.With just the 8 data types [
Bool
,Int
,Blob
,Text
,Rat
,Comment
,Order
,RatRoundMeth
], thevalue_kind
portion of a GCVL may be omitted for brevity, but the code parser should still be able to infer it easily by examining the first few characters of thepayload
, which for each of said 8 data types has a mutually uniquely identifying format, which is also distinct from all possiblevalue_kind
. Note that omission ofvalue_kind
is only allowed when the GCVL doesn't include atype_name
element. type_name
-
This is a Muldis D data type name, for example
sys.std.Core.Type.Int
; it identifies a specific subtype of the generic type denoted byvalue_kind
, and serves as an assertion that the Muldis D value denoted bypayload
is a member of the named subtype. Iffvalue_kind
is[Q|]Scalar
thentype_name
is mandatory; otherwise,type_name
is optional for allvalue
, except thattype_name
must be omitted whenvalue_kind
is one of the 4 [Bool
,Nothing
,Order
,RatRoundMeth
]; this isn't because those 4 types can't be subtyped, but because in practice doing so isn't useful. payload
-
This is mandatory for all
value
except whenvalue_kind
isNothing
, in which case it must be omitted; sinceNothing
is a singleton type, it isn't useful to say anything for that value other than thevalue_kind
.
For GCVL and SCVL examples, see the subsequent documentation sections.
OPAQUE VALUE LITERALS
The Nothing Literal
Grammar:
token Nothing {
(Nothing)
}
A Nothing
node represents a 'nothing' value; it is interpreted as a Muldis D sys.std.Core.Type.Nothing
, which is a singleton type. It is simply the bareword character string Nothing
. Nothing literals are distinct from all other value literals in that, besides being a singleton, its SCVL form is identical to its GCVL form.
Examples:
Nothing
Boolean Literals
Grammar:
token Bool {
(
[(Bool) <value_node_elem_sep>]?
<Bool_payload>
)
}
token Bool_payload {
(false | true)
}
A Bool
node represents a logical boolean value. It is interpreted as a Muldis D sys.std.Core.Type.Bool
value as follows: The Bool_payload
is a bareword character string formatted as per a Name
SCVL, and it maps directly to the name
possrep of the Bool
type.
Examples:
Bool:true
false
Order-Determination Literals
Grammar:
token Order {
(
[(Order) <value_node_elem_sep>]?
<Order_payload>
)
}
token Order_payload {
(increase | same | decrease)
}
An Order
node represents an order-determination. It is interpreted as a Muldis D sys.std.Core.Type.Cat.Order
value as follows: The Order_payload
is a bareword character string formatted as per a Name
SCVL, and it maps directly to the name
possrep of the Order
type.
Examples:
Order:same
decrease
Rational Rounding Method Literals
Grammar:
token RatRoundMeth {
(
[(RatRoundMeth) <value_node_elem_sep>]?
<RatRoundMeth_payload>
)
}
token RatRoundMeth_payload {
(
half_down | half_up
| half_even
| to_floor | to_ceiling
| to_zero | to_inf
)
}
A RatRoundMeth
node represents a rational rounding method. It is interpreted as a Muldis D sys.std.Core.Type.Cat.RatRoundMeth
value as follows: The RatRoundMeth_payload
is a bareword character string formatted as per a Name
SCVL, and it maps directly to the only possrep of the RatRoundMeth
type.
Examples:
RatRoundMeth:half_up
to_zero
General Purpose Integer Numeric Literals
Grammar:
token Int {
(
[
(Int | NNInt | PInt) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
]?
<Int_payload>
)
}
token Int_payload {
(
(<num_max_col_val>) <value_payload_elem_sep> (<int_body>)
| <d_int_body>
)
}
token value_payload_elem_sep { \s* ';' \s* }
token num_max_col_val { <pint_head> }
token int_body { [0 | \-?<pint_body>] }
token nnint_body { [0 | <pint_body>] }
token pint_body { <pint_head> <pint_tail>? }
token pint_head { <[ 1..9 A..Z ]> }
token pint_tail { [[_?<[ 0..9 A..Z ]>+]+] ** <segment_sep> }
token segment_sep { \s* '~' \s* }
token d_int_body { [0 | \-?<d_pint_body>] }
token d_nnint_body { [0 | <d_pint_body>] }
token d_pint_body { <d_pint_head> <d_pint_tail>? }
token d_pint_head { <[ 1..9 ]> }
token d_pint_tail { [[_?<[ 0..9 ]>+]+] ** <segment_sep> }
An Int
node represents an integer numeric value. It is interpreted as a Muldis D sys.std.Core.Type.Int
value as follows:
If the Int_payload
is composed of a num_max_col_val
plus int_body
, then the int_body
is interpreted as a base-N integer where N might be between 2 and 36, and the num_max_col_val
says which possible value of N to use. Assuming all int_body
column values are between zero and N-minus-one, the num_max_col_val
contains that N-minus-one. So to specify, eg, bases [2,8,10,16], use num_max_col_val
of [1,7,9,F].
If the Int_payload
is a d_int_body
, then it is interpreted as a base 10 integer.
Fundamentally the body part of an Int
node consists of a string of digits and uppercased (but not lowercased) letters, where each digit (0..9
) represents its own number and each letter (A..Z
) represents a number in [10..35]. A body may optionally contain underscore characters (_
), which exist just to help with visual formatting, such as for 10_000_000
, and these are ignored/stripped by the parser. A body may optionally be split into 1..N segments where each segment is separated by a tilde token (~
); this segmenting ability is provided to support code that contains very long numeric literals while still being well formatted (no extra long lines); the tilde tokens are also ignored/stripped by the parser, and the body is interpreted as if all its alphanumeric characters were contiguous.
If the value_kind
of a value
node is NNInt
or PInt
rather than Int
, then the value
node is interpreted simply as an Int
node whose type_name
is NNInt
or PInt
, and the allowed body is appropriately further restricted.
Examples:
Int:1;11001001
7;0
7;644
-34
42
F;DEADBEEF
Z;-HELLOWORLD
3;301
B;A09B
General Purpose Rational Numeric Literals
Grammar:
token Rat {
(
[
(Rat | NNRat | PRat) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
]?
<Rat_payload>
)
}
token Rat_payload {
(
(<num_max_col_val>) <value_payload_elem_sep> (<rat_body>)
| <d_rat_body>
)
}
token rat_body {
<int_body>\.<pint_tail>
| (<int_body>) \s* \/ \s* (<pint_body>)
| (<int_body>) \s* \* \s* (<pint_body>) \s* \^ \s* (<int_body>)
}
token nnrat_body {
<nnint_body>\.<pint_tail>
| (<nnint_body>) \s* \/ \s* (<pint_body>)
| (<nnint_body>) \s* \* \s* (<pint_body>) \s* \^ \s* (<int_body>)
}
token d_rat_body {
<d_int_body>\.<d_pint_tail>
| (<d_int_body>) \s* \/ \s* (<d_pint_body>)
| (<d_int_body>) \s* \* \s* (<d_pint_body>)
\s* \^ \s* (<d_int_body>)
}
token d_nnrat_body {
<d_nnint_body>\.<d_pint_tail>
| (<d_nnint_body>) \s* \/ \s* (<d_pint_body>)
| (<d_nnint_body>) \s* \* \s* (<d_pint_body>)
\s* \^ \s* (<d_int_body>)
}
A Rat
node represents a rational numeric value. It is interpreted as a Muldis D sys.std.Core.Type.Rat
value as follows:
Fundamentally a Rat
node is formatted and interpreted like an Int
node, and any similarities won't be repeated here. The differences of interpreting a Rat_payload
being composed of a num_max_col_val
plus rat_body
versus the Rat_payload
being a d_rat_body
are as per the corresponding differences of interpreting an Int_payload
. Also interpreting a NNRat
or PRat
is as per a NNInt
or PInt
.
If the body part of a Rat
node contains a radix point (.
), then it is interpreted as is usual for a programming language with such a literal.
If the body part of a Rat
node contains a solidus (/
), then the rational's value is interpreted as the leading integer (a numerator) divided by the trailing positive integer (a denominator); that is, the two integers collectively map to the ratio
possrep of the Rat
type.
If the body part of a Rat
node contains a asterisk (*
) plus a circumflex accent (^
), then the rational's value is interpreted as the leading integer (a mantissa) multiplied by the result of the middle positive integer (a radix) taken to the power of the trailing integer (an exponent); that is, the three integers collectively map to the float
possrep of the Rat
type.
Examples:
Rat:1;-1.1
-1.5
3.14159
A;0.0
F;DEADBEEF.FACE
Z;0.000AZE
Rat:6;500001/1000
B;A09B/A
Rat:1;1011101101*10^-11011
45207196*10^37
1/43
314159*10^-5
General Purpose Binary String Literals
Grammar:
token Blob {
(
[
(Blob | OctetBlob) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
]?
<Blob_payload>
)
}
token Blob_payload {
((<blob_max_col_val>) <value_payload_elem_sep> (<blob_body>))
}
token blob_max_col_val { <[137F]> }
token blob_body {
[
<[']>
(<[ 0..9 A..F ]>*)
<[']>
] ** <segment_sep>
}
A Blob
node represents a general purpose bit string. It is interpreted as a Muldis D sys.std.Core.Type.Blob
value as follows: Fundamentally the body part of a Blob
node consists of a delimited string of digits and uppercased (but not lowercased) letters, where each digit (0..9
) represents its own number and each letter (A..F
) represents a number in [10..15]; this string is qualified with a blob_max_col_val
character ([137F]
), similarly to how an int_body
is qualified by a num_max_col_val
. Each character of the delimited string specifies a sequence of one of [1,2,3,4] bits, depending on whether blob_max_col_val
is [1,3,7,F]. If the value_kind
of a value
node is OctetBlob
rather than Blob
, then the value
node is interpreted simply as an Blob
node whose type_name
is OctetBlob
, and the delimited string is appropriately further restricted.
Examples:
Blob:1;'00101110100010'
3;''
F;'A705E'
7;'523504376'
General Purpose Character String Literals
Grammar:
token Text {
(
[
(Text) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
]?
<Text_payload>
)
}
token Text_payload {
(
[
<[']>
([ <-[\\\'\t\n\f\r]> | <escaped_char>]*)
<[']>
] ** <segment_sep>
)
}
token escaped_char {
'\b' | '\a' | '\q' | '\h'
| '\s'
| '\t' | '\n' | '\f' | '\r'
| '\c<' [
[<[ A..Z ]>+] ** ' '
| [0 | <[ 1..9 ]> <[ 0..9 ]>*]
| <[ 1..9 A..Z ]> ';' [0 | <[ 1..9 A..Z ]> <[ 0..9 A..Z ]>*]
] '>'
}
A Text
node represents a general purpose character string. It is interpreted as a Muldis D sys.std.Core.Type.Text
value as follows:
The Text_payload
is interpreted generally as is usual for a programming language with such a delimited character string literal.
A Text_payload
may optionally be split into 1..N segments where each segment is delimited by apostrophes/single-quotes ('
) and separated by a tilde token (~
); this segmenting ability is provided to support code that contains long string literals while still being well formatted (no extra long lines); the tilde tokens and adjoining string delimiters are ignored/stripped by the parser, and the Text_payload
is interpreted as if it just consisted of a single delimited string.
All Muldis D delimited character string literals (generally the 3 Text
, Name
, Comment
) may contain some characters denoted with escape sequences rather than literally. The Muldis D parser would substitute the escape sequences with the characters they represent, so the resulting character string values don't contain those escape sequences. Currently there are 2 classes of escape sequences, called simple and complex.
The meanings of the simple escape sequences are:
Esc | Unicode | Unicode | Chr | Literal character used
Seq | Codepoint | Character Name | Lit | for when not escaped
----+-----------+-----------------+-----+------------------------------
\b | F;5C | REVERSE SOLIDUS | \ | esc seq lead (aka backslash)
\a | F;27 | APOSTROPHE | ' | delim Text literals
\q | F;22 | QUOTATION MARK | " | delim quoted Name literals
\h | F;23 | NUMBER SIGN | # | delim Comment lit (aka hash)
\s | F;20 | SPACE | | space char
\t | F;9 | CHAR... TAB... | | control char horizontal tab
\n | F;A | LINE FEED (LF) | | ctrl char line feed / newline
\f | F;C | FORM FEED (FF) | | control char form feed
\r | F;D | CARR. RET. (CR) | | control char carriage return
One design decision of PTMD_Tiny that is distinct from typical other languages is that an escape sequence for any character used as a delimiter never contains that literal character. For example, while in SQL or Perl character strings delimited by '
, they typically escape literal apostrophes/single-quotes as ''
or \'
; while this is unambiguous, the task of parsing such code is considerably more difficult than it could be. In contrast, while in PTMD_Tiny character strings delimited by '
, a literal of the same is escaped with \a
; so parsing such code is an order of magnitude easier because the parser doesn't have to understand the internals of any character string literal in order to separate out the character string from its surrounding code.
Another design decision of PTMD_Tiny that is distinct at least from Perl is that non-"space" whitespace characters in character string literals must never appear literally, but must instead be denoted with escape sequences. The main reason for this is to ensure that the actual values being selected by the string literals were not variable per the kind of linebreaks used to format the Muldis D source code itself.
There is currently just one complex escape sequence, of the format \c<...>
, that supports specifying characters in terms of their Unicode codepoint name or number. If the ...
consists of just uppercased (not lowercased) letters and the space character, then the ...
is interpreted as a Unicode character name. If the ...
looks like an Int_payload
, sans that underscores and tilde segmentation aren't allowed here, then the ...
is interpreted as a Unicode codepoint number. One reason for this feature is to empower specifying exactly which sequence of codepoints you want for a particular grapheme (which text editors tend to abstract away when you write characters literally), and also it is to empower more elegant passing of Unicode-savvy PTMD_Tiny source code through a communications channel that is more limited, such as to 7-bit ASCII.
Examples:
Text:'Ceres'
'サンプル'
''
'Perl'
'\c<LATIN SMALL LETTER OU>\c<F;263A>\c<65>'
DBMS Entity Name Literals
Grammar:
token Name {
(
(Name) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<Name_payload>
)
}
token Name_payload {
(<nonquoted_name_str> | <quoted_name_str>)
}
token nonquoted_name_str { <[ a..z A..Z _ ]><[ a..z A..Z 0..9 _ - ]>* }
token quoted_name_str {
[
<["]>
([ <-[\\\"\t\n\f\r]> | <escaped_char>]*)
<["]>
] ** <segment_sep>
}
token NameChain {
(
(NameChain) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<NameChain_payload>
)
}
token NameChain_payload {
(<Name_payload> [<name_chain_elem_sep> <Name_payload>]+)
}
token name_chain_elem_sep { \s* '.' \s* }
token DeclNameChain {
(
(DeclNameChain) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<DeclNameChain_payload>
)
}
token DeclNameChain_payload {
(
<Name_payload> ** <name_chain_elem_sep>
| <ord_list_open> <ord_list_close>
)
}
A Name
node represents a canonical short name for any kind of DBMS entity when declaring it; it is a character string type, that is disjoint from Text
. It is interpreted as a Muldis D sys.std.Core.Type.Cat.Name
value as follows:
Fundamentally a Name
node is formatted and interpreted like a Text
node, and any similarities won't be repeated here. Unlike a Text_payload
literal which must always be delimited, a Name_payload
has 2 variants, one delimited (quoted_name_str
) and one not (nonquoted_name_str
). The delimited Name_payload
form differs from Text_payload
only in that each string segment is delimited by double-quotes rather than apostrophes/single-quotes.
A nonquoted_name_str
is composed of a single alphabetic or underscore character followed by zero or more characters that are each alphanumeric or underscore or hyphen. It can not be segmented, so you will have to use the quoted_name_str
equivalent if you want a segmented string. The definitions of alphabetic and numeric should include appropriate Unicode characters, but at the moment this isn't reflected in the grammar; TODO: fix this.
A NameChain
node represents a canonical long name for invoking a DBMS entity in some contexts; it is conceptually a sequence of entity short names. This node is interpreted as a Muldis D sys.std.Core.Type.Cat.NameChain
value as follows: A NameChain_payload
consists of a sequence of 2 or more Name_payload
where the elements of the sequence are separated by name_chain_elem_sep
tokens (.
); each element of the sequence, in order, defines an element of the array
possrep's attribute of the result NameChain
value.
A DeclNameChain
node represents a canonical long name for declaring a DBMS entity in N-depth contexts; the format and interpretation of a DeclNameChain_payload
(but as a sys.std.Core.Type.Cat.DeclNameChain
value) is the same as a NameChain_payload
but that the chain may have as few as zero parts rather than as few as 2, and a zero-element chain is represented by the special DeclNameChain_payload
syntax of []
.
Examples:
Name:login_pass
Name:"First Name"
NameChain:fed.data.the_db.gene.sorted_person_name
NameChain:fed.data.the_db.stats."samples by order"
DeclNameChain:gene.sorted_person_name
DeclNameChain:stats."samples by order"
DeclNameChain:[]
Code Comment Literals
Grammar:
token Comment {
(
[
(Comment) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
]?
<Comment_payload>
)
}
token Comment_payload {
(
[
'#' ' '*
([ <-[\\\#\t\n\f\r]> | <escaped_char>]*)
' '* '#'
] ** \s*
)
}
A Comment
node represents the text of a Muldis D code comment; it is a character string type, that is disjoint from both Text
and Name
. It is interpreted as a Muldis D sys.std.Core.Type.Cat.Comment
value as follows:
Fundamentally a Name
node is formatted and interpreted like a Text
node, and any similarities won't be repeated here. The Comment_payload
differs from Text_payload
only in that each string segment is delimited by number-signs/hash-marks rather than apostrophes/single-quotes, and also that:
Note that any leading or trailing space (F;20) characters inside the #
delimiters of a Comment_payload
are also part of the delimiters, and are not part of the selected Comment
value; if you want to denote a Comment
value with leading or trailing space chars, you must write those space chars in an escaped form such as with \s
.
Examples:
Comment:# This does something. #
# So does this. #
TAI Temporal Literals
Grammar:
token Instant {
(
(Instant) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<Instant_payload>
)
}
token Instant_payload {
<Rat_payload>
}
token Duration {
(
(Duration) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<Duration_payload>
)
}
token Duration_payload {
<Rat_payload>
}
An Instant
node represents a single point in time which is specified in terms of of atomic seconds; it is a rational numeric type, that is disjoint from both Rat
and Duration
. This node is interpreted as a Muldis D sys.std.Core.Type.Instant
value as follows: An Instant_payload
is formatted and interpreted in the same way as a Rat_payload
.
A Duration
node represents a single amount of time (the difference between two instants) which is specified in terms of of atomic seconds; it is a rational numeric type, that is disjoint from both Rat
and Instant
. This node is interpreted as a Muldis D sys.std.Core.Type.Duration
value as follows: A Duration_payload
is formatted and interpreted in the same way as a Rat_payload
.
Examples:
Instant:1235556432.0
Instant:854309115.0
Duration:3600.0
Duration:-50.0
Duration:3.14159
Duration:1;1011101101*10^-11011
Duration:1/43
UTC and Float Temporal Literals
Grammar:
token UTCInstant {
(
(UTC [Instant | DateTime | Date | Time]) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<UTCInstant_payload>
)
}
token UTCInstant_payload {
(
(<num_max_col_val>) <value_payload_elem_sep>
(<utc_instant_body>)
| <d_utc_instant_body>
)
}
token utc_instant_body {
<ord_list_open>
(<int_body>)? <list_elem_sep>
[(<pint_body>)? <list_elem_sep>] ** 2
[(<nnint_body>)? <list_elem_sep>] ** 2
(<nnrat_body>)?
<ord_list_close>
}
token d_utc_instant_body {
<ord_list_open>
(<d_int_body>)? <list_elem_sep>
[(<d_pint_body>)? <list_elem_sep>] ** 2
[(<d_nnint_body>)? <list_elem_sep>] ** 2
(<d_nnrat_body>)?
<ord_list_close>
}
token FloatInstant {
(
(Float [Instant | DateTime | Date | Time])
<value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<FloatInstant_payload>
)
}
token FloatInstant_payload {
<UTCInstant_payload>
}
token UTCDuration {
(
(UTCDuration) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<UTCDuration_payload>
)
}
token UTCDuration_payload {
(
(<num_max_col_val>) <value_payload_elem_sep>
(<utc_duration_body>)
| <d_utc_duration_body>
)
}
token utc_duration_body {
<ord_list_open>
[(<int_body>)? <list_elem_sep>] ** 5
(<rat_body>)
<ord_list_close>
}
token d_utc_duration_body {
<ord_list_open>
[(<d_int_body>)? <list_elem_sep>] ** 5
(<d_rat_body>)
<ord_list_close>
}
A UTCInstant
node represents an "instant"/"datetime" value that is affiliated with the UTC time-zone. This node is interpreted as a Muldis D sys.std.Temporal.Type.UTCInstant
value whose instant
possrep attribute values are defined as follows:
A UTCInstant_payload
consists mainly of a bracket-delimited sequence of 6 comma-separated elements, where each element is either a valid numeric literal or is completely absent. The 6 elements correspond in order to the 6 attributes: year
, month
day
, hour
, minute
, second
. For each element that is absent or defined, the corresponding attribute has a Nothing
or a Single
value, respectively. For each of the first 5 elements, when it is defined, it must qualify as a valid body part of an Int
node; for the 6th element, when it is defined, it must qualify as a valid body part of a Rat
node.
Fundamentally each UTCInstant
node element is formatted and interpreted like an Int
or Rat
node, and any similarities won't be repeated here.
A defined year
may be any integer, each of [month
, day
] must be a positive integer, each of [hour
, minute
] must be a non-negative integer, and second
must be a non-negative rational number. If all 6 attributes are defined, then the new UTCInstant
value is also a UTCDateTime
; if just the first 3 or last 3 are defined, then the value is not a UTCDateTime
but rather a UTCDate
or UTCTime
, respectively; if any other combination of attributes are defined, then the value is just a UTCInstant
and not of any of the other 3 subtypes.
If the value_kind
of a value
node is UTCDateTime
or UTCDate
or UTCTime
rather than UTCInstant
, then the value
node is interpreted simply as a UTCInstant
node whose type_name
is UTCDateTime
or UTCDate
or UTCTime
, and the allowed body is appropriately further restricted.
A FloatInstant
node represents an "instant"/"datetime" value that is "floating" / not affiliated with any time-zone. This node is interpreted as a Muldis D sys.std.Temporal.Type.FloatInstant
value in an identical fashion to how a UTCInstant
node is interpreted, whose format it completely shares. Likewise regarding Float[DateTime|Date|Time]
.
A UTCDuration
node represents a duration value, an amount of time, which is not fixed to any instant in time. This node is interpreted as a Muldis D sys.std.Temporal.Type.UTCDuration
value whose duration
possrep attribute values are defined as follows:
A UTCDuration_payload
consists mainly of a bracket-delimited sequence of 6 comma-separated elements, where each element is either a valid numeric literal or is completely absent. The 6 elements correspond in order to the 6 attributes: years
, months
days
, hours
, minutes
, seconds
. For each element that is absent or defined, the corresponding attribute has a Nothing
or a Single
value, respectively. For each of the first 5 elements, when it is defined, it must qualify as a valid body part of an Int
node; for the 6th element, when it is defined, it must qualify as a valid body part of a Rat
node.
Mostly a UTCDuration
is formatted and interpreted like a UTCInstant
node, and any similarities won't be repeated here.
A defined [years
, months
, days
, hours
, minutes
] may be any integer, and seconds
may be any rational number. Currently, UTCDuration
has no system-defined subtypes, but that may change later.
Examples:
UTCInstant:[1964,10,16,16,12,47.5]
UTCInstant:[2002,12,16,,,]
UTCInstant:[,,,14,2,29.0]
FloatInstant:[2003,4,5,2,,]
FloatInstant:[1407,,,,,]
UTCDuration:[3,5,1,6,15,45.000012]
Low Level Integer String Literals
Grammar:
token String {
(
(String | BString | OString | UCPString) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<String_payload>
)
}
token String_payload {
(
(<num_max_col_val>) <value_payload_elem_sep> (<string_body>)
| <d_string_body>
)
}
token string_body {
<ord_list_open>
([(<int_body>) ** <list_elem_sep>]?)
<ord_list_close>
}
token d_string_body {
<ord_list_open>
([(<d_int_body>) ** <list_elem_sep>]?)
<ord_list_close>
}
token ord_list_open { \s* '[' \s* }
token ord_list_close { \s* ']' \s* }
token list_elem_sep { \s* ',' \s* }
A String
node represents an integer string value. It is interpreted as a Muldis D sys.std.Core.Type.Cat.String
value as follows: A String_payload
consists mainly of a bracket-delimited sequence of 0..N elements, where each element must qualify as a valid body part of a Int
node, and the new String
is conceptually that sequence of integers. Fundamentally each String
node element is formatted and interpreted like an Int
node, and any similarities won't be repeated here.
Examples:
String:F;[50,65,72,6C] # Unicode codepts = 'Perl' #
String:[80,101,114,109] # same thing #
COLLECTION VALUE SELECTORS
Note that, with each of the main value selector nodes documented in this main POD section (members of coll_value_selector
etc), any occurrences of child expr
nodes should be read as being value
nodes instead in contexts where instances of the main nodes are being composed beneath value
nodes. That is, any expr
node options beyond what value
options exist are only valid within a depot
node or boot_stmt
node.
Scalar Selectors
Grammar:
token QScalar {
(
(Q? Scalar) <value_node_elem_sep>
<type_name> <value_node_elem_sep>
<QScalar_payload>
)
}
token QScalar_payload {
(<possrep_name> <value_payload_elem_sep> <possrep_attrs>)
}
token possrep_name { <Name_payload> }
token possrep_attrs { <QTuple_payload> }
A QScalar
node represents a literal or selector invocation for a quasi-scalar subtype value. It is interpreted as a Muldis D sys.std.Core.Type.QScalar
subtype value whose declared type is specified by the node's (mandatory for QScalar
) type_name
and whose attributes are defined by the QScalar_payload
. The possrep_attrs
is interpreted specifically as attributes of the declared type's possrep which is specified by the possrep_name
. Each name+expr pair of the possrep_attrs
defines a named possrep attribute of the new quasi-scalar; the pair's name and expr specify, respectively, the possrep attribute name, and the possrep attribute value. If the value_kind
of a value
node is Scalar
rather than QScalar
, then the value
node is interpreted simply as an QScalar
node that is appropriately further restricted; the type_name
must name a Scalar
subtype, and the possrep_attrs
must not specify any quasi- typed attribute values.
See also the definition of the catalog data type sys.std.Core.Type.Cat.ScaPRSelExprNodeSet
, which is what in general a QScalar
node distills to when it is beneath the context of a depot
or boot_stmt
node, as it describes some semantics.
Examples:
Scalar:sys.std.Core.Type.Rat:float;{
mantissa => 45207196,
radix => 10,
exponent => 37
}
Scalar:sys.std.Temporal.Type.UTCDateTime:datetime;{
year => 2003,
month => 10,
day => 26,
hour => 1,
minute => 30,
second => 0.0
}
Scalar:fed.lib.the_db.WeekDay:name;{
"" => "monday"
}
Scalar:fed.lib.the_db.WeekDay:number;{
"" => 5
}
Tuple and Database Selectors
Grammar:
token QTuple {
(
(Q? Tuple | Database) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<QTuple_payload>
)
}
token QTuple_payload {
<nonord_list_open>
([(<Name_payload> <pair_elem_sep> <expr>) ** <list_elem_sep>]?)
<nonord_list_close>
}
token nonord_list_open { \s* '{' \s* }
token nonord_list_close { \s* '}' \s* }
token pair_elem_sep { \s* '=>' \s* }
A QTuple
node represents a literal or selector invocation for a quasi-tuple value. It is interpreted as a Muldis D sys.std.Core.Type.QTuple
value whose attributes are defined by the QTuple_payload
. Each name+expr pair of the QTuple_payload
defines a named attribute of the new quasi-tuple; the pair's name and expr specify, respectively, the attribute name, and the attribute value. If the value_kind
of a value
node is Tuple
rather than QTuple
, then the value
node is interpreted simply as an QTuple
node that is appropriately further restricted; the QTuple_payload
must not specify any quasi- typed attribute values. If the value_kind
is instead Database
, then then only Relation
attribute values may be specified.
See also the definition of the catalog data type sys.std.Core.Type.Cat.TupSelExprNodeSet
, which is what in general a QTuple
node distills to when it is beneath the context of a depot
or boot_stmt
node, as it describes some semantics.
Examples:
Tuple:{}
Tuple:type.tuple_from.var.fed.data.the_db.account.users:{
login_name => 'hartmark',
login_pass => 'letmein',
is_special => true
}
Tuple:{
name => 'Michelle',
age => 17
}
Relation Selectors
Grammar:
token QRelation {
(
(Q? Relation) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<QRelation_payload>
)
}
token QRelation_payload {
<r_empty_qbody_payload>
| <r_nonordered_qattr_payload>
| <r_ordered_qattr_payload>
}
token r_empty_qbody_payload {
<nonord_list_open>
([<Name_payload> ** <list_elem_sep>]?)
<nonord_list_close>
}
token r_nonordered_qattr_payload {
<nonord_list_open>
([<QTuple_payload> ** <list_elem_sep>]?)
<nonord_list_close>
}
token r_ordered_qattr_payload {
(
<ord_list_open>
([<Name_payload> ** <list_elem_sep>]?)
<ord_list_close>
<value_payload_elem_sep>
<nonord_list_open>
([<ordered_qtuple_qattrs> ** <list_elem_sep>]?)
<nonord_list_close>
)
}
token ordered_qtuple_qattrs {
<ord_list_open>
([<expr> ** <list_elem_sep>]?)
<ord_list_close>
}
A QRelation
node represents a literal or selector invocation for a quasi-relation value. It is interpreted as a Muldis D sys.std.Core.Type.QRelation
value whose attributes and tuples are defined by the QTuple_payload
, which is interpreted as follows:
Iff the QRelation_payload
is composed of just a nonord_list_[open|close]
pair with zero elements between them, then it defines the only relation value having zero attributes and zero tuples.
Iff the QRelation_payload
is a r_empty_qbody_payload
with at least one Name_payload
element, then it defines the attribute names of a relation having zero tuples.
Iff the QRelation_payload
is a r_nonordered_qattr_payload
with at least one <QTuple_payload> element, then each element defines a quasi-tuple of the new quasi-relation; every <QTuple_payload> must define a quasi-tuple of the same degree and have the same attribute names as its sibling <QTuple_payload>; these are the degree and attribute names of the quasi-relation as a whole, which is its heading for the current purposes.
Iff the QRelation_payload
is a r_ordered_qattr_payload
, then: The new quasi-relation value's attribute names are defined by the Name_payload
elements, and the relation body's tuples' attribute values are defined by the ordered_qtuple_qattrs
elements. This format is meant to be the most compact of the generic relation selector formats, as the attribute names only appear once for the relation rather than repeating for each tuple. As a trade-off, the attribute values per tuple from all of the ordered_qtuple_attrs
elements must appear in the same order as their corresponding attribute names appear in the collection of Name_payload
elements, as the names and values in the relation literal are matched up by ordinal position here.
If the value_kind
of a value
node is Relation
rather than QRelation
, then the value
node is interpreted simply as an QRelation
node that is appropriately further restricted; the QRelation_payload
must not specify any quasi- typed attribute values.
See also the definition of the catalog data type sys.std.Core.Type.Cat.RelSelExprNodeSet
, which is what in general a QRelation
node distills to when it is beneath the context of a depot
or boot_stmt
node, as it describes some semantics.
Examples:
Relation:{}
Relation:{ x, y, z }
Relation:{ {} }
Relation:{
{
login_name => 'hartmark',
login_pass => 'letmein',
is_special => true
}
}
Relation:fed.lib.the_db.gene.Person:[ name, age ];{
[ 'Michelle', 17 ]
}
Set Selectors
Grammar:
token QSet {
(
(Q? Set) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<QSet_payload>
)
}
token QSet_payload {
<nonord_list_open>
([<expr> ** <list_elem_sep>]?)
<nonord_list_close>
}
A QSet
node represents a literal or selector invocation for a quasi-set value. It is interpreted as a Muldis D sys.std.Core.Type.QSet
value whose elements are defined by the QSet_payload
. Each expr
of the QSet_payload
defines a unary quasi-tuple of the new quasi-set; each expr
defines the value
attribute of the quasi-tuple. If the value_kind
of a value
node is Set
rather than QSet
, then the value
node is further restricted.
See also the definition of the catalog data type sys.std.Core.Type.Cat.SetSelExprNodeSet
, which is what in general a QSet
node distills to when it is beneath the context of a depot
or boot_stmt
node, as it describes some semantics.
Examples:
Set:fed.lib.the_db.account.Country_Names:{
'Canada',
'Spain',
'Jordan',
'Thailand'
}
Set:{
3,
16,
85
}
Single Selectors
Grammar:
token QSingle {
(
(Q? Single) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<QSingle_payload>
)
}
token QSingle_payload {
<nonord_list_open>
<expr>
<nonord_list_close>
}
A QSingle
node represents a literal or selector invocation for a quasi-'single' value. It is interpreted as a Muldis D sys.std.Core.Type.QSingle
value whose element is defined by the QSingle_payload
. The payload defines the value
attribute of the single quasi-tuple of the new quasi-'single'. If the value_kind
of a value
node is Single
rather than QSingle
, then the value
node is further restricted.
See also the definition of the catalog data type sys.std.Core.Type.Cat.SetSelExprNodeSet
, which is what in general a QSingle
node distills to same as when QSet
does.
Examples:
Single:{ 'I know this one!' }
Array Selectors
Grammar:
token QArray {
(
(Q? Array) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<QArray_payload>
)
}
token QArray_payload {
<ord_list_open>
([<expr> ** <list_elem_sep>]?)
<ord_list_close>
}
A QArray
node represents a literal or selector invocation for a quasi-array value. It is interpreted as a Muldis D sys.std.Core.Type.QArray
value whose elements are defined by the QArray_payload
. Each expr
of the QArray_payload
defines a binary quasi-tuple of the new sequence; the expr
defines the value
attribute of the quasi-tuple, and the index
attribute of the quasi-tuple is generated such that the first expr
gets an index
of zero and subsequent ones get consecutive higher integer values. If the value_kind
of a value
node is Array
rather than QArray
, then the value
node is further restricted.
See also the definition of the catalog data type sys.std.Core.Type.Cat.ArySelExprNodeSet
, which is what in general a QArray
node distills to when it is beneath the context of a depot
or boot_stmt
node, as it describes some semantics.
Examples:
Array:[
'Alphonse',
'Edward',
'Winry'
]
Array:fed.lib.the_db.stats.Samples_By_Order:[
57,
45,
63,
61
]
Bag Selectors
Grammar:
token QBag {
(
(Q? Bag) <value_node_elem_sep>
[<type_name> <value_node_elem_sep>]?
<QBag_payload>
)
}
token QBag_payload {
<qbag_payload_counted_values>
| <qbag_payload_repeated_values>
}
token qbag_payload_counted_values {
<nonord_list_open>
([(<expr> <pair_elem_sep> <count>) ** <list_elem_sep>]?)
<nonord_list_close>
}
token count {
(
(<num_max_col_val>) <value_payload_elem_sep> (<pint_body>)
| <d_pint_body>
)
}
token qbag_payload_repeated_values {
<nonord_list_open>
([<expr> ** <list_elem_sep>]?)
<nonord_list_close>
}
A QBag
node represents a literal or selector invocation for a quasi-bag value. It is interpreted as a Muldis D sys.std.Core.Type.QBag
value whose elements are defined by the QBag_payload
, which is interpreted as follows:
Iff the QBag_payload
is composed of just a nonord_list_[open|close]
pair with zero elements between them, then it defines the only bag value having zero elements.
Iff the QBag_payload
is a qbag_payload_counted_values
with at least one expr
/count
-pair element, then each pair defines a binary quasi-tuple of the new quasi-bag; the expr
defines the value
attribute of the quasi-tuple, and the count
defines the count
attribute.
Iff the QBag_payload
is a qbag_payload_repeated_values
with at least one <expr> element, then each expr
contributes to a binary quasi-tuple of the new quasi-bag; the expr
defines the value
attribute of the quasi-tuple. The quasi-bag has 1 quasi-tuple for every distinct (after normalization or evaluation) expr
and expr
-derived value in the QBag_payload
, and the count
attribute of that quasi-tuple says how many instances of said value
there were.
See also the definition of the catalog data type sys.std.Core.Type.Cat.BagSelExprNodeSet
, which is what in general a QBag
node distills to when it is beneath the context of a depot
or boot_stmt
node, as it describes some semantics.
Examples:
Bag:fed.lib.the_db.inventory.Fruit:{
'Apple' => 500,
'Orange' => 300,
'Banana' => 400
}
Bag:{
'Foo',
'Quux',
'Foo',
'Bar',
'Baz',
'Baz'
}
DEPOT DECLARATION
TODO: ALL OF THIS HERE MAIN POD SECTION!
Grammar:
token depot {
(depot)
<nonord_list_open>
<depot_catalog>
<depot_data>?
<nonord_list_close>
}
BOOTLOADER STATEMENT
TODO/REDO: ALL OF THIS HERE MAIN POD SECTION!
Grammar:
token boot_stmt {
(
<generic_boot_stmt>
| ...
)
}
token generic_boot_stmt
(boot_stmt)
<value_node_elem_sep>
<imperative_routine_name>
<value_node_elem_sep>
<imperative_routine_upd_args>
<value_node_elem_sep>
<imperative_routine_ro_args>
}
token imperative_routine_name { <NameChain_payload> }
token imperative_routine_upd_args {
<nonord_list_open>
[[<Name_payload> <pair_elem_sep> <NameChain_payload>]
** <list_elem_sep>]?
<nonord_list_close>
}
token imperative_routine_ro_args { <QTuple_payload> }
Examples:
boot_stmt:sys.std.Core.Cat.create_depot_procedure:{}:{ ... }
MULDIS D TINY DIALECT PRAGMAS
TODO/REDO: ALL OF THIS HERE MAIN POD SECTION!
All of the following pragmas apply to both the PTMD_Tiny
and HDMD_Perl[6|5]_Tiny
dialects, and have the same semantics with both.
auto_add_attrs
All Muldis D values are defined in terms of a collection of attribute values, and there is no such thing as an attribute being undefined; normally when one selects a value of a particular attribute-based type, they must supply values for all of its attributes; this is true with values comprising the system catalog as with any other values. Code written in the Muldis D PTMD_Tiny
or HDMD_Perl[6|5]_Tiny
dialect is comprised almost entirely of value literals, and by default all of the attribute values of said values must be explicitly given in the literals as sub-literals, even in the common case where some attributes just have the default values for their type.
While this fact allows for parsers to be very simple and for sub-literals to be compilable into values without knowing the context they're compiled into, it means that programmers would have to write maybe about twice as much code as they otherwise would if they could simply not write out the default-valued attributes.
If the 5th Extensions portion of the fully-qualified Muldis D language name contains a name+value pair of auto_add_attrs
+ Bool:true
, then this activates the optional auto_add_attrs
pragma, which provides one kind of automatic code completion. When auto_add_attrs
is active, programmers may omit any literal attributes that they want, and those attributes will be automatically defined by the parser to have the default values for their type. Or more specifically, the wider literal whose attributes are missing will be extended to become the default value of the type of the wider literal but that those attributes of its that were explicitly given will override the default's values for those attributes. The actual behaviour is essentially what the sys.std.QTuple.subst_in_default
function does.
But the auto_add_attrs
pragma is not simply an automatically invoked pre-processing Muldis D function, because it also serves the common case where one is defining relation literals that have different attributes specified per tuple; such a thing by itself isn't even valid as a generic relation, so it certainly can't be given to a Muldis D function; so the pragma has at least that advantage unique to itself.
Note that the lexer is exactly the same regardless of whether the auto_add_attrs
pragma is turned on or off, because the matters of missing attributes were never tested or enforced at the lexical level in the first place; rather the pragma only affects the parsing stage that follows the lexing. In other words, the actual syntax or grammar is identical regardless of the setting of this pragma.
Now one consequence of using the auto_add_attrs
pragma is that in general the parser must be more complicated, and read type definitions from the DBMS information schema so that it knows what attributes each literal is supposed to have, and their declared types, and also sub-literals can no longer in general be fully converted to values in isolation; now the parent-most literal must be evaluated first, because its declared type generally determines the declared types of its attributes, and then their attributes recursively. For nonscalar types, the initial declared type being looked at is the declared type of the bootloader-invoked routine's parameter that the literal is being given to as an argument.
Now if the declared type of said parameter is just a generic type, such as Relation
or Array
, then often no information can be gleaned from this context for what attributes should exist, and so you will need to make the arg literal include treat-as-type metadata that explicitly provides the specific type information needed; otherwise, auto_add_attrs
won't help you and you must then fully define relation values with the same attributes per tuple. But fortunately for brevity, a lot of the places where auto_add_attrs
would help you the most is when the bootloader is invoking system-defined data-defining procedures, and their parameters are all of attribute-specifying types, and it is in such data definition that you may be most likely to face a large number of default-valued attributes, such as comment
.
Note that the reason the auto_add_attrs
behaviour is turned off by default is twofold. First, the parser can be a lot simpler / more tiny with it off. Second, requiring users to explicitly define even default-valued attributes can make the code more self-documenting and can help users avoid some kinds of bugs due to action from unseen values, or due to some default values "silently" changing between language versions. So then essentially, turning on auto_add_attrs
means the programmer is telling the parser "I know what I'm doing" by explicitly asking for potentially less-safe behaviour. Of course, even with auto_add_attrs
turned on, you can still explicitly define attribute values that are their type's default values, so it is possible to compromise such as you like.
Also note that it should be trivial for a Muldis D implementation to let users input code written with auto_add_attrs
turned on, and then output the version of that code for their perusal with it turned off, so they can see what extra values were filled in without having to manually write said.
auto_unabbrev_std_names
Normally when one is specifying a NameChain
literal that is a reference to a standard system-defined type or routine, they must write out the name in full, starting with sys.std
and so on through the unique part of the entity name. While this allows for clearly self-documenting code, as well as for relatively simple parsers, it can also be added tedium to programmers that would prefer to write out the names in a less verbose manner, especially since to a point, a slightly more complicated parser could still unambiguously resolve a much shorter substring of the name.
If the 5th Extensions portion of the fully-qualified Muldis D language name contains a name+value pair of auto_unabbrev_std_names
+ Bool:true
, then this activates the optional auto_unabbrev_std_names
pragma, which provides one kind of automatic code completion. When auto_unabbrev_std_names
is active, programmers may omit any number of consecutive leading chain elements from such a NameChain
literal, so long as the remaining unqualified chain is distinct among all standard system-defined (sys.std
-prefix) DBMS entities (but that as an exception, a non-distinct abbreviation is allowed iff exactly 1 of the candidate entities is in the language core, sys.std.Core
-prefix, in which case that 1 is unambiguously the entity that is resolved to). This feature has no effect on the namespace prefixes like tuple_from
or array_of
; one still writes those as normal prepended to the otherwise shortened chains.
So for example, one can just write Int
rather than sys.std.Core.Type.Int
, is_identical
rather than sys.std.Core.Universal.is_identical
, QTuple.attr
rather than sys.std.Core.QTuple.attr
, min
rather than sys.std.Ordered.min
, array_of.Rat
rather than array_of.sys.std.Core.Type.Rat
, and so on.
The auto_unabbrev_std_names
pragma intentionally does not empower auto un-abbreviations of any namespaces other than sys.std
, to keep things simple for users to predict and for systems to implement; it does not affect sys.[imp|cat]
, nor any other top-level namespace. When one is referencing either any system-defined implementation-specific (non-standard) types or routines, or any user-defined types or routines, or any dbvars or constraints or whatever, their names can not be written abbreviated due to the auto_unabbrev_std_names
pragma.
Note that the lexer is exactly the same regardless of whether the auto_unabbrev_std_names
pragma is turned on or off, as per the auto_add_attrs
pragma. Many other comments about the other pragma also apply to this one.
auto_chains_from_names
Iff both the auto_add_attrs
and auto_unabbrev_std_names
pragmas are active, then the optional auto_chains_from_names
dependent pragma may be activated in the same manner (as an Extensions name+value pair with Bool:true
. When auto_chains_from_names
is active, programmers may write an otherwise abbreviated-to-one-chain-element NameChain
literal as a plain Name
literal; this can chop the literal down to a third or fourth of its otherwise-length such as in the case of a reference to the Int
type. When the parent literal of such a faux-Name
literal is examined for missing attributes, or examined that existing attributes are of the correct type, any attributes whose declared type says they are supposed to be NameChain
but that have an explicitly defined Name
child literal will have that literal mapped to and replaced with a single element NameChain
literal, which can be subsequently un-abbreviated into a standard system-defined type or routine name. The auto_chains_from_names
pragma will not work when the declared type being applied to a faux-Name
is not a NameChain
subtype, and such literals will then be taken as actual Name
; where such declared type information is missing, you will need to write out the abbreviated chain as an actual NameChain
literal. Note that the auto_chains_from_names
pragma has no effect on NameChain
literal bodies that don't comprise the payload portion of their parent literal, such as with the imperative routine name composed into a boot_stmt
literal; literal bodies in those positions will always be interpreted according to NameChain
literal body syntax.
SEE ALSO
Go to Muldis::D for the majority of distribution-internal references, and Muldis::D::SeeAlso for the majority of distribution-external references.
AUTHOR
Darren Duncan (perl@DarrenDuncan.net
)
LICENSE AND COPYRIGHT
This file is part of the formal specification of the Muldis D language.
Muldis D is Copyright © 2002-2009, Muldis Data Systems, Inc.
See the LICENSE AND COPYRIGHT of Muldis::D for details.
TRADEMARK POLICY
The TRADEMARK POLICY in Muldis::D applies to this file too.
ACKNOWLEDGEMENTS
The ACKNOWLEDGEMENTS in Muldis::D apply to this file too.