NAME

Muldis::DB::Language - Design document of the Muldis D language

DESCRIPTION

The native command language of a Muldis::DB DBMS (database management system) / virtual machine is called Muldis D; this document, Muldis::DB::Language ("Language"), is the human readable authoritative design document for that language, and for the Muldis::DB virtual machine in which it executes. If there's a conflict between any other document and this one, then either the other document is in error, or the developers were negligent in updating it before Language, so you can yell at them.

Muldis D is intended to qualify as a "D" language as defined by "Databases, Types, and The Relational Model: The Third Manifesto" (TTM), a formal proposal for a solid foundation for data and database management systems, written by Christopher J. Date and Hugh Darwen; see http://www.aw-bc.com/catalog/academic/product/0,1144,0321399420,00.html for a publisher's link to the book that formally publishes TTM. See http://www.thethirdmanifesto.com/ for some references to what TTM is, and also copies of some documents that were used in writing Muldis D.

It should be noted that Muldis D, being quite new, may initially omit some features that are mandatory for a "D" language, to speed the way to a usable partial solution, but any omissions will be corrected later. Also, it contains some features that go beyond the scope of a "D" language, so Muldis D is technically a "D plus extra"; examples of this are constructs for creating the databases themselves and managing connections to them.

Muldis D also incorporates design aspects and constructs that are taken from or influenced by Perl 6, other general-purpose languages (particularly functional ones like Haskell), Tutorial D, various D implementations, and various SQL implementations (see the Muldis::DB::SeeAlso file). It has also been suggested that some of the Muldis D design is like that of the Ada language, which I didn't really know anything about before writing it; this bears further investigation. Moreover, it has also been suggested by a different person that the Lua language design is very similar, in different ways, and from a glance, it certainly seems to be so.

In any event, the Muldis::DB documentation will be focusing mainly on how Muldis::DB itself works, and will spend little time in providing rationale; you can read the aforementioned external documentation for much of that.

This documentation is pending.

NOTES ON TERMINOLOGY

There are a few terms that the Muldis::DB documentation uses which may have different meanings than what you may be used to, so here are a few notes to clarify what they mean in this document. Similarly, there are some terms used in the industry that are expressly not used here, so as to help avoid confusion given what meaning is often attributed to them.

type / data type

The term type as a noun always refers to a data type; the term is not used to indicate classifications of other things; eg, kind or other terms will be used for such instead, to avoid confusion. The terms class and domain are not used in this documentation to mean type.

value, variable, constant

A value is unique, eternal, immutable, and is not fixed in time or space (it has no address). A variable is fixed in time and space (it does have an address); it holds an appearance of a value; it is neither unique nor eternal nor immutable in the general case. A constant is a variable which is defined to not mutate after initially being set. Terms like object are not used in this documentation for any aspects of Muldis D since their meaning in practice is both ambiguous and wide-reaching, and could refer to both values and variables depending on usage context.

text, character

A text is a string composed of Unicode characters, where a character is an abstract concept that usually is a grapheme or language-independent grapheme, but could potentially be a codepoint or language-specific grapheme. This documentation only uses the term character in an abstract sense, and no part of the Muldis::DB API is defined using that term. Rather, any operators or constraints that work with sub-strings of text will be specified in terms like NFC grapheme.

tuple

A tuple is an unordered heterogeneous collection of 0..N elements that are keyed by the element's name; each element is a name-value pair, and all names in the tuple are distinct. While tuple legitimately refers to the same thing as the Muldis::DB term sequence in other contexts, it does not in this documentation. Terms like record or row are not used in this documentation, the latter in particular because it implies ordinality.

relation, relvar, relcon

A relation is like an unordered homogeneous set of tuples where all member tuples have identical degree and name-sets, except that a relation data type knows what its allowed names are even if it has no tuples. Like with tuple, the term relation legitimately refers to a set or "ordinal tuple" in other contexts, but it does not in this documentation. Terms like record set or row set or table are not used in this documentation, the last 2 in particular because they imply a significance to the order of tuples, where there is none in a relation. Moreover, the term domain does not mean the same thing as relation, and neither does the term function; those terms have distinct meanings here. Note that the term relvar is shorthand for relation-typed variable, and relcon is shorthand for relation-typed constant. Note also that a relational database is called that because it is composed of relations, and not just because its relations can be joined or be associated through foreign key constraints.

function

A function is a routine whose invocation is used as a value expression, and it conceptually serves as a map between the domains of its parameters and its result value. A function is not the same as a relation, though both can be used as maps between values. Besides their conceptual difference in Muldis D as a routine vs a value, a selected relation value in Muldis D is always finite, and hence so is the cardinality of the map it can provide; whereas, a function can have an infinite map size.

database / relational database, dbvar, dbcon

Within this documentation, the actually more generic term database will be used to refer exclusively to a relational database, so you should read the former as if it were the latter. A database is a tuple whose (distinctly named) attributes are all relation-typed; one holds all user data that is being maintained as an interconnected unit. A database-typed variable, aka a dbvar, is managed by a DBMS/RDBMS, and such is what is more informally referred to outside this documentation as a "database". Whenever a user is "using a database", they are reading or updating a dbvar. Examples of databases are genealogy records, financial records, and a CMS' data. A database is not a program. A database-typed constant is a dbcon.

catalog

A catalog is a special kind of dbvar or dbcon whose relations hold meta-data about the normal databases that hold user data (and about themselves too); updating a catalog dbvar has the side-effect of changing the structure of the associated normal database. This meta-data describes all user-defined data types and operators, plus base and viewed relations, stored with and used with the database.

depot / repository

A depot or repository is a local abstraction of a typically external storage system which holds 1 database variable and 1 associated catalog, plus perhaps other details that assist the mapping of the abstraction to the actuality.

DBMS / RDBMS

Within this documentation, the actually more generic term DBMS will be used to refer exclusively to a RDBMS (Relational Database Management System), so you should read the former as if it were the latter. A RDBMS is a computer program that manages relational database variables, associated catalogs, and depots in general. Muldis::DB aspires to be or is one, and likewise are various other TTM-inspired programs like Rel and Duro; most other DBMS-like programs are technically non-relational, including all SQL DBMSs such as Oracle, PostgreSQL and SQLite, though they usually give lip-service to the relational data model and approximate a RDBMS to varying degrees.

sequence

Within this documentation, a sequence generically refers to an ordered collection of 0..N elements. The term array is not used in this documentation because that word's actual meaning is more broad, and includes both matrices and unordered collections of name-value pairs. Note that a sequence may be used simply to maintain a simple collection in order, though the actual order of its elements may not always be significant. Sometimes sequence also refers specifically to the Seq data type, which is a particular binary relation.

selector

A selector is a routine that captures an appearance of a value for use in a variable or expression. The term constructor is not used in this documentation because all values in Muldis::DB are conceptually eternal and immutable, so it does not make sense to say that we are "building" one; we are "selecting" one.

This documentation is pending.

INTERPRETATION OF THE RELATIONAL MODEL

The relational model of data is based on predicate logic and set theory.

The model assumes that all data is represented as mathematical N-ary relations, an N-ary relation being a subset of the cartesian product of N data types. Reasoning about such data is done in two-valued predicate logic, meaning there are 2 possible evaluations for each proposition, either true or false.

The basic relational building block is the data type, which can consist of either scalar values or values of more complex types. A tuple is an unordered set of attributes, each of which has a name and a declared data type; an attribute value is a specific valid value for the type of the attribute. An N-relation is defined as an unordered set of N-tuples, and the tuples comprise the body of the relation; the relation has a heading, which is a set of attribute definitions (their names and types); this heading is also the heading of each of its tuples.

A heading represents a predicate, and there is a one-to-one correspondence between the free variables of the predicate and the attribute names of the heading. The body of a relation represents the set of true propositions that can be formed from the predicate represented by the relation's heading. The body of a tuple with the same heading provides attribute values to instantiate the predicate into a proposition by substituting each of its free variables. When a tuple appears in a relation body, the proposition it represents is deemed to be true. Contrariwise, for every tuple whose heading is the same as the relation's but does not appear in the relation body, its proposition is deemed to be false. This assumption is known as the closed world assumption.
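
As a rough illustration of the closed world assumption, the following Python sketch models a relation as a heading plus a set of tuples. The names make_tuple, conforms, and proposition_holds are hypothetical helpers invented for this example; they are not part of Muldis D or Muldis::DB, and Python types merely stand in for declared attribute types.

```python
# Hypothetical sketch: a relation as a heading (attribute names and types)
# plus a body (a set of tuples). Not an actual Muldis::DB API.

heading = {"name": str, "age": int}

def make_tuple(**attrs):
    """Represent a tuple as an unordered, hashable set of name-value pairs."""
    return frozenset(attrs.items())

body = {
    make_tuple(name="John", age=32),
    make_tuple(name="Andy", age=46),
}

def conforms(t, heading):
    """True if the tuple has exactly the heading's names and matching value types."""
    d = dict(t)
    return d.keys() == heading.keys() and all(
        isinstance(d[n], ty) for n, ty in heading.items()
    )

def proposition_holds(t, heading, body):
    """Closed world assumption: a conforming tuple's proposition is true
    if and only if the tuple appears in the relation body."""
    if not conforms(t, heading):
        raise ValueError("tuple does not match the relation heading")
    return t in body
```

Here the tuple for John appears in the body, so its proposition is deemed true, while a conforming tuple such as (name Julia, age 27) that is absent from the body is deemed false.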

The relational model specifies that data is operated on by means of a relational calculus or a relational algebra. These 2 are logically equivalent; for any expression in the relational calculus, there is an equivalent one in the relational algebra, and vice versa. Relational algebra, an offshoot of first-order logic, is a set of relations closed under operators; each operator takes N relations as arguments and results in a relation. While the relational algebra provides a more procedural way for specifying database queries, in contrast the relational calculus provides a more declarative way for specifying queries.

Mechanics of Some Relational Operations

This documentation section takes a very informal (and possibly blatantly incorrect) alternate approach to describing the nature of relations, tuples, and attributes, within the context of explaining the mechanics of how some relational operations work in practice.

Herein, we shall conceptualize a relation as a long boolean expression, consisting of a string of basic boolean-valued expressions that are selectively anded or ored together. A basic boolean-valued expression, <attr>, takes the form attribute <name> is <value>. Each tuple body, <tuple>, in the relation takes the form of a chained and that connects N <attr>, one per each attribute in the relation, and each having a distinct <name>. The relation body takes the form of a chained or that connects N <tuple>, one per each tuple in the relation, and each <tuple> has the same set of <name> as the others, but the set of <value> that each <tuple> has is distinct.

Take, for example, a relation having some details about people, where each attribute is a type of detail and each tuple has details for one person:

   name is John  and age is 32 and city is Vancouver
or name is Andy  and age is 46 and city is Toronto
or name is Julia and age is 27 and city is Halifax
etc...

Or a multi-relation example involving suppliers, foods, and shipments:

   farm is Hodgesons and country is Canada
or farm is Beckers   and country is England
or farm is Wickets   and country is Canada

   food is Bananas and colour is yellow
or food is Carrots and colour is orange
or food is Oranges and colour is orange
or food is Kiwis   and colour is green
or food is Lemons  and colour is yellow

   farm is Hodgesons and food is Kiwis   and qty is 100
or farm is Hodgesons and food is Lemons  and qty is 130
or farm is Hodgesons and food is Oranges and qty is 10
or farm is Hodgesons and food is Carrots and qty is 50
or farm is Beckers   and food is Carrots and qty is 90
or farm is Beckers   and food is Bananas and qty is 120
or farm is Wickets   and food is Lemons  and qty is 30

Now a very simple pair of relations:

   x is 4 and y is 7
or x is 3 and y is 2

   y is 5 and z is 6
or y is 2 and z is 1
or y is 2 and z is 4

We now briefly introduce a few common fundamental relational operations: projection, join, and union.

A projection of a relation derives a relation that has a subset of the original's attributes, and all of its tuples. Continuing the boolean expression analogy, each <tuple> of the projected relation contains fewer and-connected <attr> terms than in the original. For example, let's take the projection of the food attribute from the shipments relation, to get, initially:

   food is Kiwis
or food is Lemons
or food is Oranges
or food is Carrots
or food is Carrots
or food is Bananas
or food is Lemons

Now, the above expression can be simplified because it now contains redundancies, and the simplified version is logically identical:

   food is Kiwis
or food is Lemons
or food is Oranges
or food is Carrots
or food is Bananas

So this projected relation has 5 tuples rather than the original 7, and saving logical redundancy is why relations never have duplicate tuples.
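
The projection just described can be sketched in Python as follows. This is only an informal model under the assumption that a tuple is a frozenset of name-value pairs and a relation body is a set of such tuples; make_tuple and project are hypothetical names, not Muldis D operators.

```python
# Hypothetical sketch of projection over the shipments relation shown earlier.

def make_tuple(**attrs):
    """Represent a tuple as an unordered, hashable set of name-value pairs."""
    return frozenset(attrs.items())

def project(relation, names):
    """Keep only the named attributes of every tuple; because the result
    is a set, duplicate tuples collapse automatically."""
    return {frozenset((n, v) for n, v in t if n in names) for t in relation}

shipments = {
    make_tuple(farm="Hodgesons", food="Kiwis", qty=100),
    make_tuple(farm="Hodgesons", food="Lemons", qty=130),
    make_tuple(farm="Hodgesons", food="Oranges", qty=10),
    make_tuple(farm="Hodgesons", food="Carrots", qty=50),
    make_tuple(farm="Beckers", food="Carrots", qty=90),
    make_tuple(farm="Beckers", food="Bananas", qty=120),
    make_tuple(farm="Wickets", food="Lemons", qty=30),
}

foods = project(shipments, {"food"})
print(len(shipments), len(foods))  # prints "7 5"
```

Using a set for the relation body is what removes the redundant Carrots and Lemons tuples without any extra step.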

A join of 2 relations derives a relation that has all of the originals' attributes, and its set of tuples is fundamentally the cartesian product of those of the originals. Following our boolean analogy, we start off by pairwise connecting instances of every <tuple> of the first relation with instances of every <tuple> of the second one, with the members of each pair then being chained together with and to form a single, longer chain of and. Note that join is commutative, so it doesn't matter which of the source relations is first or second, the result is the same, as much as foo and bar is the same as bar and foo. For example, let's do a join of our 2 simplest relations:

   x is 4 and y is 7 and y is 5 and z is 6
or x is 4 and y is 7 and y is 2 and z is 1
or x is 4 and y is 7 and y is 2 and z is 4
or x is 3 and y is 2 and y is 5 and z is 6
or x is 3 and y is 2 and y is 2 and z is 1
or x is 3 and y is 2 and y is 2 and z is 4

Now, when multiple relations are connected into one such as with a join, the relational model assumes that if the sources have attributes with the same names as each other, then those attributes are describing the same things. In this case, the references to attribute y from both relations are talking about the same y. And so, any result tuples that contradict themselves, saying that y equals both one value and a different one, can't ever be true and are eliminated; only the tuples where the y value is identical are kept:

   x is 3 and y is 2 and y is 2 and z is 1
or x is 3 and y is 2 and y is 2 and z is 4

Moreover, this expression can be simplified by removing the redundant y attribute:

   x is 3 and y is 2 and z is 1
or x is 3 and y is 2 and z is 4

All attributes in a relation have distinct names. And if there were any identical tuples, the redundant ones would be eliminated.
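
The join walk-through above can be sketched in Python. As before, this is an informal model that assumes tuples are frozensets of name-value pairs; make_tuple and join are hypothetical names chosen for the example.

```python
# Hypothetical sketch of the join of the two simple relations shown above.

def make_tuple(**attrs):
    """Represent a tuple as an unordered, hashable set of name-value pairs."""
    return frozenset(attrs.items())

def join(r1, r2):
    """Pair every tuple of r1 with every tuple of r2, drop pairs that
    disagree on a commonly named attribute, and merge the rest."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            if all(d1[n] == d2[n] for n in common):
                # Merging the dicts keeps a single copy of each common attribute.
                result.add(frozenset({**d1, **d2}.items()))
    return result

xy = {make_tuple(x=4, y=7), make_tuple(x=3, y=2)}
yz = {make_tuple(y=5, z=6), make_tuple(y=2, z=1), make_tuple(y=2, z=4)}

xyz = join(xy, yz)
```

The result holds the two tuples (x 3, y 2, z 1) and (x 3, y 2, z 4), and join(yz, xy) gives the same relation, matching the commutativity noted earlier.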

A join operation has several trivializing scenarios. If the 2 source relations have no attribute names in common, the result is simply the cartesian product. If the 2 sources have all their attribute names in common, the result is the common subset or intersection of their existing sets of tuples. If one source has all the attributes of the other, but the reverse isn't true, then the result is a subset of tuples from the relation that has more attributes; this is a semijoin.
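
The three trivializing scenarios can be checked with a small Python sketch. The relations and the helper names make_tuple and join are hypothetical, and tuples are again modeled as frozensets of name-value pairs.

```python
# Hypothetical sketch: the three special cases of join described above.

def make_tuple(**attrs):
    """Represent a tuple as an unordered, hashable set of name-value pairs."""
    return frozenset(attrs.items())

def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on common names."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[n] == d2[n] for n in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Case 1: no attribute names in common, so the result is the cartesian product.
xs = {make_tuple(x=1), make_tuple(x=2)}
ys = {make_tuple(y=10), make_tuple(y=20)}
product = join(xs, ys)

# Case 2: all attribute names in common, so the result is the intersection.
other_xs = {make_tuple(x=2), make_tuple(x=3)}
intersection = join(xs, other_xs)

# Case 3: one source has all attributes of the other, so the result is a
# subset of the tuples of the wider relation (a semijoin).
wide = {make_tuple(x=1, y=10), make_tuple(x=2, y=20)}
narrow = {make_tuple(x=2)}
semijoin = join(wide, narrow)
```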

A union of 2 relations, which requires that the 2 relations have the same headings, derives another relation with the same heading, and a union of the two's set of tuples as its body, with any duplicates eliminated. In terms of our boolean analogy, a union is simply chaining together the entirety of each relation's boolean expression with an or, and then eliminating redundancies from the result.
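
A union can be sketched in the same informal style. Here a relation is assumed to be a pair of a heading (a frozenset of attribute names) and a body, so that the heading is known even when the body is empty; make_tuple and union are hypothetical names.

```python
# Hypothetical sketch of relational union with a heading check.

def make_tuple(**attrs):
    """Represent a tuple as an unordered, hashable set of name-value pairs."""
    return frozenset(attrs.items())

def union(r1, r2):
    """Union of two relations with identical headings; duplicates vanish
    because the bodies are sets."""
    heading1, body1 = r1
    heading2, body2 = r2
    if heading1 != heading2:
        raise ValueError("union requires both relations to have the same heading")
    return (heading1, body1 | body2)

heading = frozenset({"food", "colour"})
r1 = (heading, {make_tuple(food="Bananas", colour="yellow"),
                make_tuple(food="Kiwis", colour="green")})
r2 = (heading, {make_tuple(food="Kiwis", colour="green"),
                make_tuple(food="Lemons", colour="yellow")})

combined = union(r1, r2)
```

The combined body has 3 tuples rather than 4, since the Kiwis tuple appears in both sources.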

A full list of all the relational operators having more formal (but Muldis D specific) descriptions occurs further below in this document; that list does not use the aforementioned boolean analogies.

MULDIS D

Muldis D is a computationally / Turing complete (and industrial strength) high-level programming language with fully integrated database functionality; you can use it to define, query, and update relational databases. The language's paradigm is a mixture of declarative, imperative, functional, and object-oriented. It is primarily focused on providing reliability, consistency, portability, and ease of use and extension. (Logically, speed of execution can not be declared as a Muldis D quality because such a quality belongs to an implementation alone; however, the language should lend itself to making fast ones.)

The language is rigorously defined and requires users to be explicit, which leaves little room for ambiguity and related bugs. When something is specified in Muldis D, its semantics should be well known and fully portable (not implementation dependent). If a conforming implementation (usually a Muldis::DB Engine class) can't provide a specified behaviour, code using it will refuse to run at all, rather than silently changing its semantics; this also helps users to avoid bugs. Moreover, Muldis D generally disallows any details of an implementation's "physical representation" or other internals to leak through into the language; eg, there is no "varchar" vs "char", simply "text". Users should not have to know about this level of detail, and implementers should be free to adaptively pick optimum ways to satisfy user requests, and change later.

Muldis D, being first and foremost a data processing language, provides a thorough means to both introspect and define all DBMS entities using just data processing operators; this means is called the DBMS "catalog". The catalog is a set of system-defined relvars (relation-typed variables) which reflect the definitions of DBMS entities; users can generally update these to create, alter, or drop DBMS entities. In fact, updating the catalog relvars is the fundamental way to do data-definition tasks in Muldis D, and any other provisions for data-definition are conceptually abstractions of this. Generally speaking, users can do absolutely everything in the DBMS with just data querying and updating operations.

The design and various features of Muldis D go a long way to help both its users and implementers alike. A lot of flexibility is afforded to implementers of the language to be adaptive to changing constraints of their environment and deliver efficient solutions. This also makes things a lot easier for users of the language because they can focus on the meaning of their data rather than worrying about implementation details; users can focus on defining what needs to be accomplished rather than how to accomplish that, which relieves burdens on their creativity, and saves them time. In short, this system improves everyone's lives.

What users fundamentally write are Muldis D "routines", each consisting of one or more "statements", and in executing these, all work is done.

TYPE SYSTEM

The Muldis D type system is a formal type system, at least in intent, and works conceptually in the following manner.

There is a single universal value set/domain, named Universal, whose members are all the values that can possibly exist; Universal is the maximal data type of the entire type system. Also there is a single nullary value set/domain, named Empty, which has zero members; Empty is the minimal data type.

All Muldis D data values as individuals are eternal and immutable. All values are logically distinct, and each value occurs exactly once, and is not fixed within time or space (so doesn't have an "address"). It does not make sense to say that you are creating or destroying or copying or mutating a value. However, an eternal immutable value can make an appearance within a variable, as a variable is a named/addressable container that is fixed within time and space, and it can be created, destroyed, mutated, and multiple variables can hold appearances of the same value. So when one appears to be testing 2 values for equality, they are actually testing whether 2 value appearances are in fact the same value.

Given that all data values in Muldis D are fundamentally immutable, the term "selector" is used to describe a routine that captures an appearance of a value into a variable (or for use in a value expression); this is analogous to the task that a "constructor" routine does in a typical object-oriented language, but that the former is conceptually "selecting" an eternally existing value rather than conceptually "creating" a new one.

In the Muldis D type system, a data type is a set of values, and as with individual values, a data type is eternal and immutable. Every data type is distinct from all other data types, and no 2 data types may encompass exactly the same set of values. Every data type other than Universal and Empty has at least 1 member value, and at most 1 less value than the universal set. If 2 data types have no values in common, they are said to be disjoint.

Given 2 arbitrary data types, T1 and T2, T1 is called a supertype of T2 if its value set is a superset of that of T2, and in that situation, T2 is a subtype of T1, as its value set is a subset of that of T1. Note that every type includes itself as its own supertype and subtype, in which case, the T1 and T2 of the previous example are the same type. By contrast, if T1 and T2 are explicitly different types but otherwise have that relationship, then T1 has at least 1 value that T2 doesn't have, in which case T1 is also called a proper supertype of T2, and T2 is also called a proper subtype of T1. Given those last examples, T1 is a more general type, and T2 is a more specific type. In this way, the system-defined Universal type is a proper supertype of all other types, and the system-defined Empty type is a proper subtype of all other types. Now, if no data type T3 exists which is both a proper subtype of T1 and a proper supertype of T2, then T1 is an immediate supertype of T2, and T2 is an immediate subtype of T1. Note that the Muldis::DB type system supports multiple inheritance, so types can form a lattice rather than a tree.
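
Because a data type is a set of values, these relationships can be sketched with ordinary Python sets. The toy universe of six integers and the types Small, Even, and SmallEven are hypothetical, invented only to exercise the definitions; only Universal and Empty are names taken from this document.

```python
# Hypothetical sketch: types as frozensets over a toy universe of values.

Universal = frozenset(range(6))   # stands in for the maximal type
Empty = frozenset()               # the minimal type
Small = frozenset({0, 1, 2})
Even = frozenset({0, 2, 4})
SmallEven = Small & Even          # {0, 2}

all_types = {Universal, Empty, Small, Even, SmallEven}

def is_subtype(t2, t1):
    """T2 is a subtype of T1 if its value set is a subset (possibly equal)."""
    return t2 <= t1

def is_proper_subtype(t2, t1):
    """T2 is a proper subtype of T1 if T1 has at least one value T2 lacks."""
    return t2 < t1

def is_immediate_subtype(t2, t1, types):
    """T2 is an immediate subtype of T1 if no declared type T3 lies
    strictly between them."""
    return t2 < t1 and not any(t2 < t3 < t1 for t3 in types)
```

In this toy system SmallEven is an immediate subtype of both Small and Even, which shows the lattice shape that multiple inheritance produces, while SmallEven is a proper but not immediate subtype of Universal.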

Every value has at most a single most specific type (or MST), which is cited as the general answer to the question "what is this value's type". The MST of a value is the data type containing that value which has no proper subtypes that also contain that value. Moreover, to enforce the "at most a single" requirement, which keeps answering the question a simple affair, it is mandatory in Muldis D that when any 2 data types have values in common, there must exist a data type which contains only the values that they have in common, and hence is a subtype of both. Note that a value will always implicitly assume the most specific type that exists which contains it, even if a selector for a less specific type was explicitly used to select it.
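
The role of the common-values rule can be seen in a small Python sketch. The type names other than Universal and Empty, the toy universe, and the function most_specific_types are all hypothetical.

```python
# Hypothetical sketch: finding the most specific type (MST) of a value
# in a toy type system where each type is a frozenset of values.

types = {
    "Universal": frozenset(range(6)),
    "Small": frozenset({0, 1, 2}),
    "Even": frozenset({0, 2, 4}),
    "SmallEven": frozenset({0, 2}),
    "Empty": frozenset(),
}

def most_specific_types(value, types):
    """Names of the types that contain the value and have no proper
    subtype that also contains it."""
    containing = {name: vals for name, vals in types.items() if value in vals}
    return sorted(
        name for name, vals in containing.items()
        if not any(other < vals for other in containing.values())
    )

# With SmallEven declared, the value 0 has exactly one MST.
with_intersection = most_specific_types(0, types)

# Without it, Small and Even would both qualify, so the answer to
# "what is this value's type" would be ambiguous.
without = {n: v for n, v in types.items() if n != "SmallEven"}
ambiguous = most_specific_types(0, without)
```

This is why a type containing exactly the shared values of Small and Even must exist: it restores a single answer for every value the two types have in common.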

A union type is a data type that has at least 2 immediate subtypes, and every one of its values is also a value of an immediate subtype; that is, the MST of every value in a union type is not that type. An intersection type is a data type that has at least 2 immediate supertypes. A difference type is a data type that has exactly 1 immediate supertype, and that supertype is a union type such that the difference type and another peer subtype of that union type are complementary with respect to the union type; every union type value is in either the difference type or its complement, but not both. In this way, Universal is a union type of all other types, and Empty is an intersection type of all other types.

A root type is a data type for which all of its values can be selected by the same single selector, and which has no proper supertype that is a root type. All root types are mutually disjoint, so every value is a member of exactly one root type. Generally speaking, root types are the implementational foundation over which all operators and all other types are built, and the declared parameter and result types of most system-defined operators are root types. The 6 core system-defined root types are: Bool, Text, Blob, Int, Tuple, Relation. All user-defined root types are scalar types that are not defined in terms of other types, except that any components of their possreps (possible representations) have declared types. Perhaps it should be said that all root types are defined by this last sentence? A leaf type is a data type that has no proper subtypes save for Empty.

Type Identification

All values in the Muldis::DB type system are broadly categorized into 5 complementary sets called scalar values, tuple values, relation values, quasi-tuple values, and quasi-relation values; tuple and relation values are collectively known as nonscalar values; quasi-tuple and quasi-relation values are collectively known as quasi-nonscalar values. The type system has the system-defined data types named Scalar, Tuple, Relation, QuasiTuple, and QuasiRelation, which serve as maximal data types for each category, respectively. The 5 types are all mutually disjoint, and Universal is a union type over all of them.

To keep things simpler, every data type (save Universal and Empty) must be a proper subtype of exactly 1 of the 5 categories, and can not include values from several of them. Therefore, every data type is said to be either a scalar type, a tuple type, a relation type, a quasi-tuple type, or a quasi-relation type, depending which category all of its values come from. In similar fashion, a nonscalar type is generally any type that is not a scalar type, if we ignored quasi-nonscalar types, meaning it is either a tuple type or a relation type.

The identity of every scalar type is defined by its name alone, and every scalar type must have a distinct name that is explicitly defined, either by the system or by the user as is applicable. Every value of a scalar type is conceptually opaque and atomic, and its components are not known to users of that type; but even when the components are known (because they are user-defined structured types), two independently defined scalar types are completely disjoint even if their components look the same, by definition. The only way for 2 scalar types to have values in common is if one is explicitly defined, directly or indirectly, as a subtype of, or as a union type encompassing a subtype of, the other.

Every value of a nonscalar type (either a tuple type or a relation type, respectively) is conceptually transparent, and its component structure is known to all. The identity of every nonscalar type is defined by its component structure alone, and every nonscalar type must have a distinct component structure. Any two nonscalar types that have the same component structure are in fact the same type, by definition, regardless of whether they were defined independently of each other or not.
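
The contrast between identity by name and identity by structure can be loosely illustrated in Python. This is only an analogy under stated assumptions: the scalar types Metres and Feet and the helper tuple_type are hypothetical, and Python classes stand in for named scalar types.

```python
# Hypothetical analogy: named (scalar) versus structural (nonscalar) identity.
from dataclasses import dataclass

@dataclass(frozen=True)
class Metres:
    """An independently defined scalar type with one integer component."""
    amount: int

@dataclass(frozen=True)
class Feet:
    """Another scalar type whose components look the same as those of Metres."""
    amount: int

def tuple_type(**attrs):
    """A tuple type is identified by its heading alone: an unordered set
    of (attribute name, declared type) pairs."""
    return frozenset(attrs.items())

# Two independently written headings with the same structure.
point_a = tuple_type(x=int, y=int)
point_b = tuple_type(y=int, x=int)
```

Metres(3) and Feet(3) never compare equal even though their components match, because the two types are disjoint by definition; point_a and point_b are the same type because their component structure is the same.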

A quasi-nonscalar type is the same as a nonscalar type as far as the means of identifying it go (by its structure, not by its name), but that particular kinds of components are permitted in quasi-nonscalar types that aren't permitted in nonscalar types (and aren't permitted in scalar types).

Scalar Types

Scalar types are the only conceptually encapsulated types in Muldis D, and are like other languages' concepts of object classes where all their attributes are private, and only accessible indirectly. The definition of a scalar type usually comprises one or more named possreps or possible representations, and for each of those, at least one selector operator and usually at least one accessor (or "the") operator.

A possrep of a type is an exhaustively complete means for users to conceptualize the structure of the type; it is like a "role" or "interface" definition. A possrep has the appearance of a complete collection of (zero or more) named object attributes (of any scalar or nonscalar type) that the type could logically be implemented as, and users can use it as if it actually was implemented that way, but without the requirement that the type actually is implemented that way. If a type has multiple possreps, said possreps can differ from each other in arbitrarily large ways, but every one is individually capable of representing all of the type's values; any possrep could be used exclusively by a user when they work with its type, without diminishing what they can do. A single possrep is specific to one and only one type, so it is possible to refer to a type by simply referring to the name of one of its possreps.

Taking for example an integer data type, one of its possreps could represent an integer value as a string of binary digits, while another possrep could represent an integer value as a string of decimal digits. Or taking for example a temporal data type, one of its possreps could represent a date as an ISO 8601 formatted character string in the Gregorian calendar, and another possrep could represent it as a number of seconds since the UNIX epoch. Or taking for example a spatial data type that is a rectangle, one possrep could specify the 4 vertices as 4 (or 3) point values, and another possrep could specify fewer vertices and also specify the rectangle's width and height as numeric values.

A possrep additionally has a defined boolean-valued constraint expression (which is simply true in the trivial case), that restricts what values the possrep components can have within the context of their fellows. Taking for example a "medium polygon" data type, there could be a constraint that the area of the polygon is between 5 and 10 units.

Each possrep comprises exactly one selector operator whose named parameter set exactly matches that possrep's set of named attributes, and you select a value of the type by invoking the selector with a full set of values for the possible attributes. Each possrep also comprises an accessor operator for each of its attributes, with which users can extract the possible attribute's value.
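
The integer example mentioned earlier, with a binary-digit possrep and a decimal-digit possrep, can be sketched as a Python class. Everything here is an assumption made for illustration: the class name Integer, the selector names from_decimal and from_binary, and the accessor names the_decimal and the_binary are hypothetical and are not Muldis D syntax.

```python
# Hypothetical sketch: one scalar type with two possreps, each offering
# one selector and one accessor, over a hidden physical representation.

class Integer:
    def __init__(self, n):
        # The physical representation is an implementation detail that
        # users of the type are not meant to depend on.
        self._n = n

    # Possrep "decimal": one attribute, a string of decimal digits.
    @classmethod
    def from_decimal(cls, digits):
        """Selector for the decimal possrep."""
        return cls(int(digits, 10))

    def the_decimal(self):
        """Accessor for the decimal possrep's single attribute."""
        return str(self._n)

    # Possrep "binary": one attribute, a string of binary digits.
    @classmethod
    def from_binary(cls, digits):
        """Selector for the binary possrep."""
        return cls(int(digits, 2))

    def the_binary(self):
        """Accessor for the binary possrep's single attribute."""
        return format(self._n, "b")

    def __eq__(self, other):
        return isinstance(other, Integer) and self._n == other._n

    def __hash__(self):
        return hash(self._n)

ten = Integer.from_decimal("10")
```

Selecting through either possrep yields the same value, so Integer.from_binary("1010") equals ten, and either accessor can be applied regardless of which selector was used.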

No data type has any operators built-in to its definition except for the aforementioned selectors and accessors. All other operators that are used with a data type are expressly not built-in to the type (even if they are system-defined); the other operators do not have any access to the data type's internals, and must be defined (directly or indirectly) in terms of (that is, layered on top of) the few that are built-in, though the built-ins from any or all possreps of the type can be utilized.

With a user-defined scalar type, if the type is to have multiple possreps, then just one possrep is defined as the fundamental one, and the other possreps are defined in terms of the first, by which means the mappings between them are done. The type-defining user can later come back and redefine the type if they wish, using a different possrep as the fundamental, but assuming the redefinition has all the same values, non-defining users of the type won't know any different.

The Muldis D implementation can choose for itself as to how the scalar type is physically represented behind the scenes, either picking between any of the user-provided possreps (assuming enough information is present to derive all needed inverse functions as applicable) or using yet another one or several of its own; the implementation can work how it knows best to achieve an efficient system, and this is all hidden away from the users, who simply perceive in it what they requested.

In the context of scalar subtype/supertype relationships, the definition of a subtype can add additional possreps that are only valid for the subtype, such that users of the subtype can use both possreps defined for the subtype and the supertype, but users of the supertype can only use the possreps for the supertype, and not the subtype. Taking for example the data types of rectangle and square, the latter is a subtype of the former; a possrep for a rectangle in general comprises its center point as well as its width and its height, which also works for a square; an additional possrep that just works for a square rather than a rectangle in general comprises a center point plus its length.
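
The rectangle and square example might look like the following sketch. It is Python, not Muldis D; the class names, the from_center_length selector, and the length accessor are hypothetical, and Python class inheritance is only a loose stand-in for Muldis D subtyping.

```python
class Rectangle:
    """Possrep for rectangles in general: center, width, height."""

    def __init__(self, *, center, width, height):
        self.center, self.width, self.height = center, width, height


class Square(Rectangle):
    def __init__(self, *, center, width, height):
        # Subtype constraint: a square is a rectangle with equal sides.
        if width != height:
            raise ValueError("a square needs equal width and height")
        super().__init__(center=center, width=width, height=height)

    # Selector of the additional possrep, valid for squares only.
    @classmethod
    def from_center_length(cls, *, center, length):
        return cls(center=center, width=length, height=length)

    # Accessor of the additional possrep's "length" attribute.
    @property
    def length(self):
        return self.width


sq = Square.from_center_length(center=(0, 0), length=3)
rect = Rectangle(center=(0, 0), width=1, height=2)
```

A Square value answers to both possreps, while a plain Rectangle value has no length attribute.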

As a corollary to this, all union types have none of the possreps defined by their subtypes. So the system-defined Scalar type has no possreps at all, and hence has no selectors or accessors defined for it.

Tuple Types and Relation Types

Tuple types are the fundamental heterogeneous conceptually non-encapsulated collection types in Muldis D, and are like the Pascal language's concept of a record, or the C language's concept of a struct. The definition of a tuple type comprises a set of zero or more named attributes of any scalar or nonscalar type. This set definition is called the tuple's heading.

Relation types are the fundamental homogeneous conceptually non-encapsulated collection types in Muldis D, and are like other languages' concepts of sets (or arrays where all elements are distinct), but restricted in that all elements are tuples. The definition of a relation type looks exactly like the definition of a tuple type (such that a relation has a heading even if it has no tuples), except that the heading applies to every tuple in the relation, and that relation types can additionally have keys defined, each of which indicates that the values of a subset of the relation's attributes are distinct between all tuples in the relation.
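
A minimal sketch of these ideas, in Python rather than Muldis D, follows. The function make_relation and its representation of a relation as a (heading, body) pair are assumptions made for illustration; declared attribute types are left out for brevity.

```python
def make_relation(heading, tuples, keys=()):
    """Build a hypothetical relation value: a heading plus a set of tuples."""
    heading = frozenset(heading)
    body = set()
    for t in tuples:
        # Every tuple in the relation must have exactly the relation's heading.
        if frozenset(t) != heading:
            raise ValueError(f"tuple {t} does not match the heading")
        # Duplicate tuples collapse, because a relation is a set.
        body.add(frozenset(t.items()))
    for key in keys:
        # A key: the values of these attributes are distinct between tuples.
        seen = {tuple(dict(t)[name] for name in key) for t in body}
        if len(seen) != len(body):
            raise ValueError(f"key {key} is violated")
    return heading, frozenset(body)


people = make_relation(
    {"id", "name"},
    [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 1, "name": "Ann"}],
    keys=[("id",)],
)
empty = make_relation({"id", "name"}, [])
```

The repeated tuple is absorbed, and the empty relation still carries its heading.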

Generic selector and accessor operators exist that work with all tuple and relation types, so they do not need to be defined per such type.

The system-defined types Tuple and Relation (and their system-defined subtypes) are technically generic factory types, such that they themselves do not define any attribute sets, and are supertypes of all tuple and relation types that do. Beyond this special case, a pair of tuple or relation types can only have a subtype/supertype relationship if they have compatible headings, which means the attribute sets are of the same degree, the attribute names are identical, and the name-wise corresponding attributes in each heading have a valid subtype/supertype relationship; each attribute of a tuple or relation subtype is a subtype of the same-named attribute of the tuple or relation supertype.
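
The compatible-headings rule can be sketched as a simple check. This is Python, not Muldis D; headings are modeled as dictionaries from attribute name to declared type name, and both is_heading_subtype and the tiny PARENT table are hypothetical helpers for the example.

```python
def is_heading_subtype(sub, sup, is_subtype):
    """True if heading `sub` may belong to a subtype of the type of `sup`.

    `is_subtype` is a caller-supplied predicate over type names.
    """
    # Same degree and identical attribute names.
    if sub.keys() != sup.keys():
        return False
    # Name-wise corresponding attributes must be subtype/supertype pairs.
    return all(is_subtype(sub[name], sup[name]) for name in sub)


# A tiny scalar hierarchy for the example: PInt under UInt under Int.
PARENT = {"PInt": "UInt", "UInt": "Int"}


def scalar_is_subtype(a, b):
    # Walk up the parent chain from `a` looking for `b`.
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


narrow = {"qty": "PInt", "name": "Text"}
wide = {"qty": "Int", "name": "Text"}
renamed = {"count": "PInt", "name": "Text"}
```

With these definitions, narrow is compatible as a subtype heading of wide, but not the reverse, and renamed is compatible with neither because its attribute names differ.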

Quasi-Tuple Types and Quasi-Relation Types

The union types Universal, Tuple, Relation (and the system-defined subtypes of the latter 2) can be used as the declared types of such as variables and routine parameters, but they can not be used as the declared types of scalar possrep or nonscalar (tuple or relation) attributes. The declared type of each of the latter must be either a scalar type, or a specific tuple or relation subtype (meaning tuple or relation types that have specific attribute sets defined for them).

If all data types were scalar or nonscalar, then it would not be possible to define operators with N-ary parameters whose declared types are any of the aforementioned 3 union types. That is, an N-ary parameter is usually relation-typed, such that the multiplicity of values that the parameter can take are each provided as a tuple of said relation; however, as relation attributes can not have said union types as their declared types, it would not be possible to implement an N-ary relational join operator, for example, since each relation being joined would probably have a different heading than the others.

Quasi-tuple types and quasi-relation types exist as a solution to this problem, such that the quasi-heading of one is allowed to include attributes whose declared types are any type at all, including the union types Universal, Tuple, Relation, QuasiTuple, QuasiRelation, and subtypes of tuple and relation without specific attribute sets.

This said, the situations in which quasi-nonscalar types may be used are limited; only quasi-nonscalar types may have quasi-nonscalar types as components; scalar and nonscalar types may not.

Also, quasi-nonscalar types only have defined for them a subset of corresponding nonscalar type operators, partly because the former are not intended to replace the latter for the majority of use cases, and partly because some of them are simply impossible to implement for quasi-nonscalars: unwrap, ungroup.

Finite Types and Infinite Types

A finite type is a data type whose cardinality (count of member values) is known to be finite, and this cardinality can be deterministically computed; moreover, every value of a finite type can be represented somehow using a finite amount of memory. This doesn't exclude the possibility that either the cardinality or individual values are larger than present-day computing hardware can handle, but even if so, they could be handled by sufficiently larger but finite resources. An infinite type is a data type that is not a finite type; its cardinality is either known to be infinity, or it is unknown.

Generally speaking, all finite types are defined either as an explicit enumeration of values (for example, the boolean type, which has exactly 2 values), or they are scalar types whose possreps have zero attributes (each one is a singleton, having exactly 1 value), or they are the tuple or relation type that has zero attributes (which has exactly 1 or 2 values, respectively), or their values are all discrete and fall into a closed range (for example, a type comprising the range of integers between 1 and 100, or a type comprising all real numbers in the same range that have a granularity of 0.001, or any IEEE floating point number of a specific bit length), or their values are length-constrained strings of finite-cardinality elements (for example, a character string that is not longer than 250 characters), or they are composite scalar or nonscalar or quasi-nonscalar types whose attributes are all of finite types themselves (for example, a type whose attributes are all Bool).

Generally speaking, all infinite types are defined either as being some open-ended natural domain (for example, the type having all integers, or the type having all prime numbers), or they are some continuous domain, whether open-ended or not (for example, the type having all real or complex numbers between 1 and 100), or they are non-length-constrained strings (for example, the set of all possible text strings), or they are composite scalar or nonscalar or quasi-nonscalar types which have at least one attribute which is itself infinite (for example, a type that has an Int attribute).

The system-defined root type Bool is finite (2 values), as is the Empty type (zero values), while all of the other 5 core system-defined root types (Text, Blob, Int, Tuple, Relation) are infinite, as are the Universal, Scalar, QuasiTuple, QuasiRelation types.

All proper subtypes of finite types are themselves finite types. Proper subtypes of infinite types can be either finite or infinite depending on how they are defined. For example, a subtype of Int whose numbers are all simply greater than 10 is infinite, but a subtype whose numbers are additionally all less than 1000 is finite. The documentation for individual system-defined data types, further below, specifies whether each one is finite or infinite, and in the latter case, it states a most generic means to specify a finite subtype.
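
The cardinality rules described in this section can be sketched numerically. The following is Python, not Muldis D; all function names are hypothetical, math.inf stands in for "infinite or unknown", and keys on relation types are ignored.

```python
import math

INFINITE = math.inf  # stands in for an infinite or unknown cardinality


def int_range_cardinality(low, high):
    """Cardinality of an Int subtype bounded by an inclusive range."""
    return max(0, high - low + 1)


def bounded_string_cardinality(alphabet_size, max_length):
    """Count of strings of length 0..max_length over a finite alphabet."""
    return sum(alphabet_size ** n for n in range(max_length + 1))


def tuple_type_cardinality(attribute_cardinalities):
    """Cardinality of a tuple type given its attributes' cardinalities."""
    if 0 in attribute_cardinalities:
        return 0  # an attribute of an empty type admits no tuples at all
    # The product of an empty list is 1: the zero-attribute tuple type
    # has exactly 1 value.
    return math.prod(attribute_cardinalities)


def relation_type_cardinality(attribute_cardinalities):
    """Cardinality of a relation type: every subset of the possible tuples."""
    n = tuple_type_cardinality(attribute_cardinalities)
    return INFINITE if n == INFINITE else 2 ** n
```

For instance, the zero-attribute tuple and relation types come out at 1 and 2 values, a tuple type of three Bool attributes at 8, and the Int subtype greater than 10 and less than 1000 at 989.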

Note that, while it is not mandated by the language, some Muldis D implementations may legitimately choose to impose restrictions on their users such that the declared types of all persisting variables must be of finite types only.

For example, an implementation may require that all persisting Text types have a maximum allowed length in characters specified, or that all persisting Int types have a least and greatest allowed value specified. This would typically happen if the implementation needs to use fixed-size fields internally, such as 32-bit integers, and it is not practical to support the possibility that a value could be of any size at all (this is often the case with SQL databases implemented in C).

On the other hand, some implementations may natively support unlimited size values, such as those written in Perl, and so these can allow persisting the plain Text or Int types, which can make things less complicated for their users.

Of course, even with implementations that require finite types, this isn't to say that the declared type can't be a very large finite type, but then the implementation can choose to use, for example, either a machine native integer or a string of digits behind the scenes for all values of the type, and can do this deterministically, depending what constraint the type defining user chose.

Universal Implicit Operators

Muldis D is universally polymorphic to at least a small degree, such that every data type without exception has both an assign update operator (for assigning a value of that type to a variable of that type) and an equal function for testing 2 values of that type for equality (as well as not_equal, for inequality). Moreover, these operators exist implicitly, so when one defines the initial possrep of a new type, they get those operators for the type at no extra cost.

This documentation is pending.

ENVIRONMENT

The Muldis::DB DBMS / virtual machine, which by definition is the environment in which Muldis D executes, conceptually resembles a hardware PC, having a command processor (CPU), standard user input and output channel, persistent read-only memory (ROM), volatile read-write memory (RAM), and persistent read-write disk or network storage.

Within this analogy, the role of the PC's user, that feeds it through standard input and accepts its standard output, is fulfilled by the application that is driving the Muldis::DB DBMS; similarly, the application itself will activate the virtual machine when wanting to use it (done in this distribution by instantiating a new Muldis::DB::Interface::DBMS object), and deactivate the virtual machine when done (letting that object expire).

When a new virtual machine is activated, the virtual machine has a default state where the CPU is ready to accept user-input commands to process, and there is a built-in (to the ROM) set of system-defined entities (data types, operators, variables, etc) which are ready to be used to define or be invoked by said user-input commands; the RAM starts out effectively empty and the persistent disk or network storage is ignored.

Following this activation, the virtual machine is mostly idle except when executing Muldis D commands that it receives via the standard input (done in this distribution by invoking methods on the DBMS object). The virtual machine effectively handles just one command at a time, and executes each separately and in the order received; any results or side-effects of each command provide a context for the next command.

At some point in time, as the result of appropriate commands, data repositories, or "depots" (either newly created or previously existing) that live in the persistent disk or network storage will be mounted within the virtual machine, at which point subsequent commands can read or update them, then later unmount them when done. Speaking in the terms of a typical database access solution like the Perl DBI, this mounting and unmounting of a repository usually corresponds to connecting to and disconnecting from a database. Speaking in the terms of a typical disk file system, this is mounting or unmounting a logical volume.

Any mounted persistent depot, as well as the temporary "application" depot which is most of the conceptual PC's RAM, is home to all user-defined data variables, data types, operators, constraints, packages, and routines; they collectively are the database that the Muldis::DB DBMS is managing. Most commands against the DBMS would typically involve reading and updating the data variables, which in typical database terms is performing queries and data manipulation. Much less frequently, you would also see "data definition" changes, namely changes to what user-defined variables, types, et cetera exist, done fundamentally by updating the data of special system-defined "catalog" variables. Any updates to a persistent depot will usually persist across multiple activations of the virtual machine, while any updates to the temporary "application" depot are lost when the machine deactivates.

All virtual machine commands are subject to a collection of both system-defined and user-defined constraints (also known as business rules), which are always active over the period that they are defined. The constraints restrict what state the database can be in, and any commands which would cause the constraints to be violated will fail; this mechanism is a large part of what makes the Muldis::DB DBMS a reliable modeler of anything in reality, since it only stores values that are reasonable.

Note that in practice, the aforementioned concept of "commands" is realized by "statements" (which are grouped into "routines").

ROUTINES

There are several kinds of Muldis::DB routines, each of which is intended for, and in many cases only permitted to be used for, particular tasks. Note that for all Muldis::DB routines which have parameters, they are all named rather than positional parameters; in the case of N-ary routines, the N similar argument values come by way of a single nonscalar (or, if necessary, quasi-nonscalar) typed parameter.

function

A function is an explicitly invokable read-only operator whose invocation both results in and represents a value of a specific data type (that is, the function's result type or declared type); this invocation can only exist as part of a value-expression of another routine. A function is pure and deterministic in the functional-language sense, such that all of its 0..N parameters are read-only / not subject to update, and that it can only see or influence its own lexical variables, if it has any (no globals), and that it can only invoke function and update_operator routines. The vast majority of invokable system-defined routines are function; they include all value selectors, and the typical numeric, string, and relational operators, such as you would compose a typical database "select" query out of.

update_operator

An update_operator is an explicitly invokable procedure with 1..N parameters that has at least one parameter which is subject to update, and that can only see or influence its own lexical variables (no globals); it can only be invoked as the root part of a statement in another routine. An update_operator can only invoke function and update_operator routines, and it is much like a function, including being deterministic, except that its result value is returned via a parameter. Most non-function system-defined routines are update_operator; they include all assign operators, plus some relational-assignment short-hands such as "insert", "update", "delete".

system_service

A system_service is an explicitly invokable system-defined procedure with 0..N parameters that can reach outside of the deterministic DBMS environment in order to do non-deterministic things (besides working with depots), such as to initiate I/O of various kinds, or fetch the current date and time, or generate a random number. Given the nature of this beast, users can not define their own system_service routines except by updating the Muldis::DB source code itself. Invoking a system_service routine can have side-effects outside of the DBMS, but it will not alter anything inside the DBMS aside from any of its subject-to-update parameters.

procedure

A procedure is an explicitly invokable routine with 0..N parameters that can see and update global variables, and can invoke any kind of invokable routine; it can only be invoked as the root part of a statement in another routine. The procedure and host_gate are the only explicitly invokable routines that can directly reference global containers, whether catalog or data. The vast majority of procedure that exist will be user-defined. But some system-defined routines that would otherwise be function or update_operator are procedure instead solely because they are non-deterministic; an example is an operator that derives a tuple sequence from a relation without fully sorting the tuples, because the result is fundamentally random and non-repeatable.

type_constraint

A type_constraint is an implicitly invokable routine that is associated with / is part of a data type definition and is invoked automatically when a value of that type is being selected; it asserts whether said value, which by this time is known to be acceptable to the current data type's more generic supertype, is within the data type's own more restricted domain. This routine can only see its own lexical variables (no globals). This routine has 1 read-only parameter, which is the value to examine, and it results in a Bool. The DBMS would then throw a type-constraint-violation exception if the constraint results in False, and no-op if it results in True. Conceptually speaking, a type_constraint will execute before any other kinds of constraints.

state_constraint

A state_constraint is an implicitly invokable routine that can see global variables and is invoked automatically at the end of every statement that the DBMS executes, wherein it asserts that all global variables are collectively in a valid state; it can see said variables directly, but updates none. This routine has no parameters and results in a Bool. The DBMS responds as per a type constraint; if the constraint fails, then the just-executed statement is rolled back, and an exception is thrown. Conceptually speaking, a state_constraint will execute after all type_constraint and before all transition_constraint.

transition_constraint

A transition_constraint is an implicitly invokable routine that can see global variables and is invoked automatically at the end of every statement that the DBMS executes, wherein it asserts that all global variables have collectively transitioned in a valid fashion between their before-update state and their after-update state; it can see both versions of said variables directly, but updates none. This routine has no parameters and results in a Bool. The DBMS responds as per a state constraint. Conceptually speaking, a transition_constraint will execute after all other kinds of constraints.
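
The end-of-statement checking order described above can be sketched as follows. This is Python, not Muldis D, and it simplifies in labeled ways: the database is a plain dictionary, the hypothetical execute_statement helper passes the state to each constraint explicitly (whereas the text says constraint routines take no parameters and see the globals directly), and type constraints are omitted.

```python
import copy


class ConstraintViolation(Exception):
    pass


def execute_statement(db, update, state_constraints, transition_constraints):
    """Apply `update` to dict `db`, rolling it back if any constraint fails."""
    before = copy.deepcopy(db)          # before-update state
    try:
        update(db)
        # State constraints see only the after-update state.
        for check in state_constraints:
            if not check(db):
                raise ConstraintViolation("state constraint failed")
        # Transition constraints see both states and run last.
        for check in transition_constraints:
            if not check(before, db):
                raise ConstraintViolation("transition constraint failed")
    except Exception:
        # Roll the just-executed statement back, then re-raise.
        db.clear()
        db.update(before)
        raise


db = {"salary": 100}
non_negative = lambda d: d["salary"] >= 0
never_decreases = lambda old, new: new["salary"] >= old["salary"]

execute_statement(db, lambda d: d.update(salary=120),
                  [non_negative], [never_decreases])
try:
    execute_statement(db, lambda d: d.update(salary=110),
                      [non_negative], [never_decreases])
    failed = False
except ConstraintViolation:
    failed = True
```

The first statement is accepted; the second satisfies the state constraint but violates the transition constraint, so it is rolled back and the salary stays at 120.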

host_gate

A host_gate is an anonymous procedure that can not be invoked by any other routine and exists in the limbo between the DBMS and the application that is driving it; the application holds a Muldis::DB::Interface::HostGateRtn object to reference it by (the result of a Muldis::DB::Interface::DBMS object's prepare() method), and it exists only so long as that object doesn't go out of scope. A host_gate is mostly like a procedure in every other way. Main differences are that the only host_gate routines are ones that users define, and that they are the only routines allowed to have explicit isolated transaction initiation and termination statements (for optional use in a parent-most transaction); all other updating routine types have self-contained atomic blocks instead.

Note that Muldis::DB currently has no direct support for the concept of a trigger-routine that can update a database; updating virtual relvars or invoking procedure are recommended instead. As for non-updating trigger-routines, the state/transition constraint routines already perform that feature. The feature in question may be directly supported later.

This documentation is pending.

USERS AND PRIVILEGES

The Muldis::DB DBMS / virtual machine itself does not have its own set of named users where one must authenticate to use it. Rather, any concept of such users is associated with individual persistent repositories, such that you may have to authenticate in order to mount them within the virtual machine; moreover, there may be user-specific privileges for that repository that restrict what users can do in regards to its contents.

The Muldis::DB privilege system is orthogonal to the standard Muldis::DB constraint system, though both have the same effect of conditionally allowing or barring a command from executing. The constraint system is strictly charged with maintaining the logical integrity of the database, and so only comes into effect when an update of a repository or its contents is attempted; it usually ignores what users were attempting the changes. By contrast, the privilege system is strictly user-centric, and gates a lot of activities which don't involve any updates or threaten integrity.

The privilege system mainly controls, per user, what individual repository contents they are allowed to see / read from, what they are allowed to update, and what routines they are allowed to execute; it also controls other aspects of their possible activity. The concerns here are analogous to privileges on a computer's file system, or a typical SQL database.

This documentation is pending.

TRANSACTIONS AND CONCURRENCY

This official specification of the Muldis::DB DBMS includes full ACID compliance as part of the core feature set; moreover, all types of changes within a repository are subject to transactions and can be rolled back, including both data manipulation and schema manipulation; moreover, an interrupted session with a repository must result in an automatic rollback, not an automatic commit.

It is important to point out that any attempt to implement the Muldis::DB DBMS (a Muldis::DB Engine) which does not include full ACID compliance, with all aspects described above, is not a true Muldis::DB DBMS implementation, but rather is at best a partial implementation, and should be treated with suspicion concerning reliability. Of course, such partial implementations will likely be made and used, such as ones implemented over existing DBMS products that are themselves not ACID compliant, but you should see them for what they are and weigh the corruption risks of using them.

Note that the best way for an Engine to behave, if for some reason it is built in such a way and/or over an existing DBMS product that does implicit commits after, say, data-definition statements, is for it to throw an exception if data-definition is attempted within an explicit / multi-statement transaction, such that a user of that Engine can only do data-definition outside of an explicit transaction; in this way, the Engine is still following all the Muldis::DB safety rules, and hence should be relatively safe to use, even if it lacks Muldis::DB features.

Each individual instance of the Muldis::DB DBMS is a single process virtual machine, and conceptually only one thing is happening in it at a time; each individual Muldis D statement executes in sequence, following the completion or failure of its predecessor. During the life of a statement's execution, the state of the virtual machine is constant, except for any updates (and side-effects of such) that the statement makes. Breaking this down further, a statement's execution has 2 sequential phases; all reads from the environment are done in the first phase, and all writes to the environment are done in the second phase. Therefore, regardless of the complexity of the statement, and even if it is a multi-update statement, the final values of all the expressions to be assigned are determined prior to any target variables being updated. Moreover, as no function may have side-effects, and we don't support the concept of "trigger" routines that can perform updates, we avoid the complication of environment updates occurring during their invoker statement's first phase.
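
The two-phase rule can be sketched in a few lines. This is Python, not Muldis D; the environment is modeled as a dictionary, and multi_update is a hypothetical helper whose assignments map each target variable name to a function computing its new value.

```python
def multi_update(env, assignments):
    """Run one multi-update statement over the dict `env`."""
    # Phase 1: all reads. Every expression sees the same pre-statement state.
    new_values = {name: expr(env) for name, expr in assignments.items()}
    # Phase 2: all writes.
    env.update(new_values)


env = {"x": 1, "y": 2}
# Swap x and y in a single statement; neither read sees the other's write.
multi_update(env, {"x": lambda e: e["y"], "y": lambda e: e["x"]})
```

Because every expression is evaluated before any target is written, the swap works without a temporary variable.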

To account for situations where external processes are concurrently using the same persistent (and externally visible) repository as a Muldis::DB DBMS instance, the Muldis::DB DBMS will maintain a lock on the whole repository (or appropriate subset thereof) during any active read-only and/or for-update transaction, to ensure that the transaction sees a consistent environment during its life. The lock is a shared lock if the transaction only does reading, and it is an exclusive lock if the transaction also does writing. Speaking in terms of SQL, the Muldis::DB DBMS supports only the serializable transaction isolation level.

Note that there is currently no official support for using Muldis::DB in a multi-threaded application, where its structures are shared between threads, or where multiple thread-specific structures want to use the same repositories. But such support is expected in the future.

No multi-update statement may target both catalog and non-catalog variables. If you want to perform the equivalent of SQL's "alter" statement on a relation variable that already contains data, you must have separate statements to change the definition of the relation variable and change what data is in it, possibly more than one of each; the combination can still be wrapped in an explicit transaction for atomicity.

Transactions can be nested, by starting a new one before concluding a previous one, and the parent-most transaction has the final say on whether all of its committed children actually have a final committed effect or not. The layering of transactions can involve any combination of explicit and implicit transactions (the combination should behave intuitively).

The lifetimes of all transactions in Muldis D (except those declared in host_gate routines) are bound to specific lexical scopes, such that they begin when that scope is entered and end when that scope is exited; if the scope is exited normally, its transaction commits; if the scope terminates early due to a thrown exception, its transaction rolls back.
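
Scope-bound and nested transactions might be modeled as in the sketch below. This is Python, not Muldis D; the database is a plain dictionary, the hypothetical transaction helper uses whole-database snapshots, and it illustrates only the commit-on-normal-exit and rollback-on-exception behavior, not isolation or locking.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Scope-bound transaction over the dict `db` (a hypothetical stand-in)."""
    snapshot = copy.deepcopy(db)
    try:
        yield db
    except BaseException:
        # Scope left early due to an exception: roll back, then propagate.
        db.clear()
        db.update(snapshot)
        raise
    # Scope exited normally: changes stay, subject to any parent transaction.


db = {"n": 0}
try:
    with transaction(db):                    # parent
        with transaction(db):                # child commits...
            db["n"] = 1
        raise RuntimeError("parent fails")   # ...but the parent rolls back
except RuntimeError:
    pass
after_rollback = db["n"]

with transaction(db):
    db["n"] = 5
after_commit = db["n"]
```

The child's committed change is undone when its parent rolls back, which matches the parent-most transaction having the final say.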

Each Muldis D explicitly invokable routine as a whole (being a lexical scope), whether system-defined or user-defined, is implicitly atomic, so invoking one will either succeed or have no side-effect, and the environment will remain frozen during its execution, save for the routine's own changes. The implicit transaction of a function is always read-only, and the implicit transaction of a procedure is either read-only or for-update depending on what it wants to do. Each try-block is also implicitly atomic, committing if it exits normally or rolling back if it traps an exception.

Every Muldis D statement (including multi-update statements) is atomic; all parts of that statement and its child expressions will see the same static view of the environment; if the statement is an update, either all parts of that update will succeed and commit, or none of it will (accompanied by a thrown exception) and no changes are left.

Explicit atomic statement blocks can also be declared within a routine.

Muldis D also supports the common concept of explicit open-ended transaction statements that start or end transactions which are not bound to lexical scopes; however, these statements may only be invoked within a host_gate routine, that an application invokes directly, and not in any named routines, nor within atomic statement blocks in host_gate routines.

While scope-bound transactions always occur entirely within one invocation of the DBMS by an application, the open-ended transactions are intended for transactions which last over multiple DBMS invocations of an application.

All currently mounted repositories (persistent and temporary both) are joined at the hip with respect to transactions; a commit or rollback is performed on all of them simultaneously, and a commit either succeeds for all or fails for all (a repository suddenly becoming inaccessible counts as a failure). Note that if a Muldis::DB DBMS implementation can not guarantee such synchronization between multiple repositories, then it must refuse to mount more than one repository at a time under the same virtual machine (users can still employ multiple virtual machines, that are not synchronized); by doing one of those two actions, a less capable implementation can still be considered reliable and recommendable.

Some Muldis D commands can not be executed within the context of a parent transaction; in other words, they can only be executed directly by an anonymous routine, the main examples being those that mount or unmount a persistent repository; this is because such a change in the environment mid-transaction would result in an inconsistent state.

Muldis D lets you explicitly place locks on resources that you don't want external processes to change out from under you, and these locks do not automatically expire when transactions end; or maybe they do; this feature has to be thought out more.

This documentation is pending.

SYSTEM-DEFINED DATA TYPES

These data types are built-in to Muldis D and should be available under all of its implementations. They are available for the entire time that a DBMS is active and can be used by both persistent and temporary user-defined entities.

Maximal and Minimal Data Types

Muldis D provides 2 special data types named Universal and Empty, which contain all values in the universe and no values at all, respectively; they are implicit supertypes and subtypes of all other data types, respectively.

Given the nature of Universal, it is impossible to reference/select a value whose most specific type is that. But it is valid for the declared type of any data type attribute or container to be Universal, as this says that the attribute or container is allowed to hold a value of any data type. Now it stands to reason that the default values for Universal must be of its selectable subtypes. The default value of Universal is the Bool False value.

By contrast, Empty is very limited in its use. It doesn't make sense for an attribute or container to have a declared type of Empty because no value could be held by it, and all containers in Muldis D must be holding a value. However, it is valid to declare a Relation type whose attribute types are Empty, meaning that type is a proper subtype of all other Relation with the same attribute names.

The cardinality of the Universal|Empty types is infinity|zero respectively; regarding the first one, it is impossible to define a most-generalized finite subtype.

Note that the full names of these types for referential purposes are sys.type.Universal and sys.type.Empty.

Core Scalar Data Types

These scalar data types provide the fundamentals over which anything more complicated can be implemented, and most user-defined data types are specified in terms of them; they are all non-structured.

sys.type.Bool

A Bool is a truth value, and can be either False or True. Its default value is False. The cardinality of this type is 2.

sys.type.Text

A Text is a string of characters. Its default value is the empty string. Note that there is only one system-defined character repertoire for Text types, which is the newest Unicode repertoire (5.0.0). The cardinality of this type is infinity; to define a most-generalized finite Text subtype, you must specify the maximum length in characters (measured in, for example, NFC graphemes) that the subtype's strings may have.

sys.type.NEText

A NEText (non-empty text) is a proper subtype of Text where its length in characters must be more than zero; it can be any Text except for the empty string. Its default value is a single "space" character.

sys.type.Blob

A Blob is an undifferentiated string of bits. Its default value is the empty string. The cardinality of this type is infinity; to define a most-generalized finite Blob subtype, you must specify the maximum length in bits that the subtype's strings may have.

sys.type.NEBlob

A NEBlob (non-empty blob) is a proper subtype of Blob where its length in bits must be at least 1; it can be any Blob except for the empty string. Its default value is a single zero bit.

sys.type.Int

An Int is a single integral number of any magnitude. Its default value is zero. The cardinality of this type is infinity; to define a most-generalized finite Int subtype, you must specify the 2 integer end-points of the inclusive range that all its values are in.

sys.type.UInt

A UInt (unsigned integer) is a proper subtype of Int where all member values are non-negative / greater than or equal to zero.

sys.type.PInt

A PInt (positive integer) is a proper subtype of UInt where all member values are positive / greater than or equal to one. Its default value is one.
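
The subtype chain from Int through UInt to PInt can be illustrated with simple membership predicates. The following Python sketch is illustrative only; the function names are hypothetical and are not part of Muldis D.

```python
# Hypothetical membership predicates for the Int / UInt / PInt subtype chain.
# Python's int is arbitrary-precision, which matches "any magnitude".

def is_int(v):
    # bool is a subclass of int in Python; exclude it, since Bool is a distinct type.
    return isinstance(v, int) and not isinstance(v, bool)

def is_uint(v):
    # UInt: non-negative integers.
    return is_int(v) and v >= 0

def is_pint(v):
    # PInt: integers greater than or equal to one.
    return is_int(v) and v >= 1

def in_finite_int_subtype(v, low, high):
    # A finite Int subtype is fixed by the 2 end-points of an inclusive range.
    return is_int(v) and low <= v <= high
```

Every value accepted by is_pint is also accepted by is_uint and is_int, which mirrors the proper-subtype relationships described above.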

Core Nonscalar Data Types

These nonscalar data types, Tuple|Relation|Database|Set|Seq|Bag|Maybe, permit transparent/user-visible compositions of multiple values into other conceptual values. Unlike with scalar types in general, every system-defined selector for nonscalar values will result in values that are of proper subtypes of the 2 nonscalar root types (the first 2), and none whose most specific type is "just" one of those 2. Moreover, every such most-specific type has explicit element types or attribute sets defined; there are no nonscalar values where the element types or attribute sets are undefined. For all nonscalar types, their cardinality is mainly or wholly dependent on the data types they are composed of.

sys.type.Tuple

A Tuple is an unordered heterogeneous collection of 0..N named attributes (the count of attributes being its degree), where all attribute names are mutually distinct, and each attribute may be of a distinct selectable type; the mapping of a tuple's attribute names to their declared data types is called the tuple's heading. Its default value is a zero-attribute tuple. The cardinality of a Tuple type is equal to the product of the cardinalities of its attributes' declared data types; for a Tuple subtype to be finite, all of its attribute types must be.
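
The cardinality rule for Tuple types can be worked through with a small calculation. This is a sketch in Python under the assumption that a heading is modelled as a mapping from attribute name to the cardinality of that attribute's declared type; the function name is hypothetical.

```python
import math

def tuple_type_cardinality(heading):
    """Cardinality of a Tuple type.

    heading maps each attribute name to the cardinality of its declared
    type (an int, or math.inf for an infinite type).
    """
    cards = list(heading.values())
    # An attribute of an empty type admits no value, so no tuple can exist.
    if any(c == 0 for c in cards):
        return 0
    # Otherwise, one infinite attribute type makes the Tuple type infinite.
    if any(c == math.inf for c in cards):
        return math.inf
    # math.prod of an empty list is 1: there is exactly one zero-attribute tuple.
    return math.prod(cards)
```

For example, a Tuple type with two Bool attributes has 2 * 2 = 4 values, while adding a single Text attribute makes the type infinite.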

sys.type.Relation

A Relation is analogous to a set of 0..N tuples where all tuples have the same heading (the degrees match and all attribute names and corresponding declared data types match); a Relation data type still has its own corresponding heading (attribute names and declared data types) even when it consists of zero tuples. Its default value is a zero-attribute and zero-tuple relation. Matters of its cardinality are generally the same as for Tuple. A Relation data type can also have (unique) keys each defined over a subset of its attributes, which constrain its set of values relative to there being no explicit keys, but having the keys won't turn an infinite Relation type into a finite one.

sys.type.Database

A Database is a proper subtype of Tuple where all of its attributes are Relation-typed; it is otherwise the same.

sys.type.Set

A Set is a proper subtype of Relation that has 1 attribute, and its name is value; it can be of any declared type. A Set subtype is normally used by any system-defined N-ary operators where neither the order of their argument elements or result nor duplicate values are significant. Its default value has zero tuples.

sys.type.Seq

A Seq is a proper subtype of Relation that has 2 attributes, and their names are index and value, where index is a unary key and its declared type is a UInt subtype (value can be non-unique and of any declared type). A Seq is considered dense, and all index values in one are numbered consecutively from 0 to 1 less than the count of tuples, like array indices in typical programming languages. A Seq subtype is normally used by any system-defined N-ary operators where the order of their argument elements or result is significant (and duplicate values are significant); specifically, index defines an explicit ordering for values. Its default value has zero tuples.
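
The density and key constraints on a Seq can be expressed as a validity check. The sketch below is in Python and assumes a relation is modelled as a list of dicts, one dict per tuple; that modelling and both function names are hypothetical.

```python
def seq_from_list(values):
    # Build Seq tuples from an ordered list; index starts at 0.
    return [{"index": i, "value": v} for i, v in enumerate(values)]

def is_valid_seq(tuples):
    rows = list(tuples)
    # Every tuple must have exactly the attributes index and value.
    if any(set(r) != {"index", "value"} for r in rows):
        return False
    # index must be a plain non-negative integer.
    if any(type(r["index"]) is not int or r["index"] < 0 for r in rows):
        return False
    # Dense and unique: the indices are exactly 0 .. count-1.
    return sorted(r["index"] for r in rows) == list(range(len(rows)))
```

Because the sorted indices are compared against the full range, the check rejects both gaps and duplicate index values in one step.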

sys.type.Bag

A Bag is a proper subtype of Relation that has 2 attributes, and their names are value and count, where value is a unary key (that can have any declared type) and count is a PInt subtype. A Bag subtype is normally used by any system-defined N-ary operators where the order of their argument elements or result is not significant, but duplicate values are significant; specifically, count defines an explicit count of occurrences for values. Its default value has zero tuples.
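
A Bag can be pictured as a collapsed list in which each distinct value carries its occurrence count. The following Python sketch assumes a relation is modelled as a list of dicts and that values are hashable; the function names are hypothetical.

```python
from collections import Counter

def bag_from_values(values):
    # Collapse duplicates into one tuple per distinct value, with its count.
    counts = Counter(values)
    return [{"value": v, "count": n} for v, n in counts.items()]

def is_valid_bag(tuples):
    rows = list(tuples)
    # Every tuple must have exactly the attributes value and count.
    if any(set(r) != {"value", "count"} for r in rows):
        return False
    values = [r["value"] for r in rows]
    # value is a unary key, so it must not repeat.
    unique = len(set(values)) == len(values)
    # count is a PInt: an integer greater than or equal to one.
    positive = all(type(r["count"]) is int and r["count"] >= 1 for r in rows)
    return unique and positive
```

Since count is a PInt, a value that occurs zero times is represented by the absence of a tuple rather than by a tuple with a count of zero.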

sys.type.Maybe

A Maybe is a proper subtype of Set that may have at most one element; that is, it is a unary Relation with a nullary key. Operators that work specifically with Maybe subtypes can provide a syntactic shorthand for working with sparse data; so Muldis D has something which is conceptually close to SQL's nullable types without actually having 3-valued logic; it would probably be convenient for code that round-trips SQL by way of Muldis D to use the Maybe type. Its default value has zero tuples.
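
The way a Maybe stands in for a nullable value can be sketched as follows. This Python example assumes a relation is modelled as a list of dicts and uses Python's None as a stand-in for a missing value on the host-language side; the function names are hypothetical.

```python
def maybe_from_optional(x):
    # None maps to the zero-tuple Maybe; anything else to a one-tuple Maybe.
    # Caveat of this sketch: a Maybe that holds None itself cannot be expressed.
    return [] if x is None else [{"value": x}]

def is_valid_maybe(tuples):
    rows = list(tuples)
    # At most one tuple, each with exactly the attribute named value.
    return len(rows) <= 1 and all(set(r) == {"value"} for r in rows)

def maybe_or_default(tuples, default):
    # Shorthand for sparse data: the held value, or a default when empty.
    rows = list(tuples)
    return rows[0]["value"] if rows else default
```

No third truth value is involved: an empty Maybe is an ordinary relation value that compares equal to any other empty Maybe of the same type.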

Core Quasi-Nonscalar Data Types

These quasi-nonscalar data types correspond to their similarly-named (differing only by the Quasi) nonscalar data types, and their use is intended to be limited to the few situations where the corresponding nonscalar data types can't be used.

sys.type.QuasiTuple

A QuasiTuple is like a Tuple except that the declared types of its attributes can be anything at all. Its cardinality is infinite.

sys.type.QuasiRelation

A QuasiRelation is like a Relation except that the declared types of its attributes can be anything at all. Its cardinality is infinite.

sys.type.QuasiSet

A QuasiSet is a proper subtype of QuasiRelation in the corresponding manner to Set being a proper subtype of Relation. Its cardinality is infinite.

sys.type.QuasiSeq

A QuasiSeq is a proper subtype of QuasiRelation in the corresponding manner to Seq being a proper subtype of Relation. Its cardinality is infinite.

sys.type.QuasiBag

A QuasiBag is a proper subtype of QuasiRelation in the corresponding manner to Bag being a proper subtype of Relation. Its cardinality is infinite.

sys.type.QuasiMaybe

A QuasiMaybe is a proper subtype of QuasiRelation in the corresponding manner to Maybe being a proper subtype of Relation. Its cardinality is infinite.

Catalog Data Types

These scalar data types are special-purpose in nature, as they are intended for use in defining or working with catalog types, and all catalog variables are system-defined.

sys.type.Cat.EntityName

A Cat.EntityName is a canonical name of a DBMS entity or attribute thereof. This name is conceptually multi-part, with the parts forming a sequence of 1..N Text, ordered from greatest to least significance; names of attributes typically have just 1 part and so are conceptually just a Text, though often database attribute names, which are its component relation names, are multi-part (extra parts being for "schema" name-spaces and such); names of data types and operators are typically 3-4 parts, where sys is an example of part 1. Cat.EntityName has multiple possreps; one is a Seq(1..N) of Text; one is a specially encoded Text where parts are ordered from greatest to least significance, and adjacent parts are separated with a single "period" character, and any literal period or backslash characters in parts are backslash-escaped as \p and \b respectively (no other characters are escaped). Its default value consists of a single part that is the empty string.
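
The conversion between the two possreps described above can be sketched in Python. The function names are hypothetical; the escaping rules (\p for a literal period, \b for a literal backslash, nothing else escaped) are taken from the description above.

```python
def encode_entity_name(parts):
    """Sequence-of-Text possrep -> encoded Text possrep."""
    if not parts:
        raise ValueError("an entity name has at least 1 part")

    def escape(part):
        # Escape backslashes first, so that the backslashes introduced
        # when escaping periods are not themselves escaped again.
        return part.replace("\\", "\\b").replace(".", "\\p")

    return ".".join(escape(p) for p in parts)

def decode_entity_name(text):
    """Encoded Text possrep -> list of parts."""
    parts = []
    # Literal periods are always escaped, so every "." is a separator.
    for raw in text.split("."):
        out = []
        i = 0
        while i < len(raw):
            if raw[i] == "\\":
                if i + 1 >= len(raw) or raw[i + 1] not in ("p", "b"):
                    raise ValueError("invalid escape sequence in entity name")
                out.append("." if raw[i + 1] == "p" else "\\")
                i += 2
            else:
                out.append(raw[i])
                i += 1
        parts.append("".join(out))
    return parts
```

The default value round-trips as expected: a single empty part encodes to the empty string, and the empty string decodes to a single empty part.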

sys.type.Cat.EntityDeclName

A Cat.EntityDeclName is a proper subtype of Cat.EntityName. It is the declared name of a DBMS entity, or the declared name of an attribute thereof.

sys.type.Cat.TypeDeclName

A Cat.TypeDeclName is a proper subtype of Cat.EntityDeclName whose Text possrep is of the format (sys|nat|lex)\.type\..*; it is for type names specifically. Its default value is "lex.type." (including the trailing period).

sys.type.Cat.RtnDeclName

A Cat.RtnDeclName is a proper subtype of Cat.EntityDeclName whose Text possrep is of the format (sys|nat|lex)\.rtn\..*; it is for routine names specifically. Its default value is "lex.rtn." (including the trailing period).

sys.type.Cat.VarDeclName

A Cat.VarDeclName is a proper subtype of Cat.EntityDeclName whose Text possrep is of the format (sys|nat|lex)\.(cat|data)\..*; it is for variable and constant names specifically. Its default value is "lex.data." (including the trailing period).
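
The three declaration-name formats above can be tried out directly as regular expressions. This Python sketch assumes each format must match the whole Text possrep (hence fullmatch); the constant and function names are hypothetical.

```python
import re

# Patterns copied from the formats given for each subtype.
# re.DOTALL lets ".*" also cover names that contain line breaks.
TYPE_DECL_NAME = re.compile(r"(sys|nat|lex)\.type\..*", re.DOTALL)
RTN_DECL_NAME = re.compile(r"(sys|nat|lex)\.rtn\..*", re.DOTALL)
VAR_DECL_NAME = re.compile(r"(sys|nat|lex)\.(cat|data)\..*", re.DOTALL)

def classify_decl_name(text):
    """Return 'type', 'rtn', or 'var' for a matching name, else None."""
    for kind, pattern in (
        ("type", TYPE_DECL_NAME),
        ("rtn", RTN_DECL_NAME),
        ("var", VAR_DECL_NAME),
    ):
        if pattern.fullmatch(text):
            return kind
    return None
```

The default values of the three subtypes are the shortest names that still satisfy their respective formats.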

sys.type.Cat.EntityInvoName

A Cat.EntityInvoName is a proper subtype of Cat.EntityName. It is a name by which some routine invokes or references a DBMS entity. Unlike a Cat.EntityDeclName, a Cat.EntityInvoName is more likely to be only partially qualified.

sys.type.Cat.(Type|Rtn|Var)InvoName

These are subtypes of Cat.EntityInvoName whose formats are analogous to the similarly named subtypes of Cat.EntityDeclName.

This documentation is pending.

Numeric Data Types

These non-fundamental scalar data types describe common kinds of numbers that are not specifically limited to having integer values. While integers are part of the Muldis D core, other kinds of numbers are not, for various reasons, and are represented under this "Numeric" type group instead. As is usual with the Muldis::DB type system, the types mentioned below are all mutually exclusive aside from explicit subtype relationships. Of course, dealing with these types in general isn't a perfect science; they stand to be revised or rewritten.

sys.type.Num.Rat

A Num.Rat is a single rational exact real number of any magnitude. It is conceptually an Int (numerator) divided by a PInt (denominator). Its default value is zero. The cardinality of this type is infinity; to define a most-generalized finite Num.Rat subtype, you must specify the greatest magnitude positive integer denominator that its values have, plus the 2 integer end-points of the inclusive range of the numerator that all its values have. Common subtypes specify maximum denominators that are powers of either 2 or 10.

sys.type.Num.RatI

A Num.RatI (rational: integer) is a proper subtype of Num.Rat where the denominator is 1; every value of this type maps to exactly 1 Int value and vice-versa, so it is conceptually like an integer, without being disjoint from Num.Rat.

sys.type.Num.RatB

A Num.RatB (rational: binary) is a proper subtype of Num.Rat where the denominator is a power of 2; it is the best option to exactly represent non-integers that are conceptually binary or octal or hexadecimal.

sys.type.Num.RatD

A Num.RatD (rational: decimal) is a proper subtype of Num.Rat where the denominator is a power of 10; it is the best option to exactly represent non-integers that are conceptually the decimal numbers that lay-people work with.
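
Membership in the three Num.Rat subtypes can be tested with exact rational arithmetic. The sketch below uses Python's fractions.Fraction, which always stores a value in lowest terms; testing the reduced denominator is an assumption of this sketch, and the function names are hypothetical.

```python
from fractions import Fraction

def _only_prime_factors(n, primes):
    # True if n is a product of the given primes only (n = 1 included).
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def is_rat_i(q):
    # Denominator is 1.
    return q.denominator == 1

def is_rat_b(q):
    # Expressible with a power-of-2 denominator.
    return _only_prime_factors(q.denominator, (2,))

def is_rat_d(q):
    # Expressible with a power-of-10 denominator, which holds exactly when
    # the reduced denominator has no prime factors other than 2 and 5.
    return _only_prime_factors(q.denominator, (2, 5))
```

Under this reading, every RatI value is also a RatB value, and every RatB value is also a RatD value, since 1/2 equals 5/10.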

sys.type.Num.FloatB32

A Num.FloatB32 implements a 32-bit binary floating-point numeric as specified in IEEE 754, having 1 bit for sign, 8 bits for exponent, and 23 bits for mantissa. As per the standard, most of its values are ordinary real numbers, but it has distinct representations for +/- zero, +/- infinity, and various kinds of NaNs. The cardinality of this type is approximately 2**32.

sys.type.Num.FloatB64

A Num.FloatB64 is the same as Num.FloatB32 but for its precision; it has 1 bit for sign, 11 bits for exponent, and 52 bits for mantissa. The cardinality of this type is approximately 2**64.
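
The bit layouts given above for the two floating-point types can be inspected with Python's standard struct module. The function names below are hypothetical; the field widths are the ones stated above.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) of x as a 32-bit binary float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits
    mantissa = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, mantissa

def float64_fields(x):
    """Return (sign, exponent, mantissa) of x as a 64-bit binary float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits
    mantissa = bits & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, mantissa
```

Positive and negative zero differ only in the sign field, and an infinity has an all-ones exponent with a zero mantissa, which is how the distinct representations mentioned above show up at the bit level.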

TODO: Maybe add types like FloatB128, fuzzy or interval numbers, complex numbers, and so on.

Temporal Data Types

These non-fundamental scalar data types describe common kinds of temporal artifacts according to modern calendars. They come in a variety of precisions and epochs so that users can pick one that most accurately represents what they actually know about their data. Of course, dealing with these types in general isn't a perfect science; they stand to be revised or rewritten.

sys.type.Temporal.Duration

A Temporal.Duration is a single amount of time, with precision to arbitrary fractions of a second. It is not fixed to any date or time and is agnostic to both the calendar and whether or not a time-zone offset is known. Its default value is zero. The cardinality of this type is infinity; to define a most-generalized finite Temporal.Duration subtype, you must specify the maximum amount of time that its values may be.

sys.type.Temporal.DurationOfDays

A Temporal.DurationOfDays is the same as Temporal.Duration in all respects but that its precision is only to the whole terrestrial day. The cardinality of this type is infinity; to define a most-generalized finite Temporal.DurationOfDays subtype, you must specify the maximum amount of time that its values may be.

sys.type.Temporal.DateTime

A Temporal.DateTime is a single specific time on a specific date, with precision to arbitrary fractions of a second. It does incorporate an explicit terrestrial time-zone offset (relative to UTC), so you use it when you do know the time-zone and it is significant (which is usually). It is conceptually calendar-agnostic. The default value of Temporal.DateTime is the Perl 6 epoch, namely 2000-1-1T0:0:0 in the Gregorian calendar, with a time-zone offset of zero. The cardinality of this type is infinity; to define a most-generalized finite Temporal.DateTime subtype, you must specify the earliest and latest datetimes it includes, and also its least magnitude fraction of a second.

sys.type.Temporal.DateTimeNoTZ

A Temporal.DateTimeNoTZ is the same as Temporal.DateTime in its precision, but it does not incorporate an explicit terrestrial time-zone offset, and so it is conceptually ambiguous within an interval of about 25 hours; you use it when you do not know the time-zone or it is not significant (which is unusual). Its default value is 2000-1-1T0:0:0 in the Gregorian calendar. Matters of its cardinality are the same as for Temporal.DateTime.
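
The distinction between Temporal.DateTime and Temporal.DateTimeNoTZ resembles the one between "aware" and "naive" values in Python's standard datetime module. The sketch below is only an approximation: Python's datetime is limited to microsecond precision and to its own Gregorian calendar, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Analogue of the Temporal.DateTime default: a zero offset from UTC is attached.
default_datetime = datetime(2000, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# Analogue of the Temporal.DateTimeNoTZ default: no offset is known.
default_datetime_no_tz = datetime(2000, 1, 1, 0, 0, 0)

# With known offsets, two differently written values can be recognised
# as the same instant.
same_instant_elsewhere = datetime(
    1999, 12, 31, 19, 0, 0, tzinfo=timezone(timedelta(hours=-5))
)
```

Here default_datetime compares equal to same_instant_elsewhere, whereas the value without an offset cannot be placed on the same timeline and so never compares equal to either.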

sys.type.Temporal.Date

A Temporal.Date is the same as Temporal.DateTime in all respects but that its precision is only to the whole terrestrial day. Its default value is 2000-1-1 in the Gregorian calendar. The cardinality of this type is infinity; to define a most-generalized finite Temporal.Date subtype, you must specify the earliest and latest dates it includes.

sys.type.Temporal.DateNoTZ

A Temporal.DateNoTZ differs from Temporal.Date in the same way that Temporal.DateTimeNoTZ differs from Temporal.DateTime; it is otherwise the same as Temporal.Date. Matters of its cardinality are the same as for Temporal.Date.

sys.type.Temporal.Time

A Temporal.Time is a single specific time that isn't on any day in particular, and isn't part of any calendar in particular, with a precision to arbitrary fractions of a second; its allowed range is between zero seconds (inclusive) and 1 terrestrial day (exclusive). It does incorporate an explicit time-zone offset interval as per Temporal.DateTime. Its default value is 0:0:0. The cardinality of this type is infinity; to define a most-generalized finite Temporal.Time subtype, you must specify its least magnitude fraction of a second.

sys.type.Temporal.TimeNoTZ

A Temporal.TimeNoTZ differs from Temporal.Time in the same way that Temporal.DateTimeNoTZ differs from Temporal.DateTime; it is otherwise the same as Temporal.Time. Matters of its cardinality are the same as for Temporal.Time.

Spatial Data Types

These non-fundamental scalar data types describe common kinds of spatial or geometric figures. Of course, dealing with these types in general isn't a perfect science; they stand to be revised or rewritten.

This documentation is pending.

SYSTEM-DEFINED ROUTINES

These routines are built-in to Muldis D and should be available under all of its implementations. They are available for the entire time that a DBMS is active and can be used by both persistent and temporary user-defined entities.

This documentation is pending.

ENTITY NAMES

All entities that exist at some given time within a DBMS environment can be explicitly referenced in some manner for definition and/or use; there are no orphans. At the very least, every kind of DBMS entity is defined in one or more catalog relvars; its interface and/or implementation can be observed and possibly updated therein.

Note that the following namespaces assume that a program written in Muldis D may execute either standalone or as a peer-to-peer process that can have its global variables made visible to other processes, or have others' made visible to it. In other words, the program can both manage its own database and be a DBMS client, and the program can either just use the DBMS itself or be a server of it.

sys.*

Under here are all non-lexical system-defined, hardwired, readonly, eternal entities.

sys.type.*

These are the invocation-names of system-defined data types.

sys.rtn.*

These are the invocation-names of system-defined (explicitly invokable) routines.

sys.cat

This is the read-only system catalog that describes all system-defined entities, including data types, operators, and all catalogs.

sys.data.*

These are some system-defined constants that hold values commonly useful in user-defined routines; providing the constants this way is an alternative to defining niladic functions which result in them. Moreover, some system-defined functions that do have arguments may also or alternately have analogous 2+ degree relcons here for use as lookup tables.

nat.*

Under here are all non-lexical user-defined entities that are either private to the current program, or that fundamentally live with the current program but are shareable with peer programs (where the peer programs are clients and the current program a server), or that are the current program's naturalized perception of entities that fundamentally live with peer programs (where the peer programs are servers and the current program a client), or perhaps in a disk file instead. If the current program is primarily viewed by users as a "DBMS server" or "transient RAM-based embedded DBMS", then the "database" they are using (via their client programs) is probably of the second entity group. If the current program is primarily viewed by users as a "DBMS client" or "persistent file-based embedded DBMS", then the "database" they are using is probably of the third entity group. In the third case, it is very likely that some details of this perception are coded into the current program itself and that the peer program has a different perception that excludes those details; in that case, the peer program is somewhat of a slave of the current program, as is applicable. As far as non-lexical container/variable entities go, only those whose type is Database can be actually shared between peer programs, and any other non-lexical containers/variables, if any, are strictly private to the current program.

nat.type.*

These are the invocation-names of native user-defined data types.

nat.rtn.*

These are the invocation-names of native user-defined explicitly invokable routines.

nat.cat

This is the user-updateable catalog that describes all native user-defined entities, including data types, operators, data dbvars, and any other private non-lexical variables that might exist.

nat.data.*

These are the user-updateable native data dbvars and other non-lexical containers/variables themselves, that the current program conceptually or actually stores all of its transient and persistent data in.

lex.*

Under here are all lexical entities whose invokability or life exists solely within the scope of an executing routine; the definitions of those entities are part of the *.cat that defines the routines themselves. The lex.* namespace is further subdivided the same way as nat.* is, into type|rtn|cat|data, with behaviour being the same except that the effects are lexically scoped. Most of the time, only lex.data.* are actually used, but the others permit one to generate new variables, operators, etc. at runtime, which only exist in a temporary lexical scope. Note that it may turn out later that lexically scoped data types are a bad idea, in which case we may make them just global (though temporary) ... or maybe this won't be a problem at all ... still needs thinking.

mnt

This is a special user-updateable catalog which controls the mounting and unmounting of depots; it is minimalist and does little else besides that; most meta-data seen/updated here is specific to the Muldis::DB Engine in use.

foreign.*

Under here are all non-lexical user-defined dbvar (catalog and data) entities for which the current program is primarily viewed by users as a "DBMS client" or "persistent file-based embedded DBMS"; specifically, these are a cleaned-up current program perception of how the foreign program sees its own entities, which may or may not be the same as how the current program's naturalized version under nat.* is. Generally speaking, when the current program is reverse-engineering or scanning the remote program, or the disk files, the results of that appear under foreign.*, and not under nat.*; that's not to say that the current program can't subsequently update its nat.* catalog to match, but doing so is strictly optional, and typically done just by generic DBMS utility programs, rather than programs that do a specific job like payroll or genealogy. There is a distinct foreign.<depot>.* name-space for each connection to the other program, or for each disk file, each of which Muldis::DB considers a depot. Note that there are no foreign.<depot>.type.*, foreign.<depot>.rtn.*, or foreign.<depot>.data.* name-spaces, as the current program may not invoke those entities under those names, though their descriptions are available under foreign.<depot>.cat; they are only invokable via nat.* perceptions of them. Note that, while cleaned up, a number of implementation-specific details may leak through here, possibly defined in a non-Muldis D language, if the peer program is not itself implemented by Muldis::DB and some of its concepts won't automatically express in Muldis::DB native terms, without user interpretation.

foreign.<depot>.cat

This is the probably reverse-engineered catalog of the foreign DBMS program, or disk file that this depot represents; it is directly user-updateable as much as that makes sense.

interp.*

Under here are Muldis::DB Engine specific mapping details that bridge between corresponding nat.cat.* and foreign.<depot>.cat entities, sometimes using non-Muldis D but Engine-specific language. These details interpret between the foreign entities and their native perceptions.

interp.<depot>.cat

This is a catalog, possibly reverse-engineered or possibly coded into the current program, that defines the mapping specs or routines to mediate a single nat.cat.* and foreign.<depot>.cat pair.
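
The name-space layout described in this section can be summarised as a small path parser. This Python sketch is illustrative only: the function name and the returned dictionary shape are hypothetical, it ignores the backslash-escaping of entity name parts, and it assumes that foreign.* and interp.* expose only a per-depot cat, as described above.

```python
def parse_entity_path(name):
    """Split a dotted entity name into name-space, depot, kind, and the rest."""
    parts = name.split(".")
    top = parts[0]
    if top in ("sys", "nat", "lex"):
        # These are subdivided into type | rtn | cat | data.
        if len(parts) < 2 or parts[1] not in ("type", "rtn", "cat", "data"):
            raise ValueError("expected type, rtn, cat or data after " + top)
        return {"namespace": top, "depot": None, "kind": parts[1], "rest": parts[2:]}
    if top in ("foreign", "interp"):
        # One name-space per depot; only its catalog is addressable.
        if len(parts) < 3 or parts[2] != "cat":
            raise ValueError("expected " + top + ".<depot>.cat")
        return {"namespace": top, "depot": parts[1], "kind": "cat", "rest": parts[3:]}
    if top == "mnt":
        # mnt is itself a catalog.
        return {"namespace": "mnt", "depot": None, "kind": "cat", "rest": parts[1:]}
    raise ValueError("unknown top-level name-space: " + top)
```

For instance, a name such as foreign.mydepot.data.x is rejected, reflecting that foreign data is only reachable through its nat.* perception.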

This documentation is pending.

CATALOGS

The Muldis::DB catalog relcons and relvars collectively reflect and/or control all entities in a DBMS. Given that the catalog provides complete descriptions of both the interface and implementation of each DBMS entity, for user-defined entities, and just the interface for system-defined entities, understanding these is akin to understanding the native grammar of Muldis D. This grammar is extremely simple by intention, but at a cost of being a little more verbose.

Muldis D has closely corresponding representations between its 3 main variants, which are catalog relations (what routines inside the DBMS see), hierarchical AST nodes (what the application driving the DBMS typically sees, as in Muldis::DB::AST), and string-form Muldis D code that users interacting with Muldis::DB via a shell interface would use. The string-form would be parsed into the AST, and the AST would be flattened into the relations; similarly, the relations can be unflattened into the AST, and string-form code can be generated from the AST if desired.
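
The general idea of flattening a hierarchical AST into relation tuples, and unflattening it again, can be sketched as follows. Everything in this Python example is hypothetical: the node shape (dicts with op, an optional value, and optional args), the row attributes (id, parent, op, value), and the function names are assumptions made for illustration, since the actual catalog structure is not yet documented here.

```python
def flatten(ast):
    """Turn a nested node into a list of rows, one row per node."""
    rows = []

    def visit(node, parent_id):
        node_id = len(rows)  # ids are assigned in pre-order
        rows.append({
            "id": node_id,
            "parent": parent_id,
            "op": node["op"],
            "value": node.get("value"),
        })
        for child in node.get("args", []):
            visit(child, node_id)

    visit(ast, None)
    return rows

def unflatten(rows):
    """Rebuild the nested node from rows produced by flatten."""
    nodes = {}
    root = None
    # Pre-order ids guarantee that a parent is seen before its children
    # and that siblings are seen in their original order.
    for r in sorted(rows, key=lambda r: r["id"]):
        node = {"op": r["op"]}
        if r["value"] is not None:
            node["value"] = r["value"]
        nodes[r["id"]] = node
        if r["parent"] is None:
            root = node
        else:
            nodes[r["parent"]].setdefault("args", []).append(node)
    return root
```

The parent attribute plays the role of a foreign key back into the same relation, which is one common way to hold a tree in flat tuples; the real catalog may well be organised differently.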

Third-party wrappers over Muldis::DB can provide additional interfaces for their users, such as a SQL DBMS emulator, and they take care of parsing or remapping that to Muldis D, most commonly its AST variant.

Catalog Relcons For System-Defined Entities

This section describes the structure of all cont.sys.<unq_name> catalog relcons, which themselves describe all system-defined DBMS entities in a computer-readable manner.

This documentation is pending.

Catalog Relvars For Depot Appearance Control

This section describes the structure of all cont.mnt.<unq_name> special catalog relvars, which reflect and control which depots are currently mounted in the DBMS. Users update these to open or close client-server DBMS engine connections, or to attach or detach file-based database files, create or delete the depots themselves, or associate, disassociate, create, or delete shared memory based depots, mount or unmount filesystem-based depots, etc. Updating these relvars has side-effects in making the entities belonging to a depot, named *.db.*, appear in or disappear from view. Details stored here include analogies to DSNs, database file names, DBMS server names and addresses, authentication details like login names and passwords. What details are stored per depot can vary significantly depending on which Muldis::DB Engine implements the DBMS, but this variance is limited to just cont.mnt.depot_detail. Note that it is forbidden to update any mnt relvars while a multi-statement transaction is active, because a transaction subjugates all entities concurrently visible or mounted in a DBMS, such that they must all commit or rollback as a unit.

This documentation is pending.

Catalog Relvars for User-Defined Entities

This section describes the structure of all cont.cat.app.<unq_name> and cont.cat.db.<depot>.<unq_name> general catalog relvars, the set of <unq_name> for each of which is identical, that reflect and control user-defined entities, including data types, routines, non-lexical variables (which are all relvars, real or virtual), state constraints, etc. Users update these to create or drop their relvars, data types, routines, constraints, etc. Updating these catalog relvars has side-effects in making global data relvars, named *.data.*, appear, disappear, or change in structure.

This documentation is pending.

SEE ALSO

Go to Muldis::DB for the majority of distribution-internal references, and Muldis::DB::SeeAlso for the majority of distribution-external references.

AUTHOR

Darren Duncan (perl@DarrenDuncan.net)

LICENSE AND COPYRIGHT

This file is part of the Muldis::DB framework.

Muldis::DB is Copyright © 2002-2007, Darren Duncan.

See the LICENSE AND COPYRIGHT of Muldis::DB for details.

ACKNOWLEDGEMENTS

The ACKNOWLEDGEMENTS in Muldis::DB apply to this file too.