NAME
Language::MuldisD::Basics - 10,000 Mile View of Muldis D
VERSION
This document is Language::MuldisD::Basics version 0.8.0.
PREFACE
This document is part of the Muldis D language specification, whose root document is Language::MuldisD; you should read that root document before you read this one, which provides subservient details.
DESCRIPTION
This document provides a 10,000 mile view of the Muldis D language. It provides the basics of how the language is designed and works, as a foundation upon which to understand the other parts of the language spec.
NOTES ON TERMINOLOGY
There are a few terms that the Muldis D documentation uses which may have different meanings than what you may be used to, so here are a few notes to clarify what they mean in this document. Similarly, some terms used in the industry are expressly not used here, so as to avoid confusion given the meanings often attributed to them.
- type / data type
The term type as a noun always refers to a data type; the term is not used to indicate classifications of other things; eg, kind or other terms will be used for such instead, to avoid confusion. The terms class and domain are not used in this documentation to mean type.
- value, variable, constant
A value is unique, eternal, immutable, and is not fixed in time or space (it has no address). A variable is fixed in time and space (it does have an address); it holds an appearance of a value; it is neither unique nor eternal nor immutable in the general case. A constant is a variable which is defined to not mutate after initially being set. Terms like object are not used in this documentation for any aspects of Muldis D since their meaning in practice is both ambiguous and wide-reaching, and could refer to both values and variables depending on usage context.
- text, character
A text is a string composed of Unicode characters, where a character is an abstract concept that usually is a grapheme or language-independent grapheme, but could potentially be a codepoint or language-specific grapheme. This documentation only uses the term character in an abstract sense, and no part of the Muldis D API is defined using that term. Rather, any operators or constraints that work with sub-strings of text will be specified in terms like NFC grapheme.
- tuple
A tuple is an unordered heterogeneous collection of 0..N elements that are keyed by element name; each element is a name-value pair, and all names in the tuple are distinct. While tuple legitimately refers to the same thing as the Muldis D term sequence in other contexts, it does not in this documentation. Terms like record or row are not used in this documentation, the latter in particular because it implies ordering.
- relation, relvar, relcon
A relation is like an unordered homogeneous set of tuples where all member tuples have identical degree and name-sets, except that a relation data type knows what its allowed names are even if it has no tuples. As with tuple, the term relation legitimately refers to a set of "ordered tuples" in other contexts, but it does not in this documentation. Terms like record set, row set, or table are not used in this documentation, the last 2 in particular because they imply a significance to the order of tuples, where there is none in a relation. Moreover, the term domain does not mean the same thing as relation, and neither does the term function; those terms have distinct meanings here. Note that the term relvar is shorthand for relation-typed variable, and relcon is shorthand for relation-typed constant. Note also that a relational database is called that because it is composed of relations, and not just because its relations can be joined or be associated through foreign key constraints.
- function
A function is a routine whose invocation is used as a value expression; it conceptually serves as a map between the domains of its parameters and its result value. A function is not the same as a relation, though both can be used as maps between values. Besides their conceptual difference in Muldis D as a routine vs a value, a selected relation value in Muldis D is always finite, and hence so is the cardinality of the map it can provide, whereas a function can have an infinite map size.
- database / relational database, dbvar, dbcon
Within this documentation, the actually more generic term database will be used to refer exclusively to a relational database, so you should read the former as if it were the latter. A database is a tuple, all of whose (distinctly named) attributes are each relation-typed or database-typed (a recursion whose leaves are all relations); a database holds all user data that is being maintained as an interconnected unit. A database-typed variable, aka a dbvar, is managed by a DBMS/RDBMS, and is what is more informally referred to outside this documentation as a "database". Whenever a user is "using a database", they are reading or updating a dbvar. Examples of databases are genealogy records, financial records, and a CMS's data. A database is not a program. A database-typed constant is a dbcon.
- catalog
A catalog is a special kind of dbvar or dbcon whose relations hold meta-data about the normal databases that hold user data (and about themselves too); updating a catalog dbvar has the side-effect of changing the structure of the associated normal database. This meta-data describes all user-defined data types and operators, plus base and viewed relations, stored with and used with the database.
- depot / repository
A depot or repository is a local abstraction of a typically external storage system which holds 1 database variable and 1 associated catalog, plus perhaps other details that assist the mapping of the abstraction to the actuality.
- DBMS / RDBMS
Within this documentation, the actually more generic term DBMS will be used to refer exclusively to an RDBMS (Relational Database Management System), so you should read the former as if it were the latter. An RDBMS is a computer program that manages relational database variables, associated catalogs, and depots in general. Muldis D defines (or aspires to define) one, as do various other TTM-inspired (The Third Manifesto) programs like Rel and Duro; most other DBMS-like programs are technically non-relational, including all SQL DBMSs such as Oracle, PostgreSQL and SQLite, though they usually pay lip service to the relational data model and approximate an RDBMS to varying degrees.
- sequence
Within this documentation, a sequence generically refers to an ordered collection of 0..N elements. The term array is not used in this documentation because that word's actual meaning is broader, and includes both matrices and unordered collections of name-value pairs. Note that a sequence may be used simply to maintain a collection in order, though the actual order of its elements may not always be significant. Sometimes sequence also refers specifically to the Seq data type, which is a particular binary relation.
- selector
A selector is a routine that captures an appearance of a value for use in a variable or expression. The term constructor is not used in this documentation because all values in Muldis D are conceptually eternal and immutable, so it does not make sense to say that we are "building" one; we are "selecting" one.
- fail
Within this documentation, if a routine is said to fail under some circumstance, such as with certain arguments, that can mean either or both of the routine throwing an exception at runtime, or failing to compile in the first place (which is a thrown exception at compile time); the latter is more likely to happen if the compiler can detect that certain arguments will always be unacceptable, and the former usually happens only when a problem likely cannot be caught at compile time.
This documentation is pending.
INTERPRETATION OF THE RELATIONAL MODEL
The relational model of data is based on predicate logic and set theory.
The model assumes that all data is represented as mathematical N-ary relations, an N-ary relation being a subset of the cartesian product of N data types. Reasoning about such data is done in two-valued predicate logic, meaning there are 2 possible evaluations for each proposition, either true or false.
The basic relational building block is the data type, which can consist of either scalar values or values of more complex types. A tuple is an unordered set of attributes, each of which has a name and a declared data type; an attribute value is a specific valid value for the type of the attribute. An N-relation is defined as an unordered set of N-tuples, and the tuples comprise the body of the relation; the relation has a heading, which is a set of attribute definitions (their names and types); this heading is also the heading of each of its tuples.
A heading represents a predicate, and there is a one-to-one correspondence between the free variables of the predicate and the attribute names of the heading. The body of a relation represents the set of true propositions that can be formed from the predicate represented by the relation's heading. The body of a tuple with the same heading provides attribute values to instantiate the predicate into a proposition by substituting each of its free variables. When a tuple appears in a relation body, the proposition it represents is deemed to be true. Contrariwise, for every tuple whose heading is the same as the relation's but does not appear in the relation body, its proposition is deemed to be false. This assumption is known as the closed world assumption.
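As a rough illustration only, the sketch below models a heading and a body in Python. The representation is hypothetical and chosen for this example; it is not Muldis D's own. Each tuple is a frozenset of (name, value) pairs, so attribute order is irrelevant. Under that model, the closed world assumption becomes a simple membership test on the body. The helper `tup` and the class `Relation` are invented names.

```python
from dataclasses import dataclass


def tup(**attrs):
    """Hypothetical helper: a tuple as an unordered set of (name, value) pairs."""
    return frozenset(attrs.items())


@dataclass(frozen=True)
class Relation:
    heading: frozenset  # attribute names (declared types omitted for brevity)
    body: frozenset     # a set of tuples, so duplicates cannot occur

    def __post_init__(self):
        # Every tuple in the body must have exactly the relation's heading.
        for t in self.body:
            if frozenset(name for name, _ in t) != self.heading:
                raise ValueError("tuple heading differs from relation heading")

    def holds(self, t):
        # Closed world assumption: the proposition is true iff its tuple
        # appears in the body; any other tuple of this heading is false.
        return t in self.body


# Predicate: "the person named <name> lives in <city>"
lives_in = Relation(
    heading=frozenset({"name", "city"}),
    body=frozenset({
        tup(name="John", city="Vancouver"),
        tup(name="Julia", city="Halifax"),
    }),
)

print(lives_in.holds(tup(name="John", city="Vancouver")))  # True
print(lives_in.holds(tup(name="John", city="Halifax")))    # False
```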
The relational model specifies that data is operated on by means of a relational calculus or a relational algebra. These 2 are logically equivalent; for any expression in the relational calculus, there is an equivalent one in the relational algebra, and vice versa. Relational algebra, an offshoot of first-order logic, is a set of relations closed under operators; each operator takes N relations as arguments and results in a relation. The relational algebra provides a more procedural way of specifying database queries, whereas the relational calculus provides a more declarative one.
Mechanics of Some Relational Operations
This documentation section takes a very informal (and possibly blatantly incorrect) alternate approach to describing the nature of relations, tuples, and attributes, within the context of explaining the mechanics of how some relational operations work in practice.
Herein, we shall conceptualize a relation as a long boolean expression, consisting of a string of basic boolean-valued expressions that are selectively anded or ored together. A basic boolean-valued expression, <attr>, takes the form <name> is <value>, where <name> names an attribute. Each tuple body, <tuple>, in the relation takes the form of a chained "and" that connects N <attr>, one per attribute in the relation, each having a distinct <name>. The relation body takes the form of a chained "or" that connects N <tuple>, one per tuple in the relation; each <tuple> has the same set of <name> as the others, but the set of <value> that each <tuple> has is distinct.
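The analogy can be made mechanical. The following Python sketch prints a set of tuples in the "or of ands" form, using the same people data as the example that follows. The `render` function and its attribute-order argument are hypothetical and exist only for illustration. Tuples are sorted purely so the output is deterministic, because their order carries no meaning.

```python
def render(body, attr_order):
    """Render a relation body in the 'or of ands' notation.

    body: a set of tuples, each a frozenset of (name, value) pairs.
    attr_order: display order for attributes (purely cosmetic).
    """
    lines = []
    # Sort only for stable output; a relation's tuples are unordered.
    for t in sorted(body, key=lambda tp: [str(dict(tp)[a]) for a in attr_order]):
        d = dict(t)
        # One tuple: its attribute terms chained together with "and".
        lines.append(" and ".join(f"{a} is {d[a]}" for a in attr_order))
    # The whole body: the tuple chains joined together with "or".
    return "\nor ".join(lines)


people = {
    frozenset({"name": "John", "age": 32, "city": "Vancouver"}.items()),
    frozenset({"name": "Andy", "age": 46, "city": "Toronto"}.items()),
    frozenset({"name": "Julia", "age": 27, "city": "Halifax"}.items()),
}

print(render(people, ["name", "age", "city"]))
# name is Andy and age is 46 and city is Toronto
# or name is John and age is 32 and city is Vancouver
# or name is Julia and age is 27 and city is Halifax
```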
Take, for example, a relation having some details about people, where each attribute is a type of detail and each tuple has details for one person:
name is John and age is 32 and city is Vancouver
or name is Andy and age is 46 and city is Toronto
or name is Julia and age is 27 and city is Halifax
etc...
Or a multi-relation example involving suppliers, foods, and shipments:
farm is Hodgesons and country is Canada
or farm is Beckers and country is England
or farm is Wickets and country is Canada

food is Bananas and colour is yellow
or food is Carrots and colour is orange
or food is Oranges and colour is orange
or food is Kiwis and colour is green
or food is Lemons and colour is yellow

farm is Hodgesons and food is Kiwis and qty is 100
or farm is Hodgesons and food is Lemons and qty is 130
or farm is Hodgesons and food is Oranges and qty is 10
or farm is Hodgesons and food is Carrots and qty is 50
or farm is Beckers and food is Carrots and qty is 90
or farm is Beckers and food is Bananas and qty is 120
or farm is Wickets and food is Lemons and qty is 30
Now a very simple pair of relations:
x is 4 and y is 7
or x is 3 and y is 2

y is 5 and z is 6
or y is 2 and z is 1
or y is 2 and z is 4
What follows is a brief introduction to a few common fundamental relational operations: projection, join, and union.
A projection of a relation derives a relation that has a subset of the original's attributes, and all of its tuples. Continuing the boolean expression analogy, each <tuple> of the projected relation chains together fewer <attr> with "and" than in the original. For example, let's take the projection of the food attribute from the shipments relation, to get, initially:
food is Kiwis
or food is Lemons
or food is Oranges
or food is Carrots
or food is Carrots
or food is Bananas
or food is Lemons
Now, the above expression can be simplified because it now contains redundancies, and the simplified version is logically identical:
food is Kiwis
or food is Lemons
or food is Oranges
or food is Carrots
or food is Bananas
So this projected relation has 5 tuples rather than the original 7; eliminating such logical redundancy is why relations never have duplicate tuples.
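Here is a minimal Python sketch of projection. It assumes a hypothetical representation in which each tuple is a frozenset of (name, value) pairs. Projecting each shipment onto food first yields 7 partly repeated tuples. Collecting them into a set removes the redundancy and leaves 5. The names `tup` and `project` are illustrative only.

```python
def tup(**attrs):
    """Hypothetical helper: a tuple as an unordered set of (name, value) pairs."""
    return frozenset(attrs.items())


def project(body, attrs):
    """Keep only the named attributes of every tuple.

    Building a set (rather than a list) discards the duplicate tuples
    that narrowing the heading can produce.
    """
    keep = frozenset(attrs)
    return {frozenset((n, v) for n, v in t if n in keep) for t in body}


shipments = {
    tup(farm="Hodgesons", food="Kiwis", qty=100),
    tup(farm="Hodgesons", food="Lemons", qty=130),
    tup(farm="Hodgesons", food="Oranges", qty=10),
    tup(farm="Hodgesons", food="Carrots", qty=50),
    tup(farm="Beckers", food="Carrots", qty=90),
    tup(farm="Beckers", food="Bananas", qty=120),
    tup(farm="Wickets", food="Lemons", qty=30),
}

# Before redundancy elimination: one projected tuple per shipment.
initial = [frozenset((n, v) for n, v in t if n == "food") for t in shipments]
foods = project(shipments, {"food"})

print(len(initial), len(foods))  # 7 5
```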
A join of 2 relations derives a relation that has all of the originals' attributes, and its set of tuples is fundamentally the cartesian product of those of the originals. Following our boolean analogy, we start off by pairwise connecting instances of every <tuple> of the first relation with instances of every <tuple> of the second one, with the members of each pair then being chained together with "and" to form a single, longer chain of "and". Note that join is commutative, so it doesn't matter which of the source relations is first or second; the result is the same, just as "foo and bar" is the same as "bar and foo". For example, let's do a join of our 2 simplest relations:
x is 4 and y is 7 and y is 5 and z is 6
or x is 4 and y is 7 and y is 2 and z is 1
or x is 4 and y is 7 and y is 2 and z is 4
or x is 3 and y is 2 and y is 5 and z is 6
or x is 3 and y is 2 and y is 2 and z is 1
or x is 3 and y is 2 and y is 2 and z is 4
Now, when multiple relations are connected into one, such as with a join, the relational model assumes that if the sources have attributes with the same names as each other, then those attributes are describing the same things. In this case, the references to attribute y from both relations are talking about the same y. And so, any result tuples that contradict themselves, saying that y equals both one value and a different one, can't ever be true and are eliminated; only the tuples where the y values are identical are kept:
x is 3 and y is 2 and y is 2 and z is 1
or x is 3 and y is 2 and y is 2 and z is 4
Moreover, this expression can be simplified by removing the redundant y attribute:
x is 3 and y is 2 and z is 1
or x is 3 and y is 2 and z is 4
All attributes in a relation have distinct names. And if there were any identical tuples, the redundant ones would be eliminated.
A join operation has several trivializing scenarios. If the 2 source relations have no attribute names in common, the result is simply the cartesian product. If the 2 sources have all their attribute names in common, the result is the common subset or intersection of their existing sets of tuples. If one source has all the attributes of the other, but the reverse isn't true, then the result is a subset of tuples from the relation that has more attributes; this is a semijoin.
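Both the join mechanics and the trivializing scenarios can be checked with a small Python sketch of a natural join. Tuples are modeled as frozensets of (name, value) pairs, and the helper names are hypothetical. The function pairs every tuple of one relation with every tuple of the other. It discards pairs that disagree on a shared attribute and merges the rest, so each shared attribute appears only once.

```python
def tup(**attrs):
    """Hypothetical helper: a tuple as an unordered set of (name, value) pairs."""
    return frozenset(attrs.items())


def natural_join(r1, r2):
    result = set()
    for t1 in r1:
        for t2 in r2:  # cartesian product: every pairing of tuples
            d1, d2 = dict(t1), dict(t2)
            shared = d1.keys() & d2.keys()
            # A pair that gives a shared attribute 2 values can't be true.
            if all(d1[a] == d2[a] for a in shared):
                # Merging keeps just one copy of each shared attribute.
                result.add(frozenset({**d1, **d2}.items()))
    return result


# The 2 simplest relations from the example above.
r_xy = {tup(x=4, y=7), tup(x=3, y=2)}
r_yz = {tup(y=5, z=6), tup(y=2, z=1), tup(y=2, z=4)}
joined = natural_join(r_xy, r_yz)
assert joined == {tup(x=3, y=2, z=1), tup(x=3, y=2, z=4)}
assert joined == natural_join(r_yz, r_xy)  # join is commutative

# No shared attributes: the result is the cartesian product (2 * 3 tuples).
a = {tup(p=1), tup(p=2)}
b = {tup(q="u"), tup(q="v"), tup(q="w")}
assert len(natural_join(a, b)) == 6

# All attributes shared: the result is the intersection of the bodies.
c = {tup(x=1, y=1), tup(x=2, y=2)}
d = {tup(x=2, y=2), tup(x=3, y=3)}
assert natural_join(c, d) == c & d

# One heading contains the other: a subset of the wider relation (semijoin).
e = {tup(x=1, y="p"), tup(x=2, y="q"), tup(x=3, y="r")}
f = {tup(x=1), tup(x=3)}
assert natural_join(e, f) == {tup(x=1, y="p"), tup(x=3, y="r")}
```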
A union of 2 relations, which requires that the 2 relations have the same heading, derives another relation with the same heading, whose body is the union of the two relations' sets of tuples, with any duplicates eliminated. In terms of our boolean analogy, a union simply chains together the entirety of each relation's boolean expression with an "or", and then eliminates redundancies from the result.
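Here is a minimal Python sketch of union, assuming the same kind of hypothetical frozenset-of-pairs tuples. The heading is passed in explicitly because an empty body has no tuples from which it could be inferred. The sketch rejects mismatched headings and lets set union drop the duplicates.

```python
def tup(**attrs):
    """Hypothetical helper: a tuple as an unordered set of (name, value) pairs."""
    return frozenset(attrs.items())


def union(r1, r2, heading):
    """Union of 2 relation bodies that must both have the given heading."""
    for t in r1 | r2:
        if frozenset(n for n, _ in t) != heading:
            raise ValueError("union requires identical headings")
    # Set union "ors" the bodies together and drops duplicate tuples.
    return r1 | r2


canadian = {tup(farm="Hodgesons", country="Canada"),
            tup(farm="Wickets", country="Canada")}
visited = {tup(farm="Wickets", country="Canada"),
           tup(farm="Beckers", country="England")}

farms = union(canadian, visited, frozenset({"farm", "country"}))
print(len(farms))  # 3: the shared Wickets tuple appears only once
```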
A full list of all the relational operators having more formal (but Muldis D specific) descriptions occurs in the Language::MuldisD::Core document; that list does not use the aforementioned boolean analogies.
MULDIS D
Muldis D is a computationally / Turing complete (and industrial strength) high-level programming language with fully integrated database functionality; you can use it to define, query, and update relational databases. The language's paradigm is a mixture of declarative, functional, imperative, and object-oriented. It is primarily focused on providing reliability, consistency, portability, and ease of use and extension. (Logically, speed of execution cannot be declared as a Muldis D quality because such a quality belongs to an implementation alone; however, the language should lend itself to making fast implementations.)
The language is rigorously defined and requires users to be explicit, which leaves little room for ambiguity and related bugs. When something is specified in Muldis D, its semantics should be well known and fully portable (not implementation dependent). If a conforming implementation (usually a Muldis DB Engine class) can't provide a specified behaviour, code using it will refuse to run at all, rather than silently changing its semantics; this also helps users to avoid bugs. Moreover, Muldis D generally disallows any details of an implementation's "physical representation" or other internals to leak through into the language; eg, there is no "varchar" vs "char", simply "text". Users should not have to know about this level of detail, and implementers should be free to adaptively pick optimum ways to satisfy user requests, and change later.
Muldis D, being first and foremost a data processing language, provides a thorough means to both introspect and define all DBMS entities using just data processing operators against what is called the DBMS "catalog". The catalog is a set of system-defined relvars (relation-typed variables) which reflect the definitions of DBMS entities; users can generally update these to create, alter, or drop DBMS entities. In fact, updating the catalog relvars is the fundamental way to do data-definition tasks in Muldis D, and any other provisions for data-definition are conceptually abstractions of this. Generally speaking, users can do absolutely everything in the DBMS with just data querying and updating operations.
The design and various features of Muldis D go a long way to help both its users and implementers alike. A lot of flexibility is afforded to implementers of the language to be adaptive to changing constraints of their environment and deliver efficient solutions. This also makes things a lot easier for users of the language because they can focus on the meaning of their data rather than worrying about implementation details; users can focus on defining what needs to be accomplished rather than how to accomplish that, which relieves burdens on their creativity, and saves them time. In short, this system improves everyone's lives.
What users fundamentally write are Muldis D "routines", each consisting of one or more "statements", and in executing these, all work is done.
Representation
Muldis D has 2 closely corresponding main representation formats, which are called Concrete Muldis D and Abstract Muldis D; these are analogous to the natural code strings of a typical programming language, and the abstract syntax trees that they naturally parse into, respectively.
Concrete Muldis D is the natural form that one would code in if they were writing a self-contained application (or component) in Muldis D which was compiled using a separate process into its own executable (or library), which includes situations where Muldis D is its own Parrot (http://www.parrotcode.org/) hosted language (a prospect which is desired to be implemented in the near future). Concrete Muldis D would also be used by an interactive shell interface over the Muldis DB (specifically Muldis::DB::Interface) implementation of Muldis D, when users submit commands at runtime, or in any other situation where it makes sense to take input in that form.
Abstract Muldis D is the natural form that one would code in if they were primarily writing their application in a separate host language, such as Perl, and any Muldis D code was being specified in terms of host language code, such as Perl arrays, hashes, and scalars. Abstract Muldis D code consists of quasi-hierarchical but actually relational collection values, typically catalog tuples. Abstract Muldis D is the only representation format used by the API of the Muldis DB (specifically Muldis::DB::Interface) implementation of Muldis D, and is what any Perl code typically should be using. When generating Muldis D code from arbitrary Perl data structures (which includes the work of, eg, SQL DBMS emulators), the Abstract form is the easiest to use and the least error prone, since no values have to be escaped or stitched together as strings, which prevents many injection security holes. Abstract Muldis D is also what is used when Muldis D code is defined to generate/prepare and execute other Muldis D code at runtime (by reading or updating the meta-model / system catalog), which is "data definition".
See Language::MuldisD::Core first for details of the Muldis D meta-model, which is also the grammar of Abstract Muldis D; see Language::MuldisD::Grammar for the grammar of Concrete Muldis D; the latter document says how to parse Concrete Muldis D into Abstract Muldis D; the former document explains the meaning of both in terms of the Abstract; see Language::MuldisD::PerlHosted for Perl Hosted Abstract Muldis D.
TYPE SYSTEM
The Muldis D type system is a formal type system, at least in intent, and works conceptually in the following manner.
There is a single universal value set/domain, named Universal, whose members are all the values that can possibly exist; Universal is the maximal data type of the entire type system. Also there is a single nullary value set/domain, named Empty, which has zero members; Empty is the minimal data type.
All Muldis D data values as individuals are eternal and immutable. All values are logically distinct, and each value occurs exactly once, and is not fixed within time or space (so doesn't have an "address"). It does not make sense to say that you are creating or destroying or copying or mutating a value. However, an eternal immutable value can make an appearance within a variable, as a variable is a named/addressable container that is fixed within time and space, and it can be created, destroyed, mutated, and multiple variables can hold appearances of the same value. So when one appears to be testing 2 values for equality, they are actually testing whether 2 value appearances are in fact the same value.
Given that all data values in Muldis D are fundamentally immutable, the term "selector" is used to describe a routine that captures an appearance of a value into a variable (or for use in a value expression); this is analogous to the task that a "constructor" routine does in a typical object-oriented language, but that the former is conceptually "selecting" an eternally existing value rather than conceptually "creating" a new one.
In the Muldis D type system, a data type is a set of values, and as with individual values, a data type is eternal and immutable. Every data type is distinct from all other data types, and no 2 data types may encompass exactly the same set of values. Every data type other than Universal and Empty has at least 1 member value, and lacks at least 1 value that Universal has. If 2 data types have no values in common, they are said to be disjoint.
Given 2 arbitrary data types, T1 and T2, T1 is called a supertype of T2 if its value set is a superset of that of T2, and in that situation, T2 is a subtype of T1, as its value set is a subset of that of T1. Note that every type includes itself as its own supertype and subtype, in which case the T1 and T2 of the previous example are the same type. By contrast, if T1 and T2 are explicitly different types but otherwise have that relationship, then T1 has at least 1 value that T2 doesn't have, in which case T1 is also called a proper supertype of T2, and T2 is also called a proper subtype of T1. Given those last examples, T1 is a more general type, and T2 is a more specific type. In this way, the system-defined Universal type is a proper supertype of all other types, and the system-defined Empty type is a proper subtype of all other types. Now, given that T1 is a proper supertype of T2, if no data type T3 exists which is both a proper subtype of T1 and a proper supertype of T2, then T1 is an immediate supertype of T2, and T2 is an immediate subtype of T1. Note that the Muldis D type system supports multiple inheritance, so types can form a lattice rather than a tree.
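Because types here are just value sets, these relationships reduce to subset tests. The Python sketch below assumes a hypothetical finite universe of 10 integers standing in for Universal. The type names Even, Small, SmallEven, and BigOdd are invented for illustration. The sketch computes immediate supertypes and shows a type with 2 of them, which is the lattice case.

```python
# A hypothetical finite universe of 10 values standing in for Universal,
# with each type modeled purely as its set of member values.
UNIVERSAL = frozenset(range(10))

TYPES = {
    "Universal": UNIVERSAL,
    "Empty": frozenset(),
    "Even": frozenset({0, 2, 4, 6, 8}),
    "Small": frozenset({0, 1, 2, 3, 4}),
    "SmallEven": frozenset({0, 2, 4}),
    "BigOdd": frozenset({5, 7, 9}),
}


def is_subtype(t2, t1):
    # Every type is a subtype (and supertype) of itself.
    return TYPES[t2] <= TYPES[t1]


def is_proper_subtype(t2, t1):
    # T1 has at least 1 value that T2 lacks.
    return TYPES[t2] < TYPES[t1]


def immediate_supertypes(t2):
    # T1 qualifies if no T3 sits strictly between T2 and T1.
    return {
        t1 for t1 in TYPES
        if is_proper_subtype(t2, t1)
        and not any(is_proper_subtype(t2, t3) and is_proper_subtype(t3, t1)
                    for t3 in TYPES)
    }


# SmallEven has 2 immediate supertypes, so the types form a lattice.
print(sorted(immediate_supertypes("SmallEven")))  # ['Even', 'Small']
```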
Every value has at most a single most specific type (or MST), which is cited as the general answer to the question "what is this value's type". The MST of a value is the data type containing that value which has no proper subtypes that also contain that value. Moreover, to enforce the "at most a single" requirement, which keeps answering the question a simple affair, it is mandatory in Muldis D that when any 2 data types have values in common, there must exist a data type which contains only the values that they have in common, and hence is a subtype of both. Note that a value will always implicitly assume the most specific type that exists which contains it, even if a selector for a less specific type was explicitly used to select it.
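The MST rule and the intersection requirement that guarantees it can be checked on a toy model. The Python sketch below uses a hypothetical finite set of types, each given as a plain value set. `mst` picks the containing type that has no containing proper subtype. `closed_under_intersection` checks the rule that overlapping types must have a type for their shared values. Removing SmallEven breaks that rule, and the MST of 2 then becomes ambiguous.

```python
# A hypothetical toy type system: each type is just its set of values.
TYPES = {
    "Universal": frozenset(range(10)),
    "Empty": frozenset(),
    "Even": frozenset({0, 2, 4, 6, 8}),
    "Small": frozenset({0, 1, 2, 3, 4}),
    "SmallEven": frozenset({0, 2, 4}),
    "BigOdd": frozenset({5, 7, 9}),
}


def mst(value, types):
    """Most specific type: the containing type with no proper subtype
    that also contains the value."""
    containing = [n for n, vs in types.items() if value in vs]
    best = [n for n in containing
            if not any(types[m] < types[n] for m in containing)]
    if len(best) != 1:
        raise ValueError(f"no single most specific type for {value!r}: {best}")
    return best[0]


def closed_under_intersection(types):
    # Any 2 types' common values must themselves form an existing type.
    value_sets = set(types.values())
    return all((a & b) in value_sets for a in value_sets for b in value_sets)


print(mst(2, TYPES), mst(6, TYPES))  # SmallEven Even

# Dropping SmallEven leaves Even and Small overlapping with no type for
# their common values, so the value 2 no longer has a single MST.
without = {k: v for k, v in TYPES.items() if k != "SmallEven"}
print(closed_under_intersection(without))  # False
```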
A union type is a data type that has at least 2 immediate subtypes, and every one of its values is also a value of an immediate subtype; that is, the MST of every value in a union type is not that type. An intersection type is a data type that has at least 2 immediate supertypes. In this way, Universal is a union type of all other types, and Empty is an intersection type of all other types.
A difference type is a data type that has exactly 1 immediate supertype, and that supertype is a union type such that the difference type and another peer subtype of that union type are complementary with respect to the union type; every union type value is in either the difference type or its complement, but not both. An exclusion type is like a union type except that it only consists of the values that are members of an odd number of its immediate subtypes. A negation type is a type that consists of only the values that aren't members of a single other type; it is like a difference type where the common supertype is Universal.
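Viewed as operations on value sets, these kinds of types map onto ordinary set operations. The Python sketch below is only a model over a hypothetical finite Universal, and the function names are illustrative. For the exclusion type, chaining symmetric differences keeps exactly the values that occur in an odd number of the operands, for any number of operands.

```python
from functools import reduce

# Hypothetical finite stand-in for Universal; types are sets of values.
UNIVERSAL = frozenset(range(10))


def union_type(*subtypes):
    # Every value belongs to at least one of the subtypes.
    return frozenset().union(*subtypes)


def difference_type(union, peer):
    # The complement of a peer subtype with respect to the union type.
    return union - peer


def exclusion_type(*subtypes):
    # Values in an odd number of the subtypes; chained symmetric
    # difference keeps exactly those values.
    return reduce(lambda acc, t: acc ^ t, subtypes, frozenset())


def negation_type(t):
    # A difference type whose common supertype is Universal.
    return difference_type(UNIVERSAL, t)


a, b, c = frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({2, 3, 4})
print(sorted(exclusion_type(a, b, c)))  # [0, 2, 4]
```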
A root type is a data type for which all of its values can be selected by the same single selector, and which has no proper supertype that is a root type. All root types are mutually disjoint, so every value is a member of exactly one root type. Generally speaking, root types are the implementational foundation over which all operators and all other types are built, and the declared parameter and result types of most system-defined operators are root types. The 6 most important system-defined root types are: Bool, Int, Blob, Text, Tuple, Relation. All user-defined root types are scalar types that are not defined in terms of other types, except that any components of their possreps (possible representations) have declared types. Perhaps it should be said that all root types are defined by this last sentence? A leaf type is a data type that has no proper subtypes save for Empty.
A complete type is a data type that is fully defined, and for which it would be possible to have values that are of just that data type, if it didn't have proper subtypes. An incomplete type or parameterized type is a data type that is not fully defined, but serves as a template by which complete types can be defined; there can never be values that are just of a parameterized type. The most important complete types are Bool, Int, Blob, and Text; the most important incomplete types are Tuple and Relation. For that matter, any implicit supertypes such as Universal and Scalar could be considered incomplete types, though they are not parameterized.
Type Identification
All values in the Muldis D type system are broadly categorized into 5 complementary sets called scalar values, tuple values, relation values, quasi-tuple values, and quasi-relation values; tuple and relation values are collectively known as nonscalar values; quasi-tuple and quasi-relation values are collectively known as quasi-nonscalar values. The type system has the system-defined data types named Scalar, Tuple, Relation, QuasiTuple, and QuasiRelation, which serve as maximal data types for each category, respectively. The 5 types are all mutually disjoint, and Universal is a union type over all of them.
Most data types consist exclusively of values from exactly one of the above 5 categories, and do not mix values from several of them. Therefore, every such data type is said to be either a scalar type, a tuple type, a relation type, a quasi-tuple type, or a quasi-relation type, depending on which category all of its values come from. In similar fashion, a nonscalar type is generally any type that is not a scalar type, if we ignore quasi-nonscalar types, meaning it is either a tuple type or a relation type.
A remnant type is any type having at least 2 values, where at least 2 of those values are not allowed to be in the same single type of any one of the other 5 categories, according to their type definition rules. The remnant category is the complement category to all the others, in that every possible proper subset of the values of Universal, save Empty itself, can now be represented by a type that fits in one of the 6 categories.
The identity of every scalar type is defined by its name alone, and every scalar type must have a distinct name that is explicitly defined, either by the system or by the user as is applicable. Every value of a scalar type is conceptually opaque and atomic, and its components are not known to users of that type; but even when the components are known (because they are user-defined structured types), two independently defined scalar types are completely disjoint even if their components look the same, by definition. The only way for 2 scalar types to have values in common is if one is explicitly defined, directly or indirectly, as a subtype of, or as a union type encompassing a subtype of, the other.
Every value of a nonscalar type (either a tuple type or a relation type, respectively) is conceptually transparent, and its component structure is known to all. The identity of every nonscalar type is defined by its component structure alone, and every nonscalar type must have a distinct component structure. Any two nonscalar types that have the same component structure are in fact the same type, by definition, regardless of whether they were defined independently of each other or not.
A quasi-nonscalar type is the same as a nonscalar type as far as the means of identifying it go (by its structure, not by its name), except that particular kinds of components are permitted in quasi-nonscalar types that aren't permitted in nonscalar types (and aren't permitted in scalar types).
A remnant type is always defined in terms of one or more other types, and it can never be a root type with defined components. The identity of every remnant type is defined only in terms of it being, directly or indirectly, a union or negation of other non-remnant types. As with nonscalars, several independently defined remnant types can be considered the same one.
To keep things simpler, every data type in Muldis D has a name by which it is referenced, even nonscalar and quasi-nonscalar types; however, the names of types that are not scalar types are simply convenient aliases for their true identities, which are their structures (the convenience allows various Muldis D catalog features to be designed and implemented more easily).
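The contrast between identifying scalar types by name and nonscalar types by structure can be sketched with 2 hypothetical Python classes whose equality methods encode each rule. This models type identity only; it says nothing about how Muldis D's catalog actually records types. Component and attribute types are written as plain type-name strings for brevity.

```python
class ScalarType:
    """Nominal identity: 2 scalar types are the same only if their names match."""

    def __init__(self, name, components):
        self.name = name
        self.components = dict(components)  # known structure, irrelevant to identity

    def __eq__(self, other):
        return isinstance(other, ScalarType) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class TupleType:
    """Structural identity: the heading (attribute names and types) is the type."""

    def __init__(self, alias, heading):
        self.alias = alias  # a convenience name only
        self.heading = frozenset(heading.items())

    def __eq__(self, other):
        return isinstance(other, TupleType) and self.heading == other.heading

    def __hash__(self):
        return hash(self.heading)


point = ScalarType("Point", {"x": "Int", "y": "Int"})
vector = ScalarType("Vector", {"x": "Int", "y": "Int"})
print(point == vector)  # False: same components, different names

xy = TupleType("XY", {"x": "Int", "y": "Int"})
coord = TupleType("Coord", {"y": "Int", "x": "Int"})
print(xy == coord)  # True: same heading; the names are only aliases
```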
Scalar Types
Scalar types are the only conceptually encapsulated types in Muldis D, and are like other languages' concepts of object classes where all their attributes are private, and only accessible indirectly. The definition of a scalar type usually comprises one or more named possreps, or possible representations, and for each of those, exactly one selector operator and usually at least one accessor operator.
A possrep of a type is an exhaustively complete means for users to conceptualize the structure of the type; it is like a "role" or "interface definition". A possrep has the appearance of a complete collection of (zero or more) named object attributes (of any scalar or nonscalar type) that the type could logically be implemented as, and users can use it as if it actually was implemented that way, but without the requirement that the type actually is implemented that way. If a type has multiple possreps, said possreps can differ from each other in arbitrarily large ways, but every one is individually capable of representing all of the type's values; any possrep could be used exclusively by a user when they work with its type, without diminishing what they can do. A single possrep is specific to one and only one type, so it is possible to refer to a type by simply referring to the name of one of its possreps.
Taking for example an integer data type, one of its possreps could represent an integer value as a string of binary digits, while another possrep could represent an integer value as a string of decimal digits. Or taking for example a temporal data type, one of its possreps could represent a date as an ISO 8601 formatted character string in the Gregorian calendar, and another possrep could represent it as a number of seconds since the UNIX epoch. Or taking for example a spatial data type that is a rectangle, one possrep could specify the 4 vertices as 4 (or 3) point values, and another possrep could specify fewer vertices and also specify the rectangle's width and height as numeric values.
A possrep additionally has a defined boolean-valued constraint expression (which is simply true in the trivial case), that restricts what values the possrep components can have within the context of their fellows. Taking for example a "medium polygon" data type, there could be a constraint that the area of the polygon is between 5 and 10 units.
Each possrep comprises exactly one selector operator whose named parameter set exactly matches that possrep's set of named attributes, and you select a value of the type by invoking the selector with a full set of values for the possrep's attributes. Each possrep also comprises an accessor operator for each of its attributes, with which users can extract that attribute's value.
No data type has any operators built-in to its definition except for the aforementioned selectors and accessors. All other operators that are used with a data type are expressly not built-in to the type (even if they are system-defined); the other operators do not have any access to the data type's internals, and must be defined (directly or indirectly) in terms of (that is, layered on top of) the few that are built-in, though the built-ins from any or all possreps of the type can be utilized.
With a user-defined scalar type, if the type is to have multiple possreps, then just one possrep is defined as the fundamental one, and the other possreps are defined in terms of the first, which is how the mappings between them are established. The type-defining user can later come back and redefine the type if they wish, using a different possrep as the fundamental one, but assuming the redefinition has all the same values, non-defining users of the type won't notice any difference.
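A minimal Python sketch of a scalar type with two possreps, using the integer example above. The class Integer and its method names are hypothetical: from_binary and from_decimal play the role of the two selectors, binary_digits and decimal_digits play the role of the accessors, and the digit checks stand in for each possrep's constraint. The decimal possrep is defined in terms of the binary (fundamental) one, while the stored field stays private, since the implementation is free to pick its own physical representation.

```python
class Integer:
    """Hypothetical scalar type with two possreps; _value is a private detail."""

    __slots__ = ("_value",)

    def __init__(self, value: int) -> None:
        # Internal constructor; users are expected to go through the selectors.
        self._value = value

    # Fundamental possrep "binary": its selector and its one accessor.
    @classmethod
    def from_binary(cls, digits: str) -> "Integer":
        body = digits[1:] if digits.startswith("-") else digits
        # Possrep constraint: at least one digit, and only 0 or 1.
        if not body or any(c not in "01" for c in body):
            raise ValueError(f"not a valid binary possrep: {digits!r}")
        return cls(int(digits, 2))

    def binary_digits(self) -> str:
        return format(self._value, "b")

    # Possrep "decimal", defined in terms of the binary one.
    @classmethod
    def from_decimal(cls, digits: str) -> "Integer":
        body = digits[1:] if digits.startswith("-") else digits
        if not body or any(c not in "0123456789" for c in body):
            raise ValueError(f"not a valid decimal possrep: {digits!r}")
        return cls.from_binary(format(int(digits, 10), "b"))

    def decimal_digits(self) -> str:
        return str(int(self.binary_digits(), 2))

    # Equality compares values, not addresses.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, Integer) and self._value == other._value

    def __hash__(self) -> int:
        return hash(self._value)
```

Either selector can be used on its own to reach every value of the type, which mirrors the claim that each possrep is individually complete.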
The Muldis D implementation can choose for itself as to how the scalar type is physically represented behind the scenes, either picking between any of the user-provided possreps (assuming enough information is present to derive all needed inverse functions as applicable) or using yet another one or several of its own; the implementation can work how it knows best to achieve an efficient system, and this is all hidden away from the users, who simply perceive in it what they requested.
In the context of scalar subtype/supertype relationships, the definition of a subtype can add additional possreps that are only valid for the subtype, such that users of the subtype can use both possreps defined for the subtype and the supertype, but users of the supertype can only use the possreps for the supertype, and not the subtype. Taking for example the data types of rectangle and square, the latter is a subtype of the former; a possrep for a rectangle in general comprises its center point as well as its width and its height, which also works for a square; an additional possrep that just works for a square rather than a rectangle in general comprises a center point plus its side length.
As a corollary to this, all union types have none of the possreps defined by their subtypes. So the system-defined Scalar type has no possreps at all, and hence has no selectors or accessors defined for it.
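The rectangle/square example can be approximated with Python subclassing, assuming Square is modeled as a subclass of Rectangle. Square adds a possrep of its own (a selector from_side and an accessor side) that Rectangle does not have, and it tightens the constraint so that width equals height. This is only an approximation: Python classes do not model a square value also being a rectangle value for equality purposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Rectangle:
    # Possrep shared by rectangles and squares: center, width, height.
    center: Point
    width: float
    height: float

    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")


@dataclass(frozen=True)
class Square(Rectangle):
    def __post_init__(self) -> None:
        super().__post_init__()
        # Subtype constraint: only rectangles with equal sides are squares.
        if self.width != self.height:
            raise ValueError("a square's width must equal its height")

    # Extra possrep valid only for the subtype: center plus side length.
    @classmethod
    def from_side(cls, center: Point, side: float) -> "Square":
        return cls(center, side, side)

    @property
    def side(self) -> float:
        return self.width
```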
Tuple Types and Relation Types
Tuple types are the fundamental heterogeneous conceptually non-encapsulated collection types in Muldis D, and are like the Pascal language's concept of a record, or the C language's concept of a struct. The definition of a tuple type comprises a set of zero or more named attributes of any scalar or nonscalar type. This set definition is called the tuple's heading.
Relation types are the fundamental homogeneous conceptually non-encapsulated collection types in Muldis D, and are like other languages' concepts of sets (or arrays where all elements are distinct), but restricted in that all elements are tuples. The definition of a relation type looks exactly like the definition of a tuple type (such that a relation has a heading even if it has no tuples), except that the heading applies to every tuple in the relation, and that relation types can additionally have keys defined, each of which indicates that the values of a subset of the relation's attributes are distinct across all tuples in the relation.
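The heading/body/key distinction can be sketched as follows, assuming a relation is modeled as a Python heading dict (attribute name to type) plus a set of tuples, each tuple a frozenset of (name, value) pairs. The Relation class and its insert method are hypothetical; duplicate tuples collapse because the body is a set, and each key rejects a second tuple with the same values on that key's attributes.

```python
from typing import Any, Iterable, Mapping


class Relation:
    """Hypothetical relation: a heading, a set of tuples, and optional keys."""

    def __init__(self, heading: Mapping[str, type],
                 keys: Iterable[Iterable[str]] = ()) -> None:
        self.heading = dict(heading)
        self.keys = [frozenset(k) for k in keys]
        for key in self.keys:
            if not key <= frozenset(self.heading):
                raise ValueError(f"key {sorted(key)} is not part of the heading")
        self.body = set()  # each tuple is a frozenset of (name, value) pairs

    def insert(self, tup: Mapping[str, Any]) -> None:
        # Every tuple must have exactly the heading's attributes and types.
        if set(tup) != set(self.heading):
            raise ValueError("tuple does not match the relation's heading")
        for name, value in tup.items():
            if not isinstance(value, self.heading[name]):
                raise TypeError(f"attribute {name!r} has the wrong type")
        new = frozenset(tup.items())
        if new in self.body:
            return  # a relation is a set, so a duplicate tuple changes nothing
        # Each key requires its attributes' values to be distinct across tuples.
        for key in self.keys:
            for existing in self.body:
                old = dict(existing)
                if all(old[name] == tup[name] for name in key):
                    raise ValueError(f"key violation on {sorted(key)}")
        self.body.add(new)
```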
Generic selector and accessor operators exist that work with all tuple and relation types, so they do not need to be defined per such type.
The system-defined types Tuple and Relation (and their system-defined subtypes) are technically generic factory types, such that they themselves do not define any attribute sets, and are supertypes of all tuple and relation types that do. Beyond this special case, a pair of tuple or relation types can only have a subtype/supertype relationship if they have compatible headings, which means the attribute sets are of the same degree, the attribute names are identical, and the name-wise corresponding attributes in each heading have a valid subtype/supertype relationship; each attribute of a tuple or relation subtype is a subtype of the same-named attribute of the tuple or relation supertype.
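The heading-compatibility rule for tuple and relation subtypes reduces to a short check. In this sketch, headings are Python dicts mapping attribute names to Python classes, and issubclass stands in for the Muldis D subtype relation (for example, Python's bool is a subclass of int); is_heading_subtype is a hypothetical name.

```python
from typing import Mapping


def is_heading_subtype(sub: Mapping[str, type], sup: Mapping[str, type]) -> bool:
    """True if `sub` is a subtype heading of `sup`."""
    # Identical attribute names imply the same degree.
    if sub.keys() != sup.keys():
        return False
    # Each attribute must be a subtype of the same-named supertype attribute.
    return all(issubclass(sub[name], sup[name]) for name in sub)
```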
Quasi-Tuple Types and Quasi-Relation Types
The union types Universal, Tuple, Relation (and the system-defined subtypes of the latter 2), and remnant types, can be used as the declared types of things such as variables and routine parameters, but they can not be used as the declared types of scalar possrep or nonscalar (tuple or relation) attributes. The declared type of each of the latter must be either a scalar type, or a specific tuple or relation subtype (meaning tuple or relation types that have specific attribute sets defined for them).
If all data types were scalar or nonscalar, then it would not be possible to define operators with N-ary parameters whose declared types are any of the aforementioned 3 union types or the remnant types. That is, an N-ary parameter is usually relation-typed, such that each of the multiple values that the parameter can take is provided as a tuple of said relation; however, as relation attributes can not have said union types as their declared types, it would not be possible to implement an N-ary relational join operator, for example, since each relation being joined would probably have a different heading than the others.
Quasi-tuple types and quasi-relation types exist as a solution to this problem, such that the quasi-heading of one is allowed to include attributes whose declared types are any type at all, including the union types Universal, Tuple, Relation, QuasiTuple, QuasiRelation, and subtypes of tuple and relation without specific attribute sets, and remnant types.
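To illustrate why an N-ary join needs a parameter whose attribute is generically relation-typed, this Python sketch joins a list of relations that all have different headings; the list plays the role of the quasi-relation argument. The helpers rel, join2 and join are hypothetical; the reduction starts from the zero-attribute relation with one tuple, which is the identity element of natural join.

```python
from functools import reduce
from typing import Iterable, Mapping

# A relation here is (heading, body): a frozenset of attribute names plus a set
# of tuples, each tuple a frozenset of (name, value) pairs.


def rel(heading: Iterable[str], rows: Iterable[Mapping]) -> tuple:
    return frozenset(heading), {frozenset(row.items()) for row in rows}


def join2(r: tuple, s: tuple) -> tuple:
    """Natural join of two relations: merge tuples that agree on common attributes."""
    (h1, b1), (h2, b2) = r, s
    common = h1 & h2
    body = set()
    for t1 in b1:
        d1 = dict(t1)
        for t2 in b2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in common):
                body.add(frozenset({**d1, **d2}.items()))
    return h1 | h2, body


def join(relations: Iterable[tuple]) -> tuple:
    """N-ary join over relations that may each have a different heading."""
    return reduce(join2, relations, (frozenset(), {frozenset()}))
```

Joining an empty list returns the identity relation itself, so the N-ary operator is defined for any number of inputs.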
That said, the situations in which quasi-nonscalar types may be used are limited; only quasi-nonscalar types may have quasi-nonscalar types as components; scalar and nonscalar types may not.
Also, quasi-nonscalar types only have defined for them a subset of the corresponding nonscalar type operators, partly because the former are not intended to replace the latter for the majority of use cases, and partly because some of those operators are simply impossible to implement for quasi-nonscalars: unwrap, ungroup.
Remnant Types
Generally speaking, a remnant type is what results when one defines a union type over 2 other types whose values are mutually incompatible for use in the same relation attribute, such as 2 relation types with different degrees or attribute names. Generally speaking, a remnant type is the declared type of each attribute of a quasi-nonscalar type, when said attribute isn't one of the special system-defined maximal types. A remnant type also results from defining a negation type over a non-remnant type.
Finite Types and Infinite Types
A finite type is a data type whose cardinality (count of member values) is known to be finite, and this cardinality can be deterministically computed; moreover, every value of a finite type can be represented somehow using a finite amount of memory. This doesn't exclude the possibility that either the cardinality or individual values are larger than present-day computing hardware can handle, but even if so, they could be handled by sufficiently larger but finite resources. An infinite type is a data type that is not a finite type; its cardinality is either known to be infinity, or it is unknown.
Generally speaking, all finite types are defined either as an explicit enumeration of values (for example, the boolean type, which has exactly 2 values), or they are scalar types whose possreps have zero attributes (each one is a singleton, having exactly 1 value), or they are the tuple or relation type that has zero attributes (which has exactly 1 or 2 values, respectively), or their values are all discrete and fall into a closed range (for example, a type comprising the range of integers between 1 and 100, or a type comprising all real numbers in the same range that have a granularity of 0.001, or any IEEE floating point number of a specific bit length), or their values are length-constrained strings of finite-cardinality elements (for example, a character string that is not longer than 250 characters), or they are composite scalar or nonscalar or quasi-nonscalar types whose attributes are all of finite types themselves (for example, a type whose attributes are all Bool).
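The cardinalities mentioned above can be checked with a little arithmetic. This sketch assumes unconstrained types (keys or extra constraints would only lower the counts): a tuple type has the product of its attribute cardinalities, a relation type has one value per subset of its possible tuples, and a length-bounded string type sums over each allowed length. The function names are hypothetical.

```python
from math import prod

BOOL = 2  # Bool has exactly 2 values


def tuple_cardinality(attribute_cardinalities):
    # One tuple value per combination of attribute values; prod([]) == 1,
    # so the zero-attribute tuple type has exactly 1 value.
    return prod(attribute_cardinalities)


def relation_cardinality(attribute_cardinalities):
    # Unconstrained relation type: one value per subset of the possible tuples,
    # so the zero-attribute relation type has 2**1 == 2 values.
    return 2 ** tuple_cardinality(attribute_cardinalities)


def bounded_string_cardinality(alphabet_size, max_length):
    # Strings of length 0..max_length over a finite alphabet.
    return sum(alphabet_size ** n for n in range(max_length + 1))


def granular_range_cardinality(lo, hi, steps_per_unit):
    # Values lo, lo + 1/steps_per_unit, ..., hi, counted in integer steps
    # to avoid floating point error.
    return (hi - lo) * steps_per_unit + 1
```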
Generally speaking, all infinite types are defined either as being some open-ended natural domain (for example, the type having all integers, or the type having all prime numbers), or they are some continuous domain, whether open-ended or not (for example, the type having all real or complex numbers between 1 and 100), or they are non-length-constrained strings (for example, the set of all possible text strings), or they are composite scalar or nonscalar or quasi-nonscalar types which have at least one attribute which is itself infinite (for example, a type that has an Int attribute).
The system-defined root type Bool is finite (2 values), as is the Empty type (zero values), while all of the other 5 most important system-defined root types (Int, Blob, Text, Tuple, Relation) are infinite, as are the Universal, Scalar, QuasiTuple, QuasiRelation types.
All proper subtypes of finite types are themselves finite types. Proper subtypes of infinite types can be either finite or infinite depending on how they are defined. For example, a subtype of Int whose numbers are all simply greater than 10 is infinite, but a subtype whose numbers are additionally all less than 1000 is finite. The documentation for individual system-defined data types, further below, specifies whether each is finite or infinite, and in the latter case, it states the most generic means to specify a finite subtype.
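The Int example can be modeled with explicit bounds, assuming a hypothetical IntRange whose bounds are exclusive and where None means unbounded. A subtype is formed by tightening bounds, so a finite range never becomes infinite, while an open range becomes finite once both bounds are present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntRange:
    """Hypothetical Int subtype lo < n < hi; None means no bound on that side."""
    lo: Optional[int] = None
    hi: Optional[int] = None

    def is_finite(self) -> bool:
        return self.lo is not None and self.hi is not None

    def cardinality(self) -> Optional[int]:
        # None stands for "infinite".
        if not self.is_finite():
            return None
        return max(0, self.hi - self.lo - 1)

    def restrict(self, lo: Optional[int] = None, hi: Optional[int] = None) -> "IntRange":
        # A proper subtype can only tighten existing bounds, never drop them.
        new_lo = lo if self.lo is None else (self.lo if lo is None else max(self.lo, lo))
        new_hi = hi if self.hi is None else (self.hi if hi is None else min(self.hi, hi))
        return IntRange(new_lo, new_hi)
```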
Note that, while it is not mandated by the language, some Muldis D implementations may legitimately choose to impose restrictions on their users such that the declared types of all persisting variables must be of finite types only.
For example, an implementation might require that all persisting Text types have a maximum allowed length in characters specified, or that all persisting Int types have a least and greatest allowed value specified. This would typically happen if the implementation needs to use fixed-size fields internally, such as 32-bit integers, and it is not practical to support the possibility that a value could be of any size at all (this is often the case with SQL databases implemented in C).
On the other hand, some implementations may natively support unlimited size values, such as those written in Perl, and so these can allow persisting the plain Text or Int types, which can make things less complicated for their users.
Of course, even with implementations that require finite types, this isn't to say that the declared type can't be a very large finite type, but then the implementation can choose to use, for example, either a machine native integer or a string of digits behind the scenes for all values of the type, and can do this deterministically, depending on what constraint the type-defining user chose.
Universal Implicit Operators
Muldis D is universally polymorphic to at least a small degree, such that every data type without exception has both an assign update operator (for assigning a value of that type to a variable of that type) and an is_equal function for testing 2 values of that type for equality (as well as is_not_equal, for inequality). Moreover, these operators exist implicitly, so when one defines the initial possrep of a new type, they get those operators for the type at no extra cost.
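A rough Python analogy for the universal operators, assuming a hypothetical Variable class with a declared type: assign is the single way to change a variable, and is_equal/is_not_equal are defined once, generically, rather than per type. Requiring identical Python types in is_equal is a simplification chosen here so that values of disjoint types never compare equal.

```python
from typing import Any


class Variable:
    """Hypothetical variable with a declared type; assign is its update operator."""

    def __init__(self, declared_type: type, initial: Any) -> None:
        self.declared_type = declared_type
        self.assign(initial)

    def assign(self, value: Any) -> None:
        # Only values of the declared type may be assigned.
        if not isinstance(value, self.declared_type):
            raise TypeError(f"expected a value of type {self.declared_type.__name__}")
        self.value = value


def is_equal(a: Any, b: Any) -> bool:
    # Generic over all types; requiring the same Python type is a simplification.
    return type(a) is type(b) and a == b


def is_not_equal(a: Any, b: Any) -> bool:
    return not is_equal(a, b)
```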
This documentation is pending.
ENVIRONMENT
The Muldis D DBMS / virtual machine, which by definition is the environment in which Muldis D executes, conceptually resembles a hardware PC, having a command processor (CPU), standard user input and output channels, persistent read-only memory (ROM), volatile read-write memory (RAM), and persistent read-write disk or network storage.
Within this analogy, the role of the PC's user, who feeds it through standard input and accepts its standard output, is fulfilled by the application that is driving the Muldis D DBMS; similarly, the application itself will activate the virtual machine when wanting to use it (done in this distribution by instantiating a new Muldis::DB::Interface::DBMS object), and deactivate the virtual machine when done (letting that object expire).
When a new virtual machine is activated, the virtual machine has a default state where the CPU is ready to accept user-input commands to process, and there is a built-in (to the ROM) set of system-defined entities (data types, operators, variables, etc) which are ready to be used to define or be invoked by said user-input commands; the RAM starts out effectively empty and the persistent disk or network storage is ignored.
Following this activation, the virtual machine is mostly idle except when executing Muldis D commands that it receives via the standard input (done in this distribution by invoking methods on the DBMS object). The virtual machine effectively handles just one command at a time, and executes each separately and in the order received; any results or side-effects of each command provide a context for the next command.
At some point in time, as the result of appropriate commands, data repositories, or "depots" (either newly created or previously existing) that live in the persistent disk or network storage will be mounted within the virtual machine, at which point subsequent commands can read or update them, then later unmount them when done. Speaking in the terms of a typical database access solution like the Perl DBI, this mounting and unmounting of a repository usually corresponds to connecting to and disconnecting from a database. Speaking in the terms of a typical disk file system, this is mounting or unmounting a logical volume.
Any mounted persistent depot, as well as the temporary "application" depot which is most of the conceptual PC's RAM, is home to all user-defined data variables, data types, operators, constraints, packages, and routines; they collectively are the database that the Muldis D DBMS is managing. Most commands against the DBMS would typically involve reading and updating the data variables, which in typical database terms is performing queries and data manipulation. Much less frequently, you would also see "data definition" changes, namely what user-defined variables, types, etceteras exist, done fundamentally by data-updating special system-defined "catalog" variables. Any updates to a persistent depot will usually last between multiple activations of the virtual machine, while any updates to the temporary "application" depot are lost when the machine deactivates.
All virtual machine commands are subject to a collection of both system-defined and user-defined constraints (also known as business rules), which are always active over the period that they are defined. The constraints restrict what state the database can be in, and any commands which would cause the constraints to be violated will fail; this mechanism is a large part of what makes the Muldis D DBMS a reliable modeler of anything in reality, since it only stores values that are reasonable.
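The constraint mechanism can be sketched as "compute the candidate state, then accept it only if every constraint holds". In this hypothetical Python model, a command is a function from the current state to a new state, constraints are predicates over the whole state, and a rejected command leaves the previous state untouched.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


class ConstraintViolation(Exception):
    pass


class Database:
    """Hypothetical model: constraints are predicates over the whole state."""

    def __init__(self, state: State, constraints: List[Callable[[State], bool]]) -> None:
        self.state = dict(state)
        self.constraints = constraints

    def execute(self, command: Callable[[State], State]) -> None:
        # Compute the candidate state from a copy, so a rejected command has no effect.
        candidate = command(dict(self.state))
        for constraint in self.constraints:
            if not constraint(candidate):
                raise ConstraintViolation(constraint.__name__)
        self.state = candidate


def stock_not_negative(state: State) -> bool:
    return state["stock"] >= 0
```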
Note that in practice, the aforementioned concept of "commands" is realized by "statements" (which are grouped into "routines").
ROUTINES
There are several kinds of Muldis D routines, each of which is intended for, and in many cases only permitted to be used for, particular tasks. Note that for all Muldis D routines which have parameters, they are all named rather than positional parameters; in the case of N-ary routines, the N similar argument values come by way of a single nonscalar (or, if necessary, quasi-nonscalar) typed parameter.
The following hierarchy briefly illustrates how the kinds of routines are similar or dissimilar, but it expressly does not indicate substitutability:
    routine
        functional
            function
            inner_function
            type_constraint
            transition_constraint
        imperative
            deterministic
                update_operator
                inner_update_operator
            nondeterministic
                system_service
                procedure
                inner_procedure
                main
Specifically, the routine kinds are all of the leaf nodes in the above hierarchy, and every Muldis D routine is designated as exactly one of those; the non-leaf nodes are not routine kinds.
Note that, in an environment where Muldis D is being hosted under another language, the other language may only directly invoke these 4 kinds of routines: function, update_operator, system_service, procedure.
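The routine hierarchy can be encoded as a nested Python dict; the nesting used here is our reading of the list above (the 4 functional kinds under functional, update_operator and inner_update_operator under deterministic, and the remaining imperative kinds under nondeterministic). Collecting the leaves yields exactly the 10 routine kinds, and the 4 host-invokable kinds form a subset of them. All names besides the routine kinds themselves are hypothetical.

```python
ROUTINE_TREE = {
    "routine": {
        "functional": {
            "function": {},
            "inner_function": {},
            "type_constraint": {},
            "transition_constraint": {},
        },
        "imperative": {
            "deterministic": {"update_operator": {}, "inner_update_operator": {}},
            "nondeterministic": {
                "system_service": {},
                "procedure": {},
                "inner_procedure": {},
                "main": {},
            },
        },
    }
}


def leaves(tree):
    """Yield the leaf nodes: these, and only these, are routine kinds."""
    for name, children in tree.items():
        if children:
            yield from leaves(children)
        else:
            yield name


ROUTINE_KINDS = frozenset(leaves(ROUTINE_TREE))
HOST_INVOKABLE = frozenset({"function", "update_operator", "system_service", "procedure"})
```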
- function
-
A function is an explicitly invokable read-only operator whose invocation both results in and represents a value of a specific data type (the function's result type or declared type); this invocation can only exist as part of a value-expression of another routine; the body of a function is also itself a single value-expression (though its parts can be named for internal reuse). A function is pure and deterministic in the functional-language sense: all of its 0..N parameters are read-only / not subject to update, it has no lexical variables at all, it can only see its own parameters, if it has any (it can not see any global variables of any kind), and it can only invoke function (and local inner_function) routines. A function invocation is trivially atomic, since it doesn't conceptually update anything. The vast majority of invokable system-defined routines are function routines; they include all value selectors, and the typical numeric, string, and relational operators out of which you would compose a typical database "select" query.
- inner_function
-
An inner_function is the same as a function, but it is quasi-lexically scoped within another, non-lexical routine, and it is only visible within or invokable by either its parent routine or its sibling inner_function routines. Conceptually speaking, an inner_function is part of the definition of the body of the parent routine (like a value expression in general), but is isolated into a named function-like entity for technical / language design reasons.
- type_constraint
-
A type_constraint is the same as an inner_function, but it is part of the definition of a data type (every data type composes exactly one of these) rather than a routine, it is invoked automatically by the DBMS when a value of that type is being selected, and it always results in a Bool. The parameters of a type_constraint carry information about the value selection attempt, and the type_constraint results in either Bool:true if the described value would be a member of the data type, or Bool:false if not; in the latter case, the DBMS would then throw a type-constraint-violation exception (resulting in a transaction rollback where applicable), or in the former case, it would consider the selection a success. If the data type being selected is defined as a scalar root type, then the parameter list matches its initial/core possrep's component attributes list (their names and declared types), and the arguments are the candidate values for those attributes; or, if the data type being selected is defined as a restriction of one other data type, then the parameter list has exactly one parameter whose name is topic and whose declared type is that other data type. Note that, because Muldis D requires dbvars to be defined over named data types, all state constraints for a database, including uniqueness keys or foreign keys or other state-constraining business rules, are normally defined as the type_constraint for the data type of that database. Conceptually speaking, a type_constraint will execute as the beginning part of a statement, prior to any attempt to update any variable's state or affect the environment.
- transition_constraint
-
A transition_constraint is the same as a type_constraint except that it is part of the definition of a variable rather than of a data type, and it is invoked automatically by the DBMS when that variable is being updated. A transition_constraint takes exactly 2 parameters, whose names are before and after, and whose declared types are both the same as that of the variable; it returns Bool:true if the variable is allowed (according to current business rules) to transition directly from the before state to the after state, or Bool:false if not; in the latter case, the DBMS would then throw a transition-constraint-violation exception (resulting in a transaction rollback of at least the statement that attempted the update), or in the former case, it would consider the update a success (barring other causes for failure). Conceptually speaking, a transition_constraint will execute as the ending part of a statement, right at the moment of trying to update any variable's state, with the result of a value expression or otherwise; in the case of a multi-update statement, all the updates would happen simultaneously, so a transition failure for any update would prevent all of that statement's updates from occurring.
- update_operator
-
An update_operator is the same as a function, but it is imperative rather than functional (its invocation does not result in or represent a value, and it is invoked as the root part of another routine's statement rather than in an expression), and it has at least one parameter which is subject to update; any result that it produces is returned by updating said parameters. The body of an update_operator is a single statement (plus any support expressions) that invokes one or more update_operator routines (recursively down to some system-defined variable assignment operator); if it invokes several, it is a multi-update statement. An update_operator can only invoke function and update_operator (and local inner_(function|update_operator)) routines. Despite being imperative, an update_operator has no lexical variables save for its subject-to-update parameters, and it just assigns to those as its last/only action; but like a function, it does have named expression nodes, which can ease program writing as lexical variables would. An update_operator invocation is implicitly atomic, and a failure in the middle of one will at least roll back any partial update that it may conceptually have done. Most invokable system-defined imperative routines are update_operator routines; they include all assign operators, plus some relational-assignment short-hands such as "assign_insert", "assign_update", "assign_delete".
- inner_update_operator
-
An inner_update_operator is to an update_operator what an inner_function is to a function.
- system_service
-
A system_service is an explicitly invokable system-defined routine with 0..N parameters that can reach outside of the deterministic DBMS environment in order to do non-deterministic things (besides working with depots), such as to initiate I/O of various kinds, or fetch the current date and time, or generate a random number. Given the nature of this beast, users can not define their own system_service routines except by updating the Muldis D implementation's source code itself. Invoking a system_service can have side-effects outside of the DBMS, but it will not alter anything inside the DBMS aside from any of its subject-to-update parameters.
- procedure
-
A procedure is an explicitly invokable routine with 0..N parameters that can directly see and update global variables (both catalog and data), and is generally the only kind of routine that can; every call chain that is meant to work with a persisting (global) dbvar must generally include a procedure invocation. The body of a procedure consists of 0..N statements which conceptually run in sequence (not concurrently). A procedure can invoke every kind of explicitly invokable routine. A procedure invocation is not implicitly atomic; unless a wider-scope explicit transaction is active, an aborted procedure will leave an incomplete update (though not one that violates any constraints or leaves the system in an inconsistent state), because each of its statements had conceptually auto-committed; so Muldis D does support batch operations where partial completion or interruptibility is acceptable. A procedure can define explicit (lexically scoped) transactions over multiple consecutive statements, and is generally the only kind of routine that can; specifically, those statements are parcelled together into a separate inner_procedure, and then that is invoked indirectly by way of a new statement in the first procedure that invokes the system-defined try_catch procedure. The vast majority of procedure routines that exist will be user-defined. But some system-defined routines that would otherwise be function or update_operator routines are procedure routines instead, solely because they are non-deterministic; an example is an operator that derives a tuple sequence from a relation without fully sorting the tuples, because the result is fundamentally random and non-repeatable.
- inner_procedure
-
An inner_procedure is to a procedure what an inner_function is to a function.
- main
-
A main is the single anonymous procedure that is the "main program" of a non-hosted Concrete Muldis D application. A main is the same as a procedure, but it can not be invoked by any other Muldis D routine, it can not live in any depot, and it can not have any parameters. In a mixed-language application, where Muldis D code is invoked by another host language, there is no Muldis D main at all, since it would be redundant with host language routines.
Note that Muldis D currently has no direct support for the concept of a trigger-routine that can update a database; updating virtual relvars or invoking procedure routines are recommended instead. As for non-updating trigger-routines, the type/transition constraint routines already provide that feature. The feature in question may be supported directly in the future.
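The order in which the two constraint kinds run can be sketched in Python. Everything here is hypothetical: RestrictionType stands in for a type defined as a restriction of a base type (its type_constraint receives a single argument named topic), and ConstrainedVariable runs the type_constraint when the new value is selected, then the transition_constraint with before and after just before the update takes effect.

```python
from typing import Any, Callable


class TypeConstraintViolation(Exception):
    pass


class TransitionConstraintViolation(Exception):
    pass


class RestrictionType:
    """Hypothetical type defined as a restriction of one base type."""

    def __init__(self, base: type, type_constraint: Callable[..., bool]) -> None:
        self.base = base
        self.type_constraint = type_constraint

    def select(self, value: Any) -> Any:
        # Runs automatically whenever a value of this type is selected.
        if not isinstance(value, self.base) or not self.type_constraint(topic=value):
            raise TypeConstraintViolation(repr(value))
        return value


class ConstrainedVariable:
    """Hypothetical variable carrying its own transition_constraint."""

    def __init__(self, declared_type: RestrictionType, initial: Any,
                 transition_constraint: Callable[..., bool]) -> None:
        self.declared_type = declared_type
        self.transition_constraint = transition_constraint
        self.value = declared_type.select(initial)

    def assign(self, new_value: Any) -> None:
        after = self.declared_type.select(new_value)  # beginning of the statement
        if not self.transition_constraint(before=self.value, after=after):
            raise TransitionConstraintViolation(f"{self.value!r} -> {after!r}")
        self.value = after  # end of the statement: the update takes effect
```

For instance, a percentage variable that may only grow rejects 20 after holding 50 with a transition violation, and rejects 150 with a type violation before its transition rule is even consulted.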
This documentation is pending.
USERS AND PRIVILEGES
The Muldis D DBMS / virtual machine itself does not have its own set of named users where one must authenticate to use it. Rather, any concept of such users is associated with individual persistent repositories, such that you may have to authenticate in order to mount them within the virtual machine; moreover, there may be user-specific privileges for that repository that restrict what users can do in regards to its contents.
The Muldis D privilege system is orthogonal to the standard Muldis D constraint system, though both have the same effect of conditionally allowing or barring a command from executing. The constraint system is strictly charged with maintaining the logical integrity of the database, and so only comes into effect when an update of a repository or its contents is attempted; it usually ignores which users are attempting the changes. By contrast, the privilege system is strictly user-centric, and gates a lot of activities which don't involve any updates or threaten integrity.
The privilege system mainly controls, per user, what individual repository contents they are allowed to see / read from, what they are allowed to update, and what routines they are allowed to execute; it also controls other aspects of their possible activity. The concerns here are analogous to privileges on a computer's file system, or a typical SQL database.
This documentation is pending.
TRANSACTIONS AND CONCURRENCY
This official specification of the Muldis D DBMS includes full ACID compliance as part of the core feature set; moreover, all types of changes within a repository are subject to transactions and can be rolled back, including both data manipulation and schema manipulation; moreover, an interrupted session with a repository must result in an automatic rollback, not an automatic commit. (But changes that occur outside the DBMS environment, such as by a system_service, or by a host language routine, are generally not affected by transactions at all.)
It is important to point out that any attempt to implement Muldis D (what a Muldis DB Engine does) which does not include full ACID compliance, with all aspects described above, is not a true Muldis D implementation, but rather is at best a partial implementation, and should be treated with suspicion concerning reliability. Of course, such partial implementations will likely be made and used, such as ones implemented over existing DBMS products that are themselves not ACID compliant, but you should see them for what they are and weigh the corruption risks of using them.
Note that the best way for an Engine to behave, if for some reason it is built in such a way and/or over an existing DBMS product that does implicit commits after, say, data-definition statements, is for it to throw an exception if data-definition is attempted within an explicit / multi-statement transaction, such that a user of that Engine can only do data-definition outside of an explicit transaction; in this way, the Engine is still following all the Muldis D safety rules, and hence should be relatively safe to use, even if it lacks some Muldis D features.
Each individual instance of the Muldis D DBMS is a single process virtual machine, and conceptually only one thing is happening in it at a time; each individual Muldis D statement executes in sequence, following the completion or failure of its predecessor. During the life of a statement's execution, the state of the virtual machine is constant, except for any updates (and side-effects of such) that the statement makes. Breaking this down further, a statement's execution has 2 sequential phases; all reads from the environment are done in the first phase, and all writes to the environment are done in the second phase. Therefore, regardless of the complexity of the statement, and even if it is a multi-update statement, the final values of all the expressions to be assigned are determined prior to any target variables being updated. Moreover, as all functions may not have side-effects, and we don't support the concept of "trigger" routines that can perform updates, we avoid complicating the issue due to environment updates occurring during their invoker statement's first phase.
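The two-phase rule is what makes a multi-update statement such as a swap well defined. This hypothetical multi_update evaluates every right-hand side against a snapshot of the environment (phase 1) and only then writes all targets (phase 2).

```python
from typing import Callable, Dict

Env = Dict[str, int]


def multi_update(env: Env, assignments: Dict[str, Callable[[Env], int]]) -> None:
    """Hypothetical multi-update statement with two sequential phases."""
    # Phase 1: every expression reads the same, unchanged environment.
    snapshot = dict(env)
    new_values = {target: expr(snapshot) for target, expr in assignments.items()}
    # Phase 2: only now are the target variables written.
    env.update(new_values)
```

Starting from x = 1 and y = 2, assigning x from y and y from x swaps them; applying the same two assignments one after the other instead would leave both variables equal to 2.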
To account for situations where external processes are concurrently using the same persistent (and externally visible) repository as a Muldis D DBMS instance, the Muldis D DBMS will maintain a lock on the whole repository (or an appropriate subset thereof) during any active read-only or for-update transaction, to ensure that the transaction sees a consistent environment during its life. The lock is a shared lock if the transaction only reads, and an exclusive lock if the transaction also writes. Speaking in terms of SQL, the Muldis D DBMS supports only the serializable transaction isolation level.
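The shared/exclusive distinction is the classic readers-writer lock. The Python sketch below is hypothetical and deliberately non-blocking (each `try_` method just reports success or failure) to keep it short; a real Engine would wait or time out, and the specification does not say how the lock must be implemented.

```python
import threading


class RepositoryLock:
    """Hypothetical shared/exclusive lock over a whole repository."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the two fields below
        self._readers = 0               # number of shared-lock holders
        self._writer = False            # whether the exclusive lock is held

    def try_acquire_shared(self) -> bool:
        # Read-only transactions may share the lock unless a writer holds it.
        with self._mutex:
            if self._writer:
                return False
            self._readers += 1
            return True

    def try_acquire_exclusive(self) -> bool:
        # A writing transaction needs the repository to itself.
        with self._mutex:
            if self._writer or self._readers > 0:
                return False
            self._writer = True
            return True

    def release_shared(self) -> None:
        with self._mutex:
            if self._readers == 0:
                raise RuntimeError("shared lock not held")
            self._readers -= 1

    def release_exclusive(self) -> None:
        with self._mutex:
            if not self._writer:
                raise RuntimeError("exclusive lock not held")
            self._writer = False
```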
Note that there is currently no official support for using Muldis D in a multi-threaded application, where its structures are shared between threads, or where multiple thread-specific structures want to use the same repositories. But such support is expected in the future.
No multi-update statement may target both catalog and non-catalog variables. If you want to perform the equivalent of SQL's "alter" statement on a relation variable that already contains data, you must have separate statements to change the definition of the relation variable and change what data is in it, possibly more than one of each; the combination can still be wrapped in an explicit transaction for atomicity.
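A hypothetical Python check an Engine might run before executing a multi-update, assuming (as the entity-name hierarchy suggests) that catalog variables are exactly those whose names fall under the `cat` namespace. The function name is illustrative only.

```python
def check_multi_update_targets(targets):
    """Reject a multi-update whose targets mix catalog and non-catalog variables."""
    # Assumption: a target is a catalog variable iff its name is in the cat namespace.
    is_catalog = [t == "cat" or t.startswith("cat.") for t in targets]
    if any(is_catalog) and not all(is_catalog):
        raise ValueError(
            "a multi-update may not target both catalog and non-catalog variables"
        )
```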
Transactions can be nested, by starting a new one before concluding a previous one, and the parent-most transaction has the final say on whether all of its committed children actually have a final committed effect or not. There are no "autonomous transactions" within the DBMS.
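One simple way to model this nesting is a stack of snapshots, sketched below in Python. This is an illustrative technique, not how an Engine is required to work; it relies on stored values being immutable, which matches the Muldis D definition of a value.

```python
class NestedTransactions:
    """Hypothetical nested transactions over a dict of variables."""

    def __init__(self, state):
        self.state = state
        self._snapshots = []  # one saved copy per active transaction

    def begin(self):
        self._snapshots.append(dict(self.state))

    def commit(self):
        # Discard the snapshot: the changes now belong to the parent, which
        # can still undo them, so only the parent-most commit is final.
        self._snapshots.pop()

    def rollback(self):
        # Restore the state from when this transaction began; this also
        # undoes any children that had already committed into it.
        self.state.clear()
        self.state.update(self._snapshots.pop())
```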
Transactions in Muldis D come in both implicit and explicit varieties, but implicit transactions only exist (that is, only have an effect) when there is no explicit transaction active.
The most generalized way to specify an explicit transaction within Muldis D is to take the statements comprising it and isolate them into their own procedure (or inner_procedure), then invoke that by way of the system-defined try_catch procedure; the invoked procedure is conceptually an exception-trapping try block. A procedure invoked through try_catch is wrapped in a new child transaction that is tied to its lexical scope. The transaction begins when that scope is entered and ends when that scope is exited; if the scope is exited normally, its transaction commits; if the scope terminates early due to a thrown exception, its transaction rolls back. In the latter case, try_catch catches that exception, so the rollback doesn't proceed further than itself; however, if the catch block that it then invokes also throws (or re-throws) an exception, that exception is not caught here. In a pure Muldis D application, this lexically-scoped exception handling mechanism is the only kind of generalized explicit transaction.
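The semantics of try_catch can be sketched in Python, assuming a hypothetical dict-based environment, procedures modeled as Python callables, and Muldis D exceptions modeled as Python exceptions. This is not the real signature of the system-defined try_catch.

```python
from typing import Callable, Dict

Env = Dict[str, object]


def try_catch(env: Env, try_proc: Callable[[Env], None],
              catch_proc: Callable[[Env, Exception], None]) -> None:
    """Hypothetical model: run try_proc in a child transaction."""
    snapshot = dict(env)      # the child transaction begins on scope entry
    try:
        try_proc(env)         # normal exit: the changes are kept (commit)
    except Exception as exc:
        env.clear()           # early exit: roll back to the snapshot
        env.update(snapshot)
        catch_proc(env, exc)  # anything this raises is not caught here
```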
In a mixed-language application, when Muldis D routines are invoked by a host language, the host language is allowed to specify further parent-most explicit transactions within the DBMS that are not bound to the lexical scope of a block, using distinct transaction initiation and termination statements. Such open-ended transactions are intended for transactions that span multiple DBMS invocations by an application (whereas Muldis D scope-bound transactions always occur entirely within one invocation of the DBMS by a host language). Nonetheless, it is a recommended best practice for host language code to associate the invocation of said statements with its own lexical scopes, such as its own try-catch constructs.
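If the host language is Python, that best practice could look like the following context manager. The `start_trans`, `commit_trans`, and `rollback_trans` methods are hypothetical placeholders for whatever initiation and termination statements a given Engine binding exposes.

```python
from contextlib import contextmanager


@contextmanager
def dbms_transaction(dbms):
    """Tie an open-ended DBMS transaction to a host-language lexical scope.

    `dbms` is any object with hypothetical start_trans(), commit_trans()
    and rollback_trans() methods; these names are not Muldis D API.
    """
    dbms.start_trans()
    try:
        yield dbms
    except BaseException:
        dbms.rollback_trans()  # leaving the scope via an exception undoes the work
        raise
    else:
        dbms.commit_trans()    # leaving the scope normally commits it
```

Usage: `with dbms_transaction(conn): ...` commits when the block finishes and rolls back if it raises.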
An implicit transaction is associated with the lexical scope of every Muldis D update_operator and system_service, and by extension, with every Muldis D statement that is an invocation of one. More accurately, an update operation (including a multi-update operation) is implicitly atomic, and will either succeed and commit as a whole, or fail and roll back as a whole. This is as if every update operator invocation were surrounded by its own try block, except that any thrown exceptions are not caught. Similarly, every function and \w+_constraint is trivially a transaction, though since these never update anything, all that really means is that they see a consistent view of their environment.
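A Python sketch of that implicit atomicity, again assuming a hypothetical dict-based environment and an update operator modeled as a callable. The key difference from an explicit try block is the final `raise`: partial writes are undone, but the exception keeps propagating.

```python
from typing import Callable, Dict

Env = Dict[str, object]


def invoke_update_operator(env: Env, op: Callable[[Env], None]) -> None:
    """Hypothetical model: invoke an update operator all-or-nothing."""
    snapshot = dict(env)
    try:
        op(env)
    except BaseException:
        # Undo any partial writes, then let the exception keep propagating:
        # unlike try_catch, the implicit transaction does not trap it.
        env.clear()
        env.update(snapshot)
        raise
```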
By contrast, every procedure (and inner_procedure) and main is neither implicitly a transaction nor atomic (except when externally wrapped in one), so you can use a procedure to define an operation where you want to keep partial results of a failure.
Since failures are always accompanied by thrown exceptions, a failure will unwind the call stack and roll back any active transactions one nesting layer at a time, until either a try block is exited, which halts the unwinding, or the application exits, rolling back all remaining active transactions.
If no explicit transactions are active at all when a failure occurs, then each non-procedure-invoking statement in a procedure is a parent-most transaction, so a failure part-way through that procedure leaves the previously completed statements fully committed, and only the failed statement leaves no state change. At this point, a pure Muldis D application will have exited, and a mixed-language application will have either exited or caught an exception in a host-language try block.
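A Python sketch of this scenario, assuming a hypothetical dict-based environment and statements modeled as callables. Each statement is applied all-or-nothing on its own, but the procedure as a whole is not atomic.

```python
def run_procedure(env, statements):
    """Hypothetical model: run a procedure with no explicit transaction active.

    Each statement is its own parent-most transaction: it is applied
    all-or-nothing, but earlier statements stay committed if a later one fails.
    """
    for statement in statements:
        snapshot = dict(env)
        try:
            statement(env)
        except BaseException:
            env.clear()
            env.update(snapshot)  # undo only the failing statement
            raise                 # then keep unwinding, uncaught
```

If the third of three statements fails, the writes of the first two remain in `env` and the exception reaches the caller.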
All currently mounted repositories (persistent and temporary both) are joined at the hip with respect to transactions: a commit or rollback is performed on all of them simultaneously, and a commit either succeeds for all or fails for all (a repository suddenly becoming inaccessible counts as a failure). Note that if a Muldis D implementation can not guarantee such synchronization between multiple repositories, then it must refuse to mount more than one repository at a time under the same virtual machine (users can still employ multiple virtual machines that are not synchronized with each other); by taking one of these two approaches, a less capable implementation can still be considered reliable and recommendable.
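A common technique for an all-or-nothing commit across several stores is two-phase commit, sketched below in Python. The specification does not mandate this technique, and the `Repository` class is hypothetical; a production version would also need durable prepare records and recovery after a crash between the two phases.

```python
class Repository:
    """Hypothetical repository supporting a prepare step before commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # False simulates an inaccessible repository
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: promise that commit() will succeed, or refuse.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def commit_all(repositories) -> bool:
    """Commit every mounted repository, or none of them."""
    # Phase 1: ask each repository to prepare; all() stops at the first refusal.
    if all(repo.prepare() for repo in repositories):
        # Phase 2: everyone promised success, so commit everywhere.
        for repo in repositories:
            repo.commit()
        return True
    for repo in repositories:
        repo.rollback()
    return False
```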
Some Muldis D commands can not be executed within the context of a parent transaction; in other words, they can only be executed directly by a procedure etc. or by the host language. The main examples are those that mount or unmount a persistent repository, because such a change in the environment mid-transaction would result in an inconsistent state.
Muldis D lets you explicitly place locks on resources that you don't want external processes to change out from under you. Whether these locks automatically expire when transactions end has not yet been decided; this feature has to be thought out more.
This documentation is pending.
ENTITY NAMES
All entities that exist at some given time within a DBMS environment can be explicitly referenced in some manner for definition and/or use; there are no orphans. At the very least, every kind of DBMS entity is defined in one or more catalog relvars; its interface and/or implementation can be observed and possibly updated therein.
Note that the following namespaces assume that a program written in Muldis D may execute either standalone or as a peer-to-peer process that can have its global variables made visible to other processes, or have other processes' global variables made visible to it. In other words, the program can both manage its own dbvars and be a DBMS client, and the program can either just use the DBMS itself or be a server of it.
Note that all entity names in Muldis D are case-sensitive, as with character strings in general. Implementations should take special care to compensate for any case-insensitive storage system they might use.
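One way an implementation might compensate for a case-insensitive store is to map names through a reversible encoding that never relies on letter case. The Python sketch below is a hypothetical scheme, not something the specification prescribes: it escapes `_` as `__` and each uppercase letter as `_` plus its lowercase form. It handles ASCII letters only; non-ASCII case folding would need a fuller scheme.

```python
def encode_name(name: str) -> str:
    """Map a case-sensitive ASCII name to a lowercase-only storage name."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")              # escape the escape character itself
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())  # mark each uppercase letter
        else:
            out.append(ch)
    return "".join(out)


def decode_name(stored: str) -> str:
    """Invert encode_name (input is assumed to come from encode_name)."""
    out = []
    i = 0
    while i < len(stored):
        if stored[i] == "_":
            nxt = stored[i + 1]
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(stored[i])
            i += 1
    return "".join(out)
```

With this scheme `Foo` is stored as `_foo` and `foo` as `foo`, so the two names stay distinct even in storage that ignores case.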
This is the hierarchy of invocation namespaces of DBMS entities:
    cat # system catalog describing everything; all but .system updateable
        cat.system
        cat.native
        cat.mount
        cat.foreign
        cat.interp

    sys # system-defined types and routines
        sys.Core
            sys.Core.<package>.<type>
            sys.Core.<package>.<routine>
        sys.<extension>
            sys.<extension>.<package>.<type>
            sys.<extension>.<package>.<routine>

    app # user-defined entities local to this vm/app, not a depot
        app.<var>
        app.<type>
        app.<routine>
        app.<package>
            app.<package>.<type>
            app.<package>.<routine>

    glo # global namespace to group currently mounted depots w mount names
        glo.<depot>(.<schema>){0,}
            glo.<depot>(.<schema>){0,}.<relvar>
            glo.<depot>(.<schema>){0,}.<type>
            glo.<depot>(.<schema>){0,}.<routine>
            glo.<depot>(.<schema>){0,}.<package>
                glo.<depot>(.<schema>){0,}.<package>.<type>
                glo.<depot>(.<schema>){0,}.<package>.<routine>

    dep # entities in a depot ref their own depot with this
        dep(.<schema>){0,}
            dep(.<schema>){0,}.<relvar>
            dep(.<schema>){0,}.<type>
            dep(.<schema>){0,}.<routine>
            dep(.<schema>){0,}.<package>
                dep(.<schema>){0,}.<package>.<type>
                dep(.<schema>){0,}.<package>.<routine>

    sch # entities in a schema ref their own schema with this
        sch.<relvar>
        sch.<type>
        sch.<routine>
        sch.<package>
            sch.<package>.<type>
            sch.<package>.<routine>

    pkg # entities in a package ref their own package with this
        pkg.<type>
        pkg.<routine>

    inn # entities in a main|inner routine ref child|sib rtns with this
        inn.<routine>

    lex # entities in a rtn ref own lexical params|exprs|vars with this
        lex.param.<param>
        lex.expr.<expr>
        lex.var.<var>
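The comments in this hierarchy suggest that a `dep` or `sch` name is a relative spelling of a `glo` name. The Python sketch below makes that reading concrete under an assumption not stated in the specification: that inside a depot mounted as `<depot>`, `dep.X` means `glo.<depot>.X`, and `sch.X` means `glo.<depot>.<schema path>.X`. The function and its parameters are hypothetical.

```python
def resolve_name(name: str, depot: str, schema: tuple = ()) -> str:
    """Hypothetically rewrite a depot- or schema-relative name as a glo.* name.

    `depot` is the mount name of the depot containing the referencing
    entity, and `schema` is the path of its schema within that depot.
    """
    root, _, rest = name.partition(".")
    if root == "dep":
        parts = [depot]
    elif root == "sch":
        parts = [depot, *schema]
    else:
        return name  # cat, sys, app, glo, pkg, inn, lex: unchanged in this sketch
    if rest:
        parts.append(rest)
    return "glo." + ".".join(parts)
```

For example, an entity in schema `sales` of the depot mounted as `mydb` could write either `dep.sales.orders` or `sch.orders`; both resolve to `glo.mydb.sales.orders`.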
This documentation is pending.
SEE ALSO
Go to Language::MuldisD for the majority of distribution-internal references, and Language::MuldisD::SeeAlso for the majority of distribution-external references.
AUTHOR
Darren Duncan (perl@DarrenDuncan.net)
LICENSE AND COPYRIGHT
This file is part of the formal specification of the Muldis D language.
Muldis D is Copyright © 2002-2007, Darren Duncan.
See the LICENSE AND COPYRIGHT of Language::MuldisD for details.
ACKNOWLEDGEMENTS
The ACKNOWLEDGEMENTS in Language::MuldisD apply to this file too.