=pod
=encoding utf8
=head1 NAME
Language::MuldisD::Basics -
10,000 Mile View of Muldis D
=head1 VERSION
This document is Language::MuldisD::Basics version 0.7.0.
=head1 PREFACE
This document is part of the Muldis D language specification, whose root
document is L<Language::MuldisD>; you should read that root document
before you read this one, which provides subservient details.
=head1 DESCRIPTION
This document provides a 10,000 mile view of the Muldis D language. It
provides the basics of how the language is designed and works, as a
foundation upon which to understand the other parts of the language spec.
=head1 NOTES ON TERMINOLOGY
There are a few terms that the Muldis D documentation uses which may have
different meanings than what you may be used to, so here are a few notes to
clarify what they mean in this document. Similarly, there are some terms
used in the industry that are expressly not used here, so as to avoid the
confusion that their commonly attributed meanings could cause.
=over
=item type / data type
The term I<type> as a noun always refers to a I<data type>; the term is not
used to indicate classifications of other things; eg, I<kind> or other
terms will be used for such instead, to avoid confusion. The terms
I<class> and I<domain> are not used in this documentation to mean I<type>.
=item value, variable, constant
A I<value> is unique, eternal, immutable, and is not fixed in time or space
(it has no address). A I<variable> is fixed in time and space (it does
have an address); it holds an appearance of a value; it is neither unique
nor eternal nor immutable in the general case. A I<constant> is a variable
which is defined to not mutate after initially being set. Terms like
I<object> are not used in this documentation for any aspects of Muldis D
since their meaning in practice is both ambiguous and wide-reaching, and
could refer to both values and variables depending on usage context.
=item text, character
A I<text> is a string composed of Unicode characters, where a I<character>
is an abstract concept that usually is a I<grapheme> or
I<language-independent grapheme>, but could potentially be a I<codepoint>
or I<language-specific grapheme>. This documentation only uses the term
I<character> in an abstract sense, and no part of the Muldis D API is
defined using that term. Rather, any operators or constraints that work
with sub-strings of text will be specified in terms like I<NFC grapheme>.
=item tuple
A I<tuple> is an unordered heterogeneous collection of 0..N elements that
are keyed by the element's name; each element is a name-value pair, and all
names in the tuple are distinct. While I<tuple> legitimately refers to the
same thing as the Muldis D term I<sequence> in other contexts, it does
not in this documentation. Terms like I<record> or I<row> are not used in
this documentation, the latter in particular because it implies ordering.
=item relation, relvar, relcon
A I<relation> is like an unordered homogeneous set of I<tuples> where all
member tuples have identical degree and name-sets, except that a relation
data type knows what its allowed names are even if it has no tuples. Like with
I<tuple>, the term I<relation> legitimately refers to a set or "ordered
tuple" in other contexts, but it does not in this documentation. Terms
like I<record set> or I<row set> or I<table> are not used in this
documentation, the last 2 in particular because they imply a significance
to the order of tuples, where there is none in a relation. Moreover, the
term I<domain> does not mean the same thing as I<relation>, and neither
does the term I<function>; those terms have distinct meanings here. Note
that the term I<relvar> is short hand for I<relation-typed variable>, and
I<relcon> is short hand for I<relation-typed constant>. Note also that a
I<relational database> is called that I<because> it is composed of
relations, and I<not> just because its relations can be joined or be
associated through foreign key constraints.
=item function
A I<function> is a routine whose invocation is used as a value expression,
and it conceptually serves as a map between the domains of its parameters
and its result value. A I<function> is not the same as a I<relation>,
though both can be used as maps between values. Besides their conceptual
difference in Muldis D as a routine vs a value, a selected I<relation>
value in Muldis D is always finite, and hence so is the cardinality of
the map it can provide; whereas, a function can have an infinite map size.
=item database / relational database, dbvar, dbcon
Within this documentation, the actually more generic term I<database> will
be used to refer exclusively to a I<relational database>, so you should
read the former as if it were the latter. A I<database> is a tuple, all of
whose (distinctly named) attributes are each relation-typed or
database-typed (a recursion whose leaves are all relations); one holds all
user data that is being maintained as an interconnected unit. A
database-typed variable, aka a I<dbvar>, is managed by a DBMS/RDBMS, and
such is what is more informally referred to outside this documentation as a
"database". Whenever a user is "using a database", they are reading or
updating a dbvar. Examples of databases are genealogy records, financial
records, and a CMS' data. A I<database> is I<not> a program. A
database-typed constant is a I<dbcon>.
=item catalog
A I<catalog> is a special kind of dbvar or dbcon whose relations hold
meta-data about the normal databases that hold user data (and about
themselves too); updating a catalog dbvar has the side-effect of changing
the structure of the associated normal database. This meta-data describes
all user-defined data types and operators, plus base and viewed relations,
stored with and used with the database.
=item depot / repository
A I<depot> or I<repository> is a local abstraction of a typically external
storage system which holds 1 database variable and 1 associated catalog,
plus perhaps other details that assist the mapping of the abstraction to
the actuality.
=item DBMS / RDBMS
Within this documentation, the actually more generic term I<DBMS> will be
used to refer exclusively to an I<RDBMS> (Relational Database Management
System), so you should read the former as if it were the latter. An
I<RDBMS> is a computer program that manages relational database variables,
associated catalogs, and depots in general. Muldis D aspires to or does
define one, and various other I<TTM>-inspired programs like Rel and Duro
likewise are RDBMSs; most other DBMS-like programs are technically
non-relational, including all SQL DBMSs such as Oracle, PostgreSQL and
SQLite, though they usually give lip-service to the relational data model
and approximate an RDBMS to varying degrees.
=item sequence
Within this documentation, a I<sequence> generically refers to an ordered
collection of 0..N elements. The term I<array> is not used in this
documentation because that word's actual meaning is more broad, and
includes both matrices plus unordered collections of name-value pairs.
Note that a sequence may be used simply to maintain a simple collection in
order, though the actual order of its elements may not always be
significant. Sometimes I<sequence> also refers specifically to the C<Seq>
data type, which is a particular binary relation.
=item selector
A I<selector> is a routine that captures an appearance of a value for use
in a variable or expression. The term I<constructor> is not used in this
documentation because all values in Muldis D are conceptually eternal and
immutable, so it does not make sense to say that we are "building" one; we
are "selecting" one.
=item fail
Within this documentation, if a routine is said to I<fail> under some
circumstance, such as with certain arguments, that can mean the routine
throws an exception at runtime, or fails to compile in the first place
(which is a thrown exception at compile time), or both; the latter is more
likely to happen if the compiler can detect that certain arguments will
always be unacceptable, and the former usually happens only if a problem
likely cannot be caught at compile time.
=back
I<This documentation is pending.>
=head1 INTERPRETATION OF THE RELATIONAL MODEL
The relational model of data is based on predicate logic and set theory.
The model assumes that all data is represented as mathematical N-ary
I<relations>, an N-ary relation being a subset of the cartesian product of
N I<data types>. Reasoning about such data is done in two-valued predicate
logic, meaning there are 2 possible evaluations for each proposition,
either I<true> or I<false>.
The basic relational building block is the data type, which can consist of
either scalar values or values of more complex types. A I<tuple> is an
unordered set of I<attributes>, each of which has a name and a declared
data type; an attribute value is a specific valid value for the type of the
attribute. An N-relation is defined as an unordered set of N-tuples, and
the tuples comprise the I<body> of the relation; the relation has a
I<heading>, which is a set of attribute definitions (their names and
types); this heading is also the heading of each of its tuples.
A heading represents a predicate, and there is a one-to-one correspondence
between the free variables of the predicate and the attribute names of the
heading. The body of a relation represents the set of true propositions
that can be formed from the predicate represented by the relation's
heading. The body of a tuple with the same heading provides attribute
values to instantiate the predicate into a proposition by substituting each
of its free variables. When a tuple appears in a relation body, the
proposition it represents is deemed to be true. Contrariwise, for every
tuple whose heading is the same as the relation's but does not appear in
the relation body, its proposition is deemed to be false. This assumption
is known as the I<closed world assumption>.
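The heading, body, and closed world assumption described above can be sketched in Python. This is only an informal illustration and not Muldis D code; the names C<heading>, C<make_tuple>, and C<proposition_is_true> are hypothetical, and a tuple is assumed to be modelled as a frozenset of name-value pairs so that it is unordered and hashable.

```python
# Hypothetical model: the heading maps attribute names to declared types,
# a tuple is a frozenset of (name, value) pairs, and a relation body is a
# set of such tuples.
heading = {"name": str, "age": int, "city": str}

def make_tuple(**attrs):
    """Build an unordered tuple after checking it against the heading."""
    if set(attrs) != set(heading):
        raise ValueError("attribute names must match the heading")
    for attr_name, value in attrs.items():
        if not isinstance(value, heading[attr_name]):
            raise TypeError(f"wrong type for attribute {attr_name}")
    return frozenset(attrs.items())

people = {
    make_tuple(name="John", age=32, city="Vancouver"),
    make_tuple(name="Andy", age=46, city="Toronto"),
    make_tuple(name="Julia", age=27, city="Halifax"),
}

def proposition_is_true(relation_body, tup):
    """Closed world assumption: true if and only if the tuple is in the body."""
    return tup in relation_body

# Attribute order does not matter, since tuples are unordered.
in_body = proposition_is_true(
    people, make_tuple(city="Toronto", age=46, name="Andy"))
# Same heading, but the tuple is absent, so its proposition is deemed false.
not_in_body = proposition_is_true(
    people, make_tuple(name="Andy", age=47, city="Toronto"))
```

Using a set for the body means a tuple either appears or does not, which matches the two-valued logic described above.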
The relational model specifies that data is operated on by means of a
relational calculus or a relational algebra. These 2 are logically
equivalent; for any expression in the relational calculus, there is an
equivalent one in the relational algebra, and vice versa. Relational
algebra, an offshoot of first-order logic, is a set of relations closed
under operators; each operator takes N relations as arguments and results
in a relation. While the relational algebra provides a more procedural way
for specifying database queries, in contrast the relational calculus
provides a more declarative way for specifying queries.
=head2 Mechanics of Some Relational Operations
This documentation section takes a very informal (and possibly blatantly
incorrect) alternate approach to describing the nature of relations,
tuples, and attributes, within the context of explaining the mechanics of
how some relational operations work in practice.
Herein, we shall conceptualize a relation as a long boolean expression,
consisting of a string of basic boolean-valued expressions that are
selectively anded or ored together. A basic boolean-valued expression, C<<
<attr> >>, takes the form C<< attribute <name> is <value> >>. Each tuple
body, C<< <tuple> >>, in the relation takes the form of a chained C<and>
that connects N C<< <attr> >>, one per each attribute in the relation, and
each having a distinct C<< <name> >>. The relation body takes the form of
a chained C<or> that connects N C<< <tuple> >>, one per each tuple in the
relation, and each C<< <tuple> >> has the same set of C<< <name> >> as the
others, but the set of C<< <value> >> that each C<< <tuple> >> has is
distinct.
Take, for example, a relation having some details about people, where each
attribute is a kind of detail and each tuple has details for one person:

    name is John and age is 32 and city is Vancouver
    or name is Andy and age is 46 and city is Toronto
    or name is Julia and age is 27 and city is Halifax
    etc...

Or a multi-relation example involving suppliers, foods, and shipments:

    farm is Hodgesons and country is Canada
    or farm is Beckers and country is England
    or farm is Wickets and country is Canada

    food is Bananas and colour is yellow
    or food is Carrots and colour is orange
    or food is Oranges and colour is orange
    or food is Kiwis and colour is green
    or food is Lemons and colour is yellow

    farm is Hodgesons and food is Kiwis and qty is 100
    or farm is Hodgesons and food is Lemons and qty is 130
    or farm is Hodgesons and food is Oranges and qty is 10
    or farm is Hodgesons and food is Carrots and qty is 50
    or farm is Beckers and food is Carrots and qty is 90
    or farm is Beckers and food is Bananas and qty is 120
    or farm is Wickets and food is Lemons and qty is 30

Now a very simple pair of relations:

    x is 4 and y is 7
    or x is 3 and y is 2

    y is 5 and z is 6
    or y is 2 and z is 1
    or y is 2 and z is 4

Next, a few common fundamental relational operations are briefly
introduced: projection, join, and union.
A projection of a relation derives a relation that has a subset of the
original's attributes, and all of its tuples. Continuing the boolean
expression analogy, the projected relation contains fewer C<< and <attr> >>
than the original. For example, let's take the projection of the C<food>
attribute from the shipments relation, to get, initially:

    food is Kiwis
    or food is Lemons
    or food is Oranges
    or food is Carrots
    or food is Carrots
    or food is Bananas
    or food is Lemons

Now, the above expression can be simplified because it now contains
redundancies, and the simplified version is logically identical:

    food is Kiwis
    or food is Lemons
    or food is Oranges
    or food is Carrots
    or food is Bananas

So this projected relation has 5 tuples rather than the original 7, and
saving logical redundancy is why relations never have duplicate tuples.
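The projection just described can be sketched in Python, as an informal illustration rather than Muldis D code. The helper names C<tup> and C<project> are hypothetical, and each tuple is assumed to be a frozenset of name-value pairs; because the result is a Python set, duplicate tuples collapse automatically.

```python
def tup(**attrs):
    """Hypothetical helper: an unordered tuple as a frozenset of pairs."""
    return frozenset(attrs.items())

def project(relation_body, attr_names):
    """Keep only the named attributes of every tuple; the set drops duplicates."""
    return {
        frozenset((name, value) for name, value in t if name in attr_names)
        for t in relation_body
    }

shipments = {
    tup(farm="Hodgesons", food="Kiwis", qty=100),
    tup(farm="Hodgesons", food="Lemons", qty=130),
    tup(farm="Hodgesons", food="Oranges", qty=10),
    tup(farm="Hodgesons", food="Carrots", qty=50),
    tup(farm="Beckers", food="Carrots", qty=90),
    tup(farm="Beckers", food="Bananas", qty=120),
    tup(farm="Wickets", food="Lemons", qty=30),
}

foods = project(shipments, {"food"})
# Sorted only for display; the relation itself has no order.
food_names = sorted(dict(t)["food"] for t in foods)
```

Here C<foods> holds 5 tuples even though C<shipments> holds 7, matching the simplified expression above.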
A join of 2 relations derives a relation that has all of the originals'
attributes, and its set of tuples is fundamentally the cartesian product of
those of the originals. Following our boolean analogy, we start off by
pairwise connecting instances of every C<< <tuple> >> of the first relation
with instances of every C<< <tuple> >> of the second one, with the members
of each pair then being chained together with C<and> to form a single,
longer chain of C<and>. Note that join is commutative, so it doesn't
matter which of the source relations is first or second, the result is the
same, as much as C<foo and bar> is the same as C<bar and foo>. For
example, let's do a join of our 2 simplest relations:

    x is 4 and y is 7 and y is 5 and z is 6
    or x is 4 and y is 7 and y is 2 and z is 1
    or x is 4 and y is 7 and y is 2 and z is 4
    or x is 3 and y is 2 and y is 5 and z is 6
    or x is 3 and y is 2 and y is 2 and z is 1
    or x is 3 and y is 2 and y is 2 and z is 4

Now, when multiple relations are connected into one such as with a join,
the relational model assumes that if the sources have attributes with the
same names as each other, then those attributes are describing the same
things. In this case, the references to attribute C<y> from both relations
are talking about the same C<y>. And so, any result tuples that contradict
themselves, saying that C<y> equals both one value and a different one,
can't ever be true and are eliminated; only the tuples where the C<y>
value is identical are kept:

    x is 3 and y is 2 and y is 2 and z is 1
    or x is 3 and y is 2 and y is 2 and z is 4

Moreover, this expression can be simplified by removing the redundant C<y>
attribute:

    x is 3 and y is 2 and z is 1
    or x is 3 and y is 2 and z is 4

All attributes in a relation have distinct names. And if there were any
identical tuples, the redundant ones would be eliminated.
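The three steps of the join above (pair every tuple, discard contradictory pairs, merge the shared attributes) can be sketched in Python. This is an informal illustration under the same assumptions as the prose analogy; the helper names C<tup> and C<join> are hypothetical and not part of Muldis D.

```python
def tup(**attrs):
    """Hypothetical helper: an unordered tuple as a frozenset of pairs."""
    return frozenset(attrs.items())

def join(r1, r2):
    """Natural join: pair every tuple of r1 with every tuple of r2, keep the
    pairs that agree on all shared attribute names, and merge each pair."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            shared = d1.keys() & d2.keys()
            if all(d1[name] == d2[name] for name in shared):
                # Merging the dicts removes the redundant shared attributes.
                result.add(frozenset({**d1, **d2}.items()))
    return result

xy = {tup(x=4, y=7), tup(x=3, y=2)}
yz = {tup(y=5, z=6), tup(y=2, z=1), tup(y=2, z=4)}

xyz = join(xy, yz)
# Join is commutative, so the argument order makes no difference.
zyx = join(yz, xy)
```

The nested loop is the cartesian product of the analogy; a real implementation would be free to pick a more efficient strategy.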
A join operation has several trivializing scenarios. If the 2 source
relations have no attribute names in common, the result is simply the
cartesian product. If the 2 sources have all their attribute names in
common, the result is the common subset or intersection of their existing
sets of tuples. If one source has all the attributes of the other, but the
reverse isn't true, then the result is a subset of tuples from the relation
that has more attributes; this is a semijoin.
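The three trivializing scenarios can be checked with a small Python sketch. The C<join> function here is a hypothetical model of a natural join over sets of frozenset tuples, and the sample values other than the C<x>/C<y> relation are made up for illustration.

```python
def tup(**attrs):
    """Hypothetical helper: an unordered tuple as a frozenset of pairs."""
    return frozenset(attrs.items())

def join(r1, r2):
    """Natural join over relation bodies held as sets of frozenset tuples."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[n] == d2[n] for n in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# No attribute names in common: the result is the cartesian product.
xs = {tup(x=1), tup(x=2)}
zs = {tup(z=10), tup(z=20), tup(z=30)}
product = join(xs, zs)

# All attribute names in common: the result is the intersection.
a = {tup(y=1), tup(y=2), tup(y=3)}
b = {tup(y=2), tup(y=3), tup(y=4)}
common = join(a, b)

# One source has all the attributes of the other: a semijoin, which yields
# a subset of the tuples of the relation that has more attributes.
xy = {tup(x=4, y=7), tup(x=3, y=2)}
ys = {tup(y=2)}
matching = join(xy, ys)
```

All three results come from the one general definition; no special-case code is needed.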
A union of 2 relations, which requires that the 2 relations have the same
headings, derives another relation with the same heading, and a union of
the two's set of tuples as its body, with any duplicates eliminated. In
terms of our boolean analogy, a union is simply chaining together the
entirety of each relation's boolean expression with an C<or>, and then
eliminating redundancies from the result.
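A union can be sketched in Python in the same informal style. The helper names C<tup>, C<names_in>, and C<union> are hypothetical; this sketch compares attribute names only, not declared types, and an empty body carries no heading information in this simplified model.

```python
def tup(**attrs):
    """Hypothetical helper: an unordered tuple as a frozenset of pairs."""
    return frozenset(attrs.items())

def names_in(relation_body):
    """The distinct attribute-name sets found among the body's tuples."""
    return {frozenset(name for name, _ in t) for t in relation_body}

def union(r1, r2):
    """Union of two relation bodies that share a heading; duplicates vanish."""
    if len(names_in(r1) | names_in(r2)) > 1:
        raise ValueError("union requires both relations to have the same heading")
    return r1 | r2

canadian = {
    tup(farm="Hodgesons", country="Canada"),
    tup(farm="Wickets", country="Canada"),
}
others = {
    tup(farm="Beckers", country="England"),
    tup(farm="Wickets", country="Canada"),   # also present in canadian
}

all_farms = union(canadian, others)
```

The tuple for Wickets appears in both sources but only once in C<all_farms>, which therefore has 3 tuples.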
A full list of all the relational operators having more formal (but Muldis
D specific) descriptions occurs in the L<Language::MuldisD::Core>
document; that list does I<not> use the aforementioned boolean analogies.
=head1 MULDIS D
Muldis D is a computationally / Turing complete (and industrial strength)
high-level programming language with fully integrated database
functionality; you can use it to define, query, and update relational
databases. The language's paradigm is a mixture of declarative,
functional, imperative, and object-oriented. It is primarily focused on
providing reliability, consistency, portability, and ease of use and
extension. (Logically, speed of execution can not be declared as a Muldis
D quality because such a quality belongs to an implementation alone;
however, the language should lend itself to making fast implementations.)
The language is rigorously defined and requires users to be explicit, which
leaves little room for ambiguity and related bugs. When something is
specified in Muldis D, its semantics should be well known and fully
portable (not implementation dependent). If a conforming implementation
(usually a Muldis DB Engine class) can't provide a specified behaviour,
code using it will refuse to run at all, rather than silently changing its
semantics; this also helps users to avoid bugs. Moreover, Muldis D
generally disallows any details of an implementation's "physical
representation" or other internals to leak through into the language; eg,
there is no "varchar" vs "char", simply "text". Users should not have to
know about this level of detail, and implementers should be free to
adaptively pick optimum ways to satisfy user requests, and change later.
Muldis D, being first and foremost a data processing language, provides a
thorough means to both introspect and define all DBMS entities using just
data processing operators, which is called the DBMS "catalog". The catalog
is a set of system-defined relvars (relation-typed variables) which reflect
the definitions of DBMS entities; users can generally update these to
create, alter, or drop DBMS entities. In fact, updating the catalog
relvars is the fundamental way to do data-definition tasks in Muldis D,
and any other provisions for data-definition are conceptually abstractions
of this. Generally speaking, users can do absolutely everything in the
DBMS with just data querying and updating operations.
The design and various features of Muldis D go a long way to help both its
users and implementers alike. A lot of flexibility is afforded to
implementers of the language to be adaptive to changing constraints of
their environment and deliver efficient solutions. This also makes things
a lot easier for users of the language because they can focus on the
meaning of their data rather than worrying about implementation details;
users can focus on defining what needs to be accomplished rather than how
to accomplish that, which relieves burdens on their creativity, and saves
them time. In short, this system improves everyone's lives.
What users fundamentally write are Muldis D "routines", each consisting of
one or more "statements", and in executing these, all work is done.
=head2 Representation
Muldis D has 2 closely corresponding main representation formats, which are
called I<Concrete Muldis D> and I<Abstract Muldis D>; these are analogous
to the natural code strings of a typical programming language, and the
abstract syntax trees that they naturally parse into, respectively.
Concrete Muldis D is the natural form that one would code in if they were
writing a self-contained application (or component) in Muldis D which was
compiled using a separate process into its own executable (or library),
which includes situations where Muldis D is its own Parrot
(L<http://www.parrotcode.org/>) hosted language (a prospect which is
desired to be implemented in the near future). Concrete Muldis D would
also be used by an interactive shell interface over the Muldis DB
(specifically L<Muldis::DB::Interface>) implementation of Muldis D, when
users submit commands at runtime, or in any other situation where it makes
sense to take input in that form.
Abstract Muldis D is the natural form that one would code in if they were
primarily writing their application in a separate host language, such as
Perl, and any Muldis D code was being specified in terms of host language
code, such as Perl arrays, hashes, and scalars. Abstract Muldis D code
consists of quasi-hierarchical but actually relational collection values,
typically catalog tuples. Abstract Muldis D is the only representation
format used by the API of the Muldis DB (specifically
L<Muldis::DB::Interface>) implementation of Muldis D, and is what any Perl
code typically should be using. When generating Muldis D code from
arbitrary Perl data structures (which includes the work of, eg, SQL DBMS
emulators), the Abstract form is the easiest to use and the least error
prone since no values have to be escaped or stitched together as strings,
which prevents many injection security holes. Abstract Muldis D is also
what is used when Muldis D code is defined to generate/prepare and execute
other Muldis D code at runtime (by reading or updating the meta-model /
system catalog), which is "data definition".
See L<Language::MuldisD::Core> first for details of the Muldis D
meta-model, which is also the grammar of Abstract Muldis D; see
L<Language::MuldisD::Grammar> for the grammar of Concrete Muldis D; the
latter document says how to parse Concrete Muldis D into Abstract Muldis D;
the former document explains the meaning of both in terms of the Abstract;
see L<Language::MuldisD::PerlHosted> for Perl Hosted Abstract Muldis D.
=head1 TYPE SYSTEM
The Muldis D type system is a formal type system, at least in intent, and
works conceptually in the following manner.
There is a single universal value set/domain, named C<Universal>, whose
members are all the values that can possibly exist; C<Universal> is the
maximal data type of the entire type system. Also there is a single
nullary value set/domain, named C<Empty>, which has zero members; C<Empty>
is the minimal data type.
All Muldis D data values as individuals are eternal and immutable. All
values are logically distinct, and each value occurs exactly once, and is
not fixed within time or space (so doesn't have an "address"). It does not
make sense to say that you are creating or destroying or copying or
mutating a I<value>. However, an eternal immutable value can make an
I<appearance> within a I<variable>, as a variable I<is> a named/addressable
container that is fixed within time and space, and it can be created,
destroyed, mutated, and multiple variables can hold appearances of the same
value. So when one appears to be testing 2 values for equality, they are
actually testing whether 2 value appearances are in fact the same value.
Given that all data values in Muldis D are fundamentally immutable, the
term "selector" is used to describe a routine that captures an appearance
of a value into a variable (or for use in a value expression); this is
analogous to the task that a "constructor" routine does in a typical
object-oriented language, except that a selector is conceptually "selecting"
an eternally existing value rather than conceptually "creating" a new one.
In the Muldis D type system, a I<data type> is a set of values, and as with
individual values, a data type is eternal and immutable. Every data type
is distinct from all other data types, and no 2 data types may encompass
exactly the same set of values. Every data type other than C<Universal>
and C<Empty> has at least 1 member value, and at most 1 less value than the
universal set. If 2 data types have no values in common, they are said to
be I<disjoint>.
Given 2 arbitrary data types, I<T1> and I<T2>, I<T1> is called a
I<supertype> of I<T2> if its value set is a superset of that of I<T2>, and
in that situation, I<T2> is a I<subtype> of I<T1>, as its value set is a
subset of that of I<T1>. Note that every type includes itself as its own
supertype and subtype, in which case, the I<T1> and I<T2> of the previous
example are the same type. By contrast, if I<T1> and I<T2> are explicitly
different types but otherwise have that relationship, then I<T1> has at
least 1 value that I<T2> doesn't have, in which case I<T1> is also called a
I<proper supertype> of I<T2>, and I<T2> is also called a I<proper subtype>
of I<T1>. Given those last examples, I<T1> is a I<more general> type, and
I<T2> is a I<more specific> type. In this way, the system-defined
C<Universal> type is a proper supertype of all other types, and the
system-defined C<Empty> type is a proper subtype of all other types. Now,
if no data type I<T3> exists which is both a proper subtype of I<T1> and a
proper supertype of I<T2>, then I<T1> is an I<immediate supertype> of
I<T2>, and I<T2> is an I<immediate subtype> of I<T1>. Note that the
Muldis D type system supports multiple inheritance, so types can form a
lattice rather than a tree.
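Since a data type is a set of values, these relationships reduce to set comparisons. The following Python sketch uses a small, finite, hypothetical family of types (C<SMALL>, C<EVEN>, C<SMALL_EVEN>) purely for illustration; real Muldis D types are generally not finite sets of integers.

```python
# Hypothetical finite model: each type is a frozenset of values.
UNIVERSAL = frozenset(range(8))      # stands in for Universal
EMPTY = frozenset()                  # stands in for Empty
SMALL = frozenset({0, 1, 2, 3})
EVEN = frozenset({0, 2, 4, 6})
SMALL_EVEN = SMALL & EVEN            # {0, 2}

TYPES = {UNIVERSAL, EMPTY, SMALL, EVEN, SMALL_EVEN}

def is_supertype(t1, t2):
    """Every type is a supertype (and subtype) of itself."""
    return t1 >= t2

def is_proper_supertype(t1, t2):
    """t1 has every value of t2 plus at least 1 more."""
    return t1 > t2

def is_immediate_supertype(t1, t2, types=TYPES):
    """No type T3 in the family sits strictly between t1 and t2."""
    return t1 > t2 and not any(t1 > t3 > t2 for t3 in types)

# SMALL_EVEN has 2 immediate supertypes, so the family is a lattice, not a tree.
parents_of_small_even = {t for t in TYPES if is_immediate_supertype(t, SMALL_EVEN)}
```

In this model C<UNIVERSAL> is a proper supertype of C<SMALL_EVEN> but not an immediate one, because C<SMALL> and C<EVEN> each lie between them.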
Every value has at most a single I<most specific type> (or I<MST>), which
is cited as the general answer to the question "what is this value's type".
The MST of a value is the data type containing that value which has no
proper subtypes that also contain that value. Moreover, to enforce the "at
most a single" requirement, which keeps answering the question a simple
affair, it is mandatory in Muldis D that when any 2 data types have values
in common, there must exist a data type which contains only the values that
they have in common, and hence is a subtype of both. Note that a value
will always implicitly assume the most specific type that exists which
contains it, even if a selector for a less specific type was explicitly
used to select it.
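The most specific type rule, and the reason overlapping types must have a common subtype, can be sketched in Python. The type names and members below are hypothetical, and types are modelled as finite sets only for the sake of illustration.

```python
def most_specific_type(value, types):
    """Return the name of the type that contains value and has no proper
    subtype in the family that also contains value."""
    containing = {name: members for name, members in types.items()
                  if value in members}
    candidates = [
        name for name, members in containing.items()
        if not any(other < members for other in containing.values())
    ]
    if len(candidates) != 1:
        raise ValueError(f"no single most specific type for {value!r}")
    return candidates[0]

types = {
    "Universal": frozenset(range(8)),
    "Small": frozenset({0, 1, 2, 3}),
    "Even": frozenset({0, 2, 4, 6}),
    "SmallEven": frozenset({0, 2}),   # exactly the values Small and Even share
    "Empty": frozenset(),
}

mst_of_2 = most_specific_type(2, types)   # in Small and Even, so SmallEven
mst_of_1 = most_specific_type(1, types)   # only Small is below Universal
mst_of_7 = most_specific_type(7, types)   # only Universal contains 7
```

If C<SmallEven> were removed from the family, the value 2 would have two equally specific candidates, C<Small> and C<Even>, and the function would raise an error; that is the ambiguity the mandatory common subtype prevents.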
A I<union type> is a data type that has at least 2 immediate subtypes, and
every one of its values is also a value of an immediate subtype; that is,
the MST of every value in a union type is not that type. An I<intersection
type> is a data type that has at least 2 immediate supertypes. A
I<difference type> is a data type that has exactly 1 immediate supertype,
and that supertype is a union type such that the difference type and
another peer subtype of that union type are complementary with respect to
the union type; every union type value is in either the difference type or
its complement, but not both. In this way, C<Universal> is a union type of
all other types, and C<Empty> is an intersection type of all other types.
A I<root type> is a data type for which all of its values can be selected
by the same single selector, and which has no proper supertype that is a
root type. All root types are mutually disjoint, so every value is a
member of exactly one root type. Generally speaking, root types are the
implementational foundation over which all operators and all other types
are built, and the declared parameter and result types of most
system-defined operators are root types. The 6 most important
system-defined root types are: C<Bool>, C<Int>, C<Blob>, C<Text>, C<Tuple>,
C<Relation>. All user-defined root types are scalar types that are not
defined in terms of other types, except that any components of their
I<possreps> (possible representations) have declared types. I<Perhaps it
should be said that all root types are defined by this last sentence?> A
I<leaf type> is a data type that has no proper subtypes save for C<Empty>.
A I<complete type> is a data type that is fully defined, and for which it
would be possible to have values that are of just that data type, if it
didn't have proper subtypes. An I<incomplete type> or I<parameterized
type> is a data type that is not fully defined, but serves as a template by
which complete types can be defined; there can never be values that are
just of a parameterized type. The most important complete types are
C<Bool>, C<Int>, C<Blob>, C<Text>; the most important incomplete types are
C<Tuple>, C<Relation>. For that matter, any implicit supertypes such as
C<Universal> and C<Scalar> could be considered incomplete types, although
they are not parameterized.
=head2 Type Identification
All values in the Muldis D type system are broadly categorized into 5
complementary sets called I<scalar values>, I<tuple values>, I<relation
values>, I<quasi-tuple values>, and I<quasi-relation values>; tuple and
relation values are collectively known as I<nonscalar values>; quasi-tuple
and quasi-relation values are collectively known as I<quasi-nonscalar
values>. The type system has the system-defined data types named
C<Scalar>, C<Tuple>, C<Relation>, C<QuasiTuple>, and C<QuasiRelation>,
which serve as maximal data types for each category, respectively. The 5
types are all mutually disjoint, and C<Universal> is a union type over all
of them.
To keep things simpler, every data type (save C<Universal> and C<Empty>)
must be a proper subtype of exactly 1 of the 5 categories, and can not
include values from several of them. Therefore, every data type is said to
be either a I<scalar type>, a I<tuple type>, a I<relation type>, a
I<quasi-tuple type>, or a I<quasi-relation type>, depending which category
all of its values come from. In similar fashion, a I<nonscalar type> is
generally any type that is not a scalar type, if we ignore quasi-nonscalar
types, meaning it is either a tuple type or a relation type.
The identity of every scalar type is defined by its name alone, and every
scalar type must have a distinct name that is explicitly defined, either by
the system or by the user as is applicable. Every value of a scalar type
is conceptually opaque and atomic, and its components are not known to
users of that type; but even when the components are known (because they
are user-defined structured types), two independently defined scalar types
are completely disjoint even if their components look the same, by
definition. The only way for 2 scalar types to have values in common is if
one is explicitly defined, directly or indirectly, as a subtype of, or as a
union type encompassing a subtype of, the other.
Every value of a nonscalar type (either a tuple type I<or> a relation type,
respectively) is conceptually transparent, and its component structure is
known to all. The identity of every nonscalar type is defined by its
component structure alone, and every nonscalar type must have a distinct
component structure. Any two nonscalar types that have the same component
structure are in fact the same type, by definition, regardless of whether
they were defined independently of each other or not.
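The contrast between identity by name (scalar types) and identity by structure (nonscalar types) can be sketched with Python dataclasses. The classes C<ScalarType> and C<TupleType> and the example type names are hypothetical and exist only to illustrate the two identity rules.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ScalarType:
    """Identity is the declared name alone; components are ignored."""
    name: str
    components: frozenset = field(compare=False, default=frozenset())

@dataclass(frozen=True)
class TupleType:
    """Identity is the component structure alone; the name is only an alias."""
    heading: frozenset                      # (attribute name, type name) pairs
    alias: str = field(compare=False, default="")

# Two independently defined scalar types whose components look the same.
celsius = ScalarType("Celsius", frozenset({("degrees", "Int")}))
fahrenheit = ScalarType("Fahrenheit", frozenset({("degrees", "Int")}))
scalars_are_same_type = (celsius == fahrenheit)

# Two independently defined tuple types with the same component structure.
point_a = TupleType(frozenset({("x", "Int"), ("y", "Int")}), alias="PointA")
point_b = TupleType(frozenset({("y", "Int"), ("x", "Int")}), alias="PointB")
tuples_are_same_type = (point_a == point_b)
```

Passing C<compare=False> excludes a field from equality and hashing, so the two tuple types compare equal despite their different aliases, while the two scalar types stay distinct despite their identical components.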
A quasi-nonscalar type is the same as a nonscalar type as far as the means
of identifying it go (by its structure, not by its name), except that
particular kinds of components are permitted in quasi-nonscalar types that
aren't permitted in nonscalar types (and aren't permitted in scalar types).
To keep things simpler, every data type in Muldis D has a name by which it
is referenced, even nonscalar and quasi-nonscalar types; however, the names
of types that are not scalar types are simply convenient aliases for their
true identities, which are their structures (the convenience allows various
Muldis D catalog features to be designed and implemented more easily).
=head2 Scalar Types
Scalar types are the only conceptually encapsulated types in Muldis D, and
are like other languages' concepts of object classes where all their
attributes are private, and only accessible indirectly. The definition of
a scalar type comprises usually one or more named I<possreps> or I<possible
representations>, and for each of those, at least one I<selector> operator
and usually at least one I<accessor> or I<the> operator.
A I<possrep> of a type is an exhaustively complete means for users to
conceptualize the structure of the type; it is like a "role" or "interface"
definition. A possrep has the appearance of a complete collection of (zero
or more) named object attributes (of any scalar or nonscalar type) that the
type could logically be implemented as, and users can use it as if it
actually was implemented that way, but without the requirement that the
type actually is implemented that way. If a type has multiple possreps,
said possreps can differ from each other in arbitrarily large ways, but
every one is individually capable of representing all of the type's values;
any possrep could be used exclusively by a user when they work with its
type, without diminishing what they can do. A single possrep is specific
to one and only one type, so it is possible to refer to a type by simply
referring to the name of one of its possreps.
Taking for example an integer data type, one of its possreps could
represent an integer value as a string of binary digits, while another
possrep could represent an integer value as a string of decimal digits. Or
taking for example a temporal data type, one of its possreps could
represent a date as an ISO 8601 formatted character string in the Gregorian
calendar, and another possrep could represent it as a number of seconds
since the UNIX epoch. Or taking for example a spatial data type that is a
rectangle, one possrep could specify the 4 vertices as 4 (or 3) point
values, and another possrep could specify fewer vertices and also specify
the rectangle's width and height as numeric values.
A possrep additionally has a defined boolean-valued constraint expression
(which is simply I<true> in the trivial case), that restricts what values
the possrep components can have within the context of their fellows.
Taking for example a "medium polygon" data type, there could be a
constraint that the area of the polygon is between 5 and 10 units.
Each possrep comprises exactly one selector operator whose named parameter
set exactly matches that possrep's set of named attributes, and you
select a value of the type by invoking the selector with a full set of
values for the possible attributes. Each possrep also comprises an
accessor operator for each of its attributes, with which users can extract
the possible attribute's value.
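As a rough illustration of possreps, constraints, selectors and accessors
working together, here is a minimal sketch in Python (not Muldis D). The
C<Nat> type, its two possreps named C<binary> and C<decimal>, and the
method names are all assumptions made for this example; the hidden Python
integer stands in for whatever physical representation an implementation
might choose.

```python
class Nat:
    """Hypothetical non-negative integer type with two possreps."""

    def __init__(self, impl):
        self._impl = impl  # physical representation, hidden from users

    @staticmethod
    def _check(digits, allowed):
        # Possrep constraint: at least one digit, only allowed digits,
        # and no leading zero unless the value is zero itself.
        ok = (digits != ""
              and not set(digits) - set(allowed)
              and (digits == "0" or digits[0] != "0"))
        if not ok:
            raise ValueError("possrep constraint violated")

    # Selector of the 'binary' possrep; its one named parameter matches
    # the possrep's one attribute, 'digits'.
    @classmethod
    def binary(cls, *, digits):
        cls._check(digits, "01")
        return cls(int(digits, 2))

    # Selector of the 'decimal' possrep.
    @classmethod
    def decimal(cls, *, digits):
        cls._check(digits, "0123456789")
        return cls(int(digits, 10))

    # Accessors, one per possrep attribute.
    def binary_digits(self):
        return format(self._impl, "b")

    def decimal_digits(self):
        return str(self._impl)

    def __eq__(self, other):
        return isinstance(other, Nat) and self._impl == other._impl

    def __hash__(self):
        return hash(self._impl)


five = Nat.binary(digits="101")
print(five.decimal_digits())             # 5
print(five == Nat.decimal(digits="5"))   # True
```

Either possrep alone is enough to select and inspect every value of the
type, which is the property the text above describes.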
No data type has any operators built-in to its definition except for the
aforementioned selectors and accessors. All other operators that are used
with a data type are expressly I<not> built-in to the type (even if they
are system-defined); the other operators do not have any access to the data
type's internals, and must be defined (directly or indirectly) in terms of
(that is, layered on top of) the few that are built-in, though the
built-ins from any or all possreps of the type can be utilized.
With a user-defined scalar type, if the type is to have multiple possreps,
then just one possrep is defined as the fundamental one, and the other
possreps are defined in terms of the first, by which means the mappings
between them are established. The type-defining user can later come back and
redefine the type if they wish, using a different possrep as the
fundamental, but assuming the redefinition has all the same values,
non-defining users of the type won't know any different.
The Muldis D implementation can choose for itself as to how the scalar
type is physically represented behind the scenes, either picking between
any of the user-provided possreps (assuming enough information is present
to derive all needed inverse functions as applicable) or using yet another
one or several of its own; the implementation can work how it knows best to
achieve an efficient system, and this is all hidden away from the users,
who simply perceive in it what they requested.
In the context of scalar subtype/supertype relationships, the definition of
a subtype can add additional possreps that are only valid for the subtype,
such that users of the subtype can use both possreps defined for the
subtype and the supertype, but users of the supertype can only use the
possreps for the supertype, and not the subtype. Taking for example the
data types of rectangle and square, the latter is a subtype of the former;
a possrep for a rectangle in general comprises its center point as well as
its width and its height, which also works for a square; an additional
possrep that just works for a square rather than a rectangle in general
comprises a center point plus its length.
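The rectangle and square example might be sketched as follows. This is
Python rather than Muldis D, class inheritance is only an approximation of
Muldis D subtyping, and the names C<center_length>, C<cx>, C<cy> and
C<length> are assumptions made for the illustration.

```python
class Rectangle:
    """Possrep of the supertype: center point plus width and height."""

    def __init__(self, *, cx, cy, width, height):
        self.cx, self.cy = cx, cy
        self.width, self.height = width, height


class Square(Rectangle):
    """Subtype: every square is a rectangle whose sides are equal."""

    def __init__(self, *, cx, cy, width, height):
        # Subtype constraint on top of the supertype's possrep.
        if width != height:
            raise ValueError("a square needs equal width and height")
        super().__init__(cx=cx, cy=cy, width=width, height=height)

    # Selector of the additional possrep that only the subtype has.
    @classmethod
    def center_length(cls, *, cx, cy, length):
        return cls(cx=cx, cy=cy, width=length, height=length)

    # Accessor of the additional possrep's 'length' attribute.
    @property
    def length(self):
        return self.width


sq = Square.center_length(cx=0, cy=0, length=3)
print(sq.width, sq.height, sq.length)  # 3 3 3

rect = Rectangle(cx=0, cy=0, width=2, height=5)
print(hasattr(rect, "length"))         # False
```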
As a corollary to this, all union types have none of the possreps defined
by their subtypes. So the system-defined C<Scalar> type has no possreps at
all, and hence has no selectors or accessors defined for it.
=head2 Tuple Types and Relation Types
Tuple types are the fundamental heterogeneous conceptually non-encapsulated
collection types in Muldis D, and are like the Pascal language's concept
of a record, or the C language's concept of a struct. The definition of a
tuple type comprises a set of zero or more named I<attributes> of any
scalar or nonscalar type. This set definition is called the tuple's
I<heading>.
Relation types are the fundamental homogeneous conceptually
non-encapsulated collection types in Muldis D, and are like other
languages' concepts of sets (or arrays where all elements are distinct),
but restricted in that all elements are tuples. The definition of a
relation type looks exactly like the definition of a tuple type (such that
a relation has a I<heading> even if it has no tuples), except that the
definition describes every tuple in the relation, and that relation
types can additionally have I<keys> defined, which indicate that a subset of
its attributes' values are distinct between all tuples in the relation.
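The meaning of a key can be sketched with a small check, written here in
Python rather than Muldis D. Tuples are modeled as dictionaries and a
relation as a list of them; the function name C<satisfies_key> and the
sample attribute names are hypothetical.

```python
def satisfies_key(relation, key):
    """Return True if no two tuples agree on all of the key attributes."""
    seen = set()
    names = sorted(key)
    for tup in relation:
        key_values = tuple(tup[name] for name in names)
        if key_values in seen:
            return False
        seen.add(key_values)
    return True


people = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Ann"},
]
print(satisfies_key(people, {"id"}))    # True
print(satisfies_key(people, {"name"}))  # False
```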
Generic selector and accessor operators exist that work with all tuple and
relation types, so they do not need to be defined per such type.
The system-defined types C<Tuple> and C<Relation> (and their system-defined
subtypes) are technically generic factory types, such that they themselves
do not define any attribute sets, and are supertypes of all tuple and
relation types that do. Beyond this special case, a pair of tuple or
relation types can only have a subtype/supertype relationship if they have
compatible headings, which means the attribute sets are of the same
degree, the attribute names are identical, and the name-wise corresponding
attributes in each heading have a valid subtype/supertype relationship;
each attribute of a tuple or relation subtype is a subtype of the
same-named attribute of the tuple or relation supertype.
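The compatible-headings rule can be sketched as a small check. The sketch
is in Python rather than Muldis D and makes several simplifying
assumptions: headings are dictionaries from attribute name to type name,
only scalar attribute types are considered, and the C<PARENT> table with
its C<PosInt> entry is a hypothetical set of subtype declarations.

```python
# Hypothetical declarations: each key is a direct subtype of its value.
PARENT = {"PosInt": "Int"}


def is_subtype(sub, sup):
    """True if `sub` is `sup` or is declared, directly or not, below it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def heading_is_subtype(sub_heading, sup_heading):
    """Same attribute names (hence same degree), attribute-wise subtypes."""
    if sub_heading.keys() != sup_heading.keys():
        return False
    return all(is_subtype(sub_heading[name], sup_heading[name])
               for name in sub_heading)


narrow = {"id": "PosInt", "name": "Text"}
wide = {"id": "Int", "name": "Text"}
print(heading_is_subtype(narrow, wide))  # True
print(heading_is_subtype(wide, narrow))  # False
```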
=head2 Quasi-Tuple Types and Quasi-Relation Types
The union types C<Universal>, C<Tuple>, C<Relation> (and the system-defined
subtypes of the latter 2) can be used as the declared types of such things as
variables and routine parameters, but they can not be used as the declared
types of scalar possrep or nonscalar (tuple or relation) attributes. The
declared type of each of the latter must be either a scalar type, or a
specific tuple or relation subtype (meaning tuple or relation types that
have specific attribute sets defined for them).
If all data types were scalar or nonscalar, then it would not be possible
to define operators with N-ary parameters whose declared types are any of
the aforementioned 3 union types. That is, an N-ary parameter is usually
relation-typed, such that the multiplicity of values that the parameter can
take are each provided as a tuple of said relation; however, as relation
attributes can not have said union types as their declared types, it would
not be possible to implement an N-ary relational join operator, for
example, since each relation being joined would probably have a different
heading than the others.
Quasi-tuple types and quasi-relation types exist as a solution to this
problem, such that the I<quasi-heading> of one is allowed to include
attributes whose declared types are any type at all, including the union
types C<Universal>, C<Tuple>, C<Relation>, C<QuasiTuple>, C<QuasiRelation>,
and subtypes of tuple and relation without specific attribute sets.
This said, the situations in which quasi-nonscalar types may be used are
limited; only quasi-nonscalar types may have quasi-nonscalar types as
components; scalar and nonscalar types may not.
Also, quasi-nonscalar types only have defined for them a subset of
corresponding nonscalar type operators, partly because the former are not
intended to replace the latter for the majority of use cases, and partly
because some of them are simply impossible to implement for
quasi-nonscalars: C<unwrap>, C<ungroup>.
=head2 Finite Types and Infinite Types
A I<finite type> is a data type whose cardinality (count of member values)
is known to be finite, and this cardinality can be deterministically
computed; moreover, every value of a finite type can be represented somehow
using a finite amount of memory. This doesn't exclude the possibility that
either the cardinality or individual values are larger than present-day
computing hardware can handle, but even if so, they could be handled by
sufficiently larger but finite resources. An I<infinite type> is a data
type that is not a finite type; its cardinality is either known to be
infinity, or it is unknown.
Generally speaking, all finite types are defined either as an explicit
enumeration of values (for example, the boolean type, which has exactly 2
values), or they are scalar types whose possreps have zero attributes (each
one is a singleton, having exactly 1 value), or they are the tuple or
relation type that has zero attributes (which has exactly 1 or 2 values,
respectively), or their values are all discrete and fall into a closed
range (for example, a type comprising the range of integers between 1 and
100, or a type comprising all real numbers in the same range that have a
granularity of 0.001, or any IEEE floating point number of a specific bit
length), or their values are length-constrained strings of
finite-cardinality elements (for example, a character string that is not
longer than 250 characters), or they are composite scalar or nonscalar or
quasi-nonscalar types whose attributes are all of finite types themselves
(for example, a type whose attributes are all C<Bool>).
Generally speaking, all infinite types are defined either as being some
open-ended natural domain (for example, the type having all integers, or
the type having all prime numbers), or they are some continuous domain,
whether open-ended or not (for example, the type having all real or complex
numbers between 1 and 100), or they are non-length-constrained strings (for
example, the set of all possible text strings), or they are composite
scalar or nonscalar or quasi-nonscalar types which have at least one
attribute which is itself infinite (for example, a type that has an C<Int>
attribute).
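The cardinality of a composite type follows from the cardinalities of its
attribute types, which the following Python sketch works through. The
function names and the C<INFINITE> marker are assumptions for this
illustration, and the sketch further assumes that no attribute is of a type
with zero values.

```python
import math

INFINITE = None  # hypothetical marker meaning "not a finite type"


def tuple_cardinality(attr_cards):
    """Count the values of a tuple type from its attribute cardinalities."""
    if any(card is INFINITE for card in attr_cards):
        return INFINITE
    # One value per combination of attribute values; an empty heading
    # yields the empty product, which is 1.
    return math.prod(attr_cards)


def relation_cardinality(attr_cards):
    """A relation value is a set of tuples, so count the subsets."""
    tuples = tuple_cardinality(attr_cards)
    return INFINITE if tuples is INFINITE else 2 ** tuples


BOOL = 2
print(tuple_cardinality([]))               # 1
print(relation_cardinality([]))            # 2
print(tuple_cardinality([BOOL, BOOL]))     # 4
print(tuple_cardinality([BOOL, INFINITE])) # None
```

The first two results match the statement above that the tuple and relation
types with zero attributes have exactly 1 and 2 values respectively.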
The system-defined root type C<Bool> is finite (2 values), as is the
C<Empty> type (zero values), while all of the other 5 most important
system-defined root types (C<Int>, C<Blob>, C<Text>, C<Tuple>, C<Relation>)
are infinite, as are the C<Universal>, C<Scalar>, C<QuasiTuple>,
C<QuasiRelation> types.
All proper subtypes of finite types are themselves finite types. Proper
subtypes of infinite types can be either finite or infinite depending on
how they are defined. For example, a subtype of C<Int> whose numbers are
all simply greater than 10 is infinite, but a subtype whose numbers are
additionally all less than 1000 is finite. I<The documentation for
individual system-defined data types, further below, specifies whether each
of which is finite or infinite, and in the latter case, it states a most
generic means to specify a finite subtype.>
Note that, while it is not mandated by the language, some Muldis D
implementations may legitimately choose to impose restrictions on their
users such that the declared types of all persisting variables must be of
finite types only.
For example, an implementation may require that all persisting C<Text>
types have a maximum allowed length in characters specified, or that all
persisting C<Int> types have a least and greatest allowed value specified.
This would typically
happen if the implementation needs to use fixed-size fields internally,
such as 32-bit integers, and it is not practical to support the possibility
that a value could be of any size at all (this is often the case with SQL
databases implemented in C).
On the other hand, some implementations may natively support unlimited size
values, such as those written in Perl, and so these can allow persisting
the plain C<Text> or C<Int> types, which can make things less complicated
for their users.
Of course, even with implementations that require finite types, this isn't
to say that the declared type can't be a very large finite type, but then
the implementation can choose to use, for example, either a machine native
integer or a string of digits behind the scenes for all values of the type,
and can do this deterministically, depending what constraint the type
defining user chose.
=head2 Universal Implicit Operators
Muldis D is universally polymorphic to at least a small degree, such that
every data type without exception has both an C<assign> update operator
(for assigning a value of that type to a variable of that type) and an
C<is_equal> function for testing 2 values of that type for equality (as
well as C<is_not_equal>, for inequality). Moreover, these operators exist
implicitly, so when one defines the initial possrep of a new type, they get
those operators for the type at no extra cost.
I<This documentation is pending.>
=head1 ENVIRONMENT
The Muldis D DBMS / virtual machine, which by definition is the
environment in which Muldis D executes, conceptually resembles a hardware
PC, having a command processor (CPU), standard user input and output
channel, persistent read-only memory (ROM), volatile read-write memory
(RAM), and persistent read-write disk or network storage.
Within this analogy, the role of the PC's user, that feeds it through
standard input and accepts its standard output, is fulfilled by the
application that is driving the Muldis D DBMS; similarly, the application
itself will activate the virtual machine when wanting to use it (done in
this distribution by instantiating a new C<Muldis::DB::Interface::DBMS>
object), and deactivate the virtual machine when done (letting that object
expire).
When a new virtual machine is activated, the virtual machine has a default
state where the CPU is ready to accept user-input commands to process, and
there is a built-in (to the ROM) set of system-defined entities (data
types, operators, variables, etc) which are ready to be used to define or
be invoked by said user-input commands; the RAM starts out effectively
empty and the persistent disk or network storage is ignored.
Following this activation, the virtual machine is mostly idle except when
executing Muldis D commands that it receives via the standard input (done
in this distribution by invoking methods on the DBMS object). The virtual
machine effectively handles just one command at a time, and executes each
separately and in the order received; any results or side-effects of each
command provide a context for the next command.
At some point in time, as the result of appropriate commands, data
repositories, or "depots" (either newly created or previously existing)
that live in the persistent disk or network storage will be mounted within
the virtual machine, at which point subsequent commands can read or update
them, then later unmount them when done. Speaking in the terms of a
typical database access solution like the Perl DBI, this mounting and
unmounting of a repository usually corresponds to connecting to and
disconnecting from a database. Speaking in the terms of a typical disk
file system, this is mounting or unmounting a logical volume.
Any mounted persistent depot, as well as the temporary "application" depot
which is most of the conceptual PC's RAM, is home to all user-defined data
variables, data types, operators, constraints, packages, and routines; they
collectively are the database that the Muldis D DBMS is managing. Most
commands against the DBMS would typically involve reading and updating the
data variables, which in typical database terms is performing queries and
data manipulation. Much less frequently, you would also see "data
definition" changes, namely what user-defined variables, types, and so on
exist, done fundamentally by data-updating special system-defined "catalog"
variables. Any updates to a persistent depot will usually last
between multiple activations of the virtual machine, while any updates to
the temporary "application" depot are lost when the machine deactivates.
All virtual machine commands are subject to a collection of both
system-defined and user-defined constraints (also known as business rules),
which are always active over the period that they are defined. The
constraints restrict what state the database can be in, and any commands
which would cause the constraints to be violated will fail; this mechanism
is a large part of what makes the Muldis D DBMS a reliable modeler of
anything in reality, since it only stores values that are reasonable.
Note that in practice, the aforementioned concept of "commands" is realized
by "statements" (which are grouped into "routines").
=head1 ROUTINES
There are several kinds of Muldis D routines, each of which is intended
for, and in many cases only permitted to be used for, particular tasks.
Note that for all Muldis D routines which have parameters, they are all
named rather than positional parameters; in the case of N-ary routines, the
N similar argument values come by way of a single nonscalar (or, if
necessary, quasi-nonscalar) typed parameter.
=over
=item C<function>
A C<function> is an explicitly invokable read-only operator whose
invocation both results in and represents a value of a specific data type
(that is the function's I<result type> or I<declared type>); this invocation
can only exist as part of a value-expression of another routine; the body
of a function is also itself a single value-expression (though its parts
can be named for internal reuse). A C<function> is pure and deterministic
in the functional-language sense, such that all of its 0..N parameters are
read-only / not subject to update, it has no lexical variables at all, and
that it can only see its own parameters, if it has any; it can not see any
global variables of any kind, and that it can only invoke C<function>
routines. The vast majority of invokable system-defined routines are
C<function>; they include all value selectors, and the typical numeric,
string, and relational operators, such as those you would compose a typical
database "select" query out of.
=item C<update_operator>
An C<update_operator> is an explicitly invokable procedure with 1..N
parameters that has at least one parameter which is subject to update, and
that can only see or influence its own lexical variables (no globals); it
can only be invoked as the root part of a statement in another routine. An
C<update_operator> can only invoke C<function> and C<update_operator>
routines, and it is much like a C<function>, including being deterministic,
but that its result value is via a parameter. Most non-C<function>
system-defined routines are C<update_operator>; they include all C<assign>
operators, plus some relational-assignment short-hands such as
"assign_insert", "assign_update", "assign_delete".
=item C<system_service>
A C<system_service> is an explicitly invokable system-defined procedure
with 0..N parameters that can reach outside of the deterministic DBMS
environment in order to do non-deterministic things (besides working with
depots), such as to initiate I/O of various kinds, or fetch the current
date and time, or generate a random number. Given the nature of this
beast, users can not define their own C<system_service> routines except by
updating the Muldis D implementation's source code itself. Invoking a
C<system_service> routine can have side-effects outside of the DBMS, but
it will not alter anything inside the DBMS aside from any of its
subject-to-update parameters.
=item C<procedure>
A C<procedure> is an explicitly invokable routine with 0..N parameters that
can see and update global variables, and can invoke any kind of invokable
routine; it can only be invoked as the root part of a statement in another
routine. The C<procedure> is the only explicitly invokable routine that
can directly reference global containers, whether catalog or data (the
non-invokable C<main> can too). The vast majority of C<procedure> that
exist will be user-defined. But some system-defined routines that would
otherwise be C<function> or C<update_operator> are C<procedure> instead
solely because they are non-deterministic; an example is an operator that
derives a tuple sequence from a relation without fully sorting the tuples,
because the result is fundamentally random and non-repeatable.
=item C<type_constraint>
A C<type_constraint> is an implicitly invokable routine that is associated
with / is part of a data type definition and is invoked automatically when
a value of that type is being selected; it asserts whether said value,
which by this time is known to be acceptable to the current data type's
more generic supertype, is within the data type's own more restricted
domain. This routine can only see its own lexical variables (no globals).
This routine has 1 read-only parameter, which is the value to examine, and
it results in a Bool. The DBMS would then throw a
type-constraint-violation exception if the constraint results in False, and
no-op if it results in True. Conceptually speaking, a C<type_constraint>
will execute before any other kinds of constraints.
=item C<state_constraint>
A C<state_constraint> is an implicitly invokable routine that can see
global variables and is invoked automatically at the end of every statement
that the DBMS executes, wherein it asserts that all global variables are
collectively in a valid state; it can see said variables directly, but
updates none. This routine has no parameters and results in a Bool. The
DBMS responds as per a type constraint; if the constraint fails, then the
just-executed statement is rolled back, and an exception is thrown.
Conceptually speaking, a C<state_constraint> will execute after all
C<type_constraint> and before all C<transition_constraint>.
=item C<transition_constraint>
A C<transition_constraint> is an implicitly invokable routine that can see
global variables and is invoked automatically at the end of every statement
that the DBMS executes, wherein it asserts that all global variables have
collectively transitioned in a valid fashion between their before-update
state and their after-update state; it can see both versions of said
variables directly, but updates none. This routine has no parameters and
results in a Bool. The DBMS responds as per a state constraint.
Conceptually speaking, a C<transition_constraint> will execute after all
other kinds of constraints.
=item C<main>
A C<main> is the single anonymous procedure that is the "main program" of a
non-hosted Concrete Muldis D application. A C<main> is the same as a
C<procedure>, but that it can not be invoked by any other Muldis D routine,
it can not live in any depot, and it can not have any parameters. In a
mixed-language application, where Muldis D code is invoked by another host
language, there is no Muldis D C<main> at all, since it would be redundant
with host language routines.
=back
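The way state and transition constraints guard each statement can be
sketched roughly as follows. The sketch is in Python rather than Muldis D,
models the global variables as one dictionary, leaves out type constraints,
and uses hypothetical names throughout (C<run_statement>,
C<ConstraintViolation>, and the sample constraints).

```python
import copy


class ConstraintViolation(Exception):
    pass


def run_statement(db, update, state_constraints=(), transition_constraints=()):
    """Apply `update` to the dict `db` only if every constraint holds."""
    before = copy.deepcopy(db)  # before-update state
    after = copy.deepcopy(db)
    update(after)               # candidate after-update state

    # State constraints see only the after-update state.
    for check in state_constraints:
        if not check(after):
            raise ConstraintViolation("state constraint failed")

    # Transition constraints run last and see both states.
    for check in transition_constraints:
        if not check(before, after):
            raise ConstraintViolation("transition constraint failed")

    # Every constraint passed, so the change becomes visible.
    db.clear()
    db.update(after)


def no_overdraft(state):
    return state["balance"] >= 0


def version_never_drops(old, new):
    return new["version"] >= old["version"]


def withdraw(amount):
    def update(state):
        state["balance"] -= amount
    return update


db = {"balance": 10, "version": 1}
run_statement(db, withdraw(4), [no_overdraft], [version_never_drops])
try:
    run_statement(db, withdraw(100), [no_overdraft], [version_never_drops])
except ConstraintViolation:
    pass  # the failed statement left no state change behind
print(db)  # {'balance': 6, 'version': 1}
```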
Note that Muldis D currently has no direct support for the concept of a
trigger-routine that can update a database; updating virtual relvars or
invoking C<procedure> are recommended instead. As for non-updating
trigger-routines, the state/transition constraint routines already perform
that feature. I<The feature in question may be directly supported later?>
I<This documentation is pending.>
=head1 USERS AND PRIVILEGES
The Muldis D DBMS / virtual machine itself does not have its own set of
named users where one must authenticate to use it. Rather, any concept of
such users is associated with individual persistent repositories, such that
you may have to authenticate in order to mount them within the virtual
machine; moreover, there may be user-specific privileges for that
repository that restrict what users can do in regards to its contents.
The Muldis D privilege system is orthogonal to the standard Muldis D
constraint system, though both have the same effect of conditionally
allowing or barring a command from executing. The constraint system is
strictly charged with maintaining the logical integrity of the database,
and so only comes into effect when an update of a repository or its
contents is attempted; it usually ignores which users were attempting the
changes. By contrast, the privilege system is strictly user-centric, and
gates a lot of activities which don't involve any updates or threaten
integrity.
The privilege system mainly controls, per user, what individual repository
contents they are allowed to see / read from, what they are allowed to
update, and what routines they are allowed to execute; it also controls
other aspects of their possible activity. The concerns here are analogous
to privileges on a computer's file system, or a typical SQL database.
I<This documentation is pending.>
=head1 TRANSACTIONS AND CONCURRENCY
This official specification of the Muldis D DBMS includes full ACID
compliance as part of the core feature set; moreover, all types of changes
within a repository are subject to transactions and can be rolled back,
including both data manipulation and schema manipulation; moreover, an
interrupted session with a repository must result in an automatic rollback,
not an automatic commit. (But changes that occur outside the DBMS
environment, such as by a C<system_service>, or by a host language routine,
are generally not affected by transactions at all.)
I<It is important to point out that any attempt to implement Muldis D (what
a Muldis DB Engine does) which does not include full ACID compliance, with
all aspects described above, is not a true Muldis D implementation, but
rather is at best a partial implementation, and should be treated with
suspicion concerning reliability. Of course, such partial implementations
will likely be made and used, such as ones implemented over existing DBMS
products that are themselves not ACID compliant, but you should see them
for what they are and weigh the corruption risks of using them.>
I<Note that the best way for an Engine to behave, if for some reason it is
built in such a way and/or over an existing DBMS product that does implicit
commits after, say, data-definition statements, is for it to throw an
exception if data-definition is attempted within an explicit /
multi-statement transaction, such that a user of that Engine can only do
data-definition outside of an explicit transaction; in this way, the Engine
is still following all the Muldis D safety rules, and hence should be
relatively safe to use, even if it lacks Muldis D features.>
Each individual instance of the Muldis D DBMS is a single process virtual
machine, and conceptually only one thing is happening in it at a time; each
individual Muldis D statement executes in sequence, following the
completion or failure of its predecessor. During the life of a statement's
execution, the state of the virtual machine is constant, except for any
updates (and side-effects of such) that the statement makes. Breaking this
down further, a statement's execution has 2 sequential phases; all reads
from the environment are done in the first phase, and all writes to the
environment are done in the second phase. Therefore, regardless of the
complexity of the statement, and even if it is a multi-update statement,
the final values of all the expressions to be assigned are determined prior
to any target variables being updated. Moreover, as no function may
have side-effects, and we don't support the concept of "trigger" routines
that can perform updates, there is no complication from
environment updates occurring during their invoker statement's first phase.
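The two-phase rule is what makes a multi-update statement such as a swap
behave as expected. A minimal sketch in Python (not Muldis D), where the
environment is a dictionary and C<multi_update> is a hypothetical helper:

```python
def multi_update(env, assignments):
    """Run one multi-update statement against the dict `env`.

    `assignments` maps each target variable name to a function that
    computes its new value from the environment.
    """
    # Phase 1: reads only. Every expression sees the same, unchanged env.
    new_values = {target: expr(env) for target, expr in assignments.items()}
    # Phase 2: writes only. No expression is evaluated from here on.
    env.update(new_values)


env = {"x": 1, "y": 2}
multi_update(env, {
    "x": lambda e: e["y"],
    "y": lambda e: e["x"],
})
print(env)  # {'x': 2, 'y': 1}
```

Had the assignments been carried out one after the other instead, both
variables would have ended up holding the value 2.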
To account for situations where external processes are concurrently using
the same persistent (and externally visible) repository as a Muldis D
DBMS instance, the Muldis D DBMS will maintain a lock on the whole
repository (or appropriate subset thereof) during any active read-only
and/or for-update transaction, to ensure that the transaction sees a
consistent environment during its life. The lock is a shared lock if the
transaction only does reading, and it is an exclusive lock if the
transaction also does writing. Speaking in terms of SQL, the Muldis D
DBMS supports only the serializable transaction isolation level.
I<Note that there is currently no official support for using Muldis D in
a multi-threaded application, where its structures are shared between
threads, or where multiple thread-specific structures want to use the same
repositories. But such support is expected in the future.>
No multi-update statement may target both catalog and non-catalog
variables. If you want to perform the equivalent of SQL's "alter"
statement on a relation variable that already contains data, you must have
separate statements to change the definition of the relation variable and
change what data is in it, possibly more than one of each; the combination
can still be wrapped in an explicit transaction for atomicity.
Transactions can be nested, by starting a new one before concluding a
previous one, and the parent-most transaction has the final say on whether
all of its committed children actually have a final committed effect or
not. There are no "autonomous transactions" within the DBMS.
Transactions in Muldis D come in both implicit and explicit varieties, but
the implicit transactions only exist (that is, only have an effect) when
there is no explicit transaction active.
The normal way to specify an explicit transaction is to define a I<try>
block, since a I<try> block doubles as a transaction whose lifetime is
bound to the lexical scope of that block. The transaction will begin when
that scope is entered and end when that scope is exited; if the scope is
exited normally, its transaction commits; if the scope terminates early due
to a thrown exception, its transaction rolls back. Moreover, in a pure
Muldis D application, a I<try> block is the I<only> kind of explicit
transaction.
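The scope-bound behavior resembles a Python context manager, which the
following rough sketch uses as an analogy. It is not Muldis D syntax; the
C<transaction> helper is hypothetical, the database is modeled as a
dictionary, and rollback is done by restoring a snapshot. Unlike a Muldis D
I<try> block, the Python C<with> statement does not itself catch the
exception, so the sketch catches it separately.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Commit on normal exit of the scope, roll back on an exception."""
    snapshot = copy.deepcopy(db)  # state to return to on failure
    try:
        yield db
    except Exception:
        db.clear()
        db.update(snapshot)       # roll back, then keep unwinding
        raise
    # Normal exit: the changes simply stay in place ("commit").


db = {"n": 0}
try:
    with transaction(db):          # parent transaction
        db["n"] = 1
        with transaction(db):      # child transaction
            db["n"] = 2            # child scope exits normally
        raise RuntimeError("failure in the parent scope")
except RuntimeError:
    pass
print(db)  # {'n': 0}
```

The child transaction committed, yet its change did not survive, because
the parent-most transaction has the final say.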
In a mixed-language application, when Muldis D routines are invoked by a
host language, the host language is allowed to specify further parent-most
explicit transactions within the DBMS that are not bound to the lexical
scope of a block, using distinct transaction initiation and termination
statements. Such open-ended transactions are intended for transactions
which last over multiple DBMS invocations of an application (whereas Muldis
D scope-bound transactions always occur entirely within one invocation of
the DBMS by a host language). But it is a recommended best practice that
host language code will associate the invocation of said statements with
its own lexical scopes, such as its own I<try-catch> constructs.
An implicit transaction is associated with the lexical scope of every
Muldis D C<update_operator> and C<system_service>, and by extension, every
Muldis D statement that is an invocation of said. Or more accurately, an
update operation (including a multi-update operation) is implicitly atomic,
and will either succeed and commit as a whole, or fail and roll back as a
whole. This is as if every update operator invocation was surrounded by its
own I<try> block, except that any thrown exceptions are not caught.
Similarly, every C<function> and C<\w+_constraint> has an implicit
transaction, though since these never update anything, all that really
means is that they see a consistent view of their environment.
By contrast, every C<procedure> and C<main> is neither implicitly a
transaction nor atomic (except for portions therein that have explicit
I<try> blocks), so you can use a procedure to define an operation where you
want to keep partial results of a failure.
Since failures are always accompanied by thrown exceptions, a failure will
unwind the call stack and roll back any active transactions one nesting
layer at a time until either a I<try> block is exited, which halts the
unwinding, or the application exits, rolling back all remaining active
transactions.
If no explicit transactions are active at all when a failure occurs, then
each non-procedure-invoking statement in a procedure is the parent-most
transaction, and so a failure part-way through said procedure will leave
the previously completed statements fully committed, with only the failed
statement leaving no state change. At this point, a pure
Muldis D application will have exited, and a mixed-language application
will have either exited or caught an exception in a host-language I<try>
block.
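The difference between an atomic statement and a non-atomic procedure can
be sketched as follows, in Python rather than Muldis D. The C<atomic>
helper, the C<state> dictionary and the two update functions are all
hypothetical; the sketch only assumes that each update statement works on a
private copy that becomes visible when the statement succeeds.

```python
import copy

# Hypothetical application state; not part of Muldis D.
state = {"log": [], "balance": 100}


def atomic(update):
    """Run one update statement atomically against the hypothetical state."""
    global state
    working = copy.deepcopy(state)
    update(working)      # any exception leaves `state` untouched
    state = working      # success: the statement commits as a whole


def record_start(s):
    s["log"].append("start")


def withdraw_too_much(s):
    s["balance"] -= 500
    s["log"].append("withdrew 500")  # a partial change, never made visible
    if s["balance"] < 0:
        raise ValueError("balance would go negative")


def procedure():
    """Not atomic as a whole: each statement is its own parent-most transaction."""
    atomic(record_start)        # commits
    atomic(withdraw_too_much)   # fails and leaves no state change
    atomic(record_start)        # never reached


try:
    procedure()
except ValueError:
    pass
```

The first statement's effect survives the failure, while the failed
statement's partial changes do not.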
All currently mounted repositories (persistent and temporary both) are
joined at the hip with respect to transactions; a commit or rollback is
performed on all of them simultaneously, and a commit either succeeds for
all or fails for all (a repository suddenly becoming inaccessible counts as
a failure). I<Note that if a Muldis D implementation cannot guarantee
such synchronization between multiple repositories, then it must refuse to
mount more than one repository at a time under the same virtual machine
(users can still employ multiple virtual machines, which are not
synchronized with each other); by doing one of those two actions, a less
capable implementation can still be considered reliable and recommendable.>
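The all-or-nothing rule for multiple repositories can be sketched in
Python. The C<Repository> class and C<commit_all> function are hypothetical
names, and the sketch deliberately ignores the harder problem of a
repository failing between the check and the apply step, which a real
implementation would need something like a two-phase commit protocol to
handle.

```python
class Repository:
    """Hypothetical mounted repository holding committed and pending state."""

    def __init__(self, name, accessible=True):
        self.name = name
        self.accessible = accessible
        self.committed = {}
        self.pending = {}


def commit_all(repos):
    """Commit every repository or none of them; return True on success."""
    # Check all repositories first; one inaccessible repository fails the lot.
    if not all(r.accessible for r in repos):
        for r in repos:
            r.pending = {}          # roll back everywhere
        return False
    for r in repos:
        r.committed.update(r.pending)
        r.pending = {}
    return True


persistent = Repository("persistent")
temporary = Repository("temporary")

persistent.pending["x"] = 1
temporary.pending["y"] = 2
first = commit_all([persistent, temporary])    # both commit

persistent.pending["x"] = 99
temporary.accessible = False                   # suddenly inaccessible
second = commit_all([persistent, temporary])   # neither commits
```

In the second call the accessible repository also discards its pending
change, because the commit must fail for all.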
Some Muldis D commands cannot be executed within the context of a parent
transaction; in other words, they can only be executed directly by an
anonymous routine. The main examples are those that mount or unmount a
persistent repository, because such a change in the environment
mid-transaction would result in an inconsistent state.
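A guard of this kind might look like the following Python sketch. The
C<VirtualMachine> class and its attributes are hypothetical, and reporting
the refusal as an exception is an assumption rather than something this
document specifies.

```python
class VirtualMachine:
    """Hypothetical model of a DBMS virtual machine."""

    def __init__(self):
        self.active_transactions = 0   # current transaction nesting depth
        self.mounted = set()

    def mount(self, repository):
        # Assumption: the refusal is reported as an exception.
        if self.active_transactions > 0:
            raise RuntimeError("cannot mount while a transaction is active")
        self.mounted.add(repository)


vm = VirtualMachine()
vm.mount("inventory")                  # allowed: no parent transaction

vm.active_transactions = 1             # pretend a transaction has begun
try:
    vm.mount("payroll")
    refused = False
except RuntimeError:
    refused = True                     # the mount was rejected
```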
I<Muldis D lets you explicitly place locks on resources that you don't
want external processes to change out from under you, and these locks do
not automatically expire when transactions end; or maybe they do; this
feature has to be thought out more.>
I<This documentation is pending.>
=head1 ENTITY NAMES
All entities that exist at some given time within a DBMS environment can be
explicitly referenced in some manner for definition and/or use; there are
no orphans. At the very least, every kind of DBMS entity is defined in one
or more catalog relvars; its interface and/or implementation can be
observed and possibly updated therein.
Note that the following namespaces assume that a program written in
Muldis D may execute either standalone or as a peer-to-peer process
that can have its global variables made visible to other processes, or have
others' made visible to it. In other words, the program can both manage
its own dbvars and be a DBMS client, and the program can either just use
the DBMS itself or be a server of it.
This is the hierarchy of invocation namespaces of DBMS entities:
    cat # system catalog describing everything; all but .system updateable
        cat.system
        cat.native
        cat.mount
        cat.foreign
        cat.interp
    sys # system-defined types and routines
        sys.Core
            sys.Core.<package>.<type>
            sys.Core.<package>.<routine>
        sys.<extension>
            sys.<extension>.<package>.<type>
            sys.<extension>.<package>.<routine>
    app # user-defined entities local to this vm/app, not a depot
        app.<var>
        app.<type>
        app.<routine>
        app.<package>
            app.<package>.<type>
            app.<package>.<routine>
    glo # global namespace to group currently mounted depots w mount names
        glo.<depot>(.<schema>){0,}
            glo.<depot>(.<schema>){0,}.<relvar>
            glo.<depot>(.<schema>){0,}.<type>
            glo.<depot>(.<schema>){0,}.<routine>
            glo.<depot>(.<schema>){0,}.<package>
                glo.<depot>(.<schema>){0,}.<package>.<type>
                glo.<depot>(.<schema>){0,}.<package>.<routine>
    dep # entities in a depot ref their own depot with this
        dep(.<schema>){0,}
            dep(.<schema>){0,}.<relvar>
            dep(.<schema>){0,}.<type>
            dep(.<schema>){0,}.<routine>
            dep(.<schema>){0,}.<package>
                dep(.<schema>){0,}.<package>.<type>
                dep(.<schema>){0,}.<package>.<routine>
    sch # entities in a schema ref their own schema with this
        sch.<relvar>
        sch.<type>
        sch.<routine>
        sch.<package>
            sch.<package>.<type>
            sch.<package>.<routine>
    pkg # entities in a package ref their own package with this
        pkg.<type>
        pkg.<routine>
    lex # entities in a routine or routine group ref their own with this
        lex.param.<param>
        lex.var.<var>
        lex.func.<func>
        lex.proc.<proc>
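As a rough illustration of how a tool might classify an invocation name by
its root namespace, here is a small Python sketch. The C<NAMESPACES> table
and the C<split_name> function are hypothetical, and the sketch assumes
that a name is a plain dot-separated string whose elements contain no dots
themselves, which this document does not state.

```python
# Root namespaces taken from the hierarchy above, with short descriptions.
NAMESPACES = {
    "cat": "system catalog",
    "sys": "system-defined types and routines",
    "app": "user-defined entities local to this vm/app",
    "glo": "currently mounted depots, by mount name",
    "dep": "the referencing entity's own depot",
    "sch": "the referencing entity's own schema",
    "pkg": "the referencing entity's own package",
    "lex": "the referencing routine's own lexical entities",
}


def split_name(name):
    """Split a dotted invocation name into (root namespace, remaining parts)."""
    root, *rest = name.split(".")
    if root not in NAMESPACES:
        raise ValueError(f"unknown root namespace: {root!r}")
    return root, rest


# "amount" is a made-up parameter name used only for this example.
root, parts = split_name("lex.param.amount")
```

A name whose first element is not one of the eight roots is rejected
rather than guessed at.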
=head2 Temp Old Entity Names Docs
=over
=item C<sys.*>
Under here are all non-lexical system-defined, hardwired, readonly, eternal
entities.
=item C<sys.type.*>
These are the invocation-names of system-defined data types.
=item C<sys.rtn.*>
These are the invocation-names of system-defined (explicitly invokable)
routines.
=item C<sys.cat>
This is the read-only system catalog that describes all system-defined
entities, including data types, operators, and all catalogs.
=item C<sys.data.*>
These are some system-defined constants that hold values commonly useful in
user-defined routines; providing the constants this way is an alternative
to defining niladic functions which result in them. Moreover, some
system-defined functions that do have arguments may also or alternately
have analogous 2+ degree relcons here for use as lookup tables.
=item C<nat.*>
Under here are all non-lexical user-defined entities that are either
private to the current program, or that fundamentally live with the current
program but are shareable with peer programs (where the peer programs are
clients and the current program a server), or that are the current
program's naturalized perception of entities that fundamentally live with
peer programs (where the peer programs are servers and the current program
a client), or perhaps in a disk file instead. If the current program is
primarily viewed by users as a "DBMS server" or "transient RAM-based
embedded DBMS", then the "database" they are using (via their client
programs) is probably of the second entity group. If the current program
is primarily viewed by users as a "DBMS client" or "persistent file-based
embedded DBMS", then the "database" they are using is probably of the third
entity group. In the third case, it is very likely that some details of
this perception are coded into the current program itself and that the peer
program has a different perception that excludes those details; in that
case, the peer program is somewhat of a slave of the current program, as is
applicable. As far as non-lexical container/variable entities go, only
those whose type is C<Database> can actually be shared between peer
programs, and any other non-lexical containers/variables, if any, are
strictly private to the current program.
=item C<nat.type.*>
These are the invocation-names of native user-defined data types.
=item C<nat.rtn.*>
These are the invocation-names of native user-defined explicitly invokable
routines.
=item C<nat.cat>
This is the user-updateable catalog that describes all native user-defined
entities, including data types, operators, data dbvars, and any other
private non-lexical variables that might exist.
=item C<nat.data.*>
These are the user-updateable native data dbvars and other non-lexical
containers/variables themselves, that the current program conceptually
or actually stores all of its transient and persistent data in.
=item C<lex.*>
Under here are all lexical entities whose invokability or life exists
solely within the scope of an executing routine; the definitions of those
entities are part of the C<*.cat> that defines the routines themselves.
=item C<lex.param.*>
These are the invocation names within a routine of that routine's
parameters, both its read-only and updatable ones.
=item C<lex.expr.*>
These are the invocation names within a routine of that routine's
internally reusable value-expressions.
=item C<lex.var.*>
These are the invocation names within a routine of that routine's lexical
data variables and constants which are not parameters.
=item C<lex.block.*>
I<TODO: Some namespace like this for statements or blocks of statements
that are scoped by if/else, or loops, or try/catch, etc.>
=item C<mnt>
This is a special user-updateable catalog which controls the mounting and
unmounting of depots; it is minimalist and does little else besides that;
most meta-data seen/updated here is specific to the Muldis D implementation
in use.
=item C<foreign.*>
Under here are all non-lexical user-defined dbvar (catalog and data)
entities for which the current program is primarily viewed by users as a
"DBMS client" or "persistent file-based embedded DBMS"; specifically, these
are a cleaned-up current program perception of how the foreign program sees
its own entities, which may or may not be the same as how the current
program's naturalized version under C<nat.*> is. Generally speaking, when
the current program is reverse-engineering or scanning the remote program,
or the disk files, the results of that appear under C<foreign.*>, and not
under C<nat.*>; that's not to say that the current program can't
subsequently update its C<nat.*> catalog to match, but doing so is strictly
optional, and typically done just by generic DBMS utility programs, rather
than programs that do a specific job like payroll or genealogy. There is a
distinct C<< foreign.<depot>.* >> namespace for each connection to the other
program, or for each disk file, each of which Muldis D considers a
I<depot>. Note that there is no C<< foreign.<depot>.type.* >> or C<<
foreign.<depot>.rtn.* >> or C<< foreign.<depot>.data.* >>, as the current
program may not invoke those under those names, though their descriptions
are available under C<< foreign.<depot>.cat >>; they are only invokable via
C<nat.*> perceptions of them. Note that, while cleaned up, a number of
implementation-specific details may leak through here, possibly defined in
a non-Muldis D language, if the peer program is not itself implemented in
Muldis D and some of its concepts cannot be expressed automatically in
Muldis D native terms without user interpretation.
=item C<< foreign.<depot>.cat >>
This is the probably reverse-engineered catalog of the foreign DBMS
program, or disk file that this depot represents; it is directly
user-updateable as much as that makes sense.
=item C<interp.*>
Under here are Muldis D implementation-specific mapping details that bridge
between corresponding C<nat.cat.*> and C<< foreign.<depot>.cat >> entities,
sometimes using non-Muldis D but Engine-specific language. These details
interpret between the foreign entities and their native perceptions.
=item C<< interp.<depot>.cat >>
This is a catalog, possibly reverse-engineered or possibly coded into the
current program, that defines the mapping specs or routines to mediate a
single C<nat.cat.*> and C<< foreign.<depot>.cat >> pair.
=back
I<This documentation is pending.>
=head1 CATALOGS
The Muldis D catalog relcons and relvars collectively reflect and/or
control all entities in a DBMS. Given that the catalog provides complete
descriptions of both the interface and implementation of each user-defined
entity, and of just the interface of each system-defined entity,
understanding these is akin to understanding the native grammar
of Muldis D. This grammar is extremely simple by intention, but at a cost
of being a little more verbose.
=head2 Catalog Relcons For System-Defined Entities
This section describes the structure of all C<< cont.sys.<unq_name> >>
catalog relcons, which themselves describe all system-defined DBMS entities
in a computer-readable manner.
I<This documentation is pending.>
=head2 Catalog Relvars For Depot Appearance Control
This section describes the structure of all C<< cont.mnt.<unq_name> >>
special catalog relvars, which reflect and control which depots are
currently mounted in the DBMS. Users update these to open or close
client-server DBMS engine connections; to attach or detach file-based
database files; to create or delete the depots themselves; to associate,
disassociate, create, or delete shared-memory-based depots; to mount or
unmount filesystem-based depots; and so on. Updating these relvars has the
side effect of making the entities belonging to a depot, named C<*.db.*>,
appear in or disappear from view. Details stored here include analogues of
DSNs, database file names, DBMS server names and addresses, and
authentication details such as login names and passwords. What details are
stored per depot
can vary significantly depending on which Muldis D implementation is in
use, but this variance is limited to just C<cont.mnt.depot_detail>. Note
that it is forbidden to update any C<mnt> relvars while a multi-statement
transaction is active, because a transaction subjugates all entities
concurrently visible or mounted in a DBMS, such that they must all commit
or roll back as a unit.
I<This documentation is pending.>
=head2 Catalog Relvars for User-Defined Entities
This section describes the structure of all C<< cont.cat.app.<unq_name> >>
and C<< cont.cat.db.<depot>.<unq_name> >> general catalog relvars, the set
of C<< <unq_name> >> for each of which is identical, that reflect and
control user-defined entities, including data types, routines, non-lexical
variables (which are all relvars, real or virtual), state constraints, etc.
Users update these to create or drop their relvars, data types, routines,
constraints, etc. Updating these catalog relvars has the side effect of
making global data relvars, named C<*.data.*>, appear, disappear, or change
in structure.
I<This documentation is pending.>
=head1 SEE ALSO
Go to L<Language::MuldisD> for the majority of distribution-internal
references, and L<Language::MuldisD::SeeAlso> for the majority of
distribution-external references.
=head1 AUTHOR
Darren Duncan (C<perl@DarrenDuncan.net>)
=head1 LICENSE AND COPYRIGHT
This file is part of the formal specification of the Muldis D language.
Muldis D is Copyright © 2002-2007, Darren Duncan.
See the LICENSE AND COPYRIGHT of L<Language::MuldisD> for details.
=head1 ACKNOWLEDGEMENTS
The ACKNOWLEDGEMENTS in L<Language::MuldisD> apply to this file too.
=cut