NAME

DBIx::DataModel::Doc::Design - Architecture and design principles

DOCUMENTATION CONTEXT

WARNING : Version 2 of DBIx::DataModel is a major refactoring from versions 1.*, with a number of incompatible changes in the API (classes renamed, arguments renamed or reorganized, etc. -- see DBIx::DataModel::Doc::Delta_2.0). The documentation below has not been fully updated yet to reflect these changes.

This chapter is part of the DBIx::DataModel manual.

This chapter covers the basic architecture of DBIx::DataModel, and the main motivating principles for proposing yet another ORM. Read it if you are currently evaluating whether DBIx::DataModel is suitable for your context, or if you want to globally understand how it works. Skip it and jump to the QUICKSTART chapter if you want to directly start using the framework.

GENERAL ARCHITECTURE

Classes

The following picture shows the main classes in DBIx::DataModel lass hierarchy :

       FRAMEWORK CLASSES
       =================

 +-----------------+               +----------------------------+
 | DBIx::DataModel |               | DBIx::DataModel::Statement |
 +-----------------+               +----------------------------+

 +-------------------------+   +-------------------------+
 | DBIx::DataModel::Schema |	 | DBIx::DataModel::Source |
 +--^----------------------+ 	 +---^-----------------^---+
    |                             /                   \
    |                            /                     \
    |     +---------------------^--+  +-----------------^-----+
    |     | DBIx::DataModel::Table |  | DBIx::DataModel::Join |
    |     +-----------------^------+  +--------------------^--+
    |                       |                             /
====|=======================|============================/=======
    |  APPLICATION CLASSES  |                           /
    |  ===================  |                          /
    |                       |                         /
+---^------+     +----------^--------+               /
| MySchema |     | MySchema::Table_n |-+            /
+----------+     +--+---------^------+ |-+         /
                    +--+-------\--^----+ |        /
                       +--------\--\--^--+       /
                                 \  \  \        /
                                +-^--^--^------^------+
                                | auto_generated_join +-+
                                +--+------------------+ |-+
                                   +--+-----------------+ |
                                      +-------------------+

The top half of the picture represents the parent classes distributed with DBIx::DataModel. The bottom half represents derived classes created for a given application. Most objects created during the lifetime of the application will be either instances of those application-specific classes (tables and views), or instances of the DBIx::DataModel::Statement class.

The entry class DBIx::DataModel is just a façade interface to DBIx::DataModel::Schema. The helper class DBIx::DataModel::Statement implements the "select" method.

Subclasses of DBIx::DataModel::Schema are created by the Schema method in DBIx::DataModel; in most cases only one such class will be needed, unless the application talks to several databases simultaneously.

Subclasses of DBIx::DataModel::Source::Table represent tables in the database and are created by the Table() method in DBIx::DataModel::Schema.

Subclasses of DBIx::DataModel::View represent specific SQL queries, in particular queries that join several tables. They may be created explicitly by calling the View method in DBIx::DataModel::Schema; but in most cases they will be indirectly created through calls to the join method. View subclasses use multiple inheritance : they inherit first from DBIx::DataModel::View, but also from the supplied list of parent tables. As a result, instances of such views can exploit all role methods of their parent tables.

Instances

Data rows retrieved from the database are encapsulated as instances of the application-specific Table and View subclasses. Methods in those objects are either various ways to navigate through the associations in the database and retrieve related rows, or methods to modify the data.

A request to the database is encapsulated as an instance of DBIx::DataModel::Statement. This instance has methods for preparing the SQL query, binding parameters to it, executing the query, and getting at the resulting data rows. Statement instances are usually short-lived and confined to specific internal parts of the application, while table or view instances are usually transmitted to the presentation layers of the application, in order to exploit the data within reports, forms, etc. Data rows know from which source they were created, because they are blessed into table or view classses; but they do not know from which statement they were queried.

The following picture shows relationships between classes and instances :

FRAMEWORK CLASSES                  +----------------------------+
=================                  | DBIx::DataModel::Statement |
                                   +----------------------------+
                                                            |
============================================================|=====
APPLICATION CLASSES                                         |
===================                                         |
                                                            |
+----------+     +-------------------+                      |
| MySchema |     | MySchema::Table_n |-+                    |
+----------+     +--+----------------+ |-+                  |
                    +--+---------------+ |                  |
                       +-----------------+                  |
                           |                                |
                           |    +---------------------+     |
                           |    | auto_generated_view +-+   |
                           |    +--+------------------+ |   |
                           |       +--------------------+   |
                           |         |                      |
===========================|=========|======================|=====
INSTANCES                  |         |                      |
=========              +-----+    +-----+          +-----------+
                       | row |    | row |          | statement |
                       +-----+    +-----+          +-----------+

In contrast with some other ORMs, Schema subclasses have no runtime instances : all information is within the schema subclass. This design is discussed below in the DESIGN FAQ section.

Meta-classes

Each application class has a metaclass, i.e. an object of a class in the DBIx::DataModel::Meta namespace.

Metaclasses hold information about application classes, and implement some of their methods. In most cases, this is totally transparent to end users; however, users can interact with metaclasses to get some information about the available tables, associations, etc., or even to change their behaviour. See TODO:MORE_DOC_ON_METACLASSES.

[TODO : PICTURE]

Polymorphic methods

Methods join and select can be applied to various kinds of objects, with various kinds of arguments, and can return various kinds of results. Polymorphism in this way is not exactly common object-oriented practice, but it has been intentionally designed so, in a attempt to "do the right thing" in different situations, while taking care of the inner details.

Polymorphic join

The join method, when applied to a schema, is a class method that dynamically generates a view (i.e. a new subclass of DBIx::DataModel::View), starting from a given class and then following the associations.

When applied to a table or a view, join is a class method that creates a statement to query one or several associated tables. The statement contains a condition to restrict the results according to the initial association.

When applied to a data row (to an instance of a table or a view), join is an instance method that works like the class method (i.e. creates a statement), but in addition binds values of the current object to the appropriated placeholders within the query. This is illustrated below in the section about statements.

In all of these cases, join is meant to produce some kind of data source from which one can later select in order to get at data rows.

Polymorphic select

The select method, when applied to a table or a view, is a class method that generates a statement object, and returns either that object, or something generated by that object (data rows, SQL code, or a low-level sth handle). Users can control the return value through the -resultAs parameter.

When applied to an already existing statement object, select is an instance method that executes that statement, and likewise returns various things depending on the -resultAs parameter.

In both cases, select is meant to start from some kind of data source, and to yield either immediate data rows or some intermediate object that later will produce data rows.

ASSOCIATION

[TODO : REWORK THE WHOLE SECTION. Below are some bits remaining from previous documentation]

Roles as additional methods in table classes

As a result of the association declaration, the Perl class $table1 will get an additional method named $role2 for accessing the associated object(s) in $table2; that method normally returns an arrayref, unless $multiplicity2 has maximum '1' (in that case the return value is a single object ref). Of course, $table2 conversely gets a method named $role1.

To understand why tables and roles are crossed, look at the UML picture :

+--------+                     +--------+
|        | *              0..1 |        |
| Table1 +---------------------+ Table2 |
|        | role1         role2 |        |
+--------+                     +--------+

so from an object of Table1, you need a method role2 to access the associated object of Table2. In your diagrams, be careful to get the role names correctly according to the UML specification. Sometimes we see UML diagrams where role names are on the wrong side, mainly because modelers have a background in Entity-Relationship or Merise methods, where it is the other way around.

Role methods perform joins within Perl (as opposed to joins directly performed within the database). That is, given a declaration

$schema->Association([qw/Employee   employee   1   /],
                     [qw/Activity   activities 0..*/]);

we can call

my activities = $an_employee->activities

which will implicitly perform a

SELECT * FROM Activity WHERE emp_id = $an_employee->{emp_id}

The role method can also accept additional parameters in SQL::Abstract::More format, exactly like the select() method. So for example

my $activities = $an_employee->activities(-columns => [qw/act_name salary/],
                                          -where   => {is_active => 'Y'});

would perform the following SQL request :

SELECT act_name, salary FROM Activity WHERE 
  emp_id = $anEmployee->{emp_id} AND
  is_active = 'Y'

If the role method is called without any parameters, and if that role was previously expanded (see expand() method), i.e. if the object hash contains an entry $obj->{$role}, then this data is reused instead of calling the database again (this is a primitive form of caching). To force a new call to the database, supply some parameters :

$emp->expand('activities');  # stores result in $emp->{activities}
$list = $emp->activities;    # returns cached $emp->{activities}
$list = $emp->activities(-columns => '*');  # default columns, but 
                                            # forces a new call to the DB

Sometimes associations are unidirectional (it does not make sense to traverse the association in both directions). In such cases, just supply 0 or an empty string (or even the string "0" or '""' or -- or "none") to one of the role names : then the corresponding role method is not generated.

Methods insert_into_...

When a role has multiplicity '*', another method named insert_into_... is also installed, that will create new objects of the associated class, taking care of the linking automatically :

$an_employee->insert_into_activities({d_begin => $today, 
                                      dpt_id  => $dpt});

This is equivalent to

$schema->table('Activity')->insert({d_begin => $today, 
                                    dpt_id  => $dpt, 
                                    emp_id  => $anEmployee->{emp_id}});

Many-to-many associations

UML conceptual models may contain associations where both roles have multiplicity '*' (so-called many-to-many associations). However, when it comes to actual database implementation, such associations need an intermediate linking table to collect couples of identifiers from both tables.

DBIx::DataModel supports many-to-many associations as a kind of syntactic sugar, translated into low-level associations with the linking table. The linking table needs to be declared first :

$schema->Table(qw/LinkTable link_table prim_key1 prim_key2/);

$schema->Association([qw/Table1     role1       0..1/],
                     [qw/LinkTable  link_table  *   /]);

$schema->Association([qw/Table2     role2       0..1/],
                     [qw/LinkTable  link_table  *   /]);

This describes a diagram like this :

+--------+                   +-------+                   +--------+
|        | 0..1            * | Link  | *            0..1 |        |
| Table1 +-------------------+  --   +-------------------+ Table2 |
|        | role1  link_table | Table | link_table  role2 |        |
+--------+                   +-------+                   +--------+

Then we can declare the many-to-many association, very much like ordinary associations, except that the last items in the argument lists are names of roles to follow, instead of names of columns to join. In the diagram above, we must follow roles link_table and role2 in order to obtain the rows of Table2 related to an instance of Table1; so we write

$schema->Association([qw/Table1  roles1  *  link_table role1/],
                     [qw/Table2  roles2  *  link_table role2/]);

which describes a diagram like this :

+--------+                    +--------+
|        | *                * |        |
| Table1 +--------------------+ Table2 |
|        | roles1      roles2 |        |
+--------+                    +--------+

The declaration has created a new method roles2 in Table1; that method is implemented by following roles linksA and role2. So for an object obj1 of Table1, the call

my $obj2_arrayref = $obj1->roles2();

will generate the following SQL :

SELECT * FROM link_table INNER JOIN table2
          ON link_table.prim_key2=table2.prim_key2
  WHERE link_table.prim_key1 = $obj->{prim_key1}

Observe that roles2() returns rows from a join, so these rows will belong both to Table2 and to Link_Table.

Many-to-many associations do not have an automatic insert_into_* method : you must explicitly insert into the link table.

Following multiple associations

In the previous section we were following two roles at once in order to implement a many-to-many association. More generally, it may be useful to follow several roles at once, joining the tables in a single SQL query. This can be done through the following methods :

  • "join" (invoked on a Schema) : create a new View that selects from several tables, filling the joins automatically

  • "join" (invoked on an instance of Table or View) : from a given object, follow a list of roles to get information from associated tables.

  • "join" (invoked on Table or View class) : from a given class, follow a list of roles to prepare getting information from associated tables; the result is a Statement object that can be bound to specific members of the initial table and then can be selected.

  • "MethodFromJoin" : add a new method in a table, that will follow a list of roles (shortcut for repeated calls to join).

STATEMENT OBJECTS

The following section is about statement objects and their role in the general DBIx::DataModel architecture.

Difference between views and statements

Both views and statements encapsulate SQL SELECT queries, so some clarification is of order.

A view is a subclass of DBIx::DataModel::View, and therefore also a subclass of DBIx::DataModel::Source. Data rows retrieved from that source become instances of the view. The view usually encapsulates a database join, and the table classes corresponding to the joined tables are also parent classes for the view (multiple inheritance). This means that instances of the view inherit all parent methods for manipulating columns, navigating through associations, etc. The view may include a WHERE clause to restrict the database query, but this is very unfrequent : the main purpose of a view is to encapsulate a join.

By contrast, a statement is an instance of DBIx::DataModel::Statement, most frequently without any subclassing, that represents a particular request to a particular data source (a datasource is a subclass of DBIx::DataModel::Source, i.e. either a table or a view), and technically encapsulates a reference to a DBI statement handle (sth). The statement object goes through a lifecycle with following steps : assembling query clauses, generating the SQL, binding values, preparing the database statement, executing the database query, retrieving the results and blessing them into appropriate classes. Statements also have pagination methods to walk through the results in chunks of several data rows.

With respect to the generated SQL, we could say in short that a view represents the FROM clause of the SQL, while a statement represents all other clauses (list of columns, WHERE, ORDER BY, GROUP BY, etc.).

Stepwise building of the SQL query

Principle

A statement object can accumulate requirements in several steps, before generating the actual database query. Therefore it is a collaborative platform where various independent software components can contribute to various parts of the final SQL.

Example

# create a statement with initial conditions on the department
my $statement = $department->join(qw/activities employee/);

# add a date condition (from config file or CGI params or whatever)
my $date = get_initial_date_from_some_external_source();
$statement->refine(-where => {d_begin => {">" => $date}});

# now issue the SQL query
my $rows = $statement->select(-columns => [qw/d_begin lastname firstname/]);

This code generates the following SQL :

SELECT d_begin, lastname, firstname
FROM   activity INNER JOIN employee
                ON activity.emp_id=employee.emp_id
WHERE  dpt_id  = $departement->{dpt_id}
  AND  d_begin > $date

Behind the scene, the join method first created a view representing the database join between activity and employee; then it created a statement object that would query that view with an initial condition on dpt_id. The refine call added a second condition on d_begin. Finally the select method specified which columns to retrieve.

Stepwise parameter binding through named placeholders

DBIx::DataModel::Statement objects have their own mechanism of placeholders and parameter binding. Of course this gets ultimately translated into usual placeholders at the DBI and database layers; but an additional layer was needed here in order to allow for stepwise building of SQL conditions as just demonstrated above.

Stepwise binding of values to placeholders requires named placeholders, as opposed to usual positional placeholders. Named placeholders are usually expressed with a question mark followed by a name, like in

$statement->refine(-where => {col1 => '?foo',
                              col2 => '?bar',
                              col3 => '?foo'});

Values are bound to these named parameters (either before or after the refine step) through the "bind" method :

$statement->bind(foo => 123, bar => 456);

If the question mark prefix ? is inconvenient, another placeholder prefix may be specified as an option to the schema creation method.

Lifecycle

Principle

The statement object goes through a sequence of states before delivering data rows. Some methods are only available in a given state, as will be explained below. At any time, the "status" method tells which is the current state of the statement object.

Status values

new

The statement has just been created. While in this state, it can accumulate new condition (new WHERE clauses that will end up in the generated SQL) through the "refine" method. Early parameter binding may also occur through the "bind" method.

sqlized

All conditions accumulated into the statement have been translated into SQL, so it is no longer possible to call the "refine" method to add new clauses. However, it is still possible to call the "bind" method to bind values to the placeholders.

prepared

The SQL has been sent to the database and a DBI sth handle has been generated.

executed

Bound values have been sent to the database and the sth is executed, ready to extract data rows.

State diagram

+--------+     select()      +--------------------+
| source |------------------>|    newStatement    |<----<-------,
+--------+                   +--------------------+     |       |
                  sqlize()      |    |   |    refine()  |       |
                 ,----<---------'    |   '------>-------'       |
                 |                   |                 bind(..) |
                 |                   '------------------>-------'
                 |
                 |           +--------------------+
                 '---->----->|  sqlizedStatement  |<----<-------,
                             +--------------------+             |
                  prepare()     |    |                 bind(..) |
                 ,----<---------'    '------------------>-------'
                 |
                 |           +--------------------+
                 '---->----->| preparedStatement  |<----<-------,
                             +--------------------+             |
                  execute()     |    |                 bind(..) |
                 ,----<---------'    '------------------>-------'
                 |
                 |           +--------------------+
                 '---->----->| executedStatement  |<----<-------,
                             +--------------------+     |       |
                                |    |   |     bind(..) |       |
                                |    |   '------>-------'       |
                                |    |                execute() |
                                |    '------------------>-------'
+-------------+  next()/all()   |
| data row(s) |-------<---------'
+-------------+

COLUMN HANDLERS

GENERAL DESIGN PRINCIPLES

Material in the previous sections presented the general architecture of DBIx::DataModel; this should be enough to easily follow the QUICKSTART chapter, or investigate more details in the REFERENCE chapter.

Now we will discuss the motivation for some design features of DBIx::DataModel, in order to explain not only how it works, but also why it was designed that way. This section can be safely skipped, unless you are interested in comparing various ORMs.

Help lower-level layers, do not hide them

DBIx::DataModel provides abstractions that help client applications to automate some common tasks; however, access to lower-level layers remains open, for cases where detailed operations are needed :

  • The generated classes contain methods that can return polymorphic results. By default, the return value is an object or a list of objects corresponding to data rows; however, these methods can also return a handle to the underlying DBI statement, or even just the generated SQL code. Hence, the client code can take control whenever any fine tuning is needed.

  • Data rows exploit the dual nature of Perl objects : on one hand they can be seen as objects, with methods to walk through the data and access related rows from other tables, but on the other hand they can also be seen as hashrefs, with usual Perl idioms for extracting keys, values or slices of data. This dual nature is important for passing data to external helper modules, such as XML generators, Perl dumps, javascript JSON, templates of the Template Toolkit, etc. Such modules need to walk on the data tree, exploring keys, values and subtrees; so they cannot work if data columns are implemented as object-oriented methods (because there is no simple way to ask for all available methods, and even if you get there, it is not possible to distinguish which of those methods encapsulate relevant data).

Let the database do the work

Use RDBMS tools to create the schema

Besides basic SQL data definition statements, RDBMS often come with their own helper tools for creating or modifying a database schema (interactive editors for tables, columns, datatypes, etc.). Therefore DBIx::DataModel provides no support in this area, and assumes that the database schema is pre-existent.

To talk to the database, the framework only needs to know a bare minimum about the schema, namely the table names, primary keys, and UML associations; but no details are required about column names or their datatypes.

Let the RDBMS check data integrity

Most RDBMS have facilities for checking or ensuring integrity rules : foreign key constraints, restricted ranges for values, cascaded deletes, etc. DBIx::DataModel can also do some validation tasks, by setting up column types with a validate handler; however, it is better advised to exploit data integrity checks within the RDBMS whenever possible.

Exploit database projections through variable-size objects

Often in ORMs, columns in the table are in 1-to-1 correspondence with attributes in the class; so any transfer between database and memory systematically includes all the columns, both for selects and for updates. Of course this has the advantage of simplicity for the programmer. However, it may be very inefficient if the client program only wants to read two columns from a very_big_table.

Furthermore, unexpected concurrency problems may occur : in a scenario such as

client1                            client2
=======			     =======
my $obj = My::Table->fetch($key);  my $obj = My::Table->fetch($key);
$obj->set(column1 => $val1);	     $obj->set(column2 => $val2);
$obj->update;                	     $obj->update;

the final state of the row should theoretically be consistent for any concurrent execution of client1 and client2. However, if the ORM layer blindly updates all columns, instead of just the changed columns, then the final value of column1 or column2 is unpredictable.

To diminish the efficiency problem, some ORMs offer the possibility to partition columns into several column groups. The ORM layer then transparently fetches the appropriate groups in several steps, depending on which columns are requested from the client. However, this might be another source of inefficiency, if the client frequently needs one column from the first group and one from the second group.

With DBIx::DataModel, the client code has precise control over which columns to transfer, because these can be specified separately at each method call. Whenever efficiency is not an issue, one can be lazy and specify nothing, in which case the SELECT columns will default to "*". Actually, the schema does not know about column names, except for primary and foreign keys, and therefore would be unable to transparently decide which columns to retrieve. Consequently, objects from a given class may be of variable size :

  my $objs_A = My::Table->select(-columns => [qw/c1 c2/],
		 	         -where   => {name => {-like => "A%"}};

  my $objs_B = My::Table->select(-columns => [qw/c3 c4 c5/],
			         -where   => {name => {-like => "B%"}};

  my $objs_C = My::Table->select(# nothing specified : defaults to '*'
                                 -where   => {name => {-like => "C%"}};

Therefore the programmer has much more freedom and control, but of course also more responsability : in this example, attempts to access column c1 in members of @$objs_B would yield an error.

Exploit database products (joins) through multiple inheritance

ORMs often have difficulties to exploit database joins, because joins contain columns from several tables at once. If tables are mapped to classes, and rows are mapped to objects of those classes, then what should be the class of a joined row ? Three approaches can be taken

  • ignore database joins altogether : all joins are performed within the ORM, on the client side. This is of course the simplest way, but also the less efficient, because many database queries are needed in order to gather all the data.

  • ask a join from the database, then perform some reverse engineering to split each resulting row into several objects (partitioning the columns).

  • create on the fly a new subclass that inherits from all joined tables : data rows then simply become objects of that new subclass. This is the approach taken by DBIx::DataModel.

High-level declarative specifications

Relationships expressed as UML associations

Relationships are expressed in a syntax designed to closely reflect how they would be pictured in a Unified Modelling Language (UML) diagram. The general form is :

$schema->Association([$class1, $role1, $multiplicity1, @columns1],
                     [$class2, $role2, $multiplicity2, @columns2]);

yielding for example the following declaration

$schema->Association([qw/Department department 1 /],
                     [qw/Activity   activities * /]);

which corresponds to UML diagram

+------------+                         +------------+
|            | 1                  0..* |            |
| Department +-------------------------+ Activities |
|            | department   activities |            |
+------------+                         +------------+

This states that there is an association between classes MySchema::Department and MySchema::Activity, with the corresponding role names (roles are used to navigate through the association in both directions), and with the corresponding multiplicities (here an activity corresponds to exactly one employee, while an employee may have many activities).

In the UML specification, role names and multiplicities are optional (as a matter of fact, many UML diagrams use association names, or even anonymous associations, instead of role names). Here, both role names and multiplicities are mandatory, because they are needed for code generation.

The association declaration is bidirectional, so it will simultaneously add features in both participating classes.

In order to generate the appropriate SQL join statements, the framework needs to know the join column names on both sides; these can be either given explicitly in the declaration, or they are guessed from the primary key of the table with multiplicity 1.

Role names declared in the association are used for a number of purposes : implementing methods for direct navigation, implementing methods for inserting new members into owner objects, and implementing multi-step navigation paths through several assocations, such as in :

$myDepartment->join(qw/activities employee spouse/)
             ->select(-columns => \@someColumns,
                      -where   => \%someCriteria);

Information known by the schema about the associations will be used to automatically generate the appropriate database joins. The kinds of joins (INNER JOIN, LEFT OUTER JOIN) are inferred from the multiplicities declared in the association. These can also be explicitly overridden by writing

...->join(qw/activities <=> employee <=> spouse/) # inner joins

...->join(qw/activities  => employee  => spouse/) # left joins

If referential integrity rules are declared within the RDBMS, then there is some overlap with what is declared here on the Perl side. However, it would not be possible to automatically deduce all association information from database metadata, because the database does not know about role names and multiplicities. A partial schema can be automatically generated using DBIx::DataModel::Schema::Generator, but it usually needs some manual additions to be really useful.

UML compositions for handling data trees

Compositions are specific kinds of associations, pictured in UML with a black diamond on the side of the composite class; in DBIx::DataModel, those are expressed by calling the schemas's Composition method instead of Association. As a result, the composite class will be able to perform cascaded insertions and deletions on data trees (for example from structured data received through an external XML or JSON file, and inserted into the database in a single method call).

The reverse is also true : the composite class is able to automatically call its own methods to gather data from associated classes and build a complete data tree in memory. This is declared through the AutoExpand method and is useful for passing structured data to external modules, like for example XML or JSON exports.

ColumnTypes

A DBIx::DataModel schema can declare some column types : these are collections of handlers (callback functions) for performing tasks such as data validation or transformation. Handlers are then attached to specific columns belonging to that column type.

The handler concept is generic and can be exploited by client programs according to the application domain. However, some handler names have a special meaning within the framework : for example, handlers named fromDB or toDB are automatically called when transfering data from or to the database. Take for example the "Percent" column type shown in the Synopsis :

# 'percent' conversion between database (0.8) and user (80)
$schema->ColumnType(Percent =>
   fromDB   => sub {$_[0] *= 100 if $_[0]},
   toDB     => sub {$_[0] /= 100 if $_[0]},
   validate => sub {$_[0] =~ /1?\d?\d/});

Note that this notion of "type" is independent from the actual datatypes defined within the database (integer, varchar, etc.). From the Perl side, these are all seen as scalar values. So a column type as defined here is just a way to specify some operations, programmed in Perl, that can be performed on the scalar values.

Autoload on demand

The default mechanism to access columns within a row is the hashref API:

do_something_with($my_row->{column_name});

However, a method call API can be turned on, which would then yield:

do_something_with($my_row->column_name());

Views within the ORM

A schema can contain View declarations, which are abstractions of SQL statements. This is exactly the same idea as database views, except that they are implemented within the ORM, not within the database. Such views can join several tables, or can specify WHERE clauses to filter the data. ORM views are useful to implement application-specific or short-lived requests, that would not be worth registering persistently within the database model. They can also be useful if you have no administration rights in the database. Of course it is also possible to access database views, because the ORM sees them as ordinary tables.

Extended SQL::Abstract API

Every method involving a SELECT in the database (either when searching rows from a table or collection of tables, or when following associations from an existing row) accepts an number of optional parameters that closely correspond to SQL clauses. The programming interface reuses what is defined in the excellent SQL::Abstract module, with some extensions. Therefore it is possible for example to specify

  • which columns to retrieve

  • which restriction criteria to apply (WHERE clause)

  • how to order the results

  • whether or not to retrieve distinct rows

  • etc.

All these parameters are specified at the statement level, and therefore may vary between subsequent calls to the same class. This is in contrast with many other ORMs where the set of columns or the ordering criteria are specified at schema definition time. As already stated above, DBIx::DataModel gives more freedom to client programs, but also more responsability.

Efficient interaction with the DBI layer

Great care has been taken to interact with the database in the most efficient way, and to leave an open access to DBI fine-tuning options. In particular :

  • At the DBI level, the fastest way to get a large number of data rows from DBI is to retrieve each row into the same memory location, using the bind_columns method. DBIx::DataModel can exploit this feature through a fast statement :

    my $statement = My::Table->select(-columns  => ...,
                                      -where    => ...,
                                      -resultAs => 'fast_statement');
    while (my $row = $statement->next) {
      work_with($row);
    }

    This code creates a single memory location for storing a data row; at each call to the next method, memory values are updated from the database, and the same location is returned.

    While being very fast, this approach also has some limitations : for example it is not possible to put such rows into an array (because the array would merely contain multiple references to the last row). So fast statements are not activated by default; regular statements create a fresh hashref for each row.

  • The client code can have fine control on statement preparation and execution, which is useful for writing efficient loops. For example, instead of writing

    my $list = My::Table->select(...);
    foreach my $obj (@$list) {
      my $related_rows = $obj->join(qw/role1 role2/)->select;
      ...
    }

    we can prepare a statement before the loop, and then just execute that statement at each iteration :

    my $statement = My::Table->join(qw/role1 role2/)->prepare;
    my $list = My::Table->select(...);
    foreach my $obj (@$list) {
      my $related_rows = $statement->execute($obj)->all;
      ...
    }
  • DBI has a prepare_cached method, that works like prepare except that the statement handle returned is stored in a hash associated with the $dbh. This can be exploited from DBIx::DataModel by stating

    $schema->dbiPrepareMethod('prepare_cached');

DEPENDENCIES

DBIx::DataModel depends on

DBI
SQL::Abstract::More

DESIGN FAQ

Here are answers to some design choices, in the form of a FAQ.

Global state in class variables ? Isn't that bad design ?

The philosophy of DBIx::DataModel is that a record is nothing more than a blessed hashref, where hash keys are column names and hash values are column values. So all information about schemas, relationships, connections, etc is indeed within a classData hashref associated with each class. This class data is a global resource, and like with other global resources in Perl (for example STDIN, %ENV, or special variables $/, $,, etc.), several clients can interact through the global state.

When used with care, interaction of several components through a global resource can be quite handy : for example STDIN does not need to be explicitly passed to every component, it is always available; furthermore, STDIN can be redirected at one specific place and all collaborating components will immediately change behaviour accordingly. However, this must be done with care, because there is also a risk of undesired "action at distance" --- maybe the other components wanted to continue reading from the real STDIN, not the redirected one !

To avoid undesired interactions through the global state, Perl offers the local construct, also known as dynamic scoping (see perlsub). Using that construct, a component can temporarily redirect STDIN, for the duration of one specific computation, and then automatically restore it to its previous state.

DBIx::DataModel uses a similar approach. The database handle is stored in the schema class, and can be changed dynamically, which will immediately affect all classes and objects related to that schema. However, state modifications in schemas can be limited to a specific scope through the localizeState method. Furthermore, the doTransaction method takes an optional $dbh argument to localize the transaction within a specific database handle. With these methods, sane management of the global state is quite easy, and since nested transactions are supported, it is perfectly possible to program transactions with cross-database operations (copying objects from one database to another, or simultaneously performing the same insert or delete in several databases).

Actually, localization is only needed for Schema subclasses, that indeed hold a mutable global state (database connection, current debug mode, current value of selectImplicitlyFor, etc.). For Table and View subclasses, all information in classData is set by "compile-time methods" (methods starting with an uppercase letter) and then stays immutable : joins, primary keys, etc.; so there is no potential conflict.

Why no accessor methods for columns ?

The philosophy of DBIx::DataModel is that a record is nothing more than a blessed hashref, where hash keys are column names and hash values are column values. So the recommended way of accessing the data is through the hashref API : this allows you to exploit all common Perl idioms, like

my @column_names = keys @$row;      # inspect hash keys
s/^\s+// foreach values @$row;      # remove leading spaces in all columns
print @{$row}{qw/col1 col2 col3/};  # print a slice
($row->{col1}, $row->{col2}) = ($row->{col2}, $row->{col1}); # swap values
@{$row}{qw/col1 col2/} = @{$row}{qw/col2 col1/};             # idem

Now if you insist, there is the Autoload() method which will give you column accessors. As the name suggests, this relies on Perl's AUTOLOAD mechanism, and therefore will be a bit slower than generating all accessors explicitly at compile time (through Class::Accessor or something similar).

Pre-compiling accessor methods is not possible in DBIx::DataModel, because column names are never known in advance : two instances of the same Table do not necessarily hold the same set of columns, depending on what was requested when doing the select().

Serialization

DBIx::DataModel includes support for the standard Storable serialization / deserialization methods freeze and thaw : so records and record trees can be written into files or sent to other processes. Dynamic subclasses for database joins are re-created on the fly during deserialization through thaw. However, there is no support for serializing database connections (this would be hazardous, and also insecure because serialization data would contain database passwords). Therefore the process performing deserialization is responsible for opening the database connection by its own means, before calling the thaw method.

TO DO

Here are some points that hopefully will be improved in a future release DBIx::DataModel :

- 'hasInvalidColumns' : should be called automatically before insert/update ?
- 'validate' record handler (not only column handlers)
- 'normalize' handler : for ex. transform empty string into null
- walk through WHERE queries and apply 'toDB' handler (not obvious!)
- more extensive and more organized testing
- add PKEYS keyword in -columns, will be automatically replaced by
  names of primary key columns of the touched tables
- design API for easy dynamic association of objects without dealing
  with the keys
- remove spouse example from doc (because can't have same table
  twice in roles)
- quoting
- dbiPrepareMethod as argument to select()
- pre/post callbacks: support arrays of handlers, refine(..) adds to the
  array
- refine(-orderBy => ..) should add to the ordering
- reflection methods (list of roles, etc.)
- update with subtrees (insert/update on dependent records. Quid: delete?)
- auto-unjoin (API for partioning columns into subobjects).
- inheritance between tables: a) support for Postgres; b) support for DB views
- support for DBI TraceLevel
- execute() : check for unbound "dbidm placeholders"
- support DISTINCT ON ...
- support find_or_create, update_or_create
- doc : Schema has only tables, pk, fk (no cols, triggers, stored procs, domains)
- doc : SQL-oriented (like Fey)
- copy idea from DBIC:Storage:DBI:MultiColumnIn

- Storable
   - hooks for YAML ?
   - think about STORABLE_attach

- doc : explain Table inheritance
   $schema->Table(..., {parents => ...});
   # ADD TESTS for table inheritance


- declare RT bug for Params::Validate : doc is confused about
  validate_with / validation_options


- savepoints

- allow tables without primary key

- various policies for guessing foreign keys : e.g proc.id = attr.proc_id

- Statement::refine() should be more intelligent
    - combine (-where, -orderBy, callbacks)
    - reject overrides when not possible
    - should add something like a 'reset' method ??

- explain how to design families of tables with an intermediate superclass

- add attribute 'db_schema_name' in Schema->new(). Then one can have
     my $dev  = Schema->new(dbh => ..., db_schema_name => 'DEV');
     my $prod = Schema->new(dbh => ..., db_schema_name => 'PROD');
  and that name is prepended to table names

- fix all doc refernces to ::View or ::Table (add Source::)

- doc glossary, introduce the 'path' term

- think API for defining column types : just in new() or allow
  to do it in later calls to a "setter" method ?

- find a way to rename _singleInsert > _single_insert, without breaking
  compatibility


Other column handlers : 
  - validate
  - normalize

- various policies for guessing foreign keys : e.g proc.id = attr.proc_id

- Statement::refine() should be more intelligent - combine (-where, -orderBy, callbacks) - reject overrides when not possible - should add something like a 'reset' method ??

- declare RT bug for Params::Validate : doc is confused about validate_with / validation_options

- doc : explain Table inheritance $schema->Table(..., {parent_tables => ...}); # ADD TESTS for table inheritance - Storable - hooks for YAML ? - think about STORABLE_attach

- update syntactic sugar 
   $class->update(-set => {hashref}, -where => [-key => @_])

- think about reuse of statements : either this should be forbidden,
  or we need a way to do it cleanly, e.g

     my $statement = $schema->table($name);
     my $result1 = $statement->select(...);
     $statement->clear;
     my $result2 = $statement->select(...);
   

1 POD Error

The following errors were encountered while parsing the POD:

Around line 108:

Non-ASCII character seen before =encoding in 'façade'. Assuming CP1252