NAME

Tangram - Orthogonal Object Persistence in Relational Databases

DESCRIPTION

Tangram is an object-relational mapper. It makes objects persist in relational databases, and provides powerful facilities for retrieving and filtering them. Tangram fully supports object-oriented programming, including polymorphism, multiple inheritance and collections. It does so in an orthogonal fashion, that is, it doesn't require your classes to implement support functions nor inherit from a utility class.

DEPENDENCIES

Tangram depends on the following modules (available from CPAN):

  • DBI

  • Set::Object

GUIDED TOUR

In this tour, we will take a simple Person hierarchy and use Tangram to add persistence.

  • Person: either a NaturalPerson or a LegalPerson

  • NaturalPerson: a person of flesh and blood. NaturalPersons have:

    • a firstName (string)

    • a name (string)

    • an age (int)

    • a partner (reference to another NaturalPerson)

    • a collection of children (a Perl array)

  • LegalPerson: a company for non-lawyers; LegalPersons have:

    • a name (string)

    • a manager (a reference to a NaturalPerson)

Here is the equivalent UML diagram:

                        +---------------+
                        |    Person     |
                        | { abstract }  |
                        +---------------+
                        +---------------+
                                |
                   +------------A----------------+
                   |                             |             
         +-------------------+           +---------------+          
     +--*|   NaturalPerson   |           |  LegalPerson  |        
     |   +-------------------+manager    +---------------+         
     V   | firstName: string |1---<-----1| name: string  |        
     |   | name: string      |           +---------------+        
     +--*| age: int          |
children +-------------------+
              1       1 
              |    partner
              |       |
              +--->---+

Before we can actually store objects we need to complete two steps:

  1. Create a Schema

  2. Create a database

Create a Schema

A Schema object contains information about the persistent aspects of a system of classes.

It also gives a degree of control over the way Tangram performs the object-relational mapping, but in this tour we will use all the defaults.

Here is the Schema for Springfield:

$schema = Tangram::Schema->new(

   classes =>
   {
      Person =>
      {
         abstract => 1,
      },

      NaturalPerson =>
      {
         bases => [ qw( Person ) ],

         fields =>
         {
            string   => [ qw( firstName name ) ],
            int      => [ qw( age ) ],
            ref      => [ qw( partner ) ],
            array    => { children => 'NaturalPerson' },
         },
      },

      LegalPerson =>
      {
         bases => [ qw( Person ) ],

         fields =>
         {
            string   => [ qw( name ) ],
            ref      => [ qw( manager ) ],
         },
      },
   } );

The Schema lists all the classes that need persistence, along with their attributes and the inheritance relationships. We must provide type information for the attributes, because SQL is more strongly typed than Perl. We also tell Tangram that Person is an abstract class, so it wastes no time attempting to retrieve objects of that exact class.

Note that Tangram cannot deduce this information by itself. While Perl makes it possible to extract the list of all the classes in an application, in general not all classes will need to persist. A class may have both persistent and non-persistent bases. As for attributes, Perl's most typical representation for objects - a hash - even allows two objects of the same class to have a different set of attributes.
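To make the last point concrete, here is a minimal sketch of the kind of hash-based classes the tour relies on. The constructor is our own assumption (the tour never shows it) and nothing here is Tangram API. Note how two objects of the same class end up with different sets of attributes:

```perl
use strict;
use warnings;

# Hypothetical plain-Perl classes for the tour; nothing here is Tangram API.
package Person {
    # Generic constructor: copies the key/value pairs into a blessed hash.
    sub new {
        my ($class, %fields) = @_;
        return bless { %fields }, $class;
    }
}

package NaturalPerson { our @ISA = ('Person'); }
package LegalPerson   { our @ISA = ('Person'); }

package main;

# The age value is made up for the illustration.
my $bart  = NaturalPerson->new( firstName => 'Bart',  name => 'Simpson' );
my $homer = NaturalPerson->new( firstName => 'Homer', name => 'Simpson', age => 39 );

# Same class, different attribute sets: only a Schema can say which
# fields are supposed to persist, and with which type.
my @bart_fields  = sort keys %$bart;
my @homer_fields = sort keys %$homer;

print "Bart:  @bart_fields\n";
print "Homer: @homer_fields\n";
```

Since Perl itself cannot tell us that age is an int or that it belongs to every NaturalPerson, the Schema has to.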

For more information on creating Schemas, see Tangram::Schema.

Setting up a database

Now we need to create a database. The simplest way is to create an empty database and let Tangram initialize it. Module Tangram::Deploy provides this functionality:

   use Tangram::Deploy;
   $dbh = DBI->connect(...);
   $schema->deploy( $dbh );

For more information on deploying databases, see Tangram::Deploy.

Connecting to a database

This is done by calling class method Tangram::Storage::connect. Its first argument is the schema object; the others are passed directly to DBI::connect. For example:

   	$storage = Tangram::Storage->connect( $schema,
		'DBI:ODBC:Springfield', 'homer', 'doh!' );

connects to a database named Springfield via the ODBC driver, using a specific account and password.

For more information on connecting to databases, see Tangram::Storage.

Inserting objects

Now we can begin to populate the database:

$storage->insert( NaturalPerson->new(
   firstName => 'Montgomery', name => 'Burns' ) );

This inserts a single NaturalPerson object into the database. We can insert several objects in one call:

$storage->insert(
   NaturalPerson->new( firstName => 'Patty', name => 'Bouvier' ),
   NaturalPerson->new( firstName => 'Selma', name => 'Bouvier' ) );

Sometimes Tangram saves objects implicitly:

   	my @kids = (
		NaturalPerson->new( firstName => 'Bart', name => 'Simpson' ),
		NaturalPerson->new( firstName => 'Lisa', name => 'Simpson' ) );

   	my $marge = NaturalPerson->new( firstName => 'Marge', name => 'Simpson',
		children => [ @kids ] );

   	my $homer = NaturalPerson->new( firstName => 'Homer', name => 'Simpson',
		children => [ @kids ] );

   	$homer->{partner} = $marge;
   	$marge->{partner} = $homer;
   
   	$storage->insert( $homer );

In the process of saving Homer, Tangram detects that it contains references to objects that are not persistent yet (Marge and the kids), and inserts them automatically. Note that Tangram can handle cycles: Homer and Marge refer to each other.

For more information on inserting objects, see Tangram::Storage.

Updating objects

Updating works pretty much the same as inserting:

my $maggie = NaturalPerson->new( firstName => 'Maggie', name => 'Simpson' );
push @{ $homer->{children} }, $maggie;
push @{ $marge->{children} }, $maggie;

$storage->update( $homer, $marge );

Here again Tangram detects that Maggie is not already persistent in $storage and automatically inserts it. Note that we need to update Marge explicitly because she was already persistent.

For more information on updating objects, see Tangram::Storage.

Memory management

...is still up to you. Tangram won't break in-memory cycles: it's a persistence tool, not a memory management tool. Let's make sure we don't leak objects:

$homer->{partner} = undef; # do this before $homer goes out of scope

Also, when we're finished with a storage, we must explicitly disconnect it:

$storage->disconnect();

A connected storage will hold references to those persistent objects that are present in transient storage (the memory), until it's explicitly disconnected. This requirement will go away when Perl starts supporting weak references.
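The effect of an unbroken cycle can be observed in plain Perl, without Tangram. The following is a minimal sketch, assuming a hypothetical Spouse class whose destructor merely counts how often it is called:

```perl
use strict;
use warnings;

my $destroyed = 0;    # how many Spouse objects have been freed so far

# Hypothetical class; its destructor only counts calls.
package Spouse {
    sub new     { my ($class, %fields) = @_; return bless { %fields }, $class }
    sub DESTROY { $destroyed++ }
}

package main;

{
    my $homer = Spouse->new( firstName => 'Homer' );
    my $marge = Spouse->new( firstName => 'Marge' );
    $homer->{partner} = $marge;
    $marge->{partner} = $homer;
}   # the variables are gone, but the cycle keeps both objects alive
my $freed_with_cycle = $destroyed;

{
    my $homer = Spouse->new( firstName => 'Homer' );
    my $marge = Spouse->new( firstName => 'Marge' );
    $homer->{partner} = $marge;
    $marge->{partner} = $homer;
    $homer->{partner} = undef;    # break the cycle before leaving the scope
}
my $freed_after_break = $destroyed - $freed_with_cycle;

print "freed with the cycle intact: $freed_with_cycle\n";
print "freed after breaking it:     $freed_after_break\n";
```

In the first block neither object is freed until the program exits; in the second, breaking one link is enough for both to go away.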

Finding objects

After reconnecting to Springfield, we now want to retrieve some objects. But how do we find them? Basically there are three options:

  • We know their IDs.

  • We obtain them from another object.

  • We use a query.

Loading by ID

When an object is inserted, Tangram assigns an identifier to it. IDs are numbers that uniquely identify objects in the database. insert returns the ID(s) of the object(s) it was passed:

   my $homer_id = $storage->insert( NaturalPerson->new(
      firstName => 'Homer', name => 'Simpson' ) );

   my @twin_ids = $storage->insert(
      NaturalPerson->new( firstName => 'Patty', name => 'Bouvier' ),
      NaturalPerson->new( firstName => 'Selma', name => 'Bouvier' ) );

This enables us to retrieve the objects:

my $homer = $storage->load( $homer_id );
my @twins = $storage->load( @twin_ids );

For more information on loading objects by id, see Tangram::Storage.

Obtaining objects from other objects

Homer has been restored to his previous state, including his relations with his family. Thus we can say:

my $marge = $homer->{partner};
my @kids = @{ $homer->{children} };

Actually, when Tangram loads an object that contains references to other persistent objects, it doesn't retrieve the referenced objects immediately. Instead Tangram uses a deferred loading mechanism. Marge is retrieved only when Homer's 'partner' field is accessed. This mechanism is almost totally transparent: we would have to use tied() to observe a non-present collection or reference.
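The general idea of deferred loading through a tied field can be sketched in plain Perl. This is only an illustration of the technique: the DeferredRef class and the loader callback are hypothetical, and Tangram's own implementation may well differ.

```perl
use strict;
use warnings;

# Hypothetical tie class; not Tangram's actual implementation.
package DeferredRef {
    sub TIESCALAR {
        my ($class, $loader) = @_;
        return bless { loader => $loader, loaded => 0, value => undef }, $class;
    }

    # Called on every read; runs the loader only the first time.
    sub FETCH {
        my ($self) = @_;
        unless ( $self->{loaded} ) {
            $self->{value}  = $self->{loader}->();
            $self->{loaded} = 1;
        }
        return $self->{value};
    }

    # Assigning to the field replaces the deferred value.
    sub STORE {
        my ($self, $value) = @_;
        $self->{value}  = $value;
        $self->{loaded} = 1;
    }
}

package main;

my $loads = 0;    # counts simulated database round trips
my $homer = { firstName => 'Homer', name => 'Simpson' };

# Stand-in for "fetch the partner from the database".
tie $homer->{partner}, 'DeferredRef', sub {
    $loads++;
    return { firstName => 'Marge', name => 'Simpson' };
};

my $loads_before = $loads;                        # nothing loaded yet
my $first_name   = $homer->{partner}{firstName};  # triggers the load
my $last_name    = $homer->{partner}{name};       # served from the cached value
my $is_deferred  = tied( $homer->{partner} ) ? 1 : 0;

print "$first_name $last_name, loads: $loads, tied: $is_deferred\n";
```

Code that simply reads the field never notices the indirection; only tied() reveals it.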

For more information on relationships, see Tangram::Schema, Tangram::Ref, Tangram::Array, Tangram::IntrArray, Tangram::Set and Tangram::IntrSet.

select

To retrieve all the objects of a given class, we use select:

my @people = $storage->select( 'NaturalPerson' );

For more information on select(), see Tangram::Storage.

Filtering

Usually we won't want to load all the NaturalPersons, only those objects that satisfy some condition. Say, for example, that we want to load only the NaturalPersons whose name field is 'Simpson'. Here's how this can be done:

my $person = $storage->remote( 'NaturalPerson' );
my @simpsons = $storage->select( $person, $person->{name} eq 'Simpson' );

This will bring into memory only the Simpsons; Burns or the Bouvier sisters won't turn up. The filtering happens on the database server side, not in Perl space. Internally, Tangram translates the $person->{name} eq 'Simpson' clause into a piece of SQL code that is passed down to the database.

The above example only begins to scratch the surface of Tangram's filtering capabilities. The following examples are all legal and working code:

	# find all the persons *not* named Simpson
   	my $person = $storage->remote( 'NaturalPerson' );
   	my @others = $storage->select( $person, $person->{name} ne 'Simpson' );

   	# same thing in a different way
   	my $person = $storage->remote( 'NaturalPerson' );
   	my @others = $storage->select( $person, !($person->{name} eq 'Simpson') );

   	# find all the persons who are older than me
	my $person = $storage->remote( 'NaturalPerson' );
   	my @elders = $storage->select( $person, $person->{age} > 35 );

   	# find all the Simpsons older than me
   	my $person = $storage->remote( 'NaturalPerson' );
   	my @simpsons = $storage->select( $person,
   	   	$person->{name} eq 'Simpson' & $person->{age} > 35 );

   	# find Homer's wife - note that select *must* be called in list context
   	my ($person1, $person2) = $storage->remote(
		qw( NaturalPerson NaturalPerson ));

   	my ($marge) = $storage->select( $person1,
      	$person1->{partner} == $person2
      	& $person2->{firstName} eq 'Homer' & $person2->{name} eq 'Simpson' );

   	# find Homer's wife - this time Homer is already in memory
   	my $homer = $storage->load( $homer_id );
   	my $person = $storage->remote( 'NaturalPerson' );
   	my ($marge) = $storage->select( $person,
      	$person->{partner} == $homer );

   	# find the parents of Bart Simpson
   	my ($person1, $person2) = $storage->remote(
		qw( NaturalPerson NaturalPerson ));

   	my (@parents) = $storage->select( $person1,
      	$person1->{children}->includes( $person2 )
      	& $person2->{firstName} eq 'Bart' & $person2->{name} eq 'Simpson' );

   	# find the parents of Bart Simpson - he's already loaded
   	my $bart = $storage->load( $bart_id );
   	my $person = $storage->remote( 'NaturalPerson' );
   	my (@parents) = $storage->select( $person,
      	$person->{children}->includes( $bart ) );

Note that Tangram uses a single ampersand (&) or bar (|) to represent logical conjunction or disjunction, not the usual && or ||. This is due to a limitation in Perl's operator overloading mechanism. Make sure you never forget this, because, unfortunately, using && or || in place of & or | is not even a syntax error :(
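The && trap is easy to reproduce with a few lines of operator overloading. The sketch below uses hypothetical Sketch::Expr and Sketch::Filter classes that build naive SQL strings; it is not Tangram's code, and the always-true bool overload is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical filter class: combines SQL fragments with & and |.
package Sketch::Filter {
    use overload
        '&'    => sub { Sketch::Filter->new( '(' . $_[0]->sql . ' AND ' . $_[1]->sql . ')' ) },
        '|'    => sub { Sketch::Filter->new( '(' . $_[0]->sql . ' OR '  . $_[1]->sql . ')' ) },
        'bool' => sub { 1 };    # assumption: a filter is always "true"

    sub new { my ($class, $sql) = @_; return bless { sql => $sql }, $class }
    sub sql { my ($self) = @_; return $self->{sql} }
}

# Hypothetical expression class: comparisons return filters, not booleans.
package Sketch::Expr {
    use overload
        'eq' => sub {
            my ($self, $value) = @_;
            return Sketch::Filter->new( "$self->{column} = '$value'" );
        },
        '>' => sub {
            my ($self, $value, $swapped) = @_;
            my $op = $swapped ? '<' : '>';
            return Sketch::Filter->new( "$self->{column} $op $value" );
        };

    sub new { my ($class, $column) = @_; return bless { column => $column }, $class }
}

package main;

my %person = (
    name => Sketch::Expr->new('name'),
    age  => Sketch::Expr->new('age'),
);

# & is overloadable, so both conditions end up in the filter.
my $right = $person{name} eq 'Simpson' & $person{age} > 35;

# && cannot be overloaded: Perl tests the left filter for truth and
# returns the right one, silently dropping the name condition.
my $wrong = $person{name} eq 'Simpson' && $person{age} > 35;

print $right->sql, "\n";
print $wrong->sql, "\n";
```

The first filter contains both conditions; the second contains only the age test, and Perl never complains.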

For more information on filters, see Tangram::Expr and Tangram::Remote.

Cursors

Cursors provide a way of retrieving objects one at a time. This is important if the result set is potentially large. cursor() takes the same arguments as select() and returns a Cursor object that can be used to iterate over the result set via methods current() and next():

   	# iterate over all the NaturalPersons in storage

   	my $cursor = $storage->cursor( 'NaturalPerson' );

   	while (my $person = $cursor->current())
   	{
		# process $person
		$cursor->next();
   	}

   	$cursor->close();

The Cursor will be automatically closed when $cursor is garbage-collected, but Perl doesn't define just when that may happen :( Thus it's a good idea to explicitly close the cursor.
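For readers unfamiliar with the current()/next() style of iteration, here is a minimal sketch of the protocol over an in-memory list. Sketch::Cursor is a hypothetical stand-in, not Tangram::Cursor, and it assumes that current() returns undef once the result set is exhausted:

```perl
use strict;
use warnings;

# Hypothetical in-memory cursor; a real one would fetch rows lazily.
package Sketch::Cursor {
    sub new {
        my ($class, @rows) = @_;
        return bless { rows => [@rows], pos => 0 }, $class;
    }

    # The row under the cursor, or undef past the end.
    sub current { my ($self) = @_; return $self->{rows}[ $self->{pos} ] }

    # Advance by one row.
    sub next { my ($self) = @_; $self->{pos}++; return $self->current }

    # Release the result set explicitly.
    sub close { my ($self) = @_; $self->{rows} = []; $self->{pos} = 0; return }
}

package main;

my $cursor = Sketch::Cursor->new( qw( Homer Marge Bart ) );
my @seen;

while ( my $person = $cursor->current ) {
    push @seen, $person;    # process $person
    $cursor->next;
}

$cursor->close;

print "visited: @seen\n";
```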

Each Cursor uses a separate connection to the database. Consequently you can have several cursors open at the same time, all with pending results. Of course, mixing reads and writes to the same tables can result in deadlocks.

For more information on cursors, see Tangram::Storage and Tangram::Cursor.

Remote objects

At this point, most people wonder what $person exactly is and how it all works. This section attempts to give an idea of the mechanisms that are used.

In Tangram terminology, $person is a remote object. Its Perl class is Tangram::Remote, but it's really a placeholder for an object of class NaturalPerson in the database, much like a table alias in SQL-speak.

When you request a remote object of a given class, Tangram arranges that the remote object looks like an object of the said class. It seems to have the same fields as a regular object, but don't be misled: it's not the real thing, just a way of providing a nice syntax.

If you dig into it, you'll find that a Remote is just a hash of Tangram::Expr objects. When you say $person->{name}, an Expr is returned, which, most of the time, can be used like any ordinary Perl scalar. However, an Expr represents a value in the database: it's the equivalent of a Remote, only for expressions, not for objects.

Expr objects that represent scalar values (e.g. ints, floats, strings) can be compared with each other, or with straight Perl scalars. Reference-like Exprs can be compared with each other and with references.

Expr objects that represent collections have an includes method that takes a persistent object, a Remote object or an ID.

The result of comparing Exprs (or calling includes) is a Tangram::Filter that will translate into part of the SQL where-clause that will be passed to the RDBMS.

For more information on remote objects, see Tangram::Remote.

Multiple loads

What happens when we load the same object twice? Consider:

my $person = $storage->remote( 'NaturalPerson' );
my @simpsons = $storage->select( $person, $person->{name} eq 'Simpson' );

my @people = $storage->select( 'NaturalPerson' );

Obviously Homer Simpson will be retrieved by both selects. Are there two Homers in memory now? Fortunately not. There is only one copy of Homer in memory. When Tangram loads an object, it checks whether an object with the same ID is already present. If so, it keeps the old copy, which is desirable, since we may have changed it already.

Incidentally, this explains why a Storage will hold objects in memory - until disconnected (again, this will change when Perl supports weak references).
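The bookkeeping behind this behaviour can be sketched as a cache keyed by object ID. The Sketch::IdentityMap class below is hypothetical and keeps its "database" in a plain hash; it only illustrates the idea, not Tangram's internals:

```perl
use strict;
use warnings;

# Hypothetical storage: 'rows' plays the database, 'objects' the memory.
package Sketch::IdentityMap {
    sub new {
        my ($class, $rows) = @_;
        return bless { rows => $rows, objects => {} }, $class;
    }

    sub load {
        my ($self, $id) = @_;

        # Reuse the in-memory copy if this ID was loaded before;
        # otherwise build a fresh object from the stored row.
        $self->{objects}{$id} //= { %{ $self->{rows}{$id} } };
        return $self->{objects}{$id};
    }
}

package main;

my $storage = Sketch::IdentityMap->new(
    { 1 => { firstName => 'Homer', name => 'Simpson' } } );

my $first = $storage->load(1);
$first->{age} = 39;                 # an unsaved, in-memory change

my $second = $storage->load(1);     # same ID again

my $same_object = ( $first == $second ) ? 1 : 0;

print "same object: $same_object, age seen via second load: $second->{age}\n";
```

Because the cache itself holds a reference to every loaded object, those objects cannot be freed while the storage is alive, which is exactly the situation described above.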

Transactions

Tangram supports transactions:

$storage->tx_start();
$homer->{partner} = $marge;
$marge->{partner} = $homer;
$storage->update( $homer, $marge );
$storage->tx_commit();

Both Marge and Homer will be updated, or neither will. tx_rollback() drops the changes.

Unlike DBI, Tangram allows nested transactions:

   $storage->tx_start();

   {
      $storage->tx_start();
      $patty->{partner} = $selma;
      $selma->{partner} = $patty;
      $storage->tx_commit();
   }

   $homer->{partner} = $marge;
   $marge->{partner} = $homer;
   $storage->update( $homer, $marge );

   $storage->tx_commit();

Tangram uses a single database transaction, but commits it only when the tx_commit()s exactly balance the tx_start()s. Thanks to this feature any piece of code can open all the transactions it needs and still cooperate smoothly with the rest of the application. If a DBI transaction is already active, it will be reused; otherwise a new one will be started.
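The balancing rule amounts to a depth counter. Here is a minimal sketch, assuming a hypothetical Sketch::NestedTx class that records what would be sent to the database instead of talking to one:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a storage; 'sent' records the real transaction
# boundaries that would reach the database.
package Sketch::NestedTx {
    sub new { my ($class) = @_; return bless { depth => 0, sent => [] }, $class }

    sub tx_start {
        my ($self) = @_;
        push @{ $self->{sent} }, 'BEGIN' if $self->{depth} == 0;
        $self->{depth}++;
        return;
    }

    sub tx_commit {
        my ($self) = @_;
        die "tx_commit without tx_start\n" if $self->{depth} == 0;
        $self->{depth}--;
        push @{ $self->{sent} }, 'COMMIT' if $self->{depth} == 0;
        return;
    }
}

package main;

my $storage = Sketch::NestedTx->new;

$storage->tx_start;     # outer: really begins
$storage->tx_start;     # inner: only counted
$storage->tx_commit;    # inner: nothing is committed yet
my @sent_midway = @{ $storage->{sent} };

$storage->tx_commit;    # outer: really commits
my @sent_final = @{ $storage->{sent} };

print "after inner commit: @sent_midway\n";
print "after outer commit: @sent_final\n";
```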

Tangram offers a more robust alternative to the start/commit code sandwich. tx_do() calls CODEREF in a transaction. If the CODEREF dies, the transaction is rolled back; otherwise it's committed. The first example can be rewritten:

$storage->tx_do( sub {
   $homer->{partner} = $marge;
   $marge->{partner} = $homer;
   $storage->update( $homer, $marge );
} );
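The commit-or-rollback behaviour described above is commonly built around eval. The sketch below shows one way to do it; Sketch::TxStorage is hypothetical, merely logs the calls, and re-throwing the error after the rollback is an assumption of the sketch, not a documented property of tx_do():

```perl
use strict;
use warnings;

# Hypothetical storage that logs transaction calls instead of using a database.
package Sketch::TxStorage {
    sub new         { my ($class) = @_; return bless { log => [] }, $class }
    sub tx_start    { my ($self) = @_; push @{ $self->{log} }, 'start';    return }
    sub tx_commit   { my ($self) = @_; push @{ $self->{log} }, 'commit';   return }
    sub tx_rollback { my ($self) = @_; push @{ $self->{log} }, 'rollback'; return }

    # Run $code inside a transaction: commit on success, roll back on die.
    sub tx_do {
        my ($self, $code, @args) = @_;
        $self->tx_start;

        my @result;
        my $ok = eval { @result = $code->(@args); 1 };

        unless ($ok) {
            my $error = $@ || "unknown error\n";
            $self->tx_rollback;
            die $error;    # assumption: the caller still sees the error
        }

        $self->tx_commit;
        return @result;
    }
}

package main;

my $storage = Sketch::TxStorage->new;

$storage->tx_do( sub { return 'ok' } );           # succeeds: committed

eval { $storage->tx_do( sub { die "doh!\n" } ) }; # dies: rolled back
my $error = $@;

my $log = join ' ', @{ $storage->{log} };
print "log: $log\n";
print "error seen by caller: $error";
```

Compared with the start/commit sandwich, the rollback can no longer be forgotten when something in the middle dies.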

For more information on transactions, see Tangram::Storage.

Polymorphism

Up to now we've always used NaturalPerson. However, everything we've seen thus far also works in the presence of polymorphism. Let's create a LegalPerson:

   $storage->insert( LegalPerson->new(
      name => 'Springfield Nuclear Power Plant', manager => $burns ) );

We now have two kinds of Person objects in the storage: Natural- and LegalPersons. If we select all the Persons:

my @all = $storage->select( 'Person' );

...Tangram does what you would expect: it retrieves Homer and all the other persons of flesh and blood and the Nuclear Power Plant.

LICENSE & WARRANTY

Tangram is free software. You may use, modify and redistribute this module under the same terms as Perl itself.

TANGRAM COMES WITHOUT ANY WARRANTY OF ANY KIND.

SUPPORT

Please send bug reports directly to me (jll@tangram-persistence.org) or to the Tangram mailing list (users@tangram-persistence.org). Whenever possible, include a short, complete script demonstrating the problem.

Questions of general interest should be posted either to the Tangram mailing list (users@tangram-persistence.org) or to comp.lang.perl.modules, which I monitor daily. Make sure to include 'Tangram' in the subject line.

Commercial support for Tangram is available. Visit the Tangram website (www.tangram-persistence.org) for support options or contact me at jll@skynet.be.

ACKNOWLEDGEMENTS

I'd like to thank Paul Sharpe and the CPAN testers for helping me test Tangram on many popular platforms.

AUTHOR

Jean-Louis Leroy, jll@tangram-persistence.org

SEE ALSO

perl(1), DBI, overload, Set::Object.