NAME

ObjStore - perl extension for ObjectStore OODBMS

SYNOPSIS

use ObjStore ':ALL';

my $db = ObjStore::open(&schema_dir . "/perltest.db", 0, 0666);

try_update {
    my $top = $db->root('whiteboard') ||
              $db->root('whiteboard', new ObjStore::AV($db, 1000));
    for (my $x=1; $x < 10000; $x++) {
        $top->[$x] ||= {
             id => $x,
             m1 => "I will not talk in ObjectStore/perl class.",
             m2 => "I will study the documentation before asking questions.",
        };
    }
    print "Very impressive.  I see you are already an expert.\n";
};

DESCRIPTION

The new SQL and the sunset of relational databases.

ObjectStore is the leading object-oriented database. It is engineered by Object Design, Inc. (http://www.odi.com) (NASDAQ: ODIS). The database uses the operating system's virtual memory mechanism to map persistent data directly into the application's address space, so persistent objects can be used through ordinary pointers once their pages are mapped in.

In case you didn't know, Object Design's Persistent Storage Engine has been licensed by Sun, Microsoft, Netscape, and Symantec for inclusion in their Java development environments.

Prior to this joining of forces,

  • ObjectStore was too radical a design decision for many applications.

  • perl5 did not have a simple way of storing complex data persistently.

Now there is an easy way to build databases, especially if you care about preserving your ideals of data encapsulation. (See below!)

API

Much of the perl API is a direct interface to the C++ API. Refer to the ObjectStore documentation for exact semantics. If you need a function that isn't available in perl, send mail to the OS/perl mailing list (see the README).

Fortunately, you probably won't need to use most of the API. It is listed below simply to make you feel more comfortable.

ObjStore

  • $name = ObjStore::release_name()

  • $major = ObjStore::release_major()

  • $minor = ObjStore::release_minor()

  • $maintenance = ObjStore::release_maintenance()

  • $yes = ObjStore::network_servers_available();

  • ObjStore::set_auto_open_mode(mode, fp, [sz]);

  • $num = ObjStore::return_all_pages();

  • $size = ObjStore::get_page_size();

  • @Servers = ObjStore::get_all_servers();

  • $in_abort = ObjStore::abort_in_progress();

  • $db = ObjStore::open($pathname, $read_only, $mode);

  • $num = ObjStore::get_n_databases();

::Server

  • $name = $s->get_host_name();

  • $is_broken = $s->connection_is_broken();

  • $s->disconnect();

  • $s->reconnect();

  • @Databases = $s->get_databases();

::Database

  • $db->close();

  • $db->destroy();

  • $db->get_default_segment_size();

  • $db->get_sector_size();

  • $db->size();

  • $db->size_in_sectors();

  • $ctime = $db->time_created();

  • $is_open = $db->is_open();

  • $db->open_mvcc();

  • $is_mvcc = $db->is_open_mvcc();

  • $read_only = $db->is_open_read_only();

  • $can_write = $db->is_writable();

  • $db->set_fetch_policy($policy[, $blocksize]);

    Policy can be one of segment, page, or stream.

  • $db->set_lock_whole_segment($policy);

    Policy can be one of as_used, read, or write.

  • $db = ObjStore::Database::of($pvar);

  • $Seg = $db->create_segment();

  • $Seg = $db->get_segment($segment_number);

  • @Segments = $db->get_all_segments();

  • @Roots = $db->get_all_roots();

  • $root = $db->create_root($root_name);

  • $root = $db->find_root($root_name);

  • $value = $db->root($root_name[, $new_value]);

    This is the recommended API for roots. If the named root does not exist, it is created. If $new_value is defined, it becomes the root's value. The root's current value is returned.

  • $db->destroy_root($root_name);

    Destroys the root with the given name if it exists.
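
The get-or-create behavior of root can be modeled with a plain transient hash standing in for the database's table of roots. This is a minimal sketch: %roots and root_value are hypothetical names, not part of ObjStore, and real roots hold persistent values rather than transient ones.

```perl
use strict;
use warnings;

my %roots;   # hypothetical stand-in for a database's named roots

# Models $db->root($name[, $new_value]) as described above: create the root
# if it is missing, set it only when $new_value is defined, return its value.
sub root_value {
    my ($name, $new_value) = @_;
    $roots{$name} = undef unless exists $roots{$name};
    $roots{$name} = $new_value if defined $new_value;
    return $roots{$name};
}

# The SYNOPSIS idiom: fetch the root, or install a new value if it is empty.
my $top = root_value('whiteboard') || root_value('whiteboard', []);
push @$top, 'entry';
print scalar @{ root_value('whiteboard') }, "\n";   # 1
```

Because an undefined $new_value never overwrites anything, the "root || root(new value)" idiom from the SYNOPSIS is safe to run on every startup.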

::Root

  • $root->get_name();

  • $root->get_value();

  • $root->set_value($new_value);

  • $root->destroy();

::Transaction

ObjectStore transactions and exceptions are seamlessly integrated into perl. ObjectStore exceptions cause a die in perl just as perl exceptions cause a transaction abort.

try_update {
    $top = $db->root('top');
    $top->{abc} = 3;
    die "Oops!  abc should not change!";       # aborts the transaction
};

There are three types of transactions: try_read, try_update, and try_abort_only. In a read transaction, you are not allowed to modify persistent data.

    try_read {
	my $var = $db->root('top');
	$var->{abc} = 7;	# write to $var triggers die(...)
    };
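
One way a write inside a read transaction can be turned into a die is a tied hash whose STORE checks the current transaction mode. The sketch below is hypothetical: ReadGuardHash, $Mode, and read_only are illustrative names, and this is not a claim about how ObjStore implements try_read; it only demonstrates the perl tie mechanism that ObjStore::HV builds on.

```perl
use strict;
use warnings;

package ReadGuardHash;    # hypothetical illustration, not ObjStore's code
our $Mode = 'update';     # set to 'read' by the wrapper below

sub _check_writable {
    die "Attempt to modify persistent data in a read transaction\n"
        if $Mode eq 'read';
}
sub TIEHASH  { my ($class) = @_; return bless {}, $class }
sub FETCH    { my ($self, $k) = @_; return $self->{$k} }
sub EXISTS   { my ($self, $k) = @_; return exists $self->{$k} }
sub FIRSTKEY { my ($self) = @_; my $reset = keys %$self; return each %$self }
sub NEXTKEY  { my ($self) = @_; return each %$self }
sub STORE    { my ($self, $k, $v) = @_; _check_writable(); $self->{$k} = $v }
sub DELETE   { my ($self, $k) = @_; _check_writable(); delete $self->{$k} }
sub CLEAR    { my ($self) = @_; _check_writable(); %$self = () }

package main;

# Hypothetical stand-in for try_read: run the block with writes forbidden
# and return the error text (empty on success), like checking $@.
sub read_only(&) {
    my ($code) = @_;
    local $ReadGuardHash::Mode = 'read';
    eval { $code->(); 1 } and return '';
    return $@;
}

tie my %top, 'ReadGuardHash';
$top{abc} = 3;                                     # allowed: not in a read block
my $ok_err  = read_only { my $seen = $top{abc} };  # reads are fine
my $bad_err = read_only { $top{abc} = 7 };         # the write dies
print "$top{abc}\n";                               # 3
```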

  • $T = ObjStore::Transaction::get_current();

  • $type = $T->get_type();

  • $pop = $T->get_parent();

  • $T->prepare_to_commit();

  • $yes = $T->is_prepare_to_commit_invoked();

  • $yes = $T->is_prepare_to_commit_completed();

  • ObjStore::set_transaction_priority($very_low);

  • ObjStore::set_max_retries($oops);

  • ObjStore::rethrow_exceptions($yes);

  • my $oops = ObjStore::get_max_retries();

  • my $yes = ObjStore::is_lock_contention();

  • my $type = ObjStore::get_lock_status($ref);

  • my $tm = ObjStore::get_readlock_timeout();

  • my $tm = ObjStore::get_writelock_timeout();

  • ObjStore::set_readlock_timeout($tm);

  • ObjStore::set_writelock_timeout($tm);

::Segment

  • $Seg->destroy();

  • $size = $Seg->size();

  • $yes = $Seg->is_empty();

  • $yes = $Seg->is_deleted();

  • $num = $Seg->get_number();

  • $comment = $Seg->get_comment();

  • $Seg->set_comment($comment);

  • $Seg->lock_into_cache();

  • $Seg->unlock_from_cache();

  • $Seg->set_fetch_policy($policy[, $size]);

    Policy can be one of segment, page, or stream.

  • $Seg->set_lock_whole_segment($policy);

    Policy can be one of as_used, read, or write.

  • $Seg = ObjStore::Segment::of($pvar);

CREATING CONTAINERS

Databases are composed of segments. Segments resize dynamically from very small to very large. Split your data into many segments when it makes sense: segments improve locality of reference and can serve as the unit of locking or caching.

When you create a container you must specify the segment in which it is to be allocated. All containers are created using the form 'new ObjStore::$type($store, $cardinality)'. You may pass any persistent object in place of $store and the new container will be created in the same segment as the $store object!

Arrays

The following code snippet creates a persistent array reference with an expected cardinality of ten elements.

my $a7 = new ObjStore::AV($store, 10);

None of the usual array operations are supported except fetch and store. At least the following works:

$a7->[1] = [1,2,3,[4,5],6];

Complete array support will be available as soon as Larry and friends fix the TIEARRAY interface. (See perltie(3) or http://www.perl.com for more info.)

Hashes

The following code snippet creates a persistent hash reference with an expected cardinality of ten elements.

my $h7 = new ObjStore::HV($store, 10);

An array representation is used for low cardinalities. Arrays do not scale well, but they do afford a compact representation. ObjectStore's os_Dictionary is used for large cardinalities.
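
The trade-off between a compact array representation for small hashes and a real dictionary for large ones can be illustrated in plain perl. In this sketch, SmallThenBigHash and the threshold of 8 are hypothetical; the actual crossover point and the os_Dictionary representation are chosen inside ObjStore.

```perl
use strict;
use warnings;

package SmallThenBigHash;       # hypothetical illustration, not ObjStore::HV
use constant THRESHOLD => 8;    # assumed crossover point

sub new { my ($class) = @_; return bless { pairs => [], dict => undef }, $class }

sub store {
    my ($self, $key, $value) = @_;
    if ($self->{dict}) { $self->{dict}{$key} = $value; return $value }
    for my $pair (@{ $self->{pairs} }) {        # linear scan: compact but O(n)
        if ($pair->[0] eq $key) { $pair->[1] = $value; return $value }
    }
    push @{ $self->{pairs} }, [$key, $value];
    if (@{ $self->{pairs} } > THRESHOLD) {      # grown too big: switch to a hash
        my %dict;
        $dict{ $_->[0] } = $_->[1] for @{ $self->{pairs} };
        $self->{dict}  = \%dict;
        $self->{pairs} = [];
    }
    return $value;
}

sub fetch {
    my ($self, $key) = @_;
    return $self->{dict}{$key} if $self->{dict};
    for my $pair (@{ $self->{pairs} }) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return undef;
}

sub representation { my ($self) = @_; return $self->{dict} ? 'dictionary' : 'array' }

package main;
my $h = SmallThenBigHash->new;
$h->store("k$_", $_) for 1 .. 5;
my $before = $h->representation;
$h->store("k$_", $_) for 6 .. 20;
my $after = $h->representation;
print "$before -> $after\n";        # array -> dictionary
```

The array form stores nothing but the pairs themselves, which is why it wins for a handful of keys and loses once lookups become long linear scans.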

Data structures can be built with the normal perl syntax:

$h7->{foo} = { 'fwaz'=> { 1=>'blort', 'snorf'=>3 }, b=>'ouph' };

Or the equally effective but unbearably tedious:

my $h1 = $h7->{foo} ||= new ObjStore::HV($h7);
my $h2 = $h1->{fwaz} ||= new ObjStore::HV($h1);
$h2->{1}='blort';
$h2->{snorf}=3;
$h1->{b}='ouph';

Perl saves us again! Relief.

Sets

The following code snippet creates a set with an expected cardinality of ten elements.

my $set = new ObjStore::Set($store, 10);

Sets are simple collections. They do not support duplicates. The following methods are supported:

    $set->add($obj, { hello=>1 });
    $set->rm($obj);
    $yes = $set->contains($obj);
    for (my $obj = $set->first; $obj; $obj = $set->next) {
	# do something with $obj
    }

An array representation is used for low cardinalities. Arrays are not efficient, but they are compact. ObjectStore's os_set is used for large cardinalities.

Changing the membership of a set while iterating over the members has undefined results.
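
The set behavior described above (identity-based membership, no duplicates, first/next iteration) can be modeled transiently. IdentitySet is a hypothetical class, not ObjStore::Set; it keys members by Scalar::Util's refaddr so the same object is stored only once. Like the persistent version, its shared cursor is not safe against membership changes during iteration.

```perl
use strict;
use warnings;

package IdentitySet;                 # hypothetical illustration
use Scalar::Util qw(refaddr);

sub new { bless { members => {}, order => [], cursor => 0 }, shift }

sub add {
    my ($self, @objs) = @_;
    for my $obj (@objs) {
        my $id = refaddr $obj;
        next if exists $self->{members}{$id};    # no duplicates
        $self->{members}{$id} = $obj;
        push @{ $self->{order} }, $id;
    }
}

sub rm {
    my ($self, $obj) = @_;
    my $id = refaddr $obj;
    delete $self->{members}{$id} or return;
    $self->{order} = [ grep { $_ != $id } @{ $self->{order} } ];
}

sub contains { my ($self, $obj) = @_; return exists $self->{members}{ refaddr $obj } }

# first/next share one cursor, so changing membership mid-loop shifts it.
sub first { my ($self) = @_; $self->{cursor} = 0; return $self->next }
sub next {
    my ($self) = @_;
    my $id = $self->{order}[ $self->{cursor}++ ];
    return defined $id ? $self->{members}{$id} : undef;
}

package main;
my ($x, $y) = ({ n => 1 }, { n => 2 });
my $set = IdentitySet->new;
$set->add($x, $y, $x);                # $x is stored only once
my $count = 0;
for (my $obj = $set->first; $obj; $obj = $set->next) { $count++ }
print "$count\n";                     # 2
```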

OSPEEK

While there is no official schema for a perl database, the ospeek utility prints a sample of a database's content and structure. The output shown under "ospeek Example Output" below was captured from a database that supports a CGI application we developed. Note how circular references and pointers between objects are summarized as { ... }.

Wait! No Schema?! How Can This Scale?

How can relational databases scale?! When you write down a central schema, you are violating the principle of encapsulation. This is dumb. None of the usual database operations require a central schema. Why create artificial dependencies between your classes when you can avoid it?

Lazy Evolution

Even schema evolution can be done piecemeal. Give all your objects an evolve method that ensures that the representation is up-to-date.

  • Either tag your objects with version numbers,

  • Or intelligently figure out how to evolve objects by examining their current structure.

The main thing is to keep an archive of prior formats of object instances so you can regression test your new evolve methods. Extracting sample objects into a mini-database would do the trick: then just run your new code against the historical mini-database.
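
A version-tagged evolve method might look like the following. The class MyNode, its fields, and the version history are hypothetical; the sketch only mirrors the idea of a VERSION field like the one visible in the ospeek output. Each step upgrades an object by exactly one version, so any archived format can be brought forward the first time it is touched.

```perl
use strict;
use warnings;

package MyNode;                      # hypothetical class and version history
our $CURRENT_VERSION = 3;

# One upgrade step per old version (assumed history):
#   v1 -> v2 adds a hit counter, v2 -> v3 adds per-day hit counts.
my %upgrade = (
    1 => sub { my ($o) = @_; $o->{hits} = 0 unless defined $o->{hits} },
    2 => sub { my ($o) = @_; $o->{daily_hits} ||= {} },
);

sub evolve {
    my ($self) = @_;
    my $v = $self->{VERSION} || 1;   # untagged objects are treated as version 1
    while ($v < $CURRENT_VERSION) {
        $upgrade{$v}->($self);
        $self->{VERSION} = ++$v;
    }
    return $self;
}

package main;
# An archived version-1 instance, as it might come out of a mini-database.
my $old = bless { name => 'Bright' }, 'MyNode';
$old->evolve;
print "$old->{VERSION} $old->{hits}\n";   # 3 0
```

Running evolve on an object that is already current is a no-op, which is what makes it safe to call on every access.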

ospeek Example Output

ObjStore::Root Bright = Node {
   VERSION => 5,
   center => 1,
   ctime => '19970814113317',
   daily_hits => ObjStore::HV {
     19970814 => 1,
   },
   desc => 'We are what we think.  All that we are arises with our thoughts.  With our thoughts we make the world.',
   hits => 1,
   name => 'Bright',
   owner => '0',
   reflected => '0',
   rel => ObjStore::Set [
     Node {
       ctime => '19970814113317',
       daily_hits => ObjStore::HV {
         19970814 => 8,
       },
       desc => '',
       hits => 8,
       n_anon => 6,
       name => 'Joe's Store',
       owner => Node { ... }
       reflected => 1,
       rel => ObjStore::Set [
         Node { ... }
         Node {
           ctime => '19970814113317',
           desc => 'New users arrive here.',
           hits => '0',
           index => ObjStore::HV {
             Anon-4 => User {
               ctime => '19970814130657',
               daily_hits => ObjStore::HV {
                 19970814 => 1,
               },
               desc => 'Anonymous temporary login.',
               expire => '19970915130657',
               hits => 1,
               name => 'Anon-4',
               owner => User { ... }
               reflected => 3,
               rel => ObjStore::Set [
                 Node { ... }
               ],
               views => ObjStore::HV {
                 1 => User::View {
                   at => Node {
                     color => 'light green',
                     ctime => '19970814124048',
                     daily_hits => ObjStore::HV {
                       19970814 => 8,
                     },
                     desc => '',
                     hits => 8,
                     name => 'Research',
                     owner => Node { ... }
                     reflected => 1,
                     rel => ObjStore::Set [
                       Node { ... }
                       Node { ... }
                       Node { ... }
                       Node { ... }
                     ],
                     url => '',
                   },
                   prior => ObjStore::HV {
                     0 => Node { ... }
                     1 => Node { ... }
                     2 => User { ... }
                   },
                 },
                 2 => User::View {
                   at => User { ... }
                 },
               },
             },
             Anon-5 => User {
               ctime => '19970814191636',
               desc => 'Anonymous temporary login.',
               expire => '19971013191636',
               hits => '0',
               name => 'Anon-5',
               owner => User { ... }
               reflected => 3,
               rel => ObjStore::Set [
                 Node { ... }
               ],
               views => ObjStore::HV {
                 1 => User::View {
                   at => User { ... }
                   prior => ObjStore::HV {
                     0 => User { ... }
                   },
                 },
                 2 => User::View {
                   at => User { ... }
                 },
               },
             },
             Anon-6 => User {
               ctime => '19970814191724',
               desc => 'Anonymous temporary login.',
               expire => '19971013191724',
               hits => '0',
               name => 'Anon-6',
               owner => User { ... }
               reflected => 3,
               rel => ObjStore::Set [
                 Node { ... }
               ],
               views => ObjStore::HV {
                 1 => User::View { ... }
                 2 => User::View { ... }
               },
             },
             joshua => User {
               ctime => '19970814113317',
               daily_hits => ObjStore::HV {
                 19970814 => 13,
               },
               desc => '',
               dreamer => 1,
               expire => '19971013182157',
               hits => 13,
               name => 'joshua',
               owner => User { ... }
               passwd => 'zzReR55rX6.JA',
               proposals => ObjStore::Set [
                 User::Proposal {
                   about => Node { ... }
                   ctime => '19970814195243',
                   from => User { ... }
                   to => User { ... }
                 },
               ],
               reflected => 3,
               rel => ObjStore::Set [
                 Node { ... }
               ],
               url => '',
               views => ObjStore::HV {
                 1 => User::View { ... }
                 2 => User::View { ... }
               },
             },
           },
           name => 'Users',
           owner => Node { ... }
           reflected => 2,
           rel => ObjStore::Set [
             Node { ... }
             User { ... }
             User { ... }
             User { ... }
             User { ... }
           ],
         },
         Node { ... }
       ],
     },
     Node { ... }
     Node { ... }
   ],
 },

WHY IS PERL A BETTER FIT FOR DATABASES THAN SQL, C++, OR JAVA?

When you write a structure declaration in C++ or Java, you fix the field names, the field types, and the field order.

  struct CXX {
	char *name;
	char *title;
	double size;
  };

Changing any of these attributes almost always requires recompiling the programs that use the structure. This is fine for small to medium-sized applications, but it is too inflexible for large databases. An SQL-type language is needed.

When you create a table in SQL you are assigning only field-names and field-types.

create table CXX
(name varchar(80),
 title varchar(80),
 size double precision);

This is a more flexible data declaration, but SQL gives you far less expressive power than C++ or Java. Applications end up being written in C++ or Java while their data is stored in SQL. Managing the synchronization between the two languages creates a lot of extra complexity. So much so that there are many software companies that exist solely to help address this headache.

perl is better because it spans all the requirements in a single language. For example, this is similar to an SQL table:

my $h1 = { name => undef, title => undef, size => undef };

Only the field-names are specified.

To address the other side of the spectrum, Malcolm Beattie is working on a perl compiler which is currently in beta-test. Here is his brief description of a new hybrid hash-array that is supported:

An array ref $a can be dereferenced as if it were a hash
ref.  $a->{foo} looks up the key "foo" in %{$a->[0]}. The value is the
index in the true underlying array @$a. As an addition, if the array
ref is in a lexical variable tagged with a classname ("my CXX $obj" to
match your example above) then constant key dereferences of the form
$obj->{foo} are mapped to $obj->[123] at compile time by looking up
the index in %CXX::FIELDS.

For example:

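# Pseudo-hash syntax: supported in perl 5.005 through 5.8, removed in 5.10.
my $schema_hashref = { 'field1' => 1, 'field2' => 2 };  # field name => index
my $arr = [$schema_hashref, 'fwaz', 'snorf'];            # element 0 is the map
print "$arr->{field1} : $arr->{field2}\n";      # "fwaz : snorf"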

I haven't done benchmarks yet, but considering the implementation, compiled fake hashes should make perl very competitive with Java / ObjectStore database applications in terms of raw performance.
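
Pseudo-hashes did not survive into later perls, but the idea in the quotation (resolving a constant field name to an array index at compile time) can still be expressed with the constant pragma. The class CXX and its field layout below are hypothetical, borrowed from the C++ struct earlier in this section.

```perl
use strict;
use warnings;

package CXX;                        # hypothetical record class
# Field indices fixed at compile time, playing the role of %CXX::FIELDS.
use constant { NAME => 0, TITLE => 1, SIZE => 2 };
our %FIELDS = (name => NAME, title => TITLE, size => SIZE);

sub new {
    my ($class, %args) = @_;
    my $self = bless [], $class;
    for my $field (keys %args) {     # name lookup happens once, at run time
        die "No field '$field' in $class\n" unless exists $FIELDS{$field};
        $self->[ $FIELDS{$field} ] = $args{$field};
    }
    return $self;
}

package main;
my $obj = CXX->new(name => 'fwaz', size => 1.5);
# CXX::NAME is a constant the compiler inlines as 0: a plain array fetch.
my $name = $obj->[CXX::NAME];
print "$name\n";                     # fwaz
```

Only the field order is fixed by %FIELDS; the values stay untyped perl scalars, which is the middle ground the surrounding text argues for.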

Summary (long)

  • SQL

    All perl databases use the same flexible schema that can be examined and updated with generic tools. This is the key advantage of SQL, now available in perl.

    Perl / ObjectStore is definitely faster than SQL too. Not to mention that perl is a general purpose programming language and SQL is at best a 'query language'.

  • C++

    Special purpose data types can be coded in C++ and dynamically linked into perl. Since C++ will always be faster than Java, this gives perl an edge in the long run. perl is to C/C++ as C/C++ is to assembly language.

  • JAVA

    Java has the buzz, but!

    • Just like C++, the lack of a universal generic schema limits use to a single application at a time. Without some sort of tie mechanism, I don't see how this can be remedied.

    • All Java databases must serialize data to store it. Until Java supports persistent allocation directly, database operations will always be slower than C++.

    • Perl will soon integrate with Java enough to use SwingSet - AWT.

    • I'd like to see some comparisons of code length when solving the same problems in Java and in perl. I have a strong suspicion that it is easier to do data processing in perl.

ETA

  • 0-3 MONTHS

    Perl compiler; kernel threads; fake hashes

  • 3-6 MONTHS

    Dynamically loaded application schemas; proper tied arrays; debugged tie interface; perl-Java integration

Summary (short)

Perl can store data

  • optimized for flexibility or for speed

  • in transient memory or persistent memory

without violating the principle of encapsulation or obstructing general ease of use.

ADVANCED FEATURES

Bless

The ObjStore module installs its own version of bless, which ensures that blessings are persistent. For example:

package MyObject;
use ObjStore;
@ISA = qw(ObjStore::HV);
sub new {
    my ($class, $store) = @_;
    my $o = $class->SUPER::new($store, $class);
    $o->{attribute} = 5;
    $o;
}

package main;
my $o = new MyObject($db);

If you store each class in a separate .pm file in your @INC path (see require), then the classes will be autoloaded as you traverse your data.

Class Autoloading

The first time you access a persistent instance of a class, ObjStore tries to require that class. This means that you can write generic data processing programs that automatically load the appropriate libraries to manipulate data as the data is accessed.

To disable the class autoloading behavior:

ObjStore::disable_class_auto_loading();

This mechanism is orthogonal to the AUTOLOAD mechanism for autoloading functions.
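
The on-demand class loading described above amounts to mapping a class name to a file name and calling require once per class. load_class_once is a hypothetical helper sketching this, not ObjStore's own code; it caches failures too, so a class with no .pm file of its own is looked up only once.

```perl
use strict;
use warnings;

my %loaded;   # class name => 1 if require succeeded, 0 if it failed

# Hypothetical helper: require the .pm file for $class the first time an
# instance of $class is encountered; later calls are answered from %loaded.
sub load_class_once {
    my ($class) = @_;
    return $loaded{$class} if exists $loaded{$class};
    return $loaded{$class} = 0 unless $class =~ /\A\w+(?:::\w+)*\z/;  # reject odd names
    (my $file = "$class.pm") =~ s{::}{/}g;                           # Foo::Bar -> Foo/Bar.pm
    return $loaded{$class} = (eval { require $file; 1 } ? 1 : 0);
}

print load_class_once('Text::Wrap'), "\n";        # 1: found in @INC
print load_class_once('No::Such::Class'), "\n";   # 0: remembered, not retried
```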

Transactions Part Two

  • EVAL

    Transactions are always executed within an implicit eval. If you do not want to abort your program when an ObjectStore exception occurs, you should indicate that you want to check errors yourself:

    ObjStore::rethrow_exceptions(0);

    After a transaction, you will need to check the value of $@ to see if anything went wrong and determine how to proceed.

    try_update {
       ...
    };
    die if $@;    # check for errors!

  • DEADLOCK

    Transactions are automatically retried in the case of a deadlock. If you need to handle deadlocks specially, you can use ObjStore::set_max_retries(0) and write the logic (or illogic) yourself.
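
With automatic retries turned off, hand-written retry logic could take the shape below. run_with_retries and is_deadlock are hypothetical: the sketch assumes a deadlock can be recognized from the error text (confirm this against the real exception message), and a plain code reference stands in for a transaction.

```perl
use strict;
use warnings;

# Hypothetical predicate: treat errors mentioning "deadlock" as retryable.
sub is_deadlock { my ($err) = @_; return $err =~ /deadlock/i }

# Run $code up to $max_tries times, retrying only on deadlock errors.
# Returns the number of attempts used; other errors propagate at once.
sub run_with_retries {
    my ($max_tries, $code) = @_;
    for my $attempt (1 .. $max_tries) {
        my $ok = eval { $code->(); 1 };
        return $attempt if $ok;
        my $err = $@;
        die $err unless is_deadlock($err);     # not a deadlock: give up now
        die $err if $attempt == $max_tries;    # out of retries
    }
    return 0;                                  # $max_tries was 0: nothing ran
}

my $calls = 0;
my $used = run_with_retries(5, sub {
    $calls++;
    die "simulated deadlock\n" if $calls < 3;  # fail twice, then succeed
});
print "$used\n";                               # 3
```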

Stargate

The stargate determines which collection representations are used to store implicitly created hashes and arrays. It is called recursively on data structures in order to copy them into persistent memory. If you replace the default stargate with your own, make sure to dismember the transient structures as they are processed, so that circular transient structures are broken apart and can be freed by perl's reference counting.

  use Carp qw(croak);
  use Scalar::Util qw(reftype);

  ObjStore::set_stargate(sub {
    my ($seg, $sv) = @_;
    my $type = reftype $sv;
    my $class = ref $sv;
    if ($type eq 'HASH') {
      my $hv = new ObjStore::HV($seg, ...);
      while (my ($hk, $v) = each %$sv) { $hv->STORE($hk, $v); }
      %$sv = ();                # dismember the transient hash
      if ($class ne 'HASH') { ObjStore::bless $hv, $class; }
      $hv
    } elsif ($type eq 'ARRAY') {
      ...
    } else {
      croak("Stargate: Don't know how to translate $sv");
    }
  });
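
The recursive copy-and-dismember pattern can be tried out without a database. In this sketch, Persistent::HV and Persistent::AV are hypothetical stand-ins for ObjStore::HV and ObjStore::AV (plain blessed containers), and stargate copies a nested structure into them depth-first, emptying each transient container afterwards. The %$seen map, which copies shared or circular references only once, is an addition not present in the stargate shown above.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(reftype blessed refaddr);

package Persistent::HV; sub new { return bless {}, $_[0] }   # hypothetical
package Persistent::AV; sub new { return bless [], $_[0] }   # hypothetical
package main;

# Copy a transient value into "persistent" containers, depth-first.
sub stargate {
    my ($sv, $seen) = @_;
    return $sv unless ref $sv;                     # plain scalars copy as-is
    $seen ||= {};
    my $addr = refaddr $sv;
    return $seen->{$addr} if exists $seen->{$addr};
    my $type  = reftype $sv;
    my $class = blessed $sv;
    my $copy;
    if ($type eq 'HASH') {
        $copy = $seen->{$addr} = Persistent::HV->new;
        $copy->{$_} = stargate($sv->{$_}, $seen) for keys %$sv;
        %$sv = ();                                 # dismember the transient hash
    } elsif ($type eq 'ARRAY') {
        $copy = $seen->{$addr} = Persistent::AV->new;
        push @$copy, stargate($_, $seen) for @$sv;
        @$sv = ();                                 # dismember the transient array
    } else {
        croak("Stargate: Don't know how to translate $sv");
    }
    bless $copy, $class if defined $class;         # keep the user's blessing
    return $copy;
}

my $data = { name => 'Joe', list => [1, 2, { deep => 'x' }],
             obj => bless({ a => 1 }, 'MyClass') };
$data->{self} = $data;                             # a transient cycle
my $p = stargate($data);
print ref($p), " ", $p->{list}[2]{deep}, "\n";     # Persistent::HV x
print scalar(keys %$data), "\n";                   # 0: the cycle is broken
```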

TECHNICAL IMPLEMENTATION

You don't have to understand anything about the technical implementation. Just know that:

  • ObjectStore is outrageously powerful, sophisticated, even over-engineered.

  • The perl interface is optimized for simplicity and ease of use. (If it's not fun, why bother?)

The performance of raw ObjectStore is so good that, even with a gunky perl layer on top, benchmarks should show that relational databases can be safely left on the bookshelf where they belong.

Differences Between The Perl And C++ APIs

Most of the API should be exactly the same. However,

  • Some static methods sit directly under ObjStore::.

  • Transactions are simplified.

Data Representation

Memory usage matters much more in a database than in transient memory. When a database can reach ten million megabytes (about ten terabytes) or more, a few percent difference in compactness amounts to hundreds of gigabytes. Therefore, I am always thinking about ways of conserving persistent memory.

enum ossvtype {
  ossv_undef=1,
  ossv_iv=2,    // integer
  ossv_nv=3,    // double
  ossv_pv=4,    // string
  ossv_obj=5	  // ref counted objects (containers or complex objects)
};

struct OSSV {               // persistent scalar
  void *vptr;
  os_unsigned_int16 _refs;  //unused
  os_int16 _type;
};

struct hkey {               // hash key
  char *pv;
  os_unsigned_int32 len;
};

struct hent {               // hash element
  hkey hk;
  OSSV hv;
};

struct OSPV_iv {            // IV storage
  os_int32 iv;
};

struct OSPV_nv {            // NV storage
  double nv;
};
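
To put numbers on the weaknesses discussed next, the record sizes can be computed with pack, assuming 32-bit pointers (packed here as 'L') and no padding beyond what these field orders already need; on a 64-bit build the pointer fields double. The count of ten million hash elements is an arbitrary example, not a measured database.

```perl
use strict;
use warnings;

# Byte sizes of the persistent records, assuming 32-bit pointers.
my %size = (
    OSSV    => length(pack 'L S s', 0, 0, 0),   # vptr + _refs + _type = 8
    hkey    => length(pack 'L L', 0, 0),        # pv + len = 8
    OSPV_iv => length(pack 'l', 0),             # 4
    OSPV_nv => length(pack 'd', 0),             # 8
);
$size{hent} = $size{hkey} + $size{OSSV};        # 16 per hash element

# Dropping _refs and inferring _type would leave only the pointer.
my $slim_ossv = length(pack 'L', 0);            # 4
my $elements  = 10_000_000;                     # example element count
my $saved     = $elements * ($size{OSSV} - $slim_ossv);
printf "%d bytes per hent, %d MB saved\n", $size{hent}, $saved / 1_000_000;   # 16, 40
```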

There are a number of weaknesses in the current schema:

  • OSSV

    The _refs count is no longer used, and the type of an OSSV could be inferred instead of stored (saving 4 bytes per OSSV). The same I32 could hold either an integer value or a string length (saving an allocation per I32).

  • HASH KEYS

    Hash keys store their length but not their hash value. They probably should not cache a hash value at all; a plain char* would minimize memory usage (saving 4 bytes and an allocation).

  • STRINGS

    Strings do not store their length, so strings with embedded NUL characters cannot be stored.

  • NO WEAK REFERENCES

Changes will be made as soon as I finish the database evolver.
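
The STRINGS limitation is the usual C-string problem: without a stored length, the first NUL byte ends the string. This sketch contrasts NUL-terminated and length-prefixed encodings using perl's pack templates 'Z*' and 'N/a*'; it illustrates the trade-off only and is not ObjStore's storage code.

```perl
use strict;
use warnings;

my $value = "abc\0def";                         # 7 bytes with an embedded NUL

# NUL-terminated (C string) storage: reading stops at the first NUL.
my $as_cstring     = pack 'Z*', $value;
my ($from_cstring) = unpack 'Z*', $as_cstring;

# Length-prefixed storage: a 32-bit length, then the raw bytes.
my $as_counted     = pack 'N/a*', $value;
my ($from_counted) = unpack 'N/a*', $as_counted;

printf "%d %d\n", length $from_cstring, length $from_counted;   # 3 7
```

The counted form costs 4 bytes for the length instead of 1 byte for the terminator, which is the kind of space trade-off this section weighs.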

Go Extension Crazy

ObjStore::UNIVERSAL is the base class for all persistent objects. You cannot directly access persistent scalars from perl. They are always immediately copied into transient scalars. So the ObjStore::UNIVERSAL base class is only for objects (or collections).

ObjStore::UNIVERSAL::Container is the base class for all containers.

ObjStore::Set is the base class for sets.

ObjStore::HV is the base class for tied hashes.

ObjStore::AV is the base class for tied arrays.

ObjStore::Cursor is the base class for cursors.

ObjStore::File will be the base class for large binary data.

When an ObjectStore exception occurs, $ObjStore::EXCEPTION is called with an explanation. You can replace the default handler with your own function.

Each subclass of ObjStore::UNIVERSAL has a %REP hash. Persistent object implementations add their creation functions to this hash. Each package's new method decides on the best representation, calls its creation function, and returns the persistent object.

You can add your own C++ representations for each of Set, AV, and HV. If you want to know the specifics, look at the code for the provided built-in representations (GENERIC.*).
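
The %REP arrangement is a registry of creation functions keyed by representation name, with new choosing among them. Here is a transient sketch in which the package MyAV, the representation names, the cardinality cutoff, and the creation functions are all hypothetical; real creation functions would build C++ objects rather than perl hashes.

```perl
use strict;
use warnings;

package MyAV;                       # hypothetical container family
our %REP;                           # representation name => creation function

# Representations register themselves by adding to %REP.
$REP{small_array} = sub { my ($card) = @_; return { rep => 'small_array', capacity => $card } };
$REP{big_vector}  = sub { my ($card) = @_; return { rep => 'big_vector',  capacity => $card } };

# new picks a representation for the expected cardinality, calls its
# creation function, and blesses the result into the requested class.
sub new {
    my ($class, $cardinality) = @_;
    my $rep = $cardinality <= 20 ? 'small_array' : 'big_vector';   # assumed cutoff
    return bless $REP{$rep}->($cardinality), $class;
}

package main;
my $small = MyAV->new(10);
my $large = MyAV->new(5000);
print "$small->{rep} $large->{rep}\n";   # small_array big_vector
```

Because new only consults %REP, adding a representation means registering one more creation function without touching callers.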

You can add new families of objects that inherit from ObjStore::UNIVERSAL. Suppose you want highly optimized, persistent bit vectors? Or matrices? These would not be difficult to add, especially once Object Design figures out how to support multiple application schemas within the same executable. They claim that this facility will be available in the next release.

ossv_bridge typemap

The following explanation may be helpful to developers trying to understand the ObjStore typemap. If you don't know what a typemap is, just skip to the next section.

The struct ossv_bridge is used to bridge between perl and C++ objects. It contains transient cursors and transient pointers to persistent data. Immediately after a transaction finishes, invalidate is invoked on all outstanding bridges. This is necessary in order to update the reference counts properly. This was also the most difficult part to get right. But hey, how many databases do reference counting?
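
The invalidate-at-end-of-transaction rule can be modeled with a registry of live bridges that the transaction wrapper walks when the transaction ends. Bridge, run_transaction, and the error text are hypothetical; the real ossv_bridge is a C++ struct, and this perl model only shows why a handle must not be used after its transaction finishes.

```perl
use strict;
use warnings;

package Bridge;                      # hypothetical perl model of ossv_bridge
my @live;                            # bridges handed out in this transaction

sub new {
    my ($class, $target) = @_;
    my $self = bless { target => $target, valid => 1 }, $class;
    push @live, $self;
    return $self;
}

sub get {
    my ($self) = @_;
    die "bridge used outside its transaction\n" unless $self->{valid};
    return $self->{target};
}

# Drop the reference; the text above says this step is what keeps the
# real reference counts correct.
sub invalidate { my ($self) = @_; $self->{valid} = 0; delete $self->{target}; return }

# Called when a transaction ends: invalidate every outstanding bridge.
sub invalidate_all { $_->invalidate for splice @live; return }

package main;

# Hypothetical transaction wrapper: bridges are invalidated on commit and abort.
sub run_transaction {
    my ($code) = @_;
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    Bridge::invalidate_all();
    die $err unless $ok;
    return;
}

my ($leaked, $inside);
run_transaction(sub {
    $leaked = Bridge->new({ abc => 3 });
    $inside = $leaked->get->{abc};   # fine inside the transaction
});
my $after = eval { $leaked->get; 1 } ? "still valid\n" : $@;
print "$inside / $after";            # 3 / bridge used outside its transaction
```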

DIRECTION

  • MORE BUILT-IN DATA TYPES

    Text objects implemented using osmmtype and subclassed from IO::Handle. Support for one of Object Design's Text Object Managers. Support for bit vectors and matrices.

  • MORE APIS

    Support for notification, database access control, and any other interesting ObjectStore APIs.

EXPORTS

bless, try_read, try_update, and try_abort_only are exported by default. Most other static methods can also be exported (for example with the ':ALL' tag used in the SYNOPSIS).

BUGS

  • NESTED TRANSACTIONS

    Disabled until the transaction support is cleaned up.

  • CURSED OBJECTS

    The strings used to record the blessed nature of persistent objects are allocated in a private hash in the default segment of a database (See 'ospeek -all'). If you accidentally mess up or change any of these strings, your objects will be cursed. You will need to re-bless each to fix the broken pointers. A database copy script is in the works.

AUTHOR

Copyright (c) 1997 Joshua Nathaniel Pritikin. All rights reserved.

This package is free software; you can redistribute it and/or modify it under the same terms as perl itself. perl / ObjectStore is available via any CPAN mirror site. See http://www.perl.com/CPAN/modules/by-module/ObjStore

Portions of the collection code were snapped from splash, Jim Morris's delightful C++ library: ftp://ftp.wolfman.com/users/morris/public/splash

Also, a poignant thanks to all the wonderful teachers with whom I've had the opportunity to study.

SEE ALSO

Examples in the t/ directory, perl5, ObjectStore, and happily not SQL!