NAME
ObjStore - Perl Extension For ObjectStore OODBMS
SYNOPSIS
use ObjStore;
use ObjStore::Config;
my $db = ObjStore::open(TMP_DBDIR . "/silly.db", 0, 0666);
try_update {
my $wb = $db->root('whiteboard', sub {new ObjStore::AV($db, 1001)});
for (my $x=0; $x < 1000; $x++) {
$wb->[$x] = {
repetition => $x,
msgs => ["I will not talk in ObjectStore/perl class.",
"I will study the documentation before asking questions."]
};
}
};
print "Very impressive. I see you are already an expert.\n";
DESCRIPTION
The new SQL and the sunset of relational databases.
ObjectStore is the leading object-oriented database. It is engineered by Object Design, Inc. ( http://www.odi.com ) (NASDAQ: ODIS). The database uses the virtual memory mechanism to make persistent data available in the most efficient manner possible.
In case you didn't know, Object Design's Persistent Storage Engine has been licensed by Sun, Microsoft, Netscape, and Symantec for inclusion in their Java development environments.
Prior to this joining of forces,
ObjectStore was too radical a design decision for many applications.
Perl5 did not have a simple way of storing complex data persistently.
Now there is an easy way to build databases, especially if you care about preserving your ideals of data encapsulation. (See below!)
API
Fortunately, you probably won't need to use most of the API. It is exhibited below mainly to make it seem like this product has a difficult and steep learning curve. Skip to the next section.
Mostly, the API mirrors the C++ API. Refer to the ObjectStore documentation for exact semantics. If you need a function that isn't available in perl, send mail to the OS/Perl mailing list (see the README).
ObjStore
$name = ObjStore::release_name()
$major = ObjStore::release_major()
$minor = ObjStore::release_minor()
$maintenance = ObjStore::release_maintenance()
$yes = ObjStore::network_servers_available();
ObjStore::set_auto_open_mode(mode, fp, [sz]);
$num = ObjStore::return_all_pages();
$size = ObjStore::get_page_size();
@Servers = ObjStore::get_all_servers();
$in_abort = ObjStore::abort_in_progress();
$db = ObjStore::open($pathname, $read_only, $mode);
$num = ObjStore::get_n_databases();
::Server
$name = $s->get_host_name();
$is_broken = $s->connection_is_broken();
$s->disconnect();
$s->reconnect();
@Databases = $s->get_databases();
::Database
$db->close();
$db->destroy();
$db->get_default_segment_size();
$db->get_sector_size();
$db->size();
$db->size_in_sectors();
$ctime = $db->time_created();
$is_open = $db->is_open();
$db->open_mvcc();
$is_mvcc = $db->is_open_mvcc();
$read_only = $db->is_open_read_only();
$can_write = $db->is_writable();
$db->set_fetch_policy(policy[, blocksize]);
Policy can be one of segment, page, or stream.
$db->set_lock_whole_segment(policy);
Policy can be one of as_used, read, or write.
$db = ObjStore::Database::of($pvar);
$Seg = $db->create_segment();
$Seg = $db->get_segment($segment_number);
@Segments = $db->get_all_segments();
@Roots = $db->get_all_roots();
$root = $db->create_root($root_name);
$root = $db->find_root($root_name);
$value = $db->root($root_name[, $new_value]);
This is the recommended API for roots. If the given root is not found, creates a new one. Sets the root's value if $new_value is defined. Returns the root's current value.
$db->destroy_root($root_name);
Destroys the root with the given name if it exists.
::Root
$root->get_name();
$root->get_value();
$root->set_value($new_value);
$root->destroy();
::Transaction
ObjectStore transactions and exceptions are seamlessly integrated into perl. ObjectStore exceptions cause a die in perl just as perl exceptions cause a transaction abort.
try_update {
$top = $db->root('top');
$top->{abc} = 3;
die "Oops! abc should not change!"; # aborts the transaction
};
There are three types of transactions: try_read, try_update, and try_abort_only. In a read transaction, you are not allowed to modify persistent data.
try_read {
my $var = $db->root('top');
$var->{abc} = 7; # write to $var triggers die(...)
};
$T = ObjStore::Transaction::get_current();
$type = $T->get_type();
$pop = $T->get_parent();
$T->prepare_to_commit();
$yes = $T->is_prepare_to_commit_invoked();
$yes = $T->is_prepare_to_commit_completed();
ObjStore::set_transaction_priority($very_low);
ObjStore::set_max_retries($oops);
ObjStore::rethrow_exceptions
my $oops = ObjStore::get_max_retries();
my $yes = ObjStore::is_lock_contention();
my $type = ObjStore::get_lock_status($ref);
my $tm = ObjStore::get_readlock_timeout();
my $tm = ObjStore::get_writelock_timeout();
ObjStore::set_readlock_timeout($tm);
ObjStore::set_writelock_timeout($tm);
::Segment
$Seg->destroy();
$size = $Seg->size();
$yes = $Seg->is_empty();
$yes = $Seg->is_deleted();
$num = $Seg->get_number();
$comment = $Seg->get_comment();
$Seg->set_comment($comment);
$Seg->lock_into_cache();
$Seg->unlock_from_cache();
$Seg->set_fetch_policy($policy[, $size]);
Policy can be one of segment, page, or stream.
$Seg->set_lock_whole_segment($policy);
Policy can be one of as_used, read, or write.
$Seg = ObjStore::Segment::of($pvar);
CONTAINERS
Databases are composed of segments. Segments dynamically resize from very small to very big. You should split your data into lots of segments when it makes sense. Segments improve locality of reference and can be a unit of locking or caching.
When you create a container you must specify the segment in which it is to be allocated. All containers are created using the form 'new ObjStore::$type($store, $cardinality)'. You may pass any persistent object in place of $store and the new container will be created in the same segment as the $store object!
Arrays
The following code snippet creates a persistent array reference with an expected cardinality of ten elements.
my $a7 = new ObjStore::AV($store, 10);
None of the usual array operations are supported except fetch and store. (Push, pop, shift and unshift are available but not documented... oops!) At least the following works:
$a7->[1] = [1,2,3,[4,5],6];
Complete array support will be available as soon as Larry and friends fix the TIEARRAY interface. (See perltie(3) or http://www.perl.com for more info.)
Hashes
The following code snippet creates a persistent hash reference with an expected cardinality of ten elements.
my $h7 = new ObjStore::HV($store, 10);
An array representation is used for low cardinalities. Arrays do not scale well, but they do afford a compact representation. ObjectStore's os_Dictionary is used for large cardinalities.
Data structures can be built with the normal perl syntax:
$h7->{foo} = { 'fwaz'=> { 1=>'blort', 'snorf'=>3 }, b=>'ouph' };
Or the equally effective but unbearably tedious:
my $h1 = $h7->{foo} ||= new ObjStore::HV($h7);
my $h2 = $h1->{fwaz} ||= new ObjStore::HV($h1);
$h2->{1}='blort';
$h2->{snorf}=3;
$h1->{b}='ouph';
Perl saves us again! Relief.
Sets
If you have installed older releases, you might know that sets were supported. They still work, but they are re-implemented in terms of hashes.
And Cursors (Oh My!)
All containers have a method, new_cursor($segment), that creates a persistent cursor. A cursor is like any other object. It is stored in the database and you can bless it into a package. The following methods are always available:
$cs = $c->new_cursor($seg); # creates a cursor in segment $seg
$c2 = $cs->focus; # returns another ref to $c
($k,$v) = $cs->next; # returns the next element
Array cursors return (index,value) pairs. Hash cursors return (key,value) pairs. Sets do something reasonable, but they are highly deprecated. All cursors return the empty list () when no more elements are available.
You should not assume the order of iteration will follow any particular pattern (but it probably will).
If you change membership of a collection while you're iterating through it, anything could happen, so don't.
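The iteration protocol described above can be sketched with a transient stand-in. MockCursor below is a hypothetical class, not part of ObjStore; it only imitates the documented behavior of next (pairs first, then the empty list), so the loop idiom can be tried without a database.

```perl
use strict;
use warnings;

# MockCursor is a hypothetical, transient stand-in for a persistent cursor.
package MockCursor;

sub new {
    my ($class, $array) = @_;
    return bless { data => $array, pos => 0 }, $class;
}

# Return the next (index, value) pair, or the empty list when exhausted.
sub next {
    my ($self) = @_;
    return () if $self->{pos} >= @{ $self->{data} };
    my $i = $self->{pos}++;
    return ($i, $self->{data}[$i]);
}

package main;

my $cs = MockCursor->new([qw(red green blue)]);
my @seen;

# List assignment in scalar context yields the number of elements returned,
# so the loop stops as soon as next() returns the empty list.
while (my ($k, $v) = $cs->next) {
    push @seen, "$k=$v";
}
print join(", ", @seen), "\n";
```

With a real cursor the same while loop should apply unchanged, presumably inside a try_read or try_update block.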
Depending on the collection representation, cursors may have additional useful behavior. Currently, there is no way to test for this.
Cursors do not change the primary reference count; they use weak references. That means that you can know if the container is deleted even while cursors are focused on it. Use the deleted method to check.
Cursors are also the only safe way to refer across databases. XXX
In the future, cursors may be extended to support the following methods:
$en = $cs->prev;
$cs = $key; # seek to $key
$dist = $cs - $cs2; # return the distance between two cursors
There may also be a way to test for availability of advanced features. (E.g. 'can_chicken_walk')
INTROSPECTION
ospeek
While there is no formalized schema for a perl database, the ospeek utility generates a sample of data content and structure. The example output below was snapped from a database created with the SYNOPSIS. This is the full, complete output. ospeek outputs a summary, not the entire database.
Wait! No Schema?! How Can This Scale?
How can a relational database scale?! When you write down a central schema, you are violating the principle of encapsulation. This is dumb. None of the usual database management operations require a central schema. Why create artificial dependencies between your classes when you can avoid it?
Lazy Evolution
Even schema evolution can be done piecemeal. Give all your objects an evolve method that ensures that their representation is up-to-date.
Tag your objects with version numbers.
Or intelligently figure out how to evolve objects by examining their current structure.
The main thing is to keep an archive of prior object formats to regression test your new evolve methods. If you can do extracts to a mini-database, that would do the trick. Then just run your new code through a copy of your historical database.
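A minimal sketch of a version-tagged evolve method follows. It uses a plain transient hash so it can be run anywhere; the Person class, the _version key, and the name/first/last fields are all made up for illustration. With persistent objects the same method would presumably be called inside a try_update block.

```perl
use strict;
use warnings;

package Person;   # hypothetical example class

# Bring one object up to the current format, one version step at a time.
sub evolve {
    my ($self) = @_;
    my $version = $self->{_version} || 0;   # untagged objects count as version 0

    if ($version < 1) {
        # v0 -> v1: nothing to convert, just start tagging objects
        $version = 1;
    }
    if ($version < 2) {
        # v1 -> v2: split the single 'name' field into 'first' and 'last'
        my $name = delete $self->{name};
        $name = '' unless defined $name;
        my ($first, $last) = split ' ', $name, 2;
        $self->{first} = defined $first ? $first : '';
        $self->{last}  = defined $last  ? $last  : '';
        $version = 2;
    }
    $self->{_version} = $version;
    return $self;
}

package main;

# An object written in the old, untagged format.
my $old = bless { name => 'Ada Lovelace' }, 'Person';
$old->evolve;
print "$old->{first} / $old->{last} (v$old->{_version})\n";
```

Because each step only runs when the stored version is older, calling evolve on an already current object is harmless, which is what makes piecemeal evolution safe.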
ospeek Example Output
ObjStore::Root whiteboard = ObjStore::AV [
ObjStore::HV {
msgs => ObjStore::AV [
'I will not talk in ObjectStore/perl class.',
'I will study the documentation before asking questions.',
],
repetition => 0,
},
ObjStore::HV {
msgs => ObjStore::AV [
'I will not talk in ObjectStore/perl class.',
'I will study the documentation before asking questions.',
],
repetition => 1,
},
ObjStore::HV {
msgs => ObjStore::AV [
'I will not talk in ObjectStore/perl class.',
'I will study the documentation before asking questions.',
],
repetition => 2,
},
...
],
Examined 1022 persistent slots.
posh
You can also walk around a database from the inside. Study the output I snapped from this posh session:
posh 1.17 (Perl 5.00403 ObjectStore Release 5.0.1.0)
[set for READ]
/opt/os/tmp% ls
copier.db perltest.db.copy silly.db
perltest.db posh.db test.db
/opt/os/tmp% cd silly.db
$at = ObjStore::HV {
whiteboard => ObjStore::UNIVERSAL::Ref ...
},
% cd $at->{whiteboard}->focus
$at = ObjStore::AV=ARRAY(0xe0580000)% ls
$at = ObjStore::AV [
ObjStore::HV ...
ObjStore::HV ...
ObjStore::HV ...
...
],
$at = ObjStore::AV=ARRAY(0xe0580000)% ls $at->[0]->{msgs}
[0] = ObjStore::AV [
'I will not talk in ObjectStore/perl class.',
'I will study the documentation before asking questions.',
],
$at = ObjStore::AV=ARRAY(0xe0580000)% update
[set for UPDATE]
$at = ObjStore::AV=ARRAY(0xe0580000)% cd $at->[0]->{msgs}
$at = ObjStore::AV=ARRAY(0xe058201c)% $at->[0] = 'This is ridiculous.';
$fake1 = 'This is ridiculous.',
$at = ObjStore::AV=ARRAY(0xe058201c)% ls
$at = ObjStore::AV [
'This is ridiculous.',
'I will study the documentation before asking questions.',
],
$at = ObjStore::AV=ARRAY(0xe058201c)% cd ..
$at = ObjStore::AV=ARRAY(0xe0580000)% ls
$at = ObjStore::AV [
ObjStore::HV ...
ObjStore::HV ...
ObjStore::HV ...
...
],
$at = ObjStore::AV=ARRAY(0xe0580000)% for (1..100) { $at->[$_] = $at->[0]; }
$fake2 = '',
$at = ObjStore::AV=ARRAY(0xe0580000)% ls(map {$at->[$_]->{msgs}} 68..70)
[0] = ObjStore::AV [
'This is ridiculous.',
'I will study the documentation before asking questions.',
],
[1] = ObjStore::AV [
'This is ridiculous.',
'I will study the documentation before asking questions.',
],
[2] = ObjStore::AV [
'This is ridiculous.',
'I will study the documentation before asking questions.',
],
WHY IS PERL A BETTER FIT FOR DATABASES THAN SQL, C++, OR JAVA?
When you write a structure declaration in C++ or Java you are assigning field-names, field-types, and field-order.
struct CXX {
char *name;
char *title;
double size;
};
Programs almost always require a recompile to change any of these attributes. This is fine for small to medium size applications but is not suitable for large databases. It is too inflexible. An SQL-type language is needed.
When you create a table in SQL you are assigning only field-names and field-types.
create table CXX
(name varchar(80),
title varchar(80),
size double)
This is a more flexible data declaration, but SQL gives you far less expressive power than C++ or Java. Applications end up being written in C++ or Java while their data is stored in SQL. Managing the synchronization between the two languages creates a lot of extra complexity. So much so that there are many software companies that exist solely to help address this headache.
Perl is better because it spans all the requirements in a single language. For example, this is similar to an SQL table:
my $h1 = { name => undef, title => undef, size => undef };
Only the field-names are specified.
To address the other side of the spectrum, Malcolm Beattie is working on a perl compiler which is currently in beta-test. Here is his brief description of a new hybrid hash-array that is supported:
An array ref $a can be dereferenced as if it were a hash
ref. $a->{foo} looks up the key "foo" in %{$a->[0]}. The value is the
index in the true underlying array @$a. As an addition, if the array
ref is in a lexical variable tagged with a classname ("my CXX $obj" to
match your example above) then constant key dereferences of the form
$obj->{foo} are mapped to $obj->[123] at compile time by looking up
the index in %CXX::FIELDS.
For example:
my $schema_hashref = { 'field1' => 1, 'field2' => 2 };
my $arr = [$schema_hashref, 'fwaz', 'snorf'];
print "$arr->{field1} : $arr->{field2}\n"; # "fwaz : snorf"
I haven't done benchmarks yet, but considering the implementation, compiled fake hashes should make perl very competitive with Java / ObjectStore database applications in terms of raw performance.
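The hybrid hash-array described above was later removed from the language (pseudo-hashes are gone as of Perl 5.10), so $arr->{field1} no longer works on an array ref. The same name-to-index mapping can still be spelled out by hand. In this sketch get_field is a hypothetical helper, not part of perl or ObjStore.

```perl
use strict;
use warnings;

# Slot 0 holds the field-name => index map; the other slots hold the values.
my $schema = { field1 => 1, field2 => 2 };
my $arr    = [ $schema, 'fwaz', 'snorf' ];

# get_field is a hypothetical helper that does the lookup explicitly.
sub get_field {
    my ($obj, $name) = @_;
    my $index = $obj->[0]{$name};
    die "no such field: $name\n" unless defined $index;
    return $obj->[$index];
}

print join(" : ", map { get_field($arr, $_) } qw(field1 field2)), "\n";
```

The lookup here happens at run time on every access; the compile-time mapping described in the quote is what would have made the original form fast.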
Summary (LONG)
SQL
All perl databases use the same flexible schema that can be examined and updated with generic tools. This is the key advantage of SQL, now available in perl.
Perl / ObjectStore is definitely faster than SQL too. Not to mention that perl is a general purpose programming language and SQL is at best a query language.
C++
Special purpose data types can be coded in C++ and dynamically linked into perl. Since C++ will always be faster than Java, this gives perl an edge in the long run. Perl is to C/C++ as C/C++ is to assembly language.
JAVA
Java has the buzz, but:
Just like C++, the lack of a universal generic schema limits use to a single application at a time. Without some sort of tie mechanism, I don't see how this can be remedied.
All Java databases must serialize data to store it. Until Java supports persistent allocation directly, database operations will always be slower than C++.
Perl will soon integrate with Java enough to use SwingSet - AWT.
I'd like to see some comparisons of code length when solving the same problems in Java and in perl. I have a strong suspicion that it is easier to do data processing in perl.
Summary (SHORT)
Perl can store data
optimized for flexibility and/or for speed
in transient memory and persistent memory
without violating the principle of encapsulation or obstructing general ease of use.
ETA
0-3 MONTHS
Perl compiler; perl kernel threads; fake hashes
3-6 MONTHS
Dynamically loaded application schemas; proper tied arrays; debugged tie interface; Perl-Java integration
THE ADVANCED CHAPTER
Bless
The ObjStore module installs its own version of bless which assures that blessings are persistent. For example:
package MyObject;
use ObjStore;
@ISA = qw(ObjStore::HV);
sub new {
my ($class, $store) = @_;
my $o = $class->SUPER::new($store, $class);
$o->{attribute} = 5;
$o;
}
package main;
my $o = new MyObject($db);
If you store each class in a separate .pm file in your @INC path (see require), then the classes will be autoloaded as you traverse your data.
Class Autoloading
ObjStore tries to require each class as you access persistent instances the first time. This means that you can write generic data processing programs that automatically load the appropriate libraries to manipulate data as it's accessed.
To disable the class autoloading behavior:
ObjStore::disable_class_auto_loading();
This mechanism is orthogonal to the AUTOLOAD mechanism for autoloading functions.
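The require-on-first-access idea can be sketched in plain perl. load_class below is a hypothetical helper written for illustration; how ObjStore actually performs the lookup is not shown in this document and may differ.

```perl
use strict;
use warnings;

# load_class is a hypothetical helper: try to require the file that
# would hold the given class, and report whether that worked.
sub load_class {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;   # My::Class -> My/Class.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $class ('File::Spec', 'No::Such::Class') {
    my $ok = load_class($class);
    print "$class: ", ($ok ? "loaded" : "missing"), "\n";
}
```

A failed require is trapped by the eval, so data blessed into a class that has no .pm file on @INC can still be traversed generically.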
Transactions Redux
EVAL
Transactions are always executed within an implicit eval. If you do not want to abort your program when an ObjectStore exception occurs, you should indicate that you want to have control over your own reflexive behavior:
ObjStore::rethrow_exceptions(0);
After a transaction, you will need to check the value of $@ to see if anything went wrong and determine how to proceed.
try_update { ... }; die if $@; # check for errors!
DEADLOCK
Transactions are automatically retried in the case of a deadlock. If you need to handle deadlocks specially, you can use ObjStore::set_max_retries(0) and write the logic (or illogic) yourself.
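If you do turn off automatic retries, hand-written retry logic might look roughly like the sketch below. It runs without a database: run_transaction is a hypothetical stand-in for a try_update block, with_retries is a made-up helper, and matching the word "deadlock" in $@ is an assumption, since the actual exception text is not given here.

```perl
use strict;
use warnings;

my $attempts = 0;

# Hypothetical stand-in for a transaction body: fails twice with a
# deadlock-like error, then succeeds.
sub run_transaction {
    $attempts++;
    die "deadlock detected\n" if $attempts < 3;
    return "committed";
}

# Retry only on errors that look like deadlocks; rethrow everything else.
sub with_retries {
    my ($code, $max_retries) = @_;
    for my $try (0 .. $max_retries) {
        my $result = eval { $code->() };
        return $result unless $@;
        die $@ unless $@ =~ /deadlock/i;   # assumed error text
    }
    die "giving up after $max_retries retries\n";
}

my $status = with_retries(\&run_transaction, 5);
print "$status after $attempts attempts\n";
```

Keeping the retry decision in one helper makes it easy to add a delay or logging between attempts, which is the usual reason to take over from the built-in behavior.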
Stargate
The stargate determines which collection representations are used to store implicitly created hashes and arrays. It is called recursively on data structures in order to copy them into persistent memory. If you replace the default stargate with your own, make sure to dismember the transient structures as they are processed to ensure that cyclic structures will be collected in transient memory. (See ObjStore.pm for an example.)
Cross Database Pointers
You can allow cross database pointers with:
$db->allow_external_pointers;
But do not do this brainlessly! Databases that have references to other databases affect refcnts. Your refcnts will be wrong if you simply osrm random databases. This will cause some of your data to become undeletable.
To deal with this properly, you must systematically undef all your references and ensure that a given database is empty before you osrm it. $db->destroy contains the appropriate logic. However, pay special attention to cyclic structures. You must break all cycles in your data to allow it to be collected by $db->destroy. It's not hard, you just need to pay attention to detail.
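Why cycles matter under reference counting can be seen with ordinary transient objects. The Node class below is hypothetical and only counts how many of its instances get destroyed; the assumption is that persistent refcnts behave analogously, as the text above suggests.

```perl
use strict;
use warnings;

package Node;   # hypothetical class that counts destroyed instances

my $destroyed = 0;
sub new             { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY         { $destroyed++ }
sub destroyed_count { return $destroyed }

package main;

{
    my $left  = Node->new('left');
    my $right = Node->new('right');
    $left->{peer}  = $right;    # left -> right
    $right->{peer} = $left;     # right -> left, completing a cycle
    delete $left->{peer};       # break the cycle before the refs go away
}
print "destroyed: ", Node::destroyed_count(), "\n";
```

Without the delete, neither count would reach zero when the block ends and both objects would linger until the program exits.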
Performance Check List
The word tuning implies too high a brain-level requirement. Getting performance out of ObjectStore is not rocket science.
DO AS MUCH AS POSSIBLE PER TRANSACTION
SEGMENTS
Is your data partitioned into as many segments as possible? (See the introduction to containers.)
COMPACTNESS
Is your data stored as compactly as possible? You get 90% of your performance because you can fit your whole working data set into RAM. If you are doing a good job, your database should be less than twice the size of its ASCII dump; i.e., less than 2 times expansion. (See the section on data representation.)
WHERE IS THE REAL BOTTLENECK?
Use the 'time' command or DProf to analyze where your program is spending most of its time. osp_copy is bottlenecked by perl and the network, not the database. Try using the perl compiler. (See http://www.perl.com ) Try upgrading your network to ATM or run your program on the same machine as the ObjectStore server.
LOCKING AND CACHING
Object Design claims that your caching and locking settings also impact performance. I haven't been able to verify this. (See os_segment::set_lock_whole_segment and os_database::set_fetch_policy.)
TECHNICAL IMPLEMENTATION
You don't have to understand anything about the technical implementation. Just know that:
ObjectStore is outrageously powerful, sophisticated, and over-engineered.
The perl interface is optimized for simplicity and ease of use. (If it's not fun, why bother?)
The performance of raw ObjectStore is so good that even with a gunky perl layer, benchmarks will show that relational databases can be safely left on the bookshelf where they belong.
Differences Between The Perl And C++ APIs
Most stuff should be roughly the same. However,
Some static methods sit directly under ObjStore::
Transactions are simplified.
Data Representation
Memory usage is much more important in a database than in transient memory. When databases can be as large as or larger than ten million megabytes, a few percent difference in compactness can mean a lot.
To store 32 bit signed integers and doubles, small memory blocks must be allocated. Integers are stored in OSPV_iv and doubles are stored in OSPV_nv. 16 bit signed integers are stored much more efficiently than other types of numbers. However, you may need to tell perl to use integer since perl generally defaults to using doubles.
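The effect of the integer pragma can be seen in plain perl. This sketch only shows the arithmetic; whether a given value then lands in the compact 16 bit representation is up to ObjStore and is not demonstrated here.

```perl
use strict;
use warnings;

my $ratio = 10 / 3;      # default arithmetic produces a double
my $count;
{
    use integer;         # integer arithmetic inside this block only
    $count = 10 / 3;     # the fractional part is discarded
}
print "$ratio $count\n";
```

Since the pragma is lexically scoped, it can be limited to the code that computes values destined for the database.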
[table of bytes per ?]
splash collections ...
Hard Limits
Reference counts are only 32 bits unsigned.
Weak reference counts are only 16 bits unsigned.
Strings are limited to a length of 32767 bytes.
Go Extension Crazy
ObjStore::UNIVERSAL is the base class for all persistent objects. You cannot directly access persistent scalars from perl. They are always immediately copied into transient scalars. So the ObjStore::UNIVERSAL base class is only for objects (or collections).
ObjStore::UNIVERSAL::Ref is the base class for references.
ObjStore::UNIVERSAL::Container is the base class for all containers.
ObjStore::UNIVERSAL::Cursor is the base class for cursors.
ObjStore::AV is the base class for tied arrays.
ObjStore::HV is the base class for tied hashes.
ObjStore::File will be the base class for large binary data.
When an ObjectStore exception occurs, $ObjStore::EXCEPTION is called with an explanation. You can replace the default handler with your own function.
Each subclass of ObjStore::UNIVERSAL::Container has a %REP hash. Persistent object implementations add their create functions to the hash. Each package's new method decides on the best representation, calls the creation function, and returns the persistent object.
You can add your own C++ representations for each of AV and HV. If you want to know the specifics, look at the code for the built-in representations (GENERIC.*).
You can add new families of objects that inherit from ObjStore::UNIVERSAL. Suppose you want highly optimized, persistent bit vectors? Or matrices? These would not be difficult to add, especially once Object Design figures out how to support multiple application schemas within the same executable. They claim that this tonal facility will be available in the next release.
ossv_bridge typemap
The following explanation may be helpful to developers trying to understand the ObjStore typemap. If you don't know what a typemap is, just skip to the next section.
The struct ossv_bridge is used to bridge between perl and C++ objects. It contains transient cursors and transient pointers to persistent data. Immediately after a transaction finishes, invalidate is invoked on all outstanding bridges.
This is necessary in order to update the reference counts properly. This was also the most difficult part to get right. But hey, how many databases do ref counting?
DIRECTION
LEANER COLLECTION REPRESENTATIONS
The ObjectStore collections are weighted down with unnecessary index and query support. I'd like to replace them with a suite of lean representations for large cardinality collections to complement the Splash collections.
MORE BUILT-IN DATA TYPES
File objects implemented using osmmtype and subclassed from IO::Handle. Support for one of Object Design's Text Object Managers? Support for bit vectors and matrices?
APIs
Support for notification, database access control, and any other interesting ObjectStore APIs.
EXPORTS
bless, try_read, try_update, try_abort_only by default. Most other static methods can also be exported.
BUGS
HIGH VOLATILITY
Anything not documented is subject to change without notice. (I will try to preserve backward compatibility when possible.)
CURSED OBJECTS
The strings used to record the blessed nature of persistent objects are allocated in a private hash in the default segment of a database (See 'ospeek -all'). If you accidentally mess up or change any of these strings, your objects will be cursed. You have a backup, right?
NESTED TRANSACTIONS
Disabled until the transaction support is cleaned up.
MOP
This is not a general purpose ObjectStore editor with complete MOP support. Actually, I don't think this is a bug.
AUTHOR
Copyright (c) 1997 Joshua Nathaniel Pritikin. All rights reserved.
This package is free software; you can redistribute it and/or modify it under the same terms as perl itself. The software is provided "as is" without express or implied warranty. Perl / ObjectStore is available via any CPAN mirror site. See http://www.perl.com/CPAN/modules/by-module/ObjStore
Portions of the collection code snapped from splash, Jim Morris's delightful C++ library ftp://ftp.wolfman.com/users/morris/public/splash .
Also, a poignant thanks to all the wonderful teachers with whom I've had the opportunity of studying.
SEE ALSO
Examples in the t/ directory, perl5, ObjectStore, and never again SQL!