NAME
ObjStore::Internals - a few notes on the implementation
SYNOPSIS
You don't have to understand anything about the technical implementation. Just know that:
ObjectStore is outrageously powerful; sophisticated; and even over-engineered.
The perl interface is optimized to be fun and easy.
Since ObjectStore is also blindingly fast, you can happily leave relational databases to collect dust on the bookshelf where they belong.
DESCRIPTION
Perl & C++ APIs: What's The Difference?
Most stuff should be roughly the same. The few exceptions have generally arisen because there was an easy way to make the interface more programmer friendly.
Transactions are perlified.
Some static methods sit directly under ObjStore:: instead of under their own classes. (Easier to import.)
Databases are always blessed according to your pleasure.
lookup, open, is_open, and lock_timeout are augmented with multi-color, pop-tart style interfaces.
Why not just store perl data with the usual perl structures?
CHANGE CONTROL
As perl evolves, new data layouts are introduced. These changes must not cause database compatibility problems.
BINARY COMPATIBILITY
Perl doesn't have to worry about binary compatibility between platforms. Databases do. In addition, databases impose a number of restrictions on persistent data layout that would be onerous and sub-optimal if adopted by perl.
MEMORY USAGE
Perl often trades memory for speed. This is the wrong trade for a database. Memory usage is much more of a concern when data sets can be as large as or larger than ten million megabytes. A few percent difference in compactness can be quite noticeable.
Representation
All values take a minimum of 8 bytes (OSSV). These 8 bytes are used to store a 16-bit type, a pointer, and a general purpose 16-bit integer.
value stored                    extra allocation (in addition to OSSV)
------------------------------  -------------------------------------
undef                           none
pointer                         none
16-bit signed integers          none
32-bit signed integers          4 byte block (OSPV_iv)
double                          8 byte block (OSPV_nv)
string                          length of string (char*)
object (ref or container)       sizeof object (see subclasses of OSSVPV)
bless                           .5-1k bytes per class (zero per object)
%ObjStore::sizeof XXX
Since 32-bit integers and doubles are fairly common and should be stored densely, a pool allocation algorithm is planned.
The ODI FAQ also states: In addition, there is an associated entry in the info segment for the segment in question for each allocation of the object. This is done in the tag table. The overhead is 16 bits (i.e., 2 bytes) for each singleton (i.e., non-array) allocation, 32 bits for each character array allocation for character arrays <= 255 characters, and 48 bits for each character array allocation > 255 characters, or any array allocation of an object of another type. Also, depending on the size of an object (i.e., if you allocate a "huge" object - one that is >64Kb) there is other overhead caused by alignment constraints.
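The figures in the table and in the FAQ excerpt can be combined into a rough per-value estimate. What follows is only a back-of-the-envelope sketch in plain perl, not part of the ObjStore API: the names estimate_bytes and tag_bytes and the type labels are invented for illustration. It assumes that 32-bit integers and doubles count as singleton allocations, that strings count as character arrays, and that string lengths are in bytes. It ignores alignment, objects, and the per-class cost of bless.

```perl
use strict;
use warnings;

use constant OSSV_BYTES => 8;    # every value takes at least one OSSV

# Extra allocation per value, taken from the representation table.
my %EXTRA = (
    'undef' => sub { 0 },
    pointer => sub { 0 },
    int16   => sub { 0 },
    int32   => sub { 4 },               # OSPV_iv
    double  => sub { 8 },               # OSPV_nv
    string  => sub { length $_[0] },    # char*
);

# Tag table overhead in bytes, following the numbers quoted from the ODI FAQ.
sub tag_bytes {
    my ($type, $value) = @_;
    return 0 if $type =~ /^(?:undef|pointer|int16)$/;   # no separate allocation
    return 2 if $type ne 'string';                       # singleton: 16 bits
    return length($value) <= 255 ? 4 : 6;                # char array: 32 or 48 bits
}

# Hypothetical helper: OSSV + extra block + tag table entry.
sub estimate_bytes {
    my ($type, $value) = @_;
    my $extra = $EXTRA{$type} or die "unknown type '$type'";
    return OSSV_BYTES + $extra->($value) + tag_bytes($type, $value);
}

printf "%-8s %d\n", $_->[0], estimate_bytes(@$_)
    for ['undef', undef], ['int16', 42], ['int32', 100_000],
        ['double', 3.14], ['string', 'hello'];
```

Numbers like these are only a starting point; measuring a real database is the only reliable check.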
If this seems like a lot of overhead, consider that it is not really possible to directly compare these numbers to RDBMS statistics. (Part of the problem is that RDBMS vendors can't even give you these statistics.) At least note that relational data can be stored with much less duplication when moved into ObjectStore. (Definitely true if you write C++ extensions.) Of course, the real test must always be to code up your problem and make experimental measurements.
Hard-Coded Limits
Reference counts are only 32 bits unsigned. (The first person to hit this limit in a real application will receive a check from me for $32. Please submit a one page description of your application for judging. :-)
Readonly counts are only 16 bits. Once the counter reaches 2^16-10, the object becomes permanently readonly. This should not be a problem in practice (actually, not even in theory).
Strings are limited to a length of 32767 bytes. (This limit will be relaxed...)
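The limits above are easy to check for in application code before a store fails. This is a minimal sketch in plain perl with invented names (fits_persistent_string, readonly_inc, readonly_dec), none of which come from the ObjStore API. The counter functions are a toy model that assumes the readonly count goes up and down like a reference count until it sticks. The text above only states the sticking point, not how the counter is maintained.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the pragma

use constant MAX_STRING_BYTES => 32767;         # string limit stated above
use constant READONLY_STICKY  => 2**16 - 10;    # count at which readonly becomes permanent

# Hypothetical helper: true if $str fits within the persistent string limit.
sub fits_persistent_string {
    my ($str) = @_;
    return bytes::length($str) <= MAX_STRING_BYTES;
}

# Toy model (an assumption): the count saturates and then never changes.
sub readonly_inc {
    my ($n) = @_;
    return $n >= READONLY_STICKY ? READONLY_STICKY : $n + 1;
}

sub readonly_dec {
    my ($n) = @_;
    return READONLY_STICKY if $n >= READONLY_STICKY;
    return $n > 0 ? $n - 1 : 0;
}

print fits_persistent_string('a' x 40_000) ? "fits\n" : "too long\n";
```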
Bless
If you are a suspicious person (like my mom) you might have suspected that the ObjStore module installs its own version of bless. Naturally it does! The augmented bless implements extra quality assurance to ensure that blessings are correctly stored persistently. For example:
package Scottie;
use ObjStore;
use base 'ObjStore::HV';
$VERSION = '2.00';

sub new {
    my ($class, $store) = @_;
    my $o = $class->SUPER::new($store, { fur => 'buffy' });
    $o;
}

package main;
my Scottie $dog = new Scottie($db);
# once a Scottie, always a Scottie
Persistent bless also does some extra work to make evolution easier. It stores the current @ISA tree along with the $VERSION of every class in the @ISA tree. (This necessitates assigning a $VERSION to every class in the @ISA tree. If this seems draconian, recall that you have been spared the maintenance of a centralized schema.) Furthermore, the isa method is tweaked such that it reports according to the moment of the bless. Similarly, the versionof method lets you query the saved $VERSIONs. This is helpful when doing evolution, as you can compare the old @ISA and $VERSIONs to figure out what to change (and how). (UNIVERSAL::can is unmodified.)
Technically speaking, bless is re-implemented such that it can be extended by the bless from and the bless to classes via the BLESS method. Both the bless from and bless to operations are funnelled through a single BLESS method like this:
sub BLESS {
    my ($r1, $r2) = @_;
    # $r1 is a reference when an object is leaving its class, a class name otherwise
    if (ref $r1) { warn "$r1 leaving ".ref($r1)." for a new life in $r2\n"; }
    else { warn "$r2 entering $r1\n"; }
    $r1->SUPER::BLESS($r2);
}
UNLOADED
Generic tools such as posh or ospeek must bless objects when reading from an arbitrary database. Prior to trying to locate the implementations of arbitrary objects, get_INC is used to fetch the stored @INC and synchronize it with the transient @INC. Then, each class found in the database is require'd. However, if the require fails, a package must be faked-up: ${"${package}::UNLOADED"} is set to true. This signals that the @ISA tree should not be considered authoritative.
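The require-then-fake step can be sketched in a few lines of plain perl. This is not the ObjStore source: the function name require_or_fake and the class My::Missing::Class are hypothetical. The sketch only sets the UNLOADED flag; any other work the real implementation does when faking a package is not shown.

```perl
use strict;
use warnings;

# Try to load $package. If that fails, flag the package as UNLOADED so
# callers know its @ISA tree is not authoritative.
sub require_or_fake {
    my ($package) = @_;
    (my $file = "$package.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return 1 if eval { require $file; 1 };
    no strict 'refs';
    ${"${package}::UNLOADED"} = 1;
    return 0;
}

for my $class ('File::Spec', 'My::Missing::Class') {
    my $ok = require_or_fake($class);
    print "$class: ", ($ok ? 'loaded' : 'faked'), "\n";
}
```

A generic tool can then test the flag before trusting isa on an object of that class.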
Go Extension Crazy
You cannot directly access persistent scalars from perl. They are always immediately copied into transient scalars. This is actually faster than the alternatives in most cases.
While all persistent objects are blessed, they are not considered blessed in the database unless they are members of some non-default class (not os_class). NOREFS is not invoked on non-blessed database objects.
$ObjStore::COMPILE_TIME XXX
ObjStore::File will be the base class for large binary data.
Each subclass of ObjStore::UNIVERSAL::Container has a %REP hash. The new method decides on the best representation, calls the matching creation function from the %REP hash, and returns the newly minted persistent object.
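The %REP dispatch can be pictured with a toy class in plain perl. Everything in this sketch is invented for illustration: the package Toy::Container, the representation names compact and hashed, the argument list, and the size threshold of 20. The real representations and selection rules live in the ObjStore source.

```perl
use strict;
use warnings;

package Toy::Container;

# Stand-in for a %REP hash: representation name => creation function.
our %REP = (
    compact => sub {
        my ($class, $near, $size) = @_;
        return bless { rep => 'compact', size => $size }, $class;
    },
    hashed => sub {
        my ($class, $near, $size) = @_;
        return bless { rep => 'hashed', size => $size }, $class;
    },
);

# new() picks a representation and delegates to its creation function.
sub new {
    my ($class, $near, $size) = @_;
    $size ||= 0;
    my $rep = $size <= 20 ? 'compact' : 'hashed';    # made-up threshold
    return $REP{$rep}->($class, $near, $size);
}

package main;

my $small = Toy::Container->new(undef, 5);
my $large = Toy::Container->new(undef, 5000);
print "$_->{size} elements: $_->{rep}\n" for $small, $large;
```

In this model, adding a representation means adding one more entry to the hash and teaching new when to choose it.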
You can add your own C++ representation. If you want to know the specifics, look at the code for the built-in representations.
You can add new families of objects that inherit from ObjStore::UNIVERSAL. Suppose you want highly optimized, persistent bit vectors? Or matrices? These would not be difficult to add. Especially once Object Design figures out how to support multiple application schemas within the same executable. They claim that this tonal facility will be available in the next release.
ObjStore::Index
Indices are extremely efficient because they do not copy their keys. It is critical that another pointer to OSSVs is not stored, since OSSVs can be relocated when arrays need to grow. OSSVPVs are never relocated.
Notes On The Source Code
Some of my code might be hard to read. Imagine how hard it was to write!
Functions or methods starting with '_' are internal to the ObjStore extension. They are subject to change entirely without notice.
Avoid const, privacy & templates.
C++ sucks! Long live C++!
The relationship between references and cursors is strange. It's probably best not to think about it.
BUGS
UNFORGEABLE NOTIFICATION SENDER INFORMATION
ODI feature request #13632: It would be very useful to know which client sent a given notification. While the client could fill in this information as part of the notification, the osserver already knows the sender's client# and could pass this information transparently to subscribers. The additional overhead would be just 2 bytes per received notification.
SIGNALS ARE UNRELIABLE
[This bug might have been entirely fixed!]
Perl signal handlers are not bulletproof. Since ObjectStore makes aggressive use of signals, pathological failure is not impossible. Currently, it is recommended that you avoid lock timeouts and deadlocks altogether. This can be accomplished by restricting each database to a single writer, and using mvcc mode for readers. (This is a good architecture anyway!)
If you cannot use this architecture, please test the reliability of your signal handlers before making a large code investment. This is worth testing because with conservative and careful use of perl signal handlers, you should be able to avoid triggering any of the known failure modes.
MIXING WITH EVAL
[This bug might have been entirely fixed!]
It is possible to use eval within transactions, but inside the eval you absolutely must not use the ObjectStore API or access any persistent memory.
begin('read', sub {
    ...
    eval { $db->root('new root' => [1,2,3]); };
    ...
});
In the above code, the update in a read transaction will cause an exception that crashes perl. This is due to the excellent but imperfect integration of ObjectStore exceptions and perl exceptions. I understand how to fix it, just haven't had time. In general, you should globally replace eval with begin.
ABORT MANUAL-OVERRIDE
[A proper fix is on the TODO list. Maybe this is solved.]
Since eval does not imply a nested transaction, if you use eval you can have situations where a transaction must be manually aborted. This situation actually turns up in posh. Perhaps it is best explained by example:
my $cmd = <$input>;
begin 'update', sub {
    eval $cmd;
    if ($@) {
        ObjStore::Transaction::get_current()->abort(); #manual abort!
        print $@;
    }
    undef $@;
};
CROSS DATABASE POINTERS
This feature is strongly deprecated and will likely be discontinued, but at the moment you can allow cross database pointers with:
$db->_allow_external_pointers; #never do this!
But you should avoid this if at all possible. Using real pointers will affect refcnts, even between two different databases. Your refcnts will be wrong if you simply osrm a random database. This will cause some of your data to become permanently un-deletable. Currently, there is no way to safely delete un-deletable data.
Instead, you can use references or cursors to refer to data in other databases. References may use the os_reference_protected class which is designed precisely to address this problem. Refcnts will not be updated remotely, but you'll still be protected from accessing deleted objects or removed databases. (Imagine the freedom. :-)
LEAKS TRANSIENT XPVRVs
The problem is thoroughly understood. Work-arounds or a real fix have been discussed on the perl-porters mailing list. Well designed mechanisms are being developed to solve this problem correctly.
ObjStore::AVHV EVOLUTION
Indexed records temporarily cannot be evolved due to const-ness. For now, it is recommended that records be removed, changed, and re-added to the table when changing indexed fields.
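The remove, change, re-add sequence looks like this with a toy in-memory index. Toy::Index and its add, remove, and keys_in_order methods are hypothetical stand-ins written for this sketch, not the ObjStore::Index interface. The point is only the order of operations: the record leaves the index before its indexed field changes, so the index never holds a record filed under a stale key.

```perl
use strict;
use warnings;

package Toy::Index;    # hypothetical stand-in, kept sorted on one field

sub new {
    my ($class, $field) = @_;
    return bless { field => $field, rows => [] }, $class;
}

sub add {
    my ($self, $rec) = @_;
    my $f = $self->{field};
    @{ $self->{rows} } = sort { $a->{$f} cmp $b->{$f} } @{ $self->{rows} }, $rec;
    return;
}

sub remove {
    my ($self, $rec) = @_;
    @{ $self->{rows} } = grep { $_ != $rec } @{ $self->{rows} };   # compare by identity
    return;
}

sub keys_in_order {
    my ($self) = @_;
    my $f = $self->{field};
    return map { $_->{$f} } @{ $self->{rows} };
}

package main;

my $index = Toy::Index->new('name');
my %rec   = map { $_ => { name => $_ } } qw(alice bob carol);
$index->add($_) for values %rec;

# Changing an indexed field: take the record out, change it, put it back.
my $bob = $rec{bob};
$index->remove($bob);
$bob->{name} = 'zed';
$index->add($bob);

print join(', ', $index->keys_in_order), "\n";
```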
os_protected_reference
Allocates persistent memory that cannot be reclaimed without destroying the segment. This makes it non-trivial to determine whether a segment is empty or not. The needed change is listed as ODI feature request #SE055496_O#.
TRANSACTIONS
Transactions hold on to transient memory longer than necessary. The solution is to use doubly-linked lists. This was proven to work in an earlier version, but unfortunately I took the code out because I thought it was too complicated.
MOP
This is not a general purpose ObjectStore editor with complete MOP support. Actually, I don't think this is a bug.
HIGH VOLATILITY
Everything is subject to change without notice. (But backward compatibility will be preserved when possible. :-)
POOR QUALITY DOCUMENTATION
I didn't get a Ph.D in English. Sorry!