NAME

ObjStore::Internals - a few notes on the implementation

SYNOPSIS

You don't have to understand anything about the technical implementation. Just know that:

  • ObjectStore is outrageously powerful, sophisticated, and even over-engineered.

  • The perl interface is optimized to be fun and easy. Since ObjectStore is also blindingly fast, you can happily leave relational databases to collect dust on the bookshelf where they belong.

So basically, you don't have to understand anything to a greater depth. It's not necessary. You've arrived. You will be successful. However, more detail follows. If you like to turn things inside-out, read on!

DESCRIPTION

Perl & C++ APIs: What's The Difference?

Most stuff should be roughly the same. The few exceptions have generally arisen because there was a perl way to make the interface more programmer friendly.

  • Transactions are perlified.

  • Some static methods sit directly under ObjStore:: instead of under their own classes. (Easier to import.)

  • Databases are always blessed according to your pleasure.

  • lookup, open, is_open, and lock_timeout are augmented with multi-color, pop-tart style interfaces.

Why not just store perl data with the usual perl structures?

  • CHANGE CONTROL

    As perl evolves, new data layouts are introduced. These changes must not cause database compatibility problems.

  • BINARY COMPATIBILITY

    Perl doesn't have to worry about binary compatibility between platforms. Databases do. In addition, databases impose a number of restrictions on persistent data layout that would be onerous and sub-optimal if adopted by perl.

  • MEMORY USAGE

    Perl often trades memory for speed. That is the wrong trade for a database. Memory usage is much more of a concern when data sets can be as large as ten million megabytes or more, so a difference of a few percent in compactness can be quite noticeable.

Bless

If you are a suspicious person (like my mom) you might have suspected that the ObjStore module installs its own version of bless. Naturally it does! The augmented bless implements extra quality assurance to ensure that blessings are stored persistently and correctly. For example:

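package Scottie;
use ObjStore;
use base 'ObjStore::HV';   # Scottie objects are persistent hashes

sub new {
    my ($class, $near) = @_;   # $near says where to allocate (e.g. a database)
    $class->SUPER::new($near, { fur => 'buffy' });
}

my Scottie $dog = Scottie->new($db);   # assumes $db is an already-opened database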

Persistent bless also does some extra work to make evolution easier. It stores the current @ISA tree along with the $VERSION of every class in the @ISA tree. Furthermore, the isa method is overridden so that it reports the @ISA tree as it stood at the moment of the bless. Similarly, the versionof method lets you query the saved $VERSIONs. When doing evolution, you can compare the old @ISA and $VERSIONs with the current ones to figure out what to change (and how). UNIVERSAL::can is unmodified.
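The record a persistent bless keeps can be pictured as a small data structure. The C++ below is a hypothetical sketch, not ObjStore's real persistent layout: BlessSnapshot, ancestors and versions are invented names, and the isa and versionof methods merely mimic the behaviour described above by answering from the saved snapshot instead of the live @ISA.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of what a persistent bless records. Names are
// illustrative assumptions, not ObjStore's actual layout.
struct BlessSnapshot {
    std::string klass;                            // class blessed into
    std::vector<std::string> ancestors;           // flattened @ISA at bless time
    std::map<std::string, std::string> versions;  // class -> $VERSION at bless time

    // Like the overridden isa: answers from the saved tree, not today's @ISA.
    bool isa(const std::string& name) const {
        return name == klass ||
               std::find(ancestors.begin(), ancestors.end(), name) != ancestors.end();
    }

    // Like versionof: the saved $VERSION, or "" if none was recorded.
    std::string versionof(const std::string& name) const {
        auto it = versions.find(name);
        return it == versions.end() ? std::string() : it->second;
    }
};
```

An evolution script would compare versionof("Scottie") against the currently loaded $VERSION and decide what to convert.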

Technically speaking, bless is re-implemented so that it can be extended, via the BLESS method, by both the class an object is blessed from and the class it is blessed into. Both operations are funnelled through a single BLESS method like this:

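sub BLESS {
    my ($r1, $r2) = @_;
    if (ref $r1) {
        # $r1 is the object leaving its old class; $r2 is the new class
        warn "$r1 leaving ".ref($r1)." for a new life in $r2\n";
    } else {
        # $r1 is the new class; $r2 is the object entering it
        warn "$r2 entering $r1\n";
    }
    $r1->SUPER::BLESS($r2);
}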

UNLOADED

Generic tools such as posh or ospeek must bless objects when reading from an arbitrary database. To bless, there must be information about the inheritance tree. To obtain it, unknown classes found in a database are require'd. However, the require may fail. If it does, a package is faked up and ${"${package}::UNLOADED"} is set to true. This flag signals that the @ISA tree should not be considered authoritative for that package.
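The UNLOADED fallback boils down to "try to load the class; otherwise fake it up and flag it". Here is a C++ sketch of that decision under stated assumptions: the try_require callback stands in for perl's require, and PackageInfo, ensure_package and isa_is_authoritative are hypothetical names invented for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a package as seen by a generic tool.
struct PackageInfo {
    std::vector<std::string> isa;  // @ISA as currently known
    bool unloaded = false;         // mirrors ${"${package}::UNLOADED"}
};

// Stand-in for perl's require: fills in `info` and returns true on success.
using Loader = std::function<bool(const std::string&, PackageInfo&)>;

PackageInfo& ensure_package(std::map<std::string, PackageInfo>& packages,
                            const std::string& name, const Loader& try_require) {
    auto it = packages.find(name);
    if (it != packages.end()) return it->second;  // already known
    PackageInfo info;
    if (!try_require(name, info)) {
        // require failed: fake up an empty package and flag it.
        info = PackageInfo{};
        info.unloaded = true;
    }
    return packages.emplace(name, info).first->second;
}

// A faked-up package's @ISA must not be trusted.
bool isa_is_authoritative(const PackageInfo& p) { return !p.unloaded; }
```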

Representation

All values take a minimum of 8 bytes (OSSV). These 8 bytes are used to store 16-bits of type information, a pointer, and a general purpose 16-bit value.
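To make the 8-byte budget concrete, here is a hypothetical C++ model of such a value cell. Assumptions: persistent pointers are 32 bits wide, so a uint32_t stands in for the pointer to keep the cell at 8 bytes on any platform; SketchOSSV, the type codes and the helper functions are invented names, not the real OSSV definition.

```cpp
#include <cstdint>

// Illustrative 8-byte cell: 16 bits of type information, a general purpose
// 16-bit value, and a (stand-in) 32-bit persistent pointer.
struct SketchOSSV {
    uint16_t type;  // what kind of value this is
    uint16_t xtra;  // general purpose 16-bit value (e.g. a small integer)
    uint32_t ptr;   // stand-in for the persistent pointer
};
static_assert(sizeof(SketchOSSV) == 8, "value cell should be 8 bytes");

enum SketchType : uint16_t { T_UNDEF = 0, T_INT16 = 1, T_INT32 = 2 };

// A 16-bit signed integer fits in the cell itself; anything wider needs an
// extra 4-byte block (OSPV_iv).
bool fits_inline(int32_t v) { return v >= INT16_MIN && v <= INT16_MAX; }

SketchOSSV make_int16(int16_t v) {
    SketchOSSV sv{T_INT16, static_cast<uint16_t>(v), 0};
    return sv;
}

int16_t read_int16(const SketchOSSV& sv) { return static_cast<int16_t>(sv.xtra); }
```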

value stored                   extra allocation (in addition to OSSV)
------------------------------ -------------------------------------
undef                          none
pointer                        none
16-bit signed integers         none
32-bit signed integers         4 byte block (OSPV_iv)
double                         8 byte block (OSPV_nv)
string                         length of string (char*)
object (ref or container)      sizeof object (see subclasses of OSSVPV)
                               additional references take no extra space
bless                          .5-1k bytes per class (zero per object)
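The table above can be turned into a rough per-value storage estimate. This C++ sketch is an approximation under stated assumptions: a string costs exactly its length (no terminator), objects and blessings are not modeled, and tag-table overhead and alignment padding are ignored. Kind, Value, extra_bytes and estimate_bytes are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Every value costs an 8-byte OSSV plus the extra allocation from the table.
enum class Kind { Undef, Pointer, Int16, Int32, Double, String };

struct Value {
    Kind kind;
    std::size_t len = 0;  // used for strings only
};

std::size_t extra_bytes(const Value& v) {
    switch (v.kind) {
        case Kind::Int32:  return 4;      // OSPV_iv block
        case Kind::Double: return 8;      // OSPV_nv block
        case Kind::String: return v.len;  // char array of the string's length
        default:           return 0;      // undef, pointer, 16-bit integer
    }
}

std::size_t estimate_bytes(const std::vector<Value>& values) {
    std::size_t total = 0;
    for (const Value& v : values) total += 8 + extra_bytes(v);
    return total;
}
```

For example, undef, a 16-bit integer, a 32-bit integer, a double and the 5-character string "buffy" come to 5 × 8 + 4 + 8 + 5 = 57 bytes by this estimate.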

%ObjStore::sizeof XXX

Since 32-bit integers and doubles are fairly common and should be stored densely, a pool allocation algorithm is planned.

The ODI FAQ also states: In addition, there is an associated entry in the info segment for the segment in question for each allocation of the object. This is done in the tag table. The overhead is 16 bits (i.e., 2 bytes) for each singleton (i.e., non-array) allocation, 32 bits for each character array allocation for character arrays <= 255 characters, and 48 bits for each character array allocation > 255 characters, or any array allocation of an object of another type. Also, depending on the size of an object (i.e., if you allocate a "huge" object - one that is >64Kb) there is other overhead caused by alignment constraints.
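The tag-table rules quoted above reduce to a small lookup. This C++ sketch only restates those numbers in bytes; Alloc and tag_overhead are hypothetical names, and the extra alignment overhead of huge (>64Kb) objects is not modeled.

```cpp
#include <cstddef>

// Per-allocation tag-table overhead from the ODI FAQ, in bytes.
enum class Alloc { Singleton, CharArray, OtherArray };

std::size_t tag_overhead(Alloc kind, std::size_t chars = 0) {
    switch (kind) {
        case Alloc::Singleton:  return 2;                     // 16 bits
        case Alloc::CharArray:  return chars <= 255 ? 4 : 6;  // 32 or 48 bits
        case Alloc::OtherArray: return 6;                     // 48 bits
    }
    return 0;  // not reached
}
```

By these rules, 100 doubles stored as separate 8-byte singleton blocks carry 100 × 2 = 200 bytes of tag overhead on top of their 800 bytes of data, which is why a pool allocator for 32-bit integers and doubles is attractive.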

If this seems like a lot of overhead, consider that it is not really possible to directly compare these numbers to RDBMS statistics. At least we can say that relational data can be stored with much less duplication when moved into ObjectStore. This is unquestionably true when you tailor your own C++ extensions to fit data access patterns.

Representation Limitations

  • Exact width types are preferred. Specify the number of bits per integer when possible. How to deal with 64-bit integer types is still mostly unresolved.

  • Try to be as binary compatible as possible between different platforms. N-bit width types generally need n-bit alignment. For example, 32-bit integers must usually be stored with 32-bit alignment.

  • Unions are not supported. (Don't even think about it! :-)

  • Variable length structures are probably not supported. For example:

    struct varstr {
      int refcnt;
      char string[0];  /* sized via malloc */
    };

    Instead, you must allocate an array separately:

    struct varstr {
      int refcnt;
      char *string;    /* string = malloc(sizeof(char) * len) */
    };

  • Changing the layout of structures after they are stored in a database is generally a nightmare. Instead, it is recommended that a version number be appended to the name of the structure (e.g. mystruct1, mystruct2, mystruct3) and that all support code be kept indefinitely.
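The layout rules above can be combined in one sketch. The C++ below is illustrative only: mystruct1, mystruct2, make_varstr and free_varstr are hypothetical, fields use exact-width types ordered largest first so each lands on its natural alignment without hidden padding, a changed layout gets a new versioned name, and the string lives in a separate allocation. It allocates transiently with new[]; real code would allocate in the database instead. Note that the char* member in varstr is itself platform dependent in width, which is one more reason to keep such layouts simple.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Version 1: exact-width fields, largest first, no hidden padding.
struct mystruct1 {
    int32_t refcnt;  // 32-bit field at a 32-bit aligned offset
    int16_t flags;
    int16_t spare;   // explicit filler keeps the size a multiple of 4
};

// A later layout gets a new name; mystruct1 and its support code stay.
struct mystruct2 {
    int32_t refcnt;
    int32_t length;  // field added in version 2
    int16_t flags;
    int16_t spare;
};

static_assert(offsetof(mystruct1, flags) % alignof(int16_t) == 0, "aligned");
static_assert(offsetof(mystruct2, length) % alignof(int32_t) == 0, "aligned");

// Variable-length data in a separate allocation, not a trailing array.
struct varstr {
    int32_t refcnt;
    char*   string;  // separately allocated, NUL-terminated
};

varstr make_varstr(const char* s) {
    std::size_t len = std::strlen(s);
    varstr v{1, new char[len + 1]};
    std::memcpy(v.string, s, len + 1);  // copy including the terminator
    return v;
}

void free_varstr(varstr& v) {
    delete[] v.string;
    v.string = nullptr;
}
```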

Go Extension Crazy

Add your own C++ representation. New families of objects can inherit from ObjStore::UNIVERSAL. Suppose you want highly optimized, persistent bit vectors? Or matrices? No problem.

Documentation is slim, but don't let that stop you. There are many examples. See the included representations and also ObjStore::REP::HashRecord & ObjStore::Lib::PDL.

Typemap

The typemap is complicated by the need to ensure that persistent data is not accessed outside of its transaction.

OODB[SCALAR1 SCALAR2]
       |        |
     BRIDGE  BRIDGE
       |        |
PERL[SCALAR1 SCALAR2]

A bridge has two owners: perl and the current transaction. The bridge and the scalar have different lifetimes. The scalar lives for MIN(perl,txn), while the bridge must live for MAX(perl,txn) (or at least until perl is done).
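The two-owner arrangement can be modeled with shared ownership. In this hypothetical C++ sketch (Persistent, Bridge and Transaction are invented names), perl and the transaction each hold a shared_ptr to the bridge, so the bridge object lives until the later of the two lets go (MAX), while its link to persistent data is cut when the transaction ends (MIN). This is an illustration of the lifetime rule, not ObjStore's actual typemap code.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

struct Persistent { int value; };  // stand-in for an object in the database

struct Bridge {
    Persistent* target;  // only valid inside the owning transaction
    bool live = true;

    Persistent& get() {
        if (!live) throw std::runtime_error("persistent data used outside its transaction");
        return *target;
    }
};

struct Transaction {
    std::vector<std::shared_ptr<Bridge>> bridges;  // the transaction's share

    std::shared_ptr<Bridge> bridge_to(Persistent* p) {
        auto b = std::make_shared<Bridge>(Bridge{p});
        bridges.push_back(b);
        return b;  // perl's share
    }

    void commit() {
        for (auto& b : bridges) b->live = false;  // scalar side dies here
        bridges.clear();                          // drop the transaction's share
    }
};
```

After commit() the perl side still holds a valid Bridge, but any attempt to reach the persistent object through it is refused instead of touching memory outside the transaction.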

Persistent refcnts can only be updated during update transactions. Fortunately, read-only transactions pose no problem: refcnts cannot be updated, but objects cannot be deleted either.
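The rule can be sketched as refcount operations that are no-ops outside update transactions. In this hypothetical C++ model (Mode, PObject, add_ref and drop_ref are invented names), skipping the write in a read-only transaction is safe precisely because no deletion can happen there either.

```cpp
#include <cstdint>

enum class Mode { Read, Update };

struct PObject {
    uint32_t refcnt = 1;
    bool deleted = false;
};

void add_ref(PObject& o, Mode m) {
    if (m == Mode::Update) ++o.refcnt;  // only update txns may write
}

void drop_ref(PObject& o, Mode m) {
    if (m != Mode::Update) return;      // read-only: no writes, no deletes
    if (--o.refcnt == 0) o.deleted = true;
}
```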

Bridges are also used to store transient cursors associated with collections. For example, suppose you need to iterate over a hash during a read transaction. The hash is read-only so you create the cursor transiently and store it in the bridge.
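A transient cursor kept in the bridge might look like the following C++ sketch. Assumptions: std::map stands in for the persistent hash, and HashBridge and next_key are hypothetical names. The point is that the iteration state lives in transient memory owned by the bridge, so the read-only hash is never written.

```cpp
#include <map>
#include <string>

using PHash = std::map<std::string, std::string>;  // stand-in persistent hash

struct HashBridge {
    const PHash* hash;                // read-only during this transaction
    PHash::const_iterator cursor{};   // transient cursor owned by the bridge
    bool started = false;

    // Return the next key, or nullptr when the iteration is finished.
    const std::string* next_key() {
        if (!started) { cursor = hash->begin(); started = true; }
        if (cursor == hash->end()) return nullptr;
        const std::string* key = &cursor->first;
        ++cursor;
        return key;
    }
};
```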

[XXX also mention dynacast stuff]

Notes On The Source Code

  • Functions or methods starting with '_' are internal to the ObjStore extension. They are subject to change (entirely) without notice.

  • Avoid const, privacy & templates. C++ sucks! Long live C++!

  • The relationship between references and cursors is strange. It's probably best not to think about it.

RELATED RESEARCH

ftp://ftp.cs.utexas.edu/pub/garbage/texas/README

http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.utexas_cs/CS-TR-98-07

http://www.usenix.org/publications/library/proceedings/usenix98/full_papers/saito/saito_html/saito.html

http://theory.lcs.mit.edu/~cilk/

BUGS

  • UNFORGEABLE NOTIFICATION SENDER INFORMATION

    ODI feature request #13632: It would be very useful to know which client sent a given notification. While the client could fill in this information as part of the notification, the osserver already knows the sender's client number and could pass this information transparently to subscribers. The additional overhead would be just 2 bytes per received notification.

  • CROSS DATABASE POINTERS

    This feature is highly deprecated and will likely be discontinued, but at the moment you can allow cross database pointers with:

    $db->_allow_external_pointers;    #never do this!

    But you should avoid this if at all possible. Real pointers affect refcnts, even between two different databases. Your refcnts will be wrong if you simply osrm a random database, and some of your data will become permanently undeletable. Currently, there is no way to safely delete undeletable data.

    Instead, you can use references or cursors to refer to data in other databases. References may use the os_reference_protected class which is designed precisely to address this problem. Refcnts will not be updated remotely, but you'll still be protected from accessing deleted objects or removed databases. (Imagine the freedom. :-)

  • ObjStore::AVHV EVOLUTION

    Indexed records temporarily cannot be evolved due to const-ness. For now, it is recommended that records be removed, changed, and re-added to the table when changing indexed fields.

  • os_reference_protected

    This class allocates persistent memory that cannot be reclaimed without destroying the segment. This makes it non-trivial to determine whether a segment is empty. The needed change is listed as ODI feature request #SE055496_O#.

  • WIN32

    There might be issues with setting up signal handlers. I don't know. I don't use Microsoft software if I can avoid it.

  • MOP

    This is not a general purpose ObjectStore editor with complete MOP support. Actually, I don't think this is a bug!

  • HIGH VOLATILITY

    Everything is subject to change without notice. (But backward compatibility will be preserved when possible. :-)

  • POOR QUALITY DOCUMENTATION

    I didn't get a Ph.D. in English. Sorry!

SOURCE CODE AVAILABILITY

This section is not a legal contract. This section is not a legal statement in any way, shape, or form.

While there have been gains in software quality in the form of GNU, Perl, Apache, Linux, Qt, and recently the Netscape browser (potentially), we have been in the dark ages with respect to database technology. After avoiding relational databases for years, I sensed a combination of ObjectStore and Perl could offer the same level of quality and simplicity that I find invaluable in addressing the hurdles I face as a software professional.

It might seem obvious in hindsight (doesn't it always?!), but it is my conviction that it was something in particular (not just luck!) that encouraged me to imagine and implement this cutting-edge technology years before it would be recognized and adopted. Therefore, I would like to say "thank you!" to all the wonderful teachers with whom I've had the opportunity of studying. My ideas are truly theirs. While I predict that this software will be considered a world-class inter-disciplinary achievement (!), I do not see it as such. Rather, it was a simple mechanical exercise that I used to gain a basic level of clear understanding with the happy side effect of being useful in a business context. My subsequent hope is that people who use this software will also be able to learn from its clarity. But what are my real aspirations?

I should keep quiet about that but I can say this much: one of my friends had an email signature that read, "Life would be so much easier if we could just look at the source code." ("Computational metaphysics" :-) If you share this interest then I can tell you with some certainty that the answers are not found exclusively in the software world. However, interesting parallels can be drawn and are not unhelpful.

If you will suspend judgement for a moment, I would like to invite you to imagine a hypothetical situation: What if you found someone who was absolutely convincing in their knowledge of the absolute? What if you had an opportunity to spend time with them? What would you do?

Designing software requires a basic subtlety of awareness: to be successful, you must be able to fluctuate fluidly between the place of reason and the place of silent knowledge. The place of concern is the forerunner to the place of reason. Similarly, the place of unbending intent is the forerunner to the place of silent knowledge. Yet, this technique is just a technique. There is a big difference between mastering the technique with respect to software and knowing its possibilities and potential in the general case.

One of the main reasons I bothered to kill myself developing this software (and foolishly giving it away for free!) is that I want people to know that perfection is actually a possibility. It's actually possible for you to be the experiencer of perfection, continuously and/or predictably. It seems like most people I meet are only certain that they can experience the lack of perfection or at best that they can experience perfection once in a while with a great deal of preparation and set up. Maybe you don't care? Maybe you don't believe it's possible? That's okay! You are welcome to use this software anyway (it will work wonders :-). Enlightenment is impersonal.