NAME

BerkeleyDB::SecIndices::Accessor - Simply drive your BerkeleyDB database with secondary indices

SYNOPSIS

use BerkeleyDB::SecIndices::Accessor qw(EGET ELCK EPUT EUPD);

my $student = [];
push @$student, {
  NAME  => 'tom', 
  CLASS => 'one',
  GRADE => 'two',
  SCORE => 80,
};
push @$student, {
  NAME  => 'jerry', 
  CLASS => 'two',
  GRADE => 'two',
  SCORE => 75,
};
my $stubs = BerkeleyDB::SecIndices::Accessor::->_stubs;
foreach (@$student) {
  my $rc = $stubs->{STUDENT}->{PUT}->($_);
  die "cannot put record into database STUDENT" 
    if $rc == EPUT or $rc == ELCK;
}

my $count = $stubs->{STUDENT}->{COUNT}->();
print "so far we have $count record(s)\n";

$student = $stubs->{STUDENT}->{GETS}->($count);
foreach my $s (@$student) {
  print "recno: ", $s->{KEY}, "\n";
  print "name : ", $s->{CONTENT}->{NAME}, "\n";
  print "score: ", $s->{CONTENT}->{SCORE}, "\n";
}

$student = $stubs->{STUDENT}->{FIELDS}->{grade}->('two', 1);
foreach my $s (@$student) {
  $s->{CONTENT}->{GRADE} = 'three';
  my $rc = $stubs->{STUDENT}->{UPD}->(
    $s->{KEY}, $s->{CONTENT});
  die "cannot update record in database STUDENT" 
    if $rc == EUPD or $rc == ELCK;
}

my $number = $stubs->{STUDENT_INDEX}->{COUNTDUP}->{class}->('one');
print "we have $number students in class one\n";

DESCRIPTION

BerkeleyDB is one of the most famous and widely used flat/non-relational databases. Building on the strong features it offers, one can implement a fast cache, an in-memory/embedded database, a btree, a queue and so on. Smart people from both Sleepycat and the open-source world are working hard to bring wonderful new features to it.

Here you are touching another great feature of BerkeleyDB: secondary indices. One can create several secondary databases for indexing purposes.

A typical scenario is shown in the code example above. The primary database, named 'STUDENT' here, is associated with several secondary indices. Each index database is named after one of the keys of the hash record stored in the primary database. The hash logically forms a table row, so each index database collects and groups all values of one table column. Later one can fetch record(s) directly from the primary database, or indirectly by querying a secondary index database for a match.

WHY BerkeleyDB, WHAT ABOUT TRANSACTION, LOG and LOCK HERE

SQL is used for nearly everything nowadays, indeed. But for a data set that is access-performance-critical, stable, used mainly for query and reference, made up of short records and free of table joins, BerkeleyDB earns its place.

In most databases one has to deal with database locks here and there. That is less often the case here, thanks to another feature of BerkeleyDB: Concurrent Access Mode. A database working in this mode is nearly deadlock-free. Refer to the Sleepycat documentation to learn more about this.

DATABASE CONFIGURATION

This module requires one configuration file in YAML format. The path of that file is specified by $BerkeleyDB::SecIndices::Accessor::CONFIG.

Note: DO set this value in a BEGIN block of your code before using this module, since it is read at compile time, _NOT_ at run time. Refer to the attached test case(s).
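The compile-time ordering can be illustrated with a short sketch. The second BEGIN block below is only a stand-in for loading the module (a use statement also runs at compile time), and the path is a placeholder; neither is taken from the module itself.

```perl
use strict;
use warnings;

our $seen_at_load;

# Set the path at compile time. '/path/to/accessor.yml' is a placeholder.
BEGIN { $BerkeleyDB::SecIndices::Accessor::CONFIG = '/path/to/accessor.yml' }

# Stand-in for "use BerkeleyDB::SecIndices::Accessor qw(:const);".
# Like "use", this block runs at compile time, so it sees the value set above.
BEGIN { $seen_at_load = $BerkeleyDB::SecIndices::Accessor::CONFIG }

print "config seen at load time: $seen_at_load\n";
```

In real code the second BEGIN block would be replaced by the use statement itself. A plain assignment outside a BEGIN block would only run after the module had already been loaded.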

For example, the configuration file for the code example above would contain:

#### database configuration begin ####
---
HOME: /path/to/your/database/home
DATABASE:
  STUDENT:
    FILE: student.db
  STUDENT_INDEX:
    FILE: indices_student.db
    SUBS:
      - NAME
      - CLASS
      - GRADE
      - SCORE
#### database configuration end ####

As seen, this configuration file tells the module to create two db files. The primary database is named 'STUDENT' and is stored in the file student.db, which will be created under the path given by the 'set_data_dir' parameter in the DB_CONFIG file found under the directory specified by the 'HOME' key.

The naming rule for the secondary index databases is the name of the primary database plus '_INDEX'. More than one database may be created within this file. Yeah, BerkeleyDB supports that. Each item in 'SUBS' leads to the creation of one secondary index database. As you can guess, once you put a key/value pair in the primary database, each secondary index database will create a key/value pair of its own: the key will be something like $entry->{NAME}, and the value will be the key of this record in the primary database. In case $entry->{NAME} is a reference to an ARRAY or a HASH, a subroutine is required to extract/make the desired index key; refer to the next section on how to install your customized key extractor.

By default, the module will die if it cannot find the key slot in the hash being put into the primary database.
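As an illustration of the index entries described above, the following sketch models the primary database and its secondary indices with plain Perl hashes. It does not use BerkeleyDB or this module; the record numbers and the %primary and %index names are assumptions made for the example.

```perl
use strict;
use warnings;

# Primary database modeled as record number => hash record.
my %primary = (
    1 => { NAME => 'tom',   CLASS => 'one', GRADE => 'two', SCORE => 80 },
    2 => { NAME => 'jerry', CLASS => 'two', GRADE => 'two', SCORE => 75 },
);

# One index per SUBS item; each maps a field value to the primary keys
# of the records carrying that value (duplicates are allowed).
my %index;
for my $recno (sort { $a <=> $b } keys %primary) {
    for my $field (qw(NAME CLASS GRADE SCORE)) {
        my $sec_key = $primary{$recno}{$field};
        die "record $recno has no $field slot" unless defined $sec_key;
        push @{ $index{$field}{$sec_key} }, $recno;
    }
}

# Query through the GRADE index, then fetch from the primary.
my @in_grade_two = @{ $index{GRADE}{two} || [] };
print "grade two: ", join(', ', map { $primary{$_}{NAME} } @in_grade_two), "\n";
```

Both records share the GRADE value 'two', so that index key holds two primary keys, which is the duplicate-key case the index databases have to allow.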

Next, a DB_CONFIG file is required under the directory specified by 'HOME' key in configuration file. A sample content of this file:

#### DB_CONFIG start ####
set_data_dir /path/to/create/.db/files
set_shm_key 20
#### DB_CONFIG end ####

set_data_dir specifies the path in which all *.db files are created. Since DB_CONFIG is put under 'HOME', the path may be relative to it.

set_shm_key specifies the shared memory key. BerkeleyDB will create an entry in system shared memory so that several working processes can share the same database environment (lock/sync).

Refer to the Sleepycat documentation for more detail.

INSTALL CALLBACKS

BerkeleyDB supports four known types of database - Btree, Hash, Recno and Queue.

A primary database is of Recno type by default. A 'TYPE' key can be specified for the primary database in the database configuration file, and the module will honor this setting. Currently only Btree, Hash and Recno are supported. This feature exists in case a standalone database of type Btree or Hash is required. Such a database permits duplicate keys by default; DO NOT create any index database upon it unless you know what you are doing.

An index database is ALWAYS a Btree. Duplicate keys are allowed.

There are three callbacks available: $CB_EXTRACT_SECKEY $CB_DUP $CB_DUPSORT . By assigning a code ref to a callback, one can:

$CB_EXTRACT_SECKEY : customize how index keys are extracted/made for _ALL_ index databases

$CB_DUP : customize how keys are sorted in _ALL_ index databases

$CB_DUP_SORT : customize how duplicate keys are sorted in _ALL_ index databases.

Caution: install the callback(s) in a BEGIN block of your code. Anyone who really wants to initialize the module-scope variables at run time has to postpone the loading of the module with eval "use BerkeleyDB::SecIndices::Accessor;"; this approach is not recommended.
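A minimal sketch of the postponed load mentioned above, assuming a placeholder configuration path. The trailing "1" in the eval string and the error handling are additions for the example; the module is simply reported as not loaded if it is not installed or fails to initialize.

```perl
use strict;
use warnings;

# Placeholder path; in real code this might be computed at run time.
my $config_path = '/path/to/accessor.yml';

# Run-time initialization of the module-scope variable.
$BerkeleyDB::SecIndices::Accessor::CONFIG = $config_path;
print "using config: $BerkeleyDB::SecIndices::Accessor::CONFIG\n";

# The string eval compiles and runs "use" only now, after the assignment.
# The trailing "1" makes a successful load return a true value.
my $loaded = eval "use BerkeleyDB::SecIndices::Accessor; 1";
if ($loaded) {
    print "module loaded\n";
}
else {
    print "module not loaded: $@";
}
```

Because the import happens at run time, the compiler has not seen the imported names while parsing the rest of the file, so calls to them need explicit parentheses.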

Note: a standalone Btree/Hash database mentioned above is left untouched by _ALL_ the callbacks.

METHOD

Several subroutines are imported automatically when the module is loaded. The normal way of invoking _ALL_ subs is pretty simple: as shown above, fetch them from the hash reference returned by BerkeleyDB::SecIndices::Accessor::->_stubs.

Each subroutine also has an explicitly exported name.

___dbenv

A special subroutine that returns the db environment handle. Normally not required.

BerkeleyDB::SecIndices::Accessor::->___dbenv

Note: not covered by _stubs

_student and _student_index

For each database declared in the configuration file, the module will generate a subroutine that returns the database handle, for invoking other BerkeleyDB database methods not covered here.

The naming rule is '_' . lc(<database_name>)

BerkeleyDB::SecIndices::Accessor::->_student

Note: not covered by _stubs

_stubs

A fundamental subroutine to access all 'userspace' methods offered. See the items below.

BerkeleyDB::SecIndices::Accessor::->_stubs

put_student(LIST)

For each primary database declared as type Recno in the configuration file, the module will generate a subroutine to put new HASH records into the database. As mentioned above, this creates a new index record in each secondary index database.

BerkeleyDB::SecIndices::Accessor::->put_student->(@entries)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{PUT}->(@entries)

Returns the first new key on success, EPUT or ELCK on failure.

PUT method for standalone Btree/Hash database

For each standalone Btree/Hash database declared in the configuration file, the module will generate a subroutine to put a new key/value pair into the database.

BerkeleyDB::SecIndices::Accessor::->put_<lc(dbname)>->($key, $entry)

BerkeleyDB::SecIndices::Accessor::->_stubs->{dbname}->{PUT}->($key, $entry)

Returns TRUE on success, EPUT or ELCK on failure.

put2_student(\@hash_values, \@new_keys)

For each primary database declared as type Recno in the configuration file, the module will generate a subroutine to put new HASH records into the database. This creates a new index record in each secondary index database.

BerkeleyDB::SecIndices::Accessor::->put2_student(\@entries, \@keys)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{PUT2}->(\@entries, \@keys)

Returns TRUE on success, EPUT or ELCK on failure. The keys of the newly created records are stored in @keys, in the same order as the entries in @entries.

PUT2 method for standalone Btree/Hash database

For each standalone Btree/Hash database declared in the configuration file, the module will generate a subroutine to put new key/value pairs into the database.

BerkeleyDB::SecIndices::Accessor::->put2_<lc(dbname)>->(\%pairs, \@keys)

BerkeleyDB::SecIndices::Accessor::->_stubs->{dbname}->{PUT2}->(\%pairs, \@keys)

Returns TRUE on success, EPUT or ELCK on failure. The keys of the newly created records are stored in @keys. Note: since the pairs are passed as a HASH, the order of the keys is not guaranteed.

upd_student($key, $entry)

For each primary database declared in the configuration file, the module will generate a subroutine to update a HASH record in the database. This also changes the corresponding keys in the affected secondary index databases.

BerkeleyDB::SecIndices::Accessor::->upd_student->($key, $entry)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{UPD}->($key, $entry)

Returns 0 on success, EUPD or ELCK on failure.

get_student($key)

For each primary database declared in the configuration file, the module will generate a subroutine to get a HASH record from the database.

BerkeleyDB::SecIndices::Accessor::->get_student->($key)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{GET}->($key)

Returns a HASH ref of the record on success, EGET or EEPT on failure.

get_students($number, [ $is_reverse, $offset ])

For each primary database declared in the configuration file, the module will generate a subroutine to get up to $number records. Records are fetched in reverse order if $is_reverse is true, and starting from $offset if $offset is set.

Returns a ref to an ARRAY which contains the fetched records. The number of items returned depends on the number of records actually present in the database. The structure of each item is { KEY => $key, CONTENT => $entry }

BerkeleyDB::SecIndices::Accessor::->get_students($number)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{GETS}->($number, 1, 20)

del_students(LIST)

For each primary database declared in the configuration file, the module will generate a subroutine to delete the requested records from the database.

BerkeleyDB::SecIndices::Accessor::->del_students(@key_list)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{DEL}->(@key_list)

Returns the number of deleted items on success, ELCK on failure.

__students

For each primary database declared in the configuration file, the module will generate a subroutine to return the current number of records in the database.

BerkeleyDB::SecIndices::Accessor::->__students()

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{COUNT}->()

get_students_by_class($sec_key, [ $need_return_value, $fetch_only_lastone, $number, $offset ])

For each secondary database declared in the configuration file, the module will generate a subroutine to query the associated primary database by index.

$sec_key : key of secondary index database, normally a string for SCALAR index field;

$need_return_value : return value of primary record if true;

$fetch_only_lastone: return only the last record if true;

$number : specify the number of records to fetch;

$offset : specify the offset to start for $number.

BerkeleyDB::SecIndices::Accessor::->get_students_by_class($sec_key)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{FIELDS}->{class}->($sec_key, 1)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{FIELDS}->{class}->($sec_key, 0, 1)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT}->{FIELDS}->{class}->($sec_key, 1, undef, 20, 3)

Returns a ref to an ARRAY which contains the keys of the fetched records. The structure of each item is { KEY => $key, CONTENT => $entry } if $need_return_value is true.

Yeah, the proto is very ugly... Possibly offer a hash-style proto in future.
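Until such a proto exists, a small wrapper can hide the positional arguments. The helper below is hypothetical and not part of the module; its name and option names are made up for this sketch, and it only builds the argument list in the order documented above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn named options into the positional list
# ($sec_key, $need_return_value, $fetch_only_lastone, $number, $offset).
sub index_query_args {
    my (%opt) = @_;
    die "key is required" unless defined $opt{key};
    return (
        $opt{key},
        $opt{with_value} ? 1 : 0,
        $opt{last_only}  ? 1 : undef,
        $opt{number},
        $opt{offset},
    );
}

my @args = index_query_args(
    key        => 'one',
    with_value => 1,
    number     => 20,
    offset     => 3,
);

# Show the resulting positional list, marking undefined slots.
print join(', ', map { defined $_ ? $_ : 'undef' } @args), "\n";
```

The resulting list could then be passed to a generated stub, for instance $stubs->{STUDENT}->{FIELDS}->{class}->(@args).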

cat_student_index_grades([ $need_return_value ])

For each secondary database declared in the configuration file, the module will generate a subroutine to fetch all current index records.

$need_return_value: return value of primary record if true.

BerkeleyDB::SecIndices::Accessor::->cat_student_grades()

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT_INDEX}->{CAT}->{grade}->(1)

Returns a ref to an ARRAY which contains the key/value pairs of the fetched records. Recall that the record value in a secondary index database is the key of the associated record in the primary database. The structure of the value for each key is { KEY => $key, CONTENT => $primary_entry } if $need_return_value is true.

__student_index_scores

For each secondary database declared in the configuration file, the module will generate a subroutine to return the current number of records in the database.

BerkeleyDB::SecIndices::Accessor::->__student_scores()

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT_INDEX}->{COUNT}->{score}->()

__student_index_score_dups($sec_key)

For each secondary database declared in the configuration file, the module will generate a subroutine to return the current number of duplicate records for the requested $sec_key in the database.

BerkeleyDB::SecIndices::Accessor::->__student_score_dups($sec_key)

BerkeleyDB::SecIndices::Accessor::->_stubs->{STUDENT_INDEX}->{COUNTDUP}->{score}->($sec_key)

Error Checking

EPUT: error on creating new record(s);

EUPD: error on updating record;

EGET: error on fetching record;

EEPT: no record found or record deleted for requested key;

EGTS: error on fetching records;

EDEL: error on deleting record(s);

ELCK: error on obtaining a database concurrent lock.

CAUTION: _ALL_ subroutines related to a secondary index database will croak if the index database is corrupted.

EXPORT

EGET EPUT EDEL ELCK ... TRUE

Export _ALL_ operation check flags with use BerkeleyDB::SecIndices::Accessor qw(:const)

CAVEAT

Refer to the Sleepycat documentation regarding database backup/recovery and upgrade.

BUG

_ALL_ error check flags are integers. When the value returned by a subroutine is a reference or a string, code such as $ret == EGET will produce a warning message.
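One way to sidestep the warning is to test the type of the returned value before comparing it numerically. The sketch below is only an illustration: is_flag is a hypothetical helper, and EGET_DEMO is a stand-in constant with an arbitrary value, not the module's real EGET.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Stand-in for an error flag; the real value of EGET is not assumed here.
use constant EGET_DEMO => -1;

# Hypothetical helper: true only if $ret is a plain number equal to $flag.
sub is_flag {
    my ($ret, $flag) = @_;
    return 0 if !defined $ret || ref $ret;      # references are never flags
    return 0 unless looks_like_number($ret);    # skip non-numeric strings
    return $ret == $flag ? 1 : 0;
}

print is_flag({ NAME => 'tom' }, EGET_DEMO), "\n";   # a record, not a flag
print is_flag('tom',             EGET_DEMO), "\n";   # a string key
print is_flag(-1,                EGET_DEMO), "\n";   # the flag itself
```

With the real constants imported, the same helper could be called as is_flag($ret, EGET) in place of $ret == EGET.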

TODO

Traditional transaction mode support.

UPD2 similar to PUT2.

SEE ALSO

BerkeleyDB YAML Storable

DB_File

BerkeleyDB Home

AUTHOR

Dongxu Ma, <dongxu@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Dongxu Ma

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Your working copy of BerkeleyDB is normally under the Sleepycat open source license; refer to http://www.sleepycat.com/company/licensing.html for details.
