NAME

MCDB_File - Perl extension for access to mcdb constant databases

SYNOPSIS

use MCDB_File ();
tie %mcdb, 'MCDB_File', 'file.mcdb' or die "tie failed: $!\n";
$value = $mcdb{$key};
$num_records = scalar %mcdb;
untie %mcdb;

use MCDB_File ();
eval {
  my $mcdb_make = new MCDB_File::Make('t.mcdb')
    or die "create t.mcdb failed: $!\n";
  $mcdb_make->insert('key1', 'value1');
  $mcdb_make->insert('key2' => 'value2', 'key3' => 'value3');
  $mcdb_make->insert(%t);
  $mcdb_make->finish;
} or ($@ ne "" and warn "$@");

use MCDB_File ();
eval { MCDB_File::Make::create $file, %t; }
  or ($@ ne "" and warn "$@");

DESCRIPTION

MCDB_File is a module which provides a Perl interface to mcdb. mcdb was originally based on Dan Bernstein's cdb package.

mcdb - fast, reliable, simple code to create and read constant databases

Reading from an mcdb constant database

After the tie shown above, accesses to %mcdb will refer to the mcdb file file.mcdb, as described in "tie" in perlfunc.

keys, values, and each can be used to iterate through records. Only one iteration can be in progress at any one time: nested loops over the same tied hash do not get independent iterators and should therefore be avoided. It is safe, however, to call the find('key') method while iterating. See the PERFORMANCE section below for sample usage.
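One way to get the effect of a nested loop without nested iterators is to collect the keys once and then use find for the lookups. This is only a sketch: the file name nested.mcdb and its three records are made up for the demonstration, and only the first value of a repeated key would be seen this way.

```perl
use strict;
use warnings;
use MCDB_File ();

# Hypothetical demo file, created here so the example is self-contained.
my $file = 'nested.mcdb';
MCDB_File::Make::create($file, (a => 1, b => 2, c => 3));

my %h;
my $mcdb = tie %h, 'MCDB_File', $file or die "tie failed: $!\n";

# The only iteration over the tied hash; it completes before any nesting.
my @keys = keys %h;

my @pairs;
for my $outer (@keys) {
    for my $inner (@keys) {
        next if $outer eq $inner;
        # find() is a plain lookup and is safe to call at any time.
        push @pairs, [$outer, $inner, $mcdb->find($outer) + $mcdb->find($inner)];
    }
}
printf "%d ordered pairs\n", scalar @pairs;

undef $mcdb;
untie %h;
unlink $file;
```

Holding the key list in memory costs space in proportion to the number of keys, so this suits small and medium databases better than very large ones.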

Creating an mcdb constant database

An mcdb file is created in three steps. First, call new MCDB_File::Make($fname), where $fname is the name of the database file to be created. Second, call the insert method once for each (key, value) pair. Finally, call the finish method to complete the creation. A temporary file is used during mcdb creation and is atomically renamed to $fname when the finish method succeeds.

Alternatively, call the insert() method with multiple key/value pairs. This can be significantly faster because there is less crossing over the bridge from perl to C code. One simple way to do this is to pass in an entire hash, as in: $mcdb_make->insert(%hash);.

A simpler interface to mcdb file creation is provided by MCDB_File::Make::create $fname, %t. This creates an mcdb file named $fname containing the contents of %t.

EXAMPLES

These are all complete programs.

1. Use $mcdb->find('key') method to look up a 'key' in an mcdb.

use MCDB_File ();
$mcdb = tie %h, MCDB_File, "$file.mcdb" or die ...;
$value = $mcdb->find('key'); # slightly faster than $value = $h{key};
undef $mcdb;
untie %h;

2. Convert a Berkeley DB (B-tree) database to mcdb format.

use MCDB_File ();
use DB_File;

tie %h, DB_File, $ARGV[0], O_RDONLY, undef, $DB_BTREE
  or die "$0: can't tie to $ARGV[0]: $!\n";

MCDB_File::Make::create $ARGV[1], %h;  # croak()s if error

3. Convert a flat file to mcdb format. In this example, the flat file consists of one record per line, with the key separated from the value by a colon. Blank lines and lines beginning with # are skipped.

use MCDB_File;

eval {
    my $mcdb = new MCDB_File::Make("data.mcdb")
      or die "$0: new MCDB_File::Make failed: $!\n";
    while (<>) {
        next if /^$/ or /^#/;
        chomp;
        ($k, $v) = split /:/, $_, 2;
        if (defined $v) {
            $mcdb->insert($k, $v);
        } else {
            warn "bogus line: $_\n";
        }
    }
    $mcdb->finish;
} or ($@ ne "" and die "$@");

4. Perl version of mcdbctl dump.

use MCDB_File ();

tie %data, 'MCDB_File', $ARGV[0]
  or die "$0: can't tie to $ARGV[0]: $!\n";
while (($k, $v) = each %data) {
    print '+', length $k, ',', length $v, ":$k->$v\n";
}
print "\n";

5. Although an mcdb file is constant, you can simulate updating it in Perl. This is an expensive operation, as you have to create a new database, and copy into it everything that is unchanged from the old database. (As compensation, the update does not affect database readers. The old database is available for them, up until the moment the new one is finished.)

use MCDB_File ();

$file = 'data.mcdb';
tie %old, 'MCDB_File', $file
  or die "$0: can't tie to $file: $!\n";
$new = new MCDB_File::Make($file)
  or die "$0: new MCDB_File::Make failed: $!\n";

eval {
    # Add the new values; remember which keys we've seen.
    while (<>) {
        chomp;
        ($k, $v) = split;
        $new->insert($k, $v);
        $seen{$k} = 1;
    }

    # Add any old values that haven't been replaced.
    while (($k, $v) = each %old) {
        $new->insert($k, $v) unless $seen{$k};
    }

    $new->finish;
} or ($@ ne "" and die "$@");

REPEATED KEYS

Most users can ignore this section.

An mcdb file can contain repeated keys. If the insert method is called more than once with the same key during the creation of an mcdb file, that key will be repeated.

Here's an example.

$mcdb = new MCDB_File::Make("$file.mcdb") or die ...;
$mcdb->insert('cat', 'gato');
$mcdb->insert('cat', 'chat');
$mcdb->finish;

Normally, any attempt to access a key retrieves the first value stored under that key. This code snippet always prints gato.

$catref = tie %catalogue, MCDB_File, "$file.mcdb" or die ...;
print "$catalogue{cat}";

However, all the usual ways of iterating over a hash---keys, values, and each---do the Right Thing, even in the presence of repeated keys. This code snippet prints cat cat gato chat.

print join(' ', keys %catalogue, values %catalogue);

And these two both print cat:gato cat:chat, although the second is more efficient.

foreach $key (keys %catalogue) {
        print "$key:$catalogue{$key} ";
} 

while (($key, $val) = each %catalogue) {
        print "$key:$val ";
}

The multi_get method retrieves all the values associated with a key. It returns a reference to an array containing all the values. This code prints gato chat.

print "@{$catref->multi_get('cat')}";

multi_get always returns an array reference. If the key was not found in the database, it will be a reference to an empty array. To test whether the key was found, you must test the array, and not the reference.

$x = $catref->multi_get($key);
warn "$key not found\n" unless $x; # WRONG; message never printed
warn "$key not found\n" unless @$x; # Correct
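The reason the first test never fires is ordinary Perl semantics: a reference is always true, even when the array it points to is empty. The following sketch uses a plain empty array reference as a stand-in for what multi_get returns for a missing key; it does not call MCDB_File at all.

```perl
use strict;
use warnings;

# Stand-in for the return value of multi_get when the key is absent.
my $empty = [];

# A reference is always true in boolean context.
my $ref_is_true = $empty ? 1 : 0;

# The dereferenced array is false in boolean context when it has no elements.
my $has_values = @$empty ? 1 : 0;

print "reference is true: $ref_is_true\n";
print "array has values:  $has_values\n";
```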

Any extra references to the MCDB_File object (like $catref in the examples above) must be released with undef, or must have gone out of scope, before untie is called on the hash. This ensures that the object's DESTROY method is called. Note that perl -w will check this for you; see perltie for further details.

use MCDB_File ();
$catref = tie %catalogue, MCDB_File, "$file.mcdb" or die ...;
print "@{$catref->multi_get('cat')}";
undef $catref;
untie %catalogue;

RETURN VALUES

The routines tie and new return undef if the attempted operation failed; $! contains the reason for failure. insert and finish call croak if the attempted operation fails.

DIAGNOSTICS

The following fatal errors may occur. (See "eval" in perlfunc if you want to trap them.)

Modification of an MCDB_File attempted

You attempted to modify a hash tied to an MCDB_File.

MCDB_File::Make::<insert|finish>:<error string>

An OS-level problem occurred, such as permission denied when writing to the filesystem, or running out of disk space.
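As a minimal sketch of trapping the first of these errors with eval, the following program attempts a store into a tied hash. The file name diag.mcdb and its single record are made up for the demonstration, and the exact wording of the error text is assumed to match the message listed above.

```perl
use strict;
use warnings;
use MCDB_File ();

# Hypothetical demo file, created here so the example is self-contained.
my $file = 'diag.mcdb';
MCDB_File::Make::create($file, (key1 => 'value1'));

my %h;
tie %h, 'MCDB_File', $file or die "tie failed: $!\n";

# The block returns 1 only if the store did not die.
my $ok  = eval { $h{key2} = 'value2'; 1 };
my $err = $ok ? '' : $@;
print "store rejected: $err" unless $ok;

untie %h;
unlink $file;
```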

PERFORMANCE

The MCDB_File madvise method is a thin wrapper around the C library function posix_madvise. MCDB_File provides the constants MADV_NORMAL, MADV_RANDOM, MADV_SEQUENTIAL, MADV_WILLNEED, and MADV_DONTNEED.

For very large mcdb files on which more than a few queries will be made, it is recommended that madvise with MCDB_File::MADV_RANDOM be called once on the object returned by tie.

my $mcdb = tie %h, MCDB_File, "$file.mcdb" or die ...;
$mcdb->madvise(MCDB_File::MADV_RANDOM);
$value = $mcdb->find('key'); # slightly faster than $value = $h{key};
# ... (lots more queries)
undef $mcdb;
untie %h;

For iterating over very large mcdb files, it is recommended that madvise with MCDB_File::MADV_SEQUENTIAL be called once on the object returned by tie.
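A sequential pass might look like the following sketch. The file name seq.mcdb and its three records are made up for the demonstration; a real use would be on a file large enough for the hint to matter. The constant is called with parentheses so that it resolves under use strict however the module defines it.

```perl
use strict;
use warnings;
use MCDB_File ();

# Hypothetical demo file, created here so the example is self-contained.
my $file = 'seq.mcdb';
MCDB_File::Make::create($file, (one => 1, two => 2, three => 3));

my %h;
my $mcdb = tie %h, 'MCDB_File', $file or die "tie failed: $!\n";

# Hint that the file will be read from front to back.
$mcdb->madvise(MCDB_File::MADV_SEQUENTIAL());

my $count = 0;
while (my ($k, $v) = each %h) {
    $count++;    # process ($k, $v) here
}
print "iterated over $count records\n";

# Release the extra reference before untie, as described under REPEATED KEYS.
undef $mcdb;
untie %h;
unlink $file;
```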

Sometimes you need to get the most performance possible out of a library. Rumour has it that perl's tie() interface is slow. To get around that, you can use MCDB_File in an object-oriented fashion, rather than via tie().

my $mcdb = MCDB_File->TIEHASH('/path/to/mcdbfile.mcdb');
if ($mcdb->EXISTS('key')) {
    print "Key: 'key'; Value: ", $mcdb->FETCH('key'), "\n";
}
undef $mcdb;

For more information on the methods available on tied hashes see perltie.

Because Perl internally reuses the FETCH method to support each() and values() as well as ordinary lookups, it is slightly more efficient to call the $mcdb->find('key') method than to call $mcdb->FETCH('key').

ACKNOWLEDGEMENTS

mcdb is based on cdb, created by Dan Bernstein <djb@koobera.math.uic.edu>. MCDB_File is based on CDB_File, created by Tim Goodwin, <tjg@star.le.ac.uk> and currently maintained by Todd Rinaldo https://github.com/toddr/CDB_File/

AUTHOR

gstrauss <code () gluelogic.com>