NAME
MCDB_File - Perl extension for access to mcdb constant databases
SYNOPSIS
use MCDB_File ();
tie %mcdb, 'MCDB_File', 'file.mcdb' or die "tie failed: $!\n";
$value = $mcdb{$key};
$num_records = scalar %mcdb;
untie %mcdb;
use MCDB_File ();
eval {
    my $mcdb_make = new MCDB_File::Make('t.mcdb')
        or die "create t.mcdb failed: $!\n";
    $mcdb_make->insert('key1', 'value1');
    $mcdb_make->insert('key2' => 'value2', 'key3' => 'value3');
    $mcdb_make->insert(%t);
    $mcdb_make->finish;
} or ($@ ne "" and warn "$@");
use MCDB_File ();
eval { MCDB_File::Make::create $file, %t; }
or ($@ ne "" and warn "$@");
DESCRIPTION
MCDB_File is a module which provides a Perl interface to mcdb. mcdb is originally based on Dan Bernstein's cdb package.
mcdb - fast, reliable, simple code to create, read constant databases
Reading from an mcdb constant database
After the tie shown above, accesses to %mcdb will refer to the mcdb file file.mcdb, as described in "tie" in perlfunc.
keys, values, and each can be used to iterate through records. Only one iteration can be in progress at any one time: multiple simultaneous iterations over the same tied hash (for example, in nested loops) do not have independent iterators and should therefore be avoided. It is safe to use the find('key') method while iterating. See the PERFORMANCE section below for sample usage.
Creating an mcdb constant database
An mcdb file is created in three steps. First, call new MCDB_File::Make($fname), where $fname is the name of the database file to be created. Second, call the insert method once for each (key, value) pair. Finally, call the finish method to complete the creation. A temporary file is used during mcdb creation and is atomically renamed to $fname when the finish method succeeds.
Alternatively, call the insert() method with multiple key/value pairs. This can be significantly faster because there are fewer crossings from Perl into C code. One simple way to do this is to pass in an entire hash, as in: $mcdb_make->insert(%hash);
A simpler interface to mcdb file creation is provided by MCDB_File::Make::create $fname, %t. This creates an mcdb file named $fname containing the contents of %t.
EXAMPLES
These are all complete programs.
1. Use $mcdb->find('key') method to look up a 'key' in an mcdb.
use MCDB_File ();
$mcdb = tie %h, MCDB_File, "$file.mcdb" or die ...;
$value = $mcdb->find('key'); # slightly faster than $value = $h{key};
undef $mcdb;
untie %h;
2. Convert a Berkeley DB (B-tree) database to mcdb format.
use MCDB_File ();
use DB_File;
tie %h, DB_File, $ARGV[0], O_RDONLY, undef, $DB_BTREE
or die "$0: can't tie to $ARGV[0]: $!\n";
MCDB_File::Make::create $ARGV[1], %h; # croak()s if error
3. Convert a flat file to mcdb format. In this example, the flat file consists of one key per line, separated by a colon from the value. Blank lines and lines beginning with # are skipped.
use MCDB_File;
eval {
    my $mcdb = new MCDB_File::Make("data.mcdb")
        or die "$0: new MCDB_File::Make failed: $!\n";
    while (<>) {
        next if /^$/ or /^#/;
        chomp;
        ($k, $v) = split /:/, $_, 2;
        if (defined $v) {
            $mcdb->insert($k, $v);
        } else {
            warn "bogus line: $_\n";
        }
    }
    $mcdb->finish;
} or ($@ ne "" and die "$@");
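The limit argument to split in the example above is what allows a value to contain colons itself. The parsing step can be sketched on its own as follows; parse_line is a hypothetical helper introduced here for illustration and is not part of MCDB_File.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MCDB_File): split one flat-file line
# into (key, value) at the first colon only.
# Returns an empty list for blank lines, comment lines, and lines
# that contain no colon at all.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return () if $line eq '' or $line =~ /^#/;
    my ($k, $v) = split /:/, $line, 2;   # limit 2: later colons stay in $v
    return defined $v ? ($k, $v) : ();
}

my @pair = parse_line("url:http://example.com/\n");
print "$pair[0] => $pair[1]\n" if @pair;
```

Each pair returned this way could be passed straight to the insert method; an empty list signals a line to skip or report.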
4. Perl version of mcdbctl dump.
use MCDB_File ();
tie %data, 'MCDB_File', $ARGV[0]
or die "$0: can't tie to $ARGV[0]: $!\n";
while (($k, $v) = each %data) {
    print '+', length $k, ',', length $v, ":$k->$v\n";
}
print "\n";
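To make the record layout printed by the loop above easier to see, here is a minimal sketch that formats a single record the same way. format_record is a hypothetical helper, not an MCDB_File function, and it assumes keys and values are plain byte strings so that length reports byte counts.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MCDB_File): render one record as
# "+<key length>,<value length>:<key>-><value>" followed by a newline,
# matching what the dump loop prints for each record.
sub format_record {
    my ($k, $v) = @_;
    return '+' . length($k) . ',' . length($v) . ":$k->$v\n";
}

print format_record('cat', 'gato');   # prints "+3,4:cat->gato"
print "\n";                           # blank line terminates the dump
```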
5. Although an mcdb file is constant, you can simulate updating it in Perl. This is an expensive operation, as you have to create a new database, and copy into it everything that is unchanged from the old database. (As compensation, the update does not affect database readers. The old database is available for them, up until the moment the new one is finished.)
use MCDB_File ();
$file = 'data.mcdb';
tie %old, 'MCDB_File', $file
or die "$0: can't tie to $file: $!\n";
$new = new MCDB_File::Make($file)
or die "$0: new MCDB_File::Make failed: $!\n";
eval {
    # Add the new values; remember which keys we've seen.
    while (<>) {
        chomp;
        ($k, $v) = split;
        $new->insert($k, $v);
        $seen{$k} = 1;
    }
    # Add any old values that haven't been replaced.
    while (($k, $v) = each %old) {
        $new->insert($k, $v) unless $seen{$k};
    }
    $new->finish;
} or ($@ ne "" and die "$@");
REPEATED KEYS
Most users can ignore this section.
An mcdb file can contain repeated keys. If the insert method is called more than once with the same key during the creation of an mcdb file, that key will be repeated.
Here's an example.
$mcdb = new MCDB_File::Make("$file.mcdb") or die ...;
$mcdb->insert('cat', 'gato');
$mcdb->insert('cat', 'chat');
$mcdb->finish;
Normally, any attempt to access a key retrieves the first value stored under that key. This code snippet always prints gato.
$catref = tie %catalogue, MCDB_File, "$file.mcdb" or die ...;
print "$catalogue{cat}";
However, all the usual ways of iterating over a hash (keys, values, and each) do the Right Thing, even in the presence of repeated keys. This code snippet prints cat cat gato chat.
print join(' ', keys %catalogue, values %catalogue);
And these two both print cat:gato cat:chat, although the second is more efficient.
foreach $key (keys %catalogue) {
    print "$key:$catalogue{$key} ";
}
while (($key, $val) = each %catalogue) {
    print "$key:$val ";
}
The multi_get method retrieves all the values associated with a key. It returns a reference to an array containing all the values. This code prints gato chat.
print "@{$catref->multi_get('cat')}";
multi_get always returns an array reference. If the key was not found in the database, it will be a reference to an empty array. To test whether the key was found, you must test the array, and not the reference.
$x = $catref->multi_get($key);
warn "$key not found\n" unless $x; # WRONG; message never printed
warn "$key not found\n" unless @$x; # Correct
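The reason the first test never fires is that any reference is true in Perl, including a reference to an empty array. The sketch below shows this without needing an mcdb file; fake_multi_get and %fake_db are hypothetical stand-ins that only mimic the documented behaviour of multi_get.

```perl
use strict;
use warnings;

# Hypothetical stand-in for multi_get (illustration only): always returns
# an array reference, which is empty when the key is absent.
my %fake_db = (cat => ['gato', 'chat']);

sub fake_multi_get {
    my ($key) = @_;
    return exists $fake_db{$key} ? $fake_db{$key} : [];
}

for my $key (qw(cat dog)) {
    my $x = fake_multi_get($key);
    my $ref_is_true = $x ? 1 : 0;   # a reference is always true
    my $count       = @$x;          # number of values stored under $key
    print "$key: reference true=$ref_is_true, values=$count\n";
}
```

For the missing key the reference is still true while the value count is 0, which is why the array, not the reference, has to be tested.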
Any extra references to the MCDB_File object (like $catref in the examples above) must be released with undef or must have gone out of scope before calling untie on the hash. This ensures that the object's DESTROY method is called. Note that perl -w will check this for you; see perltie for further details.
use MCDB_File ();
$catref = tie %catalogue, MCDB_File, "$file.mcdb" or die ...;
print "@{$catref->multi_get('cat')}";
undef $catref;
untie %catalogue;
RETURN VALUES
The routines tie and new return undef if the attempted operation failed; $! contains the reason for failure. insert and finish call croak if the attempted operation fails.
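Since insert and finish croak instead of returning a status, callers usually wrap them in eval. The following is a minimal sketch of that pattern; risky_step and try_step are hypothetical names, and risky_step merely simulates a method that croaks on failure.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for a method that croaks on failure,
# as insert and finish are documented to do.
sub risky_step {
    my ($ok) = @_;
    croak "simulated write failure" unless $ok;
    return;
}

# Run one step; return '' on success or the error text on failure.
# The trailing 1 makes the eval block true on success regardless of
# what the last call inside it returned.
sub try_step {
    my ($ok) = @_;
    my $err = '';
    eval { risky_step($ok); 1 } or $err = $@ || 'unknown error';
    return $err;
}

for my $ok (1, 0) {
    my $err = try_step($ok);
    print $err eq '' ? "step succeeded\n" : "step failed: $err";
}
```

Ending the eval block with an explicit true value is a design choice that avoids depending on the return value of the last method called inside it.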
DIAGNOSTICS
The following fatal errors may occur. (See "eval" in perlfunc if you want to trap them.)
- Modification of an MCDB_File attempted
  You attempted to modify a hash tied to an MCDB_File.
- MCDB_File::Make::<insert|finish>:<error string>
  An OS-level problem occurred, such as permission denied writing to the filesystem, or you have run out of disk space.
PERFORMANCE
The MCDB_File madvise method is a thin wrapper around the C library posix_madvise, and MCDB_File provides the constants MADV_NORMAL, MADV_RANDOM, MADV_SEQUENTIAL, MADV_WILLNEED, and MADV_DONTNEED.
For very large mcdb files on which more than a few queries will be made, it is recommended that madvise with MCDB_File::MADV_RANDOM be called once on the object returned by tie.
my $mcdb = tie %h, MCDB_File, "$file.mcdb" or die ...;
$mcdb->madvise(MCDB_File::MADV_RANDOM);
$value = $mcdb->find('key'); # slightly faster than $value = $h{key};
# ... (lots more queries)
undef $mcdb;
untie %h;
For iterating over very large mcdb files, it is recommended that madvise with MCDB_File::MADV_SEQUENTIAL be called once on the object returned by tie.
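A sequential scan along those lines might look like the sketch below. count_records is a hypothetical helper written for this illustration; it assumes the MADV_SEQUENTIAL constant can be called as a function with parentheses, and it loads MCDB_File at run time so the sketch still compiles where the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records in an mcdb file with one
# sequential pass. Returns undef if MCDB_File is not installed or
# the file cannot be tied.
sub count_records {
    my ($path) = @_;
    return undef unless eval { require MCDB_File; 1 };

    my %h;
    my $mcdb = tie %h, 'MCDB_File', $path or return undef;

    # Assumption: the constant is callable as a function.
    $mcdb->madvise(MCDB_File::MADV_SEQUENTIAL());

    my $n = 0;
    # List assignment keeps the loop going even for keys such as "0" or "".
    while (my ($k, $v) = each %h) {
        $n++;
    }

    undef $mcdb;   # release the extra reference before untie
    untie %h;
    return $n;
}

my $count = count_records('file.mcdb');
print defined $count ? "$count records\n" : "could not read file.mcdb\n";
```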
Sometimes you need to get the most performance possible out of a library. Rumour has it that perl's tie() interface is slow. To get around that, you can use MCDB_File in an object-oriented fashion rather than via tie().
my $mcdb = MCDB_File->TIEHASH('/path/to/mcdbfile.mcdb');
if ($mcdb->EXISTS('key')) {
    print "Key: 'key'; Value: ", $mcdb->FETCH('key'), "\n";
}
undef $mcdb;
For more information on the methods available on tied hashes see perltie.
Because Perl internally reuses the FETCH method to support ordinary queries as well as each() and values(), it is slightly more efficient to call the $mcdb->find('key') method than to call $mcdb->FETCH('key').
ACKNOWLEDGEMENTS
mcdb is based on cdb, created by Dan Bernstein <djb@koobera.math.uic.edu>. MCDB_File is based on CDB_File, created by Tim Goodwin <tjg@star.le.ac.uk> and currently maintained by Todd Rinaldo (https://github.com/toddr/CDB_File/).
AUTHOR
gstrauss <code () gluelogic.com>