NAME

LMDB_File - Tie to LMDB (OpenLDAP's Lightning Memory-Mapped Database)

SYNOPSIS

# Simple TIE interface, when you're in a rush
use LMDB_File;

$db = tie %hash, 'LMDB_File', $path;

$hash{$key} = $value;
$value = $hash{$key};
each %hash;
keys %hash;
values %hash;
...


# The full power
use LMDB_File qw(:flags :cursor_op);

$env = LMDB::Env->new($path, {
    mapsize => 100 * 1024 * 1024 * 1024, # Plenty of space, don't worry
    maxdbs => 20, # Some databases
    mode   => 0600,
    # More options
});

$txn = $env->BeginTxn(); # Open a new transaction

$DB = $txn->OpenDB( {    # Create a new database
    dbname => $dbname,
    flags => MDB_CREATE
});

$DB->put($key, $value);  # Simple put
$value = $DB->get($key); # Simple get

$DB->put($key, $value, MDB_NOOVERWRITE); # Don't replace existing value

# Work with cursors
$cursor = $DB->Cursor;

$cursor->get($key, $value, MDB_FIRST); # First key/value in DB
$cursor->get($key, $value, MDB_NEXT);  # Next key/value in DB
$cursor->get($key, $value, MDB_LAST);  # Last key/value in DB
$cursor->get($key, $value, MDB_PREV);  # Previous key/value in DB

$DB->set_compare( sub { lc($a) cmp lc($b) } ); # Use my own key comparison function

DESCRIPTION

NOTE: This document is still under construction. Expect it to be incomplete in places.

LMDB_File is a Perl module which allows Perl programs to make use of the facilities provided by OpenLDAP's Lightning Memory-Mapped Database "LMDB".

LMDB is a Btree-based database management library modeled loosely on the BerkeleyDB API, but much simplified and extremely fast.

It is assumed that you have a copy of LMDB's documentation at hand when reading this documentation. The interface defined here mirrors the C interface closely but with an OO approach.

This is implemented with a number of Perl classes.

An LMDB environment handle (MDB_env* in C) is wrapped in the LMDB::Env class.

An LMDB transaction handle (MDB_txn* in C) is wrapped in the LMDB::Txn class.

An LMDB cursor handle (MDB_cursor* in C) is wrapped in the LMDB::Cursor class.

An LMDB DataBase handle (MDB_dbi in C) is wrapped in an opaque SCALAR, but because in LMDB all DataBase operations need both a Transaction and a DataBase handle, LMDB_File uses an LMDB_File object that encapsulates both.

Error reporting

In the C API, most functions return 0 on success and an error code on failure.

In this module, when a function fails, the package variable $die_on_err controls the course of action. When $die_on_err is TRUE, LMDB_File dies with an error message that can be trapped by an eval { ... } block.

When it is FALSE, the function returns the error code, so you should check the return value of every function call.

By default $die_on_err is TRUE.

Regardless of the value of $die_on_err, the code of the last error can be found in the package variable $last_err.
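
The two behaviours can be compared side by side. The following is only a sketch: it assumes that the package variables are reachable as $LMDB_File::die_on_err and $LMDB_File::last_err, and the temporary directory, the map size and the key name are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags :error);

# Scratch environment in a temporary directory (illustrative setup).
my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, { mapsize => 10 * 1024 * 1024 });
my $txn = $env->BeginTxn;
my $db  = $txn->OpenDB;

my ($key, $data) = ('no-such-key');

# Default ($die_on_err TRUE): the failure is raised and trapped by eval.
my $ok = eval { $db->get($key, $data); 1 };
print "trapped: $@\n" unless $ok;

# $die_on_err FALSE: the call returns the error code instead of dying.
my $rc;
{
    local $LMDB_File::die_on_err = 0;
    $rc = $db->get($key, $data);
}
print "key not found\n" if $rc == MDB_NOTFOUND;

# The last error code is recorded in either mode.
print "last_err matches\n" if $LMDB_File::last_err == MDB_NOTFOUND;

$txn->abort;
```

Using local keeps the change of $die_on_err confined to the enclosing block, so the rest of the program keeps the default behaviour.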

LMDB::Env

This class wraps an opened LMDB environment.

At construction time, the environment is created, if it does not exist, and opened.

When you are finished using it, in the C API you must call the mdb_env_close function to close it and free the allocated memory, but in Perl you simply let the object go out of scope.

Constructor

$Env = LMDB::Env->new ( $path [, ENVOPTIONS ] )

Creates a new LMDB::Env object and returns it. It encapsulates both LMDB's mdb_env_create and mdb_env_open functions.

$path is the directory in which the database files reside. This directory must already exist and should be writable.

ENVOPTIONS, if provided, must be a HASH Reference with any of the following options:

mapsize => INT

The size of the memory map to use for this environment.

The size of the memory map is also the maximum size of the database. The value should be chosen as large as possible, to accommodate future growth of the database. The size should be a multiple of the OS page size.

The default is 1048576 bytes (1 MB).

maxreaders => INT

The maximum number of threads/reader slots for the environment.

This defines the number of slots in the lock table that is used to track readers in the environment.

The default is 126.

maxdbs => INT

The maximum number of named databases for the environment.

This option is only needed if multiple databases will be used in the environment. Simpler applications that use the environment as a single unnamed database can ignore this option.

The default is 0, i.e. no named databases allowed.

mode => INT

The UNIX permissions to set on created files. This parameter is ignored on Windows. It defaults to 0600.

flags => ENVFLAGS

Set special options for this environment. This option, if provided, can be specified by OR'ing the following flags:

MDB_FIXEDMAP

Use a fixed address for the mmap region. This flag must be specified when creating the environment, and is stored persistently in the environment. If successful, the memory map will always reside at the same virtual address and pointers used to reference data items in the database will be constant across multiple invocations. This option may not always work, depending on how the operating system has allocated memory to shared libraries and other uses. The feature is highly experimental.

MDB_NOSUBDIR

By default, LMDB creates its environment in a directory whose pathname is given in $path, and creates its data and lock files under that directory. With this option, $path is used as-is for the database main data file. The database lock file is the $path with "-lock" appended.

MDB_RDONLY

Open the environment in read-only mode. No write operations will be allowed. LMDB will still modify the lock file - except on read-only filesystems, where LMDB does not use locks.

MDB_WRITEMAP

Use a writeable memory map unless MDB_RDONLY is set. This is faster and uses fewer mallocs, but loses protection from application bugs like wild pointer writes and other bad updates into the database.

Incompatible with nested transactions (also known as sub transactions).

MDB_NOMETASYNC

Flush system buffers to disk only once per transaction, omit the metadata flush. Defer that until the system flushes files to disk, or next non-MDB_RDONLY commit or $Env->sync(). This optimization maintains database integrity, but a system crash may undo the last committed transaction. I.e. it preserves the ACI (atomicity, consistency, isolation) but not D (durability) database property.

This flag may be changed at any time using $Env->set_flags().

MDB_NOSYNC

Don't flush system buffers to disk when committing a transaction. This optimization means a system crash can corrupt the database or lose the last transactions if buffers are not yet flushed to disk. The risk is governed by how often the system flushes dirty buffers to disk and how often $Env->sync() is called. However, if the filesystem preserves write order and the MDB_WRITEMAP flag is not used, transactions exhibit ACI (atomicity, consistency, isolation) properties and only lose D (durability). I.e. database integrity is maintained, but a system crash may undo the final transactions. Note that (MDB_NOSYNC | MDB_WRITEMAP) leaves the system with no hint for when to write transactions to disk, unless $Env->sync() is called. (MDB_MAPASYNC | MDB_WRITEMAP) may be preferable.

This flag may be changed at any time using $Env->set_flags().

MDB_MAPASYNC

When using MDB_WRITEMAP, use asynchronous flushes to disk. As with MDB_NOSYNC, a system crash can then corrupt the database or lose the last transactions. Calling $Env->sync() ensures on-disk database integrity until next commit.

This flag may be changed at any time using $Env->set_flags().

MDB_NOTLS

Don't use Thread-Local Storage. Tie reader locktable slots to "LMDB::Txn" objects instead of to threads. I.e. $Txn->reset() keeps the slot reserved for the "LMDB::Txn" object. A thread may use parallel read-only transactions. A read-only transaction may span threads if the user synchronizes its use. Applications that multiplex many user threads over individual OS threads need this option. Such an application must also serialize the write transactions in an OS thread, since LMDB's write locking is unaware of the user threads.
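
As an illustration of how ENVOPTIONS and ENVFLAGS combine, the sketch below opens an environment with MDB_NOSUBDIR, so that $path names the data file itself. The file name and the option values are arbitrary choices for the example, not recommendations.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:envflags);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.mdb";       # hypothetical data file name

my $env = LMDB::Env->new($file, {
    mapsize    => 64 * 1024 * 1024,  # 64 MiB, a multiple of common page sizes
    maxreaders => 32,
    maxdbs     => 4,
    mode       => 0600,
    flags      => MDB_NOSUBDIR | MDB_NOMETASYNC,   # flags are OR'ed together
});

# With MDB_NOSUBDIR the lock file is the data file name plus "-lock".
print "data file: $file\n"      if -e $file;
print "lock file: $file-lock\n" if -e "$file-lock";
```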

Class methods

$Env->copy ( $path )

Copy an LMDB environment to the specified $path.

$Env->copyfd ( HANDLE )

Copy an LMDB environment to the specified HANDLE.

$status = $Env->stat

Returns a HASH reference with statistics for the main, unnamed, database in the environment. The HASH contains the following keys:

    psize           Size of a database page
    depth           Depth (height) of the B-Tree
    branch_pages    Number of internal (non-leaf) pages
    overflow_pages  Number of overflow pages
    entries         Number of data items

$info = $Env->info

Returns a HASH reference with information about the environment, with the following keys:

    mapaddr     Address of map, if fixed
    mapsize     Size of the data memory map
    last_pgno   ID of the last used page
    last_txnid  ID of the last committed transaction
    maxreaders  Max reader slots in the environment
    numreaders  Max reader slots used in the environment

$Env->sync ( BOOL )

Flush the data buffers to disk.

Data is always written to disk when $Txn->commit() is called, but the operating system may keep it buffered. LMDB always flushes the OS buffers upon commit as well, unless the environment was opened with MDB_NOSYNC or in part MDB_NOMETASYNC.

If BOOL is TRUE force a synchronous flush. Otherwise if the environment has the MDB_NOSYNC flag set the flushes will be omitted, and with MDB_MAPASYNC they will be asynchronous.

$Env->set_flags ( BITMASK, BOOL )

As noted above, some environment flags can be changed at any time.

BITMASK is the set of flags to change, bitwise OR'ed together. If BOOL is TRUE the flags are set; if FALSE they are cleared.

$Env->get_flags ( $flags )

Returns in $flags the environment flags.

$Env->get_path ( $path )

Returns in $path the path that was used in LMDB::Env->new(...)

$Env->get_maxreaders ( $readers )

Returns in $readers the maximum number of threads/reader slots for the environment.

$mks = $Env->get_maxkeysize

Returns the maximum size of a key for the environment.
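
Several of the methods above return their result through the variable passed as argument, while stat and info return HASH references. A short sketch of both styles follows; the temporary directory and the option values are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:envflags);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, {
    mapsize    => 16 * 1024 * 1024,
    maxreaders => 32,
});

# stat and info return HASH references.
my $stat = $env->stat;
my $info = $env->info;
printf "page size %d, entries %d, map size %d\n",
    $stat->{psize}, $stat->{entries}, $info->{mapsize};

# The get_* methods fill in the variable they are given.
my ($path, $readers, $flags);
$env->get_path($path);
$env->get_maxreaders($readers);
print "path $path, reader slots $readers\n";

# Set a changeable flag, read the flags back, then clear it again.
$env->set_flags(MDB_NOSYNC, 1);
$env->get_flags($flags);
my $while_set = ($flags & MDB_NOSYNC) ? 'on' : 'off';

$env->set_flags(MDB_NOSYNC, 0);
$env->get_flags($flags);
my $while_cleared = ($flags & MDB_NOSYNC) ? 'on' : 'off';

print "MDB_NOSYNC was $while_set, is now $while_cleared\n";
```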

$Txn = $Env->BeginTxn ( [ $tflags ] )

Returns a new Transaction. A simple wrapper over the constructor of "LMDB::Txn".

If provided, $tflags will be passed to the constructor. If not provided, this wrapper will propagate the environment's MDB_RDONLY flag, if set, to the transaction constructor.

LMDB::Txn

In LMDB every operation (read or write) on a DataBase needs to be inside a transaction. This class wraps an LMDB transaction.

By default you must terminate the transaction by calling either the abort or the commit method. After a transaction is terminated, you should not call any other method on it, except env. If you let an object of this class go out of scope, by default the transaction will be aborted.

Constructor

$Txn = LMDB::Txn->new ( $Env [, $tflags ] )

Create a new transaction for use in the environment.

Class methods

$Txn->abort

Abort the transaction, terminating the transaction.

$Txn->commit

Commit the transaction, terminating the transaction.
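
The difference between the two ways of terminating a transaction can be sketched as follows. The example writes a value and commits it, overwrites it in a second transaction that is aborted, and then reads it back in a read-only transaction. The directory, key and values are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, { mapsize => 10 * 1024 * 1024 });

my ($key, $first, $second) = ('color', 'red', 'blue');

# First transaction: store a value and commit it.
my $txn = $env->BeginTxn;
my $db  = $txn->OpenDB;
$db->put($key, $first);
$txn->commit;

# Second transaction: overwrite the value, then abort.
$txn = $env->BeginTxn;
$db  = $txn->OpenDB;
$db->put($key, $second);
$txn->abort;

# Read-only transaction: only the committed value should be visible.
$txn = $env->BeginTxn(MDB_RDONLY);
$db  = $txn->OpenDB;
my $color = $db->get($key);
$txn->abort;

print "color is $color\n";
```

Note that the database is reopened in every transaction, because an LMDB_File object is bound to the transaction that opened it.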

$Txn->reset

Reset the transaction.

TO BE DOCUMENTED

$Txn->renew

Renew the transaction.

TO BE DOCUMENTED

$Env = $Txn->env

Returns the environment (an LMDB::Env object) that created the transaction, if it is still alive, or undef if called on a terminated transaction.

$SubTxn = $Txn->SubTxn ( [ $tflags ] )

Creates and returns a sub transaction (also known as a nested transaction).

Nested transactions are useful for combining components that create and commit transactions. No modifications are permanently stored until the highest level "parent" transaction is committed. Nested transactions can be aborted without aborting the parent transaction and only the changes made in the nested transaction will be rolled-back.

Aborting the parent transaction will abort and terminate all outstanding nested transactions. Committing the parent transaction will similarly commit and terminate all outstanding nested transactions.

Unlike some other databases, in LMDB changes made inside nested transactions are not visible to the parent transaction until the nested transaction is committed. In other words, transactions are always isolated, even when they are nested.
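
A minimal sketch of this behaviour, assuming an environment opened without MDB_WRITEMAP: the parent stores one key, a nested transaction stores another and is aborted, and only the parent's change survives the commit. Key names and values are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, { mapsize => 10 * 1024 * 1024 });

my ($pkey, $pval) = ('kept',    'written by parent');
my ($ckey, $cval) = ('dropped', 'written by child');

my $txn = $env->BeginTxn;
my $db  = $txn->OpenDB;
$db->put($pkey, $pval);

{
    # The parent is left untouched while the nested transaction is active.
    my $sub   = $txn->SubTxn;
    my $subdb = $sub->OpenDB;
    $subdb->put($ckey, $cval);
    $sub->abort;                 # rolls back only the nested change
}

$txn->commit;

# Check the outcome in a fresh read-only transaction.
my $rtxn = $env->BeginTxn(MDB_RDONLY);
my $rdb  = $rtxn->OpenDB;
my $kept    = $rdb->get($pkey);
my $dropped = $rdb->get($ckey);
$rtxn->abort;

print "kept: ",    (defined $kept    ? $kept : 'undef'), "\n";
print "dropped: ", (defined $dropped ? $dropped : 'undef'), "\n";
```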

$Txn->AutoCommit ( [ BOOL ] )

When BOOL is provided, it sets the behavior of the transaction when going out of scope: BOOL TRUE makes arrangements for the transaction to be auto committed and BOOL FALSE returns to the default behavior: to be aborted. If you don't provide BOOL, you are only interested in knowing the current value of this option, which is returned in every case.

$DB = $Txn->OpenDB ( [ DBOPTIONS ] )
$DB = $Txn->OpenDB ( [ $dbname [, DBFLAGS ]] )

This method opens a DataBase in the environment. This is only syntactic sugar for LMDB_File->open(...).

DBOPTIONS, if provided, should be a HASH reference with any of the following keys:

dbname => $dbname
flags => DBFLAGS

You can also call this method using its values, $dbname and DBFLAGS, documented ahead.
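
Both calling conventions can be seen together in the sketch below, which creates two named databases in one transaction. The database names, keys and values are made up for illustration; note that the environment must be opened with a large enough maxdbs for named databases to be allowed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, {
    mapsize => 10 * 1024 * 1024,
    maxdbs  => 4,                # required for named databases
});

my $txn = $env->BeginTxn;

# HASH reference form.
my $users = $txn->OpenDB({ dbname => 'users', flags => MDB_CREATE });

# Positional form: name first, then flags.
my $logs = $txn->OpenDB('logs', MDB_CREATE);

my ($user, $role)  = ('alice', 'admin');
my ($stamp, $line) = ('0001', 'alice created');
$users->put($user, $role);
$logs->put($stamp, $line);

my $found = $users->get($user);
print "alice is $found\n";

$txn->commit;
```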

LMDB_File

Constructor

$DB = LMDB_File->open ( $Txn [, $dbname [, DBFLAGS ] ] )

If $dbname is provided, it is the name of a named Data Base in the environment. If it is not provided (or if $dbname is undef), the opened Data Base will be the unnamed (default) one.

DBFLAGS, if provided, will set special options for this Data Base and can be specified by OR'ing the following flags:

MDB_REVERSEKEY

Keys are strings to be compared in reverse order

MDB_DUPSORT

Duplicate keys may be used in the database. (Or, from another perspective, keys may have multiple data items, stored in sorted order.) By default keys must be unique and may have only a single data item.

MDB_INTEGERKEY

Keys are binary integers in native byte order.

MDB_DUPFIXED

This flag may only be used in combination with MDB_DUPSORT. This option tells the library that the data items for this database are all the same size, which allows further optimizations in storage and retrieval. When all data items are the same size, the MDB_GET_MULTIPLE and MDB_NEXT_MULTIPLE cursor operations may be used to retrieve multiple items at once.

MDB_INTEGERDUP

This option specifies that duplicate data items are also integers, and should be sorted as such.

MDB_REVERSEDUP

This option specifies that duplicate data items should be compared as strings in reverse order.

MDB_CREATE

Create the named database if it doesn't exist. This option is not allowed in a read-only transaction or a read-only environment.

Class methods

$DB->put ( $key, $data [, WRITEFLAGS ] )

Store items into a database.

This function stores key/data pairs in the database. The default behavior is to enter the new key/data pair, replacing any previously existing key if duplicates are disallowed, or adding a duplicate data item if duplicates are allowed.

$key is the key to store in the database and $data the data to store.

WRITEFLAGS, if provided, will set special options for this operation and can be one of the following flags:

MDB_NODUPDATA

Enter the new key/data pair only if it does not already appear in the database. This flag may only be specified if the database was opened with MDB_DUPSORT. The function will fail with MDB_KEYEXIST if the key/data pair already appears in the database.

MDB_NOOVERWRITE

Enter the new key/data pair only if the key does not already appear in the database.

The function will return MDB_KEYEXIST if the key already appears in the database, even if the database supports duplicates (MDB_DUPSORT). The $data parameter will be set to point to the existing item.

MDB_RESERVE

NOTE: This isn't yet usable from Perl, stay tuned.

Reserve space for data of the given size, but don't copy the given data. Instead, return a pointer to the reserved space, which the caller can fill in later, but before the next update operation or the transaction ends. This saves an extra memcpy if the data is being generated later.

MDB_APPEND

Append the given key/data pair to the end of the database.

No key comparisons are performed. This option allows fast bulk loading when keys are already known to be in the correct order.

NOTE: Loading unsorted keys with this flag will cause data corruption.

MDB_APPENDDUP

As above, but for sorted duplicated data.
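
Two of these flags are sketched below: MDB_APPEND for loading keys that are already sorted, and MDB_NOOVERWRITE for refusing to replace an existing key. The example relies on the default $die_on_err behaviour and traps the failure with eval; the data set is invented, and $LMDB_File::last_err is assumed to be the fully qualified name of the $last_err package variable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags :error);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, { mapsize => 10 * 1024 * 1024 });
my $txn = $env->BeginTxn;
my $db  = $txn->OpenDB;

# MDB_APPEND: bulk-load keys that are already in sorted order.
my %stock = (apple => 3, banana => 7, cherry => 1);
for my $name (sort keys %stock) {
    my $count = $stock{$name};
    $db->put($name, $count, MDB_APPEND);
}

# MDB_NOOVERWRITE: the put fails because 'apple' already exists.
my ($key, $new) = ('apple', 99);
my $stored = eval { $db->put($key, $new, MDB_NOOVERWRITE); 1 };
if (!$stored && $LMDB_File::last_err == MDB_KEYEXIST) {
    print "apple already present\n";
}

my $apple = $db->get($key);
print "apple is still $apple\n";

$txn->commit;
```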

$DB->get ( $key, $data )
$data = $DB->get ( $key )

Get items from a database.

This method retrieves key/data pairs from the database.

If the database supports duplicate keys (MDB_DUPSORT) then the first data item for the key will be returned. Retrieval of other items requires the use of the LMDB::Cursor->get() method.

The two-argument form, closer to the C API, returns in the provided argument $data the value associated with $key in the database if it exists or reports an error if not.

In the simpler, more "perlish" one-argument form, the method returns the value associated with $key in the database or undef if no such value exists.

This form is implemented by locally setting $die_on_err to FALSE.

$DB->ReadMode ( MODE )

This method allows you to modify the behavior of "get" (read) operations on the database.

The C documentation for the mdb_get function states that:

The memory pointed to by the returned values is owned by the
database. The caller need not dispose of the memory, and may not
modify it in any way. For values returned in a read-only transaction
any modification attempts will cause a SIGSEGV.

So this module implements two modes of operation for its "get" methods and you can select between them with this method.

When MODE is 0 (or any FALSE value) a default "safe" mode is used in which the data value found in the database is copied to the scalar returned, so you can do anything you want to that scalar without side effects.

But when MODE is 1 (or, in the current implementation, any TRUE value) a sort of hack is used to avoid the memory copy and the scalar returned will hold only a pointer to the data value found. This is much faster and uses less memory, especially when used with large values, but there are a few caveats: In a read-only transaction the value is valid only until the end of the transaction, and in a read-write transaction the value is valid only until the next write operation (because any write operation can potentially modify the in-memory btree).

NOTE: In order to achieve the zero-copy behavior desired by setting ReadMode to TRUE, you must use the two-argument form of get ($DB->get ( $key, $data )) or use the cursor get method described below.

$DB->del ( $key [, $data ] )

Delete items from a database.

This function removes key/data pairs from the database.

If the database does not support sorted duplicate data items (MDB_DUPSORT), the $data parameter is optional and is ignored.

If the database supports sorted duplicates and the $data parameter is undef or not provided, all of the duplicate data items for the $key will be deleted. Otherwise, if the $data parameter is provided only the matching data item will be deleted.
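
The following sketch shows both ways of deleting in a database opened with MDB_DUPSORT. It uses a named database called 'tags' and a handful of values, all of which are illustrative, and reads the item count from the entries field of stat.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, { mapsize => 10 * 1024 * 1024, maxdbs => 1 });
my $txn = $env->BeginTxn;
my $db  = $txn->OpenDB({ dbname => 'tags', flags => MDB_CREATE | MDB_DUPSORT });

# Store three data items under the same key.
my $key = 'fruit';
for my $item (qw(pear apple fig)) {
    my $value = $item;
    $db->put($key, $value);
}

# get returns the first data item in sorted order.
my $first = $db->get($key);
print "first item: $first\n";

# Delete a single duplicate, then everything stored under the key.
my $victim = 'fig';
$db->del($key, $victim);
my $after_one = $db->stat->{entries};

$db->del($key);
my $after_all = $db->stat->{entries};

print "items left: $after_one, then $after_all\n";

$txn->commit;
```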

$DB->set_compare ( CODE )

Set a custom key comparison function referenced by CODE for a database.

CODE should be a subroutine reference or an anonymous subroutine, that like Perl's "sort" in perlfunc, will receive the values to compare in the global variables $a and $b.

The comparison function is called whenever it is necessary to compare a key specified by the application with a key currently stored in the database. If no comparison function is specified, and no special key flags were specified in LMDB_File->open(), the keys are compared lexically, with shorter keys collating before longer keys.

Warning: This function must be called before any data access functions are used, otherwise data corruption may occur. The same comparison function must be used by every program accessing the database, every time the database is used.
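
Because the comparison routine receives its operands in $a and $b just as a sort block does, one way to preview the key order it produces is to try it with Perl's sort first. The sketch below does only that; the key list is invented, and the call to set_compare is shown as a comment because no database is opened here.

```perl
use strict;
use warnings;

# Candidate comparison routine: case-insensitive key order.
my $by_lc = sub { lc($a) cmp lc($b) };

# sort hands the operands over in $a and $b, so the same routine
# can be exercised without touching a database.
my @candidates = qw(cherry Banana apple);
my @keys = sort $by_lc @candidates;
print "@keys\n";

# With a database object $DB opened elsewhere, the same routine would be
# installed before any data access:
#   $DB->set_compare($by_lc);
```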

$DB->Alive

Returns a TRUE value if the transaction in which this database was opened is still alive, i.e. neither committed nor aborted yet, and FALSE otherwise.

$Cursor = $DB->Cursor

Creates a new LMDB::Cursor object to work in the database, see "LMDB::Cursor".

$txn = $DB->Txn

Returns the transaction that opened this database.

$flags = $DB->flags

Retrieve the DB flags for this database.

$status = $DB->stat

Returns a HASH reference with statistics for the database. The hash will contain the following keys:

    psize           Size of a database page
    depth           Depth (height) of the B-Tree
    branch_pages    Number of internal (non-leaf) pages
    overflow_pages  Number of overflow pages
    entries         Number of data items

LMDB::Cursor

To construct a cursor you should call the Cursor method of the LMDB_File class:

$cursor = $DB->Cursor

Class methods

$cursor->get($key, $data, CURSOR_OP)

This function retrieves key/data pairs from the database.

The variables $key and $data are used to return the values found.

CURSOR_OP determines the key/data to be retrieved and must be one of the following:

MDB_FIRST

Position at first key/data item.

MDB_FIRST_DUP

Position at first data item of current key. Only for MDB_DUPSORT

MDB_GET_BOTH

Position at key/data pair. Only for MDB_DUPSORT

MDB_GET_BOTH_RANGE

Position at key, nearest data. Only for MDB_DUPSORT

MDB_GET_CURRENT

Return key/data at current cursor position.

MDB_GET_MULTIPLE

Return all the duplicate data items at the current cursor position. Only for MDB_DUPFIXED

MDB_LAST

Position at last key/data item.

MDB_LAST_DUP

Position at last data item of current key. Only for MDB_DUPSORT

MDB_NEXT

Position at next data item.

MDB_NEXT_DUP

Position at next data item of current key. Only for MDB_DUPSORT

MDB_NEXT_MULTIPLE

Return all duplicate data items at the next cursor position. Only for MDB_DUPFIXED

MDB_NEXT_NODUP

Position at first data item of next key.

MDB_PREV

Position at previous data item.

MDB_PREV_DUP

Position at previous data item of current key. Only for MDB_DUPSORT

MDB_PREV_NODUP

Position at last data item of previous key.

MDB_SET

Position at specified key.

MDB_SET_KEY

Position at specified key, return key + data.

MDB_SET_RANGE

Position at first key greater than or equal to specified key.
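
A few of these operations are combined in the sketch below: a full scan with MDB_FIRST and MDB_NEXT, followed by a range lookup with MDB_SET_RANGE. To avoid running past the last item, the scan is bounded by the entries count from stat; this is a choice made for the example, not the only way to iterate. The data set is invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File qw(:flags :cursor_op);

my $dir = tempdir(CLEANUP => 1);
my $env = LMDB::Env->new($dir, { mapsize => 10 * 1024 * 1024 });
my $txn = $env->BeginTxn;
my $db  = $txn->OpenDB;

my %stock = (apple => 3, banana => 7, cherry => 1);
for my $name (keys %stock) {
    my $count = $stock{$name};
    $db->put($name, $count);
}

my $cursor = $db->Cursor;
my ($key, $value, @seen);

# Full scan in key order: MDB_FIRST once, then MDB_NEXT.
my $entries = $db->stat->{entries};
for my $i (1 .. $entries) {
    $cursor->get($key, $value, $i == 1 ? MDB_FIRST : MDB_NEXT);
    print "$key => $value\n";
    push @seen, $key;
}

# Range lookup: position at the first key greater than or equal to 'b'.
$key = 'b';
$cursor->get($key, $value, MDB_SET_RANGE);
my $from_b = $key;
print "first key at or after 'b': $from_b\n";

undef $cursor;    # release the cursor before ending the transaction
$txn->commit;
```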

$cursor->put($key, $data, WRITEFLAGS)

This function stores key/data pairs into the database. If the function fails for any reason, the state of the cursor will be unchanged. If the function succeeds and an item is inserted into the database, the cursor is always positioned to refer to the newly inserted item.

Exportable constants

At use time you can import into your namespace the following constants, grouped by their tags.

Environment flags :envflags

MDB_FIXEDMAP MDB_NOSUBDIR MDB_NOSYNC MDB_RDONLY MDB_NOMETASYNC
MDB_WRITEMAP MDB_MAPASYNC MDB_NOTLS

Data base flags :dbflags

MDB_REVERSEKEY MDB_DUPSORT MDB_INTEGERKEY MDB_DUPFIXED
MDB_INTEGERDUP MDB_REVERSEDUP MDB_CREATE

Write flags :writeflags

MDB_NOOVERWRITE MDB_NODUPDATA MDB_CURRENT MDB_RESERVE
MDB_APPEND MDB_APPENDDUP MDB_MULTIPLE

All flags :flags

All of :envflags, :dbflags and :writeflags

Cursor operations :cursor_op

MDB_FIRST MDB_FIRST_DUP MDB_GET_BOTH MDB_GET_BOTH_RANGE
MDB_GET_CURRENT MDB_GET_MULTIPLE MDB_NEXT MDB_NEXT_DUP MDB_NEXT_MULTIPLE
MDB_NEXT_NODUP MDB_PREV MDB_PREV_DUP MDB_PREV_NODUP MDB_LAST MDB_LAST_DUP
MDB_SET MDB_SET_KEY MDB_SET_RANGE

Error codes :error

MDB_SUCCESS MDB_KEYEXIST MDB_NOTFOUND MDB_PAGE_NOTFOUND MDB_CORRUPTED
MDB_PANIC MDB_VERSION_MISMATCH MDB_INVALID MDB_MAP_FULL MDB_DBS_FULL
MDB_READERS_FULL MDB_TLS_FULL MDB_TXN_FULL MDB_CURSOR_FULL MDB_PAGE_FULL
MDB_MAP_RESIZED MDB_INCOMPATIBLE MDB_BAD_RSLOT MDB_LAST_ERRCODE

Version information :version

MDB_VERSION_FULL MDB_VERSION_MAJOR MDB_VERSION_MINOR
MDB_VERSION_PATCH MDB_VERSION_STRING MDB_VERSION_DATE

TIE Interface

The simplest interface to LMDB is using "tie" in perlfunc.

The TIE interface of LMDB_File can take several forms that depend on the data at hand.

tie %hash, 'LMDB_File', $path [, $options ]

The most common form.

tie %hash, 'LMDB_File', $path, $flags, $mode

For compatibility with other DBM modules.

tie %hash, 'LMDB_File', $Txn [, DBOPTIONS ]

When you have a Transaction object $Txn at hand.

tie %hash, 'LMDB_File', $Env [, DBOPTIONS ]

When you have an Environment object $Env at hand.

tie %hash, $DB

When you have an opened database.

The first two forms will create and/or open the Environment at $path, create a new Transaction and open a database in the Transaction.

If provided, $options must be a HASH reference with options for both the Environment and the database.

Valid keys for $options are any described above for ENVOPTIONS and DBOPTIONS.

In the case that you have already created a transaction or an environment, you can provide a HASH reference in DBOPTIONS for options exclusively for the database.
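
A minimal sketch of the most common form, tying a hash to a path with an options HASH, is shown below. The temporary directory, the map size and the keys are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use LMDB_File;

my $dir = tempdir(CLEANUP => 1);

# $options may mix environment and database options.
tie my %stock, 'LMDB_File', $dir, { mapsize => 10 * 1024 * 1024 };

$stock{apple}  = 3;
$stock{banana} = 7;

my @names = sort keys %stock;
print "$_ => $stock{$_}\n" for @names;
print "no cherries\n" unless defined $stock{cherry};

untie %stock;
```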

AUTHOR

Salvador Ortiz Garcia, <sortiz@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2013 by Salvador Ortiz Garcia
Copyright (C) 2013 by Matías Software Group, S.A. de C.V.

This library is free software; you can redistribute it and/or modify it under the terms of the Artistic License version 2.0, see LICENSE.