NAME
Alzabo::ObjectCache - A simple in-memory cache for row objects.
SYNOPSIS
use Alzabo::ObjectCache
( store => 'Alzabo::ObjectCache::Store::Memory',
sync => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
sync_dbm_file => 'somefile.db' );
DESCRIPTION
This class exists primarily to delegate necessary caching operations to other objects.
It always contains two objects. One is responsible for storing the objects to be cached. This can be done in any way that the storing object sees fit.
The syncing object is responsible for making sure that objects in multiple processes stay in sync with each other, as well as within a single process. For example, if an object in process 1 is deleted and then process 2 attempts to retrieve the same object from the database, process 2 needs to be told (in this case via an exception) that this object is no longer available. Similarly, if process 1 updates the database and process 2 holds a cached object for the same row, the object in process 2 needs to know that it should fetch its data again.
IMPORT
This module is configured entirely through the parameters passed when it is imported.
Parameters
store => 'Alzabo::ObjectCache::Store::Foo'
This should be the name of a class that implements the Alzabo::ObjectCache object storing interface.
The default is Alzabo::ObjectCache::Store::Memory.
sync => 'Alzabo::ObjectCache::Sync::Foo'
This should be the name of a class that implements the Alzabo::ObjectCache object syncing interface.
The default is Alzabo::ObjectCache::Sync::Null.
lru_size => $size
This is the maximum number of objects you want the storing class to store at once. If it is 0 or undefined (the default), the storage class will store an unlimited number of objects.
All parameters given will also be passed through to the import method of the storing and syncing classes being used.
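The pass-through described above can be sketched in plain Perl. This is only an illustration of the idea, not the module's actual code: My::Cache, My::Store and My::Sync are hypothetical stand-ins, and the sketch calls import directly because the classes live in the same file.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a storing and a syncing class. Each one just
# records the parameters its import method receives.
{
    package My::Store;
    our %params;
    sub import { my $class = shift; %params = @_; }
}
{
    package My::Sync;
    our %params;
    sub import { my $class = shift; %params = @_; }
}

# Hypothetical front-end class that mimics the documented behaviour:
# pick the store and sync classes (with defaults), then hand every
# parameter on to both of them.
{
    package My::Cache;
    our ( $store_class, $sync_class );

    sub import {
        my ( $class, %p ) = @_;
        $store_class = $p{store} || 'My::Store';
        $sync_class  = $p{sync}  || 'My::Sync';
        $_->import(%p) for $store_class, $sync_class;
    }
}

# Equivalent to "use My::Cache ( ... )" for a class defined in this file.
My::Cache->import( lru_size => 100, sync_dbm_file => 'somefile.db' );

print "store saw lru_size=$My::Store::params{lru_size}\n";
print "sync saw sync_dbm_file=$My::Sync::params{sync_dbm_file}\n";
```

Because both classes see every parameter, a class simply ignores the keys it does not understand.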
LRU STORAGE
Any storage module can be turned into an LRU cache by passing an lru_size parameter to this module when using it.
For example:
use Alzabo::ObjectCache
( store => 'Alzabo::ObjectCache::Store::Memory',
lru_size => 100,
sync => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
sync_dbm_file => 'somefile.db' );
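To make the LRU behaviour concrete, here is a minimal sketch of a least-recently-used cache in plain Perl. My::LRU is a hypothetical class written for this illustration. It is not how Alzabo implements lru_size, and its store_object takes an explicit id, unlike the real interface.

```perl
use strict;
use warnings;

# Hypothetical LRU cache, for illustration only. It keeps at most
# $size entries and evicts the least recently used one when full.
{
    package My::LRU;

    sub new {
        my ( $class, $size ) = @_;
        my $self = { size => $size, data => {}, order => [] };
        return bless $self, $class;
    }

    # Move $id to the "most recently used" end of the order list.
    sub _touch {
        my ( $self, $id ) = @_;
        my @others = grep { $_ ne $id } @{ $self->{order} };
        $self->{order} = [ @others, $id ];
    }

    sub fetch_object {
        my ( $self, $id ) = @_;
        return undef unless exists $self->{data}{$id};
        $self->_touch($id);
        return $self->{data}{$id};
    }

    sub store_object {
        my ( $self, $id, $obj ) = @_;
        return if exists $self->{data}{$id};    # never overwrite
        $self->{data}{$id} = $obj;
        $self->_touch($id);

        # A size of 0 or undef means "unlimited".
        return unless $self->{size};
        while ( @{ $self->{order} } > $self->{size} ) {
            my $oldest = shift @{ $self->{order} };
            delete $self->{data}{$oldest};
        }
        return;
    }
}

my $cache = My::LRU->new(2);
$cache->store_object( a => 'row A' );
$cache->store_object( b => 'row B' );
$cache->fetch_object('a');               # 'a' is now the most recently used
$cache->store_object( c => 'row C' );    # cache is full, so 'b' is evicted

print defined( $cache->fetch_object('b') ) ? "b cached\n" : "b evicted\n";
```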
CACHING SCENARIOS
The easiest way to understand how the Alzabo caching system works is to outline different scenarios and show the results based on different caching configurations.
Scenario 1 - Single process - delete followed by select/update
In a single process, the following sequence occurs:
- A row object is retrieved.
- The row object's delete method is called, removing the data it represents from the database.
- The program attempts to call the row object's select or update method.
Results
No caching
An Alzabo::Exception::NoSuchRow exception is thrown.
Any syncing module
An Alzabo::Exception::Cache::Deleted exception is thrown.
Scenario 2 - Multiple processes - delete followed by select
Assume two processes, with ids 1 and 2.
- Process 1 retrieves a row object.
- Process 2 retrieves a row object for the same database row.
- Process 1 calls that object's delete method.
- Process 2 calls that object's select method.
Results
No caching
An Alzabo::Exception::NoSuchRow exception is thrown.
Alzabo::ObjectCache::Sync::Null module is in use
If the column(s) have been previously retrieved in process 2, then that data will be returned. Otherwise, an Alzabo::Exception::NoSuchRow exception is thrown.
Any other syncing module is in use
An Alzabo::Exception::Cache::Deleted exception is thrown.
Scenario 3 - Multiple processes - delete followed by update
Assume two processes, with ids 1 and 2.
- Process 1 retrieves a row object.
- Process 2 retrieves a row object for the same database row.
- Process 1 calls that object's delete method.
- Process 2 calls that object's update method.
Results
No caching
An Alzabo::Exception::NoSuchRow exception is thrown.
Alzabo::ObjectCache::Sync::Null module is in use
The object will attempt to update the database. This is a potential disaster if, in the meantime, another row with the same primary key has been inserted.
Any other syncing module is in use
An Alzabo::Exception::Cache::Deleted exception is thrown.
Scenario 4 - Multiple processes - update followed by update
Assume two processes, with ids 1 and 2.
- Process 1 retrieves a row object.
- Process 2 retrieves a row object for the same database row.
- Process 1 calls that object's update method.
- Process 2 calls that object's update method.
- Process 1 calls that object's select method.
Results
No caching
The data from process 2's update is returned.
Alzabo::ObjectCache::Sync::Null module is in use
The data from process 1's update is returned.
Any other syncing module is in use
An Alzabo::Exception::Cache::Expired exception is thrown when process 2 attempts to update the row. If process 2 were to then attempt the update again it would succeed (as the object is updated before the exception is thrown).
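The retry described for scenario 4 might look like the sketch below. To keep it runnable without Alzabo, it defines a stand-in exception class with the same name and a hypothetical My::FakeRow whose first update fails. The helper update_with_retry is likewise an assumption made for this example, not part of Alzabo.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in so the sketch runs without Alzabo installed. With Alzabo
# loaded, the real exception class would be used instead.
{
    package Alzabo::Exception::Cache::Expired;
    sub new   { my $class = shift; return bless {}, $class }
    sub throw { my $class = shift; die $class->new }
}

# Hypothetical fake row: its first update throws Expired (after
# "refreshing" itself), later updates succeed.
{
    package My::FakeRow;

    sub new {
        my $class = shift;
        my $self = { fresh => 0, data => {} };
        return bless $self, $class;
    }

    sub update {
        my ( $self, %values ) = @_;
        unless ( $self->{fresh} ) {
            $self->{fresh} = 1;    # refreshed before the exception
            Alzabo::Exception::Cache::Expired->throw;
        }
        $self->{data} = {%values};
        return 1;
    }
}

# Retry the update once if the cached row turned out to be expired.
# Returns the number of the attempt that succeeded.
sub update_with_retry {
    my ( $row, %values ) = @_;
    for my $attempt ( 1 .. 2 ) {
        my $ok = eval { $row->update(%values); 1 };
        return $attempt if $ok;

        my $err = $@;
        die $err
            unless blessed($err)
            && $err->isa('Alzabo::Exception::Cache::Expired');
        # Expired: the row has refreshed itself, so loop and try again.
    }
    die "update still failing after retry\n";
}

my $row      = My::FakeRow->new;
my $attempts = update_with_retry( $row, name => 'new name' );
print "update succeeded on attempt $attempts\n";
```

Retrying blindly overwrites whatever process 1 wrote, so a real application may prefer to inspect the refreshed row before deciding whether to repeat the update.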
Scenario 5 - Multiple processes - delete followed by insert (same primary key)
Assume two processes, with ids 1 and 2.
- Process 1 retrieves a row object.
- The row is deleted. In this case, it does not matter whether this happens through Alzabo or not.
- Process 2 inserts a new row, with the same primary key.
- Process 1 or 2 calls that object's select method.
Results
All cases
The correct data (from process 2's insert) is returned. This is a bit odd if process 1 called the object's delete method, but in that case it shouldn't be reusing the same object anyway.
This example may seem a bit far-fetched but is actually quite likely when using MySQL's auto_increment feature with older versions of MySQL, where numbers could be re-used.
Summary
The most important thing to take from this is that you should never use the Alzabo::ObjectCache::Sync::Null class in a multi-process situation. It is really only safe if you are sure your code will only be running in a single process at a time.
In all other cases, either use no caching or use one of the other syncing classes to ensure that data really is synced across multiple processes.
RACE CONDITIONS
It is important to note that there are small race conditions in the syncing scheme. When data is requested from a row object, the row object first makes sure that it is up to date with the database. If it is not, it refreshes itself. Then it returns the requested data (whether or not it had to refresh). An update could occur in the time between the expiration check and the return of the data. Such an update would not be seen by the row object.
I don't consider this a bug since it is impossible to work around and is unlikely to be a problem. In a single process, this is not an issue. In a multi-process application, this is the price that is paid for caching.
If this is a problem for your application then you should not use caching.
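The window can be illustrated with a small single-process simulation. This is a hypothetical model only: a version counter stands in for the sync times that a real syncing module compares, and the "other process" is simulated by a plain assignment between the two steps.

```perl
use strict;
use warnings;

# Hypothetical model: a version counter stands in for the sync times
# that a real syncing module would compare.
my %database = ( name => 'old name', version => 1 );
my %cached   = %database;    # the row object's cached copy

# Step 1: the row object checks whether its cached copy is expired.
my $expired = $cached{version} != $database{version};

# Step 2: another process updates the row before step 3 runs.
%database = ( name => 'new name', version => 2 );

# Step 3: the row object acts on the (now outdated) result of step 1.
%cached = %database if $expired;
my $seen = $cached{name};

print "row object returned: $seen\n";
print "database contains:   $database{name}\n";
```

The stale value is only returned for that one call. The next request repeats the check and picks up the change.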
SYNCING MODULES
The following syncing modules are available with Alzabo:
Alzabo::ObjectCache::Sync::Null
This module simply emulates the syncing interface without doing any actual syncing, though it does track deleted objects. This module is useful if you want to cache objects in a single process but you don't need the overhead of real syncing.
Alzabo::ObjectCache::Sync::BerkeleyDB
Alzabo::ObjectCache::Sync::SDBM_File
Alzabo::ObjectCache::Sync::DB_File
These three modules all use DBM files, via the relevant module, to do multi-process syncing. They are listed in order from fastest to slowest. Using DB_File is significantly slower than either BerkeleyDB or SDBM_File, which are both relatively fast.
They all take the same parameters:
sync_dbm_file => $filename
The file which should be used to store syncing data.
clear_on_startup => $boolean
Indicates whether or not the file should be cleared before it is first used.
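As a configuration sketch, the parameters above could be combined as follows. The file path is a made-up example, and any of the three DBM-based syncing classes could be substituted.

```perl
use Alzabo::ObjectCache
    ( store            => 'Alzabo::ObjectCache::Store::Memory',
      sync             => 'Alzabo::ObjectCache::Sync::SDBM_File',
      sync_dbm_file    => '/tmp/alzabo_sync.sdbm',    # hypothetical path
      clear_on_startup => 1 );
```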
Alzabo::ObjectCache::Sync::Mmap
This module uses Cache::Mmap for syncing. It takes the following parameters.
sync_mmap_file => $filename
The file which should be used to store syncing data.
clear_on_startup => $boolean
Indicates whether or not the file should be cleared before it is first used.
Alzabo::ObjectCache::Sync::RDBMS
This module uses an RDBMS to do syncing. This does not need to be the same database as your data is stored in, though it could be.
If the database it is told to use does not contain the table it needs, it will use the Alzabo::Create modules to create it. If you have warnings turned on, this will cause a warning telling you that these modules were loaded, as having them loaded in any sort of persistent process is probably a waste of memory.
The table it stores data in looks like this:
AlzaboObjectCacheSync
----------------------
object_id varchar(22) primary key
sync_time varchar(40)
This module takes the following parameters:
sync_schema_name => $name
This should be the name of the schema where you want syncing data to be stored. If it doesn't exist, this module will attempt to create it.
sync_rdbms => $name (optional)
If the schema given does not exist, then this parameter is required so this module knows what type of database it is connecting to.
sync_user => $user (optional)
A username with which to connect to the database.
sync_password => $password (optional)
A password with which to connect to the database.
sync_host => $host (optional)
The host where the database lives.
sync_connect_params => { extra_param => 1 }
Extra connection parameters. These will simply be passed on to the relevant Driver module.
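Put together, an RDBMS-backed syncing setup might be configured as in the sketch below. The schema name, credentials and host are hypothetical, and the 'MySQL' value for sync_rdbms is an assumption about how the RDBMS is named, so check it against your installation.

```perl
use Alzabo::ObjectCache
    ( store            => 'Alzabo::ObjectCache::Store::Memory',
      sync             => 'Alzabo::ObjectCache::Sync::RDBMS',
      sync_schema_name => 'alzabo_sync',    # hypothetical schema name
      sync_rdbms       => 'MySQL',          # assumed name; only needed if the schema must be created
      sync_user        => 'cache_user',     # hypothetical credentials
      sync_password    => 'secret',
      sync_host        => 'localhost' );
```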
Alzabo::ObjectCache::Sync::IPC
This module is quite slow and is included mostly for historical reasons (it was one of the first syncing modules made). I recommend against using it but if you must it takes the following parameters:
clear_on_startup => $boolean
Indicates whether or not the file should be cleared before it is first used.
STORAGE MODULES
All of the storage modules may be turned into LRU caches by simply passing the lru_size parameter.
The following storage modules are included with Alzabo:
Alzabo::ObjectCache::Store::Null
This module mimics the storage interface without actually storing anything. It is useful if you want to use syncing without any storage.
Alzabo::ObjectCache::Store::Memory
This module simply stores cached objects in memory.
Alzabo::ObjectCache::Store::BerkeleyDB
This module stores serialized cached objects in a DBM file using the BerkeleyDB module.
It takes these parameters:
store_dbm_file => $filename
The file which should be used to store serialized objects.
clear_on_startup => $boolean
Indicates whether or not the file should be cleared before it is first used.
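A sketch of a configuration that uses BerkeleyDB for both storage and syncing is shown here. The file paths are hypothetical. Since all parameters are passed to both classes, a single clear_on_startup would presumably apply to both files.

```perl
use Alzabo::ObjectCache
    ( store            => 'Alzabo::ObjectCache::Store::BerkeleyDB',
      store_dbm_file   => '/tmp/alzabo_store.db',    # hypothetical path
      sync             => 'Alzabo::ObjectCache::Sync::BerkeleyDB',
      sync_dbm_file    => '/tmp/alzabo_sync.db',     # hypothetical path
      clear_on_startup => 1 );
```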
Alzabo::ObjectCache::Store::RDBMS
This module uses an RDBMS to do storage. This does not need to be the same database as your data is stored in, though it could be.
For example, if you are using Oracle as your primary RDBMS, caching serialized objects in a MySQL database might be a performance boost.
If the database it is told to use does not contain the table it needs, it will use the Alzabo::Create modules to create it. If you have warnings turned on, this will cause a warning telling you that these modules were loaded, as having them loaded in any sort of persistent process is probably a waste of memory.
The table it stores data in looks like this:
AlzaboObjectCacheStore
----------------------
object_id varchar(22) primary key
object_data blob
The actual type of the object_data column will vary depending on what RDBMS you are using.
This module takes the following parameters:
store_schema_name => $name
This should be the name of the schema where you want serialized objects to be stored. If it doesn't exist, this module will attempt to create it.
store_rdbms => $name (optional)
If the schema given does not exist, then this parameter is required so this module knows what type of database it is connecting to.
store_user => $user (optional)
A username with which to connect to the database.
store_password => $password (optional)
A password with which to connect to the database.
store_host => $host (optional)
The host where the database lives.
store_connect_params => { extra_param => 1 }
Extra connection parameters. These will simply be passed on to the relevant Driver module.
Alzabo::ObjectCache METHODS
new
Returns
A new Alzabo::ObjectCache object.
fetch_object ($id)
Returns
The specified object if it is in the cache. Otherwise it returns undef.
store_object ($object)
Stores an object in the cache. This will not overwrite an existing object in the cache. To do that you must first call the delete_from_cache method.
is_expired ($object)
Returns
Whether or not the given object is expired.
is_deleted ($object)
Returns
A boolean value indicating whether or not an object has been deleted from the cache.
register_refresh ($object)
Tells the cache system that an object has refreshed its data from the database.
register_change ($object)
Tells the cache system that an object has updated its data in the database.
register_delete ($object)
This tells the cache that the object has been removed from its external data source. This causes the cache to remove the object internally. Future calls to is_deleted for this object will now return true.
delete_from_cache ($object)
This method allows you to remove an object from the cache. This does not register the object as deleted. It is provided solely so that you can call store_object after calling this method and have store_object actually store the new object.
clear
Call this method to completely clear the cache.
MAKING YOUR OWN SUBCLASSES
It is relatively easy to create your own storage or syncing modules by following a fairly simple interface.
Storage Interface
The interface that any object storing module needs to implement is as follows:
new
Returns
A new object.
fetch_object ($id)
Returns
The specified object if it is in the cache. Otherwise it returns undef.
store_object ($object)
Stores an object in the cache but should not overwrite an existing object.
delete_from_cache ($object)
This method deletes an object from the cache.
clear
Completely clears the cache.
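A minimal sketch of a class that follows this storage interface is given below. My::Store::Hash and My::Thing are hypothetical names, and the sketch assumes that cached objects expose an id method, which this document does not state.

```perl
use strict;
use warnings;

# Hypothetical storage class backed by a plain hash.
{
    package My::Store::Hash;

    sub new {
        my $class = shift;
        my $self = { cache => {} };
        return bless $self, $class;
    }

    # Returns the cached object, or undef if the id is unknown.
    sub fetch_object {
        my ( $self, $id ) = @_;
        return $self->{cache}{$id};
    }

    # Stores the object unless one with the same id is already cached.
    sub store_object {
        my ( $self, $obj ) = @_;
        my $id = $obj->id;    # assumption: objects have an id method
        return if exists $self->{cache}{$id};
        $self->{cache}{$id} = $obj;
        return;
    }

    sub delete_from_cache {
        my ( $self, $obj ) = @_;
        delete $self->{cache}{ $obj->id };
        return;
    }

    sub clear {
        my $self = shift;
        $self->{cache} = {};
        return;
    }
}

# Hypothetical object type used only to exercise the store.
{
    package My::Thing;
    sub new { my ( $class, %p ) = @_; my $self = {%p}; return bless $self, $class }
    sub id  { my $self = shift; return $self->{id} }
}

my $store  = My::Store::Hash->new;
my $first  = My::Thing->new( id => 'user;1', name => 'first' );
my $second = My::Thing->new( id => 'user;1', name => 'second' );

$store->store_object($first);
$store->store_object($second);       # ignored: the id is already cached
print $store->fetch_object('user;1')->{name}, "\n";

$store->delete_from_cache($first);
$store->store_object($second);       # stored now that the slot is free
print $store->fetch_object('user;1')->{name}, "\n";
```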
Syncing Interface
Any class that implements the syncing interface should inherit from Alzabo::ObjectCache::Sync. This class provides most of the functionality necessary to handle syncing operations.
The interface that any syncing module needs to implement is as follows:
_init
This method will be called when the object is first created.
clear
Clears the process-local sync times (not the times shared between processes).
sync_time ($id)
Returns
The time that the object matching the given id was last refreshed.
update ($id, $time, $overwrite)
This is called to update the state of the syncing object with regard to a particular object. The first parameter is the object's id. The second is the time that the object was last refreshed. The third parameter tells the syncing object whether to overwrite an existing time for the object: if it is false and the object already has a time, the existing time is preserved.
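The four methods can be sketched as follows. This is a hypothetical, single-process illustration: My::Sync::Hash keeps times in an in-memory hash, so it does no real cross-process syncing, and it defines its own new only so that the sketch runs without Alzabo. A real module would inherit from Alzabo::ObjectCache::Sync instead. The reading of $overwrite used here (true means replace an existing time) is an interpretation of the description above.

```perl
use strict;
use warnings;

{
    package My::Sync::Hash;

    # A real module would instead begin with:
    #   use base 'Alzabo::ObjectCache::Sync';
    # This constructor exists only so the sketch is self-contained.
    sub new {
        my $class = shift;
        my $self  = bless {}, $class;
        $self->_init(@_);
        return $self;
    }

    # Called once when the object is created.
    sub _init {
        my $self = shift;
        # Stand-in for storage that every process could see, such as
        # a DBM file.
        $self->{times} = {};
        return;
    }

    # This sketch keeps no process-local times, so there is nothing to
    # clear here. The shared times are deliberately left alone.
    sub clear { return }

    sub sync_time {
        my ( $self, $id ) = @_;
        return $self->{times}{$id};
    }

    sub update {
        my ( $self, $id, $time, $overwrite ) = @_;
        # Preserve an existing time unless told to overwrite it.
        return if exists $self->{times}{$id} && !$overwrite;
        $self->{times}{$id} = $time;
        return;
    }
}

my $sync = My::Sync::Hash->new;
$sync->update( 'user;1', 100, 0 );
$sync->update( 'user;1', 200, 0 );    # preserved: a time already exists
print $sync->sync_time('user;1'), "\n";

$sync->update( 'user;1', 200, 1 );    # overwritten
print $sync->sync_time('user;1'), "\n";
```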
AUTHOR
Dave Rolsky, <autarch@urth.org>