NAME

PDL::DiskCache -- simple caching object for tying lists of data

SYNOPSIS

NON-OO:

    use PDL::DiskCache;
    tie @a,'PDL::DiskCache', \@files, \%options;
    imag $a[3];

OO:

    use PDL::DiskCache;
    $a = diskcache(\@files,\%options);
    imag $a->[3];

or

    use PDL::DiskCache;
    $a = new PDL::DiskCache(\@files,\%options);
    imag $a->[4];

\@files

an array ref containing a list of file names

\%options

a hash ref containing options for the PDL::DiskCache object (see "TIEARRAY" below for details)

DESCRIPTION

PDL::DiskCache keeps a data set on disk, split along one dimension into a list of files, and caches some of those elements in memory. It's useful for operations where you have to look at a large collection of files one or a few at a time.

PDL::DiskCache connects to FITS files by default but can connect to any sort of file at all -- the read/write routines are the only place where it examines the underlying data. To use PDL::DiskCache with other sorts of data, subclass it or, for ease of quick-and-dirty use, you can pass in the read and write routines in the options hash.

Items are swapped out on a FIFO basis. If you have 10 slots and an expression that uses 10 items, you're OK (though you probably want more slots than that). If an expression uses more items than there are slots, thrashing will occur!
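The slot arithmetic above can be checked with a toy model. The following is a minimal sketch in plain Perl, not the module's own code: `count_misses` is a hypothetical helper that replays an access pattern against a FIFO cache and counts how many accesses would have to go to disk.

```perl
use strict;
use warnings;

# Toy FIFO cache: counts how many "disk reads" (misses) an access
# pattern costs for a given number of slots. Not PDL::DiskCache itself.
sub count_misses {
    my ($slots, @accesses) = @_;
    my @fifo;        # cached indices, oldest first
    my %cached;      # index => 1 while resident
    my $misses = 0;
    for my $i (@accesses) {
        next if $cached{$i};          # hit: FIFO order is not updated
        $misses++;
        if (@fifo >= $slots) {        # cache full: evict the oldest entry
            my $old = shift @fifo;
            delete $cached{$old};
        }
        push @fifo, $i;
        $cached{$i} = 1;
    }
    return $misses;
}

# Three passes over the same items, as a repeated expression would do.
my @ten    = (0 .. 9)  x 3;
my @eleven = (0 .. 10) x 3;

printf "10 items, 10 slots: %d misses\n", count_misses(10, @ten);
printf "11 items, 10 slots: %d misses\n", count_misses(10, @eleven);
```

In this model, 10 items in 10 slots miss only on the first pass (10 misses out of 30 accesses), while 11 items in 10 slots miss on every access (33 out of 33). One item too many is enough to make a cyclic access pattern thrash.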

The OO interface is preferred, since you then have access to all the methods and not just the normal array-access methods.

Shortcomings & caveats

There's no file locking, so you could really hose yourself by having two of these things going at once on the same files.

Since this is a tied array, things like Dumper traverse it transparently. That is sort-of good but also sort-of dangerous. You wouldn't want to PDL::Dumper::sdump() a large PDL::DiskCache, for example -- that would defeat the purpose of using a PDL::DiskCache in the first place...
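To see why a dump is costly, here is a small sketch using only core Perl. `CountingArray` is a hypothetical stand-in for a disk-backed tied array: it counts FETCH calls instead of reading files. Data::Dumper is used as the example traversal; the same reasoning should apply to any code that walks the whole array.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for a disk-backed tied array: it only counts FETCH calls.
package CountingArray;
require Tie::Array;
our @ISA = ('Tie::StdArray');
our $fetches = 0;
sub FETCH {
    my ($self, $index) = @_;
    $fetches++;                     # in a real cache this could mean a disk read
    return $self->SUPER::FETCH($index);
}

package main;

tie my @data, 'CountingArray';
@data = (1 .. 100);                 # 100 pretend elements

$CountingArray::fetches = 0;
my $dump = Dumper(\@data);          # walks the whole tied array
print "FETCH calls during Dumper: $CountingArray::fetches\n";
```

Every element is fetched at least once, so with a cache smaller than the list each of those fetches could turn into a file read.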

Author, license, no warranty

Copyright 2001, Craig DeForest

This code may be distributed under the same terms as Perl itself (license available at http://www.perl.org). Copying, reverse engineering, distribution, and modification are explicitly allowed so long as this notice is preserved intact and modified versions are clearly marked as such.

If you modify the code and it's useful, please send a copy of the modified version to cdeforest@solar.stanford.edu.

This package comes with NO WARRANTY.

FUNCTIONS

diskcache

Object constructor.

Synopsis

    $a = diskcache(\@f,\%options);

Options

See the TIEARRAY options, below.

TIEARRAY

Tied-array constructor.

Synopsis

    TIEARRAY(class,\@f,\%options)

Options

ro (default 0): If set, treat the files as read-only (modifications to the tied array will only persist until the changed elements are swapped out).

rw (default 1): If set, allow reading and writing to the files. Because there's currently no way to determine reliably whether a PDL has been modified, rw files are always written to disk when they're swapped out -- this causes a slight performance hit.

mem (default 20): Number of files to be cached in memory at once.

read (default \&rfits): A function ref pointing to code that will read list objects from disk. The function must have the same syntax as rfits: $object = rfits(filename).

write (default \&wfits): A function ref pointing to code that will write list objects to disk. The function must have the same syntax as wfits: func(object,filename).

verbose (default 0): Get chatty.
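As an illustration of the read and write options, the sketch below defines a pair of routines that follow the documented calling conventions and places them in an options hash. The names `read_lines` and `write_lines`, the text-file format, and the option values are all assumptions made for this example. The call to diskcache is left as a comment so that the snippet runs without PDL installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical reader: same calling convention as rfits, $object = read(filename).
sub read_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot read $filename: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;                 # the "object" here is just an array ref
}

# Hypothetical writer: same calling convention as wfits, write(object, filename).
sub write_lines {
    my ($object, $filename) = @_;
    open my $fh, '>', $filename or die "Cannot write $filename: $!";
    print {$fh} "$_\n" for @$object;
    close $fh or die "Cannot close $filename: $!";
    return;
}

# Option names are the ones documented above; the values are illustrative.
my %options = (
    rw      => 1,
    mem     => 5,
    read    => \&read_lines,
    write   => \&write_lines,
    verbose => 0,
);

# With PDL installed, the hash would be passed along as in the SYNOPSIS:
#   my $cache = diskcache(\@files, \%options);

# Round-trip check of the two routines on their own.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/item0.txt";
$options{write}->([ 'alpha', 'beta' ], $file);
my $back = $options{read}->($file);
print join(',', @$back), "\n";
```

Checking that the reader returns what the writer stored before handing the pair to the cache is a cheap safeguard, since an rw cache writes every element back to disk when it is swapped out.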

purge

Remove an item from the oldest slot in the cache, writing to disk as necessary. You can also pass the number of slots to purge (default 1; passing -1 purges everything).

For most uses, a nice MODIFIED flag in the data structure could save some hassle here. But PDLs can get modified out from under us with slicing and .= -- so for now we always assume everything is tainted and must be written to disk.

FETCH

Fetching routine. (Does it have to be an lvalue?)

sync

In a rw cache, flush all items out to disk but retain them in the cache. This is useful primarily for cache protection and could be slow. Because we have no way of knowing what's modified and what's not in the cache, all elements are always flushed from an rw cache. For ro caches, this is a not-too-slow (but safe) no-op.
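The difference between purge and sync, as described above, can be modelled in a few lines. This is a toy sketch, not the module's implementation: `ToyCache` and its fields are hypothetical, disk writes are only counted, and clamping an oversized purge count to the cache size is an assumption of the model.

```perl
use strict;
use warnings;

# Toy model of the purge/sync behaviour described above; not the module's code.
package ToyCache;

sub new {
    my ($class, %opt) = @_;
    return bless {
        rw     => $opt{rw} // 1,   # assumed to mirror the rw option
        fifo   => [],              # cached indices, oldest first
        cache  => {},              # index => in-memory value
        writes => 0,               # counts simulated disk writes
    }, $class;
}

sub load {                          # put an item in the cache
    my ($self, $i, $value) = @_;
    push @{ $self->{fifo} }, $i unless exists $self->{cache}{$i};
    $self->{cache}{$i} = $value;
    return;
}

sub _write {                        # no modified flag: rw always writes
    my ($self) = @_;
    $self->{writes}++ if $self->{rw};
    return;
}

sub purge {                         # evict the $n oldest items (-1 = all)
    my ($self, $n) = @_;
    $n = 1 unless defined $n;
    $n = @{ $self->{fifo} } if $n < 0 || $n > @{ $self->{fifo} };
    for (1 .. $n) {
        my $i = shift @{ $self->{fifo} };
        $self->_write;
        delete $self->{cache}{$i};
    }
    return;
}

sub sync {                          # write everything, evict nothing
    my ($self) = @_;
    $self->_write for @{ $self->{fifo} };
    return;
}

package main;

my $c = ToyCache->new(rw => 1);
$c->load($_, "data$_") for 0 .. 4;

$c->sync;
printf "after sync:      %d cached, %d writes\n", scalar @{ $c->{fifo} }, $c->{writes};
$c->purge(2);
printf "after purge(2):  %d cached, %d writes\n", scalar @{ $c->{fifo} }, $c->{writes};
$c->purge(-1);
printf "after purge(-1): %d cached, %d writes\n", scalar @{ $c->{fifo} }, $c->{writes};
```

In the model, sync leaves all 5 items cached and costs 5 writes. purge(2) then evicts the two oldest items at the cost of 2 more writes, and purge(-1) empties the cache with 3 more, for 10 writes in total. With rw set to 0 the write counter never moves, which corresponds to the ro no-op case.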

DESTROY

Synchronize the cache out to disk if it's an rw cache, before allowing it to be broken down by the destructor crew.