NAME
SIL::Shoe::Data - Class for handling Shoebox databases
SYNOPSIS
require SIL::Shoe::Data;
$s = SIL::Shoe::Data->new("infile.db", $key);
$s->index;
$s->findrecord("Mine");
$s->readrecord(\@fieldkeys);
$s->{'newfield'} = "Hi there mum!";
push(@fieldkeys, "newfield");
$s->printrecord(\*FILEHANDLE, @fieldkeys);
DESCRIPTION
This class provides support for Standard Format databases as generated by Shoebox. The class supports indexing, incrementing, etc. and as such holds static information regarding a database.
The following methods are available:
SIL::Shoe::Data->new("filename", "key", 'attrname' => 'value', ...)
Creates a new Shoebox object corresponding to the SF database in "filename". The key field marker is also given at this point. If key is blank, it is guessed by taking the first marker in the file (excluding \_sh). Extra attributes are also supported, including:
- nostripnl
If set, multi-line fields are not joined into a single line by a space.
- nostripws
If set, disables the default stripping of whitespace from the start and end of a field's data.
- allfields
Makes sure that all fields are output by printrecord, even those not in the given list.
- noblank
Indicates that records printed via printrecord should not be followed by a blank line.
- unicode
Assume the file is UTF-8 encoded Unicode data; otherwise process it as bytes.
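The effect of nostripnl and nostripws on a field's data can be pictured with a minimal sketch. The helper normalise_field below is hypothetical and is not part of SIL::Shoe::Data; it only assumes that the default clean-up joins lines with one space and trims both ends, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of SIL::Shoe::Data: applies the default
# clean-up described above unless the matching option is set.
sub normalise_field {
    my ($text, %opts) = @_;
    # Default: join the lines of a multi-line field with a single space.
    $text =~ s/\s*\n\s*/ /g unless $opts{nostripnl};
    # Default: strip whitespace from the start and end of the data.
    $text =~ s/^\s+|\s+$//g unless $opts{nostripws};
    return $text;
}

my $raw = "  first line\nsecond line  \n";
print "[", normalise_field($raw), "]\n";                  # joined and trimmed
print "[", normalise_field($raw, nostripnl => 1), "]\n";  # line break kept
print "[", normalise_field($raw, nostripws => 1), "]\n";  # outer spaces kept
```

The real class may treat whitespace around line breaks differently; the sketch only shows which of the two defaults each option switches off.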
$s->index("otherkey")
Indexes the database according to the key field. "otherkey" is optional and allows indexing on a field other than the key field. Since Shoebox seems happy to hold its index in memory, so shall we. The index supports multiple records with identical keys.
The internal structure of the index is a hash of index entries each of which is an array of locations in the file. Thus:
$s->{' index'}{$entry}[$num]
returns a seek location into the file. Note also that the index can be kept and saved and a new index created as needed.
This direct access is useful, for example, for finding all the values of a given marker:
$s->index("auth");
@auths = keys %{$s->{' index'}};
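To make the shape of the index concrete, here is a minimal, self-contained sketch that builds a hash of arrays of seek locations from a small in-memory file. It does not use SIL::Shoe::Data; the sample data, the %index variable and the choice of the key field line as the start of a record are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample Standard Format data held in memory; "entry" is the key field.
my $data = join "\n",
    '\entry 001', '\auth me', '',
    '\entry 002', '\auth you', '',
    '\entry 003', '\auth me', '';

open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my %index;              # index entry => array of seek locations
my $start;              # offset at which the current record begins
my $pos = tell($fh);
while (my $line = <$fh>) {
    # Assumption: a record starts at the line holding its key field.
    $start = $pos if $line =~ /^\\entry\b/;
    if (defined $start && $line =~ /^\\auth\s+(.*?)\s*$/) {
        push @{ $index{$1} }, $start;
    }
    $pos = tell($fh);
}

# All distinct values of the indexed marker.
my @auths = sort keys %index;
print "@auths\n";

# Jump straight to the second record whose auth is "me".
seek($fh, $index{me}[1], 0);
my $first_line = <$fh>;
print $first_line;
close $fh;
```

Because each entry maps to an array, records that share a value simply add another location to the same entry.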
"otherkey" may take multiple values, in which case the index is keyed on the values of each field in the list passed, joined by a null (\000):
$s->index("title", "auth");
$myind = $s->{' index'}{"mybook\000me"}[0];
For records with multiple occurrences of an indexed field, multiple index entries will be made. Thus
\entry 001
\title mybook
\title mybook: mysubtitle
\auth me
\auth myself
would result in 4 index entries for this one record:
"mybook\000me", "mybook\000myself",
"mybook: mysubtitle\000me", "mybook: mysubtitle\000myself"
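The four entries are every combination of one title with one auth. A minimal sketch of that combination step follows; the composite_keys helper and the %record layout are hypothetical and only illustrate the joining rule, not how the class computes it internally.

```perl
use strict;
use warnings;

# Values of each indexed field for the sample record above.
my %record = (
    title => ['mybook', 'mybook: mysubtitle'],
    auth  => ['me', 'myself'],
);

# Hypothetical helper: one value per field, in every combination,
# joined by a null character.
sub composite_keys {
    my ($rec, @fields) = @_;
    my @combos = ([]);
    for my $field (@fields) {
        @combos = map { my $c = $_; map { [@$c, $_] } @{ $rec->{$field} } } @combos;
    }
    return map { join("\000", @$_) } @combos;
}

my @keys = composite_keys(\%record, 'title', 'auth');
for my $key (@keys) {
    (my $shown = $key) =~ s/\000/\\000/g;   # make the null visible when printing
    print "$shown\n";
}
```

A record with m titles and n auths therefore contributes m times n index entries.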
Indexing also allows for some options. These are passed as a hash reference as the first parameter, as in:
$s->index({'-lines' => 1}, "title", "auth");
- -lines
Keeps the line number of the key field of each record in the index. The values are stored in the corresponding hash:
$s->{' lineindex'}
- -md5
Stores an MD5 hash of each record according to the index entries in
$s->{' md5index'}
$s->findrecord("value")
Searches through the database for a key with the given value. Only exact matching is supported. If the database has been indexed, the index is used in preference; note that it may have been built on a field other than the key field.
For multiple records with the same index entries, multiple calls to findrecord with the same value will refer to each record in turn.
Calling findrecord clears the readrecord marker which allows sequential reading of records.
Returns undef if no record is found, or at the end of a list of matching records. Thus:
while ($s->findrecord("FirstOnly")) { ... }
will process all the records indexed by "FirstOnly".
findrecord may also be passed a list of values in which case they are joined appropriately for searching a corresponding index with that value.
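The way repeated calls walk through identically keyed records can be modelled with a small cursor over an index entry. This is only an illustration of the calling pattern: find_next, %index and %cursor are hypothetical names, the seek locations are made up, and what the real findrecord does after it has returned undef is not modelled.

```perl
use strict;
use warnings;

my %index  = (FirstOnly => [0, 43]);   # index entry => seek locations
my %cursor;                            # how many locations were already returned

# Hypothetical stand-in for the documented behaviour: each call with the
# same value yields the next location, then undef at the end of the list.
sub find_next {
    my ($value) = @_;
    my $locs = $index{$value} || [];
    my $i    = $cursor{$value} // 0;
    return if $i >= @$locs;
    $cursor{$value} = $i + 1;
    return $locs->[$i];
}

my @seen;
# The sketch returns raw offsets, and offset 0 is false, so test with defined().
while (defined(my $loc = find_next("FirstOnly"))) {
    push @seen, $loc;
}
print "visited: @seen\n";
```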
$s->readrecord(\@fieldlist [, $loc])
Reads a record from the current location, as set by the last findrecord or readrecord, whichever is later. Notice that if the last findrecord failed, readrecord will start from the beginning of the file.
\@fieldlist is optional.
Multiple fields with the same name are *not* stored as an array, as might be expected, but as separate fields whose names contain a space and a number, as in f, f 0, f 1, etc. The precise names are returned in \@fieldlist. The advantage of this method is that users who just want the first occurrence don't have to decide whether something is coming as an array or as a string. The alternative would have been to make every field an array, resulting in major hassle for people.
A way of turning the multiple fields into an array is to use a map function of the form:
@array = map { m/^\Q$fieldname\E(?:\s+\d+)?$/ ? $s->{$_} : () } @fieldlist;
which returns an array of fields called $fieldname from $s.
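As a self-contained illustration, the sketch below applies the same map to a plain hash laid out with the f, f 0, f 1 naming described above. The %rec hash and its values stand in for a record object after readrecord and are invented for the example.

```perl
use strict;
use warnings;

# Stand-in for a record after readrecord: three "auth" fields, one "entry".
my %rec = (
    'entry'  => '001',
    'auth'   => 'me',
    'auth 0' => 'myself',
    'auth 1' => 'I',
);
my @fieldlist = ('entry', 'auth', 'auth 0', 'auth 1');

my $fieldname = 'auth';
# Keep the plain name and any "name <number>" variants, in field order.
my @auths = map { /^\Q$fieldname\E(?:\s+\d+)?$/ ? $rec{$_} : () } @fieldlist;

print "first occurrence: $rec{auth}\n";
print "all occurrences:  ", join(", ", @auths), "\n";
```

The first occurrence stays reachable as a plain string, while the map gathers every occurrence when a list is wanted.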
$loc specifies a location in the file to read from. Usually it is undefined, but if set allows for control over which record is read.
Returns undef if no record is read (probably due to end of file).
$s->proc_record($sub [,$loc])
Iterates over each line of a record calling $sub for each line, which has been chomped. Uses the same approach as readrecord in choosing where to read.
$s->allof($key)
Returns all occurrences of a given key, in field order.
$s->printrecord(\*FILE, @fieldlist)
Prints out an SH record with fields in the given order. If $s->{' allfields'} is set, all fields not mentioned in the list are also added onto the end of it.
$s->rewind([$pos])
Rewinds the current pointer to the start (or to $pos if given).
$s->DESTROY()
The destructor for an SF database. Closes the file before disappearing.