NAME

FileArchiveIndexer::Update

DESCRIPTION

This module should not be used directly. It is inherited by FileArchiveIndexer. The code and documentation are placed herein for organization purposes only.

UPDATE

This is the step that runs regularly on your system. It collects current filesystem information: which files are of interest and what their md5sums are.

This information is kept in the 'files table'. Every time you update, the entire files table is dropped and rebuilt from scratch, so that this data stays current. The only things we want in this part of the index are the location of the file on disk, so we can physically find it, and the md5sum hex digest of the file, so we can later associate metadata and indexed data with an actual file on disk.
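
The path-plus-digest pairing described above can be sketched as follows. This is only an illustration, not the module's own code: the helper name md5_map_for_tree is hypothetical, and the sketch uses the core File::Find module to stay dependency-free, whereas this module exposes a File::Find::Rule object through finder().

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Find ();

# Hypothetical helper (not part of FileArchiveIndexer): returns a hashref
# mapping each regular file found under $root to its md5 hex digest.
sub md5_map_for_tree {
    my ($root) = @_;
    my %md5_for;

    File::Find::find(
        {
            no_chdir => 1,    # keep the working directory unchanged
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;    # skip directories and specials

                open my $fh, '<', $path or do {
                    warn "cannot read $path: $!";
                    return;
                };
                binmode $fh;               # digest the raw bytes
                $md5_for{$path} = Digest::MD5->new->addfile($fh)->hexdigest;
                close $fh;
            },
        },
        $root,
    );

    return \%md5_for;
}
```

Each key is a location on disk and each value is the digest that could later be used to tie indexed data back to that file.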

This process takes about 30 minutes on an Intel Xeon 2.40GHz machine for a record of 60k files occupying 30 GB. It is not a memory-intensive procedure.

Why are the Location and Indexing steps kept separate?

Indexing a file can take a long time. Indexing does not mean simply

repopulate_files_table()

This is what you call to update. It completely rebuilds the files table but leaves the data table untouched. This operation does not commit; you must call commit afterwards.

The return value is the count of files found on disk.

$i->repopulate_files_table;
$i->dbh->commit;
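
Because the rebuild does not commit on its own, a caller may want to commit only when the rebuild succeeded. The following is a minimal sketch under stated assumptions: the wrapper name update_files_table is hypothetical, $i is assumed to be a FileArchiveIndexer object, and dbh() is assumed to return a DBI handle with AutoCommit off, so that rollback() is available. Whether a rollback actually restores a dropped table depends on the database in use.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of FileArchiveIndexer).
# Assumes $i->dbh returns a DBI handle with AutoCommit disabled.
sub update_files_table {
    my ($i) = @_;

    # Trap any exception thrown while rebuilding the files table.
    my $count = eval { $i->repopulate_files_table };

    if ( !defined $count ) {
        my $err = $@ || 'repopulate_files_table returned undef';
        $i->dbh->rollback;    # ask the driver to undo uncommitted changes
        die "files table update failed: $err";
    }

    $i->dbh->commit;          # make the rebuilt table permanent
    return $count;            # count of files found on disk
}
```

Usage would then be a single call such as update_files_table($i), with the returned count available for logging.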

Using Digest::MD5::File: takes an hour for 700 clients, approx. 60k PDF documents.

Using Digest::MD5: 700 clients, 60k docs, 30 seconds.

_find_all_files()

DOCUMENT_ROOT()

Sets or gets the document root for your indexer.

$i->DOCUMENT_ROOT('/home/myself');

finder()

Returns a File::Find::Rule object.

Call this if you want to change which files the index holds or scans. For example, if you know that some file with "june" in its filename has changed, you could do a quick update this way:

$i->finder->name( qr/june.+\.pdf/i );
$i->scan_for_new_and_enqueue;

scan_for_new() searches inside DOCUMENT_ROOT if it is set; otherwise it warns. Please see File::Find::Rule for more.

The only default set on the finder object is file(), so that it only matches files.
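
To show what narrowing a rule does on its own, here is a small sketch that uses File::Find::Rule directly, outside FileArchiveIndexer. The helper name find_matching_files is hypothetical; it starts from the same file() default mentioned above and optionally adds a name() rule.

```perl
use strict;
use warnings;
use File::Find::Rule;

# Hypothetical helper (not part of FileArchiveIndexer): lists the files
# under $root, optionally restricted to names matching $name_pattern.
sub find_matching_files {
    my ( $root, $name_pattern ) = @_;

    my $rule = File::Find::Rule->new;
    $rule->file;                                 # match plain files only
    $rule->name($name_pattern) if defined $name_pattern;

    return $rule->in($root);                     # list of matching paths
}
```

For example, find_matching_files('/home/myself', qr/june.+\.pdf/i) would return only the paths whose filename matches that pattern, while omitting the pattern returns every file under the directory.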

min_chars_to_index()

SEE ALSO

FileArchiveIndexer

AUTHOR

Leo Charre leocharre at cpan dot org

LICENSE

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the "Artistic License" or the "GNU General Public License".

DISCLAIMER

This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.