NAME
File::Locate::Iterator -- read "locate" database with an iterator
SYNOPSIS
use File::Locate::Iterator;
my $it = File::Locate::Iterator->new;
while (defined (my $entry = $it->next)) {
    print $entry,"\n";
}
DESCRIPTION
File::Locate::Iterator reads a "locate" database file in iterator style. Each next() call on the iterator returns the next entry from the database.
/
/bin
/bin/bash
/bin/cat
Locate databases normally hold filenames as a way of finding files by name faster than churning through all directories. Optional glob, suffix and regexp options on the iterator can restrict the entries returned.
See examples/native.pl in the File-Locate-Iterator sources for a simple sample read, or examples/mini-locate.pl for a simulation of a whole locate program.
Only "LOCATE02" format files are supported, per current versions of GNU locate, not the previous "slocate" format.
Iterators from this module are stand-alone; they don't need any of the various Perl iterator frameworks. But see Iterator::Locate, Iterator::Simple::Locate and MooseX::Iterator::Locate to inter-operate with those frameworks, with ways to grep, map and otherwise manipulate iterations.
Forks and Threads
If an iterator using a file handle is cloned to a new thread or a process level fork() then generally it can be used by the parent or the child but not both. The underlying file descriptor position is shared by parent and child, so when one of them reads it upsets the position for the other. This sort of thing affects almost all code working with file handles across fork and threads. Perhaps some thread CLONE code could let threads work correctly (if slower), but a fork is probably doomed.
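The shared position can be seen with nothing more than a plain file handle. The following is a minimal sketch using only core Perl (File::Temp and POSIX), with a scratch file standing in for a database and sysread() used to sidestep PerlIO buffering. It is not File::Locate::Iterator code, but an iterator's handle would be affected the same way. It assumes a POSIX system, where a forked child shares the parent's open file description.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Scratch file standing in for a locate database.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} 'helloworld';
close $tmp or die "close $filename: $!";

# One handle, opened before the fork, so parent and child share it.
open my $in, '<:raw', $filename or die "open $filename: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # Child: read 5 bytes.  This advances the file position which the
    # parent's descriptor shares.
    sysread($in, my $buf, 5);
    POSIX::_exit(0);    # skip END blocks and File::Temp cleanup in the child
}
waitpid($pid, 0);

# Parent: wanted the first 5 bytes, but the child has already consumed them.
sysread($in, my $parent_read, 5);
print "parent read: $parent_read\n";    # prints "parent read: world"
```

The way out, as far as it goes, is for the child to open its own handle (or create its own iterator) after the fork, so that it has a position of its own.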
Iterators using mmap work correctly for both forks and threads, except that the if_sensible size calculation and sharing is not thread-aware beyond the mmaps existing when the thread is spawned. File::Map knows the mmaps across all threads, but won't reveal them.
Taint Mode
In taint mode (see "Taint mode" in perlsec) entries from a file or file handle are always tainted, the same as other file input. Taintedness of a database_str content string propagates to the entries.
For database_str_ref the initial taintedness propagates. In the unlikely event that you untaint the string during iteration, the entries remain tainted, because they depend, or may depend, on data read while the input was tainted. A rewind() will reset the taintedness though.
For reference, taint mode is only a small slowdown for the XS iterator code, and usually (it seems) only a little more for the pure Perl.
Other Notes
The locate database format is only designed to be read forwards, hence no prev method on the iterator. The start of a previous record can't be distinguished by its content, and the "front coding" means the state at a given point may depend on records an arbitrary distance back too. A "tell" which gave file position plus state would be possible, though perhaps a "clone" of the whole iterator would be more use.
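To make the front coding concrete, here is a minimal pure-Perl sketch which decodes a LOCATE02 byte string such as the database_str example under new below. It follows the format as described in GNU findutils' locatedb(5): each record is a signed byte giving the change in the count of leading bytes shared with the previous entry (the byte 0x80 escapes a big-endian 16-bit change), then the rest of the entry, NUL-terminated. decode_locate02 is a hypothetical name, and this is not the module's own decoder.

```perl
use strict;
use warnings;

# Hypothetical LOCATE02 decoder, a sketch of the front-coding scheme only.
sub decode_locate02 {
    my ($str) = @_;
    my $header = "\0LOCATE02\0";
    substr($str, 0, length $header) eq $header
      or die "not a LOCATE02 database\n";

    # The header is itself the first record: change 0, name "LOCATE02".
    my $pos   = length $header;
    my $prev  = 'LOCATE02';
    my $count = 0;             # bytes shared with the previous entry
    my @entries;

    while ($pos < length $str) {
        my $delta = unpack 'c', substr($str, $pos++, 1);   # signed byte
        if ($delta == -128) {                              # 0x80 escape
            $delta = unpack 's>', substr($str, $pos, 2);   # signed 16-bit BE
            $pos += 2;
        }
        $count += $delta;
        my $nul = index($str, "\0", $pos);
        die "unterminated record at byte $pos\n" if $nul < 0;
        my $rest = substr($str, $pos, $nul - $pos);
        $pos = $nul + 1;

        # Each entry is built from the previous one, so the state here
        # depends on earlier records: the format only reads forwards.
        $prev = substr($prev, 0, $count) . $rest;
        push @entries, $prev;
    }
    return @entries;
}

my @entries = decode_locate02("\0LOCATE02\0\0/hello\0\006/world\0");
print "$_\n" for @entries;    # prints /hello then /hello/world
```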
On some systems mmap may be a bit too effective, giving a process more of the CPU than other processes which make periodic read system calls. This is a matter of OS scheduling, but if doing a lot of mmapped work you might want to lower the process priority with nice or ionice (see nice(1), ionice(1), "setpriority" in perlfunc, ioprio_set(2)).
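From within Perl, the builtin setpriority() can raise the niceness of the current process. A minimal sketch, assuming PRIO_PROCESS is 0 (as on Linux and the BSDs) and that raising niceness, which any process may do, is what's wanted. ionice has no Perl builtin and isn't covered here.

```perl
use strict;
use warnings;

# Assumption: PRIO_PROCESS is 0, as on Linux and the BSDs
# (BSD::Resource exports a named constant if you prefer).
my $PRIO_PROCESS = 0;

my $before = getpriority($PRIO_PROCESS, 0);    # who = 0 means this process
my $target = $before + 5;                      # higher niceness = lower priority
$target = 19 if $target > 19;                  # 19 is the usual ceiling
setpriority($PRIO_PROCESS, 0, $target)
  or warn "setpriority: $!\n";

my $after = getpriority($PRIO_PROCESS, 0);
print "niceness $before -> $after\n";
```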
FUNCTIONS
Constructor
$it = File::Locate::Iterator->new (key=>value,...)
Create and return a new locate database iterator object. The following optional key/value pairs can be given,
database_file (string, default the system locate database)
database_fh (handle ref)
The file to read, either as a filename or a file handle. The default file is given by default_database_file() below.
    $it = File::Locate::Iterator->new (database_file => '/foo/bar.db');
A file handle is read with the usual PerlIO, so it can use layers and come from various sources, but it should be in binary mode (see "binmode" in perlfunc and ":raw" in PerlIO).
database_str (string)
database_str_ref (ref to string)
The database contents to read, in the form of a byte string.
    $it = File::Locate::Iterator->new (database_str => "\0LOCATE02\0\0/hello\0\006/world\0");
The string is copied into the iterator. Alternatively database_str_ref can be used to have the iterator look into a given scalar without copying,
    my $str = "\0LOCATE02\0\0/hello\0\006/world\0";
    $it = File::Locate::Iterator->new (database_str_ref => \$str);
For database_str_ref, if the originating scalar is tied or has other magic then that magic is re-run for each access, in the usual way. That might be a good thing, or you might prefer the copying of database_str in that case.
suffix (string)
suffixes (arrayref of strings)
glob (string)
globs (arrayref of strings)
regexp (string or regexp object)
regexps (arrayref of strings or regexp objects)
Restrict the entries returned to those with the given suffix(es) or matching the given glob(s) or regexp(s). For example,
    # C code files on the system, .c and .h
    $it = File::Locate::Iterator->new (suffixes => ['.c','.h']);
If multiple patterns or suffixes are given then entries matching any of them are returned.
Globs are in the style of the locate program, which means fnmatch with no options (see File::FnMatch), matched against the full entry if the pattern has wildcards ("*", "?" or "["), or against any part of the entry if it is a fixed string.
    glob => '*.c'     # .c files, no .cxx files
    glob => '.c'      # fixed string, .cxx matches too
Globs should be byte strings (not wide chars), since that is how the database entries are handled, and presumably fnmatch has no notion of charset coding for its strings and patterns.
use_mmap (string, default "if_sensible")
Whether to use mmap to access the database. This is fast and resource-efficient when available. To use mmap you must have the File::Map module (version 0.38 or higher), the file must fit in the available address space, and for a database_fh handle there mustn't be any transforming PerlIO layers. The use_mmap choices are
    undef          \
    "default"       |  use mmap if sensible
    "if_sensible"  /
    "if_possible"      use mmap if possible, otherwise file I/O
    0                  don't use mmap
    1                  must use mmap, croak if cannot
Setting "default", undef or omitted means "if_sensible". "if_sensible" uses mmap if it is available, the file size is reasonable, and, for database_fh, the handle isn't already using an :mmap layer. "if_possible" uses mmap whenever it can be done, without those qualifiers.
    $it = File::Locate::Iterator->new (use_mmap => 'if_possible');
When multiple iterators access the same file they share the mmap. The size check for "if_sensible" counts the space in all File::Locate::Iterator mappings and won't go beyond 1/5 of the available data space, which is assumed to be a quarter of the address space given by the word size. So on a 32-bit system the total is at most about 200Mb (2**32 / 4 / 5 bytes). "if_possible" and "if_sensible" will only mmap ordinary files, because generally the file size of char specials is not reliable.
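The "any of them matches" rule for suffixes, globs and regexps, and the locate-style choice between a whole-entry glob match and a fixed-string match on any part, can be sketched in plain Perl. This is a hypothetical illustration, not how the module is implemented: make_matcher and glob_to_regexp are invented names, only the plural option forms are handled, and glob_to_regexp is a simplified stand-in for fnmatch with no flags (File::FnMatch would be the real thing).

```perl
use strict;
use warnings;

# Hypothetical: convert an fnmatch-style glob (no flags) into a regexp
# matching the whole entry.  Handles *, ?, [set], [!set] and backslash
# escapes; a simplified stand-in for File::FnMatch, not a full fnmatch().
sub glob_to_regexp {
    my ($glob) = @_;
    my $re = '';
    while (length $glob) {
        if ($glob =~ s/^\*//) {
            $re .= '.*';                    # any run of bytes, "/" included
        } elsif ($glob =~ s/^\?//) {
            $re .= '.';                     # any single byte
        } elsif ($glob =~ s/^\[(!?)(\][^\]]*|[^\]]+)\]//) {
            my ($negate, $set) = ($1, $2);
            $set =~ s/([\\\[\]^])/\\$1/g;   # escape chars special in a Perl class
            $re .= '[' . ($negate ? '^' : '') . $set . ']';
        } elsif ($glob =~ s/^\\(.)//s) {
            $re .= quotemeta $1;            # escaped character is literal
        } else {
            $glob =~ s/^(.)//s;
            $re .= quotemeta $1;            # ordinary literal character
        }
    }
    return qr/\A$re\z/s;
}

# Hypothetical: build a predicate with "any pattern matches" semantics
# from suffixes / globs / regexps options (plural forms only).
sub make_matcher {
    my (%opt) = @_;
    my @suffixes = @{ $opt{suffixes} || [] };
    my @regexps  = map { ref $_ ? $_ : qr/$_/ } @{ $opt{regexps} || [] };
    my (@glob_res, @fixed);
    for my $glob (@{ $opt{globs} || [] }) {
        if ($glob =~ /[*?\[]/) {
            push @glob_res, glob_to_regexp($glob);   # wildcards: whole entry
        } else {
            push @fixed, $glob;                      # fixed string: any part
        }
    }
    # No restrictions at all: every entry is returned.
    return sub { 1 } unless @suffixes || @regexps || @glob_res || @fixed;

    return sub {
        my ($entry) = @_;
        for my $s (@suffixes) { return 1 if $entry =~ /\Q$s\E\z/ }
        for my $f (@fixed)    { return 1 if index($entry, $f) >= 0 }
        for my $re (@glob_res, @regexps) { return 1 if $entry =~ $re }
        return 0;
    };
}

my $c_files = make_matcher(suffixes => ['.c', '.h']);
print join(' ', grep { $c_files->($_) } '/src/a.c', '/src/b.h', '/src/c.cxx'), "\n";
# prints "/src/a.c /src/b.h"
```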
$filename = File::Locate::Iterator->default_database_file()
Return the default database file used for new above. This is meant to be the same as the locate program uses, and currently means
    $ENV{'LOCATE_PATH'}           if that env var is set
    /var/cache/locate/locatedb    otherwise
Perhaps in the future it might be possible to check how findutils has been installed rather than assuming /var/cache/locate/.
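The lookup rule amounts to a couple of lines of Perl. A sketch with the hypothetical name my_default_database_file; treating an empty LOCATE_PATH as unset is an assumption of the sketch, not something stated above.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the default_database_file() rule:
# $ENV{'LOCATE_PATH'} if set, otherwise the usual findutils location.
# Treating an empty LOCATE_PATH as unset is an assumption of this sketch.
sub my_default_database_file {
    my $path = $ENV{'LOCATE_PATH'};
    return (defined $path && $path ne '') ? $path : '/var/cache/locate/locatedb';
}

print my_default_database_file(), "\n";
```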
Operations
$entry = $it->next
Return the next entry from the database, or no values at end of file. No values means undef in scalar context or an empty list in list context, so you can loop with either
    while (defined (my $filename = $it->next)) ...
or
    while (my ($filename) = $it->next) ...
The return is a byte string, since it's normally a filename and Perl handles filenames as byte strings.
$it->rewind
Rewind $it back to the start of the database. The next $it->next call will return the first entry.
This is only possible when the underlying database file or handle is seekable, i.e. seek() works, which will usually mean a plain file, or PerlIO layers with seek support.
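The no-values end-of-file convention, the two loop forms and rewind can be illustrated with a small hypothetical class, My::ListIterator, which iterates over an in-memory list rather than a database. It is only a sketch of the calling protocol described above, not the module's code.

```perl
package My::ListIterator;    # hypothetical, for illustration only
use strict;
use warnings;

sub new {
    my ($class, @entries) = @_;
    return bless { entries => [@entries], pos => 0 }, $class;
}

# Return the next entry, or no values at the end.  A bare "return"
# gives undef in scalar context and an empty list in list context.
sub next {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{entries} };
    return $self->{entries}[ $self->{pos}++ ];
}

# Start again from the first entry.
sub rewind {
    my ($self) = @_;
    $self->{pos} = 0;
    return;
}

package main;

my $it = My::ListIterator->new('/', '0', '/bin');

# Scalar form: defined() is needed so an entry "0" doesn't end the loop.
my @scalar_seen;
while (defined (my $filename = $it->next)) {
    push @scalar_seen, $filename;
}

# List form: the list assignment counts values, so "0" is still true.
$it->rewind;
my @list_seen;
while (my ($filename) = $it->next) {
    push @list_seen, $filename;
}
print "@scalar_seen | @list_seen\n";    # prints "/ 0 /bin | / 0 /bin"
```

The "0" entry shows why the scalar loop needs defined(), while the list-assignment loop is safe as it stands: a list assignment in boolean context yields the number of values returned.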
ENVIRONMENT VARIABLES
LOCATE_PATH
Default locate database.
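FILES
/var/cache/locate/locatedb
Default locate database, if the LOCATE_PATH environment variable is not set.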
OTHER WAYS TO DO IT
File::Locate reads a locate database with callbacks instead. Whether you want callbacks or an iterator is a matter of personal preference. Iterators let you write your own loop and have multiple searches in progress simultaneously.
The speed of an iterator is about the same as callbacks when File::Locate::Iterator is built with its XSUB code (which requires Perl 5.10.0 or higher currently).
Iterators are good for cooperative coroutining like POE or Gtk, where state must be held in some sort of variable to be progressed by calls from the main loop. Note that next() blocks while reading from the database, so the database should generally be a plain file rather than a socket or similar, to avoid holding up the main loop.
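A sketch of that style of use: an idle handler which pulls at most a fixed number of entries per call and returns true while there is more to do, so the main loop regains control between batches. The closure $next_entry is a hypothetical stand-in for $it->next, and the toy loop at the end stands in for a real main loop (with Glib this would be something like Glib::Idle->add($idle)).

```perl
use strict;
use warnings;

# Hypothetical stand-in for $it->next: returns entries, then no values.
my @db  = map { "/file$_" } 1 .. 10;
my $pos = 0;
my $next_entry = sub { return $pos < @db ? $db[$pos++] : () };

my $chunk = 4;     # entries handled per call; tune to taste
my @found;

# Idle handler: process at most $chunk entries, then give control back.
# Returns true while there is more to do, as Glib idle callbacks expect.
my $idle = sub {
    for (1 .. $chunk) {
        my ($entry) = $next_entry->() or return 0;   # no values: finished
        push @found, $entry if $entry =~ /[02468]\z/;
    }
    return 1;
};

# Toy main loop standing in for POE, Gtk or similar.
my $calls = 0;
while (1) {
    $calls++;
    last unless $idle->();
}
print scalar(@found), " matches in $calls calls\n";   # prints "5 matches in 3 calls"
```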
If you have the recommended File::Map module then iterators share an mmap of the database file. Otherwise currently each holds a separate open handle to the database which means a file descriptor and PerlIO buffering per iterator. Sharing a handle and making each seek to its desired position would be possible, but a seek drops buffered data and so would go slower. Some PerlIO or IO::Handle trickery might transparently share an fd and keep buffered blocks from multiple file positions.
SEE ALSO
Iterator::Locate, Iterator::Simple::Locate, MooseX::Iterator::Locate
File::Locate, locate(1) and the GNU Findutils manual, File::FnMatch, File::Map
HOME PAGE
http://user42.tuxfamily.org/file-locate-iterator/index.html
COPYRIGHT
Copyright 2009, 2010, 2011 Kevin Ryde
File-Locate-Iterator is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
File-Locate-Iterator is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with File-Locate-Iterator. If not, see http://www.gnu.org/licenses/