NAME
Pod::POM::Web::Indexer - full-text search for Pod::POM::Web
SYNOPSIS
perl -MPod::POM::Web::Indexer -e index
DESCRIPTION
Adds full-text search capabilities to the Pod::POM::Web application. This requires Search::Indexer to be installed.
Queries may include plain terms, "exact phrases", '+' or '-' prefixes, Boolean operators and parentheses. See Search::QueryParser for details.
METHODS
index
Pod::POM::Web::Indexer->new->index(%options)
Walks through directories in @INC and indexes all *.pm and *.pod files, skipping shadowed files (files for which a similar loading path was already found in previous @INC directories), and skipping files that are too big.
Default indexing is incremental: files whose modification time has not changed since the last indexing operation will not be indexed again.
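The selection rules above (shadowing, size limit, incremental check) can be sketched as follows. This is only an illustration, not the module's actual code: the function files_to_index, its arguments, and the choice of the relative path (including its extension) as the "loading path" are all assumptions made for the example.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec ();

# Hypothetical helper, not part of Pod::POM::Web::Indexer: returns the
# files that a scan over @$dirs would select for indexing.
#   $dirs       - array ref of directories, searched in order (like @INC)
#   $max_size   - files larger than this many bytes are skipped
#   $last_mtime - hash ref: relative path => mtime recorded at the last run
sub files_to_index {
    my ($dirs, $max_size, $last_mtime) = @_;
    my (%seen, @selected);

    for my $dir (@$dirs) {
        next unless -d $dir;
        File::Find::find({
            no_chdir => 1,
            wanted   => sub {
                my $file = $File::Find::name;
                return unless -f $file && $file =~ /\.(?:pm|pod)$/;

                # Path relative to the search directory, e.g. "Foo/Bar.pm"
                # (assumed here to play the role of the loading path)
                my $rel = File::Spec->abs2rel($file, $dir);
                $rel =~ tr{\\}{/};

                # Shadowed: an earlier directory already provided this path
                return if $seen{$rel}++;

                # Too big
                my ($size, $mtime) = (stat $file)[7, 9];
                return if $size > $max_size;

                # Incremental: modification time unchanged since last run
                return if defined $last_mtime->{$rel}
                       && $last_mtime->{$rel} == $mtime;

                push @selected, $file;
            },
        }, $dir);
    }
    return @selected;
}
```

With an empty hash as the third argument, every non-shadowed file under the size limit is selected, which corresponds to a first run. Passing the modification times recorded by an earlier run makes unchanged files drop out of the result.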
Options can be

- -max_size
  Size limit (in bytes) above which files will not be indexed. The default value is 300K. Files above this limit are usually not worth indexing because they only contain big configuration tables (for example Module::CoreList or Unicode::Charname).
- -from_scratch
  If true, the previous index is deleted, so all files will be freshly indexed. If false (the default), indexing is incremental, i.e. files whose modification time has not changed will not be re-indexed.
- -positions
  If true, the indexer will also store word positions in documents, so that it can later answer "exact phrase" queries.
  So if -positions is on, a search for "more than one way" will only return documents which contain that exact sequence of contiguous words; whereas if -positions is off, the query is equivalent to more AND than AND one AND way, i.e. it returns all documents which contain these words anywhere and in any order.
  The option is off by default, because it requires much more disk space, and does not seem to be very relevant for searching Perl documentation.
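The difference between the two query modes can be shown with a toy inverted index. This is a simplified sketch, not the way Search::Indexer stores its data; the functions build_index, match_all and match_phrase and the document names doc_a and doc_b are invented for the example.

```perl
use strict;
use warnings;

# Toy inverted index: word => { doc_id => [ positions of the word ] }
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $id (keys %docs) {
        my $pos = 0;
        for my $word ($docs{$id} =~ /\w+/g) {
            push @{ $index{ lc $word }{$id} }, $pos++;
        }
    }
    return \%index;
}

# AND query: documents containing every word, anywhere and in any order.
# This is all that can be answered when positions are not stored.
sub match_all {
    my ($index, @words) = @_;
    my %count;
    for my $word (map { lc($_) } @words) {
        $count{$_}++ for keys %{ $index->{$word} || {} };
    }
    my @hits = grep { $count{$_} == @words } keys %count;
    return sort @hits;
}

# Phrase query: the words must occur at consecutive positions.
sub match_phrase {
    my ($index, @words) = @_;
    my @w = map { lc($_) } @words;
    my @hits;
    for my $doc (match_all($index, @w)) {
        START: for my $start (@{ $index->{ $w[0] }{$doc} }) {
            for my $i (1 .. $#w) {
                my $want = $start + $i;
                next START
                    unless grep { $_ == $want } @{ $index->{ $w[$i] }{$doc} };
            }
            push @hits, $doc;
            last START;
        }
    }
    return @hits;
}

my $index = build_index(
    doc_a => "There is more than one way to do it",
    doc_b => "One way or another, more is better than less",
);
my @query = qw(more than one way);

print join(",", match_all($index, @query)), "\n";     # prints "doc_a,doc_b"
print join(",", match_phrase($index, @query)), "\n";  # prints "doc_a"
```

Both documents contain all four words, so the AND query returns both; only doc_a contains them as a contiguous sequence. Storing a position list per word and per document is also what makes the index larger when -positions is on.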
The index function is exported into the main:: namespace if perl is called with the -e flag, so that you can write
perl -MPod::POM::Web::Indexer -e index
PERFORMANCE
On my machine, indexing a module takes an average of 0.2 seconds, except for some long and complex sources (this is why sources above 300K are ignored by default, see options above). Here are the worst figures (in seconds):
Date/Manip              39.655
DBI                     30.73
Pod/perlfunc            29.502
Module/CoreList         27.287
CGI                     16.922
Config                  13.445
CPAN                    12.598
Pod/perlapi             10.906
CGI/FormBuilder          8.592
Win32/TieRegistry        7.338
Spreadsheet/WriteExcel   7.132
Pod/perldiag             5.771
Parse/RecDescent         5.405
Bit/Vector               4.768
The index will be stored in an index subdirectory under the module installation directory. The total index size should be around 10MB if -positions is off, and between 30MB and 50MB if -positions is on, depending on how many modules are installed.
TODO
- highlights in shown documents
- paging