NAME

Pod::POM::Web::Indexer - fulltext search for Pod::POM::Web

SYNOPSIS

perl -MPod::POM::Web::Indexer -e "Pod::POM::Web::Indexer->new->index"

DESCRIPTION

Adds fulltext search capabilities to the Pod::POM::Web application. This requires Search::Indexer to be installed.

Queries may include plain terms, "exact phrases", '+' or '-' prefixes, boolean operators and parentheses. See Search::QueryParser for details.

METHODS

index

Pod::POM::Web::Indexer->new->index(%options)

Walks through the directories in @INC and indexes all *.pm and *.pod files. Shadowed files (files whose relative loading path was already found in an earlier @INC directory) are skipped, and so are files that are too big.
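The walk described above can be sketched roughly as follows. This is only an illustration of the shadowing and size rules, not the module's actual code: the helper name find_unshadowed is hypothetical, and the sketch reads the 300K default as 300 * 1024 bytes, which is an assumption.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper, not part of Pod::POM::Web::Indexer.
# Scans @dirs in order and returns a hashref mapping each relative path
# (e.g. "Foo/Bar.pm") to the full path of the file that would be indexed.
sub find_unshadowed {
    my ($max_size, @dirs) = @_;
    my %seen;   # relative path => true once some directory has provided it
    my %keep;   # relative path => full path of the file to index

    for my $dir (@dirs) {
        next unless -d $dir;    # also skips code refs that may appear in @INC
        find({
            no_chdir => 1,
            wanted   => sub {
                my $full = $File::Find::name;
                return unless -f $full && $full =~ /\.(?:pm|pod)$/;
                my $rel = File::Spec->abs2rel($full, $dir);
                return if $seen{$rel}++;              # shadowed by an earlier directory
                return if (-s $full) > $max_size;     # too big to be worth indexing
                $keep{$rel} = $full;
            },
        }, $dir);
    }
    return \%keep;
}

# Assumption: "300K" is taken to mean 300 * 1024 bytes.
my $files = find_unshadowed(300 * 1024, @INC);
printf "%d files would be indexed\n", scalar keys %$files;
```

A file is marked as seen before its size is checked, so an oversized file in an early directory still shadows a smaller file of the same name found later. The sketch does not handle @INC directories nested inside one another, where the same file is reached under two different relative paths.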

Indexing is incremental by default: files whose modification time has not changed since the last indexing operation are not indexed again.
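The incremental check can be pictured with a small sketch. It assumes that the previous run left behind a hash of path => modification time. That storage format and the helper name files_to_index are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the paths that need (re)indexing.
# $previous maps path => mtime recorded during the last run (assumed format).
sub files_to_index {
    my ($previous, @paths) = @_;
    my @todo;
    for my $path (@paths) {
        my $mtime = (stat $path)[9];
        next unless defined $mtime;                # file vanished or is unreadable
        my $old = $previous->{$path};
        next if defined $old && $old == $mtime;    # unchanged since the last run
        push @todo, $path;
        $previous->{$path} = $mtime;               # remember for the next run
    }
    return @todo;
}

# Demo on a temporary file: it is new on the first run, unchanged on the second.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "=head1 NAME\n";
close $fh;

my %mtimes;
my @first  = files_to_index(\%mtimes, $file);
my @second = files_to_index(\%mtimes, $file);
printf "first run: %d file(s), second run: %d file(s)\n",
    scalar @first, scalar @second;
```

Starting from an empty hash makes every file look new, so everything is indexed again.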

Options can be

-max_size

Size limit (in bytes) above which files will not be indexed. The default value is 300K. Files above this limit are usually not worth indexing because they only contain big configuration tables (for example Module::CoreList or Unicode::CharName).

-from_scratch

If true, the previous index is deleted, so all files will be freshly indexed. If false (the default), indexing is incremental, i.e. files whose modification time has not changed will not be re-indexed.

-positions

If true, the indexer will also store word positions in documents, so that it can later answer "exact phrase" queries.

So if -positions is on, a search for "more than one way" will only return documents which contain that exact sequence of contiguous words. If -positions is off, the query is equivalent to more AND than AND one AND way, i.e. it returns all documents which contain these words anywhere and in any order.

The option is off by default, because it requires much more disk space, and does not seem to be very relevant for searching Perl documentation.
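The difference between the two behaviours of -positions can be illustrated with a toy in-memory index. This is a sketch only: the data layout and the function names build_index, search_and and search_phrase are hypothetical and say nothing about how Search::Indexer actually stores or queries its data.

```perl
use strict;
use warnings;

# Toy index: word => { doc_id => [ positions of the word in that document ] }.
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $id (keys %docs) {
        my $pos = 0;
        push @{ $index{lc $_}{$id} }, $pos++ for $docs{$id} =~ /\w+/g;
    }
    return \%index;
}

# Without positions: each word must occur somewhere in the document.
sub search_and {
    my ($index, @words) = @_;
    my %count;
    for my $word (@words) {
        $count{$_}++ for keys %{ $index->{lc($word)} || {} };
    }
    my @hits = sort grep { $count{$_} == @words } keys %count;
    return @hits;
}

# With positions: the words must also sit at consecutive positions.
sub search_phrase {
    my ($index, @words) = @_;
    my @hits;
    DOC: for my $id (search_and($index, @words)) {
        START: for my $start (@{ $index->{lc($words[0])}{$id} }) {
            for my $i (1 .. $#words) {
                my $positions = $index->{lc($words[$i])}{$id};
                next START unless grep { $_ == $start + $i } @$positions;
            }
            push @hits, $id;
            next DOC;
        }
    }
    return @hits;
}

my $index = build_index(
    perl  => "There is more than one way to do it",
    other => "One way or another, more is better than less",
);
print "AND:    @{[ search_and($index, qw(more than one way)) ]}\n";     # AND:    other perl
print "phrase: @{[ search_phrase($index, qw(more than one way)) ]}\n";  # phrase: perl
```

The position lists are what make the phrase test possible, and they are also the extra data that costs disk space when the option is on.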

PERFORMANCE

On my machine, indexing a module takes an average of 0.2 seconds, except for some long and complex sources (this is why sources above 300K are ignored by default, see options above). Here are the worst figures (in seconds):

Date/Manip              39.655
DBI                     30.73
Pod/perlfunc            29.502
Module/CoreList         27.287
CGI                     16.922
Config                  13.445
CPAN                    12.598
Pod/perlapi             10.906
CGI/FormBuilder          8.592
Win32/TieRegistry        7.338
Spreadsheet/WriteExcel   7.132
Pod/perldiag             5.771
Parse/RecDescent         5.405
Bit/Vector               4.768

The index will be stored in an index subdirectory under the module installation directory. The total index size should be around 10MB if -positions is off, and between 30MB and 50MB if -positions is on, depending on how many modules are installed.

TODO

- highlights in shown documents
- paging