NAME

HTML::Index::Create - Perl extension for creating a searchable index of HTML files

SYNOPSIS

use HTML::Index::Create;

$indexer = HTML::Index::Create->new( %options );

$indexer->create_index;

DESCRIPTION

HTML::Index::Create is a simple module for creating a searchable index of HTML files so that they can subsequently be searched by keyword. It is loosely based on the indexer.pl script from the O'Reilly book "CGI Programming with Perl, 2nd Edition" (http://www.oreilly.com/catalog/cgi2/author.html).

All files are parsed using HTML::TreeBuilder, and the words in those pages are added to the index. Words are stored in lowercase and consist of at least 2 alphanumeric characters ([a-z\d]{2,}).
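The word rule above can be illustrated with a minimal sketch. The helper name extract_words is hypothetical and not part of the module; the sketch assumes that words are simply the runs of characters matching [a-z\d]{2,} after lowercasing, which may differ in detail from what the module does internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Index::Create): pull index
# words out of plain text using the rule quoted in the documentation.
sub extract_words {
    my ($text) = @_;
    my $lower = lc $text;                 # words are stored in lowercase
    return $lower =~ /([a-z\d]{2,})/g;    # alphanumeric runs, 2+ characters
}

my @words = extract_words("A CGI script, v2: Hello-World!");
print join(' ', @words), "\n";            # prints: cgi script v2 hello world
```

Under this rule the single-character word "A" is dropped, and punctuation acts as a separator.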

Indexes are stored in Berkeley DB files.

The modification times of indexed files are stored, and a file is re-indexed if its modification time changes. Indexing can therefore be run incrementally: only new or updated files are indexed or re-indexed. Searches return results in no particular order; it is up to the caller to order them appropriately.
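The incremental behaviour can be pictured as a comparison between a file's current modification time and the one recorded at the last indexing run. The following is only a sketch of that idea: the function needs_indexing and the in-memory hash are hypothetical stand-ins, whereas the module keeps its data in Berkeley DB files.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the stored modification times.
my %indexed_mtime;

# Return true if $file is new or has changed since it was last seen.
sub needs_indexing {
    my ($file, $seen) = @_;
    my $mtime = (stat $file)[9];          # element 9 of stat is the mtime
    return 0 unless defined $mtime;       # missing or unreadable file
    my $old = $seen->{$file};
    return 0 if defined $old && $old == $mtime;
    $seen->{$file} = $mtime;              # remember for the next run
    return 1;
}

# Report what would happen for each file named on the command line.
for my $file (@ARGV) {
    my $action = needs_indexing($file, \%indexed_mtime) ? "index" : "skip";
    print "$file: $action\n";
}
```

Because the recorded times here live in memory only, every file is reported as "index" on each run of the script; persisting the hash between runs is what makes real incremental indexing possible.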

CONSTRUCTOR OPTIONS

VERBOSE: Print diagnostic output to STDERR.
STOP_WORD_FILE: Specify a file containing "stop words" to ignore when indexing. A sample stopwords.txt file is included in this distribution. Make sure you use the same STOP_WORD_FILE for indexing and searching. Otherwise, if you submit a search for a word that was in the stop word list when indexing (especially in a combination search), you may not get the result you expect!
DB_HASH_CACHESIZE: Set the cache size for the DB_File hashes. Default is 0.
REFRESH: Boolean; if true, regenerate the index from scratch.
DB_DIR: Specify a directory to store the Berkeley DB files. Defaults to '.'.
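To illustrate why the same stop word list should be applied at both indexing and search time, here is a minimal sketch of stop word filtering. The helper names load_stop_words and remove_stop_words are hypothetical, and the one-word-per-line file layout is an assumption; check the sample stopwords.txt in the distribution for the actual format.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: one stop word per line.
sub load_stop_words {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %stop;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        next unless length $line;         # skip blank lines
        $stop{ lc $line } = 1;
    }
    close $fh;
    return \%stop;
}

# Hypothetical helper: drop any word that appears in the stop list.
sub remove_stop_words {
    my ($stop, @words) = @_;
    return grep { !$stop->{$_} } @words;
}

# The same filter is applied to page words and to query words.
my $stop  = { the => 1, and => 1 };
my @query = remove_stop_words($stop, qw(the cat and dog));
print join(' ', @query), "\n";            # prints: cat dog
```

If the query were filtered with a different list than the one used at indexing time, it could retain words that were never indexed, which is the mismatch the STOP_WORD_FILE note warns about.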

METHODS

create_index: Creates the index. Only new or updated files are indexed, unless REFRESH is set, in which case the index is regenerated from scratch.

AUTHOR

Ave Wrigley <Ave.Wrigley@itn.co.uk>

COPYRIGHT
