NAME

Freq - An inverted text index.

SYNOPSIS

Index documents:

# cat textcorpus.txt | tokenize | indexstream index_dir

Search:

# freqsearch index_dir
# (type search terms)

PROGRAMMING API

use Freq;

# open for indexing
$index = Freq->open_write( "indexname" );
$index->index_document( "docname", $string );
$index->close_index();

# open for searching
$index = Freq->open_read( "indexname" );

# Find all docs containing a phrase
$result = $index->doc_hash( "this phrase and no other phrase" );

# result is hashref:
# { doc1 => [ match1, match2 ... matchN ],
#   doc2 => [ match1, ... ]
# }
# ... where 'match' is the token location of each match within that doc.
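
The result structure described above can be walked like any Perl hash of array references. The following is a minimal sketch, not part of the module: the hashref is a hand-written stand-in for what doc_hash() is described as returning, and the document names and token positions are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hashref returned by doc_hash();
# the document names and positions are invented for this example.
my $result = {
    'doc1' => [ 4, 17, 52 ],
    'doc2' => [ 9 ],
};

# Build one summary line per document, sorted by document name.
my @summary;
for my $doc ( sort keys %$result ) {
    my @positions = @{ $result->{$doc} };
    push @summary, sprintf( "%s: %d match(es) at token %s",
        $doc, scalar @positions, join( ", ", @positions ) );
}

print "$_\n" for @summary;
```

With a real index, $result would come from $index->doc_hash( ... ) instead of the literal above.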

SEARCH SYNTAX

A sequence of words is enclosed in angle brackets, '<' and '>'. An alternation, a set of alternatives for one position, is enclosed in square brackets, '[' and ']'. The two may be nested within each other as long as the result makes logical sense. "<the quick [brown grey] fox>" is a simple valid phrase. Square brackets nested directly inside square brackets make no logical sense, so they are not allowed. Adjacent angle-bracket sequences are not allowed either. Alternations, however, may be adjacent, as in "<I [go went] [to from] the store>". As long as these rules are followed, search terms may be arbitrarily complex. For example:

"<At [a the] council [of for] the [gods <deities we [love <believe in>] with all our [hearts strength]>] on olympus [hercules apollo athena <some guys>] [was were] [praised <condemned to eternal suffering in the underworld>]>"

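One way to read a query is as shorthand for the set of plain phrases it stands for. The sketch below illustrates that reading only and is not how Freq evaluates a query: expand_query and _parse_item are hypothetical helpers, they treat proximity operators as ordinary words, and they do not enforce the nesting and adjacency rules above.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a query into every plain phrase it can match.
sub expand_query {
    my ($query) = @_;
    # Split into brackets and words.
    my @tokens = $query =~ /([<>\[\]]|[^\s<>\[\]]+)/g;
    my $pos = 0;
    my @phrases = _parse_item( \@tokens, \$pos );
    die "unexpected trailing input\n" if $pos < @tokens;
    return @phrases;
}

# Returns the list of strings that one item (word, sequence or
# alternation) can stand for, advancing $$pos past the item.
sub _parse_item {
    my ( $tokens, $pos ) = @_;
    die "unexpected end of query\n" if $$pos >= @$tokens;
    my $tok = $tokens->[ $$pos++ ];

    if ( $tok eq '<' ) {
        # Sequence: combine the choices of each child, in order.
        my @phrases = ('');
        while ( $$pos < @$tokens and $tokens->[$$pos] ne '>' ) {
            my @alts = _parse_item( $tokens, $pos );
            my @next;
            for my $head (@phrases) {
                for my $alt (@alts) {
                    push @next, length($head) ? "$head $alt" : $alt;
                }
            }
            @phrases = @next;
        }
        die "missing '>'\n" if $$pos >= @$tokens;
        $$pos++;    # consume '>'
        return @phrases;
    }

    if ( $tok eq '[' ) {
        # Alternation: the union of the choices of each child.
        my @phrases;
        while ( $$pos < @$tokens and $tokens->[$$pos] ne ']' ) {
            push @phrases, _parse_item( $tokens, $pos );
        }
        die "missing ']'\n" if $$pos >= @$tokens;
        $$pos++;    # consume ']'
        return @phrases;
    }

    die "unexpected '$tok'\n" if $tok eq '>' or $tok eq ']';
    return ($tok);    # a plain word
}

print "$_\n" for expand_query("<I [go went] [to from] the store>");
```

For the query shown, this prints the four phrases obtained by picking one word from each alternation.
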
Two operators are available for proximity searches. '#wN' allows *at most* N skips between two words, where the number of skips is the number of words in between plus one. Thus "<The #w8 dog>" would match the text "The quick brown fox jumped over the lazy dog", in which seven words separate "The" and "dog". #w7 or smaller would not match, while #w9 or greater would still match. The '#tN' operator requires *exactly* N skips. For the text above, "<The #t8 dog>" would match, and no other value of N would. These operators may be placed after words or alternations, and nowhere else.
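
The skip arithmetic can be checked with a small stand-alone sketch. It follows the worked example above (#w8 and #w9 match, #w7 does not; only #t8 matches) and is not the module's implementation. proximity_match is a hypothetical helper, and splitting on whitespace with exact, case-sensitive comparison is an assumption made here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $second follows $first at a skip count
# accepted by the operator. 'w' accepts any count up to N, 't' accepts
# exactly N.
sub proximity_match {
    my ( $text, $first, $op, $n, $second ) = @_;
    my @tokens = split ' ', $text;    # assumption: whitespace tokens
    for my $i ( 0 .. $#tokens ) {
        next unless $tokens[$i] eq $first;
        for my $j ( $i + 1 .. $#tokens ) {
            next unless $tokens[$j] eq $second;
            my $skips = $j - $i;      # words in between, plus one
            return 1 if $op eq 'w' and $skips <= $n;
            return 1 if $op eq 't' and $skips == $n;
        }
    }
    return 0;
}

my $text = "The quick brown fox jumped over the lazy dog";
for my $n ( 7 .. 9 ) {
    printf "#w%d: %s   #t%d: %s\n",
        $n, ( proximity_match( $text, 'The', 'w', $n, 'dog' ) ? "match" : "no match" ),
        $n, ( proximity_match( $text, 'The', 't', $n, 'dog' ) ? "match" : "no match" );
}
```

"The" is token 0 and "dog" is token 8, so the skip count is 8.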

EXPORT

None. Use the programming API as shown above.

AUTHOR

Ira Joseph Woodhead, ira at sweetpota dot to

SEE ALSO

Lucene, Search::InvertedIndex