NAME

KinoSearch::InvIndexer - build inverted indexes

WARNING

KinoSearch is alpha test software. The API and the file format are subject to change.

SYNOPSIS

my $analyzer
    = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $invindexer = KinoSearch::InvIndexer->new(
    invindex => '/path/to/invindex',
    create   => 1,
    analyzer => $analyzer,
);
$invindexer->spec_field(
    name  => 'title',
    boost => 3,
);
$invindexer->spec_field( name => 'bodytext' );
while ( my ( $title, $bodytext ) = each %source_documents ) {
    my $doc = $invindexer->new_doc;
    $doc->set_value( title    => $title );
    $doc->set_value( bodytext => $bodytext );
    $invindexer->add_doc($doc);
}
$invindexer->finish;

DESCRIPTION

The InvIndexer class is KinoSearch's primary tool for creating and modifying inverted indexes, which may be searched using KinoSearch::Searcher.
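
For readers new to the term, the following pure-Perl sketch shows the idea behind an inverted index: a mapping from each term to the documents that contain it. It is a toy illustration only and does not reflect KinoSearch's implementation or file format; the function name build_toy_invindex and the sample documents are invented for this example.

```perl
use strict;
use warnings;

# Toy illustration only: map each lowercased word to the sorted list of
# document ids that contain it. This is NOT how KinoSearch stores data.
sub build_toy_invindex {
    my (%docs) = @_;
    my %seen;
    for my $doc_id ( keys %docs ) {
        # Naive tokenization: split on runs of non-word characters.
        for my $term ( grep { length } split /\W+/, lc $docs{$doc_id} ) {
            $seen{$term}{$doc_id} = 1;
        }
    }

    # Collapse the inner hashes into sorted arrays of document ids.
    my %postings;
    for my $term ( keys %seen ) {
        $postings{$term} = [ sort keys %{ $seen{$term} } ];
    }
    return \%postings;
}

my $toy = build_toy_invindex(
    doc1 => 'The quick brown fox',
    doc2 => 'The lazy dog',
);
print join( ', ', @{ $toy->{the} } ), "\n";    # prints "doc1, doc2"
```

Looking up a term is then a single hash access, which is why searching an inverted index is fast compared with scanning every document.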

METHODS

new

my $invindexer = KinoSearch::InvIndexer->new(
    invindex => '/path/to/invindex',    # required
    create   => 1,                      # default: 0
    analyzer => $analyzer,              # default: no-op Analyzer
);

Create an InvIndexer object.

spec_field

$invindexer->spec_field(
    name       => 'url',    # required
    boost      => 1,        # default: 1
    analyzer   => undef,    # default: analyzer spec'd in new()
    indexed    => 0,        # default: 1
    analyzed   => 0,        # default: 1
    stored     => 1,        # default: 1
    compressed => 0,        # default: 0
    vectorized => 0,        # default: 1
);

Define a field.

  • name - the field's name.

  • boost - A multiplier which determines how much a field contributes to a document's score.

  • analyzer - By default, all indexed fields are analyzed using the analyzer that was supplied to new(). Supplying an alternate for a given field overrides the primary analyzer.

  • indexed - index the field, so that it can be searched later.

  • analyzed - analyze the field, using the relevant Analyzer. Fields such as "category" or "product_number" might be indexed but not analyzed.

  • stored - store the field, so that it can be retrieved when the document turns up in a search.

  • compressed - compress the stored field, using the zlib compression algorithm.

  • vectorized - store the field's "term vectors", which are required by KinoSearch::Highlight::Highlighter for excerpt selection and search term highlighting.
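
The options above combine in a few common ways. The sketch below is one possible arrangement, not a recommendation from the KinoSearch authors: define_fields is a hypothetical helper, and the field names are illustrative. Only the parameter names come from spec_field.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a list of field definitions to an InvIndexer.
# Each definition is a hashref of spec_field parameters.
sub define_fields {
    my ( $invindexer, @specs ) = @_;
    $invindexer->spec_field(%$_) for @specs;
    return scalar @specs;
}

# Illustrative field definitions (names are assumptions for this example).
my @specs = (
    # Full-text field with all defaults, weighted three times as heavily.
    { name => 'title', boost => 3 },

    # Large body text: searchable and highlightable, stored compressed.
    { name => 'bodytext', compressed => 1 },

    # Identifier: indexed but not analyzed, and no term vectors.
    { name => 'product_number', analyzed => 0, vectorized => 0 },

    # Display-only value: stored for retrieval but not searchable.
    { name => 'url', indexed => 0, analyzed => 0, vectorized => 0 },
);
```

With an InvIndexer in hand, define_fields( $invindexer, @specs ) would register all four fields before the first call to new_doc.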

new_doc

my $doc = $invindexer->new_doc;

Spawn an empty KinoSearch::Document::Doc object, primed to accept values for the fields spec'd by spec_field.

add_doc

$invindexer->add_doc($doc);

Add a document to the invindex.

add_invindexes

my $invindexer = KinoSearch::InvIndexer->new(
    invindex => $invindex,
    analyzer => $analyzer,
);
$invindexer->add_invindexes( $another_invindex, $yet_another_invindex );
$invindexer->finish;

Absorb existing invindexes into this one. May only be called once per InvIndexer. add_invindexes() and add_doc() cannot be called on the same InvIndexer.

delete_docs_by_term

my $term = KinoSearch::Index::Term->new( 'id', $unique_id );
$invindexer->delete_docs_by_term($term);

Mark any document which contains the supplied term as deleted, so that it will be excluded from search results. For more info, see Deletions in KinoSearch::Docs::FileFormat.
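
Deletion is often paired with re-adding a fresh version of the same document. The sketch below shows one way that might look. It assumes that delete_docs_by_term and add_doc may be used on the same InvIndexer (this page only rules out mixing add_invindexes with add_doc), and that the term refers to a field which was indexed without analysis so that it matches exactly. replace_doc is a hypothetical helper, not part of KinoSearch.

```perl
use strict;
use warnings;

# Hypothetical helper: mark every document matching $term as deleted,
# then add one fresh document built from the name/value pairs in %$fields.
sub replace_doc {
    my ( $invindexer, $term, $fields ) = @_;

    # Exclude the stale version(s) from future search results.
    $invindexer->delete_docs_by_term($term);

    # Build the replacement from the supplied field values.
    my $doc = $invindexer->new_doc;
    $doc->set_value( $_ => $fields->{$_} ) for sort keys %$fields;
    $invindexer->add_doc($doc);

    return $doc;
}
```

A caller would construct the term with KinoSearch::Index::Term->new( 'id', $unique_id ), pass it along with a hashref of field values, and call finish once all replacements have been made.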

finish

$invindexer->finish(
    optimize => 1,    # default: 0
);

Finish the invindex. Calling finish invalidates the InvIndexer, so the object should not be used afterwards. Takes one hash-style parameter.

  • optimize - If optimize is set to 1, the invindex will be collapsed to its most compact form, which will yield the fastest queries.

COPYRIGHT

Copyright 2005-2006 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.12.