NAME
KinoSearch::InvIndexer - build inverted indexes
WARNING
KinoSearch is alpha test software. The API and the file format are subject to change.
SYNOPSIS
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        language => 'en',
    );

    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => '/path/to/invindex',
        create   => 1,
        analyzer => $analyzer,
    );

    $invindexer->spec_field(
        name  => 'title',
        boost => 3,
    );
    $invindexer->spec_field( name => 'bodytext' );

    while ( my ( $title, $bodytext ) = each %source_documents ) {
        my $doc = $invindexer->new_doc;

        $doc->set_value( title    => $title );
        $doc->set_value( bodytext => $bodytext );

        $invindexer->add_doc($doc);
    }

    $invindexer->finish;
DESCRIPTION
The InvIndexer class is KinoSearch's primary tool for creating and modifying inverted indexes, which may be searched using KinoSearch::Searcher.
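For readers new to the term, the idea behind an inverted index can be sketched in a few lines of plain Perl. This is a toy illustration only: it does not use KinoSearch and says nothing about KinoSearch's actual file format. The sample data and the docs_for_term() helper are invented for the example.

```perl
use strict;
use warnings;

# Toy data, invented for this illustration.
my %source_documents = (
    'Moby Dick' => 'call me ishmael',
    'Walden'    => 'i went to the woods',
    'Ishmael'   => 'the woods call to me',
);

# Map each term to the set of document titles which contain it.
my %inverted;
while ( my ( $title, $bodytext ) = each %source_documents ) {
    for my $term ( split /\s+/, lc $bodytext ) {
        $inverted{$term}{$title} = 1;
    }
}

# Look a term up directly instead of scanning every document.
# Call in list context; returns the matching titles in sorted order.
sub docs_for_term {
    my $term = shift;
    return sort keys %{ $inverted{ lc $term } || {} };
}

print join( ', ', docs_for_term('woods') ), "\n";    # prints: Ishmael, Walden
```

A real invindex additionally records the information needed for scoring, stored field values, and so on; the sketch shows only the term-to-document mapping.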
METHODS
new
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => '/path/to/invindex',    # required
        create   => 1,                      # default: 0
        analyzer => $analyzer,              # default: no-op Analyzer
    );
Create an InvIndexer object.
invindex - either a filepath, or an object belonging to an InvIndex subclass such as KinoSearch::Store::FSInvIndex or KinoSearch::Store::RAMInvIndex.
create - create a new invindex, clobbering an existing one if necessary.
analyzer - an object which subclasses KinoSearch::Analysis::Analyzer, such as a PolyAnalyzer.
spec_field
    $invindexer->spec_field(
        name       => 'url',    # required
        boost      => 1,        # default: 1
        analyzer   => undef,    # default: analyzer spec'd in new()
        indexed    => 0,        # default: 1
        analyzed   => 0,        # default: 1
        stored     => 1,        # default: 1
        compressed => 0,        # default: 0
        vectorized => 0,        # default: 1
    );
Define a field.
name - the field's name.
boost - A multiplier which determines how much a field contributes to a document's score.
analyzer - By default, all indexed fields are analyzed using the analyzer that was supplied to new(). Supplying an alternate for a given field overrides the primary analyzer.
indexed - index the field, so that it can be searched later.
analyzed - analyze the field, using the relevant Analyzer. Fields such as "category" or "product_number" might be indexed but not analyzed.
stored - store the field, so that it can be retrieved when the document turns up in a search.
compressed - compress the stored field, using the zlib compression algorithm.
vectorized - store the field's "term vectors", which are required by KinoSearch::Highlight::Highlighter for excerpt selection and search term highlighting.
new_doc
    my $doc = $invindexer->new_doc;
Spawn an empty KinoSearch::Document::Doc object, primed to accept values for the fields spec'd by spec_field.
add_doc
    $invindexer->add_doc($doc);
Add a document to the invindex.
add_invindexes
    my $invindexer = KinoSearch::InvIndexer->new(
        invindex => $invindex,
        analyzer => $analyzer,
    );
    $invindexer->add_invindexes( $another_invindex, $yet_another_invindex );
    $invindexer->finish;
Absorb existing invindexes into this one. May only be called once per InvIndexer. add_invindexes() and add_doc() cannot be called on the same InvIndexer.
delete_docs_by_term
    my $term = KinoSearch::Index::Term->new( 'id', $unique_id );
    $invindexer->delete_docs_by_term($term);
Mark any document which contains the supplied term as deleted, so that it will be excluded from search results. For more info, see Deletions in KinoSearch::Docs::FileFormat.
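One plausible use is replacing a document whose content has changed: delete the old version by a unique identifier, then add the new version. The following is a sketch under stated assumptions, not a confirmed recipe. It uses only the methods documented on this page, and it assumes (a) that deletions and additions may be mixed within one InvIndexer session, (b) that fields must be declared again with spec_field() when reopening an existing invindex, and (c) that an 'id' field which is indexed but not analyzed will match the Term exactly. The field names and the values in $unique_id and %new_values are hypothetical.

```perl
use strict;
use warnings;
use KinoSearch::InvIndexer;
use KinoSearch::Index::Term;
use KinoSearch::Analysis::PolyAnalyzer;

# Hypothetical replacement content for one document.
my $unique_id  = 'doc-0042';
my %new_values = (
    id       => $unique_id,
    title    => 'Revised title',
    bodytext => 'Revised body text.',
);

my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );

# No "create" parameter, so an existing invindex is opened, not clobbered.
my $invindexer = KinoSearch::InvIndexer->new(
    invindex => '/path/to/invindex',
    analyzer => $analyzer,
);

# Assumption: the id is indexed verbatim (not analyzed) so that an exact
# Term lookup matches it.
$invindexer->spec_field( name => 'id', analyzed => 0 );
$invindexer->spec_field( name => 'title', boost => 3 );
$invindexer->spec_field( name => 'bodytext' );

# Mark the old version as deleted.
my $term = KinoSearch::Index::Term->new( 'id', $unique_id );
$invindexer->delete_docs_by_term($term);

# Add the new version.
my $doc = $invindexer->new_doc;
$doc->set_value( $_ => $new_values{$_} ) for keys %new_values;
$invindexer->add_doc($doc);

$invindexer->finish;
```

If any of the assumptions above does not hold for the installed version, the delete and the add can be performed in two separate InvIndexer sessions instead.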
finish
    $invindexer->finish(
        optimize => 1,    # default: 0
    );
Finish the invindex. Invalidates the InvIndexer. Takes one hash-style parameter.
optimize - If optimize is set to 1, the invindex will be collapsed to its most compact form, which will yield the fastest queries.
COPYRIGHT
Copyright 2005-2006 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.12.