NAME
Lucy::Search::Compiler - Query-to-Matcher compiler.
SYNOPSIS
# (Compiler is an abstract base class.)
package MyCompiler;
use base qw( Lucy::Search::Compiler );
sub make_matcher {
    my $self = shift;
    return MyMatcher->new( @_, compiler => $self );
}
DESCRIPTION
The purpose of the Compiler class is to take a specification in the form of a Query object and compile a Matcher object that can do real work.
The simplest Compiler subclasses – such as those associated with constant-scoring Query types – might simply implement a make_matcher() method which passes along information verbatim from the Query to the Matcher’s constructor.
However, it is common for the Compiler to perform some calculations which affect its “weight” – a floating-point multiplier that the Matcher will factor into each document’s score. If that is the case, then the Compiler subclass may wish to override get_weight(), sum_of_squared_weights(), and apply_norm_factor().
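The following pure-Perl toy class is a rough illustration of how those three methods relate. It deliberately does not inherit from Lucy::Search::Compiler, so it runs without Lucy installed. The package name ToyCompiler, its boost and idf fields, and the choice of boost times IDF as the raw weight are all hypothetical. Only the method names mirror the API described on this page.

```perl
use strict;
use warnings;

# Hypothetical stand-in; NOT a real Lucy::Search::Compiler subclass.
package ToyCompiler;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {
        boost => $args{boost} // 1.0,
        idf   => $args{idf}   // 1.0,
    }, $class;
    # Assumed raw weight before normalization: boost times IDF.
    $self->{weight} = $self->{boost} * $self->{idf};
    return $self;
}

# The multiplier a Matcher would fold into each document's score.
sub get_weight { return $_[0]{weight} }

# Raw factor that a normalization step would consume.
sub sum_of_squared_weights {
    my $self = shift;
    return $self->{weight}**2;
}

# Scale the internal weight by the supplied normalization multiplier.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $self->{weight} *= $factor;
    return;
}

package main;

my $compiler = ToyCompiler->new( boost => 2.0, idf => 1.5 );
printf "raw weight: %.2f\n", $compiler->get_weight;    # 3.00
$compiler->apply_norm_factor( 1 / sqrt( $compiler->sum_of_squared_weights ) );
printf "normalized: %.2f\n", $compiler->get_weight;    # 1.00
```

The weight is kept as mutable state, so apply_norm_factor() can adjust it in place after construction. This matches the idea that normalization happens after the Compiler has been built.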
Compiling a Matcher is a two-stage process.
The first stage takes place during the Compiler’s construction, which is where the Query object meets a Searcher object for the first time. Searchers operate on a specific document collection, and they can supply certain statistical information about that collection – such as how many documents it contains in total, or how many of those documents contain a particular term. Lucy’s core Compiler classes plug this information into the classic TF/IDF weighting algorithm to adjust the Compiler’s weight; custom subclasses might do something similar.
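The IDF half of that calculation fits in a few lines of Perl. The formula used here, 1 + ln(doc_max / (doc_freq + 1)), is the textbook TF/IDF variant and is an assumption. Lucy's Similarity class may compute IDF differently. The function name toy_idf is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classic IDF computed from collection-wide statistics
# of the kind a Searcher can supply (total docs, docs containing the term).
sub toy_idf {
    my ( $doc_freq, $doc_max ) = @_;
    return 1.0 if $doc_max == 0;    # guard against log of zero
    return 1 + log( $doc_max / ( $doc_freq + 1 ) );    # Perl's log() is natural log
}

# A rare term earns a larger multiplier than a common one.
my $rare   = toy_idf( 9,   1000 );    # 1 + ln(100)
my $common = toy_idf( 499, 1000 );    # 1 + ln(2)
printf "rare: %.4f  common: %.4f\n", $rare, $common;
```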
The second stage of compilation is the make_matcher() method, which is where the Compiler meets a SegReader object. SegReaders are associated with a single segment within a single index on a single machine. They are thus lower-level than Searchers, which may represent a document collection spread out over a search cluster (comprising several indexes and many segments). The Compiler object can use new information supplied by the SegReader – such as whether a term is missing from the local index even though it is present within the larger collection represented by the Searcher – when figuring out what to feed to the Matcher’s constructor, or whether make_matcher() should return a Matcher at all.
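The split between the two stages can be imitated with plain Perl data. A weight is fixed once from collection-wide statistics, and each segment then decides whether a matcher is worth building. The hash-based "segments", the make_toy_matcher() function, and the hashref it returns are hypothetical stand-ins for SegReader, make_matcher(), and Matcher.

```perl
use strict;
use warnings;

# Stage 1 (hypothetical): a weight computed once from collection-wide stats.
my $term   = 'lucy';
my $weight = 2.5;

# Toy "segments": each maps a term to the IDs of documents containing it.
my @segments = (
    { lucy   => [ 1, 4, 7 ], search => [2] },
    { search => [ 3, 5 ] },    # 'lucy' is missing from this segment
    { lucy   => [9] },
);

# Stage 2 (hypothetical): a per-segment factory that, like make_matcher(),
# yields undef when the segment cannot produce any matches.
sub make_toy_matcher {
    my %args = @_;
    my $postings = $args{reader}{ $args{term} } or return;    # undef in scalar context
    return {
        doc_ids => [@$postings],
        # Carry the weight along only when scores are needed.
        weight => $args{need_score} ? $args{weight} : undef,
    };
}

my @matchers;
for my $segment (@segments) {
    my $matcher = make_toy_matcher(
        reader     => $segment,
        term       => $term,
        weight     => $weight,
        need_score => 1,
    );
    push @matchers, $matcher if defined $matcher;
}
printf "%d of %d segments produced a matcher\n",
    scalar @matchers, scalar @segments;
```

Returning undef for the middle segment lets the caller skip that segment entirely, instead of iterating over a matcher that could never match.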
CONSTRUCTORS
new
my $compiler = MyCompiler->SUPER::new(
    parent     => $my_query,
    searcher   => $searcher,
    similarity => $sim,      # default: undef
    boost      => undef,     # default: see below
);
Abstract constructor.
parent - The parent Query.
searcher - A Lucy::Search::Searcher, such as an IndexSearcher.
similarity - A Similarity.
boost - An arbitrary scoring multiplier. Defaults to the boost of the parent Query.
ABSTRACT METHODS
make_matcher
$compiler->make_matcher(
    reader     => $seg_reader,    # required
    need_score => $bool,          # required
);
Factory method returning a Matcher.
reader - A SegReader.
need_score - Indicate whether the Matcher must implement score().
Returns: a Matcher, or undef if the Matcher would have matched no documents.
METHODS
get_weight
$compiler->get_weight();
Return the Compiler’s numerical weight, a scoring multiplier. By default, returns the object’s boost.
get_similarity
$compiler->get_similarity();
Accessor for the Compiler’s Similarity object.
get_parent
$compiler->get_parent();
Accessor for the Compiler’s parent Query object.
sum_of_squared_weights
$compiler->sum_of_squared_weights();
Compute and return a raw weighting factor. (This quantity is used by normalize().) By default, simply returns 1.0.
apply_norm_factor
$compiler->apply_norm_factor($factor);
Apply a floating point normalization multiplier. For a TermCompiler, this involves multiplying its own weight by the supplied factor; combining classes such as ORCompiler would apply the factor recursively to their children.
The default implementation is a no-op; subclasses may wish to multiply their internal weight by the supplied factor.
factor - The multiplier.
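The recursive behavior described for combining classes can be sketched with two hypothetical pure-Perl classes, ToyLeaf and ToyOr. They are not Lucy's TermCompiler or ORCompiler. In particular, this page does not say whether a real combining compiler also folds its own boost into the sum, so this sketch simply adds up its children's values.

```perl
use strict;
use warnings;

package ToyLeaf;    # hypothetical leaf, e.g. a single term
sub new { my ( $class, $weight ) = @_; return bless { weight => $weight }, $class }
sub get_weight             { return $_[0]{weight} }
sub sum_of_squared_weights { return $_[0]{weight}**2 }
sub apply_norm_factor      { $_[0]{weight} *= $_[1]; return }

package ToyOr;      # hypothetical combiner over child compilers
sub new { my ( $class, @children ) = @_; return bless { children => \@children }, $class }

# Aggregate the children's raw factors.
sub sum_of_squared_weights {
    my $sum = 0;
    $sum += $_->sum_of_squared_weights for @{ $_[0]{children} };
    return $sum;
}

# Push the same multiplier down to every child.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_->apply_norm_factor($factor) for @{ $self->{children} };
    return;
}

package main;

my $or = ToyOr->new( ToyLeaf->new(3.0), ToyLeaf->new(4.0) );
my $factor = 1 / sqrt( $or->sum_of_squared_weights );    # 1 / 5
$or->apply_norm_factor($factor);
printf "%.2f %.2f\n", map { $_->get_weight } @{ $or->{children} };    # 0.60 0.80
```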
normalize
$compiler->normalize();
Take a newly minted Compiler object and apply query-specific normalization factors. Should be invoked by Query subclasses during make_compiler() for top-level nodes.
For a TermQuery, the scoring formula is approximately:
(tf_d * idf_t / norm_d) * (tf_q * idf_t / norm_q)
normalize() is theoretically concerned with applying the second half of that formula to the Compiler’s weight. What actually happens depends on how the Compiler and Similarity methods called internally are implemented.
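The query-side half of the formula, tf_q * idf_t / norm_q, can be worked through numerically. This sketch makes two assumptions. It takes tf_q = 1, and it takes norm_q to be the square root of the sum of squared weights, which is the classic cosine-style choice. In Lucy the actual factor comes from the Compiler and Similarity methods and may differ. The function toy_normalize is hypothetical, and the IDF values are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical: given raw query-side weights (tf_q * idf_t, assuming
# tf_q = 1), divide each by norm_q = sqrt(sum of squared weights).
sub toy_normalize {
    my @raw = @_;
    return () unless @raw;
    my $norm_q = sqrt( sum( map { $_**2 } @raw ) );
    return @raw if $norm_q == 0;    # nothing to scale
    return map { $_ / $norm_q } @raw;
}

my @idf        = ( 1.2, 3.5, 2.0 );    # illustrative IDF values, not real data
my @normalized = toy_normalize(@idf);
printf "%.4f\n", $_ for @normalized;
```

Under these assumptions, the normalized query weights form a unit vector. Their relative proportions are unchanged, but their overall scale no longer depends on how many terms the query contains.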
INHERITANCE
Lucy::Search::Compiler isa Lucy::Search::Query isa Clownfish::Obj.