NAME
Lucy::Search::Compiler - Query-to-Matcher compiler.
SYNOPSIS
    # (Compiler is an abstract base class.)
    package MyCompiler;

    sub make_matcher {
        my $self = shift;
        return MyMatcher->new( @_, compiler => $self );
    }
DESCRIPTION
The purpose of the Compiler class is to take a specification in the form of a Query object and compile a Matcher object that can do real work.
The simplest Compiler subclasses – such as those associated with constant-scoring Query types – might simply implement a make_matcher() method which passes along information verbatim from the Query to the Matcher’s constructor.
However, it is common for the Compiler to perform some calculations which affect its “weight” – a floating point multiplier that the Matcher will factor into each document’s score. If that is the case, then the Compiler subclass may wish to override get_weight(), sum_of_squared_weights(), and apply_norm_factor().
Compiling a Matcher is a two-stage process.
The first stage takes place during the Compiler’s construction, which is where the Query object meets a Searcher object for the first time. Searchers operate on a specific document collection and can supply statistics about it – such as the total number of documents in the collection, or the number of documents that contain a particular term. Lucy’s core Compiler classes plug this information into the classic TF/IDF weighting algorithm to adjust the Compiler’s weight; custom subclasses might do something similar.
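As a rough illustration of how collection statistics could feed a stage-one weight, here is a minimal pure-Perl sketch. It does not use Lucy at all: idf() and initial_weight() are hypothetical helpers, and the 1 + ln(N / (1 + df)) form is an assumption chosen because it is a common IDF variant, not a statement of what Lucy's Similarity computes.

```perl
use strict;
use warnings;

# Hypothetical helper: inverse document frequency in the common
# 1 + ln( N / (1 + df) ) form. Assumed for illustration only.
sub idf {
    my ( $doc_freq, $total_docs ) = @_;
    return 1 if $total_docs == 0;    # empty collection: neutral factor
    return 1 + log( $total_docs / ( 1 + $doc_freq ) );
}

# Hypothetical stage-one weight: the boost scaled by the term's IDF.
sub initial_weight {
    my (%args) = @_;
    return $args{boost} * idf( $args{doc_freq}, $args{total_docs} );
}

# The two statistics a Searcher is described as providing:
# total documents, and documents containing the term.
my $rare   = initial_weight( boost => 1.0, doc_freq => 9,   total_docs => 1000 );
my $common = initial_weight( boost => 1.0, doc_freq => 499, total_docs => 1000 );

printf "rare=%.3f common=%.3f\n", $rare, $common;
```

Under these assumptions a term found in few documents receives a larger weight than one found in roughly half of the collection, which is the effect TF/IDF weighting is meant to have.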
The second stage of compilation is the make_matcher() method, which is where the Compiler meets a SegReader object. SegReaders are associated with a single segment within a single index on a single machine, and are thus lower-level than Searchers, which may represent a document collection spread out over a search cluster (comprising several indexes and many segments). The Compiler object can use new information supplied by the SegReader – such as whether a term is missing from the local index even though it is present within the larger collection represented by the Searcher – when figuring out what to feed to the Matcher’s constructor, or whether make_matcher() should return a Matcher at all.
CONSTRUCTORS
new
    my $compiler = MyCompiler->SUPER::new(
        parent     => $my_query,
        searcher   => $searcher,
        similarity => $sim,         # default: undef
        boost      => undef,        # default: see below
    );
Abstract constructor.
parent - The parent Query.
searcher - A Lucy::Search::Searcher, such as an IndexSearcher.
similarity - A Similarity.
boost - An arbitrary scoring multiplier. Defaults to the boost of the parent Query.
ABSTRACT METHODS
make_matcher
    my $matcher = $compiler->make_matcher(
        reader     => $reader,      # required
        need_score => $need_score,  # required
    );
Factory method returning a Matcher.
reader - A SegReader.
need_score - Indicate whether the Matcher must implement score().
Returns: a Matcher, or undef if the Matcher would have matched no documents.
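The "return undef when nothing can match" contract can be sketched without Lucy. In the toy below, ToyCompiler and ToyMatcher are hypothetical stand-ins, and a "reader" is assumed to be a plain hash mapping each term to the document ids that contain it within one segment; a real SegReader exposes this information differently.

```perl
use strict;
use warnings;

# Toy stand-ins, not Lucy classes.
package ToyMatcher;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

package ToyCompiler;

sub new {
    my ( $class, %args ) = @_;
    return bless { term => $args{term}, weight => $args{weight} }, $class;
}

sub make_matcher {
    my ( $self, %args ) = @_;
    my $doc_ids = $args{reader}{ $self->{term} };

    # The term may exist elsewhere in the collection but not in this
    # segment; in that case there is nothing to match, so return undef.
    return unless $doc_ids && @$doc_ids;

    return ToyMatcher->new(
        doc_ids    => $doc_ids,
        compiler   => $self,
        need_score => $args{need_score},
    );
}

package main;

my $compiler  = ToyCompiler->new( term => 'lucy', weight => 1.0 );
my $segment_a = { lucy => [ 3, 7 ] };    # segment containing the term
my $segment_b = { perl => [1] };         # segment without the term

my $hit  = $compiler->make_matcher( reader => $segment_a, need_score => 1 );
my $miss = $compiler->make_matcher( reader => $segment_b, need_score => 1 );

print defined $miss ? "matcher\n" : "no matcher for segment_b\n";
```

Callers therefore need to check the return value for definedness before iterating over matches.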
METHODS
get_weight
    my $float = $compiler->get_weight();
Return the Compiler’s numerical weight, a scoring multiplier. By default, returns the object’s boost.
get_similarity
    my $similarity = $compiler->get_similarity();
Accessor for the Compiler’s Similarity object.
get_parent
    my $query = $compiler->get_parent();
Accessor for the Compiler’s parent Query object.
sum_of_squared_weights
    my $float = $compiler->sum_of_squared_weights();
Compute and return a raw weighting factor. (This quantity is used by normalize().) By default, simply returns 1.0.
apply_norm_factor
    $compiler->apply_norm_factor($factor);
Apply a floating point normalization multiplier. For a TermCompiler, this involves multiplying its own weight by the supplied factor; combining classes such as ORCompiler would apply the factor recursively to their children.
The default implementation is a no-op; subclasses may wish to multiply their internal weight by the supplied factor.
factor - The multiplier.
normalize
    $compiler->normalize();
Take a newly minted Compiler object and apply query-specific normalization factors. Should be invoked by Query subclasses during make_compiler() for top-level nodes.
For a TermQuery, the scoring formula is approximately:
(tf_d * idf_t / norm_d) * (tf_q * idf_t / norm_q)
normalize() is theoretically concerned with applying the second half of that formula to the Compiler’s weight. What actually happens depends on how the Compiler and Similarity methods called internally are implemented.
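One plausible way the three weighting methods could cooperate is sketched below in pure Perl. The classes are toy stand-ins rather than Lucy classes, and the normalization factor of 1 / sqrt(sum of squared weights) is an assumption; in Lucy the factor would come from whatever the Similarity in use computes.

```perl
use strict;
use warnings;

# Toy stand-ins, not Lucy classes.
package ToyTermCompiler;

sub new {
    my ( $class, %args ) = @_;
    return bless { weight => $args{weight} }, $class;
}
sub get_weight             { return $_[0]{weight} }
sub sum_of_squared_weights { return $_[0]{weight}**2 }

# A leaf multiplies its own weight by the supplied factor.
sub apply_norm_factor { $_[0]{weight} *= $_[1]; return }

package ToyORCompiler;

sub new {
    my ( $class, @children ) = @_;
    return bless { children => \@children }, $class;
}

# A combining node sums the contributions of its children ...
sub sum_of_squared_weights {
    my $self = shift;
    my $sum  = 0;
    $sum += $_->sum_of_squared_weights for @{ $self->{children} };
    return $sum;
}

# ... and passes the factor down to each of them.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_->apply_norm_factor($factor) for @{ $self->{children} };
    return;
}

# Assumed normalize(): factor = 1 / sqrt(sum of squared weights).
sub normalize {
    my $self   = shift;
    my $sum    = $self->sum_of_squared_weights;
    my $factor = $sum == 0 ? 0 : 1 / sqrt($sum);
    $self->apply_norm_factor($factor);
    return;
}

package main;

my @terms = map { ToyTermCompiler->new( weight => $_ ) } ( 3, 4 );
my $or    = ToyORCompiler->new(@terms);
$or->normalize;

printf "%.1f %.1f\n", map { $_->get_weight } @terms;
```

With raw weights 3 and 4 the sum of squares is 25, so the assumed factor is 0.2 and the weights become 0.6 and 0.8; after normalization the squared weights sum to 1.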
INHERITANCE
Lucy::Search::Compiler isa Lucy::Search::Query isa Clownfish::Obj.