NAME

Lucy::Search::Compiler - Query-to-Matcher compiler.

SYNOPSIS

# (Compiler is an abstract base class.)
package MyCompiler;
use base qw( Lucy::Search::Compiler );

sub make_matcher {
    my $self = shift;
    return MyMatcher->new( @_, compiler => $self );
}

DESCRIPTION

The purpose of the Compiler class is to take a specification in the form of a Query object and compile a Matcher object that can do real work.

The simplest Compiler subclasses – such as those associated with constant-scoring Query types – might simply implement a make_matcher() method which passes along information verbatim from the Query to the Matcher’s constructor.

However, it is common for the Compiler to perform some calculations which affect its “weight” – a floating point multiplier that the Matcher will factor into each document’s score. If that is the case, then the Compiler subclass may wish to override get_weight(), sum_of_squared_weights(), and apply_norm_factor().

Compiling a Matcher is a two-stage process.

The first stage takes place during the Compiler’s construction, which is where the Query object meets a Searcher object for the first time. Searchers operate on a specific document collection and can report statistical information about it – such as how many documents the collection holds in total, or how many of those documents contain a particular term. Lucy’s core Compiler classes plug this information into the classic TF/IDF weighting algorithm to adjust the Compiler’s weight; custom subclasses might do something similar.
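As a rough illustration of the first stage, the sketch below turns two collection statistics of the kind a Searcher can report (total documents and a term’s document frequency) into a weight. The helper idf_weight() is hypothetical and not part of Lucy’s API, and the formula 1 + ln(total / (1 + doc_freq)) is an assumption about a typical TF/IDF variant rather than a statement of what any particular Similarity computes.

```perl
use strict;
use warnings;

# Hypothetical helper: an IDF-style multiplier from collection statistics.
# Assumed formula: 1 + ln( total_docs / (1 + doc_freq) ).
sub idf_weight {
    my (%args) = @_;
    my $total_docs = $args{total_docs};
    my $doc_freq   = $args{doc_freq};
    return 1 if $total_docs == 0;    # empty collection: neutral weight
    return 1 + log( $total_docs / ( 1 + $doc_freq ) );
}

# A compiler might fold the idf into its weight alongside the boost.
my $boost  = 2.0;
my $idf    = idf_weight( total_docs => 1000, doc_freq => 9 );
my $weight = $boost * $idf;
printf "idf=%.4f weight=%.4f\n", $idf, $weight;
```

Under this assumed formula, rarer terms receive larger weights, which is the usual intent of inverse document frequency.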

The second stage of compilation is the make_matcher() method, which is where the Compiler meets a SegReader object. SegReaders are associated with a single segment within a single index on a single machine, and are thus lower-level than Searchers, which may represent a document collection spread out over a search cluster (comprising several indexes and many segments). The Compiler object can use new information supplied by the SegReader – such as whether a term is missing from the local index even though it is present within the larger collection represented by the Searcher – when figuring out what to feed to the Matcher’s constructor, or whether make_matcher() should return a Matcher at all.
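The per-segment decision described above can be sketched without Lucy itself. In the following, Sketch::TermCompiler is a hypothetical stand-in (real code would subclass Lucy::Search::Compiler), each "segment" is a plain hash mapping terms to document ids rather than a real SegReader, and the returned "matcher" is just a hash. Only the shape of the contract is meant to carry over: return nothing when the segment cannot produce a match.

```perl
use strict;
use warnings;

# Hypothetical stand-in; real code would subclass Lucy::Search::Compiler.
package Sketch::TermCompiler;

sub new {
    my ( $class, %args ) = @_;
    return bless { term => $args{term}, weight => $args{weight} }, $class;
}

# Stage two: consult one segment and decide whether a matcher is needed.
sub make_matcher {
    my ( $self, %args ) = @_;
    my $postings = $args{reader}{ $self->{term} };    # doc ids in this segment

    # The term may exist in the wider collection yet be absent here.
    return unless $postings && @$postings;

    return {
        doc_ids => $postings,
        weight  => $args{need_score} ? $self->{weight} : undef,
    };
}

package main;

# Two hypothetical segments, each a plain hash of term => doc ids.
my %seg_a = ( lucy => [ 1, 4 ], perl => [2] );
my %seg_b = ( perl => [7] );

my $compiler = Sketch::TermCompiler->new( term => 'lucy', weight => 1.5 );
for my $seg ( \%seg_a, \%seg_b ) {
    my $matcher = $compiler->make_matcher( reader => $seg, need_score => 1 );
    print defined $matcher
        ? "matcher for docs @{ $matcher->{doc_ids} }\n"
        : "no matcher for this segment\n";
}
```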

CONSTRUCTORS

new

my $compiler = MyCompiler->SUPER::new(
    parent     => $my_query,
    searcher   => $searcher,
    similarity => $sim,        # default: undef
    boost      => undef,       # default: see below
);

Abstract constructor.

  • parent - The parent Query.

  • searcher - A Lucy::Search::Searcher, such as an IndexSearcher.

  • similarity - A Similarity.

  • boost - An arbitrary scoring multiplier. Defaults to the boost of the parent Query.

ABSTRACT METHODS

make_matcher

$compiler->make_matcher(
    reader     => $seg_reader,  # required
    need_score => $bool,        # required
);

Factory method returning a Matcher.

  • reader - A SegReader.

  • need_score - Indicate whether the Matcher must implement score().

Returns: a Matcher, or undef if the Matcher would have matched no documents.

METHODS

get_weight

$compiler->get_weight();

Return the Compiler’s numerical weight, a scoring multiplier. By default, returns the object’s boost.

get_similarity

$compiler->get_similarity();

Accessor for the Compiler’s Similarity object.

get_parent

$compiler->get_parent();

Accessor for the Compiler’s parent Query object.

sum_of_squared_weights

$compiler->sum_of_squared_weights();

Compute and return a raw weighting factor. (This quantity is used by normalize().) By default, simply returns 1.0.

apply_norm_factor

$compiler->apply_norm_factor($factor);

Apply a floating point normalization multiplier. For a TermCompiler, this involves multiplying its own weight by the supplied factor; combining classes such as ORCompiler would apply the factor recursively to their children.

The default implementation is a no-op; subclasses may wish to multiply their internal weight by the supplied factor.

  • factor - The multiplier.
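The difference between a leaf that scales its own weight and a combining node that delegates to its children can be sketched as follows. Sketch::Leaf and Sketch::Or are hypothetical stand-ins playing the roles of a TermCompiler and an ORCompiler; they are not Lucy classes and do not inherit from Lucy::Search::Compiler.

```perl
use strict;
use warnings;

package Sketch::Leaf;    # plays the role of a TermCompiler

sub new {
    my ( $class, $weight ) = @_;
    return bless { weight => $weight }, $class;
}
sub get_weight { $_[0]{weight} }

# A leaf multiplies its own weight by the factor.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $self->{weight} *= $factor;
}

package Sketch::Or;      # plays the role of an ORCompiler

sub new {
    my ( $class, @children ) = @_;
    return bless { children => \@children }, $class;
}

# A combining node passes the factor down to every child.
sub apply_norm_factor {
    my ( $self, $factor ) = @_;
    $_->apply_norm_factor($factor) for @{ $self->{children} };
}

package main;

my @leaves = ( Sketch::Leaf->new(2.0), Sketch::Leaf->new(4.0) );
my $or = Sketch::Or->new( $leaves[0], Sketch::Or->new( $leaves[1] ) );
$or->apply_norm_factor(0.5);
print join( ", ", map { $_->get_weight } @leaves ), "\n";    # prints "1, 2"
```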

normalize

$compiler->normalize();

Take a newly minted Compiler object and apply query-specific normalization factors. Should be invoked by Query subclasses during make_compiler() for top-level nodes.

For a TermQuery, the scoring formula is approximately as follows, where d denotes the document, q the query, and t the term:

(tf_d * idf_t / norm_d) * (tf_q * idf_t / norm_q)

normalize() is theoretically concerned with applying the second half of that formula to the Compiler’s weight. What actually happens depends on how the Compiler and Similarity methods called internally are implemented.
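One plausible reading of that sequence, in plain arithmetic: gather the raw factor, convert it into a multiplier, then apply the multiplier to each weight. The query_norm() helper below is hypothetical, and its 1/sqrt(sum) conversion is an assumption modelled on common TF/IDF practice; the actual conversion is whatever the Similarity in use implements.

```perl
use strict;
use warnings;

# Hypothetical query-norm: maps the raw factor to a multiplier.
# Assumption: 1/sqrt(sum), falling back to 1 when the sum is zero.
sub query_norm {
    my ($sum_of_squares) = @_;
    return $sum_of_squares == 0 ? 1 : 1 / sqrt($sum_of_squares);
}

# Weights of the clauses in one query (for instance, boost * idf per term).
my @weights = ( 3.0, 4.0 );

# Step 1: the raw factor, as sum_of_squared_weights() might report it.
my $sum = 0;
$sum += $_**2 for @weights;

# Step 2: convert the raw factor into a normalization multiplier.
my $factor = query_norm($sum);

# Step 3: what apply_norm_factor() would then do to each weight.
my @normalized = map { $_ * $factor } @weights;
printf "factor=%.2f normalized=%.2f, %.2f\n", $factor, @normalized;
```

With this choice of multiplier the normalized weights have unit Euclidean length, which keeps scores from queries with different numbers of clauses on a comparable scale.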

INHERITANCE

Lucy::Search::Compiler isa Lucy::Search::Query isa Clownfish::Obj.