NAME
KinoSearch::Analysis::Analyzer - Base class for analyzers.
SYNOPSIS
    # abstract base class -- must be subclassed
    package MyAnalyzer;
    use base qw( KinoSearch::Analysis::Analyzer );

    sub analyze {
        my ( $self, $token_batch ) = @_;

        # rewrite the text of every token in the batch
        while ( my $token = $token_batch->next ) {
            my $new_text = transform( $token->get_text );
            $token->set_text($new_text);
        }

        return $token_batch;
    }

    sub transform {
        # ...
    }
DESCRIPTION
In KinoSearch, an Analyzer is a filter which processes text, transforming it from one form into another. For instance, an analyzer might break up a long text into smaller pieces (Tokenizer), or it might convert text to lowercase (LCNormalizer).
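To make the filter idea concrete, here is a minimal sketch of a lowercasing step in the spirit of LCNormalizer. It does not use the real KinoSearch classes: DemoToken, DemoBatch and DemoLowercaser are hypothetical stand-ins written so the example runs on its own. Only next(), get_text() and set_text() mirror the calls shown in the SYNOPSIS; reset() and texts() are assumptions added for the demonstration, and a plain whitespace split stands in for a tokenizer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a token: just a piece of text.
package DemoToken;
sub new      { my ( $class, $text ) = @_; return bless { text => $text }, $class }
sub get_text { return $_[0]->{text} }
sub set_text { $_[0]->{text} = $_[1] }

# Hypothetical stand-in for a token batch: an ordered list with a cursor.
package DemoBatch;
sub new {
    my ( $class, @texts ) = @_;
    my @tokens = map { DemoToken->new($_) } @texts;
    return bless { tokens => \@tokens, pos => 0 }, $class;
}
sub next {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{tokens} };
    return $self->{tokens}[ $self->{pos}++ ];
}
sub reset { $_[0]->{pos} = 0 }    # assumed helper, not taken from the docs
sub texts { return map { $_->get_text } @{ $_[0]->{tokens} } }

# A lowercasing filter: same batch in, same batch out, text changed.
package DemoLowercaser;
sub new { return bless {}, shift }
sub analyze {
    my ( $self, $batch ) = @_;
    while ( my $token = $batch->next ) {
        $token->set_text( lc $token->get_text );
    }
    $batch->reset;    # rewind so a later filter can iterate again
    return $batch;
}

package main;

# Splitting on whitespace stands in for a tokenizer here.
my $batch      = DemoBatch->new( split ' ', 'The Quick BROWN Fox' );
my $result     = DemoLowercaser->new->analyze($batch);
my @lowercased = $result->texts;
print join( ' ', @lowercased ), "\n";    # prints "the quick brown fox"
```

Because each step takes a batch and hands a batch back, steps like these can be applied one after another, which is the sense in which an analyzer acts as a filter.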
SUBCLASSING
All Analyzer subclasses must provide an analyze method.
analyze
analyze() takes a single TokenBatch as input and returns a TokenBatch: either the same one (presumably transformed in some way) or a new one.
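The two return options can be sketched side by side. The classes below are hypothetical stand-ins, not the KinoSearch implementations: SketchToken and SketchBatch exist only so the example runs without the library, and append(), reset() and texts() are assumed helpers. InPlaceUppercaser edits tokens and returns the batch it was given, while ShortWordDropper builds and returns a new batch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; only next/get_text/set_text mirror the synopsis.
package SketchToken;
sub new      { my ( $class, $text ) = @_; return bless { text => $text }, $class }
sub get_text { return $_[0]->{text} }
sub set_text { $_[0]->{text} = $_[1] }

package SketchBatch;
sub new    { my $class = shift; return bless { tokens => [], pos => 0 }, $class }
sub append { push @{ $_[0]->{tokens} }, $_[1] }    # assumed helper
sub next {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{tokens} };
    return $self->{tokens}[ $self->{pos}++ ];
}
sub reset { $_[0]->{pos} = 0 }                     # assumed helper
sub texts { return map { $_->get_text } @{ $_[0]->{tokens} } }

# Option 1: edit each token and hand back the batch that came in.
package InPlaceUppercaser;
sub new { return bless {}, shift }
sub analyze {
    my ( $self, $batch ) = @_;
    while ( my $token = $batch->next ) {
        $token->set_text( uc $token->get_text );
    }
    $batch->reset;
    return $batch;
}

# Option 2: copy the wanted tokens into a fresh batch and return that.
package ShortWordDropper;
sub new { return bless {}, shift }
sub analyze {
    my ( $self, $batch ) = @_;
    my $kept = SketchBatch->new;
    while ( my $token = $batch->next ) {
        $kept->append($token) if length( $token->get_text ) > 2;
    }
    return $kept;
}

package main;

my $input = SketchBatch->new;
$input->append( SketchToken->new($_) ) for qw( to be or not to be );

my $same  = InPlaceUppercaser->new->analyze($input);
my $fresh = ShortWordDropper->new->analyze($same);

print join( ' ', $same->texts ),  "\n";    # prints "TO BE OR NOT TO BE"
print join( ' ', $fresh->texts ), "\n";    # prints "NOT"
```

Callers should therefore keep working with whatever analyze() returns rather than assume the batch they passed in is still the current one.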
COPYRIGHT
Copyright 2005-2007 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.20_01.