NAME

KinoSearch::Analysis::Analyzer - Base class for analyzers.

SYNOPSIS

# abstract base class -- must be subclassed

package MyAnalyzer;
use base qw( KinoSearch::Analysis::Analyzer );

sub analyze_batch {
    my ( $self, $token_batch ) = @_;

    while ( my $token = $token_batch->next ) {
        my $new_text = transform( $token->get_text );
        $token->set_text($new_text);
    }

    return $token_batch;
}

sub transform {
    # ...
}

DESCRIPTION

In KinoSearch, an Analyzer is a filter that processes text, transforming it from one form into another. For instance, an analyzer might break a long text into smaller pieces (Tokenizer), or it might convert text to lowercase (LCNormalizer).
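To make the "filter" idea concrete, here is a minimal sketch in plain Perl that does not use KinoSearch at all. The functions tokenize() and lc_normalize() are hypothetical stand-ins invented for this illustration; they are not how Tokenizer or LCNormalizer are implemented, and they only show one transformation feeding into the next.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tokenizing step (NOT KinoSearch's Tokenizer):
# break a long text into word-like pieces.
sub tokenize {
    my ($text) = @_;
    return [ $text =~ /(\w+)/g ];
}

# Hypothetical stand-in for a lowercasing step (NOT KinoSearch's
# LCNormalizer): convert every piece to lowercase.
sub lc_normalize {
    my ($tokens) = @_;
    return [ map { lc } @$tokens ];
}

# Chain the two filters: the output of one is the input of the next.
my $tokens = lc_normalize( tokenize('Eats, Shoots and Leaves.') );
print join( ' ', @$tokens ), "\n";    # prints "eats shoots and leaves"
```

Each step takes text in one form and hands back another form, which is the role the description above assigns to an Analyzer.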

SUBCLASSING

All Analyzer subclasses must provide an analyze_batch method.
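Perl has no built-in notion of an abstract method, so a requirement like this is usually enforced by convention. The sketch below shows one common convention, a base-class method that dies unless overridden. SketchAnalyzer and IncompleteAnalyzer are hypothetical names made up for this example; whether KinoSearch enforces the requirement in this way is an assumption, not something stated here.

```perl
use strict;
use warnings;

# Hypothetical base class, NOT KinoSearch::Analysis::Analyzer.
package SketchAnalyzer;

sub new {
    my $class = shift;
    return bless {}, $class;
}

# "Abstract" method: complain if a subclass failed to override it.
sub analyze_batch {
    my $self = shift;
    die ref($self) . " must override analyze_batch\n";
}

# A subclass that forgot to provide analyze_batch.
package IncompleteAnalyzer;
our @ISA = ('SketchAnalyzer');

package main;

# Calling the inherited method triggers the error.
my $ok = eval { IncompleteAnalyzer->new->analyze_batch( [] ); 1 };
print $ok ? "no error\n" : "error: $@";
# prints "error: IncompleteAnalyzer must override analyze_batch"
```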

analyze_batch

$token_batch = $analyzer->analyze_batch($token_batch);

Abstract method. analyze_batch() takes a single TokenBatch as input and returns a TokenBatch: either the same one (presumably transformed in some way) or a new one.
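The two return styles can be sketched side by side. To keep the example runnable without KinoSearch installed, it uses MockToken and MockTokenBatch, hypothetical stand-ins that imitate only next(), get_text() and set_text() as they appear in the SYNOPSIS. Their new() and append() methods, and the way next() rewinds once exhausted, are assumptions made for this sketch and may not match the real classes.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT the KinoSearch classes.
package MockToken;
sub new      { my ( $class, $text ) = @_; return bless { text => $text }, $class }
sub get_text { return $_[0]{text} }
sub set_text { $_[0]{text} = $_[1] }

package MockTokenBatch;
sub new    { my $class = shift; return bless { tokens => [], pos => 0 }, $class }
sub append { my ( $self, $token ) = @_; push @{ $self->{tokens} }, $token }

sub next {
    my $self = shift;
    if ( $self->{pos} >= @{ $self->{tokens} } ) {
        $self->{pos} = 0;    # assumption: rewind so the batch can be reused
        return;
    }
    return $self->{tokens}[ $self->{pos}++ ];
}

# Style 1: transform each token in place and return the same batch.
package LowercaseInPlace;
sub new { my $class = shift; return bless {}, $class }

sub analyze_batch {
    my ( $self, $token_batch ) = @_;
    while ( my $token = $token_batch->next ) {
        $token->set_text( lc $token->get_text );
    }
    return $token_batch;
}

# Style 2: build and return a new batch holding only some of the tokens.
package DropShortTokens;
sub new { my $class = shift; return bless {}, $class }

sub analyze_batch {
    my ( $self, $token_batch ) = @_;
    my $new_batch = MockTokenBatch->new;
    while ( my $token = $token_batch->next ) {
        $new_batch->append($token) if length( $token->get_text ) > 2;
    }
    return $new_batch;
}

package main;

my $batch = MockTokenBatch->new;
$batch->append( MockToken->new($_) ) for qw( An Analyzer Is A Filter );

my $same = LowercaseInPlace->new->analyze_batch($batch);
my $new  = DropShortTokens->new->analyze_batch($same);

my @texts;
while ( my $token = $new->next ) { push @texts, $token->get_text }
print "@texts\n";    # prints "analyzer filter"
```

In this sketch $same is the very object that was passed in, while $new is a different batch. Callers therefore should use the return value rather than assume the input was modified.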

COPYRIGHT

Copyright 2005-2007 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.20.