NAME
Lucy::Analysis::Analyzer - Tokenize/modify/filter text.
SYNOPSIS
# Abstract base class.
DESCRIPTION
An Analyzer is a filter which processes text, transforming it from one form into another. For instance, an analyzer might break up a long text into smaller pieces (RegexTokenizer), or it might perform case folding to facilitate case-insensitive search (Normalizer).
CONSTRUCTORS
new
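    package MyAnalyzer;
    use base qw( Lucy::Analysis::Analyzer );

    # Inside-out storage: per-object data is keyed by the object's address.
    our %foo;

    sub new {
        my $self = shift->SUPER::new;
        my %args = @_;
        $foo{$$self} = $args{foo};
        return $self;
    }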
Abstract constructor. Takes no arguments.
ABSTRACT METHODS
transform
    my $inversion = $analyzer->transform($inversion);
Takes a single Inversion as input and returns an Inversion: either the same one (presumably transformed in some way) or a new one.
inversion - An inversion.
METHODS
transform_text
    my $inversion = $analyzer->transform_text($text);
Kick off an analysis chain, creating an Inversion from string input. The default implementation simply creates an initial Inversion with a single Token, then calls transform(), but occasionally subclasses will provide an optimized implementation which minimizes string copies.
text - A string.
split
    my $arrayref = $analyzer->split($text);
Analyze text and return an array of token texts.
text - A string.
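The relationship between transform(), transform_text() and split() can be sketched in plain Perl. This is a mock under stated assumptions, not Lucy's implementation: SketchInversion, SketchAnalyzer and SketchWhitespaceTokenizer are hypothetical stand-ins, tokens are plain hashes rather than Token objects, and the assumption that split() is built on transform_text() is the sketch's own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an Inversion: an ordered list of token hashes.
package SketchInversion;
sub new    { my ($class, @tokens) = @_; return bless { tokens => [@tokens], tick => 0 }, $class }
sub append { my ($self, $token) = @_; push @{ $self->{tokens} }, $token; return }
sub next   { my $self = shift; return $self->{tokens}[ $self->{tick}++ ] }

# Hypothetical stand-in for the abstract base class.
package SketchAnalyzer;
sub new       { my $class = shift; return bless {}, $class }
sub transform { die "transform() is abstract" }

# Default behaviour as described: wrap the whole text in one token,
# then hand the resulting inversion to transform().
sub transform_text {
    my ($self, $text) = @_;
    my $token = { text => $text, start_offset => 0, end_offset => length($text) };
    return $self->transform( SketchInversion->new($token) );
}

# Collect only the token texts (assumed here to go through transform_text).
sub split {
    my ($self, $text) = @_;
    my $inversion = $self->transform_text($text);
    my @texts;
    while ( my $token = $inversion->next ) {
        push @texts, $token->{text};
    }
    return \@texts;
}

# A concrete subclass only has to supply transform().
package SketchWhitespaceTokenizer;
our @ISA = ('SketchAnalyzer');
sub transform {
    my ($self, $inversion) = @_;
    my $result = SketchInversion->new;
    while ( my $token = $inversion->next ) {
        my $text = $token->{text};
        while ( $text =~ /(\S+)/g ) {
            $result->append({
                text         => $1,
                start_offset => $token->{start_offset} + $-[1],
                end_offset   => $token->{start_offset} + $+[1],
            });
        }
    }
    return $result;
}

package main;
my $tokens = SketchWhitespaceTokenizer->new->split("Eats, shoots and leaves.");
print join( '|', @$tokens ), "\n";    # prints "Eats,|shoots|and|leaves."
```

Because the sketch's transform() builds a new inversion instead of editing tokens in place, it illustrates the "or a new one" case of the return value.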
dump
    my $obj = $analyzer->dump();
Dump the analyzer as a hash.
Subclasses should call dump() on the superclass. The returned object is a hash which should be populated with parameters of the analyzer.
Returns: A hash containing a description of the analyzer.
load
    my $obj = $analyzer->load($dump);
Reconstruct an analyzer from a dump.
Subclasses should first call load() on the superclass. The returned object is an analyzer which should be reconstructed by setting the dumped parameters from the hash contained in dump.
Note that the invocant analyzer is unused.
dump - A hash.
Returns: An analyzer.
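The dump()/load() chaining convention can be illustrated with a plain-Perl mock. This is a sketch, not Lucy code: SketchBase and SketchStopFilter are hypothetical classes, objects are ordinary blessed hashes, and the use of a "_class" key to record which class to rebuild is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical base class standing in for the analyzer superclass.
package SketchBase;
sub new { my ($class, %args) = @_; return bless {}, $class }

# The base dump records only the class name (key name is an assumption).
sub dump {
    my $self = shift;
    return { _class => ref($self) };
}

# The base load builds a fresh object of the dumped class.
# The invocant's own state is not consulted.
sub load {
    my ($invocant, $dump) = @_;
    my $class = $dump->{_class};
    return $class->new;
}

# Hypothetical subclass with one parameter worth serializing.
package SketchStopFilter;
our @ISA = ('SketchBase');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new;
    $self->{stoplist} = { %{ $args{stoplist} || {} } };
    return $self;
}

# Call dump() on the superclass first, then add this class's parameters.
sub dump {
    my $self = shift;
    my $dump = $self->SUPER::dump;
    $dump->{stoplist} = { %{ $self->{stoplist} } };
    return $dump;
}

# Call load() on the superclass first, then restore the dumped parameters.
sub load {
    my ($invocant, $dump) = @_;
    my $loaded = $invocant->SUPER::load($dump);
    $loaded->{stoplist} = { %{ $dump->{stoplist} } };
    return $loaded;
}

package main;
my $original = SketchStopFilter->new( stoplist => { the => 1, a => 1 } );
my $dump     = $original->dump;
my $copy     = $original->load($dump);
print ref($copy), ": ", join( ',', sort keys %{ $copy->{stoplist} } ), "\n";
```

Each level of the hierarchy touches only its own parameters, so a deeper subclass can extend the same hash without knowing what its ancestors stored.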
INHERITANCE
Lucy::Analysis::Analyzer isa Clownfish::Obj.