NAME
KinoSearch::Analysis::PolyAnalyzer - Multiple Analyzers in series.
SYNOPSIS
    my $schema = KinoSearch::Plan::Schema->new;
    my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        language => 'en',
    );
    my $type = KinoSearch::Plan::FullTextType->new(
        analyzer => $polyanalyzer,
    );
    $schema->spec_field( name => 'title',   type => $type );
    $schema->spec_field( name => 'content', type => $type );
DESCRIPTION
A PolyAnalyzer is a series of Analyzers, each of which will be called upon to "analyze" text in turn. You can either provide the Analyzers yourself, or you can specify a supported language, in which case a PolyAnalyzer consisting of a CaseFolder, a Tokenizer, and a Stemmer will be generated for you.
Supported languages:
    en => English,
    da => Danish,
    de => German,
    es => Spanish,
    fi => Finnish,
    fr => French,
    hu => Hungarian,
    it => Italian,
    nl => Dutch,
    no => Norwegian,
    pt => Portuguese,
    ro => Romanian,
    ru => Russian,
    sv => Swedish,
    tr => Turkish,
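The "in series" behavior described above can be pictured with a minimal plain-Perl sketch. It does not use KinoSearch itself: case_fold, tokenize, stem, and analyze are hypothetical stand-ins, the token pattern is an assumption chosen for illustration, and the stemmer is a toy suffix stripper rather than a real stemming algorithm.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a CaseFolder, a Tokenizer, and a Stemmer.
# Each stage takes a list of strings and returns a list of strings.
sub case_fold { return map { lc($_) } @_ }

sub tokenize {
    my @tokens;
    # Assumed token pattern: word characters, allowing inner apostrophes.
    push @tokens, $_ =~ /\w+(?:'\w+)*/g for @_;
    return @tokens;
}

sub stem {
    my @stems;
    for my $token (@_) {
        # Toy stemmer: strip a trailing "ing" or "s".
        ( my $stem = $token ) =~ s/(?:ing|s)$//;
        push @stems, $stem;
    }
    return @stems;
}

# Run the stages in series: each stage's output feeds the next stage.
sub analyze {
    my ( $stages, @text ) = @_;
    @text = $_->(@text) for @$stages;
    return @text;
}

my @tokens = analyze( [ \&case_fold, \&tokenize, \&stem ], "Walking the Dogs" );
print join( ' ', @tokens ), "\n";    # prints: walk the dog
```

Every stage here shares the same list-in, list-out shape, which is what lets them be combined freely in a single ordered list.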
CONSTRUCTORS
new( [labeled params] )
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        language => 'es',
    );

    # or...

    my $case_folder  = KinoSearch::Analysis::CaseFolder->new;
    my $tokenizer    = KinoSearch::Analysis::Tokenizer->new;
    my $stemmer      = KinoSearch::Analysis::Stemmer->new( language => 'en' );
    my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
        analyzers => [ $case_folder, $tokenizer, $stemmer ],
    );
language - An ISO code from the list of supported languages.
analyzers - A reference to an array of Analyzers. The order of the analyzers matters. Don't put a Stemmer before a Tokenizer (a Stemmer handles individual words, not whole documents or paragraphs), or a Stopalizer after a Stemmer (stemmed words, e.g. "themselv", will not appear in a stoplist). In general, the sequence should be: normalize, tokenize, stopalize, stem.
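The ordering rule for stoplists can be checked with a small plain-Perl sketch. Nothing here is a KinoSearch class: stopalize, stem, and run_chain are hypothetical stand-ins, the stoplist is made up, and the stemmer is a toy that only strips a trailing "es" or "s". The input is assumed to be already normalized and tokenized.

```perl
use strict;
use warnings;

# Toy stoplist holding unstemmed words, as a real stoplist would.
my %stoplist = ( themselves => 1, by => 1 );

# Hypothetical stand-in for a Stopalizer: drop tokens found in the stoplist.
sub stopalize { return grep { !$stoplist{$_} } @_ }

# Hypothetical stand-in for a Stemmer: strip a trailing "es" or "s".
sub stem {
    my @stems;
    for my $token (@_) {
        ( my $stem = $token ) =~ s/(?:es|s)$//;
        push @stems, $stem;
    }
    return @stems;
}

# Apply each stage in order to the token list.
sub run_chain {
    my ( $stages, @tokens ) = @_;
    @tokens = $_->(@tokens) for @$stages;
    return @tokens;
}

my @input = qw( cats groom themselves );

# Recommended order: stopalize, then stem.
my @good = run_chain( [ \&stopalize, \&stem ], @input );
print "@good\n";    # prints: cat groom

# Wrong order: "themselves" is stemmed to "themselv" before the stoplist
# is consulted, so the stopword is never removed.
my @bad = run_chain( [ \&stem, \&stopalize ], @input );
print "@bad\n";     # prints: cat groom themselv
```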
METHODS
get_analyzers()
Getter for "analyzers" member.
INHERITANCE
KinoSearch::Analysis::PolyAnalyzer isa KinoSearch::Analysis::Analyzer isa KinoSearch::Object::Obj.
COPYRIGHT AND LICENSE
Copyright 2005-2010 Marvin Humphrey
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.