NAME

KinoSearch::Analysis::Tokenizer - customizable tokenizing

SYNOPSIS

my $whitespace_tokenizer
    = KinoSearch::Analysis::Tokenizer->new( token_re => qr/\S+/, );

# or...
my $word_char_tokenizer
    = KinoSearch::Analysis::Tokenizer->new( token_re => qr/\w+/, );

# or...
my $apostrophizing_tokenizer = KinoSearch::Analysis::Tokenizer->new;

# then... once you have a tokenizer, put it into a PolyAnalyzer
my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );

DESCRIPTION

Generically, "tokenizing" is the process of breaking up a string into an array of "tokens".

# before:
my $string = "three blind mice";

# after:
@tokens = qw( three blind mice );

KinoSearch::Analysis::Tokenizer decides where it should break up the text based on the value of token_re.

# before:
my $string = "Eats, Shoots and Leaves.";

# tokenized by $whitespace_tokenizer
@tokens = qw( Eats, Shoots and Leaves. );

# tokenized by $word_char_tokenizer
@tokens = qw( Eats Shoots and Leaves   );
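
Conceptually, each Tokenizer behaves much like a global regex match against the text, collecting every substring that token_re matches. The sketch below is an illustration only, not the module's actual implementation, but it reproduces the token lists above:

# illustration only -- repeatedly apply the pattern and keep the matches
my $text = "Eats, Shoots and Leaves.";
my @whitespace_tokens = $text =~ /\S+/g;   # Eats, Shoots and Leaves.
my @word_char_tokens  = $text =~ /\w+/g;   # Eats Shoots and Leaves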

CONSTRUCTOR

new

my $tokenizer = KinoSearch::Analysis::Tokenizer->new(
    token_re => $matches_one_token, );

Construct a Tokenizer object.

token_re must be a pre-compiled regular expression matching one token. It must not use any capturing parentheses, though non-capturing parentheses are fine:

# match "O'Henry" as well as "Henry" and "it's" as well as "it"
my $token_re = qr/
        \b        # start with a word boundary
        \w+       # match one or more word characters
        (?:       # group, but don't capture...
           '\w+   # ... an apostrophe plus word characters
        )?        # ... and make the apostrophe group optional
        \b        # end with a word boundary
    /xsm;
my $apostrophizing_tokenizer
    = KinoSearch::Analysis::Tokenizer->new( token_re => $token_re, );

Incidentally, the above token_re is the default value.
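
A quick way to see which tokens the default pattern yields is to apply it with Perl's global match operator. This is only a plain-regex illustration, not the Tokenizer's own interface:

# illustration: in list context, a global match with no capturing
# groups returns the whole matches
my @tokens = "O'Henry can't stop" =~ /$token_re/g;
# @tokens now holds qw( O'Henry can't stop )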

COPYRIGHT

Copyright 2005-2006 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch version 0.07.