NAME

KinoSearch1::Analysis::Tokenizer - customizable tokenizing

SYNOPSIS

my $whitespace_tokenizer
    = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\S+/, );

# or...
my $word_char_tokenizer
    = KinoSearch1::Analysis::Tokenizer->new( token_re => qr/\w+/, );

# or...
my $apostrophising_tokenizer = KinoSearch1::Analysis::Tokenizer->new;

# then... once you have a tokenizer, put it into a PolyAnalyzer
my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
    analyzers => [ $lc_normalizer, $word_char_tokenizer, $stemmer ], );

DESCRIPTION

Generically, "tokenizing" is the process of breaking a string up into an array of "tokens".

# before:
my $string = "three blind mice";

# after:
@tokens = qw( three blind mice );

KinoSearch1::Analysis::Tokenizer decides where to break up the text based on the value of token_re: each match of the pattern becomes one token.

# before:
my $string = "Eats, Shoots and Leaves.";

# tokenized by $whitespace_tokenizer
@tokens = qw( Eats, Shoots and Leaves. );

# tokenized by $word_char_tokenizer
@tokens = qw( Eats Shoots and Leaves   );
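The effect of the two patterns above can be reproduced with nothing but core Perl. This is only a sketch of the idea, not the Tokenizer's actual implementation: tokenize_with is a hypothetical helper that collects every non-overlapping match of a pattern, on the assumption that each match corresponds to one token.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch1): return every
# non-overlapping match of $token_re found in $text.
sub tokenize_with {
    my ( $token_re, $text ) = @_;

    # A /g match in list context returns all matches, provided the
    # pattern contains no capturing groups.
    my @tokens = $text =~ /$token_re/g;
    return @tokens;
}

my $string = "Eats, Shoots and Leaves.";

my @by_whitespace = tokenize_with( qr/\S+/, $string );
my @by_word_chars = tokenize_with( qr/\w+/, $string );

print join( '|', @by_whitespace ), "\n";    # Eats,|Shoots|and|Leaves.
print join( '|', @by_word_chars ), "\n";    # Eats|Shoots|and|Leaves
```

With \S+ the punctuation stays attached to the neighbouring word, while \w+ drops it, which matches the two token lists shown above.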

METHODS

new

# match "O'Henry" as well as "Henry" and "it's" as well as "it"
my $token_re = qr/
        \b        # start with a word boundary
        \w+       # Match word chars.
        (?:       # Group, but don't capture...
           '\w+   # ... an apostrophe plus word chars.
        )?        # Matching the apostrophe group is optional.
        \b        # end with a word boundary
    /xsm;
my $tokenizer = KinoSearch1::Analysis::Tokenizer->new(
    token_re => $token_re, # default: what you see above
);

Constructor. Takes one hash-style parameter.

token_re - must be a pre-compiled regular expression (built with qr//) matching one token. If omitted, the apostrophe-aware pattern shown above is used.
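To see what the default pattern does with apostrophes, it can be tried directly against a string in plain Perl. This is an illustrative sketch: the sample sentence is made up, and a global match stands in for the Tokenizer itself, whose internals may differ.

```perl
use strict;
use warnings;

# The default pattern as documented above.
my $token_re = qr/
        \b        # start with a word boundary
        \w+       # word chars
        (?:       # group without capturing
           '\w+   # an apostrophe plus word chars
        )?        # the apostrophe group is optional
        \b        # end with a word boundary
    /xsm;

# Made-up sample text for illustration.
my $string = "O'Henry said it's fine.";

# Collect every match; the only group is non-capturing, so a
# list-context global match returns the whole matches.
my @tokens = $string =~ /$token_re/g;

print join( '|', @tokens ), "\n";    # O'Henry|said|it's|fine
```

An internal apostrophe followed by word characters stays inside the token, while the trailing period is dropped because it is not a word character.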

COPYRIGHT

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.01.
