NAME
Lingua::FreeLing2::Tokenizer - Interface to FreeLing2 Tokenizer
SYNOPSIS
use Lingua::FreeLing2::Tokenizer;
my $pt_tok = Lingua::FreeLing2::Tokenizer->new("pt");
# compute a reference to a list of Lingua::FreeLing2::Word objects
my $list_of_words = $pt_tok->tokenize("texto e mais texto.");
# compute a reference to a list of strings (words)
my $list_of_strings = $pt_tok->tokenize("texto e mais texto.",
to_text => 1);
DESCRIPTION
Interface to the FreeLing2 tokenizer library.
new
Object constructor. One argument is required: either the language code (in which case Lingua::FreeLing2
will search for the tokenization data file) or the full or relative path to the tokenization data file.
Returns the tokenizer object for that language, or undef on failure.
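Because the constructor returns undef on failure, callers will usually want to check the result before using it. The following is a minimal sketch of that check. The helper name make_tokenizer is hypothetical and is not part of the module.

```perl
use strict;
use warnings;
use Lingua::FreeLing2::Tokenizer;

# Hypothetical helper (not part of the module): build a tokenizer
# or die with a readable message when new() returns undef.
sub make_tokenizer {
    my ($lang_or_path) = @_;
    my $tok = Lingua::FreeLing2::Tokenizer->new($lang_or_path);
    defined $tok
        or die "Could not create a tokenizer for '$lang_or_path'\n";
    return $tok;
}

# A language code is used here; a full or relative path to a
# tokenization data file is accepted by new() as well.
my $pt_tok = make_tokenizer("pt");
```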
tokenize
This is the only method available on the tokenizer object. It receives a string, tokenizes the text, and returns a reference to a list of words.
By default, the result is a reference to a list of Lingua::FreeLing2::Word objects. If the to_text option
is set (for example, to_text => 1), the result is a reference to a list of strings instead.
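Both call styles return an array reference, so the result has to be dereferenced before it can be iterated. The sketch below illustrates this with the two forms described above. It only counts the Word objects, since their accessor methods are not covered in this document.

```perl
use strict;
use warnings;
use Lingua::FreeLing2::Tokenizer;

my $pt_tok = Lingua::FreeLing2::Tokenizer->new("pt")
    or die "Could not create the Portuguese tokenizer\n";

my $text = "texto e mais texto.";

# Default call: reference to a list of Lingua::FreeLing2::Word objects.
my $words = $pt_tok->tokenize($text);
printf "%d word objects\n", scalar @$words;

# With to_text => 1: reference to a list of plain strings.
my $strings = $pt_tok->tokenize($text, to_text => 1);
print "$_\n" for @$strings;
```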
SEE ALSO
Lingua::FreeLing2(3) for the documentation table of contents, the FreeLing library documentation for extra information, and perl(1).
AUTHOR
Alberto Manuel Brandão Simões <ambs@cpan.org>
Jorge Cunha Mendes <jorgecunhamendes@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2011 by Projecto Natura