NAME

Lingua::FreeLing2::Tokenizer - Interface to FreeLing2 Tokenizer

SYNOPSIS

use Lingua::FreeLing2::Tokenizer;

my $pt_tok = Lingua::FreeLing2::Tokenizer->new("pt");

# compute list of Lingua::FreeLing2::Word
my $list_of_words = $pt_tok->tokenize("texto e mais texto.");

# compute list of strings (words)
my $list_of_strings = $pt_tok->tokenize("texto e mais texto.",
                                        to_text => 1);

DESCRIPTION

Interface to the FreeLing2 tokenizer library.

new

Object constructor. One argument is required: either the language code (in which case Lingua::FreeLing2 will search for the tokenization data file) or the full or relative path to the tokenization data file.

Returns the tokenizer object for that language, or undef in case of failure.
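
Since the constructor returns undef on failure, it is worth checking the result before calling any method on it. A minimal sketch, assuming the "pt" data file is installed where Lingua::FreeLing2 looks for it (the variable names and the error message are illustrative only):

```perl
use strict;
use warnings;
use Lingua::FreeLing2::Tokenizer;

# "pt" is the language code from the SYNOPSIS; a path to a tokenization
# data file could be passed instead.
my $lang = "pt";
my $tok  = Lingua::FreeLing2::Tokenizer->new($lang);

# new() returns undef on failure, so check before calling any method.
die "Could not create a tokenizer for '$lang'\n" unless defined $tok;
```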

tokenize

This is the only method available on the tokenizer object. It receives a string and tokenizes it, returning a reference to a list of words.

By default, it returns a reference to a list of Lingua::FreeLing2::Word objects. If the to_text option is set to a true value, it returns a reference to a list of strings instead.
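
Because the result is a reference, it has to be dereferenced before iterating. The following sketch uses the to_text option to get plain strings and prints one token per line; it assumes the "pt" tokenizer can be built, and the exact tokens produced depend on the FreeLing data files:

```perl
use strict;
use warnings;
use Lingua::FreeLing2::Tokenizer;

my $tok = Lingua::FreeLing2::Tokenizer->new("pt")
    or die "Could not create a tokenizer for 'pt'\n";

# With to_text => 1 the result is a reference to a list of strings.
my $words = $tok->tokenize("texto e mais texto.", to_text => 1);

# Dereference the array reference to walk over the tokens.
print "$_\n" for @$words;
```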

SEE ALSO

Lingua::FreeLing2(3) for the documentation table of contents, the FreeLing library documentation for extra information, and perl(1).

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

Jorge Cunha Mendes <jorgecunhamendes@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Projecto Natura