NAME
Lingua::FreeLing2::Splitter - Interface to FreeLing2 Splitter
SYNOPSIS
use Lingua::FreeLing2::Splitter;
use Lingua::FreeLing2::Tokenizer;
my $pt_tok = Lingua::FreeLing2::Tokenizer->new("pt");
my $pt_split = Lingua::FreeLing2::Splitter->new("pt");
# tokenize the text into a list of Lingua::FreeLing2::Word objects
my $list_of_words = $pt_tok->tokenize( $text );
my $list_of_sentences = $pt_split->split($list_of_words);
DESCRIPTION
Interface to the FreeLing2 splitter library.
new
Object constructor. One argument is required: either the language code (in which case Lingua::FreeLing2
searches for the splitter data file) or the full or relative path to the splitter data file.
Returns the splitter object for that language, or undef on failure.
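Because the constructor signals failure by returning undef rather than by throwing, callers may want to check the result before using it. The following is a minimal sketch only: new_or_die is a hypothetical helper, not part of Lingua::FreeLing2, and it assumes the class has already been loaded by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::FreeLing2): call the
# constructor and die with a readable message if it returns undef.
sub new_or_die {
    my ($class, $lang_or_path) = @_;

    # $lang_or_path is either a language code or a data file path,
    # as accepted by the constructor described above.
    my $obj = $class->new($lang_or_path);
    defined $obj
        or die "Could not create $class for '$lang_or_path'\n";

    return $obj;
}
```

After loading Lingua::FreeLing2::Splitter, this could be called as new_or_die('Lingua::FreeLing2::Splitter', 'pt').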
split
This is the only method available on the splitter object. It receives a reference to a list of Lingua::FreeLing2::Word objects (such a list can be obtained from Lingua::FreeLing2::Tokenizer) and splits the text into a list of sentences.
With no options, it returns a reference to a list of Lingua::FreeLing2::Sentence objects. If the to_text option is set,
it instead returns a reference to a list of strings, in which the words/tokens are separated by a single space.
$list_of_sentences = $pt_split->split($list_of_words, to_text => 1 )
The buffered option can also be set to 0 if the method should not buffer tokens while processing. The default is to buffer.
$list_of_sentences = $pt_split->split($list_of_words, buffered => 0 )
NOTE: Before exiting, your application should call the split method with buffering disabled (buffered => 0), so that all of the text is actually processed!
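One way to follow this advice when text arrives in several pieces is to leave buffering on for every chunk except the last, and to disable it on the final call. The sketch below is illustrative only: split_chunks is a hypothetical helper, not part of Lingua::FreeLing2, and it assumes the tokenizer and splitter objects behave as described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::FreeLing2): tokenize and
# split a sequence of text chunks, disabling buffering on the last
# chunk so that no pending tokens are left unprocessed.
sub split_chunks {
    my ($tokenizer, $splitter, @chunks) = @_;
    my @sentences;

    for my $i (0 .. $#chunks) {
        my $words = $tokenizer->tokenize($chunks[$i]);

        # Keep the default (buffered) behaviour for all chunks but
        # the last one, where buffered => 0 is passed explicitly.
        my %opts;
        $opts{buffered} = 0 if $i == $#chunks;

        my $found = $splitter->split($words, %opts);
        push @sentences, @$found;
    }

    # Reference to the collected sentences, in order of appearance.
    return \@sentences;
}
```

With the objects from the SYNOPSIS, this could be called as split_chunks($pt_tok, $pt_split, @paragraphs), where @paragraphs is a placeholder for the application's own input.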
SEE ALSO
Lingua::FreeLing2(3) for the documentation table of contents, the FreeLing library documentation for extra information, and perl(1) itself.
AUTHOR
Alberto Manuel Brandão Simões, <ambs@cpan.org>
Jorge Cunha Mendes <jorgecunhamendes@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2011 by Projecto Natura