NAME

Lingua::FreeLing2::Splitter - Interface to FreeLing2 Splitter

SYNOPSIS

use Lingua::FreeLing2::Splitter;
use Lingua::FreeLing2::Tokenizer;

my $pt_tok = Lingua::FreeLing2::Tokenizer->new("pt");
my $pt_split = Lingua::FreeLing2::Splitter->new("pt");

# compute a list of Lingua::FreeLing2::Word objects
my $list_of_words = $pt_tok->tokenize( $text );
my $list_of_sentences = $pt_split->split($list_of_words);

DESCRIPTION

Interface to the FreeLing2 splitter library.

new

Object constructor. One argument is required: either the language code (Lingua::FreeLing2 will then search for the splitter data file), or the full or relative path to the splitter data file.

Returns the splitter object for that language, or undef in case of failure.
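Because the constructor returns undef on failure, it is worth checking the result before using it. The following is a minimal sketch; build_or_die is a hypothetical helper (not part of Lingua::FreeLing2) that wraps any constructor with this contract, and the error message is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::FreeLing2: call $class->new($arg)
# and die with a readable message if the constructor returns undef.
sub build_or_die {
    my ($class, $arg) = @_;
    my $obj = $class->new($arg);
    die sprintf("Could not construct %s from '%s'\n", $class, $arg)
        unless defined $obj;
    return $obj;
}

# Usage with the real module (requires FreeLing2 and its data files):
#   use Lingua::FreeLing2::Splitter;
#   my $pt_split = build_or_die('Lingua::FreeLing2::Splitter', 'pt');
```

For a one-off call, the same check can be written inline as Lingua::FreeLing2::Splitter->new("pt") or die "...".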

split

This is the only available method for the splitter object. It receives a list of Lingua::FreeLing2::Word objects (such a list can be obtained with Lingua::FreeLing2::Tokenizer) and groups the words into a list of sentences.

Without any further configuration option, it returns a reference to a list of Lingua::FreeLing2::Sentence objects. If the to_text option is set, it returns a reference to a list of strings instead, where the words/tokens of each sentence are separated by a single space.

$list_of_sentences = $pt_split->split($list_of_words, to_text => 1);

The buffered option can be set to 0 if the method should not buffer tokens while processing. The default is to buffer.

$list_of_sentences = $pt_split->split($list_of_words, buffered => 0);

NOTE: Before your application exits, call the split method once with buffering disabled (buffered => 0), so that any tokens still held in the buffer are actually processed.
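The buffering behaviour matters mostly when text arrives in several pieces. The sketch below illustrates one way to apply it: every chunk except the last is split with the default (buffered) behaviour, and the last chunk is split with buffered => 0 so that pending tokens are flushed. split_chunks is a hypothetical helper, not part of the module; it only uses the tokenize and split calls documented above and assumes split returns a list reference on every call.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::FreeLing2: tokenize and split a
# sequence of text chunks, disabling buffering only on the final chunk so
# that tokens still held by the splitter are flushed.
sub split_chunks {
    my ($tokenizer, $splitter, @chunks) = @_;
    my @sentences;
    for my $i (0 .. $#chunks) {
        my $words = $tokenizer->tokenize($chunks[$i]);
        # Default (buffered) call for all chunks but the last one.
        my @opts   = $i == $#chunks ? (buffered => 0) : ();
        my $result = $splitter->split($words, @opts);
        # Assumption: guard against an undefined result.
        push @sentences, @{ $result || [] };
    }
    return \@sentences;
}

# Usage with the real modules (requires FreeLing2 and its data files):
#   use Lingua::FreeLing2::Tokenizer;
#   use Lingua::FreeLing2::Splitter;
#   my $pt_tok   = Lingua::FreeLing2::Tokenizer->new("pt");
#   my $pt_split = Lingua::FreeLing2::Splitter->new("pt");
#   my $sentences = split_chunks($pt_tok, $pt_split, @paragraphs);
```

Collecting the results of every call, rather than only the last one, keeps the sentences that the splitter already completed while it was still buffering.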

SEE ALSO

Lingua::FreeLing2(3) for the documentation table of contents, the FreeLing library documentation for extra information, or perl(1) itself.

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

Jorge Cunha Mendes <jorgecunhamendes@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Projecto Natura