Lingua::PT::PLN - Perl extension for simple natural language processing of the Portuguese language
use Lingua::PT::PLN;
# occurrence counter
%o = oco("file");
printPNstring({ %options... } ,$textstring);
printPNstring([ @options... ] ,$textstring);
forPN( sub{my ($pn, $context)=@_;... } ) ;
forPN( {t=>"double"}, sub{my ($pn, $context)=@_;... }, sub{...} ) ;
forPNstring(sub{my ($pn, $context)=@_;... } ,$textstring, regsep) ;
$st = syllable($phrase);
$s = accent($phrase);
$s = wordaccent($word);
$s = xmlsentences($textstring);
$s = xmlsentences({st=>"frase"},$textstring);
@s = sentences($textstring);
perl -MLingua::PT::PLN -e 'cqptokens("file")' > out
This module provides simple natural language processing tools for the Portuguese language.
Occurrence counting: oco
Counts word occurrences in a string or in a set of files. Returns a hash with the counts, or writes the sorted results to a file.
This function optionally takes a hash of options as its first argument, where you can specify:
- num => 1
means the output should be sorted by occurrence count;
- alpha => 1
means the output should be sorted lexicographically;
- output => "f"
means the output will be written to the file "f";
- from => "string"
means that the next argument (after the option hash) is a string which should be used as input for the function;
- from => "file"
means that remaining arguments to the function are filenames which should be used as input for the function. This is the default option.
oco({num=>1,output=>"f"}, "f1","f2")
# sort by occurrence
# store output on file "f"
# process files "f1" and "f2"
oco({alpha=>1,output=>"f"}, "f1","f2")
# sort lexicographically
# store output on file "f"
# process files "f1" and "f2"
%oc = oco("f1","f2")
# return a hash with the occurrences
# use "f1" and "f2" as input files
%oc = oco( {from=>"string"},"text in a string")
# use a string as input
# return a hash with the occurrences
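The idea behind the hash returned by oco can be sketched in plain Perl. This is only an illustration, not the module's implementation: count_words is a hypothetical helper, the word pattern (runs of Unicode letters) is an assumption, and so is the descending order used to mimic num => 1.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (NOT part of Lingua::PT::PLN): count word
# occurrences in a string, similar in spirit to oco({from=>"string"}, ...).
sub count_words {
    my ($text) = @_;
    my %count;
    # Assumption: a word is a run of letters, accented ones included.
    $count{$1}++ while $text =~ /(\p{L}+)/g;
    return %count;
}

my %oc = count_words("a casa da Maria e a casa do João");

# In the spirit of num => 1: most frequent first (assumed direction),
# ties broken by string comparison.
my @by_count = sort { $oc{$b} <=> $oc{$a} || $a cmp $b } keys %oc;

# In the spirit of alpha => 1: plain lexicographic order.
my @by_alpha = sort keys %oc;

binmode(STDOUT, ":encoding(UTF-8)");
print "$_\t$oc{$_}\n" for @by_count;
```

Whether oco itself normalizes case or treats hyphenated words as one token is not stated here, so the sketch leaves both untouched.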
forPN( $funref )
Substitutes every proper name with funref(propername), reading from STDIN and writing to STDOUT.
Optionally, you can pass {t => "full"} as the first parameter to also obtain names that appear after a ".".
forPN({in=> inputfile(stdin), out => file(stdout)}, sub{...})
forPN({sep=>"\n", t=>"normal"}, sub{...})
forPN({sep=>'', t=>"double"}, sub{...}, sub{...})
forPNstring( $funref, "textstring" [, regSeparator] )
Substitutes every proper name with funref(propername) in the text string.
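A rough picture of this kind of substitution, written without the module: the sketch below treats a run of capitalized words as a proper name and skips the first word of the text and any word right after ".", "!" or "?". The substitute_names helper and its heuristic are assumptions made for illustration; the module's own detection rules are not described here, and the real callback also receives a context argument that this sketch omits.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for forPNstring (NOT the module's code).
sub substitute_names {
    my ($f, $text) = @_;
    $text =~ s{
        (?!\A)                  # skip the very first word of the text
        (?<![.!?]\s)            # skip words that open a new sentence
        \b(\p{Lu}\p{Ll}+(?:\s\p{Lu}\p{Ll}+)*)
    }{ $f->($1) }gex;
    return $text;
}

my $marked = substitute_names(
    sub { "<pn>$_[0]</pn>" },
    "O Pedro viu a Ana Silva. Ela sorriu.",
);
print "$marked\n";
# prints: O <pn>Pedro</pn> viu a <pn>Ana Silva</pn>. Ela sorriu.
```

Skipping sentence-initial words avoids tagging ordinary capitalized words such as "Ela", at the cost of missing names in that position.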
syllable( $phrase )
Returns the phrase with the syllables separated by "|"
accent( $phrase )
Returns the phrase with the syllables separated by "|" and accents marked with the character ".
cqptokens - encodes a text from STDIN for CQP (one token per line)
sentences - ....
xmlsentences - ....
By default, sentences are marked with the tag "s". To change this, use the optional st parameter. Example:
xmlsentences({st=> "tag"}, text)
marks sentences with the tag "tag".
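The effect of the st option can be pictured with a small stand-alone sketch. wrap_sentences is a hypothetical helper, not the module's code, and its splitting rule (whitespace after ".", "!" or "?") is an assumption that will mis-split abbreviations; only the default tag "s" and the st option name come from the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for xmlsentences (NOT the module's code).
sub wrap_sentences {
    # The option hash is optional, as in xmlsentences.
    my $opt = ref $_[0] eq 'HASH' ? shift : {};
    my ($text) = @_;
    my $tag = $opt->{st} // "s";    # "s" is the documented default tag
    # Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
    my @sentences = split /(?<=[.!?])\s+/, $text;
    return join "\n", map { "<$tag>$_</$tag>" } @sentences;
}

print wrap_sentences("Bom dia. Tudo bem?"), "\n";
print wrap_sentences({ st => "frase" }, "Bom dia. Tudo bem?"), "\n";
```

The first call wraps each sentence in "s" tags; the second uses "frase" instead.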
José João Almeida
Alberto Simões
Paulo Rocha
Thanks to Diana Santos.