NAME

Lingua::PT::PLN - Perl extension for simple natural language processing of the Portuguese language

SYNOPSIS

use Lingua::PT::PLN;

# occurrence counter
%o = oco("file");
oco({num=>1,output=>"outfile"},"file");

printPN(@options);
printPNstring({ %options... } ,$textstring);
printPNstring([ @options... ] ,$textstring);

forPN( sub{my ($pn, $context)=@_;... } ) ;
forPN( {t=>"double"}, sub{my ($pn, $context)=@_;... }, sub{...} ) ;

forPNstring(sub{my ($pn, $context)=@_;... } ,$textstring, $regsep) ;

$st = syllable($phrase);
$s = accent($phrase);
$s = wordaccent($word);

$s = xmlsentences($textstring);
$s = xmlsentences({st=>"frase"},$textstring);
@s = sentences($textstring);


perl -MLingua::PT::PLN -e 'cqptokens("file")' > out

DESCRIPTION

This module provides simple Natural Language Processing functions for Portuguese.

Occurrence counting: `oco`

Counts word occurrences in a string or a set of files. Returns a hash with the counts, or writes a sorted file with the results.

This function optionally takes a hash of options as its first argument, where you can specify:

num => 1: the output is sorted by occurrence count;
alpha => 1: the output is sorted lexicographically;
output => "f": the output is written to the file "f";
from => "string": the next argument (after the option hash) is a string to be used as input;
from => "file": the remaining arguments are filenames to be used as input. This is the default.

Examples:

oco({num=>1,output=>"f"}, "f1","f2")
# sort by occurrence
# store output on file "f"
# process files "f1" and "f2"

oco({alpha=>1,output=>"f"}, "f1","f2")
# sort lexicographically
# store output on file "f"
# process files "f1" and "f2"

%oc = oco("f1","f2")
# return a hash with the occurrences
# use "f1" and "f2" as input files

%oc = oco( {from=>"string"},"text in a string")
# use a string as input
# return a hash with the occurrences
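
To make the two sort orders concrete, here is a minimal pure-Perl sketch of occurrence counting. It does not use the module: `count_words` is a hypothetical helper, and both the tokenisation (runs of Unicode letters) and the lowercasing are assumptions of this sketch, not documented behaviour of `oco`.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper, NOT part of Lingua::PT::PLN:
# count word occurrences in a string.
sub count_words {
    my ($text) = @_;
    my %count;
    # \p{L}+ matches runs of Unicode letters, so accented words stay whole.
    # Lowercasing is an assumption of this sketch.
    $count{lc $1}++ while $text =~ /(\p{L}+)/g;
    return %count;
}

my %oc = count_words("O gato viu o rato e o rato fugiu");

# In the spirit of num => 1: highest count first, ties broken alphabetically.
my @by_count = sort { $oc{$b} <=> $oc{$a} || $a cmp $b } keys %oc;

# In the spirit of alpha => 1: plain lexicographic order.
my @by_alpha = sort keys %oc;

print "$_\t$oc{$_}\n" for @by_count;
```

With the sample sentence, "o" is counted three times and "rato" twice, so they lead the count-sorted list, while the alphabetical list starts with "e".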

`forPN( $funref )`

Replaces every proper name read from STDIN with funref(propername) and sends the result to STDOUT.

Optionally, you can pass {t => "full"} as the first parameter to also obtain names that follow a ".".

forPN({in=> inputfile(stdin), out => file(stdout)}, sub{...})
forPN({sep=>"\n", t=>"normal"}, sub{...})
forPN({sep=>'', t=>"double"}, sub{...}, sub{...})

`forPNstring( $funref, "textstring" [, regSeparator] )`

Replaces every proper name in the text string with funref(propername).
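
The callback-driven substitution can be pictured with the following pure-Perl sketch. It does not call the module: `replace_names` is a hypothetical stand-in, and its proper-name heuristic (a run of capitalised words that is not sentence-initial) is an assumption made for illustration, much cruder than whatever the module actually does.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for forPNstring, NOT the module's implementation.
sub replace_names {
    my ($callback, $text) = @_;
    # Assumed heuristic: a run of capitalised words preceded by a lowercase
    # letter or comma plus a space, so sentence-initial words are skipped.
    $text =~ s/(?<=[\p{Ll},] )(\p{Lu}\p{Ll}+(?: \p{Lu}\p{Ll}+)*)/$callback->($1)/ge;
    return $text;
}

# The callback receives the matched name and returns its replacement.
my $out = replace_names(
    sub { my ($pn) = @_; return "<pn>$pn</pn>" },
    "Ontem o Paulo Rocha visitou Braga.",
);

print "$out\n";
```

Here "Paulo Rocha" and "Braga" are wrapped in `<pn>` tags, while the sentence-initial "Ontem" is left alone.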

`printPNstring(options)`

printPN("oco")

printPNstring("oco")

`syllable( $phrase )`

Returns the phrase with the syllables separated by "|"

`accent( $phrase )`

Returns the phrase with the syllables separated by "|" and accents marked with the character '"'.

`cqptokens()`

cqptokens - encodes a text from STDIN for CQP (one token per line)
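
As an illustration of the "one token per line" layout, here is a small pure-Perl sketch. `tokens` is a hypothetical helper and its tokenisation rules (words, digit runs, and single punctuation marks) are assumptions of this sketch; the module's own tokeniser may split text differently.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical tokeniser, NOT the one used by cqptokens.
sub tokens {
    my ($text) = @_;
    # A token is a run of letters, a run of digits,
    # or any single character that is neither (e.g. punctuation).
    return $text =~ /(\p{L}+|\d+|[^\s\p{L}\d])/g;
}

my @tok = tokens("Bom dia, mundo!");

# One token per line, as expected by CQP-style input.
print "$_\n" for @tok;
```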

`sentences()`

sentences - ....

`xmlsentences()`

xmlsentences - ....

By default, sentences are marked with the tag "s". To change this, use the optional st parameter. Example:

xmlsentences({st=> "tag"}, text)

to mark sentences with tag "tag".
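
The effect of the st option can be sketched in pure Perl as follows. `tag_sentences` is a hypothetical function, and the naive splitting rule (break after ".", "!" or "?" followed by whitespace) is an assumption of this sketch; it ignores abbreviations and other cases a real splitter has to handle.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the module's xmlsentences.
sub tag_sentences {
    my ($opt, $text) = @_;
    my $tag = $opt->{st} // 's';    # default tag is "s"
    # Naive split: after ., ! or ? followed by whitespace.
    my @sentences = split /(?<=[.!?])\s+/, $text;
    return join "\n", map { "<$tag>$_</$tag>" } @sentences;
}

my $xml = tag_sentences({ st => 'frase' }, "Bom dia. Como vai? Tudo bem!");
print "$xml\n";
```

Passing an empty option hash falls back to the default tag, so each sentence is wrapped in `<s>` instead of `<frase>`.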

AUTHOR

José João Almeida (jj@di.uminho.pt)

Alberto Simões (albie@alfarrabio.di.uminho.pt)

Paulo Rocha (paulo.rocha@di.uminho.pt)

Thanks to

Diana Santos
