NAME
Text::Statistics::Latin - Performs statistical analyses of corpora
SYNOPSIS
use Text::Statistics::Latin;
&Text::Statistics::Latin::LATIN();
DESCRIPTION
Text::Statistics::Latin writes a seven-column CSV file containing one line per token for each text in the input corpus. The corpus file names must follow the sequence '1 (1).txt', '1 (2).txt', ..., '1 (n).txt', that is, they must match 1 \(([1-9]|[1-9][0-9]+)\)\.txt
The columns store the following statistical information:
(1) number of word forms in document d;
(2) number of tokens in d;
(3) Id number of d, i.e., n;
(4) frequency of term t in d;
(5) corpus frequency of t;
(6) document frequency of t (the number of documents in which t occurs at least once);
(7) t, the token string (UTF-8, Latin).
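The seven statistics can be illustrated on a toy corpus held in memory. The following is only a minimal sketch, not the module's own implementation: the whitespace tokenizer, the lowercasing, the comma separator, the choice of one row per distinct term per document, and the names tokenize and rows are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Toy corpus keyed by document id (hypothetical data; the module itself
# reads files named '1 (n).txt').
my %docs = (
    1 => 'arma virumque cano',
    2 => 'arma arma cano troiae',
);

# Assumption: tokens are whitespace-separated and lowercased.
sub tokenize {
    my ($text) = @_;
    return grep { length $_ } split /\s+/, lc $text;
}

my (%tf, %cf, %df, %tokens, %forms);
for my $id (keys %docs) {
    my @t = tokenize($docs{$id});
    $tokens{$id} = scalar @t;                 # column 2: tokens in d
    $tf{$id}{$_}++ for @t;                    # column 4: frequency of t in d
    $forms{$id} = scalar keys %{ $tf{$id} };  # column 1: word forms in d
    for my $term (keys %{ $tf{$id} }) {
        $cf{$term} += $tf{$id}{$term};        # column 5: corpus frequency of t
        $df{$term}++;                         # column 6: document frequency of t
    }
}

# Build one row per distinct term per document, in the column order
# given in the description.
sub rows {
    my @rows;
    for my $id (sort { $a <=> $b } keys %docs) {
        for my $term (sort keys %{ $tf{$id} }) {
            push @rows, join ',',
                $forms{$id}, $tokens{$id}, $id,
                $tf{$id}{$term}, $cf{$term}, $df{$term}, $term;
        }
    }
    return @rows;
}

print "$_\n" for rows();
```

Counting the corpus and document frequencies in the same pass as the per-document counts keeps the sketch short, but it means rows can only be produced once every document has been read.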
The main output file is named '1 (n + 5).txt' and is stored in the same directory as the corpus itself, together with residual files for each input file, with .txu and .txv extensions.
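Because output and residual files end up next to the corpus, it may help to pick out the files that follow the corpus naming pattern from a directory listing. The sketch below is a hedged illustration only: the ^ and $ anchors, the helper name corpus_id, and the listing are assumptions and not part of the module. Note also that if the main output file name has the same '1 (k).txt' form, it would match the pattern too, so a name filter alone may not separate it from the inputs.

```perl
use strict;
use warnings;

# Pattern quoted in the description, anchored here (an assumption) so that
# the whole file name must match.
my $corpus_re = qr/^1 \(([1-9]|[1-9][0-9]+)\)\.txt$/;

# Return the document number n for a corpus-style file name, or undef.
sub corpus_id {
    my ($name) = @_;
    return $name =~ $corpus_re ? $1 : undef;
}

# Hypothetical directory listing, for illustration only.
my @listing = (
    '1 (1).txt', '1 (2).txt', '1 (2).txu', '1 (2).txv',
    '1 (10).txt', '1 (01).txt', 'notes.txt',
);

# Keep only the names that match, and sort their numbers numerically.
my @ids = sort { $a <=> $b }
          grep { defined $_ }
          map  { corpus_id($_) } @listing;

print "corpus ids: @ids\n";
```

The residual .txu and .txv files and the zero-padded name are rejected because the pattern requires the .txt extension and a number without a leading zero.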
This code was written under CAPES BEX-09323-5.
Methods
Example:
#!/usr/bin/perl
use strict;
use Text::Statistics::Latin;
&Text::Statistics::Latin::LATIN("5"); # 4 files (5 - 1) are analysed.