NAME
Text::TFIDF::Ngram - Compute the TF-IDF measure for ngram phrases
VERSION
version 0.02
SYNOPSIS
use Text::TFIDF::Ngram;
use Data::Dumper;
my @files = qw( foo.txt bar.txt );
my $obj = Text::TFIDF::Ngram->new(
files => \@files,
size => 3,
stopwords => 1,
);
my $x = $obj->tf( 'foo.txt', 'foo' );
$x = $obj->idf('foo');
$x = $obj->tfidf( 'foo.txt', 'foo' );
$x = $obj->tfidf_by_file;
print Dumper $x;
DESCRIPTION
The TF-IDF ("term frequency-inverse document frequency") measure is used in information retrieval and text mining. It is a statistical measure of how important a word or phrase is to a document within a collection of documents.
ATTRIBUTES
files
ArrayRef of filenames.
size
Integer ngram phrase size.
stopwords
Boolean indicating that phrases with stopwords will be ignored. Default is 1.
counts
HashRef of the ngram counts of each processed file.
METHODS
new
$obj = Text::TFIDF::Ngram->new(
files => \@files,
size => $size,
stopwords => $stopwords,
);
Create a new Text::TFIDF::Ngram object. If the files argument is passed in, populates the object using those files.
The size is the number of words in an ngram phrase and defaults to 2.
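To make the notion of phrase size concrete, here is a minimal sketch in core Perl of how a text can be split into ngram phrases. The ngrams helper is hypothetical and not part of Text::TFIDF::Ngram; the lowercasing and the \w+ word pattern are assumptions made for illustration, and the module's own tokenization may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::TFIDF::Ngram:
# lowercase the words of $text and return every run of $size consecutive words.
sub ngrams {
    my ( $text, $size ) = @_;
    my @words = map { lc $_ } $text =~ /(\w+)/g;
    return () if @words < $size;    # too short to hold a single phrase
    return map { join ' ', @words[ $_ .. $_ + $size - 1 ] } 0 .. @words - $size;
}

# With size 2, four words yield three overlapping phrases:
# "the quick", "quick brown", "brown fox"
print "$_\n" for ngrams( 'The quick brown fox', 2 );
```

A text of n words yields n - size + 1 phrases, so larger sizes produce fewer, more specific phrases.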
BUILD
Loads the phrase counts of the given files.
tf
$tf = $obj->tf( $file, $phrase );
Returns the frequency of the given phrase in the document file. This is not the "raw count" of the phrase, but rather its relative frequency: how often it is seen compared with all the phrases in the file.
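The difference between a raw count and a relative frequency can be sketched as follows. The relative_frequency helper and the phrase => count HashRef it takes are assumptions for illustration only; the module's internal computation and scale (fraction versus percentage) are not confirmed here.

```perl
use strict;
use warnings;
use List::Util qw( sum0 );

# Hypothetical helper, not part of Text::TFIDF::Ngram:
# $counts is a HashRef of phrase => raw count for a single file.
sub relative_frequency {
    my ( $counts, $phrase ) = @_;
    my $total = sum0( values %$counts );
    return 0 unless $total && exists $counts->{$phrase};
    return $counts->{$phrase} / $total;
}

my %counts = ( 'quick brown' => 3, 'brown fox' => 1 );

# 3 of the 4 phrase occurrences, so this prints 0.75
printf "%.2f\n", relative_frequency( \%counts, 'quick brown' );
```

Dividing by the total makes values comparable between long and short files, which a raw count is not.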
idf
$idf = $obj->idf($phrase);
Returns the inverse document frequency of a phrase.
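As a rough illustration, one common textbook definition of inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the phrase. The sketch below assumes that definition and a hypothetical corpus HashRef of file => { phrase => count }; the module's exact formula (log base, smoothing) is not stated in this document and may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::TFIDF::Ngram:
# $corpus is a HashRef of file => { phrase => raw count }.
sub inverse_document_frequency {
    my ( $corpus, $phrase ) = @_;
    my $n_docs = keys %$corpus;
    my $n_with = grep { exists $corpus->{$_}{$phrase} } keys %$corpus;
    return unless $n_with;    # phrase is in no file: no defined value
    return log( $n_docs / $n_with );    # natural logarithm
}

my %corpus = (
    'foo.txt' => { 'quick brown' => 3, 'brown fox' => 1 },
    'bar.txt' => { 'brown fox'   => 2 },
);

# "quick brown" is in 1 of 2 files, so this prints log(2), about 0.693
printf "%.3f\n", inverse_document_frequency( \%corpus, 'quick brown' );
```

Under this definition a phrase found in every file scores 0, and rarer phrases score higher.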
tfidf
$tfidf = $obj->tfidf( $file, $phrase );
Computes the TF-IDF weight for the given document file and phrase. If the file is not in the corpus used to populate the object, undef is returned.
tfidf_by_file
$tfidf = $obj->tfidf_by_file;
Constructs and returns a HashRef of all files with all phrases and their TF-IDF values.
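The shape of such a per-file table can be sketched in core Perl. The weights_by_file helper, its file => { phrase => count } input, and the weight formula (relative frequency times the natural log of documents over documents containing the phrase) are all assumptions for illustration; the actual structure and values returned by tfidf_by_file may differ.

```perl
use strict;
use warnings;
use List::Util qw( sum0 );

# Hypothetical helper, not part of Text::TFIDF::Ngram:
# $corpus is a HashRef of file => { phrase => raw count }.
# Returns a HashRef of file => { phrase => weight }.
sub weights_by_file {
    my ($corpus) = @_;
    my $n_docs = keys %$corpus;

    # Number of files in which each phrase occurs
    my %doc_freq;
    for my $counts ( values %$corpus ) {
        $doc_freq{$_}++ for keys %$counts;
    }

    my %weights;
    for my $file ( keys %$corpus ) {
        my $counts = $corpus->{$file};
        my $total  = sum0( values %$counts ) or next;    # skip empty files
        for my $phrase ( keys %$counts ) {
            my $tf  = $counts->{$phrase} / $total;
            my $idf = log( $n_docs / $doc_freq{$phrase} );
            $weights{$file}{$phrase} = $tf * $idf;
        }
    }
    return \%weights;
}

my $weights = weights_by_file(
    {
        'foo.txt' => { 'quick brown' => 3, 'brown fox' => 1 },
        'bar.txt' => { 'brown fox'   => 2 },
    }
);

# Print one line per file and phrase, sorted for stable output
for my $file ( sort keys %$weights ) {
    printf "%s\t%s\t%.3f\n", $file, $_, $weights->{$file}{$_}
        for sort keys %{ $weights->{$file} };
}
```

In this sample "brown fox" occurs in both files, so its weight is 0 everywhere, while "quick brown" is specific to foo.txt and receives the only positive weight.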
SEE ALSO
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
AUTHOR
Gene Boggs <gene@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2018 by Gene Boggs.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.