NAME

Text::TFIDF::Ngram - Compute the TF-IDF measure for ngram phrases

VERSION

version 0.01

SYNOPSIS

use Data::Dumper;
use Text::TFIDF::Ngram;
my @files = qw( foo.txt bar.txt );
my $size  = 3;
my $obj   = Text::TFIDF::Ngram->new( files => \@files, size => $size );
my $tfidf = $obj->tfidf_by_file;
print Dumper $tfidf;

DESCRIPTION

The TF-IDF ("term frequency-inverse document frequency") measure is used in information retrieval and text mining. It is a statistical measure of how important a word (here, an ngram phrase) is to a document within a collection of documents.

ATTRIBUTES

files

ArrayRef of filenames.

size

Integer ngram phrase size.

counts

HashRef of the ngram counts of each processed file.
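As a rough illustration of the kind of structure this attribute might hold, the sketch below builds per-document ngram counts from two in-memory strings rather than files. The ngram_counts helper, the naive tokenizer, and the two-level "document => phrase => count" layout are assumptions made for this example, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: count word ngrams of a given size in a string.
# Tokenization here is naive (lowercase, split on non-word characters).
sub ngram_counts {
    my ( $text, $size ) = @_;
    my @words = grep { length } split /\W+/, lc $text;
    my %count;
    for my $i ( 0 .. @words - $size ) {
        $count{ join ' ', @words[ $i .. $i + $size - 1 ] }++;
    }
    return \%count;
}

# Assumed layout: document name => { phrase => count }.
my %docs = (
    'foo.txt' => 'the cat sat on the mat the cat ran',
    'bar.txt' => 'a dog sat on the log',
);
my %counts = map { $_ => ngram_counts( $docs{$_}, 2 ) } sort keys %docs;

# Show the bigram counts gathered for one document.
printf "%s: %d\n", $_, $counts{'foo.txt'}{$_} for sort keys %{ $counts{'foo.txt'} };
```

With a size of 2, the phrase "the cat" appears twice in the first sample string, so it is the only bigram there with a count above 1.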

METHODS

new

$obj = Text::TFIDF::Ngram->new( files => \@files, size => $size );

Create a new Text::TFIDF::Ngram object. If the files argument is passed in, populates the object using those files.

The size is the number of words in an ngram phrase and defaults to 2.

BUILD

Load the phrase counts for the given files.

tf

$tf = $obj->tf( $file, $phrase );

Returns the frequency of the given phrase in the document file. This is not the "raw count" of the phrase, but rather its relative frequency: the proportion of times it is seen in that file.
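One common way to turn a raw count into a relative frequency is to divide the phrase's count by the total count of all ngrams in the same file. The sketch below assumes that definition; the tf_sketch function and the sample %counts data are hypothetical and may not match how the module computes its value.

```perl
use strict;
use warnings;
use List::Util qw( sum0 );

# Hypothetical bigram counts for one document (not real data).
my %counts = (
    'foo.txt' => { 'the cat' => 2, 'cat sat' => 1, 'sat on' => 1 },
);

# Assumed definition: phrase count / total ngram count in the file.
# Unknown files and unseen phrases both yield 0 in this sketch.
sub tf_sketch {
    my ( $counts, $file, $phrase ) = @_;
    my $in_file = $counts->{$file} or return 0;
    my $total   = sum0( values %$in_file ) or return 0;
    return ( $in_file->{$phrase} // 0 ) / $total;
}

print tf_sketch( \%counts, 'foo.txt', 'the cat' ), "\n";    # 2 of 4 ngrams: prints 0.5
```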

idf

$idf = $obj->idf($phrase);

Returns the inverse document frequency of a phrase.
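For orientation, the textbook inverse document frequency is the logarithm of the number of documents divided by the number of documents that contain the phrase. The sketch below uses the natural log and no smoothing; the idf_sketch function and its sample data are hypothetical, and the module may use a different log base or handle unseen phrases differently.

```perl
use strict;
use warnings;

# Hypothetical counts for a three-document corpus (not real data).
my %counts = (
    'a.txt' => { 'the cat' => 2, 'cat sat' => 1 },
    'b.txt' => { 'the cat' => 1, 'a dog'   => 3 },
    'c.txt' => { 'a dog'   => 1 },
);

# Textbook definition: natural log of (documents / documents containing phrase).
# Returns undef when no document contains the phrase (avoids dividing by zero).
sub idf_sketch {
    my ( $counts, $phrase ) = @_;
    my $docs = keys %$counts;
    my $with = grep { exists $counts->{$_}{$phrase} } keys %$counts;
    return $with ? log( $docs / $with ) : undef;
}

printf "%.4f\n", idf_sketch( \%counts, 'cat sat' );    # log(3/1), prints 1.0986
```

A phrase found in only one document scores higher than one found in two, which is the property that lets IDF discount phrases common to the whole corpus.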

tfidf

$tfidf = $obj->tfidf( $file, $phrase );

Computes the TF-IDF weight for the given document and phrase. If the file is not in the corpus used to populate the object, undef is returned.
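In the usual textbook formulation, the weight is the product of the two quantities. The notation below is an assumption made for illustration: c_{p,d} is the count of phrase p in document d, and N is the number of documents in the corpus. The module may differ in details such as log base or smoothing.

```latex
\mathrm{tfidf}(p, d) = \mathrm{tf}(p, d) \cdot \mathrm{idf}(p),
\qquad
\mathrm{tf}(p, d) = \frac{c_{p,d}}{\sum_{q} c_{q,d}},
\qquad
\mathrm{idf}(p) = \log \frac{N}{\lvert \{ d' : c_{p,d'} > 0 \} \rvert}
```

Under this formulation a phrase that occurs in every document has an IDF of log(1) = 0, so its TF-IDF weight is 0 regardless of how often it appears.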

tfidf_by_file

$tfidf = $obj->tfidf_by_file;

Constructs a HashRef of all files, with all of their phrases and the corresponding TF-IDF values.
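A typical use of such a result is to list the highest-scoring phrases per file. The sketch below assumes the HashRef is keyed by file name, with each value mapping a phrase to its score; that shape, the top_phrases helper, and the scores themselves are hypothetical and made up for the example.

```perl
use strict;
use warnings;

# Hypothetical result in the assumed shape; the scores are made up.
my $tfidf = {
    'foo.txt' => { 'the cat' => 0.12, 'cat sat' => 0.35, 'sat on' => 0.05 },
    'bar.txt' => { 'a dog'   => 0.27, 'dog sat' => 0.27 },
};

# Return the top $n phrases of one file, highest score first,
# breaking ties alphabetically so the order is stable.
sub top_phrases {
    my ( $scores, $n ) = @_;
    my @sorted = sort { $scores->{$b} <=> $scores->{$a} or $a cmp $b } keys %$scores;
    $n = @sorted if $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}

for my $file ( sort keys %$tfidf ) {
    print "$file: ", join( ', ', top_phrases( $tfidf->{$file}, 2 ) ), "\n";
}
```

If the real return value has this shape, the same loop can be run directly on the result of $obj->tfidf_by_file.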

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
