NAME
Lingua::JA::TFIDF - TF/IDF calculator based on MeCab.
SYNOPSIS
use Lingua::JA::TFIDF;
use Data::Dumper;
my $calc = Lingua::JA::TFIDF->new(%config);
# calculate TF/IDF and return a result object.
my $result = $calc->tfidf($text);
print Dumper $result->list;
# dump the result object.
print Dumper $result->dump;
# or calculate just TF
print Dumper $calc->tf($text)->list;
DESCRIPTION
* This software is still an alpha release *
Lingua::JA::TFIDF is a TF/IDF calculator based on MeCab. It ships with a DF (Document Frequency) data set that was fetched beforehand from the Yahoo Search API.
METHODS
new(%config)
Instantiates a new Lingua::JA::TFIDF object. All of the following parameters are optional.
my $calc = Lingua::JA::TFIDF->new(
    df_file         => 'my_df_file',           # default is undef
    ng_word         => \@original_ngword,      # default is undef
    fetch_df        => 1,                      # default is undef
    fetch_df_save   => 'my_df_file',           # default is undef
    LWP_UserAgent   => \%lwp_useragent_config, # default is undef
    XML_TreePP      => \%xml_treepp_config,    # default is undef
    yahoo_api_appid => $myid,                  # default is undef
);
tfidf($text);
Calculates the TF/IDF score. If the text includes unknown words, their Document Frequency score is replaced with the average score of the known words. If you pass a true value for the fetch_df parameter to the constructor, the calculator instead fetches the Document Frequency of unknown words from the Yahoo Search API.
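The fallback for unknown words can be illustrated with a small standalone sketch. It does not use or reproduce the module's internals: the helper name tfidf_with_fallback, the formula tf * log(N / df), the total document count $n, and the choice to average over the known words found in the text are all assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper, NOT part of Lingua::JA::TFIDF.
# $tf: hashref of word => term frequency in the text
# $df: hashref of word => document frequency (known words only)
# $n : assumed total number of documents behind the DF data
sub tfidf_with_fallback {
    my ($tf, $df, $n) = @_;

    # Average DF over the known words that occur in this text
    # (assumption: the module may average over a different set).
    my @known  = grep { exists $df->{$_} } keys %$tf;
    my $avg_df = @known
        ? sum(map { $df->{$_} } @known) / scalar(@known)
        : 1;

    my %score;
    for my $word (keys %$tf) {
        # Unknown words borrow the average DF of the known words.
        my $word_df = exists $df->{$word} ? $df->{$word} : $avg_df;
        $score{$word} = $tf->{$word} * log($n / $word_df);
    }
    return \%score;
}

my $scores = tfidf_with_fallback(
    { perl => 2, mecab => 1, foo => 1 },   # 'foo' has no DF entry
    { perl => 100, mecab => 10 },
    1000,
);
printf "%s\t%.3f\n", $_, $scores->{$_} for sort keys %$scores;
```

Here 'foo' is scored with a DF of 55, the mean of the two known values, so it is neither treated as extremely rare nor dropped.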
tf($text);
Calculates the TF (Term Frequency) score only, without IDF weighting.
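As a rough illustration of what a term frequency is, the sketch below counts occurrences in a list of words that has already been tokenized. In the module the tokenization is done by MeCab; the helper name term_frequency and the use of raw counts (rather than normalized frequencies) are assumptions, not the module's documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Lingua::JA::TFIDF.
# Counts how often each token occurs in a pre-tokenized list.
sub term_frequency {
    my (@tokens) = @_;
    my %tf;
    $tf{$_}++ for @tokens;
    return \%tf;
}

# Romanized stand-ins for tokens a morphological analyzer might emit.
my $tf = term_frequency(qw(neko inu neko tori));
print "$_\t$tf->{$_}\n" for sort keys %$tf;
```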
ng_word
Accessor method. Use it to replace the NG word list.
mecab
Internal accessor method.
df_data
Internal accessor method.
fetcher
Internal accessor method.
AUTHOR
Takeshi Miki <miki@cpan.org>
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.