NAME

Lingua::JA::TFIDF - TF/IDF calculator based on MeCab.

SYNOPSIS

use Lingua::JA::TFIDF;
use Data::Dumper;

my $calc = Lingua::JA::TFIDF->new(%config);

# calculate TF/IDF and return a result object.
my $result = $calc->tfidf($text);
print Dumper $result->list;

# dump the result object.
print Dumper $result->dump;

# or calculate just TF
print Dumper $calc->tf($text)->list;

DESCRIPTION

* This software is still in alpha release *

Lingua::JA::TFIDF is a TF/IDF calculator based on MeCab. It ships with a DF (Document Frequency) data set that was fetched beforehand from the Yahoo Search API.

METHODS

new(%config)

Instantiates a new Lingua::JA::TFIDF object. All of the following parameters are optional.

my $calc = Lingua::JA::TFIDF->new(
  df_file         => 'my_df_file',           # default is undef
  ng_word         => \@original_ngword,      # default is undef
  fetch_df        => 1,                      # default is undef
  fetch_df_save   => 'my_df_file',           # default is undef
  LWP_UserAgent   => \%lwp_useragent_config, # default is undef
  XML_TreePP      => \%xml_treepp_config,    # default is undef
  yahoo_api_appid => $myid,                  # default is undef
);

tfidf($text);

Calculates the TF/IDF score. If the text includes unknown words, the Document Frequency score of each unknown word is replaced with the average score of the known words. If you set the fetch_df parameter to a true value in the constructor, the calculator fetches the Document Frequency of unknown words from the Yahoo Search API.
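The average-score fallback for unknown words can be illustrated with a small stand-alone sketch. It does not use this module's internals: the function name tfidf_with_fallback, the $total_docs argument and the textbook weighting tf * log(N / df) are all assumptions made for illustration, and the module's actual formula may differ.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper, not part of Lingua::JA::TFIDF.
# $tf:         hashref of word => term frequency in the text
# $df:         hashref of word => document frequency (known words only)
# $total_docs: assumed size of the document collection
sub tfidf_with_fallback {
    my ($tf, $df, $total_docs) = @_;

    # Words from the text that have a DF entry.
    my @known = grep { exists $df->{$_} } keys %$tf;

    # Average DF of the known words; use 1 if no word is known.
    my $avg_df = @known ? sum(map { $df->{$_} } @known) / @known : 1;

    my %score;
    for my $word (keys %$tf) {
        # Unknown words borrow the average DF of the known words.
        my $word_df = exists $df->{$word} ? $df->{$word} : $avg_df;
        $score{$word} = $tf->{$word} * log($total_docs / $word_df);
    }
    return \%score;
}

my $scores = tfidf_with_fallback(
    { perl => 2, mecab => 1, unknownword => 1 },   # term frequencies
    { perl => 100, mecab => 300 },                 # known DF values
    1000,                                          # assumed collection size
);

printf "%s\t%.3f\n", $_, $scores->{$_} for sort keys %$scores;
```

Here "unknownword" has no DF entry, so it is scored with the average DF of "perl" and "mecab". With fetch_df enabled, the module would instead look the value up through the Yahoo Search API.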

tf($text);

Calculates the TF score.

ng_word

Accessor method. You can use it to replace the NG word list.

mecab

Internal accessor method.

df_data

Internal accessor method.

fetcher

Internal accessor method.

AUTHOR

Takeshi Miki <miki@cpan.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO