
NAME

Lingua::JA::TFIDF - TFIDF Calculator based on MeCab.

SYNOPSIS

  use Lingua::JA::TFIDF;
  use Data::Dumper;

  my $calc   = Lingua::JA::TFIDF->new(%config);

  # calculate TFIDF and return a result object.
  my $result = $calc->tfidf($text);
  print Dumper $result->list;

  # or calculate just TF
  print Dumper $calc->tf($text)->list;

  # dump the result object.
  print Dumper $result->dump;

DESCRIPTION

* This software is still in alpha release *

Lingua::JA::TFIDF is a TFIDF calculator based on MeCab. It ships with a DF (Document Frequency) data set that was fetched beforehand from the Yahoo Search API.

METHODS

new(%config)

Instantiates a new Lingua::JA::TFIDF object. All of the following parameters are optional.

  my $calc = Lingua::JA::TFIDF->new(
    df_file         => 'my_df_file',           # default is undef
    ng_word         => \@original_ngword,      # default is undef
    fetch_df        => 1,                      # default is undef
    fetch_df_save   => 'my_df_file',           # default is undef
    LWP_UserAgent   => \%lwp_useragent_config, # default is undef
    XML_TreePP      => \%xml_treepp_config,    # default is undef
    yahoo_api_appid => $myid,                  # default is undef
  );

tfidf($text);

Calculates the TFIDF score of $text. If the text includes unknown words, the Document Frequency of each unknown word is replaced with the average Document Frequency of the known words. If you pass a true value for the fetch_df parameter to the constructor, the calculator fetches the Document Frequency of unknown words from the Yahoo Search API.
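The fallback for unknown words can be illustrated with a small standalone sketch. This is not the module's code: the helper name tfidf_with_fallback, the corpus size $n, and the tf * log(N / df) weighting are assumptions made for illustration, and the average is assumed to be taken over the known words that occur in the text. The module's actual formula may differ.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper, not part of Lingua::JA::TFIDF.
#   $tf : hashref of word => term frequency in the text
#   $df : hashref of word => document frequency (known words only)
#   $n  : assumed total number of documents behind the DF data
sub tfidf_with_fallback {
    my ($tf, $df, $n) = @_;

    # Average DF over the words of this text that have a known DF.
    my @known  = grep { exists $df->{$_} } keys %$tf;
    my $avg_df = @known ? sum(map { $df->{$_} } @known) / @known : 1;

    my %score;
    for my $word (keys %$tf) {
        # Unknown words borrow the average DF of the known words.
        my $word_df = exists $df->{$word} ? $df->{$word} : $avg_df;
        $score{$word} = $tf->{$word} * log($n / $word_df);
    }
    return \%score;
}

my $scores = tfidf_with_fallback(
    { perl => 3, mecab => 1, zzz => 2 },   # 'zzz' has no DF entry
    { perl => 100, mecab => 300 },
    1_000_000,                             # assumed corpus size
);
printf "%s\t%.3f\n", $_, $scores->{$_} for sort keys %$scores;
```

In this sketch an unknown word is treated as if it were as common as the average known word, so it is scored neither as extremely rare nor as extremely frequent.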

tf($text);

Calculates TF score.

ng_word

Accessor method. Use it to replace the NG word list.

df_data

Inner accessor method.

fetcher

Inner accessor method.

AUTHOR

Takeshi Miki <miki@cpan.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO