Changes for version 0.07 - 2002-02-15

  • Major improvements to NaiveBayes: its results are now on par with (in fact slightly better than) the Naive Bayes results in Yang's "Re-Examination" paper.
  • Corrected a floating-point underflow problem in NaiveBayes that occurred with long documents
  • When categorizing, NaiveBayes now correctly skips words that weren't in any training documents.
  • Added the threshold() accessor method to NaiveBayes
  • Fixed a crash that occurred when no stopwords were specified
  • Improved the formatting of the output for AI::Categorize::Evaluate
  • reuters-21578, features_kept => 0.1
      Summary
      Name            miR    miP    miF1   error  time
      01-NaiveBayes:  0.824  0.883  0.839  0.005  407 sec
  • drmath-1.00, features_kept => 0.1
      Summary
      Name            miR    miP    miF1   error  time
      01-NaiveBayes:  0.323  0.397  0.341  0.016  145 sec
      01-kNN:         0.636  0.144  0.220  0.078  990 sec   <- features_kept => 0.2
      01-kNN:         0.606  0.149  0.223  0.073  1221 sec  <- features_kept => 0.1
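
The underflow and unseen-word fixes listed above can be illustrated with a short sketch. It is written in Python for illustration only and is not the module's Perl code; the function names `log_score` and `naive_product` and the sample probabilities are hypothetical. It assumes the usual remedy for underflow, which is to sum log-probabilities instead of multiplying raw probabilities, and it skips any word that has no entry in the training data.

```python
import math


def naive_product(doc_words, word_probs, prior):
    """Multiply raw probabilities; underflows to 0.0 on long documents."""
    result = prior
    for word in doc_words:
        p = word_probs.get(word)
        if p is None:
            continue  # word never seen in training: skip it
        result *= p
    return result


def log_score(doc_words, word_probs, prior):
    """Sum log-probabilities instead, which stays finite on long documents."""
    score = math.log(prior)
    for word in doc_words:
        p = word_probs.get(word)
        if p is None:
            continue  # word never seen in training: skip it
        score += math.log(p)
    return score


# Hypothetical per-word probabilities for one category (illustrative values).
word_probs = {"ball": 1e-3, "game": 2e-3}

# A long document; "zzz" stands in for a word absent from all training documents.
long_doc = ["ball", "game", "zzz"] * 200

raw = naive_product(long_doc, word_probs, prior=0.5)
logged = log_score(long_doc, word_probs, prior=0.5)

print("raw product:", raw)
print("log score:  ", logged)
```

Because the logarithm is monotonic, comparing log scores across categories ranks them in the same order as comparing the raw products would, so nothing is lost by staying in log space.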

Modules

Automatically categorize documents based on content
Automate and compare AI::Categorize modules
Naive Bayes Algorithm For AI::Categorize
Base class for other algorithms
k-Nearest-Neighbor Algorithm For AI::Categorize
