AI-Categorize-0.06

Changes for version 0.06

Fixed a bug which resulted in incorrect probabilities in NaiveBayes categorize() calculations.
Threshold for Naive Bayes categorizer is now a settable parameter, letting you tune performance to balance precision and recall to suit your needs. Default threshold is 0.3 (used to be fixed at 0.5).
Added the precision() and recall() methods, which give two more measures of how good a categorizer is.
Wrote documentation for the VectorBased superclass - the previous text was left over from the kNN module's docs (oops).
No changes made to the kNN categorizer - however, the precision and recall scores below clearly show that some changes are needed. The main problem is the setting of thresholds; I've done some work in this area that has already improved scores, but it's not ready yet.
Current scores on the drmath-1.00 corpus with features_kept => 0.1:
Summary

  Name            miR    miP    miF1   error  time
  01-NaiveBayes   0.226  0.280  0.239  0.018    79 sec   <- threshold=0.3
  01-NaiveBayes   0.161  0.213  0.176  0.017    93 sec   <- threshold=0.5
  02-kNN          0.650  0.109  0.178  0.105  2069 sec

  miR  = micro-avg. recall    miP   = micro-avg. precision
  miF1 = micro-avg. F1        error = micro-avg. error rate
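
For readers unfamiliar with the measures above, here is a minimal sketch of how a score threshold interacts with micro-averaged precision, recall and F1. It is not the AI::Categorize implementation or API: it is written in Python purely for illustration, the function names (assign, micro_average) and the toy scores are hypothetical, and the module may compute its averages differently, so these formulas are not guaranteed to reproduce the figures reported above.

```python
from typing import Dict, List, Set, Tuple

# One evaluated document: (category -> score, set of correct categories).
Doc = Tuple[Dict[str, float], Set[str]]


def assign(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Assign every category whose score reaches the threshold (assumed rule)."""
    return {cat for cat, score in scores.items() if score >= threshold}


def micro_average(docs: List[Doc], threshold: float) -> Tuple[float, float, float]:
    """Pool true/false positives and false negatives over all documents,
    then compute precision, recall and F1 from the pooled counts."""
    tp = fp = fn = 0
    for scores, truth in docs:
        predicted = assign(scores, threshold)
        tp += len(predicted & truth)   # assigned and correct
        fp += len(predicted - truth)   # assigned but wrong
        fn += len(truth - predicted)   # correct but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data, invented for this example only.
docs: List[Doc] = [
    ({"algebra": 0.90, "geometry": 0.40, "calculus": 0.10}, {"algebra", "geometry"}),
    ({"algebra": 0.35, "geometry": 0.60, "calculus": 0.20}, {"geometry"}),
    ({"algebra": 0.20, "geometry": 0.10, "calculus": 0.45}, {"calculus"}),
]

for threshold in (0.3, 0.5):
    p, r, f = micro_average(docs, threshold)
    print(f"threshold={threshold}: miP={p:.3f} miR={r:.3f} miF1={f:.3f}")
```

On this toy data the lower threshold assigns more categories, which raises recall and lowers precision. Real corpora need not behave so neatly, which is one reason to keep the threshold tunable.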

Automatically categorize documents based on content

Automate and compare AI::Categorize modules

Naive Bayes Algorithm For AI::Categorize

Base class for other algorithms

k-Nearest-Neighbor Algorithm For AI::Categorize

To install AI::Categorize, copy and paste the appropriate command into your terminal.

cpanm AI::Categorize

perl -MCPAN -e shell
install AI::Categorize

For more information on module installation, please visit the detailed CPAN module installation guide.
