Changes for version 0.12 - 2004-02-24
- tag() now tags with the reserved category "UNKNOWN" if no category meets the probability threshold. This guarantees that every message passed to tag() receives a header.
- parse() augmented with a separate tokenize() function
- parse() now tags 'to', 'from', 'subject', and 'mailer' tokens with context
- tokenize() extracts href/src/mailto hosts and addresses, strips all HTML tags, decodes HTML entities, and handles words with embedded punctuation much more intelligently. It also strips ">>>" forwarding symbols.
- prediction now uses the Robinson-Fisher inverse chi-squared method to combine individual word predictors
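
The Robinson-Fisher combination mentioned in the last item can be sketched as follows. This is an illustrative Python version of the commonly published formulation, not this module's Perl code: the function names `chi2q` and `fisher_combine` and the clamping constant are assumptions made for the example, and the changelog does not say how the module itself implements the step.

```python
import math


def chi2q(x2, dof):
    """Upper-tail probability of a chi-squared value, for even dof only."""
    if dof < 2 or dof % 2 != 0:
        raise ValueError("dof must be an even integer >= 2")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed-form series: exp(-m) * sum(m**i / i!) for i in 0 .. dof/2 - 1
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs, eps=1e-9):
    """Combine per-word probabilities into a single score in [0, 1].

    Each entry of probs is the estimated probability that a message
    containing that word belongs to the category. eps (an assumed value)
    keeps the logarithms finite when a probability is exactly 0 or 1.
    """
    if not probs:
        return 0.5  # no evidence either way
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    n = len(clamped)
    # Evidence for the category: small when many probabilities are near 1
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clamped), 2 * n)
    # Evidence against the category: small when many are near 0
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clamped), 2 * n)
    return (s - h + 1.0) / 2.0
```

A score near 1 means the words jointly point towards the category, a score near 0 means they point away from it, and a score near 0.5 means the evidence is absent or conflicting. That middle band is the kind of case a probability threshold would leave without a confident category.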
Modules
Perl extension for probabilistic mail classification
spam classification based on Paul Graham's algorithm
a trivial subclass example
Examples
- examples/corpora/README
- examples/corpora/sa-nonspam.mbox
- examples/corpora/sa-spam.mbox
- examples/graham-test.pl
- examples/spamometer/README
- examples/spamometer/new-spamometer.pl
- examples/spamometer/procmailrc
- examples/spamometer/tag-message.pl
- examples/spamometer/train-spamometer.pl
- examples/trivial-test.pl