NAME

Mail::Classifier::Trivial - a trivial subclass example

SYNOPSIS

use Mail::Classifier::Trivial;
$bb = Mail::Classifier::Trivial->new();
$bb->train( { SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );
%xval = $bb->crossval(2, .8, {SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );

ABSTRACT

Mail::Classifier::Trivial is a trivial subclass implementation.

DESCRIPTION

This class demonstrates an example of subclassing Mail::Classifier to actually classify mail. It provides crude random categorization based on training set category frequencies.
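As a rough illustration of what "category frequencies" means here, the sketch below turns per-category message counts into relative frequencies. The helper name category_probabilities and the plain hash of counts are assumptions made for this example; they are not part of the Mail::Classifier::Trivial interface.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::Classifier::Trivial):
# convert per-category message counts into relative frequencies.
sub category_probabilities {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return {} unless $total > 0;    # nothing has been learned yet
    return { map { $_ => $counts->{$_} / $total } keys %$counts };
}

# Example: 30 spam and 70 non-spam training messages.
my $probs = category_probabilities( { SPAM => 30, NOTSPAM => 70 } );
printf "%-8s %.2f\n", $_, $probs->{$_} for sort keys %$probs;
```

A classifier that guesses from these frequencies alone ignores message content entirely, which is why it is useful only as a subclassing example.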

METHODS THAT ARE EXTENDED IN THIS SUBCLASS

* new 
* init
* forget
* isvalid
* parse
* learn
* score

new [OPTIONS|FILENAME|CLASSIFIER]

Create a new classifier object, setting any class options by passing a hash reference of key/value pairs. Alternatively, new can be called with the filename of a previously saved classifier, or with another classifier object, in which case the classifier is cloned, duplicating all data and data files.

$bb = Mail::Classifier->new();
$bb = Mail::Classifier->new( { OPTION1 => 'foo', OPTION2 => 'bar' } );
$bb = Mail::Classifier->new( "/tmp/saved-classifier" );
$cc = Mail::Classifier->new( $bb );

This subclass method has no additional options and only adds a data table to use for frequency counting. Though it doesn't really need to, this subclass uses an MLDBM::Sync file.

init

Called during new to initialize the class with data tables.

$self->init( {%options} );

forget

Blanks out the frequency data.

$bb->forget;

isvalid MESSAGE

Confirm that a message can be handled -- e.g. text vs attachment, etc. MESSAGE is a Mail::Message object. In this subclass version, all messages are still valid.

$bb->isvalid($msg);

parse MESSAGE

Breaks up a message into tokens -- this is just a stub for where/how class extensions should place parsing. In this subclass, no parsing takes place and the function is still a stub.

$bb->parse($msg);

learn CATEGORY, MESSAGE
unlearn CATEGORY, MESSAGE

learn processes a message as an example of a category according to some algorithm. MESSAGE is a Mail::Message.

unlearn reverses the process, for example to "unlearn" a message that has been falsely classified.

In this subclass, these functions only update a frequency count of messages by category.
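The counting can be pictured with a minimal sketch like the one below. It uses an in-memory hash in place of the MLDBM::Sync-backed data table, and the names %freq, learn_count and unlearn_count are hypothetical. The guard that keeps counts from going below zero is also an assumption of this sketch, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the frequency table; the real subclass
# keeps its counts in an MLDBM::Sync-backed data table.
my %freq;

# Record one more message seen for $category; returns the new count.
sub learn_count {
    my ($category) = @_;
    $freq{$category}++;
    return $freq{$category};
}

# Undo one learn_count call; returns the new count.
# Assumption: counts never drop below zero.
sub unlearn_count {
    my ($category) = @_;
    return 0 unless $freq{$category};
    $freq{$category}--;
    delete $freq{$category} if $freq{$category} == 0;
    return $freq{$category} // 0;
}

learn_count('SPAM') for 1 .. 3;
learn_count('NOTSPAM');
unlearn_count('SPAM');    # one SPAM message was misfiled
print "$_: $freq{$_}\n" for sort keys %freq;
```

Note that the message itself is never inspected: only the category label affects the table.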

$bb->learn('SPAM', $msg);
$bb->unlearn('SPAM', $msg);

score MESSAGE

Takes a message and returns a list of categories and probabilities in descending order of probability. MESSAGE is a Mail::Message object.

In this subclass, score returns a single category chosen at random.

($best_cat, $best_cat_prob, @rest) = $bb->score($msg);
%probs = $bb->score($msg);
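One way to picture a random choice weighted by training frequencies is the sketch below. It is not the module's actual implementation: pick_category is a hypothetical helper, the optional $r argument exists only so a draw can be reproduced, and returning the chosen category's relative frequency as its "probability" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: pick one category with probability proportional
# to its count. $r is a number in [0, 1); it defaults to rand().
sub pick_category {
    my ( $counts, $r ) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return unless $total > 0;    # nothing has been learned yet
    $r = rand() unless defined $r;

    my $threshold = $r * $total;
    my $running   = 0;
    my @cats      = sort keys %$counts;    # fixed order, so a given $r is repeatable
    for my $cat (@cats) {
        $running += $counts->{$cat};
        return ( $cat, $counts->{$cat} / $total ) if $threshold < $running;
    }

    # Only reached if floating-point rounding pushes $threshold to $total.
    return ( $cats[-1], $counts->{ $cats[-1] } / $total );
}

my ( $best_cat, $best_cat_prob ) =
    pick_category( { SPAM => 30, NOTSPAM => 70 } );
print "$best_cat ($best_cat_prob)\n";
```

Because the result depends only on the counts and the random draw, two calls with the same message can return different categories.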

PREREQUISITES

MLDBM
MLDBM::Sync
Mail::Box::Manager
Mail::Address

BUGS

There are always bugs...

SEE ALSO

Mail::Classifier

AUTHOR

David Golden, <david@hyperbolic.net>

COPYRIGHT AND LICENSE

Copyright 2002 by David Golden

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.