NAME
Mail::Classifier::Trivial - a trivial subclass example
SYNOPSIS
use Mail::Classifier::Trivial;
$bb = Mail::Classifier::Trivial->new();
$bb->train( { SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );
%xval = $bb->crossval(2, .8, {SPAM => 'spam.mbox', NOTSPAM => 'notspam.mbox'} );
ABSTRACT
Mail::Classifier::Trivial is a trivial subclass implementation.
DESCRIPTION
This class is an example of subclassing Mail::Classifier to actually classify mail. It provides crude random categorization based on the category frequencies of the training set.
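As a rough illustration of frequency-based random categorization, the following standalone sketch picks a category with probability proportional to how often it was seen in training. It is not taken from the module source: the %count hash, its values, and the pick_category helper are hypothetical, and the module itself may draw its random category differently.

```perl
use strict;
use warnings;

# Hypothetical training counts: messages seen per category.
my %count = ( SPAM => 30, NOTSPAM => 70 );

# Pick a category at random, weighted by its training count.
# (pick_category is an illustrative helper, not part of Mail::Classifier.)
sub pick_category {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    return unless $total > 0;          # nothing learned yet

    my $roll = rand($total);           # uniform in [0, $total)
    for my $cat ( sort keys %$counts ) {
        return $cat if $roll < $counts->{$cat};
        $roll -= $counts->{$cat};
    }
    return;                            # not reached for positive counts
}

# Draw many times to see the proportions emerge.
my %picked;
for ( 1 .. 10_000 ) {
    my $cat = pick_category( \%count );
    $picked{$cat}++;
}
printf "%-8s %5d\n", $_, $picked{$_} // 0 for sort keys %count;
```

Over many draws the tallies should approach the training proportions, which is all that "crude" categorization of this kind can offer: the message content is never consulted.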
METHODS THAT ARE EXTENDED IN THIS SUBCLASS
* new
* init
* forget
* isvalid
* parse
* learn
* score
- new [OPTIONS|FILENAME|CLASSIFIER]
Creates a new classifier object, setting any class options from a hash reference of key/value pairs. Alternatively, new can be called with the filename of a previously saved classifier, or with another classifier object, in which case the classifier is cloned, duplicating all data and datafiles.
$bb = Mail::Classifier->new();
$bb = Mail::Classifier->new( { OPTION1 => 'foo', OPTION2 => 'bar' } );
$bb = Mail::Classifier->new( "/tmp/saved-classifier" );
$cc = Mail::Classifier->new( $bb );
This subclass method has no additional options and only adds a data table to use for frequency counting. Though it doesn't really need to, this subclass uses an MLDBM::Sync file.
- init
Called during new to initialize the class with data tables.
$self->init( {%options} );
- forget
Blanks out the frequency data.
$bb->forget;
- isvalid MESSAGE
Confirms that a message can be handled (for example, text versus attachment). MESSAGE is a Mail::Message object. In this subclass, all messages are still considered valid.
$bb->isvalid($msg);
- parse MESSAGE
Breaks up a message into tokens. This is only a stub showing where and how class extensions should place parsing. In this subclass, no parsing takes place and the function is still a stub.
$bb->parse($msg);
- learn CATEGORY, MESSAGE
- unlearn CATEGORY, MESSAGE
learn processes a message as an example of a category according to some algorithm. MESSAGE is a Mail::Message object.
unlearn reverses the process, for example to "unlearn" a message that has been falsely classified.
In this subclass, these functions only update a frequency count of messages by category.
$bb->learn('SPAM', $msg);
$bb->unlearn('SPAM', $msg);
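The per-category counting described above can be sketched with a plain hash. This is an assumption-laden illustration, not the module's code: %freq stands in for the MLDBM::Sync data table, learn_count and unlearn_count are hypothetical helper names, and clamping the count at zero is a choice made here rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the frequency data table.
my %freq;

# Record one more message seen for a category.
sub learn_count {
    my ( $freq, $category ) = @_;
    $freq->{$category}++;
    return $freq->{$category};
}

# Reverse a previous learn; the count never drops below zero
# (an assumption of this sketch).
sub unlearn_count {
    my ( $freq, $category ) = @_;
    $freq->{$category}-- if ( $freq->{$category} // 0 ) > 0;
    return $freq->{$category} // 0;
}

learn_count( \%freq, 'SPAM' ) for 1 .. 3;
learn_count( \%freq, 'NOTSPAM' );
unlearn_count( \%freq, 'SPAM' );    # undo one falsely classified message

print "$_: $freq{$_}\n" for sort keys %freq;
# prints:
# NOTSPAM: 1
# SPAM: 2
```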
- score MESSAGE
Takes a message and returns a list of categories and probabilities in descending order. MESSAGE is a Mail::Message object.
In this subclass, score returns a single category chosen at random.
($best_cat, $best_cat_prob, @rest) = $bb->score($msg);
%probs = $bb->score($msg);
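The two calling styles shown for score both rely on the result being a flat list of category/probability pairs. The sketch below uses a hypothetical stand-in, fake_score, with made-up probabilities to show how such a list unpacks. It does not call Mail::Classifier, and the exact values a real classifier returns are not specified here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for score(): returns a flat list of
# category/probability pairs, highest probability first.
sub fake_score {
    my %p = ( SPAM => 0.8, NOTSPAM => 0.2 );    # made-up values
    return map { ( $_ => $p{$_} ) } sort { $p{$b} <=> $p{$a} } keys %p;
}

# List assignment: the first pair is the best category and its probability.
my ( $best_cat, $best_cat_prob, @rest ) = fake_score();

# Hash assignment: ordering is lost, but lookup by category is convenient.
my %probs = fake_score();

print "best: $best_cat ($best_cat_prob)\n";                  # best: SPAM (0.8)
print "SPAM: $probs{SPAM}, NOTSPAM: $probs{NOTSPAM}\n";      # SPAM: 0.8, NOTSPAM: 0.2
```

Use the list form when only the top category matters, and the hash form when the probability of a specific category is needed.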
PREREQUISITES
MLDBM
MLDBM::Sync
Mail::Box::Manager
Mail::Address
BUGS
There are always bugs...
SEE ALSO
Mail::Classifier
AUTHOR
David Golden, <david@hyperbolic.net>
COPYRIGHT AND LICENSE
Copyright 2002 by David Golden
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.