NAME

AI::Classifier::Text - A convenient class for text classification

VERSION

version 0.03

SYNOPSIS

my $cl = AI::Classifier::Text->new(classifier => AI::NaiveBayes->new(...));
my $res = $cl->classify("do cats eat bats?");
$res    = $cl->classify("do cats eat bats?", { new_user => 1 });
$cl->store('some-file');
# later
$cl  = AI::Classifier::Text->load('some-file');
$res = $cl->classify("do cats eat bats?");

DESCRIPTION

AI::Classifier::Text combines a lexical analyzer (by default AI::Classifier::Text::Analyzer) with a classifier (such as AI::NaiveBayes) to perform text classification.

This is partially based on AI::TextCategorizer.

ATTRIBUTES

classifier

An object that performs classification of the supplied feature vectors. It has to define a classify() method that accepts a hash reference. The return value of AI::Classifier::Text->classify() is the return value of the classifier's classify() method.

This attribute has to be supplied to the new() method during object creation.
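
As an illustration of that contract, here is a minimal sketch of an object that could serve as a classifier. The My::KeywordClassifier package and its keyword argument are hypothetical names invented for this example; the only requirement taken from the text above is a classify() method that accepts a hash reference.

```perl
use strict;
use warnings;

# Hypothetical toy classifier, not part of any distribution. Any object
# with a classify() method that accepts a hash reference of features
# satisfies the contract described above.
package My::KeywordClassifier;

sub new {
    my ($class, %args) = @_;
    return bless { keyword => $args{keyword} }, $class;
}

# Receives a feature vector as a hash reference (feature name => weight)
# and returns the result that the caller will see.
sub classify {
    my ($self, $features) = @_;
    return exists $features->{ $self->{keyword} } ? 'match' : 'no match';
}

package main;

my $classifier = My::KeywordClassifier->new(keyword => 'cats');
my $label      = $classifier->classify({ cats => 1, eat => 1, bats => 1 });
print "$label\n";    # prints "match"
```

An object like this would then be passed to the constructor as classifier => $classifier.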

analyzer

The class performing lexical analysis of the text in order to produce a feature vector. This defaults to AI::Classifier::Text::Analyzer.

METHODS

new(classifier => $foo)

Creates a new AI::Classifier::Text object. The classifier argument is mandatory.

classify($document, $features)

Categorizes the given document. The lexical analyzer extracts features from $document, and the features in the optional $features hash reference are added to them. The return value comes directly from the classifier object's classify() method.
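
The way document features and caller-supplied features end up in one vector can be sketched as follows. This is a conceptual illustration only: extract_features is a hypothetical helper, plain word counting stands in for whatever the real analyzer does, and letting the extra features win on a name clash is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::Classifier::Text. It builds a
# word-count feature vector and merges caller-supplied features into it.
sub extract_features {
    my ($document, $extra) = @_;
    my %features;

    # Count each lowercased word in the document.
    $features{ lc $_ }++ for $document =~ /(\w+)/g;

    # Add the extra features (assumption: extras override on a name clash).
    %features = (%features, %{ $extra || {} });

    return \%features;
}

my $features = extract_features("do cats eat bats?", { new_user => 1 });
print join(', ', map { "$_=$features->{$_}" } sort keys %features), "\n";
# prints "bats=1, cats=1, do=1, eat=1, new_user=1"
```

The resulting hash reference has the shape that a classifier's classify() method is expected to accept.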

SEE ALSO

AI::NaiveBayes(3), AI::Categorizer(3)

AUTHOR

Zbigniew Lukasiak <zlukasiak@opera.com>, Tadeusz Sośnierz <tsosnierz@opera.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Opera Software ASA.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
