NAME
AI::Categorizer::Hypothesis - Embodies a set of category assignments
SYNOPSIS
use AI::Categorizer::Hypothesis;
# Hypotheses are usually created by the Learner's categorize() method.
# (assume here that $learner and $document have been created elsewhere)
my $h = $learner->categorize($document);
print "Assigned categories: ", join(', ', $h->categories), "\n";
print "Best category: ", $h->best_category, "\n";
print "Assigned scores: ", join(', ', $h->scores( $h->categories )), "\n";
print "Chosen from: ", join(', ', $h->all_categories), "\n";
print +($h->in_category('geometry') ? '' : 'not '), "assigned to geometry\n";
DESCRIPTION
A Hypothesis embodies the set of category assignments that a categorizer makes about a single document. Callers often want to ask different questions about those assignments (for instance, which categories were assigned, which category had the highest score, or whether a particular category was assigned), so this simple class provides a method for each of them.
METHODS
- new(%parameters)
-
Returns a new Hypothesis object. Generally a user of AI::Categorizer doesn't create a Hypothesis object directly; they are returned by the Learner's categorize() method. However, if you wish to create a Hypothesis directly (for example, to pass it fake data for testing purposes), you may do so using the new() method.
The following parameters are accepted when creating a new Hypothesis:
- all_categories
-
A required parameter giving the set of all categories that could possibly be assigned. The categories should be specified as a reference to an array of category names (as strings).
- scores
-
A hash reference giving the assignment score for each category. A category is considered assigned when its score reaches the threshold parameter.
- threshold
-
A number controlling which categories are assigned: any category whose score is greater than or equal to threshold is assigned, and any category whose score is lower than threshold is not.
- document_name
-
An optional string parameter indicating the name of the document about which this hypothesis was made.
- categories()
-
Returns an ordered list of the categories the document was placed in, with best matches first. Categories are returned by their string names.
- best_category()
-
Returns the name of the category with the highest score in this hypothesis.
- in_category($name)
-
Returns true or false depending on whether the document was placed in the given category.
- scores(@names)
-
Returns a list of result scores for the given categories. Since the interface is still changing, not very much can officially be said about the scores, except that a good score is higher than a bad score. Individual Learners have their own procedures for determining scores, so you cannot compare one Learner's score with another Learner's. Often you cannot compare scores from a single Learner on two different categorization tasks either (for instance, if the Learner always normalizes the top score to 1).
- all_categories()
-
Returns the list of category names specified with the all_categories constructor parameter.
- document_name()
-
Returns the value of the document_name constructor parameter, or undef if none was specified.
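EXAMPLE
The following is a minimal sketch of building a Hypothesis directly with new() and querying it through the methods described above, as one might do in a test. It assumes AI::Categorizer is installed. The category names, scores, threshold, and document name are invented for illustration and do not come from any real Learner. The scores are deliberately kept away from the threshold, so the result does not depend on how a score exactly equal to the threshold is treated.

```perl
use strict;
use warnings;
use AI::Categorizer::Hypothesis;

# Fake data for illustration only: every value below is made up,
# not produced by a real Learner.
my $h = AI::Categorizer::Hypothesis->new(
    all_categories => [ 'geometry', 'algebra', 'poetry' ],
    scores         => { geometry => 0.9, algebra => 0.6, poetry => 0.1 },
    threshold      => 0.5,
    document_name  => 'test-doc',
);

# Categories whose score clears the threshold, best match first.
my @assigned = $h->categories;
print "Assigned categories: ", join(', ', @assigned), "\n";

# The single highest-scoring category.
print "Best category: ", $h->best_category, "\n";

# Scores for the named categories, in the order the names were given.
my @scores = $h->scores(@assigned);
print "Assigned scores: ", join(', ', @scores), "\n";

# Membership test for one category.
print "poetry is ", ($h->in_category('poetry') ? '' : 'not '), "assigned\n";

# Values handed to the constructor can be read back.
print "Chosen from: ", join(', ', $h->all_categories), "\n";
print "Document: ", $h->document_name, "\n";
```

Note that join() is given a parenthesized argument list in each print statement. Without the parentheses, the trailing "\n" would be treated as one more element to join.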
AUTHOR
Ken Williams <kenw@ee.usyd.edu.au>
COPYRIGHT
This distribution is free software; you can redistribute it and/or modify it under the same terms as Perl itself. These terms apply to every file in the distribution - if you have questions, please contact the author.