NAME
AI::Categorizer::Learner::Weka - Pass-through wrapper to Weka system
SYNOPSIS
  # Here $k is an AI::Categorizer::KnowledgeSet object

  my $nb = AI::Categorizer::Learner::Weka->new(...parameters...);
  $nb->train(knowledge_set => $k);
  $nb->save_state('filename');

  ... time passes ...

  $nb = AI::Categorizer::Learner->restore_state('filename');
  my $c = AI::Categorizer::Collection::Files->new( path => ... );
  while (my $document = $c->next) {
    my $hypothesis = $nb->categorize($document);
    print "Best assigned category: ", $hypothesis->best_category, "\n";
  }
DESCRIPTION
This class doesn't implement any machine learners of its own; it merely passes the data through to the Weka machine learning system (http://www.cs.waikato.ac.nz/~ml/weka/). This can give you access to a collection of machine learning algorithms not otherwise implemented in AI::Categorizer.
Currently this is a simple command-line wrapper that calls java subprocesses. In the future this may be converted to an Inline::Java wrapper for better performance (faster running times). However, if you're looking for really great performance, you're probably looking in the wrong place: this Weka wrapper is intended more as a way to try lots of different machine learning methods.
METHODS
This class inherits from the AI::Categorizer::Learner class, so all of its methods are available unless explicitly mentioned here.
new()
Creates a new Weka Learner and returns it. In addition to the parameters accepted by the AI::Categorizer::Learner class, the Weka subclass accepts the following parameters:
- java_path
  Specifies where the java executable can be found on this system. The default is simply java, meaning that it will search your PATH to find java.
- java_args
  Specifies a list of any additional arguments to give to the java process. Commonly it's necessary to allocate more memory than the default, using an argument like -Xmx130M.
- weka_path
  Specifies the path to the weka.jar file containing the Weka bytecode. If Weka has been installed somewhere in your java CLASSPATH, you needn't specify a weka_path.
- weka_classifier
  Specifies the Weka class to use for a categorizer. The default is weka.classifiers.NaiveBayes. Consult your Weka documentation for a list of other classifiers available.
- weka_args
  Specifies a list of any additional arguments to pass to the Weka classifier class when building the categorizer.
- tmpdir
  A directory in which temporary files will be written when training the categorizer and categorizing new documents. The default is given by File::Spec->tmpdir.
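To make the roles of these parameters concrete, here is a minimal sketch of how a command-line wrapper of this kind might turn them into a java invocation. It is not this module's actual code: build_weka_command is a hypothetical helper, the file names train.arff and model.out are placeholders, /path/to/weka.jar is an example path, and the -t (training file) and -d (save model) options are assumed from Weka's general command-line options, so check them against your Weka version.

```perl
use 5.010;
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, NOT part of AI::Categorizer: assemble the argument
# list for a Weka training run from the constructor-style parameters.
sub build_weka_command {
    my (%opt) = @_;
    my $tmpdir = $opt{tmpdir} // File::Spec->tmpdir;

    # Start with the java executable, then any JVM arguments.
    my @cmd = ($opt{java_path} // 'java');
    push @cmd, @{ $opt{java_args} // [] };

    # Only add a classpath when weka_path is given; otherwise rely on CLASSPATH.
    push @cmd, '-classpath', $opt{weka_path} if defined $opt{weka_path};

    # The classifier class, followed by any classifier-specific arguments.
    push @cmd, $opt{weka_classifier} // 'weka.classifiers.NaiveBayes';
    push @cmd, @{ $opt{weka_args} // [] };

    # Assumed Weka options: -t names the training file, -d the saved model.
    # Both file names are placeholders living under tmpdir.
    push @cmd, '-t', File::Spec->catfile($tmpdir, 'train.arff'),
               '-d', File::Spec->catfile($tmpdir, 'model.out');
    return @cmd;
}

my @cmd = build_weka_command(
    java_args => ['-Xmx130M'],
    weka_path => '/path/to/weka.jar',
);
print join(' ', @cmd), "\n";
```

A wrapper built this way would typically run the command with the list form of system(@cmd), which avoids passing the arguments through a shell.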
train(knowledge_set => $k)
Trains the categorizer. This prepares it for later use in categorizing documents. The knowledge_set parameter must provide an object of the class AI::Categorizer::KnowledgeSet (or a subclass thereof), populated with lots of documents and categories. See AI::Categorizer::KnowledgeSet for the details of how to create such an object.
categorize($document)
Returns an AI::Categorizer::Hypothesis object representing the categorizer's "best guess" about which categories the given document should be assigned to. See AI::Categorizer::Hypothesis for more details on how to use this object.
save_state($path)
Saves the categorizer for later use. This method is inherited from AI::Categorizer::Storable.
AUTHOR
Ken Williams, ken@mathforum.org
COPYRIGHT
Copyright 2000-2003 Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
AI::Categorizer(3)