NAME
Algorithm::AM::Batch - Classify items in batch mode
VERSION
version 3.03
SYNOPSIS
use Algorithm::AM;
use Algorithm::AM::Batch;
my $dataset = dataset_from_file('finnverb');
my $batch = Algorithm::AM::Batch->new(
    training_set => $dataset,
    # print the result of each classification as it is produced
    end_test_hook => sub {
        my ($batch, $test_item, $result) = @_;
        print $test_item->comment . ' ' . $result->result . "\n";
    }
);
my @results = $batch->classify_all($dataset);
DESCRIPTION
Batch provides a way to classify entire data sets by repeatedly calling classify with the provided configuration. Hooks are also provided so that the training set and classification parameters can be changed over time. All of the action happens in "classify_all".
EXPORTS
When this module is imported, it also imports the following:
- Algorithm::AM
- Algorithm::AM::Result
- Algorithm::AM::DataSet
  Also imports the "dataset_from_file" function from Algorithm::AM::DataSet.
- Algorithm::AM::DataSet::Item
  Also imports the "new_item" function from Algorithm::AM::DataSet::Item.
- Algorithm::AM::BigInt
  Also imports the "bigcmp" function from Algorithm::AM::BigInt.
METHODS
new
Creates a new object instance. This method takes named parameters which call the methods described in the relevant documentation sections. The only required parameter is "training_set", which should be an instance of Algorithm::AM::DataSet, and which provides a pool of items to be used for training during classification. All of the accepted parameters are listed below:
training_set
Returns the dataset used for training.
test_set
Returns the test set currently providing the source of items to "classify_all". Before and after classify_all, this returns undef, and so is only useful when called from inside one of the hook subroutines.
repeat
Determines how many times each individual test item will be analyzed. As the analogical modeling algorithm is deterministic, it only makes sense to use this if the training set is modified somehow during each iteration, i.e. via "probability" or "training_item_hook". The default value is 1.
probability
Get/set the probability that any one training item will be included among the training items used during classification, which is 1 by default.
max_training_items
Get/set the maximum number of items considered for addition to the training set. Note that this is the number considered, not the number actually added, so when combined with "probability" or "training_item_hook" your training set could be smaller than the number specified.
exclude_nulls
This is passed directly to the new method of Algorithm::AM during each classification in the "classify_all" method.
exclude_given
This is passed directly to the new method of Algorithm::AM during each classification in the "classify_all" method.
linear
This is passed directly to the new method of Algorithm::AM during each classification in the "classify_all" method.
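To illustrate how "probability", "max_training_items", and "training_item_hook" interact, here is a minimal, self-contained sketch. It does not use Algorithm::AM at all: select_training_items is a hypothetical helper operating on plain strings, and the assumption that the first max_training_items items are the ones considered is ours, not something the module documents.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Algorithm::AM::Batch: mimics the
# documented selection rule on a plain list of items.
sub select_training_items {
    my (%args) = @_;
    my @items       = @{ $args{items} };
    my $probability = $args{probability} // 1;
    my $max         = $args{max_training_items} // scalar @items;
    my $hook        = $args{training_item_hook} // sub { 1 };

    # Assumption: only the first $max items are considered at all.
    $max = @items if $max > @items;
    my @considered = @items[ 0 .. $max - 1 ];

    my ( @selected, @excluded );
    for my $item (@considered) {
        # Keep the item with the given probability, and only if the
        # hook also returns a true value for it.
        if ( rand() < $probability && $hook->($item) ) {
            push @selected, $item;
        }
        else {
            push @excluded, $item;
        }
    }
    return ( \@selected, \@excluded );
}

my @pool = map { "item$_" } 1 .. 10;
my ( $selected, $excluded ) = select_training_items(
    items              => \@pool,
    probability        => 0.5,
    max_training_items => 6,
);
printf "considered 6, kept %d, excluded %d\n",
    scalar @$selected, scalar @$excluded;
```

Because six items are considered and each is kept with probability 0.5, the number kept varies from run to run, which is why "repeat" only becomes meaningful once one of these settings is in play.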
classify_all
Using the analogical modeling algorithm, this method classifies each item in the test set passed to it and returns a list of Result objects.
Log::Any is used to log information about the current progress and timing. The statistical summary, analogical set, and gang summary (without items listed) are logged at the info level, and the full gang summary with items listed is logged at the debug level.
Hooks are provided to the user for monitoring or modifying classification configuration. These hooks may be passed into the object constructor or set via one of the accessor methods. Batch classification proceeds as follows:
call begin_hook
loop all test set items
    call begin_test_hook
    repeat X times, where X is specified by the "repeat" setting
        call begin_repeat_hook
        create a training set:
            for each item in the provided training set,
            up to max_training_items
                exclude the item with probability 1 - probability
                exclude the item if specified via training_item_hook
        classify the item with the given training set
        call end_repeat_hook
    call end_test_hook
call end_hook
The Batch object itself is passed to these hooks, so the user is free to change settings such as "probability" or "max_training_items", or even add training data, at any point. Other information is passed to these hooks as well, as detailed in the method documentation.
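The order in which the hooks fire can be easier to see in running code. The following is only a sketch of the documented control flow: trace_batch is a hypothetical stand-in that records hook names instead of classifying anything, and it does not call into Algorithm::AM.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented control flow; it records the
# order in which hooks would fire instead of classifying anything.
sub trace_batch {
    my ( $test_items, $repeat ) = @_;
    my @trace;
    push @trace, 'begin_hook';
    for my $test_item (@$test_items) {
        push @trace, "begin_test_hook($test_item)";
        for my $iteration ( 1 .. $repeat ) {
            push @trace, "begin_repeat_hook($test_item, $iteration)";
            # Training set creation and classification would happen here.
            push @trace, "end_repeat_hook($test_item, $iteration)";
        }
        push @trace, "end_test_hook($test_item)";
    }
    push @trace, 'end_hook';
    return @trace;
}

# Two test items, each analyzed twice.
print "$_\n" for trace_batch( [ 'a', 'b' ], 2 );
```

With two test items and a repeat count of 2, the trace contains one begin_hook and one end_hook, with a begin_test_hook/end_test_hook pair around the two iterations of each item.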
begin_hook
$batch->begin_hook(sub {
    my ($batch) = @_;
    $batch->probability(.5);
});
This hook is called first thing in the "classify_all" method, and is given the Batch object instance.
begin_test_hook
$batch->begin_test_hook(sub {
    my ($batch, $test_item) = @_;
    $batch->probability(.5);
    print $test_item->comment . "\n";
});
This hook is called by "classify_all" before any iterations of classification start for each test item. It is provided with the Batch object instance and the test item.
begin_repeat_hook
$batch->begin_repeat_hook(sub {
    my ($batch, $test_item, $iteration) = @_;
    $batch->probability(.5);
    print $test_item->comment . "\n";
    print "I'm on iteration $iteration\n";
});
This hook is called during "classify_all" at the beginning of each iteration of classification of a test item. It is provided with the Batch object instance, the test item, and the iteration number, which will vary between 1 and the setting for "repeat".
training_item_hook
$batch->training_item_hook(sub {
    my ($batch, $test_item, $iteration, $training_item) = @_;
    $batch->probability(.5);
    print $test_item->comment . "\n";
    print "I'm on iteration $iteration\n";
    # a true return value includes the item in the training set
    if($training_item->comment eq 'include me!'){
        return 1;
    }else{
        return 0;
    }
});
This hook is called by "classify_all" while populating a training set during each iteration of classification. It is provided with the Batch object instance, the test item, the iteration number, and an item which may be included in the training set. If the return value is true, then the item will be included in the training set; otherwise, it will not.
end_repeat_hook
$batch->end_repeat_hook(sub {
    my ($batch, $test_item, $iteration, $excluded_items, $result) = @_;
    $batch->probability(.5);
    print $test_item->comment . "\n";
    print "I finished iteration $iteration\n";
    print 'I excluded ' . scalar @$excluded_items .
        " items from training\n";
    print ${$result->statistical_summary};
});
This hook is called during "classify_all" at the end of each iteration of classification of a test item. It is provided with the Batch object instance, the test item, the iteration number, an array ref containing training items excluded from the training set, and the result object returned by classify.
end_test_hook
$batch->end_test_hook(sub {
    my ($batch, $test_item, @results) = @_;
    $batch->probability(.5);
    print $test_item->comment . "\n";
    my $iterations = @results;
    my $correct = 0;
    for my $result (@results){
        $correct++ if $result->result ne 'incorrect';
    }
    print 'Item ' . $test_item->comment .
        " correct $correct/$iterations times\n";
});
This hook is called by "classify_all" after all classifications of a single item are finished. It is provided with the Batch object instance, the test item, and a list of the Result objects returned by "classify" in Algorithm::AM during each iteration of classification.
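When "repeat" is greater than 1, it can be useful to summarize the outcomes for a test item rather than count a single category. The sketch below works on plain strings so that it runs without Algorithm::AM. The tally_outcomes helper is hypothetical, and the outcome values other than 'incorrect' (here 'correct' and 'tie') are assumptions about what the result method may return, not something stated in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how often each outcome string occurs.
# In an end_test_hook the strings might come from
# map { $_->result } @results (assumed, not verified here).
sub tally_outcomes {
    my (@outcomes) = @_;
    my %count;
    $count{$_}++ for @outcomes;
    return \%count;
}

# Made-up outcomes for five iterations of one test item.
my @outcomes = qw(correct correct tie incorrect correct);
my $count    = tally_outcomes(@outcomes);
for my $outcome ( sort keys %$count ) {
    printf "%-9s %d/%d\n", $outcome, $count->{$outcome}, scalar @outcomes;
}
```

Keeping the categories separate avoids the ambiguity of a single counter, which in the example above treats every outcome other than 'incorrect' as correct.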
end_hook
$batch->end_hook(sub {
    my ($batch, @results) = @_;
    for my $result (@results){
        print ${$result->statistical_summary};
    }
});
This hook is called after all classifications are finished. It is provided with the Batch object instance as well as a list of all of the Result objects returned by "classify" in Algorithm::AM.
AUTHOR
Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Royal Skousen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.