NAME
Data::Classifier - A tool for classifying data with regular expressions
SYNOPSIS
use strict;
use warnings;
use Data::Classifier;
my $yaml = <<EOY;
---
name: Root
children:
  - name: BMW
    children:
      - name: Diesel
        match:
          model: "d\$"
      - name: Sports
        match:
          model: "i\$"
          seats: 2
      - name: Really Expensive
        match:
          model: "^M"
EOY
my $classifier = Data::Classifier->new(yaml => $yaml);
my $attributes1 = { model => '325i', seats => 4 };
my $class1 = $classifier->process($attributes1);
my $attributes2 = { model => '535d', seats => 4 };
my $class2 = $classifier->process($attributes2);
my $attributes3 = { model => 'M3', seats => 2 };
my $class3 = $classifier->process($attributes3);
print "$attributes2->{model}: ", $class2->fqn, "\n";
print "$attributes3->{model}: ", $class3->fqn, "\n";
#no real sports car has 4 seats
print "$attributes1->{model}: ", $class1->fqn, "\n";
OVERVIEW
This module provides tools to classify sets of data contained in hashes against a predefined class hierarchy. Testing against a class is performed using regular expressions stored in the class hierarchy. It is also possible to modify the behavior of the system by subclassing and overriding a few methods.
Note that this module may not be particularly useful on its own. It is designed to be used as a base class for implementing other systems, such as Config::BuildHelper.
USAGE
To use this module, create an instance of the classifier object, passing in the class hierarchy as a YAML file, a YAML string, or a prebuilt data structure, along with any optional arguments:
$classifier = Data::Classifier->new(file => 'classes.yaml', debug => 1);
$classifier = Data::Classifier->new(yaml => $yaml_string);
$classifier = Data::Classifier->new(tree => $hashref);
Class Definition File
The class definition file uses a specific tree format, normally stored in a YAML file. Each node of the tree is a map with the same set of keys, some of which are optional:
- name
-
The textual name of the node being defined.
- data (optional)
-
Extra data to be returned with classification results.
- children (optional)
-
A sequence of nodes that exists under this node.
- match (optional)
-
A map of keys to test against incoming data and regular expressions to apply to that data. For a match to be true, all items in the map must match the data.
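As an illustration of the node format above, the hierarchy from the SYNOPSIS could be written as a Perl data structure. This is only a sketch: it assumes the tree argument to new accepts a hashref with the same keys as the YAML form, and the data value and the node_names helper are hypothetical, included only to show the shape.

```perl
use strict;
use warnings;

# Hypothetical tree mirroring the node format: name, data, children, match.
my $tree = {
    name     => 'Root',
    children => [
        {
            name     => 'BMW',
            data     => { country => 'DE' },    # illustrative extra data
            children => [
                { name => 'Diesel', match => { model => 'd$' } },
                { name => 'Sports', match => { model => 'i$', seats => 2 } },
                { name => 'Really Expensive', match => { model => '^M' } },
            ],
        },
    ],
};

# Hypothetical helper (not part of Data::Classifier):
# collect node names depth-first to show how the tree nests.
sub node_names {
    my ($node) = @_;
    return ($node->{name}, map { node_names($_) } @{ $node->{children} || [] });
}

print join(' / ', node_names($tree)), "\n";
# prints: Root / BMW / Diesel / Sports / Really Expensive
```

Note that the regular expressions need no backslash before the dollar sign here, because single-quoted Perl strings do not interpolate.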
Matching Semantics
By default, a dataset matches a node only if every entry in that node's match definition matches the corresponding data. Additionally, a node which contains no match definition will have all of its children searched but can never be a match itself.
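The rules above can be sketched in plain Perl without the module. This is a minimal illustration, not the module's actual implementation: matches_all and find_class are hypothetical helpers, and returning a matching node immediately (rather than descending further) is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if every key in the match map has a
# regular expression that matches the corresponding attribute value.
sub matches_all {
    my ($match, $attr) = @_;
    for my $key (keys %$match) {
        return 0 unless defined $attr->{$key};
        return 0 unless $attr->{$key} =~ /$match->{$key}/;
    }
    return 1;
}

# Hypothetical depth-first search. Returns the list of node names leading
# to the first matching node, or an empty list. A node without a match map
# is never a match itself, but its children are still searched.
sub find_class {
    my ($node, $attr) = @_;
    if ($node->{match} && matches_all($node->{match}, $attr)) {
        return ($node->{name});    # assumption: stop at the first match
    }
    for my $child (@{ $node->{children} || [] }) {
        my @path = find_class($child, $attr);
        return ($node->{name}, @path) if @path;
    }
    return;
}

my $tree = {
    name     => 'Root',
    children => [ {
        name     => 'BMW',
        children => [
            { name => 'Diesel', match => { model => 'd$' } },
            { name => 'Sports', match => { model => 'i$', seats => 2 } },
            { name => 'Really Expensive', match => { model => '^M' } },
        ],
    } ],
};

for my $attr ({ model => '535d', seats => 4 },
              { model => 'M3',   seats => 2 },
              { model => '325i', seats => 4 }) {
    my @path = find_class($tree, $attr);
    print "$attr->{model}: ", (@path ? join(' > ', @path) : '(no match)'), "\n";
}
```

The 325i ends in "i" but has four seats, so the Sports node fails because one of its two match entries does not hold.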
Methods
- $result = $classifier->process($attr)
-
Classify the data contained in the hash reference stored in $attr and return an instance of Data::Classifier::Result. See the documentation for that class for more information.
- $classifier->dump
-
Return a textual representation of the class hierarchy stored in RAM.
More Information
The rest of this module is documented in Data::Classifier::Result, which you use to access the results of classification.
SUBCLASSING
This class can be subclassed to change its behavior. The following methods are available for overriding:
- $classifier->return_result($result)
-
This method is invoked by $classifier->process() when it needs to return a new instance of a result class. Simply return an instance of your class here, such as:
sub return_result {
    my ($self, $result) = @_;
    return Data::Classifier::Result->new($result);
}
- $classifier->check_match($matchlist, $attributes)
-
This method is invoked by $classifier->recursive_search() at each node of the tree that contains a match attribute. The entire contents of the match attribute are passed in as $matchlist, and the hashref given to $classifier->process() is passed in as $attributes. Return true to indicate a match and false to indicate no match.
- $classifier->recursive_search($attributes, $node)
-
This method is invoked by $classifier->process() to recursively search the entire tree. To change how the classifier treats nodes without a match attribute, override this method.
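As an illustration of overriding check_match, the sketch below makes matching case-insensitive. The package name My::Classifier is hypothetical, and the code assumes $matchlist is a hashref of attribute names to regular expression strings, as in the match maps of the class definition file.

```perl
package My::Classifier;    # hypothetical subclass name
use strict;
use warnings;

# -norequire lets this sketch compile without Data::Classifier installed;
# in real code, plain "use parent 'Data::Classifier';" is the usual form.
use parent -norequire, 'Data::Classifier';

# Assumption: $matchlist maps attribute names to regex strings.
# Every entry must match, as in the default semantics, but the
# comparison ignores case.
sub check_match {
    my ($self, $matchlist, $attributes) = @_;
    for my $key (keys %$matchlist) {
        my $value = $attributes->{$key};
        return 0 unless defined $value;
        return 0 unless $value =~ /$matchlist->{$key}/i;
    }
    return 1;
}

package main;

# Called as a class method here only to exercise the override in isolation.
my $ok = My::Classifier->check_match({ model => '^m' }, { model => 'M3' });
print $ok ? "match\n" : "no match\n";
```

Because only check_match changes, the tree search and result handling are still inherited from the base class.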
IMPROVEMENTS
Here are a few ideas for improvements to this class:
- Data::Classifier::SQLTree
-
A class that stores its tree in a SQL database, reconstructs it at startup, and passes it in using the tree argument to new.
AUTHORS
This module was created and documented by Tyler Riddle <triddle@gmail.com>.
BUGS
There are no known bugs at this time.
Please report any bugs or feature requests to bug-data-classifier@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data::Classifier. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.