NAME
Algorithm::BayesianSets - Perl implementation of Bayesian Sets
SYNOPSIS
    use Algorithm::BayesianSets;

    my $bs = Algorithm::BayesianSets->new;

    # add documents
    my %documents = (
        apple => {
            fruit => 1,
            red   => 1,
        },
        banana => {
            fruit  => 1,
            yellow => 1,
        },
        cherry => {
            fruit => 1,
            pink  => 1,
        },
    );
    foreach my $id (keys %documents) {
        $bs->add_document($id, $documents{$id});
    }

    # calc alpha/beta parameters
    $bs->calc_parameters();

    # get similar documents
    my @queries = qw(apple);
    my $scores = $bs->calc_similarities(\@queries);

    # show output
    foreach my $id (keys %{ $scores }) {
        printf "%s\t%.4f\n", $id, $scores->{$id};
    }
DESCRIPTION
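Algorithm::BayesianSets is a Perl implementation of the Bayesian Sets algorithm. Given a small set of query items, the algorithm scores the other items by how well they fit the same set.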
METHODS
new($threshold)
Create a new instance.
$threshold is the minimum degree a document feature must have. The add_document method ignores any feature whose degree is less than the threshold.
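The thresholding described above can be pictured with a short sketch. The helper name filter_features is hypothetical and is not part of Algorithm::BayesianSets; it also assumes that surviving features keep their original degree, which the documentation does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): keep only the features
# whose degree is at least $threshold. Assumes kept features retain
# their original degree.
sub filter_features {
    my ($vector, $threshold) = @_;
    my %kept;
    for my $feature (keys %{$vector}) {
        next if $vector->{$feature} < $threshold;
        $kept{$feature} = $vector->{$feature};
    }
    return \%kept;
}

my $filtered = filter_features({ fruit => 1, red => 0.2 }, 0.5);
print join(', ', sort keys %{$filtered}), "\n";    # prints "fruit"
```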
add_document($id, $vector)
Add an input document to the instance of Algorithm::BayesianSets. $id is the identifier of the document, and $vector is its feature vector. $vector must be a hash reference: each key is the identifier of a feature, and each value is the degree of that feature.
calc_parameters($c)
Calculate the alpha and beta parameters used by the Bayesian Sets algorithm. $c must be a real number (default: 2.0).
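For orientation, the Bayesian Sets paper sets the hyperparameters from the mean feature vector m of all items: alpha = c * m and beta = c * (1 - m). The sketch below follows that formula. It is an assumption that this module computes its parameters the same way, and the helper name beta_parameters is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper following the paper's formula, not the module's code.
# $mean maps each feature identifier to its mean value over all documents.
sub beta_parameters {
    my ($mean, $c) = @_;
    $c = 2.0 unless defined $c;    # same default as documented for $c
    my (%alpha, %beta);
    for my $feature (keys %{$mean}) {
        $alpha{$feature} = $c * $mean->{$feature};
        $beta{$feature}  = $c * (1 - $mean->{$feature});
    }
    return (\%alpha, \%beta);
}

my ($alpha, $beta) = beta_parameters({ fruit => 0.75, red => 0.25 });
for my $feature (sort keys %{$alpha}) {
    printf "%s\talpha=%.2f\tbeta=%.2f\n",
        $feature, $alpha->{$feature}, $beta->{$feature};
}
```

With c = 2.0, a feature present in three quarters of the documents gets alpha = 1.5 and beta = 0.5.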
calc_similarities($queries)
Calculate the similarities between the queries and the input documents using the Bayesian Sets algorithm. $queries must be an array reference, and each query in $queries must be the identifier of an input document.
The method returns a hash reference: each key is the identifier of an input document, and each value is the similarity between the queries and that document.
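Because the result is a plain hash reference, its keys come back in no particular order. A minimal sketch of ranking the documents by score follows; the scores are made-up values shaped like the output described above, and rank_ids is a hypothetical helper, not a method of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: return document identifiers ordered from the
# highest to the lowest score, breaking ties by identifier.
sub rank_ids {
    my ($scores) = @_;
    my @ranked = sort { $scores->{$b} <=> $scores->{$a} or $a cmp $b }
                 keys %{$scores};
    return @ranked;
}

# Made-up scores for illustration only.
my $scores = { apple => 0.9, banana => 0.4, cherry => 0.7 };
printf "%s\t%.4f\n", $_, $scores->{$_} for rank_ids($scores);
```

This prints apple first, then cherry, then banana.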
_average_vector($vectors)
Return the average vector of the input vectors.
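A minimal sketch of averaging sparse feature vectors stored as hash references is shown here. It is an illustration only: the function name average_vector and the choice to treat a missing feature as 0 are assumptions, not the module's internal code.

```perl
use strict;
use warnings;

# Hypothetical stand-in: average a list of sparse vectors (hash refs).
# A feature that is missing from a vector counts as 0.
sub average_vector {
    my ($vectors) = @_;
    my $count = scalar @{$vectors};
    return {} unless $count;

    my %sum;
    for my $vector (@{$vectors}) {
        $sum{$_} += $vector->{$_} for keys %{$vector};
    }

    my %average;
    $average{$_} = $sum{$_} / $count for keys %sum;
    return \%average;
}

my $average = average_vector([
    { fruit => 1, red    => 1 },
    { fruit => 1, yellow => 1 },
]);
printf "%s\t%.2f\n", $_, $average->{$_} for sort keys %{$average};
```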
_inner_product($vector1, $vector2)
Calculate the inner product of the two input vectors.
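For sparse vectors stored as hash references, the inner product only needs the features that both vectors share. The sketch below illustrates that idea; inner_product is a hypothetical name and the code is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in: inner product of two sparse vectors (hash refs).
sub inner_product {
    my ($v1, $v2) = @_;

    # Iterate over the smaller vector to reduce the number of lookups.
    ($v1, $v2) = ($v2, $v1)
        if scalar(keys %{$v1}) > scalar(keys %{$v2});

    my $product = 0;
    for my $feature (keys %{$v1}) {
        next unless exists $v2->{$feature};
        $product += $v1->{$feature} * $v2->{$feature};
    }
    return $product;
}

# Only "fruit" is shared, so the result is 1 * 1 = 1.
print inner_product({ fruit => 1, red => 1 }, { fruit => 1, yellow => 1 }), "\n";
```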
AUTHOR
Mizuki Fujisawa <fujisawa@bayon.cc>
SEE ALSO
- Bayesian Sets (Paper)
  http://www.gatsby.ucl.ac.uk/~heller/bsets.pdf
- bsets, The Bayesian Sets Algorithm (Matlab code)
  http://chasen.org/~daiti-m/dist/bsets/
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.