NAME

Algorithm::BayesianSets - Perl implementation of the Bayesian Sets algorithm

SYNOPSIS

use Algorithm::BayesianSets;

my $bs = Algorithm::BayesianSets->new;

# add documents
my %documents = (
    apple  => {
        fruit => 1,
        red   => 1,
    },
    banana => {
        fruit  => 1,
        yellow => 1,
    },
    cherry => {
        fruit => 1,
        pink  => 1,
    },
);
foreach my $id (keys %documents) {
    $bs->add_document($id, $documents{$id});
}

# calc alpha/beta parameters
$bs->calc_parameters();

# get similar documents
my @queries = qw(apple);
my $scores = $bs->calc_similarities(\@queries);

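# show output, highest score first (hash key order is otherwise arbitrary)
foreach my $id (sort { $scores->{$b} <=> $scores->{$a} } keys %{ $scores }) {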
    printf "%s\t%.4f\n", $id, $scores->{$id};
}

DESCRIPTION

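Algorithm::BayesianSets is a Perl implementation of the Bayesian Sets algorithm. Given a small set of query documents, it scores every input document by how well it fits that set.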

METHODS

new($threshold)

Create a new instance.

The optional $threshold parameter sets the minimum degree a document feature must have. When add_document is called, any feature whose degree is less than the threshold is ignored.
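
The thresholding rule above can be sketched in a few lines. This is only an illustration: filter_features is a hypothetical helper, not part of the module, and whether the module keeps the original degree of a surviving feature or converts it to a binary value is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::BayesianSets): keep only the
# features whose degree is at least $threshold.
sub filter_features {
    my ($vector, $threshold) = @_;
    my %kept;
    foreach my $key (keys %{$vector}) {
        $kept{$key} = $vector->{$key} if $vector->{$key} >= $threshold;
    }
    return \%kept;
}

my $filtered = filter_features({ fruit => 1, red => 0.2, round => 0.6 }, 0.5);
print join(',', sort keys %{$filtered}), "\n";   # prints "fruit,round"
```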

add_document($id, $vector)

Add an input document to the instance. $id is the identifier of the document and $vector is its feature vector. $vector must be a hash reference in which each key is the identifier of a feature and each value is the degree of that feature.

calc_parameters($c)

Calculate the alpha and beta parameters used by the Bayesian Sets algorithm from the documents added so far. $c must be a real number (default: 2.0).
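
For orientation, the Bayesian Sets paper sets alpha = c * m and beta = c * (1 - m), where m is the mean feature vector over all documents. The sketch below follows that rule. It assumes the module does the same, which this document does not confirm, and alpha_beta is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: derive alpha and beta from the mean feature vector,
# following the paper's rule alpha = c * m, beta = c * (1 - m).
sub alpha_beta {
    my ($mean, $c) = @_;
    $c = 2.0 unless defined $c;
    my (%alpha, %beta);
    foreach my $key (keys %{$mean}) {
        $alpha{$key} = $c * $mean->{$key};
        $beta{$key}  = $c * (1 - $mean->{$key});
    }
    return (\%alpha, \%beta);
}

my ($alpha, $beta) = alpha_beta({ fruit => 0.5, red => 0.25 });
printf "%.2f %.2f\n", $alpha->{fruit}, $beta->{red};   # prints "1.00 1.50"
```

Note that under this rule a feature present in every document has a mean of 1 and therefore a beta of 0.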

calc_similarities($queries)

Calculate the similarities between the queries and the input documents using the Bayesian Sets algorithm. $queries must be an array reference, and each query must be the identifier of a document that has already been added.

The method returns a hash reference in which each key is the identifier of an input document and each value is the similarity between the queries and that document.
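
As a rough guide to what such a score looks like, the paper shows that for binary features the log score of a document is a constant plus a weighted sum of its features. The sketch below implements that formula on sparse hash vectors. It is not the module's code: bsets_scores is a hypothetical helper, the alpha and beta values are hard-coded for the example, and whether the module returns log scores is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: log Bayesian Sets score of every document for a query
# set, assuming binary features with Beta(alpha, beta) priors.
sub bsets_scores {
    my ($docs, $queries, $alpha, $beta) = @_;
    my $n = scalar @{$queries};

    # Per-feature counts over the query documents.
    my %sum;
    foreach my $id (@{$queries}) {
        $sum{$_} += $docs->{$id}{$_} foreach keys %{ $docs->{$id} };
    }

    # Constant term and per-feature weights from the paper's log-score form.
    my $const = 0;
    my %weight;
    foreach my $key (keys %{$alpha}) {
        my ($al, $be) = ($alpha->{$key}, $beta->{$key});
        my $s       = $sum{$key} || 0;
        my $alpha_q = $al + $s;
        my $beta_q  = $be + $n - $s;
        $const += log($al + $be) - log($al + $be + $n) + log($beta_q) - log($be);
        $weight{$key} = log($alpha_q) - log($al) - log($beta_q) + log($be);
    }

    my %scores;
    foreach my $id (keys %{$docs}) {
        my $score = $const;
        $score += ($weight{$_} // 0) * $docs->{$id}{$_} foreach keys %{ $docs->{$id} };
        $scores{$id} = $score;
    }
    return \%scores;
}

my %docs = (
    apple  => { red => 1, round => 1 },
    cherry => { red => 1 },
    banana => { yellow => 1 },
);
# alpha = 2 * mean and beta = 2 * (1 - mean) for these three documents.
my %alpha = (red => 4/3, round => 2/3, yellow => 2/3);
my %beta  = (red => 2/3, round => 4/3, yellow => 4/3);

my $scores = bsets_scores(\%docs, ['apple'], \%alpha, \%beta);
print join(' ', sort { $scores->{$b} <=> $scores->{$a} } keys %{$scores}), "\n";
# prints "apple cherry banana"
```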

_average_vector($vectors)

Return the average of the input vectors.
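
A minimal sketch of averaging sparse vectors follows. It assumes $vectors is an array reference of hash references and that a feature missing from a vector counts as zero. The standalone function average_vector is illustrative and is not the module's private method.

```perl
use strict;
use warnings;

# Illustrative helper: element-wise mean of sparse hash vectors, treating a
# missing feature as 0.
sub average_vector {
    my ($vectors) = @_;
    my $count = scalar @{$vectors};
    return {} unless $count;
    my %avg;
    foreach my $vector (@{$vectors}) {
        $avg{$_} += $vector->{$_} foreach keys %{$vector};
    }
    $avg{$_} /= $count foreach keys %avg;
    return \%avg;
}

my $avg = average_vector([{ fruit => 1, red => 1 }, { fruit => 1, yellow => 1 }]);
printf "%s=%.1f\n", $_, $avg->{$_} foreach sort keys %{$avg};
# prints fruit=1.0, red=0.5 and yellow=0.5, one per line
```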

_inner_product($vector1, $vector2)

Calculate the inner product of the two input vectors.
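
For sparse hash vectors, the inner product only needs the features that both vectors share. The sketch below assumes both arguments are hash references. The standalone function inner_product is illustrative and is not the module's private method.

```perl
use strict;
use warnings;

# Illustrative helper: inner product of two sparse hash vectors.
sub inner_product {
    my ($v1, $v2) = @_;
    # Iterate over the smaller hash to do less work.
    ($v1, $v2) = ($v2, $v1) if scalar(keys %{$v1}) > scalar(keys %{$v2});
    my $product = 0;
    foreach my $key (keys %{$v1}) {
        $product += $v1->{$key} * $v2->{$key} if exists $v2->{$key};
    }
    return $product;
}

print inner_product({ fruit => 1, red => 2 }, { red => 3, yellow => 4 }), "\n";   # prints "6"
```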

AUTHOR

Mizuki Fujisawa <fujisawa@bayon.cc>

SEE ALSO

Bayesian Sets (Paper)

http://www.gatsby.ucl.ac.uk/~heller/bsets.pdf

bsets, The Bayesian Sets Algorithm (Matlab code)

http://chasen.org/~daiti-m/dist/bsets/

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.