NAME

Digest::bBitMinHash - Perl implementation of the b-Bit Minwise Hashing algorithm

SYNOPSIS

use Digest::bBitMinHash;

my $bbmh = Digest::bBitMinHash->new({"k"=>128, "b"=>2});

my @data1 = split / /, "巨人 中井 左膝 靭帯 損傷 登録 抹消";
my @data2 = split / /, "中井 左膝 登録 抹消 阪神 右肩 大阪";

my $vectors1 = $bbmh->get_bit_vectors(\@data1);
my $vectors2 = $bbmh->get_bit_vectors(\@data2);
# Or $vectors1 = $bbmh->get(\@data1);

my $match_bit_count = $bbmh->compare_bit_vectors($vectors1, $vectors2);
# Or $match_bit_count = $bbmh->compare($vectors1, $vectors2);

my $score = $bbmh->estimate_resemblance(\@data1, \@data2, $match_bit_count);
# Or $score = $bbmh->estimate(\@data1, \@data2);

# $score is below 0.8, so @data1 and @data2 are not considered similar.

DESCRIPTION

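Digest::bBitMinHash is a Perl implementation of the b-Bit Minwise Hashing algorithm. The algorithm estimates the resemblance (Jaccard similarity) of two sets while keeping only the lowest b bits of each minwise hash value, which makes the stored signatures much smaller than in standard minwise hashing.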
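
For readers unfamiliar with the technique, the following is a minimal, self-contained sketch of the general idea. It is not this module's code: the helper names bbit_signature and estimate_resemblance_sparse are hypothetical, the k hash functions are simulated by prefixing a seed to each token before MD5 hashing, and the estimator assumes the sparse-data limit, where two unrelated b-bit values match by chance with probability 1/2^b. The module itself may use different hash functions and a different correction.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use Encode qw(encode_utf8);

# Hypothetical helper (not part of Digest::bBitMinHash): for each of $k
# seeded hash functions, keep the lowest $bits bits of the minimum hash value.
# Assumes $tokens is a non-empty array reference.
sub bbit_signature {
    my ($tokens, $k, $bits) = @_;
    my $mask = (1 << $bits) - 1;
    my @signature;
    for my $seed (0 .. $k - 1) {
        my $min;
        for my $token (@$tokens) {
            # First 4 bytes of MD5("seed:token") read as an unsigned 32-bit integer.
            my $hash = unpack 'N', md5(encode_utf8(join ':', $seed, $token));
            $min = $hash if !defined $min || $hash < $min;
        }
        push @signature, $min & $mask;
    }
    return \@signature;
}

# Hypothetical estimator, assuming the sparse-data limit: subtract the
# chance-match rate 1/2^bits from the observed match rate and rescale.
sub estimate_resemblance_sparse {
    my ($sig1, $sig2, $bits) = @_;
    my $k       = scalar @$sig1;
    my $matches = grep { $sig1->[$_] == $sig2->[$_] } 0 .. $k - 1;
    my $chance  = 1 / (1 << $bits);
    return ($matches / $k - $chance) / (1 - $chance);
}

# Illustrative data: 4 shared tokens out of 10 distinct ones (true resemblance 0.4).
my @set1 = qw(a b c d e f g);
my @set2 = qw(b c e f x y z);
my $sig1 = bbit_signature(\@set1, 128, 2);
my $sig2 = bbit_signature(\@set2, 128, 2);
printf "estimated resemblance: %.3f\n",
    estimate_resemblance_sparse($sig1, $sig2, 2);
```

With small k and b the estimate is noisy and can even fall slightly below zero for unrelated sets. Larger k reduces the variance at the cost of longer signatures.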

LICENSE

Copyright (C) 2013 by Toshinori Sato (@overlast).

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Toshinori Sato (@overlast) <overlasting@gmail.com>