NAME
Set::Similarity::Jaccard - Jaccard coefficient for sets
SYNOPSIS
use Set::Similarity::Jaccard;
# object method
my $jaccard = Set::Similarity::Jaccard->new;
my $similarity = $jaccard->similarity('Photographer','Fotograf');
# class method
my $class = 'Set::Similarity::Jaccard';
$similarity = $class->similarity('Photographer','Fotograf');
# from 2-grams
my $width = 2;
$similarity = $jaccard->similarity('Photographer','Fotograf',$width);
# from arrayref of tokens
$similarity = $jaccard->similarity(['a','b'],['b']);
# from hashref of features
my $bird = {
    wings    => 1,
    eyes     => 1,
    feathers => 1,
    hairs    => 0,
    legs     => 1,
    arms     => 0,
};
my $mammal = {
    wings    => 0,
    eyes     => 1,
    feathers => 0,
    hairs    => 1,
    legs     => 1,
    arms     => 1,
};
$similarity = $jaccard->similarity($bird,$mammal);
# from hashref sets
my $bird_set = {
    wings    => undef,
    eyes     => undef,
    feathers => undef,
    legs     => undef,
};
my $mammal_set = {
    eyes  => undef,
    hairs => undef,
    legs  => undef,
    arms  => undef,
};
$similarity = $jaccard->from_sets($bird_set,$mammal_set);
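The SYNOPSIS accepts strings, arrayrefs of tokens and hashrefs of features. The pure-Perl sketch below shows one plausible way such inputs could be reduced to sets before the Jaccard formula is applied. It is not the module's code: the helper names ngram_set, to_set and jaccard are hypothetical, and the default width of 1, case-sensitive n-grams without padding, counting only features with a true value, and returning 1 for two empty sets are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: the set of distinct n-grams of width $width in $string.
# Assumptions: width defaults to 1 (single characters), no padding, case is kept.
sub ngram_set {
    my ($string, $width) = @_;
    $width //= 1;
    my %set;
    for my $i (0 .. length($string) - $width) {
        $set{ substr($string, $i, $width) } = undef;
    }
    return \%set;
}

# Hypothetical helper: normalize one SYNOPSIS input form to a set
# (a hashref whose keys are the elements).
sub to_set {
    my ($any, $width) = @_;
    my @elements;
    if (ref $any eq 'ARRAY') {
        @elements = @$any;                             # arrayref of tokens
    }
    elsif (ref $any eq 'HASH') {
        @elements = grep { $any->{$_} } keys %$any;    # assumption: true features only
    }
    else {
        return ngram_set($any, $width);                # string: distinct n-grams
    }
    my %set = map { $_ => undef } @elements;           # duplicates collapse
    return \%set;
}

# Jaccard coefficient of two sets given as hashrefs: |A intersect B| / |A union B|.
# Returning 1 for two empty sets is a convention chosen for this sketch.
sub jaccard {
    my ($set1, $set2) = @_;
    my $common = grep { exists $set2->{$_} } keys %$set1;
    my %union  = (%$set1, %$set2);
    my $total  = keys %union;
    return $total ? $common / $total : 1;
}

printf "%.4f\n", jaccard(to_set('Photographer', 2), to_set('Fotograf', 2));   # prints 0.3846
```

Under these assumptions the 2-gram comparison of 'Photographer' and 'Fotograf' shares 5 of 13 distinct bigrams (ot, to, og, gr, ra), giving 5/13; the module's own tokenization may differ in details such as padding or case folding.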
DESCRIPTION
Jaccard Index
The Jaccard coefficient measures the similarity between sample sets. It is defined as the size of the intersection divided by the size of the union of the sets:
|A intersect B| / |A union B|
The Tanimoto coefficient is the ratio of the number of elements common to both sets to the number of distinct elements in either set, i.e.
|A intersect B| / ( |A| + |B| - |A intersect B| )
Since |A union B| = |A| + |B| - |A intersect B|, this is the same value as the Jaccard coefficient.
The coefficient ranges from 0 to 1 inclusive: 0 when the sets share no elements, and 1 when they are identical and non-empty.
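The equivalence of the two formulas can be checked numerically. The sketch below is written in plain Perl rather than with the module and computes both coefficients on hashref sets like those passed to from_sets in the SYNOPSIS. The function names intersection_size, jaccard_coefficient and tanimoto_coefficient are hypothetical, and dying on two empty sets is a choice made here, not documented module behavior.

```perl
use strict;
use warnings;

# Sketch only (hypothetical names): a set is a hashref whose keys are its elements.

# |A intersect B|
sub intersection_size {
    my ($A, $B) = @_;
    my $count = grep { exists $B->{$_} } keys %$A;
    return $count;
}

# Jaccard: |A intersect B| / |A union B|
sub jaccard_coefficient {
    my ($A, $B) = @_;
    my %union = (%$A, %$B);
    die "both sets are empty\n" unless %union;
    return intersection_size($A, $B) / scalar(keys %union);
}

# Tanimoto: |A intersect B| / (|A| + |B| - |A intersect B|)
sub tanimoto_coefficient {
    my ($A, $B) = @_;
    my $common      = intersection_size($A, $B);
    my $denominator = scalar(keys %$A) + scalar(keys %$B) - $common;
    die "both sets are empty\n" unless $denominator;
    return $common / $denominator;
}

# The hashref sets from the SYNOPSIS: 2 shared elements, 6 distinct ones
my %bird   = map { $_ => undef } qw(wings eyes feathers legs);
my %mammal = map { $_ => undef } qw(eyes hairs legs arms);
printf "%.4f %.4f\n",
    jaccard_coefficient(\%bird, \%mammal),
    tanimoto_coefficient(\%bird, \%mammal);   # prints 0.3333 0.3333
```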
METHODS
Set::Similarity::Jaccard inherits all methods from Set::Similarity and implements the following method.
from_sets
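Returns the Jaccard coefficient of two sets, each given as a hashref whose keys are the elements of the set.
my $similarity = $object->from_sets({'a' => undef},{'b' => undef}); # 0, the sets are disjoint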
SOURCE REPOSITORY
http://github.com/wollmers/Set-Similarity
AUTHOR
Helmut Wollmersdorfer, <helmut.wollmersdorfer@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2013 by Helmut Wollmersdorfer
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.