NAME
Algorithm::FuzzyCmeans - Perl implementation of fuzzy c-means clustering
SYNOPSIS
use Algorithm::FuzzyCmeans;

# input documents
my %documents = (
    Alex => { 'Pop' => 10, 'R&B' => 6, 'Rock' => 4 },
    Bob  => { 'Jazz' => 8, 'Reggae' => 9 },
    Dave => { 'Classic' => 4, 'World' => 4 },
    Ted  => { 'Jazz' => 9, 'Metal' => 2, 'Reggae' => 6 },
    Fred => { 'Hip-hop' => 3, 'Rock' => 3, 'Pop' => 3 },
    Sam  => { 'Classic' => 8, 'Rock' => 1 },
);

my $fcm = Algorithm::FuzzyCmeans->new(
    distance_class => 'Algorithm::FuzzyCmeans::Distance::Cosine',
    m              => 2.0,
);
foreach my $id (keys %documents) {
    $fcm->add_document($id, $documents{$id});
}

my $num_cluster = 3;
my $num_iter    = 20;
$fcm->do_clustering($num_cluster, $num_iter);

# show clustering result
foreach my $id (sort { $a cmp $b } keys %{ $fcm->memberships }) {
    printf "%s\t%s\n", $id,
        join "\t", map { sprintf "%.4f", $_ } @{ $fcm->memberships->{$id} };
}

# show cluster centroids
foreach my $centroid (@{ $fcm->centroids }) {
    print join "\t", map { sprintf "%s:%.4f", $_, $centroid->{$_} }
        keys %{ $centroid };
    print "\n";
}
DESCRIPTION
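Algorithm::FuzzyCmeans is a Perl implementation of fuzzy c-means clustering. Instead of assigning each document to a single cluster, it gives every document a degree of membership in each cluster.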
METHODS
new
Create a new instance.
The `m' option is the fuzziness coefficient and must be greater than 1.0 (default: 2.0).
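To give a feel for what the coefficient does, here is a minimal sketch of the textbook membership formula applied to one document. The function memberships_from_distances and the distances are made up for illustration; this is not the module's own code, and the module may compute memberships differently.

```perl
use strict;
use warnings;

# Textbook fuzzy c-means membership for one document, given its distances
# to each centroid: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)).
# Hypothetical helper for illustration; not part of Algorithm::FuzzyCmeans.
sub memberships_from_distances {
    my ($m, @dist) = @_;
    die "m must be greater than 1.0\n" unless $m > 1.0;
    my $exp = 2 / ($m - 1);

    # A zero distance means the document sits on a centroid, so it gets
    # full membership there (shared evenly if several centroids coincide).
    my @zero = grep { $dist[$_] == 0 } 0 .. $#dist;
    if (@zero) {
        my %is_zero = map { $_ => 1 } @zero;
        my $share   = 1 / scalar @zero;
        return map { $is_zero{$_} ? $share : 0 } 0 .. $#dist;
    }

    return map {
        my $d   = $_;
        my $sum = 0;
        $sum += ($d / $_) ** $exp for @dist;
        1 / $sum;
    } @dist;
}

my @dist = (0.2, 0.4, 0.8);    # made-up distances to three centroids
for my $m (1.5, 2.0, 4.0) {
    printf "m=%.1f\t%s\n", $m,
        join "\t", map { sprintf "%.4f", $_ } memberships_from_distances($m, @dist);
}
```

With these distances, the membership in the nearest cluster shrinks as m grows, so larger values of m produce softer (fuzzier) assignments, while values close to 1.0 approach a hard assignment.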
The `distance_class' option names the class that provides the distance function between vectors. 'Algorithm::FuzzyCmeans::Distance::Euclid' (Euclidean distance) and 'Algorithm::FuzzyCmeans::Distance::Cosine' (cosine distance) are currently supported (default: cosine).
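As a rough illustration of how the two measures differ on sparse hash-reference vectors, the sketch below defines two standalone functions. The names euclid_distance and cosine_distance are hypothetical, and defining cosine distance as one minus cosine similarity is an assumption; the bundled distance classes may differ in interface and detail.

```perl
use strict;
use warnings;

# Hypothetical helpers, not the module's internals: distances between
# sparse vectors stored as hash references (feature => weight).

sub euclid_distance {
    my ($u, $v) = @_;
    my %keys = map { $_ => 1 } (keys %$u, keys %$v);
    my $sum = 0;
    for my $k (keys %keys) {
        # A feature missing from one vector counts as weight 0.
        my $diff = ($u->{$k} // 0) - ($v->{$k} // 0);
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}

sub cosine_distance {
    my ($u, $v) = @_;
    my ($dot, $nu, $nv) = (0, 0, 0);
    $nu += $_ ** 2 for values %$u;
    $nv += $_ ** 2 for values %$v;
    for my $k (keys %$u) {
        $dot += $u->{$k} * $v->{$k} if exists $v->{$k};
    }
    # Assumption: treat an all-zero vector as maximally distant.
    return 1 if $nu == 0 || $nv == 0;
    return 1 - $dot / sqrt($nu * $nv);
}

my $bob = { 'Jazz' => 8, 'Reggae' => 9 };
my $ted = { 'Jazz' => 9, 'Metal' => 2, 'Reggae' => 6 };
my $sam = { 'Classic' => 8, 'Rock' => 1 };

printf "cosine(Bob, Ted) = %.4f\n", cosine_distance($bob, $ted);
printf "cosine(Bob, Sam) = %.4f\n", cosine_distance($bob, $sam);
printf "euclid(Bob, Ted) = %.4f\n", euclid_distance($bob, $ted);
```

Cosine distance ignores vector length and depends only on the mix of features, which tends to suit documents of very different sizes. Euclidean distance also reflects the magnitude of the weights.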
add_document($id, $vector)
Add an input document to the Algorithm::FuzzyCmeans instance. $id is the document's identifier and $vector is its feature vector. $vector must be a hash reference in which each key is a feature identifier and each value is the degree (weight) of that feature.
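One way to obtain such a hash reference from raw text is to count word occurrences. The helper term_frequencies below is a hypothetical example, not part of the module, and word counts are only one possible choice of feature weight.

```perl
use strict;
use warnings;

# Hypothetical helper: count lowercased word occurrences to build a sparse
# feature vector (feature identifier => degree) as a hash reference.
sub term_frequencies {
    my ($text) = @_;
    my %vector;
    $vector{ lc $_ }++ for $text =~ /(\w+)/g;
    return \%vector;
}

my $vector = term_frequencies('Jazz and reggae, then more jazz');
printf "%s => %d\n", $_, $vector->{$_} for sort keys %$vector;
```

The resulting $vector has the shape that add_document expects for its second argument.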
do_clustering($num_cluster, $num_iter)
Cluster the input documents. $num_cluster specifies the number of output clusters and $num_iter specifies the number of clustering iterations.
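For orientation, the textbook form of fuzzy c-means alternates two updates on each iteration to reduce a weighted distance objective. The notation below is an assumption introduced here (N documents x_i, C = $num_cluster centroids c_j, memberships u_ij, distance d, fuzziness coefficient m); whether the module follows these formulas exactly is not stated in this document.

```latex
% Objective minimised by fuzzy c-means
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, d(x_i, c_j)^2

% Membership update (centroids held fixed)
u_{ij} = \left[ \sum_{k=1}^{C}
  \left( \frac{d(x_i, c_j)}{d(x_i, c_k)} \right)^{\frac{2}{m-1}} \right]^{-1}

% Centroid update (memberships held fixed)
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

The centroid update is the exact minimiser of the objective for Euclidean distance. With other distances, such as cosine, the same weighted mean is commonly used as a heuristic.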
memberships
Accessor for the clustering result. It returns a hash reference whose keys are the identifiers of the input documents and whose values are array references holding each document's degree of membership in each output cluster.
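If a single cluster label per document is needed, one option is to pick the cluster with the highest degree of membership. The sketch below works on made-up values in the same shape as the memberships result; the numbers are illustrative, not output from the module.

```perl
use strict;
use warnings;

# Made-up membership values in the shape returned by memberships():
# document identifier => array reference of per-cluster degrees.
my $memberships = {
    Alex => [0.91, 0.05, 0.04],
    Bob  => [0.03, 0.95, 0.02],
    Sam  => [0.10, 0.15, 0.75],
};

# Pick, for each document, the index of the cluster with the largest degree.
my %best_cluster;
for my $id (sort keys %$memberships) {
    my $degrees = $memberships->{$id};
    my $best    = 0;
    for my $i (1 .. $#$degrees) {
        $best = $i if $degrees->[$i] > $degrees->[$best];
    }
    $best_cluster{$id} = $best;
    printf "%s\tcluster %d\t%.2f\n", $id, $best, $degrees->[$best];
}
```

Hardening the result this way discards the information that a document may belong substantially to more than one cluster, so it is best kept as a reporting step.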
centroids
Accessor for the cluster centroids. It returns an array reference of centroid vectors, each a hash reference that maps feature identifiers to values.
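A common way to read a centroid is to list its strongest features. The sketch below ranks features by value for made-up centroids in the same shape (array reference of hash references); the data are illustrative only.

```perl
use strict;
use warnings;

# Made-up centroids: one hash reference (feature => value) per cluster.
my $centroids = [
    { 'Jazz'    => 0.61, 'Reggae' => 0.55, 'Metal' => 0.07 },
    { 'Classic' => 0.70, 'World'  => 0.25, 'Rock'  => 0.05 },
];

my $top_n = 2;    # how many features to show per cluster
my @top_features;
for my $i (0 .. $#$centroids) {
    my $c = $centroids->[$i];
    # Sort by descending value; break ties alphabetically for stable output.
    my @ranked = sort { $c->{$b} <=> $c->{$a} || $a cmp $b } keys %$c;
    splice @ranked, $top_n if @ranked > $top_n;
    $top_features[$i] = \@ranked;
    printf "cluster %d\t%s\n", $i,
        join ", ", map { sprintf "%s:%.2f", $_, $c->{$_} } @ranked;
}
```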
AUTHOR
Mizuki Fujisawa <fujisawa@bayon.cc>
SEE ALSO
- Wikipedia: Fuzzy c-means clustering http://en.wikipedia.org/wiki/Cluster_Analysis#Fuzzy_c-means_clustering
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.