NAME
Algorithm::Kmeanspp - Perl implementation of K-means++
SYNOPSIS
use Algorithm::Kmeanspp;
# input documents
my %documents = (
    Alex => { 'Pop' => 10, 'R&B' => 6, 'Rock' => 4 },
    Bob  => { 'Jazz' => 8, 'Reggae' => 9 },
    Dave => { 'Classic' => 4, 'World' => 4 },
    Ted  => { 'Jazz' => 9, 'Metal' => 2, 'Reggae' => 6 },
    Fred => { 'Hip-hop' => 3, 'Rock' => 3, 'Pop' => 3 },
    Sam  => { 'Classic' => 8, 'Rock' => 1 },
);
my $kmp = Algorithm::Kmeanspp->new;
foreach my $id (keys %documents) {
    $kmp->add_document($id, $documents{$id});
}
my $num_cluster = 3;
my $num_iter = 20;
$kmp->do_clustering($num_cluster, $num_iter);
# show clustering result
foreach my $cluster (@{ $kmp->clusters }) {
    print join "\t", @{ $cluster };
    print "\n";
}
# show cluster centroids
foreach my $centroid (@{ $kmp->centroids }) {
    print join "\t", map { sprintf "%s:%.4f", $_, $centroid->{$_} }
        keys %{ $centroid };
    print "\n";
}
DESCRIPTION
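Algorithm::Kmeanspp is a Perl implementation of K-means++, a variant of K-means clustering that chooses its initial centroids by distance-weighted random sampling rather than uniformly at random.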
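For readers unfamiliar with the algorithm, the seeding step that distinguishes K-means++ from plain K-means can be sketched as follows. This is a minimal illustration, not the module's actual code: the function names squared_distance and choose_seeds are hypothetical, and the use of squared Euclidean distance on sparse hash vectors is an assumption.

```perl
use strict;
use warnings;

# Squared Euclidean distance between two sparse vectors (hash refs).
# Assumption: the module itself may use a different distance measure.
sub squared_distance {
    my ($u, $v) = @_;
    my %keys = map { $_ => 1 } (keys %$u, keys %$v);
    my $sum = 0;
    for my $k (keys %keys) {
        my $diff = ($u->{$k} // 0) - ($v->{$k} // 0);
        $sum += $diff * $diff;
    }
    return $sum;
}

# Hypothetical helper: choose up to $k initial centers from \%docs
# using K-means++ (D^2-weighted) seeding. Returns document identifiers.
sub choose_seeds {
    my ($docs, $k) = @_;
    my @ids   = sort keys %$docs;
    my @seeds = ($ids[ int rand @ids ]);    # first center: uniform at random

    while (@seeds < $k && @seeds < @ids) {
        # Squared distance from each document to its nearest chosen center.
        my %nearest;
        my $total = 0;
        for my $id (@ids) {
            my $min;
            for my $s (@seeds) {
                my $d = squared_distance($docs->{$id}, $docs->{$s});
                $min = $d if !defined $min || $d < $min;
            }
            $nearest{$id} = $min;
            $total += $min;
        }
        last if $total == 0;    # every document coincides with a center

        # Sample the next center with probability proportional to D^2.
        my $r = rand($total);
        my $pick;
        for my $id (@ids) {
            next if $nearest{$id} == 0;    # skips already chosen centers
            $pick = $id;
            $r -= $nearest{$id};
            last if $r < 0;
        }
        push @seeds, $pick;
    }
    return @seeds;
}
```

Calling choose_seeds(\%documents, 3) on the hash from the SYNOPSIS would return three distinct document identifiers to use as starting centers. Documents far from every existing center are more likely to be picked, which is what spreads the initial centers out.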
METHODS
new
Create a new instance.
add_document($id, $vector)
Adds an input document to the Algorithm::Kmeanspp instance. $id is the identifier of the document, and $vector is its feature vector. $vector must be a hash reference in which each key identifies a feature and each value gives the degree (weight) of that feature in the document.
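If the raw data is a list of observed features rather than a ready-made hash, a feature vector of the required shape can be built by counting. The sketch below is only an illustration; count_features is a hypothetical helper and is not part of Algorithm::Kmeanspp.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of observed features (for example,
# genre tags from a listening history) into a { feature => count }
# hash reference.
sub count_features {
    my @features = @_;
    my %vector;
    $vector{$_}++ for @features;
    return \%vector;
}

my $vector = count_features(qw(Jazz Reggae Jazz Jazz Reggae));
# $vector is now { Jazz => 3, Reggae => 2 }
```

The resulting hash reference can then be passed as the second argument, as in $kmp->add_document('Bob', $vector).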
do_clustering($num_cluster, $num_iter)
Clusters the input documents. $num_cluster specifies the number of output clusters, and $num_iter specifies the number of clustering iterations.
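To show what one clustering iteration generally involves in K-means, here is a rough sketch of a single refinement pass. It is not taken from the module's source: sq_dist and refine_once are hypothetical names, squared Euclidean distance is an assumption, and the handling of empty clusters is one choice among several.

```perl
use strict;
use warnings;

# Squared Euclidean distance between sparse vectors (assumed measure).
sub sq_dist {
    my ($u, $v) = @_;
    my %keys = map { $_ => 1 } (keys %$u, keys %$v);
    my $sum = 0;
    $sum += (($u->{$_} // 0) - ($v->{$_} // 0)) ** 2 for keys %keys;
    return $sum;
}

# One refinement pass: assign every document to its nearest centroid,
# then recompute each centroid as the mean of its members.
sub refine_once {
    my ($docs, $centroids) = @_;
    my @members = map { [] } @$centroids;
    for my $id (sort keys %$docs) {
        my ($best, $best_d);
        for my $i (0 .. $#{$centroids}) {
            my $d = sq_dist($docs->{$id}, $centroids->[$i]);
            ($best, $best_d) = ($i, $d) if !defined $best_d || $d < $best_d;
        }
        push @{ $members[$best] }, $id;
    }
    my @updated;
    for my $i (0 .. $#members) {
        my @ids = @{ $members[$i] };
        my $n   = @ids;
        if ($n == 0) {    # keep the old centroid for an empty cluster
            push @updated, $centroids->[$i];
            next;
        }
        my %mean;
        for my $id (@ids) {
            $mean{$_} += $docs->{$id}{$_} / $n for keys %{ $docs->{$id} };
        }
        push @updated, \%mean;
    }
    return (\@members, \@updated);
}

# Toy data (illustrative only): repeat the pass a fixed number of times.
my %docs      = (a => { x => 0 }, b => { x => 2 }, c => { x => 10 });
my $centroids = [ { x => 0 }, { x => 10 } ];
my $members;
for my $iter (1 .. 5) {
    ($members, $centroids) = refine_once(\%docs, $centroids);
}
```

In this toy run, documents a and b end up in the first cluster and c in the second. A fixed iteration count like $num_iter bounds the running time even when the assignments have not fully stabilized.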
clusters
Accessor for the clustering result. It returns an array reference with one item per cluster; each item is itself an array reference holding the identifiers of the input documents in that cluster.
# format of output clusters
[
    [ document_id1, document_id2, ... ], # cluster-1
    [ document_id3, document_id4, ... ], # cluster-2
    ...
]
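A common follow-up is to look up which cluster a given document landed in. The sketch below inverts a structure in the format shown above; cluster_index_of is a hypothetical helper, and the sample data is made up rather than real output of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: map each document identifier to the index of the
# cluster that contains it.
sub cluster_index_of {
    my ($clusters) = @_;
    my %index_of;
    for my $i (0 .. $#{$clusters}) {
        $index_of{$_} = $i for @{ $clusters->[$i] };
    }
    return \%index_of;
}

# Sample data in the documented format (not real output of the module).
my $clusters = [ [ 'Alex', 'Fred' ], [ 'Bob', 'Ted' ], [ 'Dave', 'Sam' ] ];
my $index_of = cluster_index_of($clusters);
printf "%s is in cluster %d\n", $_, $index_of->{$_}
    for sort keys %$index_of;
```

With a real instance, the same helper could be called as cluster_index_of($kmp->clusters).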
centroids
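Accessor for the cluster centroid vectors. As the SYNOPSIS shows, it returns an array reference with one item per centroid; each item is a hash reference that maps feature identifiers to their values.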
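Centroids are often easier to read when only their strongest features are shown. The following sketch assumes each centroid is a hash reference of feature => value pairs, as used in the SYNOPSIS; top_features is a hypothetical helper and the sample values are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n highest-valued features of one
# centroid, breaking ties alphabetically.
sub top_features {
    my ($centroid, $n) = @_;
    my @sorted = sort { $centroid->{$b} <=> $centroid->{$a} || $a cmp $b }
        keys %$centroid;
    $n = @sorted if $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}

# Sample centroid (illustrative values, not real output of the module).
my $centroid = { Jazz => 0.72, Reggae => 0.65, Metal => 0.10 };
print join(', ', top_features($centroid, 2)), "\n";    # prints: Jazz, Reggae
```

With a real instance, the helper could be applied to each element of @{ $kmp->centroids } to label the clusters.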
AUTHOR
Mizuki Fujisawa <fujisawa@bayon.cc>
SEE ALSO
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.