NAME

Algorithm::FuzzyCmeans - Perl implementation of fuzzy c-means clustering

SYNOPSIS

use Algorithm::FuzzyCmeans;

# input documents
my %documents = (
    Alex => { 'Pop'     => 10, 'R&B'    => 6, 'Rock'   => 4 },
    Bob  => { 'Jazz'    => 8,  'Reggae' => 9                },
    Dave => { 'Classic' => 4,  'World'  => 4                },
    Ted  => { 'Jazz'    => 9,  'Metal'  => 2, 'Reggae' => 6 },
    Fred => { 'Hip-hop' => 3,  'Rock'   => 3, 'Pop'    => 3 },
    Sam  => { 'Classic' => 8,  'Rock'   => 1                },
);

my $fcm = Algorithm::FuzzyCmeans->new(
    distance_class => 'Algorithm::FuzzyCmeans::Distance::Cosine',
    m              => 2.0,
);
foreach my $id (keys %documents) {
    $fcm->add_document($id, $documents{$id});
}

my $num_cluster = 3;
my $num_iter    = 20;
$fcm->do_clustering($num_cluster, $num_iter);

# show clustering result
foreach my $id (sort { $a cmp $b } keys %{ $fcm->memberships }) {
    printf "%s\t%s\n", $id,
        join "\t", map { sprintf "%.4f", $_ } @{ $fcm->memberships->{$id} };
}
# show cluster centroids
foreach my $centroid (@{ $fcm->centroids }) {
    print join "\t", map { sprintf "%s:%.4f", $_, $centroid->{$_} }
        keys %{ $centroid };
    print "\n";
}

DESCRIPTION

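Algorithm::FuzzyCmeans is a Perl implementation of fuzzy c-means clustering. Unlike hard clustering, where each document belongs to exactly one cluster, fuzzy c-means gives every document a degree of membership in each cluster.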

METHODS

new

Create a new instance.

The `m' option is the fuzziness coefficient. It must be greater than 1.0 (default: 2.0).
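To see what `m' controls, the sketch below applies the textbook fuzzy c-means membership update, u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)), to one document's distances from each centroid. The function name fcm_memberships and the sample distances are illustrative assumptions, not part of this module's API, and the module's internal computation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::FuzzyCmeans): textbook
# fuzzy c-means membership update for ONE document.
# $dist is an array ref of distances to each centroid; $m must be > 1.0.
sub fcm_memberships {
    my ($dist, $m) = @_;
    die "m must be greater than 1.0\n" unless $m > 1.0;

    # A document lying exactly on one or more centroids belongs to them
    # fully (the membership is shared equally if there are several).
    my @zero = grep { $dist->[$_] == 0 } 0 .. $#{$dist};
    if (@zero) {
        my %is_zero = map { $_ => 1 } @zero;
        my $share   = 1 / scalar(@zero);
        return [ map { $is_zero{$_} ? $share : 0 } 0 .. $#{$dist} ];
    }

    my $exp = 2 / ($m - 1);
    my @u;
    for my $dj (@{$dist}) {
        my $sum = 0;
        $sum += ($dj / $_) ** $exp for @{$dist};
        push @u, 1 / $sum;
    }
    return \@u;
}

# The same (made-up) distances under three values of m.
for my $m (1.5, 2.0, 4.0) {
    my $u = fcm_memberships([1, 2, 4], $m);
    printf "m=%.1f\t%s\n", $m, join "\t", map { sprintf "%.4f", $_ } @{$u};
}
```

In this formula, values of m close to 1 push the memberships toward a hard assignment, while larger values flatten them toward equal shares.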

The `distance_class' option is the name of a class that provides the distance function between vectors. Currently, 'Algorithm::FuzzyCmeans::Distance::Euclid' (Euclidean distance) and 'Algorithm::FuzzyCmeans::Distance::Cosine' (cosine distance) are supported (default: cosine).
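As a rough illustration of the two measures on sparse hash-reference vectors, here is a minimal sketch. The functions euclid_distance and cosine_distance are hypothetical stand-ins written for this example; the module's own distance classes may differ, for instance in how they treat empty vectors.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two distance classes (not the module's code).

# Euclidean distance over the union of features; a missing feature counts as 0.
sub euclid_distance {
    my ($v1, $v2) = @_;
    my %keys = map { $_ => 1 } (keys %{$v1}, keys %{$v2});
    my $sum = 0;
    for my $k (keys %keys) {
        my $diff = ($v1->{$k} // 0) - ($v2->{$k} // 0);
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}

# Cosine distance, taken here as 1 minus the cosine similarity.
sub cosine_distance {
    my ($v1, $v2) = @_;
    my ($dot, $norm1, $norm2) = (0, 0, 0);
    $norm1 += $_ ** 2 for values %{$v1};
    $norm2 += $_ ** 2 for values %{$v2};
    # Assumption for this sketch: an all-zero vector is maximally distant.
    return 1 if $norm1 == 0 || $norm2 == 0;
    for my $k (keys %{$v1}) {
        $dot += $v1->{$k} * $v2->{$k} if exists $v2->{$k};
    }
    return 1 - $dot / sqrt($norm1 * $norm2);
}

# Two vectors from the SYNOPSIS that share features, and one that shares none.
my $bob  = { 'Jazz' => 8, 'Reggae' => 9 };
my $ted  = { 'Jazz' => 9, 'Metal' => 2, 'Reggae' => 6 };
my $dave = { 'Classic' => 4, 'World' => 4 };

printf "cosine(Bob, Ted)  = %.4f\n", cosine_distance($bob, $ted);
printf "cosine(Bob, Dave) = %.4f\n", cosine_distance($bob, $dave);
printf "euclid(Bob, Ted)  = %.4f\n", euclid_distance($bob, $ted);
```

Cosine distance ignores vector length and compares only direction, which tends to suit vectors whose overall magnitudes vary a lot between documents.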

add_document($id, $vector)

Add an input document to the Algorithm::FuzzyCmeans instance. $id is the identifier of the document, and $vector is its feature vector. $vector must be a hash reference: each key identifies a feature, and each value is the degree (weight) of that feature in the document.
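If the features are word counts, a vector in the expected shape can be built with a few lines of plain Perl. The helper term_vector below is hypothetical and only sketches one way to produce such a hash reference; it is not provided by this module.

```perl
use strict;
use warnings;

# Hypothetical helper: count lower-cased word occurrences in a string and
# return them as a hash reference (feature => count).
sub term_vector {
    my ($text) = @_;
    my %count;
    $count{ lc($_) }++ for $text =~ /(\w+)/g;
    return \%count;
}

my $vector = term_vector('Fuzzy clustering assigns fuzzy memberships');
print join(', ', map { sprintf '%s => %d', $_, $vector->{$_} }
    sort keys %{$vector}), "\n";

# The result could then be passed on, for example:
#   $fcm->add_document('doc1', $vector);
```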

do_clustering($num_cluster, $num_iter)

Cluster the input documents. $num_cluster specifies the number of output clusters, and $num_iter specifies the number of clustering iterations.

memberships

Accessor for the clustering result. It returns a hash reference in which each key is the identifier of an input document and each value is an array reference holding that document's degree of membership in each output cluster.
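When a single label per document is needed, one option is to pick the cluster with the largest degree of membership. The sketch below does this on made-up sample data in the same shape as the memberships result; the helper best_cluster and the numbers are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the largest membership degree.
# On ties, the lowest index wins.
sub best_cluster {
    my ($degrees) = @_;
    my $best = 0;
    for my $i (1 .. $#{$degrees}) {
        $best = $i if $degrees->[$i] > $degrees->[$best];
    }
    return $best;
}

# Made-up sample: document id => membership degree per cluster.
my %memberships = (
    Alex => [0.82, 0.10, 0.08],
    Bob  => [0.05, 0.90, 0.05],
    Dave => [0.15, 0.20, 0.65],
);

foreach my $id (sort keys %memberships) {
    my $k = best_cluster($memberships{$id});
    printf "%s\tcluster %d\t(%.2f)\n", $id, $k, $memberships{$id}[$k];
}
```

With real results, %memberships would be replaced by %{ $fcm->memberships }.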

centroids

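Accessor for the cluster centroid vectors. It returns an array reference with one centroid per cluster; each centroid is a hash reference that maps a feature identifier to its weight.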
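A centroid is often easier to read when reduced to its heaviest features. The following sketch sorts one centroid by weight; the helper top_features and the sample centroid are hypothetical and not produced by an actual run.

```perl
use strict;
use warnings;

# Hypothetical helper: the $n features with the largest weights,
# ties broken alphabetically.
sub top_features {
    my ($centroid, $n) = @_;
    my @sorted = sort { $centroid->{$b} <=> $centroid->{$a} || $a cmp $b }
        keys %{$centroid};
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

# Made-up centroid in the shape returned by centroids().
my $centroid = { 'Jazz' => 0.71, 'Reggae' => 0.62, 'Metal' => 0.08 };

print join("\t", map { sprintf "%s:%.4f", $_, $centroid->{$_} }
    top_features($centroid, 2)), "\n";
```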

AUTHOR

Mizuki Fujisawa <fujisawa@bayon.cc>

SEE ALSO

Wikipedia: Fuzzy c-means clustering http://en.wikipedia.org/wiki/Cluster_Analysis#Fuzzy_c-means_clustering

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.