NAME

Algorithm::ClusterPoints - find clusters inside a set of points

SYNOPSIS

use Algorithm::ClusterPoints;
my $clp = Algorithm::ClusterPoints->new( dimension => 3,
                                         radius => 1.0,
                                         minimum_size => 2,
                                         ordered => 1 );
for my $p (@points) {
    $clp->add_point($p->{x}, $p->{y}, $p->{z});
}
my @clusters = $clp->clusters_ix;
for my $i (0..$#clusters) {
    print( join( ' ',
                 "cluster $i:",
                 map {
                     my ($x, $y, $z) = $clp->point_coords($_);
                     "($_: $x, $y, $z)"
                 } @{$clusters[$i]}
               ), "\n"
         );
}

DESCRIPTION

This module implements and algorithm to find clusters of points inside a set.

Points can have any dimension from one to infinitum, though the algorithm performance degrades quickly as the dimension increases (it has O((2*D)^D) complexity).

The algorithm input parameters are:

$dimension

Dimension of the problem space. For instance, for finding clusters on a geometric plane, dimension will be 2.

$radius

A point P is part of a cluster when there is at least another point from the cluster inside the (hyper-)circunference with center P and radius $radius.

$minimum_size

Minimum_number of points required to form a cluster, the default is one.

@points

The coordinates of the points

$ordered

Order the points inside the clusters by their indexes and also order the clusters by the index of the contained points.

Ordering the output data is optinal because it can be an computational expensive operation.

API

This module has an object oriented interface with the following methods:

Algorithm::ClusterPoints->new(%args)

returns a new object.

The accepted arguments are:

dimension => $dimension

number of dimensions of the points (defaul is 2).

radius => $radius

maximum aceptable distance between two points to form a cluster (default is 1.0).

minimum_size => $minimum_size

minimun cluster size (default is 1).

ordered => $ordered

sort the returned data structures (default is false).

$clp->add_point($x, $y, $z, ...)
$clp->add_points($x0, $y0, $z0..., $x1, $y1, $z1..., ...);

These methods register points into the algorithm.

They return the index of the (first) point added.

$clp->radius
$clp->radius($radius)
$clp->minimum_size
$clp->minimum_size($minimum_size)
$clp->ordered
$clp->ordered($ordered)

These methods get or set the algorithm parameters.

@coords = $clp->point_coords($index)

returns the coordinates of the point at index $index.

@clusters_ix = $clp->clusters_ix

returns a list of clusters defined by the indexes of the points inside

The data estructure returned is a list of arrays. Every array represents a cluster and contains the indexes of the points inside.

For instance:

@clusters_ix = ( [ 0, 1, 5, 10, 13, 15, 17, 31, 32, 38 ],
                 [ 2, 12, 20, 26, 27, 29, 33 ],
                 [ 3, 22, 39 ],
                 [ 4, 11, 16, 30, 36 ],
                 [ 6, 14 ],
                 [ 7, 23, 24 ],
                 [ 18, 25 ],
                 [ 21, 40 ] );

You can get back the coordinates of the points using the method point_coords, as for instance:

for my $c (@clusters_ix) {
  for my $index (@$c) {
    my ($x, $y, $z) = $clp->point_coords($index);
    ...

Or you can use the method clusters described below that already returns 2D coordinates.

@clusters = $clp->clusters

returns a list of clusters defined by the 2D coordinates of the points inside.

This method is similar to clusters_ix but instead of the point indexes, it includes the point coordinates inside the cluster arrays.

This is a sample of the returned structure:

@clusters = ( [ 0.49, 0.32, 0.55, 0.32, 0.66, 0.33 ],
              [ 0.95, 0.20, 0.83, 0.27, 0.90, 0.20 ],
              [ 0.09, 0.09, 0.01, 0.08, 0.12, 0.15 ],
              [ 0.72, 0.42, 0.67, 0.47 ],
              [ 0.83, 0.11, 0.77, 0.13, 0.73, 0.07 ],
              [ 0.37, 0.38, 0.36, 0.44 ],
              [ 0.16, 0.79, 0.14, 0.74 ] );

Note that X and Y coordinate values are interleaved inside the arrays.

SEE ALSO

All began on this PerlMonks discussion: http://perlmonks.org/?node_id=694892.

COPYRIGHT AND LICENSE

Copyright (C) 2008 by Salvador Fandiño (sfandino@yahoo.com)

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.