NAME
AI::NeuralNet::SOM - Perl extension for Kohonen Maps
SYNOPSIS
  use AI::NeuralNet::SOM::Rect;
  my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
                                         input_dim  => 3);
  $nn->initialize;
  $nn->train (30,
              [ 3, 2, 4 ],
              [ -1, -1, -1 ],
              [ 0, 4, -3 ]);

  my @mes = $nn->train (30, ...);      # learn about the smallest errors
                                       # during training

  print $nn->as_data;                  # dump the raw data
  print $nn->as_string;                # prepare a somehow formatted string

  use AI::NeuralNet::SOM::Torus;
  # similar to above

  use AI::NeuralNet::SOM::Hexa;
  my $nn = new AI::NeuralNet::SOM::Hexa (output_dim => 6,
                                         input_dim  => 4);
  $nn->initialize ( [ 0, 0, 0, 0 ] );  # all get this value

  $nn->value (3, 2, [ 1, 1, 1, 1 ]);   # change value for a neuron
  print $nn->value (3, 2);

  $nn->label (3, 2, 'Danger');         # add a label to the neuron
  print $nn->label (3, 2);
DESCRIPTION
This package is a stripped down implementation of the Kohonen Maps (self-organizing maps). It is NOT meant as a demonstration or for use together with some visualisation software. And while it is not (yet) optimized for speed, some consideration has been given that it is not overly slow.
Particular emphasis has been given to making the package play nicely with others: no use of files, no arcane dependencies, etc.
Scenario
The basic idea is that the neural network consists of a 2-dimensional array of N-dimensional vectors. When the training is started these vectors may be completely random, but over time the network learns from the sample data, which is a set of N-dimensional vectors.
Slowly, the vectors in the network approximate the sample vectors fed in. If the sample vectors contain clusters, then these clusters will become neighbourhoods within the rectangle (or whatever topology you are using).
Technically, you have reduced your dimension from N to 2.
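
To make the idea concrete, here is a small sketch using the interface documented below (the sample data and the epoch count are made up):

    use AI::NeuralNet::SOM::Rect;

    my @samples = ([ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3 ]);   # 3-dimensional input

    my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
                                           input_dim  => 3);
    $nn->initialize;
    $nn->train (30, @samples);

    # any 3-dimensional vector is now characterized by 2 grid coordinates
    my ($x, $y, $distance) = $nn->bmu ([ 3, 2, 4 ]);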
INTERFACE
Constructor
The constructor takes the following arguments:

- input_dim (mandatory, no default)

    A positive integer specifying the dimension of the sample vectors (and hence that of the vectors in the grid).

- learning_rate (optional, default 0.1)

    This is a magic number which controls how strongly the vectors in the grid can be influenced. Stronger movement can mean faster learning if the clusters are very pronounced. If not, then the movement is like noise and the convergence is not good. To mediate that effect, the learning rate is reduced over the iterations.

- sigma0 (optional, defaults to radius)

    A non-negative number representing the start value for the learning radius. Practically, the value should be chosen so that it covers a larger part of the map. During the learning process this value will be narrowed down, so that the learning radius impacts fewer and fewer neurons.

    NOTE: Do not choose 1 as the log function is used on this value.
Subclasses will (re)define some of these parameters and add others:
Example:
    my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
                                           input_dim  => 3);
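
The optional parameters described above go into the same list; a sketch with arbitrary values:

    my $nn = new AI::NeuralNet::SOM::Rect (output_dim    => "5x6",
                                           input_dim     => 3,
                                           learning_rate => 0.05,  # move vectors more gently
                                           sigma0        => 4);    # start with a wide learning radius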
Methods
- initialize

    $nn->initialize

    You need to initialize all vectors in the map before training. There are several ways this can be done:

    - providing data vectors

        If you provide a list of vectors, these will be used in turn to seed the neurons. If the list is shorter than the number of neurons, the list will be started over. That way it is trivial to zero everything:

            $nn->initialize ( [ 0, 0, 0 ] );

    - providing no data

        Then all vectors will get randomized values (in the range [ -0.5 .. 0.5 ]).

    - using eigenvectors (see "HOWTOs")
- train

    $nn->train ( $epochs, @vectors )
    @mes = $nn->train ( $epochs, @vectors )

    The training uses the list of sample vectors to make the network learn. Each vector is simply a reference to an array of values.

    The $epochs parameter controls how many times the vector list is processed. Within one epoch every vector is visited exactly once, but the vectors are NOT used in sequence; they are picked randomly from the list. For this reason it is wise to run several epochs, not just one.

    Example:

        $nn->train (30, [ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3 ]);
- bmu

    ($x, $y, $distance) = $nn->bmu ($vector)

    This method finds the best matching unit, i.e. that neuron which is closest to the vector passed in. The method returns the coordinates and the actual distance.
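
    Combined with the label method below, this can serve as a simple classifier; a sketch (the vector and the label handling are made up):

        my ($x, $y, $distance) = $nn->bmu ([ 3, 2, 4 ]);
        printf "best matching unit: (%d, %d), distance %.3f\n", $x, $y, $distance;
        my $label = $nn->label ($x, $y);                 # undef if never labelled
        print "looks like '$label'\n" if defined $label;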
- mean_error

    $me = $nn->mean_error (@vectors)

    This method takes a number of vectors and produces the mean distance, i.e. the average error which the SOM makes when finding the BMUs for the vectors. At least one vector must be passed in.

    Obviously, the longer you let your SOM train, the smaller the error should become.
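
    One possible use is to train in small doses until the error is acceptable; a sketch (the threshold and epoch counts are arbitrary, and a real loop would also cap the number of rounds):

        my @samples = ([ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3 ]);
        $nn->train (30, @samples);
        while ($nn->mean_error (@samples) > 0.05) {      # arbitrary target error
            $nn->train (10, @samples);                   # train a bit more, then remeasure
        }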
- neighbors

    $ns = $nn->neighbors ($sigma, $x, $y)

    Finds all neighbors of (X, Y) with a distance smaller than SIGMA. Returns a list reference of (X, Y, distance) triples.
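
    Since the result is a list reference of triples, it can be walked like this:

        my $ns = $nn->neighbors (2, 0, 0);               # everything within distance 2 of (0, 0)
        for my $n (@$ns) {
            my ($x, $y, $d) = @$n;
            printf "(%d, %d) at distance %.2f\n", $x, $y, $d;
        }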
- output_dim (read-only)

    $dim = $nn->output_dim

    Returns the output dimensions of the map as passed in at constructor time.
- radius (read-only)

    $radius = $nn->radius

    Returns the radius of the map. Different topologies interpret this differently.
- map

    $m = $nn->map

    This method returns a reference to the map data. See the appropriate subclass for details of the data representation.
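
    For illustration only, assuming the rectangular topology stores the map as an array of arrays of vector references (check the subclass documentation before relying on this):

        my $m = $nn->map;
        for my $x (0 .. 4) {                             # ranges follow output_dim "5x6"
            for my $y (0 .. 5) {
                print "($x, $y): @{ $m->[$x]->[$y] }\n";
            }
        }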
- value

    $val = $nn->value ($x, $y)
    $nn->value ($x, $y, $val)

    Set or get the current vector value for a particular neuron. The neuron is addressed via its coordinates.
- label

    $label = $nn->label ($x, $y)
    $nn->label ($x, $y, $label)

    Set or get the label for a particular neuron. The neuron is addressed via its coordinates. The label can be anything; it is simply attached to the position.
- as_string

    print $nn->as_string

    This method creates a pretty-print version of the current vectors.
- as_data

    print $nn->as_data

    This method creates a string containing the raw vector data, row by row. This can be fed into gnuplot, for instance.
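
    To get the data into a file for gnuplot, something along these lines will do (the file name is arbitrary):

        open my $fh, '>', 'som.dat' or die "cannot write som.dat: $!";
        print $fh $nn->as_data;
        close $fh;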
HOWTOs
- using Eigenvectors to initialize the SOM

    See the example script in the directory examples provided in the distribution. It uses PDL (for speed and scalability, but the results are not as good as I had thought).

- loading and saving a SOM

    See the example script in the directory examples. It uses Storable to directly dump the data structure onto disk. Storage and retrieval are quite fast.
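
    In the same spirit, a minimal sketch with Storable (the file name is illustrative):

        use Storable qw(store retrieve);

        store $nn, 'som.storable';                       # freeze the trained SOM onto disk
        my $nn2 = retrieve ('som.storable');             # thaw it again later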
FAQs
- I get 'uninitialized value ...' warnings, many of them

    Most likely the input_dim you specified does not match the dimension of the vectors you are actually passing in.
TODOs
- maybe implement the SOM on top of PDL?
- provide a ::SOM::Compat to have compatibility with the original AI::NeuralNet::SOM?
- implement different window forms (bubble/gaussian), linear/random
- implement the format mentioned in the original AI::NeuralNet::SOM
- add methods as_html to individual topologies
- add iterators through vector lists for initialize and train
SUPPORT
Bugs should always be submitted via the CPAN bug tracker https://rt.cpan.org/Dist/Display.html?Status=Active&Queue=AI-NeuralNet-SOM
SEE ALSO
Explanation of the algorithm:
http://www.ai-junkie.com/ann/som/som1.html
Old version of AI::NeuralNet::SOM from Alexander Voischev:
http://backpan.perl.org/authors/id/V/VO/VOISCHEV/
Subclasses:
AI::NeuralNet::SOM::Hexa, AI::NeuralNet::SOM::Rect, AI::NeuralNet::SOM::Torus
AUTHOR
Robert Barta, <rho@devc.at>
COPYRIGHT AND LICENSE
Copyright (C) 2007-2008 by Robert Barta
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.