NAME
AI::NeuralNet::Kohonen - Kohonen's Self-organising Maps
SYNOPSIS
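use AI::NeuralNet::Kohonen;
my $som = AI::NeuralNet::Kohonen->new(
    map_dim_x => 39,
    map_dim_y => 19,
    epochs    => 100,
    table     =>
"R G B
1 0 0
0 1 0
0 0 1
1 1 0
1 0 1
0 1 1
1 1 1
");
$som->dump;     # weights before training
$som->train;
$som->dump;     # weights after training
# tk_dump has moved to AI::NeuralNet::Kohonen::Demo::RGB
exit;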
DESCRIPTION
An illustrative implementation of Kohonen's Self-organising Feature Maps (SOMs) in Perl.
It's not fast: it's illustrative.
In fact, it's slow, but it's illustrative.
For a demonstration, have a look at AI::NeuralNet::Kohonen::Demo::RGB.
I'll add some more text here later.
DEPENDENCIES
None
EXPORTS
None
CONSTRUCTOR new
Instantiates object fields:
- input_file
A SOM_PAK training file to load. This does not prevent other input methods (input, table) from being processed, but it does override any specifications (weight_dim) which may have been explicitly handed to the constructor.
- input
A reference to an array of training vectors, within which each vector is represented by an array:
[ [v1a, v1b, v1c], [v2a,v2b,v2c], ..., [vNa,vNb,vNc] ]
See also table.
- table
A scalar that is a table: lines are delimited by [\r\f\n]+, columns by whitespace, and initial whitespace is stripped. The first line should be column names; the following lines should be just data. See also input.
- input_names
A name for each dimension of the input vectors.
- map_dim_x
- map_dim_y
The dimensions of the feature map to create; both default to a toy 19 (note: this is Perl indexing, starting at zero).
- epochs
Number of epochs to run for (see "METHOD train").
- learning_rate
The initial learning rate.
- train_start
Reference to code to call at the beginning of training.
- epoch_end
Reference to code to call at the end of every epoch (such as a display routine).
- train_end
Reference to code to call at the end of training.
- targeting
If undefined, random targets are chosen; otherwise the targets are iterated over. This is just for experimental purposes.
- smoothing
The amount of smoothing to apply by default when smooth is applied (see "METHOD smooth").
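The rules given for the table field can be sketched in a few lines of Perl. This is a minimal illustration, not the module's own parser: the sub name parse_table is hypothetical, and it simply returns the column names plus the vectors in the layout described for input.

```perl
use strict;
use warnings;

# Hypothetical helper: split a 'table' scalar into column names and
# input vectors, following the rules described for the table field.
sub parse_table {
    my ($table) = @_;
    # Lines are delimited by runs of \r, \f or \n; drop blank lines.
    my @lines = grep { /\S/ } split /[\r\f\n]+/, $table;
    # Strip initial whitespace (the /r modifier needs Perl 5.14+).
    my @rows  = map { s/^\s+//r } @lines;
    # The first line holds the column names; the rest are data.
    my @names = split /\s+/, shift @rows;
    my @input = map { [ split /\s+/ ] } @rows;
    return (\@names, \@input);
}

my ($names, $input) = parse_table("R G B\n1 0 0\n 0 1 0\n");
print "@$names\n";                      # R G B
print scalar(@$input), " vectors\n";    # 2 vectors
```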
Private fields:
- time_constant
The number of iterations (epochs) to be completed, divided by the log of the map radius.
- t
The current epoch, or moment in time.
- l
The current learning rate.
- map_dim_a
Average of the map dimensions.
METHOD train
Optionally accepts a parameter that is the number of epochs to train for; the default is the value in the epochs field.
For every epoch, it:
- selects a random target from the input array;
- finds the Best Matching Unit (BMU);
- adjusts the neighbours of the BMU;
- decays the learning rate.
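The four steps above can be sketched as a self-contained toy loop. This is not the module's code: train_som and its arguments are hypothetical, weights are assumed to be stored as $weights->[$x][$y] array references, the neighbourhood influence is assumed to be gaussian, and the initial radius is assumed to be half the mean map side. The radius and learning-rate decays follow the formulas given later in this document.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical toy training loop; not AI::NeuralNet::Kohonen internals.
sub train_som {
    my (%args) = @_;
    my ($w, $input, $epochs, $l0) = @args{qw(weights input epochs learning_rate)};
    my $dim_x  = @$w;
    my $dim_y  = @{ $w->[0] };
    my $sigma0 = ($dim_x + $dim_y) / 4;      # assumed map radius: half the mean side
    die "map too small for this sketch\n" unless $sigma0 > 1;
    my $lambda = $epochs / log($sigma0);     # the time constant
    for my $t (0 .. $epochs - 1) {
        # Step 1: select a random target from the input array.
        my $v = $input->[ int rand @$input ];
        # Step 2: find the BMU (smallest squared Euclidean distance).
        my ($best, $bx, $by);
        for my $x (0 .. $dim_x - 1) {
            for my $y (0 .. $dim_y - 1) {
                my $d = sum(map { ($w->[$x][$y][$_] - $v->[$_])**2 } 0 .. $#$v);
                ($best, $bx, $by) = ($d, $x, $y) if !defined $best || $d < $best;
            }
        }
        # Step 4 applied up front: radius and learning rate for epoch $t.
        my $sigma = $sigma0 * exp(-$t / $lambda);
        my $l     = $l0     * exp(-$t / $lambda);
        # Step 3: pull every node within the radius towards the target.
        for my $x (0 .. $dim_x - 1) {
            for my $y (0 .. $dim_y - 1) {
                my $d2 = ($x - $bx)**2 + ($y - $by)**2;
                next if $d2 > $sigma**2;
                my $theta = exp(-$d2 / (2 * $sigma**2));
                my $node  = $w->[$x][$y];
                $node->[$_] += $theta * $l * ($v->[$_] - $node->[$_]) for 0 .. $#$v;
            }
        }
    }
    return $w;
}
```

Computing the decayed values from $t at the start of each epoch is equivalent to decaying them at the end of the previous one, and keeps the loop free of mutable state beyond the weights.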
METHOD find_bmu
Find the Best Matching Unit in the map and return the x/y index.
Accepts: a reference to an array that is the target.
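Returns: a reference to an array that is the BMU (and should perhaps be abstracted as an object in its own right), indexed as follows:
- 0: the distance of the BMU from the supplied target;
- 1, 2: the x and y co-ordinates of the BMU in the map.
See "METHOD get_weight_at", and "distance_from" in AI::NeuralNet::Kohonen::Node.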
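A stand-alone sketch of the search, returning the [distance, x, y] layout described above. The sub is hypothetical (it is not the module's method), the $weights->[$x][$y] storage is an assumption, and Euclidean distance is assumed as the metric.

```perl
use strict;
use warnings;

# Hypothetical stand-alone BMU search; returns [distance, x, y].
sub find_bmu {
    my ($weights, $target) = @_;
    my $bmu;
    for my $x (0 .. $#$weights) {
        for my $y (0 .. $#{ $weights->[$x] }) {
            my $w  = $weights->[$x][$y];
            my $sq = 0;
            $sq += ($w->[$_] - $target->[$_])**2 for 0 .. $#$target;
            my $d = sqrt $sq;                       # Euclidean distance
            $bmu = [$d, $x, $y] if !defined $bmu || $d < $bmu->[0];
        }
    }
    return $bmu;
}

my $map = [ [ [0, 0, 0], [1, 0, 0] ],
            [ [0, 1, 0], [0, 0, 1] ] ];
my ($dist, $x, $y) = @{ find_bmu($map, [0.9, 0.1, 0]) };
print "BMU at ($x, $y)\n";    # BMU at (0, 1)
```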
METHOD get_weight_at
Returns a reference to the weight array at the supplied x,y co-ordinates.
Accepts: x,y co-ordinates, each a scalar.
Returns: reference to an array that is the weight of the node, or undef on failure.
PRIVATE METHOD find_bmu
Deprecated: should have been public to begin with.
METHOD get_results
Finds and returns the results for all input vectors (input), placing the values in the array reference that is the results field and, depending on calling context, returning them either as a list or as that array reference.
Individual results are in the array format described in "METHOD find_bmu".
See "METHOD find_bmu", and "METHOD get_weight_at".
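Context-dependent returns like this are usually done with wantarray. The class Toy::SOM below is hypothetical and its find_bmu is a trivial stand-in, so the sketch only illustrates the calling convention, not the module's implementation.

```perl
package Toy::SOM;    # hypothetical class, not AI::NeuralNet::Kohonen
use strict;
use warnings;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Trivial stand-in returning [distance, x, y]; it just reports the
# first component of the target as x so the example stays tiny.
sub find_bmu { my ($self, $target) = @_; return [0, $target->[0], 0] }

sub get_results {
    my $self = shift;
    $self->{results} = [ map { $self->find_bmu($_) } @{ $self->{input} } ];
    # List context gets a list; scalar context gets the array ref.
    return wantarray ? @{ $self->{results} } : $self->{results};
}

package main;

my $som  = Toy::SOM->new(input => [ [1, 0], [2, 0] ]);
my @list = $som->get_results;    # two [distance, x, y] entries
my $aref = $som->get_results;    # the same results, as an array ref
```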
METHOD dump
Print the current weight values to the screen.
METHOD smooth
Perform gaussian smoothing upon the map.
Accepts: the length of the side of the square gaussian mask to apply. If not supplied, uses the value in the field smoothing; if that is empty, uses the square root of the average of the map dimensions (map_dim_a).
Returns: a true value.
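A sketch of mask-based smoothing, assuming weights stored as $weights->[$x][$y] array references. Both subs are hypothetical: mask_side follows the default described above, but truncating to an odd integer is an added assumption so the mask has a centre cell, and smooth_map's normalised weighted average is one plausible reading of "gaussian smoothing", not necessarily what the module does.

```perl
use strict;
use warnings;

# Hypothetical: the smoothing field, else sqrt(map_dim_a); forcing an
# odd integer side is an added assumption.
sub mask_side {
    my ($smoothing, $map_dim_a) = @_;
    my $side = int($smoothing || sqrt($map_dim_a));
    $side++ unless $side % 2;
    return $side;
}

# Hypothetical: replace each node's weights by the mask-weighted average
# of the weights around it; mask cells falling outside the map are skipped.
sub smooth_map {
    my ($weights, $mask) = @_;
    my $half = int(@$mask / 2);
    my @out;
    for my $x (0 .. $#$weights) {
        for my $y (0 .. $#{ $weights->[$x] }) {
            my ($total, @acc) = (0);
            for my $i (0 .. $#$mask) {
                for my $j (0 .. $#{ $mask->[$i] }) {
                    my ($nx, $ny) = ($x + $i - $half, $y + $j - $half);
                    next if $nx < 0 || $ny < 0 || $nx > $#$weights
                         || $ny > $#{ $weights->[$nx] };
                    my $m    = $mask->[$i][$j];
                    my $node = $weights->[$nx][$ny];
                    $acc[$_] += $m * $node->[$_] for 0 .. $#$node;
                    $total   += $m;
                }
            }
            $out[$x][$y] = [ map { $_ / $total } @acc ];
        }
    }
    return \@out;
}
```

Dividing by the sum of the mask weights actually used keeps edge nodes on the same scale as interior ones.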
METHOD tk_dump
Extended and moved to the package AI::NeuralNet::Kohonen::Demo::RGB.
PRIVATE METHOD _select_target
Return a random target from the training set in the input field, unless the targeting field is defined, in which case the targets are iterated over.
PRIVATE METHOD _adjust_neighbours_of
Accepts: a reference to an array containing the distance of the BMU from the target, and the x and y co-ordinates of the BMU in the map; a reference to an array that is the target.
Returns: true.
FINDING THE NEIGHBOURS OF THE BMU
$$\sigma(t) = \sigma(0)\,\exp\left(-\frac{t}{\lambda}\right)$$
Where sigma is the width of the BMU's neighbourhood at any stage in time (t), and lambda is a time constant.
Lambda is our field time_constant.
The map radius is naturally just half the map width.
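A worked version of the radius formula, assuming sigma(0) is the map radius, that the "map width" is map_dim_a (the average of the two dimensions), and that lambda is the epochs divided by the log of that radius, as described for time_constant. The sub neighbourhood_radius is hypothetical.

```perl
use strict;
use warnings;

# sigma(t) = sigma(0) * exp(-t / lambda), lambda = epochs / log(sigma(0)).
# Assumption: map width = map_dim_a, so sigma(0) = map_dim_a / 2.
sub neighbourhood_radius {
    my ($t, $epochs, $map_dim_x, $map_dim_y) = @_;
    my $map_dim_a = ($map_dim_x + $map_dim_y) / 2;
    my $sigma0    = $map_dim_a / 2;                # the map radius
    die "map radius must exceed 1\n" unless $sigma0 > 1;
    my $lambda    = $epochs / log($sigma0);        # time_constant
    return $sigma0 * exp(-$t / $lambda);
}

printf "%.2f\n", neighbourhood_radius($_, 100, 39, 19) for 0, 50, 100;
# 14.50, then 3.81, then 1.00
```

With this choice of lambda the radius shrinks from the full map radius to exactly 1 by the final epoch, so late training only touches the BMU's immediate neighbours.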
ADJUSTING THE NEIGHBOURS OF THE BMU
W(t+1) = W(t) + THETA(t) L(t)( V(t)-W(t) )
Where L is the learning rate, V the target vector, and W the weight. THETA(t) represents the influence of distance from the BMU upon a node's learning, and is calculated by the Node class (see "distance_effect" in AI::NeuralNet::Kohonen::Node).
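The update rule applied to a single node can be sketched as below. The sub adjust_weight is hypothetical, and THETA is assumed to take the common gaussian form exp(-d**2 / (2 * sigma**2)), where d is the node's grid distance from the BMU; the module's own distance_effect may differ.

```perl
use strict;
use warnings;

# W(t+1) = W(t) + THETA(t) * L(t) * (V(t) - W(t)) for one node.
# Assumption: gaussian THETA; not necessarily the Node class's version.
sub adjust_weight {
    my ($w, $v, $d, $sigma, $l) = @_;
    my $theta = exp(-($d**2) / (2 * $sigma**2));
    $w->[$_] += $theta * $l * ($v->[$_] - $w->[$_]) for 0 .. $#$w;
    return $w;
}

# The BMU itself (d = 0) gets THETA = 1, so it moves L of the way.
my $node = adjust_weight([0, 0, 0], [1, 0, 1], 0, 3, 0.5);
print "@$node\n";    # 0.5 0 0.5
```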
PRIVATE METHOD _decay_learning_rate
Performs an exponential decay upon the learning rate (our l field).
$$L(t) = L_0\,\exp\left(-\frac{t}{\lambda}\right)$$
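The decay in code, using the same time constant lambda as the neighbourhood radius. The sub is hypothetical, and the numbers in the usage line (a map radius of 14.5 for the 39 x 19 synopsis map, an initial rate of 0.5) are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# L(t) = L0 * exp(-t / lambda)
sub decayed_learning_rate {
    my ($l0, $t, $lambda) = @_;
    return $l0 * exp(-$t / $lambda);
}

my $lambda = 100 / log(14.5);    # epochs / log(map radius), radius assumed
printf "%.3f\n", decayed_learning_rate(0.5, 100, $lambda);    # 0.034
```

Because lambda is epochs over the log of the radius, after the final epoch the rate has fallen to L0 divided by the map radius.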
PRIVATE FUNCTION _make_gaussian_mask
Accepts: size of mask.
Returns: reference to a 2d array that is the mask.
PRIVATE FUNCTION _gauss_weight
Accepts: two parameters: the first, r, gives the distance from the mask centre; the second, sigma, specifies the width of the mask.
Returns the gaussian weight.
See also _decay_learning_rate.
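A sketch of the two helpers. Both subs are hypothetical stand-ins for _gauss_weight and _make_gaussian_mask, and tying sigma to the mask size as size / 4 is an assumption; the module may choose the width differently. The mask is not normalised here.

```perl
use strict;
use warnings;

# Gaussian weight for distance $r from the centre, width $sigma.
sub gauss_weight {
    my ($r, $sigma) = @_;
    return exp(-($r**2) / (2 * $sigma**2));
}

# Square mask of side $size; assumption: sigma = size / 4.
sub make_gaussian_mask {
    my ($size) = @_;
    my $centre = ($size - 1) / 2;
    my $sigma  = $size / 4;
    my @mask;
    for my $i (0 .. $size - 1) {
        for my $j (0 .. $size - 1) {
            my $r = sqrt(($i - $centre)**2 + ($j - $centre)**2);
            $mask[$i][$j] = gauss_weight($r, $sigma);
        }
    }
    return \@mask;
}

my $mask = make_gaussian_mask(3);
printf "%.3f %.3f\n", $mask->[1][1], $mask->[0][1];    # 1.000 0.411
```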
METHOD load_file
Loads a file in SOM_PAK format.
FILE FORMAT
SOM_PAK file format version 3.1 (April 7, 1995), Helsinki University of Technology, Espoo:
The input data is stored in ASCII-form as a list of entries, one line ...for each vectorial sample.
The first line of the file is reserved for status knowledge of the entries; in the present version it is used to define the following items (these items MUST occur in the indicated order):
- Dimensionality of the vectors (integer, compulsory).
- Topology type, either hexa or rect (string, optional, case-sensitive).
- Map dimension in x-direction (integer, optional).
- Map dimension in y-direction (integer, optional).
- Neighborhood type, either bubble or gaussian (string, optional, case-sensitive).
...
Subsequent lines consist of n floating-point numbers followed by an optional class label (that can be any string) and two optional qualifiers (see below) that determine the usage of the corresponding data entry in training programs. The data files can also contain an arbitrary number of comment lines that begin with '#', and are ignored. (One '#' for each comment line is needed.)
If some components of some data vectors are missing (due to data collection failures or any other reason) those components should be marked with 'x'...[in processing, these] are ignored.
Not implemented (yet):
- neighbourhood type, which is always gaussian;
- x for missing data;
- class labels;
- the two optional qualifiers.
Requires: a path to a file.
Returns undef on failure.
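A minimal reader for the subset of the format described above: the header line and plain numeric vectors, with '#' comment lines skipped. The sub read_som_pak and the hash keys it returns are hypothetical; class labels, qualifiers and 'x' markers are not handled, matching the not-implemented list.

```perl
use strict;
use warnings;

# Hypothetical minimal SOM_PAK reader; returns undef on failure.
sub read_som_pak {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    my (%header, @vectors);
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;    # comments, blanks
        my @f = split ' ', $line;                      # awk-style split
        if (!%header) {
            # First line: dimension, then optional topology, x, y, type.
            @header{qw(dim topology x y neighbourhood)} = @f;
            next;
        }
        # Keep the first 'dim' fields; a trailing class label is ignored.
        push @vectors, [ @f[0 .. $header{dim} - 1] ];
    }
    close $fh;
    return { %header, vectors => \@vectors };
}
```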
METHOD save_file
Saves the map file in SOM_PAK format (see "METHOD load_file") at the path specified in the first argument.
Returns undef on failure, a true value on success.
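A sketch of writing a map in SOM_PAK layout: a header line, then one line of weights per node. The sub write_som_pak is hypothetical, and three details are assumptions: the 'rect' topology and 'gaussian' neighbourhood written to the header, and the node order (row by row, with x varying fastest).

```perl
use strict;
use warnings;

# Hypothetical writer; $weights->[$x][$y] = [w1, w2, ...].
sub write_som_pak {
    my ($path, $weights) = @_;
    my $dim_x = @$weights;
    my $dim_y = @{ $weights->[0] };
    my $dim   = @{ $weights->[0][0] };
    open my $fh, '>', $path or return undef;
    print {$fh} "$dim rect $dim_x $dim_y gaussian\n";
    for my $y (0 .. $dim_y - 1) {            # assumed order: x varies fastest
        for my $x (0 .. $dim_x - 1) {
            print {$fh} join(' ', @{ $weights->[$x][$y] }), "\n";
        }
    }
    close $fh or return undef;
    return 1;                                # true on success
}
```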
SEE ALSO
See "distance_from" in AI::NeuralNet::Kohonen::Node; AI::NeuralNet::Kohonen::Demo::RGB.
A very nice explanation of Kohonen's algorithm: AI-Junkie SOM tutorial part 1
AUTHOR AND COPYRIGHT
This implementation Copyright (C) Lee Goddard, 2003. All Rights Reserved.
Available under the same terms as Perl itself.