NAME
AI::NeuralNet::Kohonen - Kohonen's Self-organising Maps
SYNOPSIS
$_ = AI::NeuralNet::Kohonen->new(
map_dim_x => 39,
map_dim_y => 19,
epochs => 100,
table =>
"R G B
1 0 0
0 1 0
0 0 1
1 1 0
1 0 1
0 1 1
1 1 1
");
$_->train;
$_->save_file('mydata.txt');
exit;
DESCRIPTION
An illustrative implementation of Kohonen's Self-organising Feature Maps (SOMs) in Perl. It is written to illustrate the algorithm rather than to be fast, and it is slow.
Have a look at AI::NeuralNet::Kohonen::Demo::RGB for an example of visualisation of the map.
This module has not yet been tested for accuracy, and should be considered alpha - everything may change.
I'll add some more text here later.
DEPENDENCIES
AI::NeuralNet::Kohonen::Node
AI::NeuralNet::Kohonen::Input
EXPORTS
None
CONSTRUCTOR new
Instantiates object fields:
- input_file
-
A SOM_PAK training file to load. This does not prevent other input methods (input, table) being processed, but it does override any specifications (weight_dim) which may have been explicitly handed to the constructor. See also "FILE FORMAT" and "METHOD load_input".
- input
-
A reference to an array of training vectors, within which each vector is represented by an array:
[ [v1a, v1b, v1c], [v2a,v2b,v2c], ..., [vNa,vNb,vNc] ]
See also table.
- table
-
The contents of a file of the format that could be supplied to the input_file field.
- input_names
-
A name for each dimension of the input vectors.
- map_dim_x
- map_dim_y
-
The dimensions of the feature map to create: each defaults to a toy 19. (Note: this is Perl indexing, starting at zero.)
- epochs
-
Number of epochs to run for (see "METHOD train"). Minimum number is 1.
- learning_rate
-
The initial learning rate.
- train_start
-
Reference to code to call at the beginning of training.
- epoch_start
-
Reference to code to call at the beginning of every epoch (such as a colour calibration routine).
- epoch_end
-
Reference to code to call at the end of every epoch (such as a display routine).
- train_end
-
Reference to code to call at the end of training.
- targeting
-
If undefined, random targets are chosen; otherwise they're iterated over. Just for experimental purposes.
- smoothing
-
The amount of smoothing to apply by default when smooth is applied (see "METHOD smooth").
- neighbour_factor
-
When working out the size of the neighbourhood of influence, the average of the dimensions of the map is divided by this variable before the exponential function is applied. The default value is 2.5, but you may wish to use 2 or 4.
- missing_mask
-
Used to signify data is missing in an input vector. Defaults to x.
Private fields:
- time_constant
-
The number of iterations (epochs) to be completed, divided by the log of the map radius.
- t
-
The current epoch, or moment in time.
- l
-
The current learning rate.
- map_dim_a
-
Average of the map dimensions.
METHOD randomise_map
Populates the map with nodes that contain random real numbers.
See "CONSTRUCTOR new" in AI::NeuralNet::Kohonen::Node.
METHOD clear_map
As "METHOD randomise_map" but sets all map nodes to either the value supplied as the only parameter, or undef.
METHOD train
Optionally accepts a parameter that is the number of epochs for which to train: the default is the value in the epochs field.
An epoch is composed of a number of generations, the number being the total number of input vectors.
For every generation, iterates:
- selects a target from the input array (see "PRIVATE METHOD _select_target");
- finds the best-matching unit (see "METHOD find_bmu");
- adjusts the neighbours of the BMU (see "PRIVATE METHOD _adjust_neighbours_of").
At the end of every generation, the learning rate is decayed (see "PRIVATE METHOD _decay_learning_rate").
See "CONSTRUCTOR new" for details of applicable callbacks.
Returns a true value.
METHOD find_bmu
For a specific target, finds the Best Matching Unit in the map and returns its x/y index.
Accepts: a reference to an array that is the target.
Returns: a reference to an array that is the BMU (and should perhaps be abstracted as an object in its own right), holding the distance of the BMU from the target and the x and y co-ordinates of the BMU in the map (compare "PRIVATE METHOD _adjust_neighbours_of").
See "METHOD get_weight_at", and "distance_from" in AI::NeuralNet::Kohonen::Node.
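The search described above can be sketched in plain Perl. This is not the module's own code: the map layout ($map->[$x][$y] holding a weight array), the helper names, the use of Euclidean distance and the [distance, x, y] ordering of the result are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: Euclidean distance between two equal-length
# vectors, each given as an array reference.
sub euclidean_distance {
    my ($v, $w) = @_;
    my $sum = 0;
    $sum += ($v->[$_] - $w->[$_]) ** 2 for 0 .. $#$v;
    return sqrt $sum;
}

# Scan every node of an assumed $map->[$x][$y] = [weights] structure and
# return [distance, x, y] for the closest node (the first one wins on ties).
sub find_bmu_sketch {
    my ($map, $target) = @_;
    my @best;
    for my $x (0 .. $#$map) {
        for my $y (0 .. $#{ $map->[$x] }) {
            my $d = euclidean_distance($target, $map->[$x][$y]);
            @best = ($d, $x, $y) if !@best or $d < $best[0];
        }
    }
    return \@best;
}

# A 2x2 toy map of RGB weights and a reddish target.
my $map = [
    [ [1, 0, 0], [0, 1, 0] ],
    [ [0, 0, 1], [1, 1, 1] ],
];
my $bmu = find_bmu_sketch($map, [0.9, 0.1, 0]);
printf "distance %.3f at x=%d, y=%d\n", @$bmu;
```

An exhaustive scan like this visits every node for every target, which is one reason a straightforward SOM is slow on large maps.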
METHOD get_weight_at
Returns a reference to the weight array at the supplied x,y co-ordinates.
Accepts: x,y co-ordinates, each a scalar.
Returns: reference to an array that is the weight of the node, or undef on failure.
METHOD get_results
Finds and returns the results for all input vectors in the supplied reference to an array of arrays, placing the values in the results field (an array reference) and returning them either as an array or as the reference itself, depending on the calling context.
If no array reference of input vectors is supplied, the values in the input field are used.
Individual results are in the array format as described in "METHOD find_bmu".
See "METHOD find_bmu", and "METHOD get_weight_at".
METHOD map_results
Clears the map and fills it with the results.
The sole parameter is passed to "METHOD clear_map". "METHOD get_results" is then called, and the results returned are fed into the object field map.
This may change, as it seems misleading to re-use that field.
METHOD dump
Print the current weight values to the screen.
METHOD smooth
Performs Gaussian smoothing upon the map.
Accepts: the length of the side of the square Gaussian mask to apply. If not supplied, uses the value in the field smoothing; if that is empty, uses the square root of the average of the map dimensions (map_dim_a).
Returns: a true value.
METHOD load_input
Loads a SOM_PAK-format file of input vectors.
This method is automatically accessed if the constructor is supplied with an input_file field.
Requires: a path to a file.
Returns undef on failure.
See "FILE FORMAT".
METHOD save_file
Saves the map file in SOM_PAK format (see "METHOD load_input") at the path specified in the first argument.
Returns undef on failure, a true value on success.
PRIVATE METHOD _select_target
Returns a random target from the training set in the input field, unless the targeting field is defined, in which case the targets are iterated over.
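The two selection modes can be sketched with a closure. The function name and the way the iteration position is stored are hypothetical; the module keeps its own state in the object.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a selector over a list of input vectors.
# With $targeting undefined, every call picks a random vector; otherwise
# the vectors are handed out in order, wrapping round at the end.
sub make_target_selector {
    my ($inputs, $targeting) = @_;
    my $i = 0;
    return sub {
        return $inputs->[ int rand @$inputs ] unless defined $targeting;
        my $target = $inputs->[$i];
        $i = ($i + 1) % @$inputs;
        return $target;
    };
}

my $inputs  = [ [1, 0, 0], [0, 1, 0], [0, 0, 1] ];
my $in_turn = make_target_selector($inputs, 1);
print join(' ', @{ $in_turn->() }), "\n" for 1 .. 4;   # fourth call wraps round
```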
PRIVATE METHOD _adjust_neighbours_of
Accepts: a reference to an array containing the distance of the BMU from the target, as well as the x and y co-ordinates of the BMU in the map; a reference to the target, which is an AI::NeuralNet::Kohonen::Input object.
Returns: true.
FINDING THE NEIGHBOURS OF THE BMU
sigma(t) = sigma(0) * exp( -t / lambda )
Where sigma is the width of the map at any stage in time (t), and lambda is a time constant.
Lambda is our field time_constant.
The map radius is naturally just half the map width.
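A minimal sketch of how the neighbourhood width shrinks over time. The function name is hypothetical, and the way sigma(0) and lambda are derived here (the average map dimension divided by neighbour_factor, and epochs divided by the log of that starting width) is an assumption pieced together from the field descriptions, not a copy of the module's code.

```perl
use strict;
use warnings;

# sigma(t) = sigma(0) * exp( -t / lambda )
sub neighbourhood_width {
    my ($sigma0, $t, $lambda) = @_;
    return $sigma0 * exp( -$t / $lambda );
}

my ($map_dim_x, $map_dim_y, $epochs) = (39, 19, 100);
my $neighbour_factor = 2.5;                             # documented default
my $map_dim_a = ($map_dim_x + $map_dim_y) / 2;          # assumed: plain average
my $sigma0    = $map_dim_a / $neighbour_factor;         # assumed starting width
my $lambda    = $epochs / log($sigma0);                 # assumed time_constant

for my $t (0, 25, 50, 75, 100) {
    printf "t=%3d  sigma=%.2f\n", $t, neighbourhood_width($sigma0, $t, $lambda);
}
```

With lambda chosen this way the width falls from sigma(0) to exactly 1 at the final epoch, so the neighbourhood ends up covering little more than the BMU itself.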
ADJUSTING THE NEIGHBOURS OF THE BMU
W(t+1) = W(t) + THETA(t) L(t)( V(t)-W(t) )
Where L is the learning rate, V the target vector, and W the weight. THETA(t) represents the influence of distance from the BMU upon a node's learning, and is calculated by the Node class - see "distance_effect" in AI::NeuralNet::Kohonen::Node.
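The update rule can be sketched for a single node as follows. The function name is hypothetical, and THETA is simply passed in as a number here; how the Node class actually computes it is not shown in this document.

```perl
use strict;
use warnings;

# W(t+1) = W(t) + THETA(t) * L(t) * ( V(t) - W(t) ), applied per component.
# $theta and $l are plain numbers; $weight and $target are array references.
sub adjust_weight {
    my ($weight, $target, $theta, $l) = @_;
    return [
        map { $weight->[$_] + $theta * $l * ($target->[$_] - $weight->[$_]) }
            0 .. $#$weight
    ];
}

my $w = adjust_weight([0.2, 0.2, 0.2], [1, 0, 0], 1, 0.5);
print join(' ', @$w), "\n";
```

When THETA is 1 (the BMU itself) and L is 0.5 the weight moves halfway towards the target; a THETA of 0 leaves a distant node untouched.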
PRIVATE METHOD _decay_learning_rate
Performs an exponential decay upon the learning rate (our l field).
L(t) = L(0) * exp( -t / lambda )
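As a rough illustration, the decayed rate can be tabulated for a whole run. The function name, the initial rate of 0.5 and the lambda of 40 are hypothetical values chosen for the example, not the module's defaults.

```perl
use strict;
use warnings;

# Returns the list L(0) .. L($epochs), where L(t) = L(0) * exp( -t / lambda ).
sub learning_rate_schedule {
    my ($l0, $lambda, $epochs) = @_;
    return map { $l0 * exp( -$_ / $lambda ) } 0 .. $epochs;
}

my @rates = learning_rate_schedule(0.5, 40, 100);
printf("epoch %3d: l=%.4f\n", $_, $rates[$_]) for (0, 10, 50, 100);
```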
PRIVATE FUNCTION _make_gaussian_mask
Accepts: size of mask.
Returns: reference to a 2d array that is the mask.
PRIVATE FUNCTION _gauss_weight
Accepts: two parameters: the first, r, gives the distance from the mask centre; the second, sigma, specifies the width of the mask.
Returns the gaussian weight.
See also _decay_learning_rate.
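A sketch of a Gaussian weight and a square mask built from it. The function names, the unnormalised form exp(-r^2 / (2 sigma^2)), the default sigma and the normalisation of the mask to a sum of 1 are all assumptions; the module may scale its mask differently.

```perl
use strict;
use warnings;

# Assumed form: unnormalised Gaussian of distance $r with width $sigma.
sub gauss_weight_sketch {
    my ($r, $sigma) = @_;
    return exp( -($r ** 2) / (2 * $sigma ** 2) );
}

# Builds a $size x $size mask centred on the middle cell and scales it so
# that all cells sum to 1, which keeps smoothed values in their old range.
sub make_gaussian_mask_sketch {
    my ($size, $sigma) = @_;
    $sigma //= $size / 3;                       # hypothetical default width
    my $centre = ($size - 1) / 2;
    my @mask;
    my $total = 0;
    for my $x (0 .. $size - 1) {
        for my $y (0 .. $size - 1) {
            my $r = sqrt( ($x - $centre) ** 2 + ($y - $centre) ** 2 );
            $mask[$x][$y] = gauss_weight_sketch($r, $sigma);
            $total += $mask[$x][$y];
        }
    }
    for my $row (@mask) {
        $_ /= $total for @$row;
    }
    return \@mask;
}

my $mask = make_gaussian_mask_sketch(3);
print join(' ', map { sprintf '%.3f', $_ } @$_), "\n" for @$mask;
```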
PUBLIC METHOD quantise_error
Returns the quantisation error for either the supplied points, or those in the input field.
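One common definition of quantisation error is the mean distance between each input vector and the weights of its best-matching node. The sketch below assumes that definition, Euclidean distance and a flat list of node weights; whether the module computes exactly this is not stated here, and the function names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min sum);

# Euclidean distance between two array references of equal length.
sub distance {
    my ($v, $w) = @_;
    return sqrt sum( map { ($v->[$_] - $w->[$_]) ** 2 } 0 .. $#$v );
}

# Mean, over all inputs, of the distance to the nearest node weight.
# $nodes is a flat array reference of weight vectors.
sub quantisation_error_sketch {
    my ($nodes, $inputs) = @_;
    my @errors = map {
        my $in = $_;
        min( map { distance($in, $_) } @$nodes );
    } @$inputs;
    return sum(@errors) / @errors;
}

my $nodes  = [ [0, 0], [1, 1] ];
my $inputs = [ [0, 0], [1, 0], [1, 1] ];
printf "mean quantisation error: %.3f\n", quantisation_error_sketch($nodes, $inputs);
```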
PRIVATE METHOD _add_input_from_str
Adds to the input field an input vector in SOM_PAK-format whitespace-delimited ASCII.
Returns undef on failure to add an item (perhaps because the data passed was a comment, or the weight_dim flag was not set); a true value on success.
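Parsing one such line might look like the sketch below. The function name and the returned hash layout are hypothetical, and the optional qualifiers are ignored; it only shows the split into components and class label, with the two failure cases mentioned above.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one SOM_PAK data line of $dim components.
# Returns { values => [...], class => $label_or_undef }, or undef when the
# dimensionality is unknown, the line is a comment, or it is too short.
sub parse_som_pak_line {
    my ($line, $dim) = @_;
    return undef unless defined $dim;
    return undef if $line =~ /^\s*#/;
    my @fields = split ' ', $line;
    return undef if @fields < $dim;
    my @values = splice @fields, 0, $dim;      # first $dim fields are data
    my $class  = shift @fields;                # next field, if any, is the label
    return { values => \@values, class => $class };
}

my $entry = parse_som_pak_line("1 0 x red", 3);
print "values: @{ $entry->{values} }; class: $entry->{class}\n";
```

A missing component marked with x is kept as the string x in this sketch, leaving the caller to decide how to treat it.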
FILE FORMAT
This module has begun to attempt the SOM_PAK format: SOM_PAK file format version 3.1 (April 7, 1995), Helsinki University of Technology, Espoo:
The input data is stored in ASCII-form as a list of entries, one line ...for each vectorial sample.
The first line of the file is reserved for status knowledge of the entries; in the present version it is used to define the following items (these items MUST occur in the indicated order):
- Dimensionality of the vectors (integer, compulsory).
- Topology type, either hexa or rect (string, optional, case-sensitive).
- Map dimension in x-direction (integer, optional).
- Map dimension in y-direction (integer, optional).
- Neighborhood type, either bubble or gaussian (string, optional, case-sensitive).
...
Subsequent lines consist of n floating-point numbers followed by an optional class label (that can be any string) and two optional qualifiers (see below) that determine the usage of the corresponding data entry in training programs. The data files can also contain an arbitrary number of comment lines that begin with '#', and are ignored. (One '#' for each comment line is needed.)
If some components of some data vectors are missing (due to data collection failures or any other reason) those components should be marked with 'x'...[in processing, these] are ignored.
...
Each data line may have two optional qualifiers that determine the usage of the data entry during training. The qualifiers are of the form codeword=value, where spaces are not allowed between the parts of the qualifier. The optional qualifiers are the following:
- Enhancement factor: e.g. weight=3. The training rate for the corresponding input pattern vector is multiplied by this parameter so that the reference vectors are updated as if this input vector were repeated 3 times during training (i.e., as if the same vector had been stored 2 extra times in the data file).
- Fixed-point qualifier: e.g. fixed=2,5. The map unit defined by the fixed-point coordinates (x = 2; y = 5) is selected instead of the best-matching unit for training. (See below for the definition of coordinates over the map.) If several inputs are forced to known locations, a wanted orientation results in the map.
Not (yet) implemented:
- hexa/rect is only visual, and only in the ::Demo::RGB package at the moment.
- neighbourhood type is always gaussian.
- x for missing data.
- the two optional qualifiers.
DEPRECATED METHODS
- PRIVATE METHOD _find_bmu
-
Has become the public method, find_bmu.
- METHOD tk_dump
-
Extended and moved to the package AI::NeuralNet::Kohonen::Demo::RGB.
SEE ALSO
See "distance_from" in AI::NeuralNet::Kohonen::Node; AI::NeuralNet::Kohonen::Demo::RGB.
The documentation for SOM_PAK, which has lots of advice on map building that may or may not be applicable yet.
A very nice explanation of Kohonen's algorithm: AI-Junkie SOM tutorial part 1
AUTHOR AND COPYRIGHT
This implementation Copyright (C) Lee Goddard, 2003. All Rights Reserved.
Available under the same terms as Perl itself.