NAME

AI::NeuralNet::Kohonen - Kohonen's Self-organising Maps

SYNOPSIS

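use strict;
use warnings;
use AI::NeuralNet::Kohonen;

my $net = AI::NeuralNet::Kohonen->new(
	map_dim_x => 39,
	map_dim_y => 19,
	epochs    => 100,
	table     =>
"R G B
1 0 0
0 1 0
0 0 1
1 1 0
1 0 1
0 1 1
1 1 1
");

$net->dump;     # weights before training
$net->train;
$net->dump;     # weights after training

# tk_dump has moved: for a Tk display see AI::NeuralNet::Kohonen::Demo::RGB
exit;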

DESCRIPTION

An illustrative implementation of Kohonen's Self-organising Feature Maps (SOMs) in Perl.

It's not fast. In fact, it's slow, but it's illustrative.

Have a look at AI::NeuralNet::Kohonen::Demo::RGB.


DEPENDENCIES

None

EXPORTS

None

CONSTRUCTOR new

Instantiates the object. The following fields may be supplied to the constructor as name/value pairs:

input_file

A SOM_PAK training file to load. This does not prevent the other input methods (input, table) from being processed, but it does override any specifications (such as weight_dim) that may have been passed explicitly to the constructor.

input

A reference to an array of training vectors, within which each vector is represented by an array:

[ [v1a, v1b, v1c], [v2a, v2b, v2c], ..., [vNa, vNb, vNc] ]

See also table.

table

A scalar containing a table: lines are delimited by [\r\f\n]+, columns by whitespace, and leading whitespace is stripped. The first line should hold the column names; the following lines should hold only data. See also input.
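The table rules can be expressed as a short Perl routine. This is a sketch only: parse_table is a hypothetical name, not a method of the module, and dropping whitespace-only lines is a choice made for the sketch rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: turn a table scalar into
# an array reference of column names and an array reference of vectors.
sub parse_table {
    my ($table) = @_;
    # Lines are delimited by runs of CR, FF or LF; whitespace-only lines
    # are dropped (a choice made for this sketch).
    my @lines = grep { /\S/ } split /[\r\f\n]+/, $table;
    # split ' ' strips leading whitespace and splits on runs of whitespace.
    my @names = split ' ', shift @lines;
    my @input = map { [ split ' ', $_ ] } @lines;
    return (\@names, \@input);
}

my ($names, $input) = parse_table("R G B\n1 0 0\n0 1 0\n");
print "@$names\n";                     # R G B
print scalar(@$input), " vectors\n";   # 2 vectors
```

The resulting vectors have the same shape as the input field's array of arrays.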

input_names

A name for each dimension of the input vectors.

map_dim_x
map_dim_y

The dimensions of the feature map to create; each defaults to a toy 19. Note that these use Perl indexing, starting at zero, so each value is the highest index along that axis.

epochs

Number of epochs to run for (see "METHOD train").

learning_rate

The initial learning rate.

train_start

Reference to code to call at the beginning of training.

epoch_end

Reference to code to call at the end of every epoch (such as a display routine).

train_end

Reference to code to call at the end of training.

targeting

If undefined, random targets are chosen; otherwise the targets are iterated over in turn. Intended for experimental purposes only.

smoothing

The default amount of smoothing used by smooth when no value is supplied (see "METHOD smooth").

Private fields:

time_constant

The number of iterations (epochs) to be completed, divided by the log of the map radius. This is lambda in the formulas below.

t

The current epoch, or moment in time.

l

The current learning rate.

map_dim_a

Average of the map dimensions.

METHOD train

Optionally accepts a parameter giving the number of epochs to train for; the default is the value of the epochs field.

For every epoch, it:

- selects a target from the input array (at random, unless targeting is set);
- finds the best matching unit (BMU);
- adjusts the neighbours of the BMU;
- decays the learning rate.
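The four steps can be drawn together in one self-contained Perl routine. This is an illustration under stated assumptions, not the module's implementation: train_som is a hypothetical name, the map is a plain array of arrays, the initial radius is taken as half the larger map dimension, the default learning rate of 0.5 is our own choice, and THETA is assumed to be the common gaussian exp(-d^2 / 2 sigma^2).

```perl
use strict;
use warnings;

# Hypothetical, self-contained sketch of the training loop; not the
# module's code. The map is $map[$x][$y] = [weights], indices 0..max.
sub train_som {
    my (%arg) = @_;
    my ($input, $epochs, $xmax, $ymax) = @arg{qw(input epochs map_dim_x map_dim_y)};
    my $l0  = $arg{learning_rate} // 0.5;   # default chosen for the sketch
    my $dim = @{ $input->[0] };
    my @map = map { [ map { [ map { rand() } 1 .. $dim ] } 0 .. $ymax ] } 0 .. $xmax;

    # Assumed: the initial radius is half the larger map dimension.
    my $sigma0 = ($xmax > $ymax ? $xmax : $ymax) / 2;
    die "map too small for this sketch\n" if $sigma0 <= 1;
    my $lambda = $epochs / log($sigma0);     # the time constant
    my $l      = $l0;

    for my $t (1 .. $epochs) {
        # 1. Select a random target.
        my $v = $input->[ int rand @$input ];

        # 2. Find the BMU (smallest squared distance has the same argmin).
        my ($best, $bx, $by);
        for my $x (0 .. $xmax) {
            for my $y (0 .. $ymax) {
                my $d2 = 0;
                $d2 += ($map[$x][$y][$_] - $v->[$_])**2 for 0 .. $dim - 1;
                ($best, $bx, $by) = ($d2, $x, $y) if !defined $best || $d2 < $best;
            }
        }

        # 3. Adjust the BMU's neighbours within the shrinking radius.
        my $sigma = $sigma0 * exp(-$t / $lambda);
        for my $x (0 .. $xmax) {
            for my $y (0 .. $ymax) {
                my $r2 = ($x - $bx)**2 + ($y - $by)**2;
                next if $r2 > $sigma**2;
                my $theta = exp(-$r2 / (2 * $sigma**2));   # assumed gaussian THETA
                $map[$x][$y][$_] += $theta * $l * ($v->[$_] - $map[$x][$y][$_])
                    for 0 .. $dim - 1;
            }
        }

        # 4. Decay the learning rate for the next epoch.
        $l = $l0 * exp(-$t / $lambda);
    }
    return \@map;
}

srand(42);   # repeatable run
my $map = train_som(
    input     => [ [1, 0, 0], [0, 1, 0], [0, 0, 1] ],
    epochs    => 200,
    map_dim_x => 9,
    map_dim_y => 9,
);
```

Because THETA times L never exceeds 1 here, each update moves a weight part of the way towards the target, so weights that start in [0, 1) stay within [0, 1].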

METHOD find_bmu

Finds the Best Matching Unit (BMU) in the map and returns its x/y index.

Accepts: a reference to an array that is the target.

Returns: a reference to an array that is the BMU (and should perhaps be abstracted as an object in its own right), indexed as follows:

0

Euclidean distance from the supplied target

1

x co-ordinate in the map

2

y co-ordinate in the map

See "METHOD get_weight_at", and "distance_from" in AI::NeuralNet::Kohonen::Node.
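A stand-alone sketch of the search, returning the same [distance, x, y] layout. It assumes the map is held as $map->[$x][$y] = [weights] and computes the Euclidean distance inline rather than through the Node class; find_bmu here is a plain function, not the module's method.

```perl
use strict;
use warnings;

# Sketch of a BMU search; returns [distance, x, y] as documented above.
# On ties, the first node found (lowest x, then lowest y) wins.
sub find_bmu {
    my ($map, $target) = @_;
    my $bmu;
    for my $x (0 .. $#$map) {
        for my $y (0 .. $#{ $map->[$x] }) {
            my $w   = $map->[$x][$y];
            my $sum = 0;
            $sum += ($w->[$_] - $target->[$_])**2 for 0 .. $#$target;
            my $d = sqrt $sum;
            $bmu = [$d, $x, $y] if !defined $bmu || $d < $bmu->[0];
        }
    }
    return $bmu;
}

my $map = [ [ [0, 0], [1, 1] ], [ [0.9, 0.1], [0.5, 0.5] ] ];
my ($dist, $x, $y) = @{ find_bmu($map, [1, 0]) };
printf "%.4f at (%d,%d)\n", $dist, $x, $y;   # 0.1414 at (1,0)
```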

METHOD get_weight_at

Returns a reference to the weight array at the supplied x,y co-ordinates.

Accepts: x,y co-ordinates, each a scalar.

Returns: reference to an array that is the weight of the node, or undef on failure.
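A minimal sketch of such an accessor, assuming (hypothetically) that the map is stored as $map->[$x][$y] = [weights]. The explicit bounds checks matter in Perl: a negative index would silently count from the end of the array instead of failing.

```perl
use strict;
use warnings;

# Bounds-checked lookup; returns undef outside the map instead of
# wrapping negative indices or autovivifying new rows.
sub get_weight_at {
    my ($map, $x, $y) = @_;
    return undef if $x < 0 || $y < 0;
    return undef if $x > $#$map || $y > $#{ $map->[$x] };
    return $map->[$x][$y];
}

my $map = [ [ [0.1, 0.2], [0.3, 0.4] ] ];
my $w = get_weight_at($map, 0, 1);                                # [0.3, 0.4]
print defined(get_weight_at($map, 1, 0)) ? "found\n" : "undef\n";  # undef
```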

PRIVATE METHOD _find_bmu

Deprecated; it should have been public to begin with (see "METHOD find_bmu").

METHOD get_results

Finds the BMU for every vector in the input field and stores the results in the array reference held in the results field. In list context it returns the results as an array; in scalar context it returns the array reference itself.

Individual results are in the array format as described in "METHOD find_bmu".

See "METHOD find_bmu", and "METHOD get_weight_at".
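The context-dependent return is the standard wantarray idiom. In this sketch the object is a plain hash and $bmu_of is a hypothetical code reference standing in for find_bmu; neither reflects the module's internals.

```perl
use strict;
use warnings;

# Stores one BMU result per input vector, then returns a list or a
# reference depending on the caller's context.
sub get_results {
    my ($self, $bmu_of) = @_;
    $self->{results} = [ map { $bmu_of->($_) } @{ $self->{input} } ];
    return wantarray ? @{ $self->{results} } : $self->{results};
}

my $som      = { input => [ [1], [2], [3] ] };
my $fake_bmu = sub { [ 0, $_[0][0], 0 ] };   # hypothetical stand-in
my @list = get_results($som, $fake_bmu);     # list of three BMU arrays
my $ref  = get_results($som, $fake_bmu);     # array reference
```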

METHOD dump

Print the current weight values to the screen.

METHOD smooth

Perform gaussian smoothing upon the map.

Accepts: the length of the side of the square gaussian mask to apply. If not supplied, uses the value in the field smoothing; if that is empty, uses the square root of the average of the map dimensions (map_dim_a).

Returns: a true value.
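Gaussian smoothing amounts to a 2-D convolution of each weight vector with the mask. In this sketch smooth_map is a hypothetical function that takes the mask explicitly instead of deriving its size from smoothing or map_dim_a, and it handles the map edges by skipping out-of-range cells and renormalising the remaining mask weights; that edge policy is our assumption.

```perl
use strict;
use warnings;

# Convolve each weight vector of $map ($map->[$x][$y] = [weights]) with a
# square mask of odd side, in place. Out-of-map cells are skipped and the
# remaining mask weights renormalised (an assumed edge policy).
sub smooth_map {
    my ($map, $mask) = @_;
    my $half = int(@$mask / 2);
    my @out;
    for my $x (0 .. $#$map) {
        for my $y (0 .. $#{ $map->[$x] }) {
            my $total = 0;
            my @acc;
            for my $i (-$half .. $half) {
                for my $j (-$half .. $half) {
                    my ($nx, $ny) = ($x + $i, $y + $j);
                    next if $nx < 0 || $ny < 0 || $nx > $#$map || $ny > $#{ $map->[$nx] };
                    my $k = $mask->[ $i + $half ][ $j + $half ];
                    $total += $k;
                    $acc[$_] += $k * $map->[$nx][$ny][$_] for 0 .. $#{ $map->[$nx][$ny] };
                }
            }
            $out[$x][$y] = [ map { $_ / $total } @acc ];
        }
    }
    @$map = @out;   # replace the rows in place
    return 1;       # like smooth, return a true value
}

my $map = [ [ [0], [0], [0] ], [ [0], [9], [0] ], [ [0], [0], [0] ] ];
smooth_map($map, [ [1, 2, 1], [2, 4, 2], [1, 2, 1] ]);
printf "%.2f\n", $map->[1][1][0];   # 2.25
```

Results are computed from the unsmoothed values before any are written back, so the order in which nodes are visited does not matter.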

METHOD tk_dump

Extended and moved to the package AI::NeuralNet::Kohonen::Demo::RGB.

PRIVATE METHOD _select_target

Return a random target from the training set in the input field, unless the targeting field is defined, when the targets are iterated over.
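The two selection modes can be sketched as a closure. make_target_selector is a hypothetical name, and wrapping back to the first vector after the last one is our assumption: the text only says the targets are iterated over.

```perl
use strict;
use warnings;

# Returns a closure yielding training targets: random ones when
# $targeting is undefined, otherwise each vector in turn (wrapping round).
sub make_target_selector {
    my ($input, $targeting) = @_;
    my $next = 0;
    return sub {
        return $input->[ int rand @$input ] unless defined $targeting;
        my $target = $input->[$next];
        $next = ($next + 1) % scalar(@$input);
        return $target;
    };
}

my $pick = make_target_selector([ [1, 0], [0, 1] ], 1);
print "@{ $pick->() }\n";   # 1 0
```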

PRIVATE METHOD _adjust_neighbours_of

Accepts two arguments: a reference to a BMU array (as returned by find_bmu: the distance of the BMU from the target, followed by its x and y co-ordinates in the map), and a reference to the target array.

Returns: true.

FINDING THE NEIGHBOURS OF THE BMU

sigma(t) = sigma(0) exp( -t / lambda )

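Where sigma(t) is the radius of the BMU's neighbourhood at time t, and lambda is a time constant.

Lambda is our field time_constant.

The initial radius, sigma(0), is naturally just half the map width. Because lambda is the number of epochs divided by the log of this radius, sigma shrinks from the map radius at t = 0 to exactly 1 at the final epoch.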
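The radius schedule can be computed directly. neighbourhood_radius is a hypothetical helper; its map_width argument is whatever width the caller takes sigma(0) to be half of. The guard is needed because log(sigma(0)) must be positive for lambda to be positive.

```perl
use strict;
use warnings;

# sigma(t) = sigma(0) * exp(-t / lambda), where
# lambda = epochs / log(sigma(0)) and sigma(0) is half the map width.
sub neighbourhood_radius {
    my ($t, $epochs, $map_width) = @_;
    my $sigma0 = $map_width / 2;
    die "map radius must exceed 1 for log() to be positive\n" if $sigma0 <= 1;
    my $lambda = $epochs / log($sigma0);
    return $sigma0 * exp(-$t / $lambda);
}

printf "%.3f\n", neighbourhood_radius(0,   100, 40);   # 20.000
printf "%.3f\n", neighbourhood_radius(100, 100, 40);   # 1.000
```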

ADJUSTING THE NEIGHBOURS OF THE BMU

W(t+1) = W(t) + THETA(t) L(t)( V(t)-W(t) )

Where L is the learning rate, V the target vector, and W the weight. THETA(t) represents the influence of distance from the BMU upon a node's learning, and is calculated by the Node class - see "distance_effect" in AI::NeuralNet::Kohonen::Node.
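One application of the update rule to a single node might look like this. adjust_weight is a hypothetical helper, and the gaussian form of THETA, exp(-d^2 / 2 sigma^2) with d the node's map distance from the BMU, is an assumption standing in for distance_effect in the Node class.

```perl
use strict;
use warnings;

# W(t+1) = W(t) + THETA(t) * L(t) * (V(t) - W(t)), applied in place.
# THETA is assumed to be exp(-d^2 / (2 sigma^2)).
sub adjust_weight {
    my ($w, $v, $l, $d, $sigma) = @_;
    my $theta = exp( -($d**2) / (2 * $sigma**2) );
    $w->[$_] += $theta * $l * ($v->[$_] - $w->[$_]) for 0 .. $#$w;
    return $w;
}

my $w = [0, 0, 0];
adjust_weight($w, [1, 0, 0], 0.5, 0, 2);   # the BMU itself: THETA = 1
print "@$w\n";                              # 0.5 0 0
```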

PRIVATE METHOD _decay_learning_rate

Performs an exponential decay upon the learning rate (our l field), using the same time constant lambda:

L(t) = L(0) exp( -t / lambda )
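A worked example of the decay, using a hypothetical helper learning_rate_at and the illustrative values L(0) = 0.5, 100 epochs and a map radius of 20. With lambda defined as for the neighbourhood, the rate ends at L(0) divided by the initial radius.

```perl
use strict;
use warnings;

# L(t) = L(0) * exp(-t / lambda), lambda being the time_constant field.
sub learning_rate_at {
    my ($l0, $t, $lambda) = @_;
    return $l0 * exp(-$t / $lambda);
}

my $lambda = 100 / log(20);   # 100 epochs, map radius 20
printf "%.3f\n", learning_rate_at(0.5, $_, $lambda) for 0, 50, 100;
# prints 0.500, 0.112 and 0.025 on separate lines
```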

PRIVATE FUNCTION _make_gaussian_mask

Accepts: size of mask.

Returns: reference to a 2d array that is the mask.

PRIVATE FUNCTION _gauss_weight

Accepts two parameters: r, the distance from the mask centre, and sigma, the width of the mask.

Returns the gaussian weight.

See also _decay_learning_rate.
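The mask and the weight function can be sketched together. Both names are hypothetical stand-ins for the private functions; the unnormalised form exp(-r^2 / 2 sigma^2) is an assumption, and unlike _make_gaussian_mask this sketch takes sigma explicitly because how the module chooses it is not documented here.

```perl
use strict;
use warnings;

# Unnormalised gaussian weight at distance $r from the mask centre.
sub gauss_weight {
    my ($r, $sigma) = @_;
    return exp( -($r**2) / (2 * $sigma**2) );
}

# Square mask of side $size; each cell holds the weight for its
# distance from the centre cell.
sub make_gaussian_mask {
    my ($size, $sigma) = @_;
    my $c = ($size - 1) / 2;
    return [
        map {
            my $x = $_;
            [ map { gauss_weight(sqrt(($x - $c)**2 + ($_ - $c)**2), $sigma) } 0 .. $size - 1 ]
        } 0 .. $size - 1
    ];
}

my $mask = make_gaussian_mask(3, 1);
printf "%.3f %.3f %.3f\n", @$_ for @$mask;
# 0.368 0.607 0.368
# 0.607 1.000 0.607
# 0.368 0.607 0.368
```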

FILE FORMAT

SOM_PAK file format version 3.1 (April 7, 1995), Helsinki University of Technology, Espoo:

    The input data is stored in ASCII-form as a list of entries, one line ...for each vectorial sample.

    The first line of the file is reserved for status knowledge of the entries; in the present version it is used to define the following items (these items MUST occur in the indicated order):

    - Dimensionality of the vectors (integer, compulsory).
    - Topology type, either hexa or rect (string, optional, case-sensitive).
    - Map dimension in x-direction (integer, optional).
    - Map dimension in y-direction (integer, optional).
    - Neighborhood type, either bubble or gaussian (string, optional, case-sensitive).

    ...

    Subsequent lines consist of n floating-point numbers followed by an optional class label (that can be any string) and two optional qualifiers (see below) that determine the usage of the corresponding data entry in training programs. The data files can also contain an arbitrary number of comment lines that begin with '#', and are ignored. (One '#' for each comment line is needed.)

    If some components of some data vectors are missing (due to data collection failures or any other reason) those components should be marked with 'x'...[in processing, these] are ignored.

Not implemented (yet):

- neighbourhood type, which is always gaussian;
- 'x' for missing data;
- class labels;
- the two optional qualifiers.

Requires: a path to a file.

Returns undef on failure.
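A minimal reader for the input layout quoted above may help. read_som_pak is a hypothetical function, not the module's loader; like the module, it ignores 'x' components, class labels and qualifiers, and it simply drops any fields beyond the first dim on each line.

```perl
use strict;
use warnings;

# Reads a SOM_PAK input file: a header whose first field is the vector
# dimensionality, then one vector per line. Lines beginning with '#'
# are comments. Returns undef on failure.
sub read_som_pak {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    my ($dim, @vectors);
    while (my $line = <$fh>) {
        next if $line =~ /^#/ || $line !~ /\S/;
        my @f = split ' ', $line;
        if (!defined $dim) {
            return undef unless $f[0] =~ /^\d+$/ && $f[0] > 0;
            $dim = $f[0];                 # header: dimensionality first
            next;
        }
        return undef if @f < $dim;        # too few components
        push @vectors, [ @f[0 .. $dim - 1] ];
    }
    close $fh;
    return defined $dim ? { dim => $dim, input => \@vectors } : undef;
}

# Example (hypothetical file name):
# my $data = read_som_pak('colours.dat') or die "cannot read colours.dat\n";
```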

METHOD save_file

Saves the map file in SOM_PAK format (see "METHOD load_file") at the path specified in the first argument.

Returns undef on failure and a true value on success.

SEE ALSO

See "distance_from" in AI::NeuralNet::Kohonen::Node; AI::NeuralNet::Kohonen::Demo::RGB.

A very nice explanation of Kohonen's algorithm: AI-Junkie SOM tutorial part 1

AUTHOR AND COPYRIGHT

This implementation Copyright (C) Lee Goddard, 2003. All Rights Reserved.

Available under the same terms as Perl itself.
