NAME

AI::Perceptron::Simple

A Newbie Friendly Module to Create, Train, Validate and Test Perceptrons / Neurons

VERSION

Version 1.00

SYNOPSIS

#!/usr/bin/perl

use AI::Perceptron::Simple;

# create a new nerve / neuron / perceptron
$perceptron = AI::Perceptron::Simple->new( {
    initial_value => $any_value_that_makes_sense, # size of each dendrite :)
    learning_rate => 0.3, # optional
    threshold => 0.85, # optional
    attribs => \@attributes, # dendrites: array ref of header names in csv file to train
} );

# training
$perceptron->train( $training_data_csv, $expected_column_name, $save_nerve_to );
# or
$perceptron->train(
    $training_data_csv, $expected_column_name, $save_nerve_to, 
    $show_progress, $identifier); # these two parameters must go together


# validating
# fill results to original file
$perceptron->validate( { 
    stimuli_validate => $validation_data_csv, 
    predicted_column_index => 4,
 } );
# or        
# fill results to a new file
$perceptron->validate( {
    stimuli_validate => $validation_data_csv,
    predicted_column_index => 4,
    results_write_to => $new_csv
} );


# testing, parameters same as validate
$perceptron->test( { 
    stimuli_validate => $testing_data_csv, 
    predicted_column_index => 4,
 } );
# or        
# fill results to a new file
$perceptron->test( {
    stimuli_validate => $testing_data_csv,
    predicted_column_index => 4,
    results_write_to => $new_csv
} );


# confusion matrix
my %c_matrix = $perceptron->get_confusion_matrix( { 
    full_data_file => $file_csv, 
    actual_output_header => $header_name,
    predicted_output_header => $predicted_header_name
} );

# accessing the confusion matrix
my @keys = qw( true_positive true_negative false_positive false_negative 
               total_entries accuracy sensitivity );
for ( @keys ) {
    print $_, " => ", $c_matrix{ $_ }, "\n";
}

# output to console
$perceptron->display_confusion_matrix( \%c_matrix, { 
    zero_as => "bad apples", # cat  milk   green  etc.
    one_as => "good apples", # dog  honey  pink   etc.
} );


# save data of the trained perceptron
my $nerve_file = "apples.nerve";
AI::Perceptron::Simple::save_perceptron( $perceptron, $nerve_file );

# load data of perceptron for use in actual program
my $apple_nerve = AI::Perceptron::Simple::load_perceptron( $nerve_file ); # :)

DESCRIPTION

This module provides methods to build, train, validate and test a perceptron. It can also save the perceptron's data so that it can be reused later in actual AI programs.

This module also aims to help newbies grasp the concepts of perceptrons, training, validation and testing as much as possible. Hence, all the methods and subroutines in this module are decoupled as much as possible so that the actual scripts can be written as simple, complete programs.

The implementation here is super basic: it only takes the input of the dendrites and calculates the output. If the output is higher than the threshold, the final result (category) will be 1, aka the perceptron is activated. If not, the result will be 0 (not activated).
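
The activation rule above can be pictured with a few lines of plain Perl. This is only a sketch of the idea, not the module's own code: the attribute names, the weights and the helper names calculate_output and categorize are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical dendrite weights; the names are invented for this sketch.
my %weights   = ( has_cover => 0.4, is_thick => 0.3, has_index => 0.2 );
my $threshold = 0.85;

# Sum the weights of every attribute that is switched on (1) in a row.
sub calculate_output {
    my ( $weights, $row ) = @_;
    my $sum = 0;
    for my $attrib ( keys %$weights ) {
        $sum += $weights->{$attrib} if $row->{$attrib};
    }
    return $sum;
}

# The neuron is activated only when the sum is higher than the threshold.
sub categorize {
    my ( $weights, $row, $threshold ) = @_;
    return calculate_output( $weights, $row ) > $threshold ? 1 : 0;
}

my %row_all = ( has_cover => 1, is_thick => 1, has_index => 1 );
my %row_two = ( has_cover => 1, is_thick => 1, has_index => 0 );

print categorize( \%weights, \%row_all, $threshold ), "\n";    # activated
print categorize( \%weights, \%row_two, $threshold ), "\n";    # not activated
```

With all three attributes on, the sum is about 0.9 and passes the threshold of 0.85. With only two on, the sum is about 0.7 and does not.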

Depending on how you view or categorize the final result, the perceptron will fine-tune itself (aka train) based on the learning rate until the desired result is met. Everything from here on is all mathematics and numbers, which only make sense to the computer and not to humans anymore.

Whenever the perceptron fine-tunes itself, it will increase/decrease all the dendrites that are significant (attributes labelled 1) for each input. This means that even when the perceptron successfully fine-tunes itself to suit all the data in your file in the first round, it might still get some things wrong in the next round of training. Therefore, the perceptron should be trained for as many rounds as possible. The more "confusion" the perceptron is able to handle correctly, the more "mature" the perceptron is. No one defines how "mature" it is except the programmer himself/herself :)
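
To see why more than one round is usually needed, here is a small stand-alone sketch. It assumes a simplified training loop (keep adjusting the active dendrites until the current row comes out right) and uses made-up data and weights. It is not the module's implementation, only an illustration of how fixing one row can break another.

```perl
use strict;
use warnings;

# Made-up data: both rows share attribute "a" but expect different categories.
my @rows = (
    { a => 1, b => 1, expected => 0 },
    { a => 1, b => 0, expected => 1 },
);
my %weights = ( a => 0.1, b => 0.1 );
my ( $learning_rate, $threshold ) = ( 0.05, 0.5 );

# 1 if the sum of the active weights is higher than the threshold, else 0.
sub output_of {
    my ($row) = @_;
    my $sum = 0;
    $sum += $weights{$_} for grep { $row->{$_} } keys %weights;
    return $sum > $threshold ? 1 : 0;
}

# Go through every row once; return how many adjustments were needed.
sub train_one_round {
    my $adjustments = 0;
    for my $row (@rows) {
        # Keep nudging the active dendrites until this row comes out right.
        while ( output_of($row) != $row->{expected} ) {
            my $step = $row->{expected} ? $learning_rate : -$learning_rate;
            $weights{$_} += $step for grep { $row->{$_} } keys %weights;
            $adjustments++;
        }
    }
    return $adjustments;
}

my $round = 0;
while ( $round < 100 ) {    # safety cap for this sketch
    $round++;
    my $adjustments = train_one_round();
    print "round $round: $adjustments adjustment(s)\n";
    last if $adjustments == 0;
}
```

Raising "a" for the second row pushes the first row over the threshold again, so the first row has to be corrected in the following round. The loop only stops once a whole round passes without any adjustment.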

EXPORT

None.

Almost everything is OO with some exceptions of course :)

CONVENTIONS USED

Please take note that not all subroutines/methods must be used to make things work. All the subroutines and methods are listed for the sake of complete documentation.

Private methods/subroutines are prefixed with _ or &_ and they aren't meant to be called directly. You can if you want to.

"Synonyms" are placed before the actual subroutines/methods with the actual/technical terminologies. You will see ... as the parameters if they are synonyms. So move to the next subroutine/method until you find something like \%options as the parameter or anything that isn't ...

DATASETS STRUCTURE

Any field, i.e. column, that will be used for processing must be binary, i.e. either 0 or 1 only. Your dataset can contain other columns with non-binary data as long as they are not used for the calculation.

Since there isn't any tutorial written for this module yet, you might need to go and find the data (CSV) files in the t directory. The original dataset can also be found in docs/book_list.csv.

This module can only process CSV files.
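
Before training, it can help to check that the columns you plan to use really are binary. The sketch below is one possible way to do that in plain Perl. The CSV content, the column names and the helper non_binary_columns are invented for this example, and the parsing assumes a simple CSV without quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Made-up CSV content; "title" is a non-binary column that is simply not used.
my $csv_text = <<'CSV';
title,has_pictures,is_thick,liked
Book A,1,0,1
Book B,0,1,0
Book C,1,1,1
CSV

# Return the names of the given columns that contain anything other than 0 or 1.
sub non_binary_columns {
    my ( $text, @columns ) = @_;
    my ( $header_line, @lines ) = split /\n/, $text;
    my @headers = split /,/, $header_line;
    my %bad;
    for my $line (@lines) {
        my %row;
        @row{@headers} = split /,/, $line;    # header name => value
        for my $column (@columns) {
            $bad{$column} = 1
                unless defined $row{$column} && $row{$column} =~ /\A[01]\z/;
        }
    }
    return sort keys %bad;
}

my @bad = non_binary_columns( $csv_text, qw( has_pictures is_thick liked ) );
print @bad ? "Not binary: @bad\n" : "All selected columns are binary\n";
```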

PERCEPTRON DATA

The perceptron/neuron data is stored using the Storable module. More file types might be supported in the future.
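
For readers who have not used Storable before, the following sketch shows a round trip with its store and retrieve functions. A plain hash stands in for the perceptron's data here, and the keys and the file name are assumptions made for the example, not the module's real internal layout.

```perl
use strict;
use warnings;
use Storable qw( store retrieve );
use File::Temp qw( tempdir );

# A plain hash stands in for the perceptron's data in this sketch.
my %nerve_data = (
    learning_rate => 0.05,
    threshold     => 0.5,
    attributes    => { has_pictures => 0.3, is_thick => 0.1 },
);

my $dir        = tempdir( CLEANUP => 1 );    # removed when the program exits
my $nerve_file = "$dir/example.nerve";

store( \%nerve_data, $nerve_file );          # write the structure to disk
my $loaded = retrieve($nerve_file);          # read it back as a hash reference

print "threshold: $loaded->{threshold}\n";
print "is_thick:  $loaded->{attributes}{is_thick}\n";
```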

CREATION RELATED SUBROUTINES/METHODS

new ( \%options )

Creates a brand new perceptron and initializes the value of each ATTRIBUTE or "thickness" of the dendrites :)

For %options, the following are needed unless mentioned otherwise:

initial_value => $decimal

The value or thickness :) of ALL the dendrites when a new perceptron is created.

Generally speaking, this value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.

attribs => $array_ref

An array reference containing all the attributes the perceptron should have.

learning_rate => $decimal

Optional. The default is 0.05.

The learning rate or the "rest duration" of the perceptron for the fine-tuning process (between 0 and 1).

Generally speaking, the smaller the value the better. This value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.

threshold => $decimal

Optional. The default is 0.5.

This is the passing rate to determine the neuron output (0 or 1).

Generally speaking, this value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.

get_attributes

Obtains a hash of all the attributes of the perceptron.

learning_rate ( $value )

learning_rate

If $value is given, sets the learning rate to $value. If not, then it returns the learning rate.

The $value should be between 0 and 1. The default is 0.05.

threshold ( $value )

threshold

If $value is given, sets the threshold / passing rate to $value. If not, then it returns the passing rate.

The $value should be between 0 and 1. The default is 0.5.
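
If the "set when given, get when not" style is new to you, the pattern looks roughly like this. My::TinyNerve is a hypothetical class written only for this sketch. It is not part of this module and says nothing about how the module implements its accessors.

```perl
use strict;
use warnings;

package My::TinyNerve;    # hypothetical class, for illustration only

sub new { return bless { threshold => 0.5 }, shift }

# With an argument: set the value. Without one: return the current value.
sub threshold {
    my ( $self, $value ) = @_;
    if ( defined $value ) {
        $self->{threshold} = $value;
    }
    else {
        return $self->{threshold};
    }
}

package main;

my $nerve = My::TinyNerve->new;
my $before = $nerve->threshold;    # used as a getter
$nerve->threshold(0.85);           # used as a setter
my $after = $nerve->threshold;

print "before: $before, after: $after\n";
```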

TRAINING RELATED SUBROUTINES/METHODS

All the training methods here have the same parameters as the two actual train methods and they all do the same thing. They are also used in the same way.

tame ( ... )

exercise ( ... )

train ( $stimuli_train_csv, $expected_output_header, $save_nerve_to_file )

train ( $stimuli_train_csv, $expected_output_header, $save_nerve_to_file, $display_stats, $identifier )

Trains the perceptron.

$stimuli_train_csv is the set of data / input (in CSV format) used to train the perceptron, while $save_nerve_to_file is the filename that will be generated each time the perceptron finishes the training process. This file holds the data of the AI::Perceptron::Simple object and is used in the validate method.

$expected_output_header is the header name of the column in the CSV file with the actual category or the expected values. This is used to determine whether to tune the nerve up or down. This value should only be 0 or 1 for the sake of simplicity.

$display_stats is optional and the default is 0. It will display more output about the tuning process. It will show the following:

tuning status

Indicates whether the nerve was tuned up or down

old sum

The original sum of all weightage * input

threshold

The threshold of the nerve

new sum

The new sum of all weightage * input after fine-tuning the nerve

If $display_stats is set to 1, then you must specify the $identifier. This is the column / header name that is used to identify a specific row of data in $stimuli_train_csv.

&_calculate_output( $self, \%stimuli_hash )

Calculates and returns the sum(weightage*input) for each individual row of data. For the coding part, it just adds up all the existing weights since the input is always 1 for now :)

%stimuli_hash is the actual data to be used for training. It might contain useless columns.

This will get all the available dendrites through the get_attributes method and then use all the keys, i.e. headers, to access the corresponding values.

This subroutine should be called in the procedural way for now.

&_tune( $self, \%stimuli_hash, $tune_up_or_down )

Fine-tunes the nerve. This will directly alter the attribute values in $self according to the attributes / dendrites specified in new.

The %stimuli_hash here is the same as the one in the _calculate_output method.

%stimuli_hash will be used to determine which dendrite in $self needs to be fine-tuned. As long as the value of any key in %stimuli_hash returns true (1) then that dendrite in $self will be tuned.

Tuning up or down depends on $tune_up_or_down specified by the &_calculate_output subroutine. The following constants can be used for $tune_up_or_down:

TUNE_UP

Value is 1

TUNE_DOWN

Value is 0

This subroutine should be called in the procedural way for now.
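
A single tuning step can be sketched as follows. This is an assumption-laden illustration rather than the module's source: a plain hash reference stands in for $self, the sub is called tune instead of &_tune, the attribute names are invented, and it assumes each step changes a dendrite by exactly the learning rate.

```perl
use strict;
use warnings;
use constant { TUNE_UP => 1, TUNE_DOWN => 0 };

# A hash reference stands in for $self in this sketch.
my $nerve = {
    learning_rate => 0.05,
    attributes    => { has_pictures => 0.3, is_thick => 0.3, is_new => 0.3 },
};

# Adjust only the dendrites whose stimulus value is true (1).
sub tune {
    my ( $self, $stimuli, $tune_up_or_down ) = @_;
    my $step = $self->{learning_rate};
    $step = -$step if $tune_up_or_down == TUNE_DOWN;
    for my $dendrite ( keys %{ $self->{attributes} } ) {
        next unless $stimuli->{$dendrite};
        $self->{attributes}{$dendrite} += $step;
    }
    return $self;
}

# "title" is an extra column that the nerve knows nothing about, so it is ignored.
my %stimuli = ( title => "Book A", has_pictures => 1, is_thick => 0, is_new => 1 );
tune( $nerve, \%stimuli, TUNE_DOWN );

printf "%-12s %.2f\n", $_, $nerve->{attributes}{$_}
    for sort keys %{ $nerve->{attributes} };
```

Only has_pictures and is_new are lowered, because is_thick is 0 in this row.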

VALIDATION RELATED METHODS

All the validation methods here have the same parameters as the actual validate method and they all do the same stuff. They are also used in the same way.

take_mock_exam (...)

take_lab_test (...)

validate ( \%options )

This method validates the perceptron against another set of data after it has undergone the training process.

This method calculates the output of each row of data and writes the result into the predicted column. The data being written into the new file or the original file will maintain its sequence.

Please take note that this method loads all the validation stimuli at once, so please split your stimuli into multiple files if possible and call this method several times.

For %options, the following are needed unless mentioned otherwise:

stimuli_validate => $csv_file

This is the CSV file containing the validation data. Make sure that it contains a column for the predicted values, as it is needed by the next key: predicted_column_index.

predicted_column_index => $column_number

This is the index of the column that contains the predicted output values. The index starts from 0.

This part will be filled and saved to results_write_to.

results_write_to => $new_csv_file

Optional.

By default, the predicted output is written into stimuli_validate, i.e. the original data file, while maintaining the original sequence.

*This method will call &_real_validate_or_test to do the actual work.

TESTING RELATED SUBROUTINES/METHODS

All the testing methods here have the same parameters as the actual test method and they all do the same stuff. They are also used in the same way.

take_real_exam (...)

work_in_real_world (...)

test ( \%options )

This method is used to put the trained nerve to the test. You can think of it as deploying the nerve for the actual work.

This method works and behaves the same way as the validate method. See validate for the details.

*This method will call &_real_validate_or_test to do the actual work.

_real_validate_or_test ( $data_hash_ref )

This is where the actual validation or testing takes place.

$data_hash_ref is the list of parameters passed into the validate or test methods.

This is a method, so use the OO way. This is one of the exceptions to the rules where private subroutines are treated as methods :)

&_fill_predicted_values ( $self, $stimuli_validate, $predicted_index, $aoa )

This is where the filling in of the predicted values takes place. Take note that the parameter names are the same as the ones used in the validate and test methods.

This subroutine should be called in the procedural way.
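
The idea of filling a predicted column in place can be sketched like this. The rows, weights and threshold are made up, and the sketch works on an in-memory array of arrays only. Reading and writing the CSV file is left out, and the real subroutine may differ in its details.

```perl
use strict;
use warnings;

# Made-up rows as an array of arrays; the first row holds the headers.
# Column 3 ("predicted") starts empty and will be filled in.
my @aoa = (
    [ 'title',  'has_pictures', 'is_thick', 'predicted' ],
    [ 'Book A', 1,              0,          '' ],
    [ 'Book B', 0,              1,          '' ],
    [ 'Book C', 1,              1,          '' ],
);
my %weights         = ( has_pictures => 0.4, is_thick => 0.3 );
my $threshold       = 0.5;
my $predicted_index = 3;    # counted from 0

my ( $headers, @rows ) = @aoa;
for my $row (@rows) {
    # Map header names to this row's values, then sum the active weights.
    my %stimuli;
    @stimuli{@$headers} = @$row;
    my $sum = 0;
    $sum += $weights{$_} for grep { $stimuli{$_} } keys %weights;

    # Write the category in place, so the row order never changes.
    $row->[$predicted_index] = $sum > $threshold ? 1 : 0;
}

print join( ',', @$_ ), "\n" for @aoa;
```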

RESULTS RELATED SUBROUTINES/METHODS

This part is related to generating the confusion matrix.

get_exam_results ( ... )

The parameters and usage are the same as get_confusion_matrix. See the next method.

get_confusion_matrix ( \%options )

Returns the confusion matrix in the form of a hash. The hash will contain these keys: true_positive, true_negative, false_positive, false_negative, total_entries, accuracy, sensitivity.

Take note that the accuracy and sensitivity are expressed as percentages (%) and may contain decimals.

For %options, the following are needed unless mentioned otherwise:

full_data_file => $filled_test_file

This is the CSV file filled with the predicted values.

Make sure that you don't do anything to the actual and predicted output in this file after testing the nerve. These two columns must contain binary values only!

actual_output_header => $actual_column_name
predicted_output_header => $predicted_column_name

The binary values are treated as follows:

0 is negative
1 is positive
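
With 0 as negative and 1 as positive, the four counts of a confusion matrix can be collected as in this sketch. The actual/predicted pairs are made up, and the code shows the usual counting scheme rather than the module's own implementation.

```perl
use strict;
use warnings;

# Made-up [ actual, predicted ] pairs, one per row of a filled data file.
my @pairs = ( [ 1, 1 ], [ 1, 0 ], [ 0, 0 ], [ 0, 1 ], [ 1, 1 ], [ 0, 0 ] );

my %c_matrix = (
    true_positive  => 0,
    true_negative  => 0,
    false_positive => 0,
    false_negative => 0,
);

for my $pair (@pairs) {
    my ( $actual, $predicted ) = @$pair;
    if    ( $actual == 1 && $predicted == 1 ) { $c_matrix{true_positive}++ }
    elsif ( $actual == 0 && $predicted == 0 ) { $c_matrix{true_negative}++ }
    elsif ( $actual == 0 && $predicted == 1 ) { $c_matrix{false_positive}++ }
    else                                      { $c_matrix{false_negative}++ }
}

print "$_ => $c_matrix{$_}\n" for sort keys %c_matrix;
```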

&_collect_stats ( \%options )

Generates the confusion matrix hash based on the %options given in the get_confusion_matrix method.

&_calculate_total_entries ( $c_matrix_ref )

Calculates and adds the data for the total_entries key in the confusion matrix hash.

&_calculate_accuracy ( $c_matrix_ref )

Calculates and adds the data for the accuracy key in the confusion matrix hash.

&_calculate_sensitivity ( $c_matrix_ref )

Calculates and adds the data for the sensitivity key in the confusion matrix hash.
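
The three derived keys can be worked out from the four counts as sketched below. The counts are made up, and the formulas are the standard textbook definitions of accuracy and sensitivity expressed as percentages. It is assumed here, not confirmed, that the module computes them the same way.

```perl
use strict;
use warnings;

# Made-up counts for illustration.
my %c_matrix = (
    true_positive  => 40,
    true_negative  => 45,
    false_positive => 5,
    false_negative => 10,
);

$c_matrix{total_entries} = $c_matrix{true_positive} + $c_matrix{true_negative}
                         + $c_matrix{false_positive} + $c_matrix{false_negative};

# Share of all rows that were predicted correctly, as a percentage.
$c_matrix{accuracy} = 100 * ( $c_matrix{true_positive} + $c_matrix{true_negative} )
                    / $c_matrix{total_entries};

# Share of the actual positives that were predicted as positive.
# Assumes there is at least one actual positive, otherwise this divides by zero.
$c_matrix{sensitivity} = 100 * $c_matrix{true_positive}
                       / ( $c_matrix{true_positive} + $c_matrix{false_negative} );

printf "%-14s %s\n", $_, $c_matrix{$_} for qw( total_entries accuracy sensitivity );
```

Here 85 of the 100 rows are correct, so the accuracy is 85, and 40 of the 50 actual positives were caught, so the sensitivity is 80.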

display_exam_results ( ... )

The parameters are the same as display_confusion_matrix. See the next method.

display_confusion_matrix ( \%confusion_matrix, \%labels )

Displays the confusion matrix.

%confusion_matrix is the same confusion matrix returned by the get_confusion_matrix method.

Surely 0's and 1's don't make much sense for the output. Therefore, for %labels, the following keys must be specified:

zero_as => $category_zero_name
one_as => $category_one_name

Please take note that non-ASCII characters, i.e. non-English alphabets, will throw the layout of the output off :)

For the %labels, there is no need to enter "actual X", "predicted X" etc. It will be indicated with A: for actual and P: for the predicted values.

NERVE DATA RELATED SUBROUTINES

This part is about saving the state of the nerve.

The subroutines are to be called in the procedural way. No checking is done currently.

The subroutines here are not exported in any way whatsoever.

preserve ( ... )

The parameters and usage are the same as save_perceptron. See the next subroutine.

save_perceptron ( $perceptron, $nerve_file )

Saves the AI::Perceptron::Simple object into a Storable file. There shouldn't be a need to call this subroutine manually, since it is called automatically after every training process.

revive (...)

The parameters and usage are the same as load_perceptron. See the next subroutine.

load_perceptron ( $nerve_file_to_load )

Loads the data and returns it as an AI::Perceptron::Simple object.

TO DO

These are the to-do's that MIGHT be done in the future. Don't put too much hope in them please :)

Clean up source code
Implement shuffling data feature
Implement fast/smart training feature
Write a tutorial or something for this module
and something yet to be known...

AUTHOR

Raphael Jong Jun Jie, <ellednera at cpan.org>

BUGS

Please report any bugs or feature requests to bug-ai-perceptron-simple at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=AI-Perceptron-Simple. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc AI::Perceptron::Simple

ACKNOWLEDGEMENTS

Besiyata d'shmaya, Wikipedia

SEE ALSO

AI::Perceptron, Text::Matrix

LICENSE AND COPYRIGHT

This software is Copyright (c) 2021 by Raphael Jong Jun Jie.

This is free software, licensed under:

The Artistic License 2.0 (GPL Compatible)