NAME
AI::Perceptron::Simple
A Newbie Friendly Module to Create, Train, Validate and Test Perceptrons / Neurons
VERSION
Version 1.02
SYNOPSIS
#!/usr/bin/perl
use AI::Perceptron::Simple qw(...);
# create a new nerve / neuron / perceptron
$perceptron = AI::Perceptron::Simple->new( {
    initial_value => $any_value_that_makes_sense, # size of each dendrite :)
    learning_rate => 0.3, # optional
    threshold => 0.85, # optional
    attribs => \@attributes, # dendrites
} );
# train
$perceptron->tame( ... );
$perceptron->exercise( ... );
$perceptron->train( $training_data_csv, $expected_column_name, $save_nerve_to );
# or
$perceptron->train(
    $training_data_csv, $expected_column_name, $save_nerve_to,
    $show_progress, $identifier ); # these two parameters must go together
# validate
$perceptron->take_lab_test( ... );
$perceptron->take_mock_exam( ... );
# fill results to original file
$perceptron->validate( {
    stimuli_validate => $validation_data_csv,
    predicted_column_index => 4,
} );
# or
# fill results to a new file
$perceptron->validate( {
    stimuli_validate => $validation_data_csv,
    predicted_column_index => 4,
    results_write_to => $new_csv
} );
# test - see "validate" method, same usage
$perceptron->take_real_exam( ... );
$perceptron->work_in_real_world( ... );
$perceptron->test( ... );
# confusion matrix
my %c_matrix = $perceptron->get_confusion_matrix( {
    full_data_file => $file_csv,
    actual_output_header => $header_name,
    predicted_output_header => $predicted_header_name
} );
# accessing the confusion matrix
my @keys = qw( true_positive true_negative false_positive false_negative
               total_entries accuracy sensitivity );
for ( @keys ) {
    print $_, " => ", $c_matrix{ $_ }, "\n";
}
# output to console
$perceptron->display_confusion_matrix( \%c_matrix, {
    zero_as => "bad apples", # cat milk green etc.
    one_as => "good apples", # dog honey pink etc.
} );
# saving and loading data of perceptron locally
# NOTE: nerve data is automatically saved after each training process
use AI::Perceptron::Simple ":local_data";
my $nerve_file = "apples.nerve";
preserve( ... );
save_perceptron( $perceptron, $nerve_file );
# load data of perceptron for use in actual program
my $apple_nerve = revive( ... );
my $apple_nerve = load_perceptron( $nerve_file );
# for portability of nerve data
use AI::Perceptron::Simple ":portable_data";
my $yaml_nerve_file = "pearls.yaml";
preserve_as_yaml ( ... );
save_perceptron_yaml ( $perceptron, $yaml_nerve_file );
# load nerve data on the other computer
my $pearl_nerve = revive_from_yaml ( ... );
my $pearl_nerve = load_perceptron_yaml ( $yaml_nerve_file );
EXPORT
None by default.
All the subroutines from the NERVE DATA RELATED SUBROUTINES and NERVE PORTABILITY RELATED SUBROUTINES sections are exportable.
Export tags include the following:
- :local_data
  subroutines under the NERVE DATA RELATED SUBROUTINES section.
- :portable_data
  subroutines under the NERVE PORTABILITY RELATED SUBROUTINES section.
Most of the functionality is object-oriented.
DESCRIPTION
This module provides methods to build, train, validate and test a perceptron. It can also save the perceptron's data for later use in actual AI programs.
This module also aims to help newbies grasp the concepts of perceptrons, training, validation and testing as much as possible. Hence, all the methods and subroutines in this module are decoupled as much as possible so that the actual scripts can be written as simple, complete programs.
The implementation here is super basic: it only takes the inputs of the dendrites and calculates the output. If the output is higher than the threshold, the final result (category) will be 1, i.e. the perceptron is activated. If not, the result will be 0 (not activated).
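The activation rule described above can be sketched in a few lines of plain Perl. This is only an illustration and not the module's own code: the weights, the column names and the activate helper are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical dendrite weights; in the module these are set up by new().
my %weights   = ( has_cover => 0.4, is_long => 0.3, is_cheap => 0.2 );
my $threshold = 0.5;

# Sum weight * input over all dendrites, then compare with the threshold.
# "activate" is an illustrative helper, not part of AI::Perceptron::Simple.
sub activate {
    my ( $weights, $threshold, $row ) = @_;
    my $sum = 0;
    $sum += $weights->{$_} * ( $row->{$_} // 0 ) for keys %$weights;
    return $sum > $threshold ? 1 : 0;
}

# 0.4 + 0.3 is above the threshold, 0.2 alone is not.
my $fired  = activate( \%weights, $threshold, { has_cover => 1, is_long => 1, is_cheap => 0 } );
my $silent = activate( \%weights, $threshold, { has_cover => 0, is_long => 0, is_cheap => 1 } );
print "fired=$fired silent=$silent\n";
```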
Depending on how you view or categorize the final result, the perceptron will fine-tune itself (i.e. train) based on the learning rate until the desired result is met. Everything from here on is all mathematics and numbers, which only make sense to the computer and not to humans anymore.
Whenever the perceptron fine-tunes itself, it will increase/decrease all the dendrites that are significant (attributes labelled 1) for each input. This means that even when the perceptron successfully fine-tunes itself to suit all the data in your file in the first round, it might still get some things wrong in the next round of training. Therefore, the perceptron should be trained for as many rounds as possible. The more "confusion" the perceptron is able to handle correctly, the more "mature" it is. No one defines how "mature" it is except the programmer himself/herself :)
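To see why more than one round can be needed, here is a minimal sketch of a multi-round training loop in plain Perl. It assumes that a row is tuned until its output matches the expected category and that training stops after a round in which nothing had to be tuned. The data, the predict helper and the safety caps are invented for the example, and the module itself may organise its rounds differently.

```perl
use strict;
use warnings;

my @attribs       = qw( a b );
my %weights       = map { $_ => 0.1 } @attribs;    # plays the role of initial_value
my $learning_rate = 0.1;
my $threshold     = 0.25;

# Hypothetical training rows: binary inputs plus the expected category.
my @rows = (
    { a => 1, b => 0, expected => 0 },
    { a => 0, b => 1, expected => 0 },
    { a => 1, b => 1, expected => 1 },
);

# Illustrative helper: weighted sum compared against the threshold.
sub predict {
    my ($row) = @_;
    my $sum = 0;
    $sum += $weights{$_} * $row->{$_} for @attribs;
    return $sum > $threshold ? 1 : 0;
}

my $max_rounds = 20;    # safety cap in case the data cannot be separated
my $rounds     = 0;
my $tuned;
do {
    $tuned = 0;
    $rounds++;
    for my $row (@rows) {
        my $guard = 0;    # avoid looping forever on a row with no active input
        while ( predict($row) != $row->{expected} && $guard++ < 100 ) {
            # tune up when the expected category is 1, down when it is 0,
            # touching only the dendrites whose input is 1
            my $step = $row->{expected} ? $learning_rate : -$learning_rate;
            $weights{$_} += $step for grep { $row->{$_} } @attribs;
            $tuned++;
        }
    }
} while ( $tuned && $rounds < $max_rounds );

print "finished after $rounds round(s)\n";
```

The third row forces a tune-up in the first round, so a second round is needed just to confirm that the earlier rows are still classified correctly.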
CONVENTIONS USED
Please take note that not all subroutines/methods must be used to make things work. All the subroutines and methods are listed here for the sake of documentation.
Private methods/subroutines are prefixed with _ or &_ and they aren't meant to be called directly. You can if you want to. There are quite a number of them to be honest, just ignore them if you happen to see them :)
Synonyms are placed before the actual, i.e. technical, subroutines/methods. You will see ... as the parameters if they are synonyms. Move to the next subroutine/method until you find something like \%options as the parameter, or anything that isn't ..., for the description.
DATASET STRUCTURE
This module can only process CSV files.
Any field, i.e. column, that will be used for processing must be binary, i.e. 0 or 1 only. Your dataset can contain other columns with non-binary data as long as they are not one of the dendrites.
There are some sample datasets which can be found in the t directory. The original dataset can also be found in docs/book_list.csv. The files can also be found here.
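As a rough illustration of that layout, the sketch below parses a tiny made-up dataset and checks that the dendrite columns are binary. The column names and rows are hypothetical, and the naive split on commas is only good enough for this toy data; a real script would more likely use a proper CSV parser such as Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical data: a header row, then one row per sample.
# "title" is non-binary, which is fine because it is not a dendrite.
my $csv = <<'CSV';
title,has_pictures,is_thick,brand
Book A,1,0,1
Book B,0,1,0
CSV

my @dendrites = qw( has_pictures is_thick );

my ( $header_line, @lines ) = split /\n/, $csv;
my @headers = split /,/, $header_line;

# Turn every line into a hash keyed by the header names.
my @rows;
for my $line (@lines) {
    my %row;
    @row{@headers} = split /,/, $line;
    push @rows, \%row;
}

# Collect the rows in which any dendrite column holds something other than 0 or 1.
my @bad = grep {
    my $row = $_;
    grep { $row->{$_} !~ /\A[01]\z/ } @dendrites;
} @rows;

print scalar(@bad), " invalid row(s)\n";
```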
PERCEPTRON DATA
The perceptron/neuron data is stored using the Storable module.
See the Portability of Nerve Data section below for more info on some known issues.
CREATION RELATED SUBROUTINES/METHODS
new ( \%options )
Creates a brand new perceptron and initializes the value of each attribute / dendrite, i.e. its weight. Think of it as the thickness or plasticity of the dendrites.
For %options, the following are needed unless mentioned otherwise:
- initial_value => $decimal
  The value or thickness of ALL the dendrites when a new perceptron is created.
  Generally speaking, this value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
- attribs => $array_ref
  An array reference containing the names of all the attributes / dendrites. Yes, give them some names :)
- learning_rate => $decimal
  Optional. The default is 0.05.
  The learning rate of the perceptron for the fine-tuning process.
  This value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
- threshold => $decimal
  Optional. The default is 0.5.
  This is the passing rate that determines the neuron output (0 or 1).
  Generally speaking, this value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
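To make the options a bit more concrete, here is a hypothetical stand-in for the constructor written in plain Perl. build_nerve and the layout of the returned hash are assumptions made for this sketch and do not show the module's internals; only the option names and the two defaults are taken from the list above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for new(), for illustration only.
sub build_nerve {
    my ($options) = @_;

    # initial_value and attribs have no default, so insist on them
    defined $options->{$_} or die "missing required option: $_\n"
        for qw( initial_value attribs );

    return {
        learning_rate => $options->{learning_rate} // 0.05,
        threshold     => $options->{threshold}     // 0.5,
        # every dendrite starts out with the same weight
        attributes    => { map { $_ => $options->{initial_value} } @{ $options->{attribs} } },
    };
}

my $nerve = build_nerve( {
    initial_value => 0.01,
    attribs       => [ qw( has_pictures is_thick ) ],
} );

printf "learning_rate=%s threshold=%s\n", @$nerve{ qw( learning_rate threshold ) };
```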
get_attributes
Obtains a hash of all the attributes of the perceptron.
learning_rate ( $value )
learning_rate
If $value is given, sets the learning rate to $value. If not, then it returns the learning rate.
threshold ( $value )
threshold
If $value is given, sets the threshold / passing rate to $value. If not, then it returns the passing rate.
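The combined getter/setter behaviour can be sketched as follows. My::Nerve is a made-up class for this example, and returning the stored value after setting it is an assumption rather than something the module documents.

```perl
use strict;
use warnings;

package My::Nerve;    # hypothetical class, for illustration only

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# With an argument: store it. Without one: just report the current value.
sub threshold {
    my ( $self, $value ) = @_;
    $self->{threshold} = $value if defined $value;
    return $self->{threshold};
}

package main;

my $nerve  = My::Nerve->new( threshold => 0.5 );
my $before = $nerve->threshold;    # getter
$nerve->threshold(0.85);           # setter
my $after = $nerve->threshold;

print "before=$before after=$after\n";
```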
TRAINING RELATED SUBROUTINES/METHODS
All the training methods here have the same parameters as the two actual train methods and they all do the same stuff. They are also used in the same way.
tame ( ... )
exercise ( ... )
train ( $stimuli_train_csv, $expected_output_header, $save_nerve_to_file )
train ( $stimuli_train_csv, $expected_output_header, $save_nerve_to_file, $display_stats, $identifier )
Trains the perceptron.
$stimuli_train_csv is the set of data / input (in CSV format) used to train the perceptron, while $save_nerve_to_file is the name of the file that will be generated each time the perceptron finishes the training process. This file holds the data of the AI::Perceptron::Simple object and it is used in the validate method.
$expected_output_header is the header name of the column in the CSV file that holds the actual category, i.e. the expected values. It is used to determine whether to tune the nerve up or down. The values in this column should only be 0 or 1 for the sake of simplicity.
$display_stats is optional and the default is 0. When enabled, it displays more output about the tuning process. It will show the following:
- tuning status
  Indicates whether the nerve was tuned up, tuned down or needed no tuning
- old sum
  The original sum of all weightage * input or dendrite_size * binary_input
- threshold
  The threshold of the nerve
- new sum
  The new sum of all weightage * input after fine-tuning the nerve
If $display_stats is specified, i.e. set to 1, then you MUST specify the $identifier. $identifier is the column / header name that is used to identify a specific row of data in $stimuli_train_csv.
&_calculate_output( $self, \%stimuli_hash )
Calculates and returns the sum(weightage*input) for each individual row of data. Actually, it just adds up all the existing weights since the input is always 1 for now :)
%stimuli_hash is the actual data to be used for training. It might contain useless columns.
This will get all the available dendrites using the get_attributes method and then use all the keys, i.e. headers, to access the corresponding values.
This subroutine should be called in the procedural way for now.
&_tune( $self, \%stimuli_hash, $tune_up_or_down )
Fine-tunes the nerve. This will directly alter the attribute values in $self according to the attributes / dendrites specified in new.
The %stimuli_hash here is the same as the one in the _calculate_output subroutine.
%stimuli_hash will be used to determine which dendrites in $self need to be fine-tuned. As long as the value of a key in %stimuli_hash is true (1), the corresponding dendrite in $self will be tuned.
Tuning up or down depends on the $tune_up_or_down specified by the train method. The following constants can be used for $tune_up_or_down:
- TUNE_UP
  Value is 1
- TUNE_DOWN
  Value is 0
This subroutine should be called in the procedural way for now.
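A single tuning step along those lines might look like the sketch below. The tune subroutine, the nerve hash and the assumption that each step changes a weight by exactly the learning rate are illustrative only. The constants simply mirror the two values listed above.

```perl
use strict;
use warnings;
use constant { TUNE_UP => 1, TUNE_DOWN => 0 };

# Hypothetical nerve data, not the module's object layout.
my %nerve = (
    learning_rate => 0.05,
    attributes    => { has_pictures => 0.2, is_thick => 0.2 },
);

# Adjust only the dendrites whose input is true. Columns that are not
# dendrites (such as "title" below) are never looked at.
sub tune {
    my ( $nerve, $stimuli, $direction ) = @_;
    my $step = $direction == TUNE_UP
        ? $nerve->{learning_rate}
        : -$nerve->{learning_rate};
    for my $dendrite ( keys %{ $nerve->{attributes} } ) {
        $nerve->{attributes}{$dendrite} += $step if $stimuli->{$dendrite};
    }
    return;
}

tune( \%nerve, { title => 'Book A', has_pictures => 1, is_thick => 0 }, TUNE_UP );

printf "%s => %.2f\n", $_, $nerve{attributes}{$_} for sort keys %{ $nerve{attributes} };
```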
VALIDATION RELATED METHODS
All the validation methods here have the same parameters as the actual validate method and they all do the same stuff. They are also used in the same way.
take_mock_exam (...)
take_lab_test (...)
validate ( \%options )
This method validates the perceptron against another set of data after it has undergone the training process.
This method calculates the output of each row of data and writes the result into the predicted column. The data being written into the new file or the original file will maintain its sequence.
Please take note that this method will load all the data of the validation stimuli at once, so please split your stimuli into multiple files if possible and call this method a few more times.
For %options, the following are needed unless mentioned otherwise:
- stimuli_validate => $csv_file
  This is the CSV file containing the validation data. Make sure that it contains a column for the predicted values, as it is needed by the next key, predicted_column_index.
- predicted_column_index => $column_number
  This is the index of the column that contains the predicted output values. The index starts from 0.
  This column will be filled with binary numbers and the full new data will be saved to the file specified in the results_write_to key.
- results_write_to => $new_csv_file
  Optional.
  The default behaviour is to write the predicted output back into stimuli_validate, i.e. the original data. The sequence of the data will be maintained.
*This method will call _real_validate_or_test to do the actual work.
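The filling step can be pictured with the following sketch, which works on an in-memory array of arrays instead of a real file. The weights, headers and rows are made up, and writing the result back to a CSV file is left out on purpose.

```perl
use strict;
use warnings;

# Hypothetical trained weights and threshold.
my %weights                = ( has_pictures => 0.4, is_thick => 0.3 );
my $threshold              = 0.5;
my $predicted_column_index = 3;    # counted from 0

# Header row first, then the data rows in file order.
my @aoa = (
    [ 'title',  'has_pictures', 'is_thick', 'predicted' ],
    [ 'Book A', 1,              1,          '' ],
    [ 'Book B', 0,              1,          '' ],
);

my ( $header, @data ) = @aoa;
for my $row (@data) {
    # map header names to this row's values
    my %stimuli;
    @stimuli{@$header} = @$row;

    my $sum = 0;
    $sum += $weights{$_} * $stimuli{$_} for keys %weights;

    # fill in the predicted column in place, so the row order never changes
    $row->[$predicted_column_index] = $sum > $threshold ? 1 : 0;
}

print join( ',', @$_ ), "\n" for @aoa;
```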
TESTING RELATED SUBROUTINES/METHODS
All the testing methods here have the same parameters as the actual test method and they all do the same stuff. They are also used in the same way.
take_real_exam (...)
work_in_real_world (...)
test ( \%options )
This method is used to put the trained nerve to the test. You can think of it as deploying the nerve for the actual work or maybe putting the nerve into an empty brain and see how well the brain survives :)
This method works and behaves the same way as the validate method. See validate for the details.
*This method will call &_real_validate_or_test to do the actual work.
_real_validate_or_test ( $data_hash_ref )
This is where the actual validation or testing takes place.
$data_hash_ref is the list of parameters passed into the validate or test methods.
This is a method, so use the OO way. This is one of the exceptions to the rules where private subroutines are treated as methods :)
&_fill_predicted_values ( $self, $stimuli_validate, $predicted_index, $aoa )
This is where the filling in of the predicted values takes place. Take note that the parameter names are the same as the ones used in the validate and test methods.
This subroutine should be called in the procedural way.
RESULTS RELATED SUBROUTINES/METHODS
This part is related to generating the confusion matrix.
get_exam_results ( ... )
The parameters and usage are the same as get_confusion_matrix. See the next method.
get_confusion_matrix ( \%options )
Returns the confusion matrix in the form of a hash. The hash will contain these keys: true_positive, true_negative, false_positive, false_negative, total_entries, accuracy, sensitivity.
Take note that accuracy and sensitivity are percentages (%), with decimal places if any.
For %options, the following are needed unless mentioned otherwise:
- full_data_file => $filled_test_file
  This is the CSV file filled with the predicted values.
  Make sure that you don't do anything to the actual and predicted output in this file after testing the nerve. These two columns must contain binary values only!
- actual_output_header => $actual_column_name
- predicted_output_header => $predicted_column_name
The binary values are treated as follows:
&_collect_stats ( \%options )
Generates the confusion matrix hash based on the %options given in the get_confusion_matrix method.
&_calculate_total_entries ( $c_matrix_ref )
Calculates and adds the data for the total_entries key in the confusion matrix hash.
&_calculate_accuracy ( $c_matrix_ref )
Calculates and adds the data for the accuracy key in the confusion matrix hash.
&_calculate_sensitivity ( $c_matrix_ref )
Calculates and adds the data for the sensitivity key in the confusion matrix hash.
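For reference, the figures in the confusion matrix can be worked out as in the sketch below. It uses the usual textbook definitions and assumes that 1 is treated as the positive category. The (actual, predicted) pairs are made up, and the code is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical ( actual, predicted ) pairs, one per row of the filled CSV file.
my @pairs = ( [1,1], [1,0], [0,0], [0,0], [0,1], [1,1], [1,1], [1,0] );

my %c_matrix = map { $_ => 0 }
    qw( true_positive true_negative false_positive false_negative );

for my $pair (@pairs) {
    my ( $actual, $predicted ) = @$pair;
    # "true" when the prediction matches; "positive" when the prediction is 1
    my $key = ( $actual == $predicted ? 'true_' : 'false_' )
            . ( $predicted ? 'positive' : 'negative' );
    $c_matrix{$key}++;
}

$c_matrix{total_entries} = scalar @pairs;

# accuracy: share of all rows that were predicted correctly, as a percentage
$c_matrix{accuracy} = 100
    * ( $c_matrix{true_positive} + $c_matrix{true_negative} )
    / $c_matrix{total_entries};

# sensitivity: share of the actual positives that were caught, as a percentage
my $actual_positives = $c_matrix{true_positive} + $c_matrix{false_negative};
$c_matrix{sensitivity} = $actual_positives
    ? 100 * $c_matrix{true_positive} / $actual_positives
    : 0;

print "$_ => $c_matrix{$_}\n" for sort keys %c_matrix;
```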
display_exam_results ( ... )
The parameters are the same as display_confusion_matrix. See the next method.
display_confusion_matrix ( \%confusion_matrix, \%labels )
Displays the confusion matrix.
%confusion_matrix is the same confusion matrix returned by the get_confusion_matrix method.
For %labels, since 0's and 1's won't make much sense as the output in most cases, the following keys must be specified:
- zero_as
  The label for category 0, e.g. "bad apples"
- one_as
  The label for category 1, e.g. "good apples"
Please take note that non-ASCII characters, i.e. non-English alphabets, might cause the output to go off :)
For the %labels, there is no need to enter "actual X", "predicted X" etc. The labels will be prefixed with A: for actual and P: for predicted values by default.
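The effect of the labels and the A: / P: prefixes can be imitated with a few sprintf calls. The layout below is only a guess at what a labelled 2x2 table could look like and the counts are made up. The real output is built with Text::Matrix and will look different.

```perl
use strict;
use warnings;

# Hypothetical counts and labels.
my %c_matrix = (
    true_positive  => 3,
    true_negative  => 2,
    false_positive => 1,
    false_negative => 2,
);
my %labels = ( zero_as => 'bad apples', one_as => 'good apples' );

# Rows are the actual categories, columns are the predicted ones.
my @lines = (
    sprintf( '%-16s %-16s %-16s', '', "P: $labels{zero_as}", "P: $labels{one_as}" ),
    sprintf( '%-16s %-16d %-16d', "A: $labels{zero_as}",
        $c_matrix{true_negative}, $c_matrix{false_positive} ),
    sprintf( '%-16s %-16d %-16d', "A: $labels{one_as}",
        $c_matrix{false_negative}, $c_matrix{true_positive} ),
);

print "$_\n" for @lines;
```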
&_build_matrix ( $c_matrix, $labels )
Builds the matrix using the Text::Matrix module.
$c_matrix and $labels are the same as the ones passed to display_exam_results and display_confusion_matrix.
Returns a list ( $matrix, $c_matrix ) which can directly be passed to _print_extended_matrix.
&_print_extended_matrix ( $matrix, $c_matrix )
Extends and outputs the matrix on the screen.
$matrix and $c_matrix are the same as returned by &_build_matrix.
NERVE DATA RELATED SUBROUTINES
This part is about saving the data of the nerve. These subroutines can be exported using the :local_data tag.
The subroutines are to be called in the procedural way. No checking is done currently.
None of the subroutines here are exported by default.
See the PERCEPTRON DATA and KNOWN ISSUES sections for more details on the subroutines in this section.
preserve ( ... )
The parameters and usage are the same as save_perceptron. See the next subroutine.
save_perceptron ( $perceptron, $nerve_file )
Saves the AI::Perceptron::Simple object into a Storable file. There shouldn't be a need to call this subroutine manually since it is called automatically after every training process.
revive (...)
The parameters and usage are the same as load_perceptron. See the next subroutine.
load_perceptron ( $nerve_file_to_load )
Loads the data and turns it into an AI::Perceptron::Simple object as the return value.
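What saving and loading boils down to can be sketched with Storable directly. The blessed hash below is only a stand-in for a trained perceptron, My::Nerve is a made-up class name, and using store and retrieve here is an assumption about how the module drives Storable.

```perl
use strict;
use warnings;
use Storable qw( store retrieve );
use File::Temp qw( tempdir );

# Work in a temporary directory that is removed when the script ends.
my $dir        = tempdir( CLEANUP => 1 );
my $nerve_file = "$dir/apples.nerve";

# Hypothetical blessed hash standing in for a trained perceptron object.
my $nerve = bless {
    threshold  => 0.85,
    attributes => { has_pictures => 0.4 },
}, 'My::Nerve';

store( $nerve, $nerve_file );          # write the object to disk
my $loaded = retrieve($nerve_file);    # read it back, class name included

print ref($loaded), " threshold=", $loaded->{threshold}, "\n";
```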
NERVE PORTABILITY RELATED SUBROUTINES
These subroutines can be exported using the :portable_data tag.
The file type currently supported is YAML. Please be careful with the data as you won't want the nerve data accidentally modified.
preserve_as_yaml ( ... )
The parameters and usage are the same as save_perceptron_yaml. See the next subroutine.
save_perceptron_yaml ( $perceptron, $yaml_nerve_file )
Saves the AI::Perceptron::Simple object into a YAML file.
revive_from_yaml (...)
The parameters and usage are the same as load_perceptron_yaml. See the next subroutine.
load_perceptron_yaml ( $yaml_nerve_file )
Loads the YAML data and turns it into an AI::Perceptron::Simple object as the return value.
TO DO
These are the to-do's that MIGHT be done in the future. Don't put too much hope in them please :)
Clean up and refactor source codes
Add more useful data for confusion matrix
Implement shuffling data feature
Implement fast/smart training feature
Write a tutorial or something for this module
and something yet to be known...
KNOWN ISSUES
Portability of Nerve Data
Take note that the Storable nerve data is not compatible across different versions. See Storable's documentation for more information.
If you really need to send the nerve data to different computers with different versions of the Storable module, see the docs of the following subroutines:
- &preserve_as_yaml or &save_perceptron_yaml for storing data.
- &revive_from_yaml or &load_perceptron_yaml for retrieving the data.
AUTHOR
Raphael Jong Jun Jie, <ellednera at cpan.org>
BUGS
Please report any bugs or feature requests to bug-ai-perceptron-simple at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=AI-Perceptron-Simple. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc AI::Perceptron::Simple
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
https://rt.cpan.org/NoAuth/Bugs.html?Dist=AI-Perceptron-Simple
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
Besiyata d'shmaya, Wikipedia
SEE ALSO
AI::Perceptron, Text::Matrix
LICENSE AND COPYRIGHT
This software is Copyright (c) 2021 by Raphael Jong Jun Jie.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)