NAME
AI::Perceptron::Simple
A Newbie Friendly Module to Create, Train, Validate and Test Perceptrons / Neurons
VERSION
Version 1.00
SYNOPSIS
#!/usr/bin/perl

use AI::Perceptron::Simple;

# create a new nerve / neuron / perceptron
$perceptron = AI::Perceptron::Simple->new( {
    initial_value => $any_value_that_makes_sense, # size of each dendrite :)
    learning_rate => 0.3, # optional
    threshold => 0.85, # optional
    attribs => \@attributes, # dendrites: array ref of header names in csv file to train
} );

# training
$perceptron->train( $training_data_csv, $expected_column_name, $save_nerve_to );
# or
$perceptron->train(
    $training_data_csv, $expected_column_name, $save_nerve_to,
    $show_progress, $identifier ); # these two parameters must go together

# validating
# fill results to original file
$perceptron->validate( {
    stimuli_validate => $validation_data_csv,
    predicted_column_index => 4,
} );
# or
# fill results to a new file
$perceptron->validate( {
    stimuli_validate => $validation_data_csv,
    predicted_column_index => 4,
    results_write_to => $new_csv
} );

# testing, parameters same as validate
$perceptron->test( {
    stimuli_validate => $testing_data_csv,
    predicted_column_index => 4,
} );
# or
# fill results to a new file
$perceptron->test( {
    stimuli_validate => $testing_data_csv,
    predicted_column_index => 4,
    results_write_to => $new_csv
} );

# confusion matrix
my %c_matrix = $perceptron->get_confusion_matrix( {
    full_data_file => $file_csv,
    actual_output_header => $header_name,
    predicted_output_header => $predicted_header_name
} );

# accessing the confusion matrix
my @keys = qw( true_positive true_negative false_positive false_negative
               total_entries accuracy sensitivity );
for ( @keys ) {
    print $_, " => ", $c_matrix{ $_ }, "\n";
}

# output to console
$perceptron->display_confusion_matrix( \%c_matrix, {
    zero_as => "bad apples", # cat milk green etc.
    one_as => "good apples", # dog honey pink etc.
} );

# save data of the trained perceptron
my $nerve_file = "apples.nerve";
AI::Perceptron::Simple::save_perceptron( $perceptron, $nerve_file );

# load data of perceptron for use in actual program
my $apple_nerve = AI::Perceptron::Simple::load_perceptron( $nerve_file ); # :)
DESCRIPTION
This module provides methods to build, train, validate and test a perceptron. It can also save the data of the perceptron for future use in any actual AI program.
This module also aims to help newbies grasp the concepts of perceptron, training, validation and testing as much as possible. Hence, all the methods and subroutines in this module are decoupled as much as possible so that the actual scripts can be written as simple complete programs.
The implementation here is super basic as it only takes in the input of the dendrites and calculates the output. If the output is higher than the threshold, the final result (category) will be 1 aka the perceptron is activated. If not, then the result will be 0 (not activated).
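The activation rule described above can be sketched in a few lines of plain Perl. This is only an illustration, not the module's own code: the helper name sketch_output, the attribute names and the sample row are all made up, and the weights are stored in an ordinary hash.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: adds up the weights of the
# dendrites whose input is 1 and compares the sum with the threshold.
sub sketch_output {
    my ( $weights, $row, $threshold ) = @_;
    my $sum = 0;
    for my $attrib ( keys %$weights ) {
        $sum += $weights->{$attrib} if $row->{$attrib};
    }
    my $category = $sum > $threshold ? 1 : 0;    # 1 means "activated"
    return ( $category, $sum );
}

# Made-up dendrites, all starting with the same weight.
my %weights = ( has_trees => 0.5, has_water => 0.5, is_noisy => 0.5 );

# One row of data; the "name" column is not a dendrite and is ignored.
my %row = ( has_trees => 1, has_water => 1, is_noisy => 0, name => "park" );

my ( $category, $sum ) = sketch_output( \%weights, \%row, 0.85 );
print "sum = $sum, category = $category\n";
```

Here two dendrites receive an input of 1, so the sum is 0.5 + 0.5 = 1, which is higher than the threshold of 0.85 and the row is categorized as 1.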
Depending on how you view or categorize the final result, the perceptron will fine tune itself (aka train) based on the learning rate until the desired result is met. Everything from here on is all mathematics and numbers which only make sense to the computer and not humans anymore.
Whenever the perceptron fine tunes itself, it will increase/decrease all the dendrites that are significant (attributes labelled 1) for each input. This means that even when the perceptron successfully fine tunes itself to suit all the data in your file for the first round, the perceptron might still get some of the things wrong for the next round of training. Therefore, the perceptron should be trained for as many rounds as possible. The more "confusion" the perceptron is able to correctly handle, the more "mature" the perceptron is. No one defines how "mature" it is except the programmer himself/herself :)
EXPORT
None.
Almost everything is OO with some exceptions of course :)
CONVENTIONS USED
Please take note that not all subroutines/methods need to be used to make things work. All the subroutines and methods are listed out for the sake of writing the documentation.
Private methods/subroutines are prefixed with _ or &_ and they aren't meant to be called directly. You can if you want to.
"Synonyms" are placed before the actual subroutines/methods with the actual/technical terminologies. You will see ... as the parameters if they are synonyms. So move to the next subroutine/method until you find something like \%options as the parameter or anything that isn't ...
DATASETS STRUCTURE
Any field (ie. column) that will be used for processing must be binary ie. either 0 or 1 only. Your dataset can contain other columns with non-binary data as long as they are not used for the calculation.
Since there isn't any tutorial written for this module yet, you might need to go and find the data (CSV) files in the t directory. The original dataset can also be found in docs/book_list.csv.
This module can only process CSV files.
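Before training, it may help to check that the columns used for processing really contain only 0 and 1. The sketch below is not part of the module: non_binary_cells is a hypothetical helper, the sample data is made up, and it assumes a very simple CSV without quoted fields or embedded commas so that a plain split is enough.

```perl
use strict;
use warnings;

# Hypothetical checker, not part of the module. Returns a description of
# every cell in the given columns that is not exactly 0 or 1.
sub non_binary_cells {
    my ( $csv_text, @columns ) = @_;
    my ( $header_line, @lines ) = grep { length } split /\r?\n/, $csv_text;
    my @headers = split /,/, $header_line;
    my @problems;
    for my $line_no ( 0 .. $#lines ) {
        my %row;
        @row{@headers} = split /,/, $lines[$line_no];
        for my $column (@columns) {
            my $value = $row{$column};
            push @problems, "row " . ( $line_no + 1 ) . ", column $column"
                unless defined $value && $value =~ /\A[01]\z/;
        }
    }
    return @problems;
}

# Made-up data: "title" is non-binary but is not used for processing.
my $csv_text = <<'CSV';
title,has_pictures,is_thick,liked
Book A,1,0,1
Book B,0,2,0
CSV

print "$_\n" for non_binary_cells( $csv_text, qw( has_pictures is_thick liked ) );
```

Only the second data row is reported, because its is_thick value is 2. For real files with quoted fields, a proper CSV parser would be the safer choice.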
PERCEPTRON DATA
The perceptron/neuron data is stored using the Storable module. More file types might be supported in the future.
CREATION RELATED SUBROUTINES/METHODS
new ( \%options )
Creates a brand new perceptron and initializes the value of each ATTRIBUTE or "thickness" of the dendrites :)
For %options, the following are needed unless mentioned otherwise:
- initial_value => $decimal
  The value or thickness :) of ALL the dendrites when a new perceptron is created.
  Generally speaking, this value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
- attribs => $array_ref
  An array reference containing all the attributes the perceptron should have.
- learning_rate => $decimal
  Optional. The default is 0.05.
  The learning rate or the "rest duration" of the perceptron for the fine-tuning process (between 0 and 1).
  Generally speaking, the smaller the value the better. This value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
- threshold => $decimal
  Optional. The default is 0.5.
  This is the passing rate to determine the neuron output (0 or 1).
  Generally speaking, this value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
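How the options above fit together can be sketched as follows. This is a stand-in written for illustration, not the module's constructor: sketch_new and the layout of the returned hash are assumptions, while the defaults (0.05 and 0.5) are the ones documented above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the option handling of new().
sub sketch_new {
    my ($options) = @_;
    die "initial_value is required\n" unless defined $options->{initial_value};
    die "attribs must be an array reference\n"
        unless ref( $options->{attribs} ) eq 'ARRAY';

    my %nerve = (
        learning_rate => $options->{learning_rate} // 0.05,    # documented default
        threshold     => $options->{threshold}     // 0.5,     # documented default
        attributes    => {},
    );

    # Every dendrite starts with the same initial value.
    $nerve{attributes}{$_} = $options->{initial_value} for @{ $options->{attribs} };

    return \%nerve;
}

my $nerve = sketch_new( {
    initial_value => 0.5,
    attribs       => [qw( has_trees has_water )],
    threshold     => 0.85,
} );

print "learning_rate = $nerve->{learning_rate}, threshold = $nerve->{threshold}\n";
print "$_ = $nerve->{attributes}{$_}\n" for sort keys %{ $nerve->{attributes} };
```

Since learning_rate is left out in this call, the sketch falls back to 0.05, while the threshold given by the caller replaces the default of 0.5.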
get_attributes
Obtains a hash of all the attributes of the perceptron
learning_rate ( $value )
learning_rate
If $value is given, sets the learning rate to $value. If not, then it returns the learning rate.
The $value should be between 0 and 1. Default is 0.05.
threshold ( $value )
threshold
If $value is given, sets the threshold / passing rate to $value. If not, then it returns the passing rate.
The $value should be between 0 and 1. Default is 0.5.
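The combined getter/setter behaviour of learning_rate and threshold can be sketched with a tiny class. Sketch::Nerve is a hypothetical package used only for illustration, and returning the current value after setting it is an assumption; the documentation above only states what happens when no value is given.

```perl
use strict;
use warnings;

{
    package Sketch::Nerve;    # hypothetical class, not the module itself

    sub new {
        my ($class) = @_;
        my %self = ( learning_rate => 0.05, threshold => 0.5 );
        return bless \%self, $class;
    }

    # With an argument: store it. Without one: leave the value alone.
    # Returning the current value in both cases is an assumption.
    sub learning_rate {
        my ( $self, $value ) = @_;
        $self->{learning_rate} = $value if defined $value;
        return $self->{learning_rate};
    }

    sub threshold {
        my ( $self, $value ) = @_;
        $self->{threshold} = $value if defined $value;
        return $self->{threshold};
    }
}

my $nerve = Sketch::Nerve->new;
print "defaults: ", $nerve->learning_rate, " ", $nerve->threshold, "\n";

$nerve->learning_rate(0.3);
$nerve->threshold(0.85);
print "updated: ", $nerve->learning_rate, " ", $nerve->threshold, "\n";
```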
TRAINING RELATED SUBROUTINES/METHODS
All the training methods here have the same parameters as the two actual train methods and they all do the same stuff. They are also used in the same way.
tame ( ... )
exercise ( ... )
train ( $stimuli_train_csv, $expected_output_header, $save_nerve_to_file )
train ( $stimuli_train_csv, $expected_output_header, $save_nerve_to_file, $display_stats, $identifier )
Trains the perceptron.
$stimuli_train_csv is the set of data / input (in CSV format) to train the perceptron while $save_nerve_to_file is the filename that will be generated each time the perceptron finishes the training process. This data file is the data of the AI::Perceptron::Simple object and it is used in the validate method.
$expected_output_header is the header name of the column in the CSV file with the actual category or the expected values. This is used to determine whether to tune the nerve up or down. This value should only be 0 or 1 for the sake of simplicity.
$display_stats is optional and the default is 0. It will display more output about the tuning process. It will show the following:
- tuning status
  Indicates whether the nerve was tuned up or down
- old sum
  The original sum of all weightage * input
- threshold
  The threshold of the nerve
- new sum
  The new sum of all weightage * input after fine-tuning the nerve
If $display_stats is set to 1, then you must specify the $identifier. This is the column / header name that is used to identify a specific row of data in $stimuli_train_csv.
&_calculate_output( $self, \%stimuli_hash )
Calculates and returns the sum(weightage*input) for each individual row of data. For the coding part, it just adds up all the existing weights since input is always 1 for now :)
%stimuli_hash is the actual data to be used for training. It might contain useless columns.
This will get all the available dendrites through the get_attributes method and then use all the keys ie. headers to access the corresponding values.
This subroutine should be called in the procedural way for now.
&_tune( $self, \%stimuli_hash, $tune_up_or_down )
Fine tunes the nerve. This will directly alter the attribute values in $self according to the attributes / dendrites specified in new.
The %stimuli_hash here is the same as the one in the _calculate_output method.
%stimuli_hash will be used to determine which dendrite in $self needs to be fine-tuned. As long as the value of any key in %stimuli_hash returns true (1) then that dendrite in $self will be tuned.
Tuning up or down depends on $tune_up_or_down specified by the &_calculate_output subroutine. The following constants can be used for $tune_up_or_down:
- TUNE_UP
  Value is 1
- TUNE_DOWN
  Value is 0
This subroutine should be called in the procedural way for now.
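A rough sketch of the tuning step is given below. It is not the module's code: sketch_tune and the layout of the nerve hash are made up, and changing each weight by exactly the learning rate is an assumption, since the text above only says that the significant dendrites are increased or decreased. The constant values match the ones listed above.

```perl
use strict;
use warnings;
use constant { TUNE_UP => 1, TUNE_DOWN => 0 };

# Hypothetical helper: adjusts, in place, every dendrite whose input is 1.
# Assumption: the step size is exactly the learning rate.
sub sketch_tune {
    my ( $nerve, $row, $direction ) = @_;
    for my $attrib ( keys %{ $nerve->{attributes} } ) {
        next unless $row->{$attrib};    # inputs of 0 leave the dendrite alone
        if ( $direction == TUNE_UP ) {
            $nerve->{attributes}{$attrib} += $nerve->{learning_rate};
        }
        else {
            $nerve->{attributes}{$attrib} -= $nerve->{learning_rate};
        }
    }
    return;
}

my %nerve = (
    learning_rate => 0.25,
    attributes    => { has_trees => 0.5, has_water => 0.5 },
);
my %row = ( has_trees => 1, has_water => 0, name => "park" );

sketch_tune( \%nerve, \%row, TUNE_UP );
print "$_ = $nerve{attributes}{$_}\n" for sort keys %{ $nerve{attributes} };
```

Only has_trees changes here because it is the only dendrite with an input of 1; has_water keeps its weight, and the name column is ignored since it is not a dendrite.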
VALIDATION RELATED METHODS
All the validation methods here have the same parameters as the actual validate method and they all do the same stuff. They are also used in the same way.
take_mock_exam (...)
take_lab_test (...)
validate ( \%options )
This method validates the perceptron against another set of data after it has undergone the training process.
This method calculates the output of each row of data and writes the result into the predicted column. The data being written into the new file or the original file will maintain its sequence.
Please take note that this method will load all the data of the validation stimuli, so please split your stimuli into multiple files if possible and call this method a few more times.
For %options, the following are needed unless mentioned otherwise:
- stimuli_validate => $csv_file
  This is the CSV file containing the validation data. Make sure that it contains a column for the predicted values as it is needed in the next key mentioned: predicted_column_index
- predicted_column_index => $column_number
  This is the index of the column that contains the predicted output values. $index starts from 0.
  This part will be filled and saved to results_write_to.
- results_write_to => $new_csv_file
  Optional.
  The default behaviour will write the predicted output into stimuli_validate ie. the original data while maintaining the original sequence.
*This method will call &_real_validate_or_test to do the actual work.
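The idea of filling the predicted column by index while keeping the rows in their original order can be sketched on an in-memory array of arrays. Everything here is hypothetical: sketch_fill_predicted, the column names, the data and the fixed weights of the stand-in nerve are made up, and no CSV file is read or written.

```perl
use strict;
use warnings;

# Hypothetical helper: writes a prediction into one column of every row,
# in place, so the rows keep their original sequence.
sub sketch_fill_predicted {
    my ( $rows, $predicted_index, $predict ) = @_;
    $_->[$predicted_index] = $predict->($_) for @$rows;
    return $rows;
}

my @headers = qw( has_trees has_water actual predicted );
my @rows    = ( [ 1, 1, 1, '' ], [ 0, 1, 0, '' ], [ 1, 0, 0, '' ] );

# Stand-in for a trained nerve: both weights are 0.5, threshold is 0.85.
my $predict = sub {
    my ($row) = @_;
    my $sum = 0.5 * $row->[0] + 0.5 * $row->[1];
    return $sum > 0.85 ? 1 : 0;
};

# The index starts from 0, so 3 is the fourth column ("predicted").
sketch_fill_predicted( \@rows, 3, $predict );
print join( ",", @$_ ), "\n" for ( \@headers, @rows );
```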
TESTING RELATED SUBROUTINES/METHODS
All the testing methods here have the same parameters as the actual test method and they all do the same stuff. They are also used in the same way.
take_real_exam (...)
work_in_real_world (...)
test ( \%options )
This method is used to put the trained nerve to the test. You can think of it as deploying the nerve for the actual work.
This method works and behaves the same way as the validate method. See validate for the details.
*This method will call &_real_validate_or_test to do the actual work.
_real_validate_or_test ( $data_hash_ref )
This is where the actual validation or testing takes place.
$data_hash_ref is the list of parameters passed into the validate or test methods.
This is a method, so use the OO way. This is one of the exceptions to the rules where private subroutines are treated as methods :)
&_fill_predicted_values ( $self, $stimuli_validate, $predicted_index, $aoa )
This is where the filling in of the predicted values takes place. Take note that the parameter names are the same as the ones used in the validate and test methods.
This subroutine should be called in the procedural way.
RESULTS RELATED SUBROUTINES/METHODS
This part is related to generating the confusion matrix.
get_exam_results ( ... )
The parameters and usage are the same as get_confusion_matrix. See the next method.
get_confusion_matrix ( \%options )
Returns the confusion matrix in the form of a hash. The hash will contain these keys: true_positive, true_negative, false_positive, false_negative, total_entries, accuracy, sensitivity.
Take note that the accuracy and sensitivity are percentages (%) and may contain decimals.
For %options, the following are needed unless mentioned otherwise:
- full_data_file => $filled_test_file
  This is the CSV file filled with the predicted values.
  Make sure that you don't do anything to the actual and predicted output in this file after testing the nerve. These two columns must contain binary values only!
- actual_output_header => $actual_column_name
- predicted_output_header => $predicted_column_name
The binary values are treated as follows:
&_collect_stats ( \%options )
Generates a hash of confusion matrix based on %options given in the get_confusion_matrix method.
&_calculate_total_entries ( $c_matrix_ref )
Calculates and adds the data for the total_entries key in the confusion matrix hash.
&_calculate_accuracy ( $c_matrix_ref )
Calculates and adds the data for the accuracy key in the confusion matrix hash.
&_calculate_sensitivity ( $c_matrix_ref )
Calculates and adds the data for the sensitivity key in the confusion matrix hash.
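The statistics above can be sketched with the standard confusion matrix definitions. This is an illustration rather than the module's implementation: sketch_confusion_matrix and the sample data are made up, and it assumes that 1 is treated as the positive value and 0 as the negative one. Accuracy is taken as (TP + TN) / total and sensitivity as TP / (TP + FN), both expressed as percentages.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is an [ actual, predicted ] pair.
# Assumption: 1 is the positive value, 0 is the negative value.
sub sketch_confusion_matrix {
    my (@pairs) = @_;
    my %m = map { ( $_ => 0 ) }
        qw( true_positive true_negative false_positive false_negative );

    for my $pair (@pairs) {
        my ( $actual, $predicted ) = @$pair;
        if    ( $actual == 1 && $predicted == 1 ) { $m{true_positive}++ }
        elsif ( $actual == 1 && $predicted == 0 ) { $m{false_negative}++ }
        elsif ( $actual == 0 && $predicted == 1 ) { $m{false_positive}++ }
        else                                      { $m{true_negative}++ }
    }

    $m{total_entries} = $m{true_positive} + $m{true_negative}
                      + $m{false_positive} + $m{false_negative};

    # Percentages, guarded against division by zero.
    my $actual_positives = $m{true_positive} + $m{false_negative};
    $m{accuracy} = $m{total_entries}
        ? 100 * ( $m{true_positive} + $m{true_negative} ) / $m{total_entries}
        : 0;
    $m{sensitivity} = $actual_positives
        ? 100 * $m{true_positive} / $actual_positives
        : 0;

    return %m;
}

my %c_matrix = sketch_confusion_matrix(
    [ 1, 1 ], [ 1, 0 ], [ 0, 0 ], [ 0, 1 ],
    [ 1, 1 ], [ 0, 0 ], [ 0, 0 ], [ 1, 1 ],
);
print "$_ => $c_matrix{$_}\n" for sort keys %c_matrix;
```

With these eight made-up rows there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so both accuracy (6 out of 8) and sensitivity (3 out of 4) come out as 75.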
display_exam_results ( ... )
The parameters are the same as display_confusion_matrix. See the next method.
display_confusion_matrix ( \%confusion_matrix, \%labels )
Display the confusion matrix.
%confusion_matrix is the same confusion matrix returned by the get_confusion_matrix method.
Surely 0's and 1's don't make much sense for the output. Therefore, for %labels, the following keys must be specified: zero_as (the label for 0) and one_as (the label for 1), as used in the SYNOPSIS.
Please take note that non-ascii characters ie. non-English alphabets will cause the output to go off :)
For the %labels, there is no need to enter "actual X", "predicted X" etc. It will be indicated with A: for actual and P: for the predicted values.
NERVE DATA RELATED SUBROUTINES
This part is about saving the state of the nerve.
The subroutines are to be called in the procedural way. No checking is done currently.
The subroutines here are not exported in any way whatsoever.
preserve ( ... )
The parameters and usage are the same as save_perceptron. See the next subroutine.
save_perceptron ( $nerve_file )
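Saves the AI::Perceptron::Simple object into a Storable file. As shown in the SYNOPSIS, the perceptron object is passed as the first argument, followed by the nerve file name. There shouldn't be a need to call this subroutine manually since it is called automatically after every training process.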
revive (...)
The parameters and usage are the same as load_perceptron. See the next subroutine.
load_perceptron ( $nerve_file_to_load )
Loads the data and turns it into an AI::Perceptron::Simple object as the return value.
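Since the nerve data is stored with Storable, the save and load round trip can be sketched with Storable's own store and retrieve functions. This sketch does not call the module: it stores a plain hash with a made-up layout instead of a real AI::Perceptron::Simple object, and it writes to a temporary directory so nothing is left behind.

```perl
use strict;
use warnings;
use Storable qw( store retrieve );
use File::Temp qw( tempdir );
use File::Spec;

# Temporary directory that is removed when the program exits.
my $dir        = tempdir( CLEANUP => 1 );
my $nerve_file = File::Spec->catfile( $dir, "apples.nerve" );

# Made-up stand-in for the state of a trained nerve.
my %nerve = (
    learning_rate => 0.3,
    threshold     => 0.85,
    attributes    => { is_red => 0.75, is_shiny => 0.5 },
);

store( \%nerve, $nerve_file );            # save the state to disk
my $revived = retrieve($nerve_file);      # load it back as a new hash ref

print "threshold = $revived->{threshold}\n";
print "$_ = $revived->{attributes}{$_}\n" for sort keys %{ $revived->{attributes} };
```

The revived structure is a separate copy, so changing it does not affect the original hash until it is stored again.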
TO DO
These are the to-do's that MIGHT be done in the future. Don't put too much hope in them please :)
- Clean up source codes
- Implement shuffling data feature
- Implement fast/smart training feature
- Write a tutorial or something for this module
- and something yet to be known...
AUTHOR
Raphael Jong Jun Jie, <ellednera at cpan.org>
BUGS
Please report any bugs or feature requests to bug-ai-perceptron-simple at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=AI-Perceptron-Simple. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc AI::Perceptron::Simple
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
https://rt.cpan.org/NoAuth/Bugs.html?Dist=AI-Perceptron-Simple
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
Besiyata d'shmaya, Wikipedia
SEE ALSO
AI::Perceptron, Text::Matrix
LICENSE AND COPYRIGHT
This software is Copyright (c) 2021 by Raphael Jong Jun Jie.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)