NAME
AI::XGBoost - Perl wrapper for XGBoost library https://github.com/dmlc/xgboost
VERSION
version 0.1
SYNOPSIS
use 5.010;
use aliased 'AI::XGBoost::DMatrix';
use AI::XGBoost qw(train);
# We are going to solve a binary classification problem:
# is a mushroom poisonous or not?
my $train_data = DMatrix->From(file => 'agaricus.txt.train');
my $test_data = DMatrix->From(file => 'agaricus.txt.test');
# With XGBoost we can solve this problem using the 'gbtree' booster
# (Gradient Boosting Regression Tree) with the logistic regression
# loss function 'binary:logistic'.
# The XGBoost tree booster has a lot of parameters that we can tune
# (https://github.com/dmlc/xgboost/blob/master/doc/parameter.md)
my $booster = train(data => $train_data, number_of_rounds => 10, params => {
        objective => 'binary:logistic',
        eta => 1.0,
        max_depth => 2,
        silent => 1
    });
# For binary classification, predictions are probabilities in [0, 1]
# that the label is positive (1 in the first column of agaricus.txt.test)
my $predictions = $booster->predict(data => $test_data);
say join "\n", @$predictions[0 .. 10];
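The probabilities printed above can be turned into 0/1 labels and compared with the true labels. The following is a minimal sketch in plain Perl: error_rate is a hypothetical helper, not part of AI::XGBoost, the 0.5 threshold is an assumption, and the scores and labels are made up for illustration. How the true labels are obtained is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::XGBoost: fraction of samples whose
# thresholded prediction differs from the true 0/1 label.
sub error_rate {
    my ($predictions, $labels, $threshold) = @_;
    $threshold //= 0.5;    # assumed default cut-off
    my $errors = 0;
    for my $i (0 .. $#$predictions) {
        my $predicted = $predictions->[$i] > $threshold ? 1 : 0;
        $errors++ if $predicted != $labels->[$i];
    }
    # Avoid dividing by zero on an empty prediction list
    return @$predictions ? $errors / @$predictions : 0;
}

# Made-up scores and labels, for illustration only
my @scores = (0.9, 0.2, 0.6, 0.1);
my @labels = (1, 0, 0, 0);
printf "error rate: %.2f\n", error_rate(\@scores, \@labels);
```

With real data, the first argument would be the array reference returned by predict.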
use aliased 'AI::XGBoost::DMatrix';
use AI::XGBoost qw(train);
use Data::Dataset::Classic::Iris;
# We are going to solve a multiclass classification problem:
# determining plant species from a set of flower measurements
# XGBoost uses numbers for classes, so we encode the species names
my %class = (
    setosa => 0,
    versicolor => 1,
    virginica => 2
);
my $iris = Data::Dataset::Classic::Iris::get();
# Separate features from labels. Note that this example reuses the
# whole dataset for both training and testing
my $train_dataset = [map {$iris->{$_}} grep {$_ ne 'species'} keys %$iris];
my $test_dataset = [map {$iris->{$_}} grep {$_ ne 'species'} keys %$iris];
sub transpose {
    # Transposing without using PDL, Data::Table, Data::Frame or other modules
    # to keep minimal dependencies
    my $array = shift;
    my @aux = ();
    for my $row (@$array) {
        for my $column (0 .. scalar @$row - 1) {
            push @{$aux[$column]}, $row->[$column];
        }
    }
    return \@aux;
}
$train_dataset = transpose($train_dataset);
$test_dataset = transpose($test_dataset);
my $train_label = [map {$class{$_}} @{$iris->{'species'}}];
my $test_label = [map {$class{$_}} @{$iris->{'species'}}];
my $train_data = DMatrix->From(matrix => $train_dataset, label => $train_label);
my $test_data = DMatrix->From(matrix => $test_dataset, label => $test_label);
# Multiclass problems need a different objective function and the number
# of classes; in this case we are using 'multi:softprob' and
# num_class => 3
my $booster = train(data => $train_data, number_of_rounds => 20, params => {
        max_depth => 3,
        eta => 0.3,
        silent => 1,
        objective => 'multi:softprob',
        num_class => 3
    });
my $predictions = $booster->predict(data => $test_data);
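With 'multi:softprob', XGBoost produces one probability per class for every sample, so the predicted class is the index of the largest probability. The sketch below is plain Perl and makes an assumption about the layout of the result: it accepts either one array reference per sample or a flat list with num_class values per sample, since the exact shape returned by predict is not stated here. predicted_classes is a hypothetical helper, not part of AI::XGBoost, and the sample probabilities are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::XGBoost.
# Accepts either one array ref of class probabilities per sample, or a
# flat list holding $num_class probabilities per sample, sample by sample.
sub predicted_classes {
    my ($predictions, $num_class) = @_;
    my @rows = @$predictions;
    if (@rows && !ref $rows[0]) {
        # Flat layout: cut the list into chunks of $num_class values
        my @flat = @rows;
        @rows = ();
        push @rows, [splice @flat, 0, $num_class] while @flat;
    }
    my @classes;
    for my $row (@rows) {
        # Index of the largest probability (first one wins on ties)
        my $best = 0;
        for my $i (1 .. $#$row) {
            $best = $i if $row->[$i] > $row->[$best];
        }
        push @classes, $best;
    }
    return \@classes;
}

# Made-up probabilities for three samples, for illustration only
my $sample = [[0.9, 0.05, 0.05], [0.1, 0.7, 0.2], [0.2, 0.3, 0.5]];
print join(',', @{predicted_classes($sample, 3)}), "\n";
```

The resulting indices correspond to the numeric codes in %class, so they can be compared directly with the encoded labels.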
DESCRIPTION
Perl wrapper for the XGBoost library.
The easiest way to use the wrapper is the train function, but first the data must be loaded into a DMatrix object.
This is a work in progress: feedback, comments, issues, suggestions and pull requests are welcome!
Currently this module needs the xgboost binary to be available on your system. I'm going to make an Alien module for xgboost, but meanwhile you need to compile xgboost yourself: https://github.com/dmlc/xgboost
FUNCTIONS
train
Performs gradient boosting using the data and parameters passed.
Returns a trained AI::XGBoost::Booster.
Parameters
- params
  Parameters for the booster object.
  Full list available: https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
- data
  AI::XGBoost::DMatrix object used for training
- number_of_rounds
  Number of boosting iterations
ROADMAP
The goal is to make a full wrapper for XGBoost.
VERSIONS
- 0.2
  Full C API "easy" to use, with PDL support as AI::XGBoost::CAPI
  Easy means clients don't have to use FFI::Platypus or modules dealing with C structures
- 0.25
  Alien package for libxgboost.so/xgboost.dll
- 0.3
  Object oriented API Moose based with DMatrix and Booster classes
- 0.4
  Complete object oriented API
- 0.5
  Use perl signatures (https://metacpan.org/pod/distribution/perl/pod/perlexperiment.pod#Subroutine-signatures)
SEE ALSO
AUTHOR
Pablo Rodríguez González <pablo.rodriguez.gonzalez@gmail.com>
COPYRIGHT AND LICENSE
Copyright (c) 2017 by Pablo Rodríguez González.
CONTRIBUTOR
Ruben <me@ruben.tech>