NAME
AI::Genetic::Pro::Macromolecule - Genetic Algorithms to evolve DNA, RNA and Protein sequences
VERSION
version 0.09280.0_001
SYNOPSIS
use AI::Genetic::Pro::Macromolecule;
my @proteins = ($seq1, $seq2, $seq3, ... );
my $m = AI::Genetic::Pro::Macromolecule->new(
type => 'protein',
fitness => \&hydrophobicity,
initial_population => \@proteins,
);
sub hydrophobicity {
my $seq = shift;
my $score = f($seq);
return $score;
}
$m->evolve(10); # evolve for 10 generations
my $most_hydrophobic = $m->fittest->{seq}; # get the best sequence
my $highest_score = $m->fittest->{score}; # get top score
# Want the score stats throughout generations?
my $history = $m->history;
my $mean_history = $history->{mean}; # [ mean1, mean2, mean3, ... ]
my $min_history = $history->{min}; # [ min1, min2, min3, ... ]
my $max_history = $history->{max}; # [ max1, max2, max3, ... ]
DESCRIPTION
AI::Genetic::Pro::Macromolecule is a wrapper over AI::Genetic::Pro, aimed at easily evolving protein, DNA or RNA sequences using arbitrary fitness functions.
Its purpose is to allow optimization of macromolecule sequences using Genetic Algorithms, with as little setup time and burden as possible.
Standing atop AI::Genetic::Pro, it is reasonably fast and memory efficient. It is also highly customizable, although I've chosen what I think are sensible defaults for every parameter, so that you don't have to worry about them if you don't know what they mean.
ATTRIBUTES
fitness
Accepts a CodeRef that should assign a numeric score to each sequence string passed to it as an argument. Required.
sub fitness {
my $seq = shift;
# Do something with $seq and return a score
my $score = f($seq);
return $score;
}
my $m = AI::Genetic::Pro::Macromolecule->new(
fitness => \&fitness,
...
);
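As a concrete illustration, a fitness function for DNA sequences could score each string by its GC content. This is only a minimal sketch: gc_content is a hypothetical name chosen for this example and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical fitness function: fraction of G and C bases in a DNA string.
# Returns a number between 0 and 1; an empty sequence scores 0.
sub gc_content {
    my $seq = shift;
    return 0 unless length($seq);

    # tr/// with an empty replacement list only counts matches;
    # it leaves $seq unchanged.
    my $gc = ( $seq =~ tr/GCgc// );
    return $gc / length($seq);
}
```

Such a function would be passed as fitness => \&gc_content, together with type => 'dna'.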
terminate
Accepts a CodeRef. It will be applied once at the end of each generation. If it returns true, evolution will stop, regardless of the number of generations passed to the evolve method.
The CodeRef should accept an AI::Genetic::Pro::Macromolecule object as argument, and should return either true or false.
sub reached_max {
my $m = shift; # an AI::G::P::Macromolecule object
my $highest_score = $m->fittest->{score};
if ( $highest_score > 9000 ) {
warn "It's over 9000!";
return 1;
}
return 0; # otherwise keep evolving
}
my $m = AI::Genetic::Pro::Macromolecule->new(
terminate => \&reached_max,
...
);
In the above example, evolution will stop the moment the top score in any generation exceeds the value 9000.
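A terminate callback can also stop evolution when the top score stops improving. The sketch below is built on the documented history method. The names has_plateaued and stalled, and the window of 20 generations, are assumptions made for this example. It also assumes that history already includes the current generation when terminate is called, which this documentation does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the last $window entries of a list of
# per-generation maximum scores never exceed the entry just before them.
sub has_plateaued {
    my ( $max_scores, $window ) = @_;
    return 0 if @$max_scores <= $window;    # not enough generations yet

    my $baseline = $max_scores->[ -$window - 1 ];
    for my $score ( @{$max_scores}[ -$window .. -1 ] ) {
        return 0 if $score > $baseline;
    }
    return 1;
}

# Terminate callback: stop after 20 generations without improvement.
sub stalled {
    my $m = shift;    # an AI::Genetic::Pro::Macromolecule object
    return has_plateaued( $m->history->{max}, 20 );
}
```

It would be passed as terminate => \&stalled.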
variable_length
Decide whether the sequences can have different lengths. Accepts a Bool value. Defaults to 1.
length
Manually set the maximum allowed length of the sequences. Accepts an Int.
This attribute is required unless an initial population is provided. In that case, if length is not explicitly specified, it is set to the length of the longest sequence provided.
type
Macromolecule type: protein, dna, or rna. Required.
initial_population
Sequences to add to the initial pool before evolving. Accepts an ArrayRef[Str].
my $m = AI::Genetic::Pro::Macromolecule->new(
initial_population => ['ACGT', 'CAAC', 'GTTT'],
...
);
cache
Accepts a Bool value. When true, the score computed for each sequence is stored, to avoid costly and unnecessary recomputation. Set to 1 by default.
mutation
Mutation rate, a Num between 0 and 1. Default is 0.05.
crossover
Crossover rate, a Num between 0 and 1. Default is 0.95.
population_size
Number of sequences per generation. Default is 300.
parents
Number of parent sequences used in each recombination. Default is 2.
selection
Defines how sequences are selected for crossover. It expects an ArrayRef:
selection => [ $type, @params ]
See the AI::Genetic::Pro docs for details on the available selection strategies, their parameters, and their meanings. The default is Roulette: the best individuals (chromosomes) are selected first, and parents are then drawn from that collection with probability proportional to their fitness.
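The fitness-proportional part of roulette selection can be sketched as follows. This illustrates the general technique only and is not AI::Genetic::Pro's implementation. The name roulette_pick and the optional $spin argument are assumptions made for this example, and scores are assumed to be non-negative.

```perl
use strict;
use warnings;

# Illustrative roulette-wheel pick: returns an index into @$scores, chosen
# with probability proportional to its (non-negative) score.
# $spin is a number in [0, 1); it defaults to rand() and can be fixed
# to make the pick reproducible.
sub roulette_pick {
    my ( $scores, $spin ) = @_;
    $spin = rand() unless defined $spin;

    my $total = 0;
    $total += $_ for @$scores;

    # If every score is zero, fall back to a uniform pick.
    return int( $spin * scalar(@$scores) ) if $total == 0;

    my $threshold = $spin * $total;
    my $running   = 0;
    for my $i ( 0 .. $#$scores ) {
        $running += $scores->[$i];
        return $i if $threshold < $running;
    }
    return $#$scores;    # guard against floating-point rounding
}
```

With scores [ 1, 3 ], the second sequence occupies three quarters of the wheel, so it is picked for any $spin of 0.25 or more.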
strategy
Defines the strategy of the crossover operation. It expects an ArrayRef:
strategy => [ $strategy, @params ]
See docs in AI::Genetic::Pro for details on available crossover strategies, parameters, and their meanings. Default is [ Points, 2 ], in which parents are crossed at 2 points and the best child is moved to the next generation.
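To make the two-point idea concrete, here is a minimal sketch of crossing two equal-length strings at two positions. It shows the general technique only and is not AI::Genetic::Pro's code. The name two_point_crossover is hypothetical, and choosing which child moves to the next generation is left out.

```perl
use strict;
use warnings;

# Illustrative two-point crossover on equal-length strings: the children
# swap the segment from position $from (inclusive) to $to (exclusive).
sub two_point_crossover {
    my ( $mom, $dad, $from, $to ) = @_;
    my $len = $to - $from;

    my ( $child_a, $child_b ) = ( $mom, $dad );
    substr( $child_a, $from, $len ) = substr( $dad, $from, $len );
    substr( $child_b, $from, $len ) = substr( $mom, $from, $len );
    return ( $child_a, $child_b );
}

my @children = two_point_crossover( 'AAAAAA', 'CCCCCC', 2, 4 );
# @children is ( 'AACCAA', 'CCAACC' )
```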
preserve
Whether to inject the best sequences into the next generation, and if so, how many. Defaults to 5.
METHODS
evolve
$m->evolve($n);
Evolve the sequence population for the specified number of generations. Accepts a single optional Int argument. If $n is 0 or undef, it will evolve indefinitely, or until terminate returns true.
generation
Returns the current generation number.
fittest
Returns an Array[HashRef] with the desired number of top scoring sequences. Each hash reference has two keys: 'seq', which points to the sequence string, and 'score', which points to the sequence's score.
my @top_2 = $m->fittest(2);
# (
# { seq => 'VIKP', score => 10 },
# { seq => 'VLKP', score => 9 },
# )
When called with no arguments, it returns a HashRef with the top scoring sequence.
my $fittest = $m->fittest;
# { seq => 'VIKP', score => 10 }
history
Returns a HashRef with the minimum, maximum and mean score for each generation.
my $history = $m->history;
# {
# min => [ 0, 0, 0, 1, 2, ... ],
# max => [ 1, 2, 2, 3, 4, ... ],
# mean => [ 0.2, 0.3, 0.5, 1.5, 3, ... ],
# }
To access the mean score for the $n-th generation, for instance:
$m->history->{mean}->[$n - 1];
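Because the history is a plain HashRef of arrays, it is easy to post-process. The following sketch finds the generation at which the maximum score first reached its overall best value. The helper name generation_of_best is hypothetical, and the sample data is made up for illustration rather than taken from a real run.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based generation at which the maximum score first
# reached its overall best value, given a hashref shaped like $m->history.
sub generation_of_best {
    my $history = shift;
    my $max     = $history->{max};
    return unless @$max;    # no generations recorded yet

    my $best = 0;           # index of the best score seen so far
    for my $i ( 1 .. $#$max ) {
        $best = $i if $max->[$i] > $max->[$best];
    }
    return $best + 1;       # generation $n lives at index $n - 1
}

my $sample = { max => [ 1, 2, 2, 3, 4 ] };    # made-up data
my $gen    = generation_of_best($sample);     # 5
```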
current_stats
Returns a HashRef with the minimum, maximum and mean score for the current generation.
$m->current_stats;
# { min => 2, max => 10, mean => 3.5 }
current_population
Returns an Array[HashRef] with all the sequences of the current generation and their scores, in no particular order.
my @seqs = $m->current_population;
# (
# { seq => 'VIKP', score => 10 },
# { seq => 'VLKP', score => 9 },
# ...
# )
AUTHOR
Bruno Vecchi <vecchi.b gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Bruno Vecchi.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.