NAME

Bio::ProteinFeatures - Deriving features of amino acid sequences

SYNOPSIS

use Bio::ProteinFeatures;

$pf = Bio::ProteinFeatures->new;

$pf->sequence($sequence_string);


# you may use Data::Dumper to see the result.
use Data::Dumper;
print Dumper $pf->features();

DESCRIPTION

This module applies several statistical methods to amino acid sequences to derive features that help identify a sequence. The features may also be used to measure similarity between sequences, or to do coarse matching before running BLAST.

METHODS

new

You can set the sequence when invoking the constructor.

$pf = Bio::ProteinFeatures->new(sequence => $sequence_string);

Or set it afterwards using the sequence method.

sequence

Sets or gets the sequence string.

# set the sequence
$pf->sequence($sequence);

# return the sequence
$pf->sequence();

features

The features this module deals with are listed below.

composition

Amino acids are grouped into three categories: polar, neutral, and hydrophobic. The method calculates the composition of each of the three groups.
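
A minimal sketch of how a group composition could be computed. The group assignment in %GROUP_OF and the helper name group_composition are assumptions made for illustration; the grouping and internals the module actually uses may differ.

```perl
use strict;
use warnings;

# ASSUMPTION: a commonly used hydrophobicity grouping; the module's own
# grouping is not documented here and may differ.
my %GROUP_OF;
$GROUP_OF{$_} = 'polar'       for split //, 'RKEDQN';
$GROUP_OF{$_} = 'neutral'     for split //, 'GASTPHY';
$GROUP_OF{$_} = 'hydrophobic' for split //, 'CLVIMFW';

# Hypothetical helper: fraction of residues falling in each group.
sub group_composition {
    my ($seq) = @_;
    my %count = (polar => 0, neutral => 0, hydrophobic => 0);
    my $total = 0;
    for my $aa (split //, uc $seq) {
        my $group = $GROUP_OF{$aa} or next;   # skip unknown symbols
        $count{$group}++;
        $total++;
    }
    return { map { $_ => $total ? $count{$_} / $total : 0 } keys %count };
}

my $comp = group_composition('RKGALV');
printf "%-12s %.3f\n", $_, $comp->{$_} for sort keys %$comp;
```

Symbols outside the twenty standard residues are skipped here, so the three fractions sum to 1 whenever at least one residue is recognized.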

transition probability

Characterizes the percent frequency with which a residue of group A is followed by a residue of group B, or a residue of group B is followed by one of group A.
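
The following sketch illustrates one way to count such transitions. The helpers group_string and transition_percent, the one-letter group codes P, N, and H, and the grouping itself are all assumptions for illustration, not the module's documented interface.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a sequence as a string of group letters,
# P (polar), N (neutral), H (hydrophobic).
# ASSUMPTION: a common hydrophobicity grouping, not necessarily the one
# the module uses.
sub group_string {
    my ($seq) = @_;
    my $g = uc $seq;
    $g =~ tr/RKEDQNGASTPHYCLVIMFW//cd;   # drop unknown symbols
    $g =~ tr/RKEDQNGASTPHYCLVIMFW/PPPPPPNNNNNNNHHHHHHH/;
    return $g;
}

# Hypothetical helper: percent of adjacent positions where group $x is
# followed by group $y, or $y by $x.
sub transition_percent {
    my ($seq, $x, $y) = @_;
    my @g = split //, group_string($seq);
    return 0 if @g < 2;
    my $hits = 0;
    for my $i (0 .. $#g - 1) {
        my $pair = $g[$i] . $g[$i + 1];
        $hits++ if $pair eq "$x$y" or $pair eq "$y$x";
    }
    return 100 * $hits / (@g - 1);
}

printf "P<->H: %.1f%%\n", transition_percent('RLKVGA', 'P', 'H');
```

For 'RLKVGA' the group string is PHPHNN, which has five adjacent pairs, three of them between P and H.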

accumulative distribution

The sequence is cut into five sections, and the accumulative probability of a given group is calculated for each section.
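
The description leaves room for interpretation. The sketch below shows one possible reading, in which each section reports the fraction of all members of a group seen up to the end of that section. The helper name cumulative_distribution, its parameters, and this reading itself are assumptions, not the module's confirmed behavior.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: for each of $n roughly equal sections, the fraction
# of all group members seen up to the end of that section.
# $members is a string listing the residues of the group (ASSUMPTION).
sub cumulative_distribution {
    my ($seq, $members, $n) = @_;
    $n ||= 5;
    my @aa    = split //, uc $seq;
    my $total = grep { index($members, $_) >= 0 } @aa;
    return [ (0) x $n ] unless $total;
    my ($seen, $pos, @cum) = (0, 0);
    for my $s (1 .. $n) {
        my $end = ceil(@aa * $s / $n);   # exclusive end index of section $s
        while ($pos < $end) {
            $seen++ if index($members, $aa[$pos]) >= 0;
            $pos++;
        }
        push @cum, $seen / $total;
    }
    return \@cum;
}

# 'RK' stands in for a group here purely for illustration.
print join(' ', @{ cumulative_distribution('RAAAAKAAAA', 'RK') }), "\n";
```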

per-amino-acid probability

Calculates the probability of each individual amino acid.

first order energy

The sum of prob(i)**2 over all amino acids i.

first order entropy

The sum of -prob(i)*log(prob(i)) over all amino acids i.
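
The per-amino-acid probability, first order energy, and first order entropy build on each other, which a short sketch can make concrete. The helper names are hypothetical, probabilities are taken as relative frequencies, and the natural logarithm is an assumption; the module may use a different base.

```perl
use strict;
use warnings;

# Hypothetical helper: relative frequency of each residue in the sequence.
sub aa_probabilities {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;
    my $total = length $seq;
    return { map { $_ => $count{$_} / $total } keys %count };
}

# First order energy: sum of squared probabilities.
sub first_order_energy {
    my ($prob) = @_;
    my $sum = 0;
    $sum += $_ ** 2 for values %$prob;
    return $sum;
}

# First order entropy: -sum p*log(p), natural logarithm (ASSUMPTION).
sub first_order_entropy {
    my ($prob) = @_;
    my $sum = 0;
    $sum -= $_ * log($_) for values %$prob;
    return $sum;
}

my $prob = aa_probabilities('AAGG');
printf "energy=%.3f entropy=%.3f\n",
    first_order_energy($prob), first_order_entropy($prob);
```

Only residues that actually occur get an entry, so log() is never called on zero.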

histogram difference

Calculates the difference between the counts of two neighboring amino acids.

AA pair probability

Probabilities of amino acid bigrams.
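
As an illustration, bigram probabilities can be estimated by counting overlapping residue pairs. The helper pair_probabilities is hypothetical, and the use of overlapping pairs is an assumption about how the bigrams are formed.

```perl
use strict;
use warnings;

# Hypothetical helper: relative frequency of each overlapping residue pair.
sub pair_probabilities {
    my ($seq) = @_;
    $seq = uc $seq;
    my $pairs = length($seq) - 1;          # number of overlapping pairs
    return {} if $pairs < 1;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. $pairs - 1;
    return { map { $_ => $count{$_} / $pairs } keys %count };
}

my $p = pair_probabilities('AGAGA');
printf "%s %.2f\n", $_, $p->{$_} for sort keys %$p;
```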

average separation between two amino acids of the same group

Counts the average number of characters between two amino acids of the same group.
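
One way to read this is as the mean gap between consecutive members of a group, sketched below. The helper average_separation and its arguments are hypothetical, and measuring only between consecutive occurrences is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: mean number of residues lying between consecutive
# occurrences of any residue listed in the string $members.
sub average_separation {
    my ($seq, $members) = @_;
    my ($last, $gaps, $total) = (undef, 0, 0);
    my @aa = split //, uc $seq;
    for my $i (0 .. $#aa) {
        next unless index($members, $aa[$i]) >= 0;
        if (defined $last) {
            $total += $i - $last - 1;    # residues strictly in between
            $gaps++;
        }
        $last = $i;
    }
    return $gaps ? $total / $gaps : undef;   # undef: fewer than two members
}

# ASSUMPTION: 'RKEDQN' stands in for the polar group.
print average_separation('RAAKGE', 'RKEDQN'), "\n";
```

In 'RAAKGE' the members sit at positions 0, 3, and 5, giving gaps of 2 and 1 residues.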

COPYRIGHT

xern <xern@cpan.org>

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.