NAME
Bio::Phylo::Matrices::MatrixRole - Extra behaviours for a character state matrix
SYNOPSIS
use Bio::Phylo::Factory;
my $fac = Bio::Phylo::Factory->new;

# instantiate taxa object
my $taxa = $fac->create_taxa;
for ( 'Homo sapiens', 'Pan paniscus', 'Pan troglodytes' ) {
    $taxa->insert( $fac->create_taxon( '-name' => $_ ) );
}

# instantiate matrix object, 'standard' data type. All categorical
# data types follow semantics like this, though with different
# symbols in lookup table and matrix
my $standard_matrix = $fac->create_matrix(
    '-type'   => 'STANDARD',
    '-taxa'   => $taxa,
    '-lookup' => {
        '-' => [],
        '0' => [ '0' ],
        '1' => [ '1' ],
        '?' => [ '0', '1' ],
    },
    '-labels' => [ 'Opposable big toes', 'Opposable thumbs', 'Not a pygmy' ],
    '-matrix' => [
        [ 'Homo sapiens'    => '0', '1', '1' ],
        [ 'Pan paniscus'    => '1', '1', '0' ],
        [ 'Pan troglodytes' => '1', '1', '1' ],
    ],
);

# note: complicated constructor for mixed data!
my $mixed_matrix = Bio::Phylo::Matrices::Matrix->new(

    # if you want to create 'mixed', value for '-type' is array ref...
    '-type' => [

        # ...with first field 'mixed'...
        'mixed',

        # ...second field is an array ref...
        [

            # ...with _ordered_ key/value pairs...
            'dna'      => 10,    # value is length of type range
            'standard' => 10,    # value is length of type range

            # ... or, more complicated, value is a hash ref...
            'rna' => {
                '-length' => 10,    # value is length of type range

                # ...value for '-args' is an array ref with args
                # as can be passed to 'unmixed' datatype constructors,
                # for example, here we modify the lookup table for
                # rna to allow both 'U' (default) and 'T'
                '-args' => [
                    '-lookup' => {
                        'A' => [ 'A' ],
                        'C' => [ 'C' ],
                        'G' => [ 'G' ],
                        'U' => [ 'U' ],
                        'T' => [ 'T' ],
                        'M' => [ 'A', 'C' ],
                        'R' => [ 'A', 'G' ],
                        'S' => [ 'C', 'G' ],
                        'W' => [ 'A', 'U', 'T' ],
                        'Y' => [ 'C', 'U', 'T' ],
                        'K' => [ 'G', 'U', 'T' ],
                        'V' => [ 'A', 'C', 'G' ],
                        'H' => [ 'A', 'C', 'U', 'T' ],
                        'D' => [ 'A', 'G', 'U', 'T' ],
                        'B' => [ 'C', 'G', 'U', 'T' ],
                        'X' => [ 'G', 'A', 'U', 'T', 'C' ],
                        'N' => [ 'G', 'A', 'U', 'T', 'C' ],
                    },
                ],
            },
        ],
    ],
);

# prints 'mixed(Dna:1-10, Standard:11-20, Rna:21-30)'
$mixed_matrix->get_type;
DESCRIPTION
This module defines a container object that holds Bio::Phylo::Matrices::Datum objects. The matrix object inherits from Bio::Phylo::Listable, so the methods defined there apply here.
METHODS
CONSTRUCTOR
- new()
Matrix constructor.
Type : Constructor
Title : new
Usage : my $matrix = Bio::Phylo::Matrices::Matrix->new;
Function: Instantiates a Bio::Phylo::Matrices::Matrix object.
Returns : A Bio::Phylo::Matrices::Matrix object.
Args :
    -type   => optional, but if used must be FIRST argument,
               defines datatype, one of dna|rna|protein|
               continuous|standard|restriction|[ mixed => [] ]
    -taxa   => optional, link to taxa object
    -lookup => character state lookup hash ref
    -labels => array ref of character labels
    -matrix => two-dimensional array, first element of every
               row is label, subsequent are characters
- new_from_bioperl()
Matrix constructor from Bio::Align::AlignI argument.
Type : Constructor
Title : new_from_bioperl
Usage : my $matrix = Bio::Phylo::Matrices::Matrix->new_from_bioperl( $aln );
Function: Instantiates a Bio::Phylo::Matrices::Matrix object.
Returns : A Bio::Phylo::Matrices::Matrix object.
Args : An alignment that implements Bio::Align::AlignI
MUTATORS
- set_special_symbols
Sets three special symbols in one call.
Type : Mutator
Title : set_special_symbols
Usage : $matrix->set_special_symbols(
              -missing   => '?',
              -gap       => '-',
              -matchchar => '.'
          );
Function: Assigns the missing, gap and matchchar symbols.
Returns : $self
Args : Three args (with distinct $x, $y and $z):
    -missing   => $x,
    -gap       => $y,
    -matchchar => $z
Notes : This method is here to ensure that the same symbol is
          not used for both missing AND gap.
- set_charlabels()
Sets argument character labels.
Type : Mutator
Title : set_charlabels
Usage : $matrix->set_charlabels( [ 'char1', 'char2', 'char3' ] );
Function: Assigns character labels.
Returns : $self
Args : ARRAY, or nothing (to reset)
- set_raw()
Set contents using two-dimensional array argument.
Type : Mutator
Title : set_raw
Usage : $matrix->set_raw( [ [ 'taxon1' => 'acgt' ], [ 'taxon2' => 'acgt' ] ] );
Function: Syntax sugar to define $matrix data contents.
Returns : $self
Args : A two-dimensional array; first dimension contains matrix rows,
          second dimension contains taxon name / character string pair.
ACCESSORS
- get_special_symbols()
Retrieves hash ref for missing, gap and matchchar symbols.
Type : Accessor
Title : get_special_symbols
Usage : my %syms = %{ $matrix->get_special_symbols };
Function: Retrieves special symbols
Returns : HASH ref, e.g. { -missing => '?', -gap => '-', -matchchar => '.' }
Args : None.
- get_charlabels()
Retrieves character labels.
Type : Accessor
Title : get_charlabels
Usage : my @charlabels = @{ $matrix->get_charlabels };
Function: Retrieves character labels.
Returns : ARRAY
Args : None.
- get_nchar()
Calculates number of characters.
Type : Accessor
Title : get_nchar
Usage : my $nchar = $matrix->get_nchar;
Function: Calculates number of characters (columns) in matrix (if the matrix
          is non-rectangular, returns the length of the longest row).
Returns : INT
Args : none
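The "longest row" rule for non-rectangular matrices can be illustrated with a minimal pure-Perl sketch. The helper `longest_row_length` and the row layout (a taxon name followed by its character states) are assumptions made for illustration; this is not the library's implementation.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper, not part of Bio::Phylo. Each row is an array ref
# holding a taxon name followed by that taxon's character states.
sub longest_row_length {
    my @rows = @_;
    return 0 unless @rows;

    # subtract 1 so the leading taxon name is not counted as a character
    return max( map { scalar(@$_) - 1 } @rows );
}

my @raw = (
    [ 'taxon1' => qw(A C G T) ],
    [ 'taxon2' => qw(A C) ],
    [ 'taxon3' => qw(A C G) ],
);
print longest_row_length(@raw), "\n";    # 4
```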
- get_ntax()
Calculates number of taxa (rows) in matrix.
Type : Accessor
Title : get_ntax
Usage : my $ntax = $matrix->get_ntax;
Function: Calculates number of taxa (rows) in matrix
Returns : INT
Args : none
- get_raw()
Retrieves a 'raw' (two-dimensional array) representation of the matrix's contents.
Type : Accessor
Title : get_raw
Usage : my $rawmatrix = $matrix->get_raw;
Function: Retrieves a 'raw' (two-dimensional array) representation
          of the matrix's contents.
Returns : A two-dimensional array; first dimension contains matrix rows,
          second dimension contains taxon name and characters.
Args : NONE
- get_ungapped_columns()
Type : Accessor
Title : get_ungapped_columns
Usage : my @ungapped = @{ $matrix->get_ungapped_columns };
Function: Retrieves the zero-based column indices of columns without gaps
Returns : An array reference with zero or more indices (i.e. integers)
Args : NONE
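As an illustration of what "columns without gaps" means, here is a small stand-alone sketch that works on plain strings rather than on a matrix object. The helper `ungapped_columns` is hypothetical, and the rows are assumed to be equal-length strings of single-character states.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Takes the gap symbol and a
# list of equal-length sequence strings, returns an array ref of the
# zero-based indices of columns in which no row has the gap symbol.
sub ungapped_columns {
    my ( $gap, @seqs ) = @_;
    return [] unless @seqs;
    my @ungapped;
    for my $i ( 0 .. length( $seqs[0] ) - 1 ) {
        my $has_gap = grep { substr( $_, $i, 1 ) eq $gap } @seqs;
        push @ungapped, $i unless $has_gap;
    }
    return \@ungapped;
}

my $cols = ungapped_columns( '-', 'AC-T', 'A-GT', 'ACGT' );
print "@$cols\n";    # 0 3
```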
- get_invariant_columns()
Type : Accessor
Title : get_invariant_columns
Usage : my @invariant = @{ $matrix->get_invariant_columns };
Function: Retrieves the zero-based column indices of invariant columns
Returns : An array reference with zero or more indices (i.e. integers)
Args : Optional:
    -gap     => if true, counts the gap symbol (probably '-') as a variant
    -missing => if true, counts the missing symbol (probably '?') as a variant
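The effect of the -gap and -missing options can be sketched on plain strings. The helper `invariant_columns` below is hypothetical and only mirrors the documented option names; the symbols '-' and '?' are assumed, and the library's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. $seqs is an array ref of
# equal-length strings; '-' and '?' are the assumed gap and missing symbols.
sub invariant_columns {
    my ( $seqs, %opt ) = @_;
    my @invariant;
    return \@invariant unless @$seqs;
    for my $i ( 0 .. length( $seqs->[0] ) - 1 ) {
        my %seen;
        for my $seq (@$seqs) {
            my $state = substr( $seq, $i, 1 );

            # gap and missing only count as variants when requested
            next if $state eq '-' && !$opt{'-gap'};
            next if $state eq '?' && !$opt{'-missing'};
            $seen{$state}++;
        }
        push @invariant, $i if keys(%seen) <= 1;
    }
    return \@invariant;
}

my @seqs = ( 'AC-T', 'AG-T', 'ACGT' );
print "@{ invariant_columns( \@seqs ) }\n";                 # 0 2 3
print "@{ invariant_columns( \@seqs, '-gap' => 1 ) }\n";    # 0 3
```

Column 2 is invariant only while gaps are ignored, because its single real state is 'G'.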
CALCULATIONS
- calc_indel_sizes()
Calculates size distribution of insertions or deletions.
Type : Calculation
Title : calc_indel_sizes
Usage : my %sizes = %{ $matrix->calc_indel_sizes };
Function: Calculates the size distribution of indels.
Returns : HASH
Args : Optional:
    -trim       => if true, disregards indels at start and end
    -insertions => if true, counts insertions, if false, counts deletions
- calc_prop_invar()
Calculates proportion of invariant sites.
Type : Calculation
Title : calc_prop_invar
Usage : my $pinvar = $matrix->calc_prop_invar;
Function: Calculates proportion of invariant sites.
Returns : Scalar: a number
Args : Optional:
    # if true, counts missing (usually the '?' symbol) as a state
    # in the final tallies. Otherwise, missing states are ignored
    -missing => 1
    # if true, counts gaps (usually the '-' symbol) as a state
    # in the final tallies. Otherwise, gap states are ignored
    -gap => 1
- calc_state_counts()
Calculates occurrences of states.
Type : Calculation
Title : calc_state_counts
Usage : my %counts = %{ $matrix->calc_state_counts };
Function: Calculates occurrences of states.
Returns : Hashref: keys are states, values are counts
Args : Optional - one or more states to focus on
- calc_state_frequencies()
Calculates the frequencies of the states observed in the matrix.
Type : Calculation
Title : calc_state_frequencies
Usage : my %freq = %{ $object->calc_state_frequencies() };
Function: Calculates state frequencies
Returns : A hash, keys are state symbols, values are frequencies
Args : Optional:
    # if true, counts missing (usually the '?' symbol) as a state
    # in the final tallies. Otherwise, missing states are ignored
    -missing => 1
    # if true, counts gaps (usually the '-' symbol) as a state
    # in the final tallies. Otherwise, gap states are ignored
    -gap => 1
Comments: Throws exception if matrix holds continuous values
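To make the tallying rule concrete, the following sketch computes state frequencies over plain strings. The helper `state_frequencies` is hypothetical; it assumes single-character states and the symbols '?' and '-', and it is not the library's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Tallies single-character
# states across all rows, skipping '?' and '-' unless the caller passes
# -missing or -gap.
sub state_frequencies {
    my ( $seqs, %opt ) = @_;
    my %count;
    my $total = 0;
    for my $state ( map { split //, $_ } @$seqs ) {
        next if $state eq '?' && !$opt{'-missing'};
        next if $state eq '-' && !$opt{'-gap'};
        $count{$state}++;
        $total++;
    }
    my %freq = map { $_ => $count{$_} / $total } keys %count;
    return \%freq;
}

my $freq = state_frequencies( [ 'AACG', 'A?-T' ] );
printf "%s: %.2f\n", $_, $freq->{$_} for sort keys %$freq;
```

With the two rows above, six states are tallied (the '?' and '-' are skipped), so 'A' has a frequency of 0.5 and each of 'C', 'G' and 'T' has 1/6.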
- calc_distinct_site_patterns()
Identifies the distinct distributions of states for all characters and counts their occurrences. Returns an array-of-arrays, where the first cell of each inner array holds the occurrence count, the second cell holds the pattern, i.e. an array of states. For example, for a matrix like this:
taxon1 GTGTGTGTGTGTGTGTGTGTGTG
taxon2 AGAGAGAGAGAGAGAGAGAGAGA
taxon3 TCTCTCTCTCTCTCTCTCTCTCT
taxon4 TCTCTCTCTCTCTCTCTCTCTCT
taxon5 AAAAAAAAAAAAAAAAAAAAAAA
taxon6 CGCGCGCGCGCGCGCGCGCGCGC
taxon7 AAAAAAAAAAAAAAAAAAAAAAA
The following data structure will be returned:
[
    [ 12, [ 'G', 'A', 'T', 'T', 'A', 'C', 'A' ] ],
    [ 11, [ 'T', 'G', 'C', 'C', 'A', 'G', 'A' ] ]
]
The patterns are sorted from most to least frequently occurring, the states for each pattern are in the order of the rows in the matrix. (In other words, the original matrix can more or less be reconstructed by inverting the patterns, and multiplying them by their occurrence, although the order of the columns will be lost.)
Type : Calculation
Title : calc_distinct_site_patterns
Usage : my $patterns = $object->calc_distinct_site_patterns;
Function: Calculates distinct site patterns.
Returns : A multidimensional array, see above.
Args : NONE
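The pattern counting described for calc_distinct_site_patterns can be reproduced on plain strings with a short sketch. The helper `distinct_site_patterns` is hypothetical and assumes equal-length rows of single-character states; ties in the occurrence count are broken alphabetically here, which is a choice made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Takes equal-length sequence
# strings (one per taxon) and returns [ [ count, [ states ] ], ... ].
sub distinct_site_patterns {
    my @seqs = @_;
    return [] unless @seqs;
    my ( %count, %states );
    for my $i ( 0 .. length( $seqs[0] ) - 1 ) {

        # read one column top to bottom, in row order
        my @column = map { substr( $_, $i, 1 ) } @seqs;
        my $key = join '', @column;
        $count{$key}++;
        $states{$key} = \@column;
    }

    # most frequent pattern first; alphabetical order breaks ties
    my @keys = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return [ map { [ $count{$_}, $states{$_} ] } @keys ];
}

my $patterns = distinct_site_patterns(
    'GTGTGTGTGTGTGTGTGTGTGTG',
    'AGAGAGAGAGAGAGAGAGAGAGA',
    'TCTCTCTCTCTCTCTCTCTCTCT',
    'TCTCTCTCTCTCTCTCTCTCTCT',
    'AAAAAAAAAAAAAAAAAAAAAAA',
    'CGCGCGCGCGCGCGCGCGCGCGC',
    'AAAAAAAAAAAAAAAAAAAAAAA',
);
print "$_->[0] x @{ $_->[1] }\n" for @$patterns;
# 12 x G A T T A C A
# 11 x T G C C A G A
```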
- calc_gc_content()
Calculates the G+C content as a fraction of the total.
Type : Calculation
Title : calc_gc_content
Usage : my $fraction = $obj->calc_gc_content;
Function: Calculates G+C content
Returns : A number between 0 and 1 (inclusive)
Args : Optional:
    # if true, counts missing (usually the '?' symbol) as a state
    # in the final tallies. Otherwise, missing states are ignored
    -missing => 1
    # if true, counts gaps (usually the '-' symbol) as a state
    # in the final tallies. Otherwise, gap states are ignored
    -gap => 1
Comments: Throws 'BadArgs' exception if matrix holds anything other than DNA
          or RNA. The calculation also takes the IUPAC symbol S (which is C|G)
          into account, but no other symbols (such as V, for A|C|G).
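A rough stand-alone sketch of the G+C calculation follows. The helper `gc_fraction` is hypothetical; it counts G, C and S as G+C and leaves '?' and '-' out of the total, which corresponds to the documented behaviour when neither -missing nor -gap is set. Returning 0 for empty input is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Counts G, C and the IUPAC
# symbol S (C|G) as G+C; '?' and '-' are removed before tallying.
sub gc_fraction {
    my @seqs = @_;
    my ( $gc, $total ) = ( 0, 0 );
    for my $seq (@seqs) {
        ( my $clean = uc $seq ) =~ tr/?-//d;    # strip missing and gaps
        $gc    += ( $clean =~ tr/GCS// );       # tr returns the match count
        $total += length $clean;
    }
    return $total ? $gc / $total : 0;
}

print gc_fraction( 'ACGT', 'GG-S', 'A?TT' ), "\n";    # 0.5
```

Here five of the ten remaining symbols are G, C or S, so the fraction is 0.5.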
- calc_median_sequence()
Calculates the median character sequence of the matrix.
Type : Calculation
Title : calc_median_sequence
Usage : my $seq = $obj->calc_median_sequence;
Function: Calculates median sequence
Returns : Array in list context, string in scalar context
Args : Optional:
    -ambig   => if true, uses ambiguity codes to summarize equally frequent
                states for a given character. Otherwise picks a random one.
    -missing => if true, keeps the missing symbol (probably '?') if this
                is the most frequent for a given character. Otherwise strips it.
    -gaps    => if true, keeps the gap symbol (probably '-') if this is the most
                frequent for a given character. Otherwise strips it.
Comments: The intent of this method is to provide a crude approximation of the most
          commonly occurring sequences in an alignment, for example as a starting
          sequence for a sequence simulator. This gives you something to work
          with if ancestral sequence calculation is too computationally intensive and/or not
          really necessary.
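The idea of a per-column "most frequent state" sequence can be sketched as follows. The helper `majority_sequence` is hypothetical and simplifies the documented behaviour: it ignores '?' and '-' when counting, and it breaks ties alphabetically so that the output is reproducible, whereas the documented method picks a random state or uses ambiguity codes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Takes equal-length sequence
# strings and returns a string with the most frequent state per column.
sub majority_sequence {
    my @seqs = @_;
    return '' unless @seqs;
    my $result = '';
    for my $i ( 0 .. length( $seqs[0] ) - 1 ) {
        my %count;
        $count{ substr( $_, $i, 1 ) }++ for @seqs;

        # simplification: missing and gap symbols never win a column
        delete @count{ '?', '-' };
        my ($best) =
          sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $result .= $best if defined $best;
    }
    return $result;
}

print majority_sequence( 'ACGT', 'ACGA', 'TC-A', 'AG-T' ), "\n";    # ACGA
```

In the last column 'A' and 'T' each occur twice, so the alphabetical tie-break yields 'A'.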
METHODS
- keep_chars()
Creates a cloned matrix that only keeps the characters at the supplied (zero-based) indices.
Type : Utility method
Title : keep_chars
Usage : my $clone = $object->keep_chars([6,3,4,1]);
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : Required, an array ref of integers
Comments: The columns are retained in the order in which they were supplied.
- prune_chars()
Creates a cloned matrix that omits the characters at the supplied (zero-based) indices.
Type : Utility method
Title : prune_chars
Usage : my $clone = $object->prune_chars([6,3,4,1]);
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : Required, an array ref of integers
Comments: The columns are retained in the order in which they were supplied.
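The difference between keeping and pruning the same index list can be shown on a single row. The helpers `keep_columns` and `prune_columns` are hypothetical and operate on a plain string, not on a matrix object; in this sketch the pruned result keeps the surviving columns in their original left-to-right order, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::Phylo.

# Keeps only the columns at the given zero-based indices, in the order
# in which the indices were supplied.
sub keep_columns {
    my ( $seq, $indices ) = @_;
    return join '', map { substr( $seq, $_, 1 ) } @$indices;
}

# Drops the columns at the given zero-based indices.
sub prune_columns {
    my ( $seq, $indices ) = @_;
    my %drop = map { $_ => 1 } @$indices;
    my @keep = grep { !$drop{$_} } 0 .. length($seq) - 1;
    return join '', map { substr( $seq, $_, 1 ) } @keep;
}

print keep_columns( 'ACGTTGCA', [ 6, 3, 4, 1 ] ), "\n";     # CTTC
print prune_columns( 'ACGTTGCA', [ 6, 3, 4, 1 ] ), "\n";    # AGGA
```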
- prune_invariant()
Creates a cloned matrix that omits the characters for which all taxa have the same state (or missing).
Type : Utility method
Title : prune_invariant
Usage : my $clone = $object->prune_invariant;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : None
Comments: The columns are retained in the order in which they were supplied.
- prune_uninformative()
Creates a cloned matrix that omits all uninformative characters. A character is considered uninformative when all of its non-missing values are either invariant or autapomorphies.
Type : Utility method
Title : prune_uninformative
Usage : my $clone = $object->prune_uninformative;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : None
Comments: The columns are retained in the order in which they were supplied.
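One way to read the "invariant or autapomorphies" rule is the usual parsimony-informativeness test: a character is informative only if at least two different states each occur in at least two taxa. The sketch below implements that reading. It is an assumption about the intended rule, not the library's code, and the helper `is_informative` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Takes the states of one
# column (one per taxon) and treats '?' as the missing symbol.
sub is_informative {
    my @states = grep { $_ ne '?' } @_;
    my %count;
    $count{$_}++ for @states;

    # states shared by two or more taxa; singletons are autapomorphies
    my $shared = grep { $_ >= 2 } values %count;
    return $shared >= 2 ? 1 : 0;
}

for my $column ( [qw(A A C C)], [qw(A A A C)], [qw(A C G T)], [qw(A A ? C)] ) {
    printf "%s => %s\n", join( '', @$column ),
      is_informative(@$column) ? 'informative' : 'uninformative';
}
```

Only the first column ('AACC') passes the test; the others are invariant apart from singleton states, or consist of singletons only.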
- prune_missing_and_gaps()
Creates a cloned matrix that omits all characters for which the invocant only has missing and/or gap states.
Type : Utility method
Title : prune_missing_and_gaps
Usage : my $clone = $object->prune_missing_and_gaps;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : None
Comments: The columns are retained in the order in which they were supplied.
- bootstrap()
Creates bootstrapped clone.
Type : Utility method
Title : bootstrap
Usage : my $bootstrap = $object->bootstrap;
Function: Creates bootstrapped clone.
Returns : A bootstrapped clone of the invocant.
Args : Optional, a subroutine reference that returns a random
          integer between 0 (inclusive) and the argument provided
          to it (exclusive). The default implementation is to use
          sub { int( rand( shift ) ) }, a user might override this
          by providing an implementation with a better random number
          generator.
Comments: The bootstrapping algorithm uses perl's random number
          generator to create a new series of indices (with
          replacement) of the same length as the original matrix.
          These indices are first sorted, then applied to the
          cloned sequences. Annotations (if present) stay connected
          to the resampled cells.
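The index resampling step can be sketched without a matrix object. The helper `bootstrap_indices` is hypothetical; it assumes, as in a standard bootstrap, that each index is drawn independently so that an index may occur more than once. The generator argument follows the documented contract (an integer from 0 up to, but not including, its argument).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Draws $nchar column indices
# with the supplied generator and returns them sorted, as an array ref.
sub bootstrap_indices {
    my ( $nchar, $gen ) = @_;

    # default generator as quoted in the documentation
    $gen = sub { int( rand(shift) ) } unless defined $gen;
    my @indices = map { $gen->($nchar) } 1 .. $nchar;
    return [ sort { $a <=> $b } @indices ];
}

# a fixed "generator" makes the demonstration reproducible
my @fixed   = ( 2, 0, 2, 3 );
my $indices = bootstrap_indices( 4, sub { shift @fixed } );
print "@$indices\n";                                                # 0 2 2 3
print join( '', map { substr( 'ACGT', $_, 1 ) } @$indices ), "\n";  # AGGT
```

A caller who wants a different random source would pass their own code reference in place of the fixed one used here.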
- jackknife()
Creates jackknifed clone.
Type : Utility method
Title : jackknife
Usage : my $jackknife = $object->jackknife(0.5);
Function: Creates jackknifed clone.
Returns : A jackknifed clone of the invocant.
Args : * Required, a number between 0 and 1, representing the
          fraction of characters to jackknife.
          * Optional, a subroutine reference that returns a random
          integer between 0 (inclusive) and the argument provided
          to it (exclusive). The default implementation is to use
          sub { int( rand( shift ) ) }, a user might override this
          by providing an implementation with a better random number
          generator.
Comments: The jackknife algorithm uses perl's random number
          generator to create a new series of indices of cells to keep.
          These indices are first sorted, then applied to the
          cloned sequences. Annotations (if present) stay connected
          to the resampled cells.
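The selection of indices to keep can be sketched as below. The helper `jackknife_indices` is hypothetical, and it assumes that the fraction argument is the proportion of characters that is removed; the documentation does not state this explicitly, so treat it as an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Returns an array ref with
# the sorted zero-based indices that remain after removing
# int( $nchar * $fraction ) randomly chosen columns.
sub jackknife_indices {
    my ( $nchar, $fraction, $gen ) = @_;

    # default generator as quoted in the documentation
    $gen = sub { int( rand(shift) ) } unless defined $gen;
    my @keep  = 0 .. $nchar - 1;
    my $ndrop = int( $nchar * $fraction );

    # remove one randomly chosen index at a time, so no index is
    # dropped twice and the remainder stays in ascending order
    for ( 1 .. $ndrop ) {
        splice( @keep, $gen->( scalar @keep ), 1 );
    }
    return \@keep;
}

# a generator that always returns 0 makes the demonstration reproducible
my $kept = jackknife_indices( 8, 0.5, sub { 0 } );
print "@$kept\n";    # 4 5 6 7
```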
- replicate()
Creates simulated replicate.
Type : Utility method
Title : replicate
Usage : my $replicate = $matrix->replicate( $tree );
Function: Creates simulated replicate.
Returns : A simulated replicate of the invocant.
Args : Tree to simulate the characters on.
Optional:
    -seed           => a random integer seed
    -model          => an object of class Bio::Phylo::Models::Substitution::Dna or
                       Bio::Phylo::Models::Substitution::Binary
    -random_rootseq => start DNA sequence simulation from random ancestral sequence
                       instead of the median sequence in the alignment.
If model is not given as argument, it will be estimated.
- insert()
Insert argument in invocant.
Type : Listable method
Title : insert
Usage : $matrix->insert( $datum );
Function: Inserts $datum in $matrix.
Returns : Modified object
Args : A datum object
Comments: This method re-implements the method by the same
          name in Bio::Phylo::Listable
- compress_lookup()
Removes unused states from lookup table.
Type : Method
Title : compress_lookup
Usage : $obj->compress_lookup
Function: Removes unused states from lookup table
Returns : $self
Args : None
- check_taxa()
Validates taxa associations.
Type : Method
Title : check_taxa
Usage : $obj->check_taxa
Function: Validates relation between matrix and taxa block
Returns : Modified object
Args : None
Comments: This method implements the interface method by the same
          name in Bio::Phylo::Taxa::TaxaLinker
- make_taxa()
Creates a taxa block from the object's contents if none exists yet.
Type : Method
Title : make_taxa
Usage : my $taxa = $obj->make_taxa
Function: Creates a taxa block from the object's contents
          if none exists yet.
Returns : $taxa
Args : NONE
SERIALIZERS
- to_xml()
Serializes matrix to nexml format.
Type : Format convertor
Title : to_xml
Usage : my $data_block = $matrix->to_xml;
Function: Converts matrix object into a nexml element structure.
Returns : Nexml block (SCALAR).
Args : Optional:
    -compact => 1 (for compact representation of matrix)
- to_nexus()
Serializes matrix to nexus format.
Type : Format convertor
Title : to_nexus
Usage : my $data_block = $matrix->to_nexus;
Function: Converts matrix object into a nexus data block.
Returns : Nexus data block (SCALAR).
Args : The following options are available:
    # if set, writes TITLE & LINK tokens
    '-links' => 1
    # if set, writes block as a "data" block (deprecated, but used by mrbayes),
    # otherwise writes "characters" block (default)
    -data_block => 1
    # if set, writes "RESPECTCASE" token
    -respectcase => 1
    # if set, writes "GAPMODE=(NEWSTATE or MISSING)" token
    -gapmode => 1
    # if set, writes "MSTAXA=(POLYMORPH or UNCERTAIN)" token
    -polymorphism => 1
    # if set, writes character labels
    -charlabels => 1
    # if set, writes state labels
    -statelabels => 1
    # if set, writes mesquite-style charstatelabels
    -charstatelabels => 1
    # by default, names for sequences are derived from $datum->get_name, if
    # 'internal' is specified, uses $datum->get_internal_name, if 'taxon'
    # uses $datum->get_taxon->get_name, if 'taxon_internal' uses
    # $datum->get_taxon->get_internal_name, if $key, uses $datum->get_generic($key)
    -seqnames => one of (internal|taxon|taxon_internal|$key)
- to_dom()
Analog to to_xml.
Type : Serializer
Title : to_dom
Usage : $matrix->to_dom
Function: Generates a DOM subtree from the invocant
          and its contained objects
Returns : an Element object
Args : Optional:
    -compact => 1 : renders characters as sequences,
                    not individual cells
SEE ALSO
There is a mailing list at https://groups.google.com/forum/#!forum/bio-phylo for any user or developer questions and discussions.
- Bio::Phylo::Taxa::TaxaLinker
This object inherits from Bio::Phylo::Taxa::TaxaLinker, so the methods defined therein are also applicable to Bio::Phylo::Matrices::Matrix objects.
- Bio::Phylo::Matrices::TypeSafeData
This object inherits from Bio::Phylo::Matrices::TypeSafeData, so the methods defined therein are also applicable to Bio::Phylo::Matrices::Matrix objects.
- Bio::Phylo::Manual
Also see the manual: Bio::Phylo::Manual and http://rutgervos.blogspot.com.
CITATION
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63