NAME

Bio::Phylo::Matrices::MatrixRole - Extra behaviours for a character state matrix

SYNOPSIS

use Bio::Phylo::Factory;
my $fac = Bio::Phylo::Factory->new;

# instantiate taxa object
my $taxa = $fac->create_taxa;
for ( 'Homo sapiens', 'Pan paniscus', 'Pan troglodytes' ) {
    $taxa->insert( $fac->create_taxon( '-name' => $_ ) );
}

# instantiate matrix object, 'standard' data type. All categorical
# data types follow semantics like this, though with different
# symbols in lookup table and matrix
my $standard_matrix = $fac->create_matrix(
    '-type'   => 'STANDARD',
    '-taxa'   => $taxa,
    '-lookup' => {
        '-' => [],
        '0' => [ '0' ],
        '1' => [ '1' ],
        '?' => [ '0', '1' ],
    },
    '-labels' => [ 'Opposable big toes', 'Opposable thumbs', 'Not a pygmy' ],
    '-matrix' => [
        [ 'Homo sapiens'    => '0', '1', '1' ],
        [ 'Pan paniscus'    => '1', '1', '0' ],
        [ 'Pan troglodytes' => '1', '1', '1' ],
    ],
);

# note: complicated constructor for mixed data!
my $mixed_matrix = Bio::Phylo::Matrices::Matrix->new(

   # if you want to create 'mixed', value for '-type' is array ref...
   '-type' =>  [

       # ...with first field 'mixed'...
       'mixed',

       # ...second field is an array ref...
       [

           # ...with _ordered_ key/value pairs...
           'dna'      => 10, # value is length of type range
           'standard' => 10, # value is length of type range

           # ... or, more complicated, value is a hash ref...
           'rna'      => {
               '-length' => 10, # value is length of type range

               # ...value for '-args' is an array ref with args
               # as can be passed to 'unmixed' datatype constructors,
               # for example, here we modify the lookup table for
               # rna to allow both 'U' (default) and 'T'
               '-args'   => [
                   '-lookup' => {
                       'A' => [ 'A'                     ],
                       'C' => [ 'C'                     ],
                       'G' => [ 'G'                     ],
                       'U' => [ 'U'                     ],
                       'T' => [ 'T'                     ],
                       'M' => [ 'A', 'C'                ],
                       'R' => [ 'A', 'G'                ],
                       'S' => [ 'C', 'G'                ],
                       'W' => [ 'A', 'U', 'T'           ],
                       'Y' => [ 'C', 'U', 'T'           ],
                       'K' => [ 'G', 'U', 'T'           ],
                       'V' => [ 'A', 'C', 'G'           ],
                       'H' => [ 'A', 'C', 'U', 'T'      ],
                       'D' => [ 'A', 'G', 'U', 'T'      ],
                       'B' => [ 'C', 'G', 'U', 'T'      ],
                       'X' => [ 'G', 'A', 'U', 'T', 'C' ],
                       'N' => [ 'G', 'A', 'U', 'T', 'C' ],
                   },
               ],
           },
       ],
   ],
);

# prints 'mixed(Dna:1-10, Standard:11-20, Rna:21-30)'
print $mixed_matrix->get_type;

DESCRIPTION

This module defines a container object that holds Bio::Phylo::Matrices::Datum objects. The matrix object inherits from Bio::Phylo::Listable, so the methods defined there apply here.

METHODS

CONSTRUCTOR

new()

Matrix constructor.

Type    : Constructor
Title   : new
Usage   : my $matrix = Bio::Phylo::Matrices::Matrix->new;
Function: Instantiates a Bio::Phylo::Matrices::Matrix
          object.
Returns : A Bio::Phylo::Matrices::Matrix object.
Args    : -type   => optional, but if used must be FIRST argument,
                     defines datatype, one of dna|rna|protein|
                     continuous|standard|restriction|[ mixed => [] ]

          -taxa   => optional, link to taxa object
          -lookup => character state lookup hash ref
          -labels => array ref of character labels
          -matrix => two-dimensional array, first element of every
                     row is label, subsequent are characters
new_from_bioperl()

Matrix constructor from Bio::Align::AlignI argument.

Type    : Constructor
Title   : new_from_bioperl
Usage   : my $matrix =
          Bio::Phylo::Matrices::Matrix->new_from_bioperl(
              $aln
          );
Function: Instantiates a
          Bio::Phylo::Matrices::Matrix object.
Returns : A Bio::Phylo::Matrices::Matrix object.
Args    : An alignment that implements Bio::Align::AlignI

MUTATORS

set_special_symbols()

Sets three special symbols in one call

Type    : Mutator
Title   : set_special_symbols
Usage   : $matrix->set_special_symbols(
              -missing   => '?',
              -gap       => '-',
              -matchchar => '.'
          );
Function: Assigns the missing, gap and matchchar symbols.
Returns : $self
Args    : Three args (with distinct $x, $y and $z):
              -missing   => $x,
              -gap       => $y,
              -matchchar => $z
Notes   : This method is here to ensure
          you don't accidentally use the
          same symbol for missing AND gap
set_charlabels()

Sets argument character labels.

Type    : Mutator
Title   : set_charlabels
Usage   : $matrix->set_charlabels( [ 'char1', 'char2', 'char3' ] );
Function: Assigns character labels.
Returns : $self
Args    : ARRAY ref, or nothing (to reset)
set_raw()

Set contents using two-dimensional array argument.

Type    : Mutator
Title   : set_raw
Usage   : $matrix->set_raw( [ [ 'taxon1' => 'acgt' ], [ 'taxon2' => 'acgt' ] ] );
Function: Syntax sugar to define $matrix data contents.
Returns : $self
Args    : A two-dimensional array; first dimension contains matrix rows,
          second dimension contains taxon name / character string pair.

ACCESSORS

get_special_symbols()

Retrieves hash ref for missing, gap and matchchar symbols

Type    : Accessor
Title   : get_special_symbols
Usage   : my %syms = %{ $matrix->get_special_symbols };
Function: Retrieves special symbols
Returns : HASH ref, e.g. { -missing => '?', -gap => '-', -matchchar => '.' }
Args    : None.
get_charlabels()

Retrieves character labels.

Type    : Accessor
Title   : get_charlabels
Usage   : my @charlabels = @{ $matrix->get_charlabels };
Function: Retrieves character labels.
Returns : ARRAY ref
Args    : None.
get_nchar()

Calculates number of characters.

Type    : Accessor
Title   : get_nchar
Usage   : my $nchar = $matrix->get_nchar;
Function: Calculates number of characters (columns) in matrix (if the matrix
          is non-rectangular, returns the length of the longest row).
Returns : INT
Args    : none
get_ntax()

Calculates number of taxa (rows) in matrix.

Type    : Accessor
Title   : get_ntax
Usage   : my $ntax = $matrix->get_ntax;
Function: Calculates number of taxa (rows) in matrix
Returns : INT
Args    : none
get_raw()

Retrieves a 'raw' (two-dimensional array) representation of the matrix's contents.

Type    : Accessor
Title   : get_raw
Usage   : my $rawmatrix = $matrix->get_raw;
Function: Retrieves a 'raw' (two-dimensional array) representation
          of the matrix's contents.
Returns : A two-dimensional array; first dimension contains matrix rows,
          second dimension contains taxon name and characters.
Args    : NONE
get_ungapped_columns()

Retrieves the indices of columns without gaps.

Type    : Accessor
Title   : get_ungapped_columns
Usage   : my @ungapped = @{ $matrix->get_ungapped_columns };
Function: Retrieves the zero-based column indices of columns without gaps
Returns : An array reference with zero or more indices (i.e. integers)
Args    : NONE
get_invariant_columns()

Retrieves the indices of invariant columns.

Type    : Accessor
Title   : get_invariant_columns
Usage   : my @invariant = @{ $matrix->get_invariant_columns };
Function: Retrieves the zero-based column indices of invariant columns
Returns : An array reference with zero or more indices (i.e. integers)
Args    : Optional:
          -gap     => if true, counts the gap symbol (probably '-') as a variant
          -missing => if true, counts the missing symbol (probably '?') as a variant
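
The idea behind get_invariant_columns() can be sketched in plain Perl, without Bio::Phylo. This is an illustration only, not the module's implementation: the helper invariant_columns() is hypothetical, rows are assumed to be equal-length strings of single-character states, and symbols passed as extra arguments are simply skipped when comparing states (roughly what happens when -gap and -missing are left false).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Returns an array ref with
# the zero-based indices of columns that hold at most one distinct state,
# after skipping any symbols listed in @ignore.
sub invariant_columns {
    my ( $rows, @ignore ) = @_;
    my %ignore = map { ( $_ => 1 ) } @ignore;
    my @invariant;
    for my $i ( 0 .. length( $rows->[0] ) - 1 ) {
        my %seen;
        for my $row ( @{$rows} ) {
            my $state = substr( $row, $i, 1 );
            $seen{$state} = 1 unless $ignore{$state};
        }
        push @invariant, $i if keys(%seen) <= 1;
    }
    return \@invariant;
}

my @rows    = ( 'ACGT-', 'ACCT-', 'ACGTA' );
my $strict  = invariant_columns( \@rows );              # gaps count as variants
my $lenient = invariant_columns( \@rows, '-', '?' );    # gaps and missing skipped
print "@{$strict}\n";     # prints "0 1 3"
print "@{$lenient}\n";    # prints "0 1 3 4"
```

Note that in this sketch a column consisting only of ignored symbols is also reported as invariant; whether the real method does the same is not stated here.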

CALCULATIONS

calc_indel_sizes()

Calculates size distribution of insertions or deletions

Type    : Calculation
Title   : calc_indel_sizes
Usage   : my %sizes = %{ $matrix->calc_indel_sizes };
Function: Calculates the size distribution of indels.
Returns : HASH ref
Args    : Optional:
          -trim       => if true, disregards indels at start and end
          -insertions => if true, counts insertions, if false, counts deletions
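
As a rough illustration of what a size distribution of indels looks like, the following plain-Perl sketch tallies the lengths of gap runs in a set of aligned rows. It is an assumption-laden stand-in, not the module's algorithm: the helper gap_run_sizes() is hypothetical, '-' is assumed to be the gap symbol, and the distinction between insertions and deletions made by the -insertions argument is not modelled.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Returns a hash ref in which
# keys are gap-run lengths and values are the number of runs of that length.
# If $trim is true, leading and trailing gap runs are disregarded.
sub gap_run_sizes {
    my ( $rows, $trim ) = @_;
    my %sizes;
    for my $row ( @{$rows} ) {
        my $seq = $row;
        if ($trim) {
            $seq =~ s/^-+//;
            $seq =~ s/-+$//;
        }
        $sizes{ length($1) }++ while $seq =~ /(-+)/g;
    }
    return \%sizes;
}

my @rows    = ( 'AC--GT-A', '--ACGTAA', 'A-C--T--' );
my $all     = gap_run_sizes( \@rows );
my $trimmed = gap_run_sizes( \@rows, 1 );
print "size $_: $all->{$_}\n"     for sort { $a <=> $b } keys %{$all};
print "size $_: $trimmed->{$_}\n" for sort { $a <=> $b } keys %{$trimmed};
```

Without trimming, the example rows contain two runs of length 1 and four of length 2; with trimming, two of each.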
calc_prop_invar()

Calculates proportion of invariant sites.

Type    : Calculation
Title   : calc_prop_invar
Usage   : my $pinvar = $matrix->calc_prop_invar;
Function: Calculates proportion of invariant sites.
Returns : Scalar: a number
Args    : Optional:
          # if true, counts missing (usually the '?' symbol) as a state
          # in the final tallies. Otherwise, missing states are ignored
          -missing => 1
          # if true, counts gaps (usually the '-' symbol) as a state
          # in the final tallies. Otherwise, gap states are ignored
          -gap => 1
calc_state_counts()

Calculates occurrences of states.

Type    : Calculation
Title   : calc_state_counts
Usage   : my %counts = %{ $matrix->calc_state_counts };
Function: Calculates occurrences of states.
Returns : Hashref: keys are states, values are counts
Args    : Optional - one or more states to focus on
calc_state_frequencies()

Calculates the frequencies of the states observed in the matrix.

Type    : Calculation
Title   : calc_state_frequencies
Usage   : my %freq = %{ $object->calc_state_frequencies() };
Function: Calculates state frequencies
Returns : A hash ref, keys are state symbols, values are frequencies
Args    : Optional:
          # if true, counts missing (usually the '?' symbol) as a state
          # in the final tallies. Otherwise, missing states are ignored
          -missing => 1
          # if true, counts gaps (usually the '-' symbol) as a state
          # in the final tallies. Otherwise, gap states are ignored
          -gap => 1
Comments: Throws exception if matrix holds continuous values
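
The relation between state counts and state frequencies can be sketched in plain Perl as follows. This is a minimal illustration under stated assumptions, not the module's code: rows are plain strings of single-character states, and '-' and '?' are assumed to be the gap and missing symbols, both ignored (as when -gap and -missing are false).

```perl
use strict;
use warnings;

my @rows = ( 'ACGT', 'AC-T', 'A?GT' );
my %skip = ( '-' => 1, '?' => 1 );    # assumed gap and missing symbols

# tally every state that is not a gap or missing symbol
my %counts;
my $total = 0;
for my $state ( map { split //, $_ } @rows ) {
    next if $skip{$state};
    $counts{$state}++;
    $total++;
}

# a frequency is a count divided by the number of tallied states
my %freq = map { ( $_ => $counts{$_} / $total ) } keys %counts;
printf "%s => %.1f\n", $_, $freq{$_} for sort keys %freq;
```

For the three example rows, ten states are tallied, giving frequencies of 0.3 for A and T and 0.2 for C and G.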
calc_distinct_site_patterns()

Identifies the distinct distributions of states for all characters and counts their occurrences. Returns an array-of-arrays, where the first cell of each inner array holds the occurrence count, the second cell holds the pattern, i.e. an array of states. For example, for a matrix like this:

taxon1 GTGTGTGTGTGTGTGTGTGTGTG
taxon2 AGAGAGAGAGAGAGAGAGAGAGA
taxon3 TCTCTCTCTCTCTCTCTCTCTCT
taxon4 TCTCTCTCTCTCTCTCTCTCTCT
taxon5 AAAAAAAAAAAAAAAAAAAAAAA
taxon6 CGCGCGCGCGCGCGCGCGCGCGC
taxon7 AAAAAAAAAAAAAAAAAAAAAAA

The following data structure will be returned:

 [
     [ 12, [ 'G', 'A', 'T', 'T', 'A', 'C', 'A' ] ],
     [ 11, [ 'T', 'G', 'C', 'C', 'A', 'G', 'A' ] ]
 ]

The patterns are sorted from most to least frequently occurring; the states within each pattern are in the order of the rows in the matrix. (In other words, the original matrix can more or less be reconstructed by transposing the patterns and repeating each one according to its occurrence count, although the order of the columns will be lost.)
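
The same result can be reproduced by hand in plain Perl. The sketch below does not call Bio::Phylo and is not its implementation; it assumes single-character states and breaks ties between equally frequent patterns by first occurrence, which is a choice made here for determinism.

```perl
use strict;
use warnings;

my @rows = (
    'GTGTGTGTGTGTGTGTGTGTGTG',    # taxon1
    'AGAGAGAGAGAGAGAGAGAGAGA',    # taxon2
    'TCTCTCTCTCTCTCTCTCTCTCT',    # taxon3
    'TCTCTCTCTCTCTCTCTCTCTCT',    # taxon4
    'AAAAAAAAAAAAAAAAAAAAAAA',    # taxon5
    'CGCGCGCGCGCGCGCGCGCGCGC',    # taxon6
    'AAAAAAAAAAAAAAAAAAAAAAA',    # taxon7
);

# read each column top to bottom and count how often each column occurs
my ( %count, %first_seen );
for my $i ( 0 .. length( $rows[0] ) - 1 ) {
    my $key = join '', map { substr( $_, $i, 1 ) } @rows;
    $first_seen{$key} = $i unless exists $first_seen{$key};
    $count{$key}++;
}

# most frequent pattern first; each entry is [ count, [ states ] ]
my @patterns =
  map  { [ $count{$_}, [ split //, $_ ] ] }
  sort { $count{$b} <=> $count{$a} || $first_seen{$a} <=> $first_seen{$b} }
  keys %count;

printf "%d x %s\n", $_->[0], join( '', @{ $_->[1] } ) for @patterns;
# prints:
# 12 x GATTACA
# 11 x TGCCAGA
```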

Type    : Calculation
Title   : calc_distinct_site_patterns
Usage   : my $patterns = $object->calc_distinct_site_patterns;
Function: Calculates distinct site patterns.
Returns : A multidimensional array, see above.
Args    : NONE
calc_gc_content()

Calculates the G+C content as a fraction of the total.

Type    : Calculation
Title   : calc_gc_content
Usage   : my $fraction = $obj->calc_gc_content;
Function: Calculates G+C content
Returns : A number between 0 and 1 (inclusive)
Args    : Optional:
          # if true, counts missing (usually the '?' symbol) as a state
          # in the final tallies. Otherwise, missing states are ignored
          -missing => 1
          # if true, counts gaps (usually the '-' symbol) as a state
          # in the final tallies. Otherwise, gap states are ignored
          -gap => 1
Comments: Throws 'BadArgs' exception if matrix holds anything other than DNA
          or RNA. The calculation also takes the IUPAC symbol S (which is C|G)
          into account, but no other symbols (such as V, for A|C|G).
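
A minimal plain-Perl sketch of the calculation described above, for illustration only. The helper gc_fraction() is hypothetical and does not reproduce the module's internals: it assumes '-' and '?' are the gap and missing symbols and ignores both, counts G, C and S towards the G+C total, and counts every other remaining symbol towards the denominator only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Returns the fraction of
# non-gap, non-missing states that are G, C or the ambiguity symbol S.
sub gc_fraction {
    my @rows = @_;
    my ( $gc, $total ) = ( 0, 0 );
    for my $state ( map { split //, uc $_ } @rows ) {
        next if $state eq '-' or $state eq '?';
        $total++;
        $gc++ if $state =~ /^[GCS]$/;
    }
    return $total ? $gc / $total : 0;
}

my $fraction = gc_fraction( 'GCAT-', 'SCAT?' );
print "$fraction\n";    # prints "0.5"
```

Here eight states are tallied (the gap and the missing symbol are skipped), of which G, C, S and C count towards G+C.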
calc_median_sequence()

Calculates the median character sequence of the matrix

Type    : Calculation
Title   : calc_median_sequence
Usage   : my $seq = $obj->calc_median_sequence;
Function: Calculates median sequence
Returns : Array in list context, string in scalar context
Args    : Optional:
          -ambig   => if true, uses ambiguity codes to summarize equally frequent
                      states for a given character. Otherwise picks a random one.
          -missing => if true, keeps the missing symbol (probably '?') if this
                      is the most frequent for a given character. Otherwise strips it.
          -gaps    => if true, keeps the gap symbol (probably '-') if this is the most
                      frequent for a given character. Otherwise strips it.
Comments: The intent of this method is to provide a crude approximation of the most
          commonly occurring sequences in an alignment, for example as a starting
          sequence for a sequence simulator. This gives you something to work with if
          ancestral sequence calculation is too computationally intensive and/or not
          really necessary.
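
The general idea (take the most frequent state in each column) can be sketched in plain Perl. This is a simplified stand-in, not the module's implementation: the helper consensus() is hypothetical, gap ('-') and missing ('?') symbols are assumed and dropped before tallying, and ties are broken alphabetically rather than randomly so that the output is reproducible.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Picks the most frequent
# state per column; returns a list in list context and a string in
# scalar context.
sub consensus {
    my @rows = @_;
    my @seq;
    for my $i ( 0 .. length( $rows[0] ) - 1 ) {
        my %n;
        $n{ substr( $_, $i, 1 ) }++ for @rows;
        delete @n{ '-', '?' };    # assumed gap and missing symbols
        my ($best) = sort { $n{$b} <=> $n{$a} || $a cmp $b } keys %n;
        push @seq, $best if defined $best;
    }
    return wantarray ? @seq : join '', @seq;
}

my $seq = consensus( 'ACGT', 'ACCT', 'TCG-' );
my @seq = consensus( 'ACGT', 'ACCT', 'TCG-' );
print "$seq\n";    # prints "ACGT"
```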

METHODS

keep_chars()

Creates a cloned matrix that only keeps the characters at the supplied (zero-based) indices.

Type    : Utility method
Title   : keep_chars
Usage   : my $clone = $object->keep_chars([6,3,4,1]);
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args    : Required, an array ref of integers
Comments: The columns are retained in the order in
          which they were supplied.
prune_chars()

Creates a cloned matrix that omits the characters at the supplied (zero-based) indices.

Type    : Utility method
Title   : prune_chars
Usage   : my $clone = $object->prune_chars([6,3,4,1]);
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args    : Required, an array ref of integers
Comments: The remaining columns keep their original
          order.
prune_invariant()

Creates a cloned matrix that omits the characters for which all taxa have the same state (or missing).

Type    : Utility method
Title   : prune_invariant
Usage   : my $clone = $object->prune_invariant;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args    : None
Comments: The columns are retained in the order in
          which they were supplied.
prune_uninformative()

Creates a cloned matrix that omits all uninformative characters. Characters are considered uninformative when all their non-missing values are either invariant or autapomorphies.

Type    : Utility method
Title   : prune_uninformative
Usage   : my $clone = $object->prune_uninformative;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args    : None
Comments: The columns are retained in the order in
          which they were supplied.
prune_missing_and_gaps()

Creates a cloned matrix that omits all characters for which the invocant only has missing and/or gap states.

Type    : Utility method
Title   : prune_missing_and_gaps
Usage   : my $clone = $object->prune_missing_and_gaps;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args    : None
Comments: The columns are retained in the order in
          which they were supplied.
bootstrap()

Creates bootstrapped clone.

Type    : Utility method
Title   : bootstrap
Usage   : my $bootstrap = $object->bootstrap;
Function: Creates bootstrapped clone.
Returns : A bootstrapped clone of the invocant.
Args    : Optional, a subroutine reference that returns a random
          integer between 0 (inclusive) and the argument provided
          to it (exclusive). The default implementation is to use
          sub { int( rand( shift ) ) }, a user might override this
          by providing an implementation with a better random number
          generator.
Comments: The bootstrapping algorithm uses perl's random number
          generator to create a new series of indices (with
          replacement) of the same length as the original matrix.
          These indices are first sorted, then applied to the
          cloned sequences. Annotations (if present) stay connected
          to the resampled cells.
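
The resampling step can be sketched in plain Perl, assuming the conventional bootstrap in which column indices are drawn with replacement. The helper bootstrap_indices() is hypothetical and only illustrates the idea; it accepts a generator with the same contract as the optional argument described above. A deterministic stand-in generator is used here so that the result is reproducible.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo. Draws $nchar column indices
# with replacement using $gen, then sorts them.
sub bootstrap_indices {
    my ( $nchar, $gen ) = @_;
    $gen ||= sub { int( rand(shift) ) };    # default generator
    my @idx = map { $gen->($nchar) } 1 .. $nchar;
    return [ sort { $a <=> $b } @idx ];
}

# stand-in "random" generator that replays a fixed series (demo only)
my @series = ( 3, 0, 3, 1 );
my $gen    = sub { shift @series };

my @rows = ( 'ACGT', 'TGCA' );
my $idx  = bootstrap_indices( 4, $gen );

# apply the same resampled indices to every row
my @resampled = map {
    my $row = $_;
    join '', map { substr( $row, $_, 1 ) } @{$idx}
} @rows;

print "@{$idx}\n";    # prints "0 1 3 3"
print "$_\n" for @resampled;    # prints "ACTT" then "TGAA"
```

Column 3 is drawn twice and column 2 not at all, which is what sets a bootstrap replicate apart from a mere reordering of the columns.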
jackknife()

Creates jackknifed clone.

Type    : Utility method
Title   : jackknife
Usage   : my $jackknife = $object->jackknife(0.5);
Function: Creates jackknifed clone.
Returns : A jackknifed clone of the invocant.
Args    : * Required, a number between 0 and 1, representing the
            fraction of characters to jackknife.
          * Optional, a subroutine reference that returns a random
            integer between 0 (inclusive) and the argument provided
            to it (exclusive). The default implementation is to use
            sub { int( rand( shift ) ) }, a user might override this
            by providing an implementation with a better random number
            generator.
Comments: The jackknife algorithm uses perl's random number
          generator to create a new series of indices of cells to keep.
          These indices are first sorted, then applied to the
          cloned sequences. Annotations (if present) stay connected
          to the resampled cells.
replicate()

Creates simulated replicate.

Type    : Utility method
Title   : replicate
Usage   : my $replicate = $matrix->replicate($tree);
Function: Creates simulated replicate.
Returns : A simulated replicate of the invocant.
Args    : Tree to simulate the characters on.
          Optional:
          -seed           => a random integer seed
          -model          => an object of class Bio::Phylo::Models::Substitution::Dna or
                             Bio::Phylo::Models::Substitution::Binary
          -random_rootseq => start DNA sequence simulation from random ancestral sequence
                             instead of the median sequence in the alignment.

Comments: Requires Statistics::R, with 'ape', 'phylosim', 'phangorn' and 'phytools'.
          If model is not given as argument, it will be estimated.
insert()

Insert argument in invocant.

Type    : Listable method
Title   : insert
Usage   : $matrix->insert($datum);
Function: Inserts $datum in $matrix.
Returns : Modified object
Args    : A datum object
Comments: This method re-implements the method by the same
          name in Bio::Phylo::Listable
compress_lookup()

Removes unused states from lookup table

Type    : Method
Title   : compress_lookup
Usage   : $obj->compress_lookup
Function: Removes unused states from lookup table
Returns : $self
Args    : None
check_taxa()

Validates taxa associations.

Type    : Method
Title   : check_taxa
Usage   : $obj->check_taxa
Function: Validates relation between matrix and taxa block
Returns : Modified object
Args    : None
Comments: This method implements the interface method by the same
          name in Bio::Phylo::Taxa::TaxaLinker
make_taxa()

Creates a taxa block from the object's contents if none exists yet.

Type    : Method
Title   : make_taxa
Usage   : my $taxa = $obj->make_taxa
Function: Creates a taxa block from the object's contents if none exists yet.
Returns : $taxa
Args    : NONE

SERIALIZERS

to_xml()

Serializes matrix to nexml format.

Type    : Format convertor
Title   : to_xml
Usage   : my $data_block = $matrix->to_xml;
Function: Converts matrix object into a nexml element structure.
Returns : Nexml block (SCALAR).
Args    : Optional:
          -compact => 1 (for compact representation of matrix)
to_nexus()

Serializes matrix to nexus format.

Type    : Format convertor
Title   : to_nexus
Usage   : my $data_block = $matrix->to_nexus;
Function: Converts matrix object into a nexus data block.
Returns : Nexus data block (SCALAR).
Args    : The following options are available:

           # if set, writes TITLE & LINK tokens
           '-links' => 1

           # if set, writes block as a "data" block (deprecated, but used by MrBayes),
           # otherwise writes "characters" block (default)
           -data_block => 1

           # if set, writes "RESPECTCASE" token
           -respectcase => 1

           # if set, writes "GAPMODE=(NEWSTATE or MISSING)" token
           -gapmode => 1

           # if set, writes "MSTAXA=(POLYMORPH or UNCERTAIN)" token
           -polymorphism => 1

           # if set, writes character labels
           -charlabels => 1

           # if set, writes state labels
           -statelabels => 1

           # if set, writes Mesquite-style charstatelabels
           -charstatelabels => 1

           # by default, names for sequences are derived from $datum->get_name, if
           # 'internal' is specified, uses $datum->get_internal_name, if 'taxon'
           # uses $datum->get_taxon->get_name, if 'taxon_internal' uses
           # $datum->get_taxon->get_internal_name, if $key, uses $datum->get_generic($key)
           -seqnames => one of (internal|taxon|taxon_internal|$key)
to_dom()

Analogous to to_xml.

Type    : Serializer
Title   : to_dom
Usage   : $matrix->to_dom
Function: Generates a DOM subtree from the invocant
          and its contained objects
Returns : an Element object
Args    : Optional:
          -compact => 1 : renders characters as sequences,
                          not individual cells

SEE ALSO

There is a mailing list at https://groups.google.com/forum/#!forum/bio-phylo for any user or developer questions and discussions.

Bio::Phylo::Taxa::TaxaLinker

This object inherits from Bio::Phylo::Taxa::TaxaLinker, so the methods defined therein are also applicable to Bio::Phylo::Matrices::Matrix objects.

Bio::Phylo::Matrices::TypeSafeData

This object inherits from Bio::Phylo::Matrices::TypeSafeData, so the methods defined therein are also applicable to Bio::Phylo::Matrices::Matrix objects.

Bio::Phylo::Manual

Also see the manual: Bio::Phylo::Manual and http://rutgervos.blogspot.com.

CITATION

If you use Bio::Phylo in published research, please cite it:

Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63