NAME

Bio::Phylo::Manual - Bio::Phylo v.0.12 user guide.

DESCRIPTION

This is the manual for Bio::Phylo. Bio::Phylo is a perl5 package for phylogenetic analysis. For installation instructions, read the README file in the root directory of the distribution. The stable URL for the most recent distribution is http://search.cpan.org/~rvosa/Bio-Phylo/

INSTANT GRATIFICATION

The following sections will demonstrate some of the basic functionality, with immediate, useful results.

ONE-LINERS

One-liners are commands run immediately from the command line, using the -e '...command...' switch, invoking the interpreter directly. Often, you'll include the -MFoo::Bar switch to include Foo::Bar at runtime. (See perlrun for more info on executing the interpreter.) NOTE FOR WINDOWS USERS: in the following examples, switch the quotes around, i.e. use double quotes where single quotes are used and vice versa.

First steps
Problem

No concept is valid in Perl if it cannot be expressed in a one-liner. For the Bio::Phylo package, small operations can often be expressed using a single, erm, expression from the command line. This is a rather long one-liner, but it proves a point.

Solution
perl -MBio::Phylo::IO=parse -e 'print parse(-format=>"newick",-string=>"((A,B),C);")->first->calc_imbalance'
Discussion

The -MModule switch includes a module the way a use Module; statement would in a script. Here we use the Bio::Phylo::IO module. The -e switch is used to evaluate the following expression. We parse a string, ((A,B),C);, of format newick. The parser returns a Bio::Phylo::Forest object (i.e. a set of trees, in this case a set of one). From this set we retrieve the first element and calculate Colless' imbalance, which returns a number that we print to standard out. This would print "1", obviously.
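To see where the 1 comes from, Colless' index can be worked by hand. The sketch below is plain Perl and does not use Bio::Phylo. It models a strictly bifurcating tree as nested array references (a hypothetical representation chosen only for this illustration) and assumes the usual normalisation by (n-1)(n-2)/2 for n tips. Whether calc_imbalance normalises in exactly this way is an assumption; for ((A,B),C); the raw and the normalised value coincide.

```perl
use strict;
use warnings;

# Toy model, not Bio::Phylo: a tip is a string, an internal node is
# an array reference holding exactly two children.
sub count_tips {
    my ($node) = @_;
    return 1 unless ref $node;
    return count_tips( $node->[0] ) + count_tips( $node->[1] );
}

# Sum, over all internal nodes, of |tips on the left - tips on the right|.
sub colless_sum {
    my ($node) = @_;
    return 0 unless ref $node;
    my ( $left, $right ) = @{$node};
    return abs( count_tips($left) - count_tips($right) )
        + colless_sum($left)
        + colless_sum($right);
}

# Normalised index (assumed normalisation: divide by the maximum
# possible sum for n tips, which is (n-1)(n-2)/2).
sub colless_imbalance {
    my ($tree) = @_;
    my $n = count_tips($tree);
    die "need at least three tips\n" if $n < 3;
    return colless_sum($tree) / ( ( $n - 1 ) * ( $n - 2 ) / 2 );
}

my $tree = [ [ 'A', 'B' ], 'C' ];    # ((A,B),C);
print colless_imbalance($tree), "\n";    # prints 1
```

The root has two tips on one side and one on the other, which contributes 1; the (A,B) node is balanced and contributes 0.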

Sets of trees
Problem

You want a one-liner to iterate over a set of trees:

Solution
perl -MBio::Phylo::IO=parse -lne 'print parse(-format=>"newick",-string=>$_)->first->calc_i2' <file>
Discussion

The -n switch wraps a while (<>) { ... } loop around the program, so the trees from file (that is, if they are one newick tree description per line) are copied into $_ one tree at a time. The -l switch appends a line break to the printed output.
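Spelled out, the -lne combination behaves roughly like the loop below. This sketch is plain Perl; each_line is a hypothetical helper name, and the callback stands in for the parse(...)->first->calc_i2 expression of the one-liner so that the example runs without Bio::Phylo.

```perl
use strict;
use warnings;

# Roughly what `perl -lne 'print EXPR'` does for every input line.
sub each_line {
    my ( $in, $out, $callback ) = @_;
    while ( defined( my $line = <$in> ) ) {
        chomp $line;         # -l strips the line ending on input
        local $_ = $line;    # -n makes the current line available as $_
        print {$out} $callback->(), "\n";    # -l appends a line break
    }
    return;
}

# Stand-in for the tree calculation: report the length of each line.
my $data = "((A,B),C);\n((A,B),(C,D));\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
each_line( $in, \*STDOUT, sub { length $_ } );
close $in;
```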

Stringifying trees
Problem

You don't want a number printed to STDOUT, you want a tree:

Solution
perl -MBio::Phylo::IO=parse -e 'print parse(-format=>"newick",-string=>"((A,B),C);")->first->to_newick'
Discussion

If you try to print a tree object, what's written is something like Bio::Phylo::Forest::Tree=SCALAR(0x1a337dc) (that is, the memory address that the object references). This is probably not what you want, so the tree object has a $tree->to_newick method that stringifies the object to a newick string.

INPUT AND OUTPUT

The Bio::Phylo::IO module is the unified front end for parsing and unparsing phylogenetic data objects. It is a non-OO module that optionally exports the parse and unparse subroutines into the caller's namespace, using the use Bio::Phylo::IO qw(parse unparse); directive. Alternatively, you can call the subroutines as class methods. The parse and unparse subroutines load and dispatch the appropriate sub-modules at runtime, depending on the -format argument.

Parsing trees
Problem

You want to create a Bio::Phylo::Forest::Tree object from a newick string.

Solution
use Bio::Phylo::IO;

# get a newick string from some source
my $tree_string = '(((A,B),C),D);';

# Call class method parse from Bio::Phylo::IO
my $tree = Bio::Phylo::IO->parse(
   -string => $tree_string,
   -format => 'fastnewick'
)->first;

# note: newick parser returns 'Bio::Phylo::Forest'
# Call ->first to retrieve the first tree of the forest.

print ref $tree, "\n"; # prints 'Bio::Phylo::Forest::Tree'
Discussion

The Bio::Phylo::IO module invokes specific parser modules. It is essentially a façade for the parsers. In the solution the Bio::Phylo::Parsers::Fastnewick parser turns a tree description into a Bio::Phylo::Forest object.

Note that there are currently two newick parsers to choose between, 'newick' and 'fastnewick'. The former is an older implementation, which appends unique node labels to all the nodes in the tree. It is an implementation that has been tested more thoroughly. On the other hand, 'fastnewick' has so far worked without problems. It does not introduce node labels, and parses large trees at greater speed than 'newick' (similar considerations apply to 'nexus' versus 'fastnexus').

The returned forest object subclasses Bio::Phylo::Listable, as a forest models a set of trees that you can iterate over. By calling the ->first method, we get the first tree in the forest - a Bio::Phylo::Forest::Tree object (in the example it's a very small forest, consisting of just this single tree).

Parsing tables
Problem

You want to create a Bio::Phylo::Matrices::Matrix object from a string.

Solution
use Bio::Phylo::IO;

# parsing a table
my $table_string = qq(A,1,2|B,1,2|C,2,2|D,2,1);
my $matrix = Bio::Phylo::IO->parse(
   -string   => $table_string,
   -format   => 'table',     # See Bio::Phylo::Parsers::Table
   -type     => 'STANDARD',  # Data type
   -fieldsep => ',',         # field separator
   -linesep  => '|'          # line separator
);

print ref $matrix, "\n"; # prints 'Bio::Phylo::Matrices::Matrix'
Discussion

Here the Bio::Phylo::Parsers::Table module parses a string A,1,2|B,1,2|C,2,2|D,2,1, where | is treated as the record (line) separator and , as the field separator.

Parsing taxa
Problem

You want to create a Bio::Phylo::Taxa object from a string.

Solution
use Bio::Phylo::IO;

# parsing a list of taxa
my $taxa_string = 'A:B:C:D';
my $taxa = Bio::Phylo::IO->parse(
   -string   => $taxa_string,
   -format   => 'taxlist',
   -fieldsep => ':'
);

print ref $taxa, "\n"; # prints 'Bio::Phylo::Taxa'
Discussion

Here the Bio::Phylo::Parsers::Taxlist module parses a string A:B:C:D, where the : is considered a field separator. The parser returns a Bio::Phylo::Taxa object. Note that the same result can be obtained by building the taxa object from scratch (a more feasible proposition than building trees or matrices from scratch):

use Bio::Phylo::Taxa;
use Bio::Phylo::Taxa::Taxon;

my $taxa = Bio::Phylo::Taxa->new;
for ( 'A', 'B', 'C', 'D' ) {
    $taxa->insert( Bio::Phylo::Taxa::Taxon->new( -name => $_ ) );
}

print ref $taxa, "\n"; # prints 'Bio::Phylo::Taxa';

ITERATING

The Bio::Phylo::Listable module is the superclass of all container objects. Container objects are objects that contain a set of objects of the same type. For example, a Bio::Phylo::Forest::Tree object is a container for Bio::Phylo::Forest::Node objects. Hence, the Bio::Phylo::Forest::Tree inherits from the Bio::Phylo::Listable class. You can therefore iterate over the nodes in a tree using the methods defined by Bio::Phylo::Listable.

Iterating over trees and nodes.
Problem

You want to access trees and nodes contained in a Bio::Phylo::Forest object.

Solution
use Bio::Phylo::IO qw(parse);

my $string = '((A,B),(C,D));(((A,B),C)D);';
my $forest = parse( -format => 'fastnewick', -string => $string );

print ref $forest; # prints 'Bio::Phylo::Forest'

# access trees in $forest
foreach my $tree ( @{ $forest->get_entities } ) {
    print ref $tree; # prints 'Bio::Phylo::Forest::Tree';

    # access nodes in $tree
    foreach my $node ( @{ $tree->get_entities } ) {
        print ref $node; # prints 'Bio::Phylo::Forest::Node';

    }
}
Discussion

Bio::Phylo::Forest and Bio::Phylo::Forest::Tree are nested subclasses of the iterator class Bio::Phylo::Listable. Nested iterator calls (such as ->get_entities) can be invoked on the objects.

Iterating over taxa.
Problem

You want to access the individual taxa in a Bio::Phylo::Taxa object.

Solution
use Bio::Phylo::IO qw(parse);

my $string = 'A|B|C|D|E|F|G|H';
my $taxa = parse(
    -string => $string,
    -format => 'taxlist',
    -fieldsep => '|'
);
print ref $taxa; # prints 'Bio::Phylo::Taxa';

while ( my $taxon = $taxa->next ) {
    print ref $taxon; # prints 'Bio::Phylo::Taxa::Taxon'
}
Discussion

A Bio::Phylo::Taxa object is a subclass of the Bio::Phylo::Listable class. Hence, you could also call ->get_entities on the taxa object, which returns a reference to an array of taxon objects contained by the taxa object. Note however the shorthand:

while ( my $taxon = $taxa->next ) { ... }
Iterating over datum objects.
Problem

You want to access the datum objects contained by a Bio::Phylo::Matrices::Matrix object.

Solution
use Bio::Phylo::IO;

# parsing a table
my $table_string = qq(A,1,2|B,1,2|C,2,2|D,2,1);
my $matrix = Bio::Phylo::IO->parse(
   -string   => $table_string,
   -format   => 'table',     # See Bio::Phylo::Parsers::Table
   -type     => 'STANDARD',  # Data type
   -fieldsep => ',',         # field separator
   -linesep  => '|'          # line separator
);

print ref $matrix, "\n"; # prints 'Bio::Phylo::Matrices::Matrix'

my $datum = $matrix->get_by_index( 0, -1 );
print ref $datum; # NOTE: prints 'ARRAY'! 
Discussion

The Bio::Phylo::Matrices::Matrix object subclasses the Bio::Phylo::Listable object. Hence, its iterator methods are applicable here as well. In the above example, the get_by_index method is used. With a single argument it returns a Bio::Phylo object. With multiple arguments the semantics are nearly identical to array slicing (see perldata), except that an array reference is returned. Bio::Phylo generally passes by reference (see perlref).

SIMULATING TREES

The Bio::Phylo::Generator module simulates trees under various models of clade growth.

Generating Yule trees.

Here's how to generate a forest of ten trees with ten tips:

use Bio::Phylo::Generator;
my $gen = Bio::Phylo::Generator->new;
my $trees = $gen->gen_rand_pure_birth(
    -trees => 10,
    -tips  => 10,
    -model => 'yule'
);
print ref $trees; # prints 'Bio::Phylo::Forest'
Expected versus randomly drawn waiting times.

The generator object simulates trees under the Yule or the Hey model, returning them in a forest object (as in the example above). The ->gen_rand_pure_birth method call returns trees whose branch lengths are drawn from the appropriate distribution, while ->gen_exp_pure_birth uses the expected waiting times (e.g. 1/n, where n is the number of lineages, for the Yule model).
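The difference between the two calls can be illustrated without Bio::Phylo. Under a Yule process with n lineages, and assuming a per-lineage birth rate of 1, the waiting time to the next split is exponentially distributed with mean 1/n. The helper names below are hypothetical, and inverse-transform sampling is only one way to draw such a value; the module's own implementation may differ.

```perl
use strict;
use warnings;

# Expected waiting time with $n lineages (unit birth rate assumed).
sub expected_waiting_time {
    my ($n) = @_;
    return 1 / $n;
}

# Random waiting time: exponential with rate $n, drawn by inverse
# transform sampling. rand() is in [0,1), so 1 - rand() is never zero.
sub random_waiting_time {
    my ($n) = @_;
    return -log( 1 - rand() ) / $n;
}

for my $n ( 2 .. 5 ) {
    printf "%d lineages: expected %.3f, drawn %.3f\n",
        $n, expected_waiting_time($n), random_waiting_time($n);
}
```

Averaged over many draws, the random waiting times approach the expected ones.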

FILTERING

Filtering objects by numerical value.

To retrieve, for example, the nodes from a tree that are close to the root, call:

my @deep_nodes = @{ $tree->get_by_value(
   -value => 'calc_nodes_to_root',
   -le    => 2
) };

This retrieves the nodes no more than 2 ancestors away from the root. Any method that returns a numerical value can be specified with the -value flag. The -le flag specifies that the returned value must be less than or equal to 2.
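Conceptually this is a grep over the contained objects. The sketch below shows that idea in plain Perl; filter_le and the Toy::Node class are hypothetical stand-ins so that the example runs without Bio::Phylo, and it is not a description of how get_by_value is implemented.

```perl
use strict;
use warnings;

# Stand-in for a node: all it can do is report its name and its
# distance to the root.
package Toy::Node;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub get_name           { return $_[0]{name} }
sub calc_nodes_to_root { return $_[0]{depth} }

package main;

# Keep the objects for which the named method returns a value <= $max.
# Returns an array reference, following the convention in this manual.
sub filter_le {
    my ( $objects, $method, $max ) = @_;
    return [ grep { $_->$method() <= $max } @{$objects} ];
}

my @nodes = map { Toy::Node->new( name => "n$_", depth => $_ ) } 0 .. 4;
my @deep_nodes = @{ filter_le( \@nodes, 'calc_nodes_to_root', 2 ) };
print join( ' ', map { $_->get_name } @deep_nodes ), "\n";    # n0 n1 n2
```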

Filtering objects by regular expression.

String values that are returned by objects can be filtered using a compiled regular expression. For example:

my @lemurs = @{ $tree->get_by_regular_expression(
     -value => 'get_name',
     -match => qr/[Ll]emur_.+$/
) };

This retrieves all nodes whose name contains a genus matching Eulemur, Lemur or Hapalemur.
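The pattern can be checked against a few names before it is used on a tree. The names below are made up for the illustration (binomials joined by an underscore, as is common in newick labels), and the check uses plain Perl rather than Bio::Phylo.

```perl
use strict;
use warnings;

my $pattern = qr/[Ll]emur_.+$/;

my @names = qw(
    Eulemur_fulvus
    Lemur_catta
    Hapalemur_griseus
    Propithecus_verreauxi
);

# Keep only the names that match the compiled pattern.
my @lemurs = grep { $_ =~ $pattern } @names;
print "$_\n" for @lemurs;
```

Because the pattern is not anchored at the start of the string, any genus name ending in "lemur" matches, so a label such as Lepilemur_mustelinus would be retrieved as well.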

DRAWING TREES

You can create SVG drawings of tree objects using the Bio::Phylo::Treedrawer module:

use Bio::Phylo::Treedrawer;
use Bio::Phylo::IO;

my $treedrawer = Bio::Phylo::Treedrawer->new(
   -width  => 400,
   -height => 600,
   -shape  => 'CURVY',
   -mode   => 'CLADO',
   -format => 'SVG'
);

my $tree = Bio::Phylo::IO->parse(
   -format => 'newick',
   -string => '((A,B),C);'
)->first;

$treedrawer->set_tree($tree);
$treedrawer->set_padding(50);

my $string = $treedrawer->draw;

Read the Bio::Phylo::Treedrawer perldoc for more info.

TIPS AND TRICKS

Generic metadata

You can append generic key/value pairs to any object, by calling $obj->set_generic( 'key' => 'value' );. Subsequently calling $obj->get_generic('key'); returns 'value'. This is a very useful feature in many situations where you may want to attach, for example, results from analyses by outside programs (e.g. likelihood scores) to the tree objects they refer to. Likewise, multiple numbers (e.g. bootstrap values, posteriors, Bremer values) can be attached to the same node in this way.

OBJECT AND DATA MODEL

Perl objects

Object-oriented perl is a massive subject. To learn about the basic syntax of OO-perl, the following perldocs might be of interest:

perlboot

Introduction to OO perl. Read at least this one if you have no experience with OO perl.

perlobj

Details about perl objects.

perltooc

Class data.

perltoot

Advanced objects: "Tom's object-oriented tutorial for perl"

perlbot

The "Bag'o Object Tricks" (the BOT).

The Bio::Phylo object model

The following sections discuss the nested objects that model phylogenetic information and entities.

The Bio::Phylo root object.

The Bio::Phylo object is never used directly. However, all other objects inherit from it, which means that all objects have getters and setters for their name, description, and score. They can all return a globally unique ID, can all be stringified to XML, and keep track of more administrative things such as the version number of the release.

The Bio::Phylo::Forest::* namespace

According to Bio::Phylo, there is a Forest (which is modelled by the Bio::Phylo::Forest object), which contains Bio::Phylo::Forest::Tree objects, which contain Bio::Phylo::Forest::Node objects.

The Bio::Phylo::Forest::Node object

A node 'knows' a couple of things: its name, its branch length (i.e. the length of the branch connecting it and its parent), who its parent is, its next sister (on its right), its previous sister (on the left), its first daughter and its last daughter. Also, a taxon can be specified that the node refers to (this makes most sense when the node is terminal). These properties can be retrieved and modified by methods classified as ACCESSORS and MUTATORS.

From this set of properties follow a number of things that must be either true or false. For example, if a node has no children it is a terminal node. When asked whether it "is_terminal", a node replies with either true (i.e. 1) or false (undef). Methods such as this are classified as TESTS.

Likewise, based on the properties of an individual node we can perform a query to retrieve nodes related to it. For example, when asked to "get_ancestors", the node returns a list of its ancestors, i.e. all the nodes on the path from its parent up to and including the root. These methods are QUERIES.

Lastly, some CALCULATIONS can be performed by the node. When asked to "calc_path_to_root", it calculates the sum of the lengths of the branches connecting it and the root. Of course, in order to make all this possible, a node has to exist, so it needs to be constructed. The CONSTRUCTOR is the Bio::Phylo::Forest::Node->new() method.
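The idea behind such queries and calculations is simply a walk along parent links. The sketch below is a toy model in plain Perl, with nodes as hashes and hypothetical helper names; Bio::Phylo nodes are objects with methods, and their implementation is not shown here.

```perl
use strict;
use warnings;

# Toy nodes as plain hashes; names and branch lengths are made up.
my $root = { name => 'root', parent => undef, branch_length => 0 };
my $n1   = { name => 'n1',   parent => $root, branch_length => 1.5 };
my $tip  = { name => 'A',    parent => $n1,   branch_length => 0.5 };

# QUERY: all nodes from the parent up to and including the root.
sub ancestors_of {
    my ($node) = @_;
    my @ancestors;
    while ( my $parent = $node->{parent} ) {
        push @ancestors, $parent;
        $node = $parent;
    }
    return \@ancestors;
}

# CALCULATION: sum of the branch lengths between the node and the root.
sub path_to_root {
    my ($node) = @_;
    my $sum = 0;
    while ( $node->{parent} ) {
        $sum += $node->{branch_length};
        $node = $node->{parent};
    }
    return $sum;
}

print join( ',', map { $_->{name} } @{ ancestors_of($tip) } ), "\n";    # n1,root
print path_to_root($tip), "\n";                                         # 2
```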

Once a node has served its purpose it can be destroyed. For this purpose there is a DESTRUCTOR, which cleans up once we're done with the node. However, in most cases you don't have to worry about constructing and destroying nodes as this is done for you by a parser or a generator as needs arise.

For a detailed description of all the node methods, their arguments and return values, consult the node documentation, which, after install, can be viewed by issuing the "perldoc Bio::Phylo::Forest::Node" command.

The Bio::Phylo::Forest::Tree object

A tree knows very little. All it really holds is a set of nodes, which are there because of TREE POPULATION, i.e. the process of inserting nodes in the tree. The tree can be queried in a number of ways. For example, we can ask the tree to "get_entities", to which the tree replies with a list of all the nodes it holds. Be advised that this doesn't mean that the nodes are connected in a meaningful way, if at all. The tree doesn't care: the nodes are supposed to know who their parents, sisters, and daughters are. But we can still get, for example, all the terminal nodes (i.e. the tips) in the tree by retrieving all the nodes in the tree and asking each one of them whether it "is_terminal", discarding the ones that aren't.

Based on the set of nodes the tree holds it can perform calculations, such as "calc_tree_length", which simply means that the tree iterates over all its nodes, summing their branch lengths, and returning the total.
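Both operations amount to a single pass over the node list. The sketch below uses a toy list of hashes in plain Perl so that it runs without Bio::Phylo; the field names, helper names and branch lengths are made up for the illustration.

```perl
use strict;
use warnings;

# Toy node list for ((A,B),C); these are not Bio::Phylo objects.
my @nodes = (
    { name => 'root', children => 2, branch_length => 0 },
    { name => 'n1',   children => 2, branch_length => 1 },
    { name => 'A',    children => 0, branch_length => 2 },
    { name => 'B',    children => 0, branch_length => 2 },
    { name => 'C',    children => 0, branch_length => 3 },
);

# A node without children is terminal. With real node objects the
# test inside grep would be a call to is_terminal instead.
sub tips_of {
    my ($nodes) = @_;
    return [ grep { $_->{children} == 0 } @{$nodes} ];
}

# Tree length: the sum of all branch lengths.
sub tree_length {
    my ($nodes) = @_;
    my $total = 0;
    $total += $_->{branch_length} for @{$nodes};
    return $total;
}

print join( ' ', map { $_->{name} } @{ tips_of( \@nodes ) } ), "\n";    # A B C
print tree_length( \@nodes ), "\n";                                     # 8
```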

The tree object also has a constructor and a destructor, but normally you don't have to worry about that. All the tree methods can be viewed by issuing the "perldoc Bio::Phylo::Forest::Tree" command.

The Bio::Phylo::Forest object

The object containing all others is the Forest object. It serves merely as a container to hold multiple trees, which are inserted in the Forest object using the "insert()" method, and retrieved using the "get_entities" method. More information can be found in the Bio::Phylo::Forest perldoc page.

The Bio::Phylo::Matrices::* namespace

Objects in the Bio::Phylo::Matrices namespace are used to handle comparative data, as single observations, and in larger container objects.

The Bio::Phylo::Matrices::Datum object

The datum object holds a single observation of a predefined type, such as molecular data, or a continuous character observation. The Datum object can be linked to a taxon object, to specify which OTU the observation refers to. A 'single observation' does not imply a single character state: Datum objects can hold a DNA sequence as well - which Bio::Phylo considers a single observation.

The Bio::Phylo::Matrices::Sequence object

The Sequence object holds a string of characters of a predefined type, such as a molecular sequence, or a series of continuous character observations. The Sequence object can be linked to a taxon object, to specify which OTU the characters refer to. The sequence object is often more suitable for larger data sets (e.g. DNA sequences); the datum object is more memory intensive, but provides for more per character metadata - hence it is perhaps more appropriate for individual morphological observations.

The Bio::Phylo::Matrices::Matrix object

The matrix object is used to aggregate datum objects into a larger, iterator object, which can be accessed using the methods of the Bio::Phylo::Listable class.

The Bio::Phylo::Matrices::Alignment object

The alignment object is used to aggregate sequence objects into a larger, iterator object, which can be accessed using the methods of the Bio::Phylo::Listable class.

WARNING

It is IMPORTANT to note that matrix and alignment objects are, by default, NOT rectangular. They are containers that hold datum and sequence objects, respectively, in the order in which they were inserted. Hence, taking a slice (using 'get_by_index') returns a set of their contained objects - not a set of columns! If non-contiguous or overlapping datum objects have been inserted, iterating over the taxa linked to the matrix will yield EMPTY rows (not rows with '?' missing data). To make your matrix rectangular, use the 'flatten' method, which concatenates the contained datum objects and pads empty sections with '??'.

The Bio::Phylo::Matrices object

The top level object in the Bio::Phylo::Matrices namespace is used to contain multiple matrix or alignment objects, again implementing an iterator interface.

The Bio::Phylo::Taxa::* namespace

Sets of taxa are modelled by the Bio::Phylo::Taxa object. It is a container that holds Bio::Phylo::Taxa::Taxon objects. The taxon objects at present provide no other functionality than to serve as a means of crossreferencing nodes in trees, and datum or sequence objects. This, however, is a very important feature. In order to be able to write, for example, files formatted for Mark Pagel's Discrete, Continuous and Multistate programs a taxa object, a matrix and a tree object must be crossreferenced.

The Bio::Phylo::Taxa object

The taxa object is analogous to a taxa block as implemented by Mesquite (http://mesquiteproject.org). Multiple matrix objects and forests can be linked to a single taxa object, using $taxa->set_matrix( $matrix ). Conversely, the relationship from matrix to taxa and from forest to taxa is a one-to-one relationship.

The Bio::Phylo::Taxa::Taxon object

Just as forests can be linked to taxa objects, so too can individual node and datum objects be linked to individual taxon objects. Again, the taxon can hold references to multiple nodes or multiple datum objects, but conversely there is a one-to-one relationship. There is a constraint on these relationships: a node can only refer to a taxon that belongs to a taxa object that the forest object that contains the node references:

      YES!
 ______________    
|FOREST        |  The taxon and node objects can
|  __________  |  link to each other, because
| |TREE      | |  their containers do also.
| |  ______  | |  
| | |NODE  | | |  
| | |______| | |  
| |_____^____| |                 
|_______|______|              NO!       
     ^  |               ______________  
 ____|__|__            |FOREST 'B'    |  The taxon object 
|TAXA   |  |           |  __________  |  cannot reference
|  _____|  |           | |TREE      | |  forest 'A' while
| |TAXON | |           | |  ______  | |  its container 
| |______| |           | | |NODE  | | |  references forest
|__________|           | | |______| | |  'B'. 
                       | |__________| |  
                       |______________|    ______________   
                            ^             |FOREST 'A'    |   
                        ____|_____        |  __________  |  
                       |TAXA      |       | |TREE      | |  
                       |  ______  |       | |  ______  | |  
                       | |TAXON |_|______ |_|_|NODE  | | |  
                       | |______| |       | | |______| | |  
                       |__________|       | |__________| |  
                                          |______________|        
      

Trying to set the links in the example on the right will result in errors: "Attempt to link X to taxon from wrong block". So what happens if a taxon already links to a node in forest 'A', and you link its enclosing taxa block to forest 'B'? The links at the taxon and node level will be removed, and the link between forest and taxa object will be enforced, yielding the warning "Reset X references from node objects to taxa outside taxa block".

Encapsulation

Unlike most other implementations of tree structures (or any other perl objects) the Bio::Phylo objects are truly encapsulated: most perl objects are hash references, so in most cases you can do $obj->{'key'} = 'value'. Not so for Bio::Phylo. The objects are implemented as 'InsideOut' objects. How they work exactly is outside of the scope of this document, but the upshot is that the state of an object can only be changed through its methods. This is a feature that helps keep the code base maintainable as this project grows. Also, the way it is implemented is more memory-efficient and faster than the standard approach. The encapsulation forces users of this module to use the documented interfaces of the objects. This, however, is a good thing: as long as the interfaces stay the same, any code using Bio::Phylo will continue to work, regardless of the implementation under the surface.
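For readers who want a rough picture of the idea, here is a minimal, generic sketch of the inside-out technique in plain Perl. It is not Bio::Phylo's own code: the class name Toy::InsideOut and its methods are hypothetical, and the real implementation may differ in every detail.

```perl
use strict;
use warnings;

package Toy::InsideOut;
use Scalar::Util qw(refaddr);

# Object state lives in a lexical hash keyed by the object's address,
# not inside the object itself, so it cannot be reached from outside.
my %name_of;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless \( my $anonymous_scalar ), $class;
    $name_of{ refaddr($self) } = $args{'-name'};
    return $self;
}

sub get_name {
    my ($self) = @_;
    return $name_of{ refaddr($self) };
}

sub set_name {
    my ( $self, $name ) = @_;
    $name_of{ refaddr($self) } = $name;
    return $self;    # mutators return the object, so calls can be chained
}

# Remove the state when the object goes away.
sub DESTROY {
    my ($self) = @_;
    delete $name_of{ refaddr($self) };
    return;
}

package main;

my $node = Toy::InsideOut->new( -name => 'Homo_sapiens' );
print $node->get_name, "\n";    # Homo_sapiens

# The object is a blessed scalar reference, so treating it as a hash
# fails: $node->{'key'} = 'value' dies with "Not a HASH reference".
```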

'Is-a' relationships: Inheritance

The objects in Bio::Phylo are related in various ways. Some objects inherit from superclasses. Hence the object is a special case of the superclass. This type of relationship is shown below:

# base class
Bio::Phylo

	# facade parser, child classes
	Bio::Phylo::IO
		Bio::Phylo::Parsers::Newick
		Bio::Phylo::Parsers::Nexus
		Bio::Phylo::Parsers::Table
		Bio::Phylo::Parsers::Taxlist
		Bio::Phylo::Unparsers::Newick
		Bio::Phylo::Unparsers::Pagel

	# listable interface, child classes
	Bio::Phylo::Listable
		Bio::Phylo::Forest
		Bio::Phylo::Forest::Tree
		Bio::Phylo::Matrices
		Bio::Phylo::Matrices::Matrix
		Bio::Phylo::Matrices::Alignment
		Bio::Phylo::Taxa

	# direct children of Bio::Phylo
	Bio::Phylo::Forest::Node
	Bio::Phylo::Matrices::Datum
	Bio::Phylo::Matrices::Sequence
	Bio::Phylo::Taxa::Taxon
	Bio::Phylo::Generator
	Bio::Phylo::Util::CONSTANT
	Bio::Phylo::Util::Exceptions
	Bio::Phylo::Util::IDPool
	Bio::Phylo::Treedrawer
	Bio::Phylo::Treedrawer::SVG

'Has-a' relationships

Some objects contain other objects. For example, a Bio::Phylo::Forest::Tree contains Bio::Phylo::Forest::Node objects, a matrix object holds datum objects, and so on. The container objects all behave like Bio::Phylo::Listable objects: you can iterate over them (also recursively). The contains / container relationships implemented by Bio::Phylo are shown below:

CONTAINERS

 ______________     ________________
|FOREST        |   |MATRICES        |
|  __________  |   |  __________    |
| |TREE      | |   | |MATRIX    |   |
| |  ______  | |   | |  ______  |   |
| | |NODE  | | |   | | |DATUM | |   |
| | |______| | |   | | |______| |   |
| |__________| |   | |__________|   |
|______________|   |                |
                   |  ____________  |
 __________        | |ALIGNMENT   | |
|TAXA      |       | |  ________  | |
|  ______  |       | | |SEQUENCE| | |
| |TAXON | |       | | |________| | |
| |______| |       | |____________| |
|__________|       |________________|

ARGUMENT FORMATS

Named arguments when number of arguments >= 2.

When the number of arguments to a method call exceeds 1, named arguments are used. The order in which the arguments are specified doesn't matter, but the arguments must be all lower case and preceded by a dash:

use Bio::Phylo::Forest::Tree;

my $tree = Bio::Phylo::Forest::Tree->new(
    -name  => 'PHYLIP_1',
    -score => 123,
);

Type checking

Argument type is always checked. Numbers are checked for being numbers, names are checked for being sane strings, without '():;,'. Objects are checked for type. The only intentional exception is in object constructors, i.e. if you instantiate a node, and use extra arguments in the constructor call:

use Bio::Phylo::Forest::Node;

my $node = Bio::Phylo::Forest::Node->new(
    -name          => 'Node name',
    -branch_length => 0.439
);

These arguments are not checked. You can abuse this to gain a performance advantage, but be careful not to specify garbage.

RETURN VALUES AND EXCEPTIONS

Return values

Apart from scalar variables, all other return values are passed by reference, either as a reference to an object or to an array.

Lists returned as array references

Multiple return values are never returned as a list, always as an array reference:

my $nodes = $tree->get_entities;
print ref $nodes;

#prints ARRAY.

To receive nodes in @nodes, dereference the returned array reference (for clarity, all array dereferencing in this document is indicated by using braces in addition to the sigil):

my @nodes = @{ $tree->get_entities };
Returns self on mutators

Mutator method calls always return the modified object, and so they can be chained:

$node->set_name('Homo_sapiens')->set_branch_length(0.2343);
False but defined return values

When a value requested through an Accessor hasn't been set, the return value is undef. Here you should take care how you test. For example:

if ( ! $node->get_parent ) {
	$root = $node;
}

This works as expected. $node has no parent, hence it must be the root. However:

if ( ! $node->get_branch_length ) {

	# is there really no branch length?
	if ( defined $node->get_branch_length ) {

		# perhaps there is, but of length 0.
	}
}

...warrants caution. Zero is evaluated as false-but-defined.
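One way to keep the three cases apart is to test for definedness first. The helper below is a hypothetical illustration in plain Perl; it takes the value as it would come back from an accessor such as get_branch_length.

```perl
use strict;
use warnings;

# Classify a branch length: unset (undef), zero, or any other number.
sub describe_branch_length {
    my ($length) = @_;
    return 'not set'     if !defined $length;
    return 'zero length' if $length == 0;
    return "length $length";
}

print describe_branch_length(undef), "\n";    # not set
print describe_branch_length(0),     "\n";    # zero length
print describe_branch_length(0.25),  "\n";    # length 0.25
```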

Exceptions

The Bio::Phylo modules throw exceptions that subclass Exception::Class. Exceptions are thrown when something exceptional has happened, not when the value requested through an accessor method is undefined: if a node has no parent, undef is returned. Usually, you will encounter exceptions in response to invalid input.

Trying/Catching exceptions

If some method call throws an exception, wrap the call inside an eval block. The error now becomes non-fatal:

# try something:
eval { $node->set_branch_length('a bad value'); };

# handle exception, if any
if ($@) {
   # do something, e.g.:
   print $@->trace->as_string; # <- $@ is an object!
}
Stack traces

If an exception of a particular type is caught, you can print a stack trace and find out what might have gone wrong, starting from your script and drilling down into the module code.

# exception caught.
if ( UNIVERSAL::isa( $@, 'Bio::Phylo::Util::Exceptions::BadNumber' ) ) {

   # prints stack trace in addition to error
   warn $@->error, "\n", $@->trace->as_string, "\n";

   # further metadata from exception object
   warn join ' ',  $@->euid, $@->egid, $@->uid, $@->gid, $@->pid, $@->time;
   exit;
}
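The overall pattern (try, copy $@, dispatch on the class of the exception, rethrow anything unexpected) can be sketched in plain Perl. The exception class below is a hypothetical stand-in so that the example runs without Bio::Phylo or Exception::Class; set_branch_length and try_set are likewise made-up helper functions, not the library's methods.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in exception class; only the dispatch pattern is the point.
package Toy::Exception::BadNumber;
sub throw { my ( $class, %args ) = @_; die bless {%args}, $class }
sub error { return $_[0]{error} }

package main;

# Hypothetical setter that rejects anything that is not a plain number.
sub set_branch_length {
    my ($value) = @_;
    Toy::Exception::BadNumber->throw( error => "'$value' is not a number" )
        unless $value =~ /^-?\d+(?:\.\d+)?$/;
    return $value;
}

# Returns a short description of what happened.
sub try_set {
    my ($value) = @_;
    my $ok = eval { set_branch_length($value); 1 };
    return 'ok' if $ok;
    my $err = $@;    # copy $@ right away, later code may overwrite it
    if ( blessed($err) && $err->isa('Toy::Exception::BadNumber') ) {
        return 'bad number: ' . $err->error;
    }
    die $err;        # not an exception we know how to handle: rethrow
}

print try_set('0.5'),         "\n";    # ok
print try_set('a bad value'), "\n";    # bad number: 'a bad value' is not a number
```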
Exception types

Several exception classes are defined. The type of the thrown exception should give you a hint as to what might be wrong. The types are specified in the Bio::Phylo::Util::Exceptions perldoc.

TO DO

Below is a list of things that hopefully will be implemented in future versions of Bio::Phylo.

Scripts

Change the utility scripts, add to them.

More DNA sequence methods

Such as $seq->complement;. This would imply larger constant translation tables, including various tables for mtDNA and so on. Will probably be implemented, most likely using BioPerl tools.

Databases

Implement access to TreeBase, Pandit and TolWeb.

Tests

Test coverage is reasonable, but some of the newer features need to be exercised more.

Interoperability with BioPerl and CIPRES

The eventual aim of the Bio::Phylo project is to glue together the phylogenetics aspects of BioPerl (http://www.bioperl.org) and the CIPRES project (http://www.phylo.org). The following does work (in pre-alpha). It is a program that: i) downloads sequences from GenBank; ii) aligns them using Clustal; iii) turns the BioPerl alignment into a Bio::Phylo::Matrices::Matrix object; iv) sends this to PAUP (http://paup.csit.fsu.edu) via CORBA; v) retrieves the inferred phylogeny; vi) displays it using TreeView. Note that this is nowhere near release ready, but to wit:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::Phylo::Taxa;
use Bio::Phylo::Taxa::Taxon;
use Bio::Phylo::Matrices::Datum;
use Bio::Phylo::Matrices::Matrix;
use Bio::Tools::Run::Alignment::Clustalw;
use CipresIDL::Scriptable;
use CipresIDL::LifeCycle;
use CipresIDL::TreeInfer;
use Cipres::Util::Registry;
use COPE::CORBA::ORB;
use CosEventChannelAdmin::ProxyPushConsumer_impl;
use Data::Dumper;

my $DEBUG = 0;

# get sequences from genbank
my $gb = Bio::DB::GenBank->new;
my $query_string = 'Daubentonia madagascariensis[Organism] AND mtDNA';
my $query = Bio::DB::Query::GenBank->new(
    '-query'   => $query_string,
    '-db'      => 'nucleotide'
);
my $seqio  = $gb->get_Stream_by_query( $query );

# array ref for clustalw
my $sequences = [];
while( my $sequence = $seqio->next_seq ) {
    print 'seq length is ', $sequence->length,"\n";
    push @{ $sequences }, $sequence;

}

# Align sequences using clustalw
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(
    'ktuple' => 2, 
    'matrix' => 'BLOSUM',
);
my $alignment = $factory->align( $sequences );

# import alignment in Bio::Phylo
my $matrix = Bio::Phylo::Matrices::Matrix->new;
my $taxa   = Bio::Phylo::Taxa->new;
my $i      = 0; # number of sequences imported so far
foreach my $sequence ( $alignment->each_seq ) {
    my $taxon = Bio::Phylo::Taxa::Taxon->new( 
        '-name' => $sequence->id 
    );
    $taxa->insert( $taxon );
    my $datum = Bio::Phylo::Matrices::Datum->new(
        '-type'  => 'DNA',
        '-name'  => $sequence->id,
        '-char'  => [ split(//, $sequence->seq) ],
        '-taxon' => $taxon,
    );
    $matrix->insert( $datum );
    $taxon->set_data( $datum );
    $i++;
    print $i, "\n" if $DEBUG;
}

# start TreeInfer service
$ENV{'CIPRES_CONFIG_DIR'} = 'C:/cipres/build/CIPRES_winxp/resources';
my $reg = Cipres::Util::Registry->new( 
    'C:/cipres/build/CIPRES_winxp/resources/cipres_registry.xml' 
);
$reg->set_vars( 
    '%%IORFILE'     => 'service.ior',
    '%%CIPRES_ROOT' => 'C:/cipres/build/CIPRES_winxp',
    '%%SYSTEM_ROOT' => 'C:/',
    '%%PYTHON'      => 'C:/Python24/python.exe',
    '%%PAUP_PATH'   => 'C:/phylo/paupWin32/paup.exe',
    '%%PERL'        => 'C:/mod_perl/Perl/bin/perl.exe',
    '%%JAVA'        => 'C:/j2sdk1.4.2_10/bin/java.exe',
);
my $treeinf_service = shift @{ $reg->get_services_by_name('TreeInfer') };
$treeinf_service->start;

# start orb
my $orb = CORBA::ORB_init();
my $tree_inf = $orb->string_to_object( $treeinf_service->get_ior );
$tree_inf = CipresIDL::TreeInfer->_narrow( $tree_inf );
print Dumper( $tree_inf );

# infer tree
$tree_inf->setMatrix( $matrix->to_cipres );
my $proxypusher = CosEventChannelAdmin::ProxyPushConsumer_impl->new;
my $tree = $tree_inf->inferTree( $proxypusher );
print Dumper( $tree );

# open treeview
my $treefile = 'outfile.dnd';
{
    open( my $FH, '>', $treefile ) or die $!;
    print $FH $tree->{'m_newick'};
    close $FH;
}
system('C:\Program Files\Rod Page\TreeView\treev32.exe', $treefile);

FORUM

CPAN hosts a discussion forum for Bio::Phylo. If you have trouble using this module the discussion forum is a good place to start posting questions (NOT bug reports, see below): http://www.cpanforum.com/dist/Bio-Phylo

BUGS

Please report any bugs or feature requests to bug-bio-phylo@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-Phylo. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes. Be sure to include the following in your request or comment, so that I know what version you're using:

$Id: Manual.pod,v 1.15 2006/03/14 02:26:03 rvosa Exp $

AUTHOR

Rutger A. Vos,

email: rvosa@sfu.ca
web page: http://www.sfu.ca/~rvosa/

ACKNOWLEDGEMENTS

The author would like to thank Jason Stajich for many ideas borrowed from BioPerl http://www.bioperl.org, and CIPRES http://www.phylo.org and FAB* http://www.sfu.ca/~fabstar for comments and requests.

COPYRIGHT & LICENSE

Copyright 2005 Rutger A. Vos, All Rights Reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
