NAME

Bio::Phylo - A base module for analyzing and manipulating phylogenetic trees.

SYNOPSIS

use Bio::Phylo;

#instantiate a new object
my $phylo = new Bio::Phylo;

#and destroy it
$phylo->DESTROY;

DESCRIPTION

INTRODUCTION

The Bio::Phylo package consists of a collection of Perl libraries (OO-style) for parsing, generating and analyzing phylogenetic trees. The intended audience are biologists who are well versed in phylogenetic theory and comfortable with text editors. The simplest usage of Bio::Phylo would be for file conversion, in which case only a few lines of code would suffice. However, the libraries offer more, and in order to make these features more easily accessible I should start with a description of how Bio::Phylo sees trees, taxa, and matrices.

THE Bio::Phylo OBJECT MODEL

TREES

According to Bio::Phylo, there are Trees (which are modelled by the Bio::Phylo::Trees object), which contain Bio::Phylo::Trees::Tree objects, which contain Bio::Phylo::Trees::Node objects.

The Bio::Phylo::Trees::Node object

A node 'knows' a couple of things: its name, its branch length (i.e. the length of the branch connecting it and its parent), who its parent is, its next sister (on its right), its previous sister (on the left), its first daughter and its last daughter. These properties can be retrieved and modified by methods classified as ACCESSORS and MUTATORS.

From this set of properties follows a number of things which must be either true or false. For example, if a node has no children it is a terminal node. By asking a node whether it "is_terminal", it replies either with true (i.e. 1) or false (undef). Methods such as this are classified as TESTS.

Likewise, based on the properties of an individual node we can perform a query to retrieve nodes related to it. For example, by asking the node to "get_ancestors" it returns a list of its ancestors, being all the nodes and the path from its parent to, and including, the root. These methods are QUERIES.

Lastly, some CALCULATIONS can be performed by the node. By asking the node to "calc_path_to_root" it calculates the sum of the lengths of the branches connecting it and the root. Of course, in order to make all this possible, a node has to exist, so it needs to be constructed. The CONSTRUCTOR is the Bio::Phylo::Node->new() method.

Once a node has served its purpose it can be destroyed. For this purpose there is a DESTRUCTOR, which cleans up once we're done with the node. However, in most cases you don't have to worry about constructing and destroying nodes as this is done for you by a parser or a generator as needs arise.

For a detailed description of all the node methods, their arguments and return values, consult the node documentation, which, after install, can be viewed by issuing the "perldoc Bio::Phylo::Trees::Node" command.

The Bio::Phylo::Trees::Tree object

A tree knows very little. All it really holds is a set of nodes, which are there because of TREE POPULATION, i.e. the process of inserting nodes in the tree. The tree can be queried in a number of ways, for example, we can ask the tree to "get_entities", to which the tree replies with a list of all the nodes it holds. Be advised that this doesn't mean that the nodes are connected in a meaningful way, if at all. The tree doesn't care, the nodes are supposed to know who their parents, sisters, and daughters are. But, we can still get, for example, all the terminal nodes (i.e. the tips) in the tree by retrieving all the nodes in the tree and asking each one of them whether it "is_terminal", discarding the ones that aren't.

Based on the set of nodes the tree holds it can perform calculations, such as "calc_tree_length", which simply means that the tree iterates over all its nodes, summing their branch lengths, and returning the total.

The tree object also has a constructor and a destructor, but normally you don't have to worry about that. All the tree methods can be viewed by issuing the "perldoc Bio::Phylo::Trees::Tree" command.

The Bio::Phylo::Trees object

The object containing all others is the Trees object. It serves merely as a container to hold multiple trees, which are inserted in the Trees object using the "insert()" method, and retrieved using the "get_entities" method. More information can be found in the Bio::Phylo::Trees perldoc page.

CREATING NODES AND TREES

The Bio::Phylo::Parsers::* objects

Trees are probably most easily imported from files. To this end Bio::Phylo::Parsers objects are available. The constructor and destructor aside, these have only one method intended for outside access: "parse", with arguments indicating the tree format, and the location of the tree file.

The Bio::Phylo::Generator object

For simulations you can also generate trees. The currently available models are Yule, Hey and equiprobable. Consult the perldoc pages for Bio::Phylo::Generator to learn more about how to address this object.

USEFUL FEATURES

The following features are of particular interest:

Stemminess and Balance measures

A number of analysis methods heretofore unavailable are included in this package, such as calculation of two stemminess indices (Fiala et al, 1985; Rohlf et al., 1990).

Filters

Sets of objects can be filtered based on the results of any calculation available in this package. For example: Trees can be filtered on tree length, or imbalance, and so on. Nodes can be filtered based on their distance to the root, or to the tips, and so on. In addition, sets of objects can be filtered based on the string results of any method. For example, on a tree that contains species from the genera Lemur, Hapalemur, Eulemur and Otolemur these can all be filtered out by searching on the /^.*lemur$/i pattern.

Converters

The Phylo packages includes a number of parsers and unparsers, and additional ones can be included simply by writing the appropriate package and dropping the file in the Parsers or Unparsers folder. No modification of any of the other source is required, as long as a very limited set of methods is supported. With the currently packaged parsers, one can for example import taxa from one file, a tree from another, and a data matrix from a third file, then crossreference the three, and output them in a format suitable for Discrete, Continuous or Multistate. (The tree is resolved on the fly.)

REQUIREMENTS

Phylo has the following requirements:

A recent version of perl (5.6.* or 5.8.*);

The module should then build on all platforms. A quick test yielded success on all platforms that I tried:

- perl, v5.8.4 built for MSWin32-x86-multi-thread
- perl, v5.8.6 built for cygwin-thread-multi-64int
- perl, v5.8.0 built for darwin
- perl, v5.8.0 built for sun4-solaris
- perl, v5.6.1 built for i386-linux
- perl, v5.8.1-RC3 built for darwin-thread-multi-2level

Older versions of perl5 may or may not work. Perl4 definitely won't work.

Any version of the Math::Random module for generating Yule and Hey trees.

Math::Random can be installed from the comprehensive perl archive network by issuing:

perl -MCPAN -e 'install Math::Random'

or, on Windows:

ppm
install Math::Random

from the command line.

Any version of the SVG module for drawing trees.

SVG.pm can be installed from the comprehensive perl archive network by issuing:

perl -MCPAN -e 'install SVG'

or, on Windows:

ppm
install SVG

from the command line.

METHODS

CONSTRUCTOR

new()

The Bio::Phylo object itself, and thus its constructor, is rarely, if ever, used directly. Rather, all other objects in this package inherit its methods.

Type    : Constructor
Title   : new
Usage   : my $phylo = new Bio::Phylo;
Function: Instantiates Bio::Phylo object
Returns : a Bio::Phylo object
Args    : none

PACKAGE METHODS

get()

All objects in the package subclass the Bio::Phylo object, and so, for example, you can do $node->get('get_branch_length'); instead of $node->get_branch_length. This is a useful feature for listable objects especially, as the have the get_by_value method, which allows you to retrieve, for instance, a list of nodes whose branch length exceeds a certain value. That method (and get_by_regular_expression) uses this $obj->get method.

Type    : Accessor
Title   : get
Usage   : my $treelength = $tree->get('calc_tree_length');
Function: Alternative syntax for safely accessing any of the object data;
          useful for interpolating runtime $vars.
Returns : A SCALAR numerical value.
Args    : a SCALAR variable, e.g. $var = 'calc_matrix_size';
VERBOSE()

Getter and setter for the verbose level. Currently it's just 0=no messages, 1=messages, but perhaps there could be more levels? For caller diagnostics and so on?

Type    : Accessor
Title   : VERBOSE(0|1)
Usage   : Phylo->VERBOSE(0|1)
Function: Sets/gets verbose level
Alias   :
Returns : Verbose level
Args    : 0=no messages; 1=error messages
Comments:
CITATION()
Type    : Accessor
Title   : CITATION
Usage   : $phylo->CITATION;
Function: Returns suggested citation.
Alias   :
Returns : Returns suggested citation.
Args    : None
Comments:
COMPLAIN()
Type    : Internal method
Title   : COMPLAIN
Usage   : $phylo->COMPLAIN("error");
Function: Prints error message to STDERR if verbose level > 0
Alias   :
Returns : TRUE
Args    : String, error message
Comments:
VERSION()
Type    : Accessor
Title   : VERSION
Usage   : $phylo->VERSION;
Function: Returns version number (including revision number).
Alias   :
Returns : SCALAR
Args    : NONE
Comments:

DESTRUCTOR

DESTROY()

The destructor doesn't actually do anything yet, but it may be used, in the future, for additional debugging messages.

Type    : Destructor
Title   : DESTROY
Usage   : $phylo->DESTROY
Function: Destroys Phylo object
Alias   :
Returns : TRUE
Args    : none
Comments: You don't really need this, perl takes care of memory
          management and garbage collection.

AUTHOR

Rutger Vos, <rvosa@sfu.ca> http://www.sfu.ca/~rvosa/

BUGS

Please report any bugs or feature requests to bug-bio-phylo@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-Phylo. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

ACKNOWLEDGEMENTS

The author would like to thank Jason Stajich for many ideas borrowed from BioPerl http://www.bioperl.org, and CIPRES http://www.phylo.org and FAB* http://www.sfu.ca/~fabstar for comments and requests.

COPYRIGHT & LICENSE

Copyright 2005 Rutger Vos, All Rights Reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.