NAME
Bio::Phylo::Forest::DBTree - Phylogenetic database as a tree object
SYNOPSIS
use Bio::Phylo::Forest::DBTree;
# connect to the Green Genes tree
my $file = 'gg_13_5_otus_99_annotated.db';
my $dbtree = Bio::Phylo::Forest::DBTree->connect($file);
# $dbtree can be used as a Bio::Phylo::Forest::Tree object,
# and the node objects that are returned can be used as
# Bio::Phylo::Forest::Node objects
my $root = $dbtree->get_root;
DESCRIPTION
This package makes it possible to handle very large phylogenies (for example the NCBI taxonomy or the Green Genes tree) as if they were Bio::Phylo tree objects, with all the possibilities for traversal, computation, serialization, and visualization, while the data are stored in a SQLite database. Each database is a single file, so it can be shared easily. Some useful database files are available here: https://figshare.com/account/home#/projects/18808
To make new tree databases, a number of scripts are provided with the distribution of this package:
megatree-loader
Loads a very large Newick tree into a database.
megatree-ncbi-loader
Loads the NCBI taxonomy dump into a database.
megatree-phylotree-loader
Loads a tree in the format of http://phylotree.org into a database.
As an example of interacting with a database tree, the script megatree-pruner can be used to extract subtrees from a database.
DATABASE METHODS
The following methods deal with the database as a whole: creating a new database, connecting to an existing one, persisting a tree in a database and extracting one as a mutable, in-memory object.
create()
Creates a SQLite database file in the provided location. Usage:
use Bio::Phylo::Forest::DBTree;
# second argument is optional
Bio::Phylo::Forest::DBTree->create( $file, '/opt/local/bin/sqlite3' );
The first argument is the location where the database file is going to be created. The second argument is optional and gives the location of the sqlite3 executable that is used to create the database. By default, sqlite3 is simply looked up on the $PATH, but if it is installed in a non-standard location, that location can be provided here. The database schema that is created corresponds to the following SQL statements:
create table node(
    id int not null,
    parent int,
    left int,
    right int,
    name varchar(20),
    length float,
    height float,
    primary key(id)
);
create index parent_idx on node(parent);
create index left_idx on node(left);
create index right_idx on node(right);
create index name_idx on node(name);
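The schema does not spell out what the left and right columns hold. One common use for such a pair is nested-set indexing: a counter is incremented on the way into a node (left) and again on the way out (right), so that every descendant's interval lies strictly inside its ancestor's. The pure-Perl sketch below assumes that interpretation, which this documentation does not confirm. The toy tree, the pre-order id assignment, and the helper names index_node and descendants_of are all hypothetical, and the height column is left out because its definition is not given here.

```perl
use strict;
use warnings;

# Hypothetical toy tree ((A:1,B:2)AB:0.5,C:3)root; as nested hashes.
my $tree = {
    name => 'root', length => 0, children => [
        { name => 'AB', length => 0.5, children => [
            { name => 'A', length => 1, children => [] },
            { name => 'B', length => 2, children => [] },
        ] },
        { name => 'C', length => 3, children => [] },
    ],
};

my @rows;         # one hash per node, keyed like the node table columns
my $next_id = 1;  # ASSUMPTION: ids are handed out in pre-order, starting at 1
my $counter = 0;  # running counter shared by left and right

# Depth-first traversal: set left on entry, right on exit.
sub index_node {
    my ( $node, $parent_id ) = @_;
    my $row = {
        id     => $next_id++,
        parent => $parent_id,
        name   => $node->{name},
        length => $node->{length},
        left   => ++$counter,
    };
    push @rows, $row;
    index_node( $_, $row->{id} ) for @{ $node->{children} };
    $row->{right} = ++$counter;
    return $row;
}

index_node( $tree, undef );

# Descendants are the rows whose interval lies strictly inside the focal one.
sub descendants_of {
    my ($name) = @_;
    my ($focal) = grep { $_->{name} eq $name } @rows;
    return map  { $_->{name} }
           grep { $_->{left} > $focal->{left} && $_->{right} < $focal->{right} }
           @rows;
}

printf "%-4s id=%d left=%d right=%d\n", @{$_}{qw(name id left right)} for @rows;
print join( ',', descendants_of('AB') ), "\n";
```

Under this assumption, a subtree lookup needs only a range comparison on two indexed columns rather than a recursive walk over parent links, which would explain why the schema indexes both left and right.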
connect()
Connects to a SQLite database file and returns the connection as a Bio::Phylo::Forest::DBTree object. Usage:
use Bio::Phylo::Forest::DBTree;
my $dbtree = Bio::Phylo::Forest::DBTree->connect($file);
The argument is a file name. If the file exists, a handle to that database is returned. If the file does not exist, a new database is created in that location, and the handle to that newly created database is returned. The creation of the database is handled by the create() method (see above).
persist()
Persists a phylogenetic tree object (a subclass of Bio::Phylo::Forest::Tree) into a newly created database file. Usage:
use Bio::Phylo::Forest::DBTree;
my $dbtree = Bio::Phylo::Forest::DBTree->persist(
    -file => $file,
    -tree => $tree,
);
This method first creates a database at the location specified by $file by making a call to the create() method. Subsequently, the $tree object is traversed from root to tips and inserted in the newly created database. Finally, the handle to this database is returned, i.e. a Bio::Phylo::Forest::DBTree object.
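A minimal sketch of a full round trip might look as follows. It assumes that Bio::Phylo and this package are installed, that a sqlite3 executable is on the $PATH (create() needs one), and that the tree comes from the parse_tree function of Bio::Phylo::IO. The Newick string and the file name toy.db are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp 'tempdir';
use Bio::Phylo::IO 'parse_tree';
use Bio::Phylo::Forest::DBTree;

# Hypothetical toy tree; any Bio::Phylo::Forest::Tree object would do.
my $tree = parse_tree(
    -format => 'newick',
    -string => '((A:1,B:2)AB:0.5,C:3)root;',
);

# persist() creates the database itself, so use a path that does not exist yet.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'toy.db' );

my $dbtree = Bio::Phylo::Forest::DBTree->persist(
    -file => $file,
    -tree => $tree,
);

# The returned object answers the tree methods documented in this file.
my $node = $dbtree->get_by_name('A');
print defined $node ? "found tip A\n" : "tip A not found\n";
```

Writing into a fresh temporary directory keeps the sketch from touching an existing database file.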
extract()
Extracts a tree from a database. The returned tree is an in-memory object. Hence, this is an expensive operation that is best avoided as much as possible. Usage:
my $tree = $dbtree->extract;
dbh()
Returns the underlying handle through which SQL statements can be executed directly on the database. This is a DBD::SQLite object. Usage:
my $dbh = $dbtree->dbh;
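To illustrate the kind of SQL that could be run through such a handle, the sketch below stands in for $dbtree->dbh with a plain in-memory DBI connection, so that it runs without an existing tree database. It requires DBI and DBD::SQLite. The rows are hypothetical, and two things are assumptions rather than documented behavior: that parent holds the id of the parent node, and that the root's parent is NULL. The column names left and right are double-quoted here only as a precaution, because they are also SQL keywords.

```perl
use strict;
use warnings;
use DBI;

# Stand-in for $dbtree->dbh (assumption, for illustration only): an in-memory
# SQLite database holding the node table from the create() section.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 } );
$dbh->do(q{
    create table node(
        id int not null, parent int, "left" int, "right" int,
        name varchar(20), length float, height float,
        primary key(id)
    )
});

# Hypothetical rows for ((A:1,B:2)AB:0.5,C:3)root;
my $insert = $dbh->prepare(
    'insert into node(id, parent, name, length) values (?, ?, ?, ?)');
$insert->execute(@$_) for (
    [ 1, undef, 'root', 0 ],
    [ 2, 1,     'AB',   0.5 ],
    [ 3, 2,     'A',    1 ],
    [ 4, 2,     'B',    2 ],
    [ 5, 1,     'C',    3 ],
);

# Total number of nodes in the table.
my ($node_count) = $dbh->selectrow_array('select count(*) from node');

# Names of the direct children of the node called AB.
my $children = $dbh->selectcol_arrayref(
    'select c.name from node c join node p on c.parent = p.id '
  . 'where p.name = ? order by c.id',
    undef, 'AB',
);

print "nodes: $node_count\n";
print "children of AB: @$children\n";
```

Against a real tree database, the same statements would be issued on the handle returned by dbh(), provided the stored values follow the assumptions above.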
TREE METHODS
The following methods override methods of the same name in the Bio::Phylo hierarchy so that the tree database is accessed more efficiently than would otherwise be the case.
get_root()
Returns the root of the tree, i.e. a Bio::Phylo::Forest::DBTree::Result::Node object, which is a subclass of Bio::Phylo::Forest::Node. Usage:
my $root = $dbtree->get_root;
get_id()
Returns a dummy ID, an integer. Usage:
my $id = $dbtree->get_id;
get_by_name()
Returns the first node object that has the provided name. Usage:
my $node = $dbtree->get_by_name( 'Homo sapiens' );
visit()
Given a code reference, visits all the nodes in the tree and executes the code on each one, passing the focal node as the first argument. Usage:
$dbtree->visit(sub{
    my $node = shift;
    print $node->name, "\n";
});