NAME

megatree-pruner - Given a list of taxa, returns subtree from a database

SYNOPSIS

megatree-pruner -d <file> (-i <file> | -l <list>) [-trvhm]

OPTIONS

-d <file> or -dbfile <file>

Location of a database file, compatible with sqlite3, that has been produced by one of the megatree-*-loader scripts.

-i <file> or -infile <file>

Input file containing a list of taxon names that occur in the tree, one name per line. These are the taxa that are retained in the subtree that is produced. The alternative is the -l option, which takes a comma-separated list of names on the command line.

-l <list> or -list <list>

Input list of taxon names that occur in the tree, comma-separated, given on the command line. These are the taxa that are retained in the subtree that is produced. The alternative is the -i option, which takes an input file containing the names, one per line.

-t or -tabular

Optional.

With this option, instead of producing a Newick-formatted tree description (which is the default), a tab-separated table that describes the tree is produced.

-r or -relabel

Optional.

With this option, internal nodes are relabeled in the output such that they become names of the format nXXX, where XXX is the primary key (i.e. an integer ID) of the node in the database.

-v or -verbose

Optional.

With this option, more feedback messages are written during processing. This option can be used multiple times, which increases the verbosity further.

-h or -help

Optional.

Prints help message / documentation.

-m or -man

Optional.

Prints manual page. Additional information is available in the documentation, i.e. via perldoc megatree-pruner.

DESCRIPTION

This program extracts, from a previously produced database, the subtree that spans the set of taxa provided as input. As such, the functionality is roughly similar to extracting the 'common tree' from the NCBI taxonomy, which is a common operation in phylogenomics and related fields.
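
To make the idea of a subtree spanning a set of taxa concrete, the following is a minimal Python sketch, not the program's actual implementation. It assumes the tree is held in memory as a hypothetical child-to-parent dictionary rather than in the sqlite3 database. It also assumes that internal nodes left with a single child are collapsed; whether megatree-pruner does this is not stated here.

```python
def prune(parent, keep):
    """Return the subtree spanning the tips in `keep` as nested (name, children) tuples.

    `parent` maps every node name to its parent name (None for the root).
    Both the data layout and the collapsing rule are illustrative assumptions.
    """
    # 1. Mark every node on a path from a retained tip up to the root.
    visited = set()
    for tip in keep:
        node = tip
        while node is not None and node not in visited:
            visited.add(node)
            node = parent.get(node)

    # 2. Record parent -> children links, restricted to the marked nodes.
    children = {}
    root = None
    for node in visited:
        p = parent.get(node)
        if p is None:
            root = node
        else:
            children.setdefault(p, []).append(node)

    # 3. Build the result top-down, skipping unbranched internal nodes.
    #    Recursive for brevity; a very deep tree would need an iterative walk.
    def build(node):
        while node not in keep and len(children.get(node, [])) == 1:
            node = children[node][0]
        return (node, [build(c) for c in sorted(children.get(node, []))])

    return build(root)


# Hypothetical toy tree: ((A,B)n2,C)n1 and (D,E)n3 below a common root.
parent = {
    "root": None,
    "n1": "root", "n2": "n1", "A": "n2", "B": "n2", "C": "n1",
    "n3": "root", "D": "n3", "E": "n3",
}
print(prune(parent, {"A", "B"}))
```

In this toy case the result is rooted at n2, the most recent common ancestor of A and B, because the unbranched path root, n1 above it is skipped.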

The input names can be provided either as a text file (the -i argument) or as a comma-separated list on the command line (the -l argument). The names must match those in the database exactly, i.e. there is no fuzzy matching. Any names not found in the database are skipped, and a warning message is emitted for each.
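
The name handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: the function names are hypothetical, the database is stood in for by a plain set of known names, and the trimming of surrounding whitespace is an assumption about behavior that is not documented here.

```python
import sys


def parse_names(infile_text=None, name_list=None):
    """Collect taxon names from file contents (one per line) or a comma-separated string."""
    if infile_text is not None:
        raw = infile_text.splitlines()
    elif name_list is not None:
        raw = name_list.split(",")
    else:
        raise ValueError("provide either infile_text or name_list")
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [name.strip() for name in raw if name.strip()]


def match_names(requested, known):
    """Keep names that match exactly (case-sensitive); warn about and skip the rest."""
    found = []
    for name in requested:
        if name in known:
            found.append(name)
        else:
            print(f"WARNING: '{name}' not found in database, skipping", file=sys.stderr)
    return found


# Stand-in for the names stored in the database.
known = {"Homo sapiens", "Pan troglodytes", "Gorilla gorilla"}
requested = parse_names(name_list="Homo sapiens,Pan troglodytes,homo sapiens")
print(match_names(requested, known))
```

Here the lower-case "homo sapiens" is reported on standard error and dropped, since exact matching treats it as a different name.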

The output that is produced is either a Newick-formatted tree string, or a tab-separated table (which is recognized and readable by Bio::Phylo as the adjacency format).
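
The two output styles can be illustrated with a short Python sketch. It is a simplified approximation only: branch lengths are omitted, names are assumed to need no Newick quoting, and the column headers "child" and "parent" are an assumption about the table layout rather than a statement of what the program writes.

```python
def to_newick(tree):
    """Serialize a nested (name, children) tuple as a Newick string without the final ';'."""
    name, kids = tree
    if not kids:
        return name
    return "(" + ",".join(to_newick(k) for k in kids) + ")" + name


def to_table(tree):
    """Serialize the same tree as tab-separated child/parent rows (root has an empty parent)."""
    rows = ["child\tparent"]

    def walk(node, parent_name):
        name, kids = node
        rows.append(f"{name}\t{parent_name}")
        for k in kids:
            walk(k, name)

    walk(tree, "")
    return "\n".join(rows)


# Hypothetical three-taxon tree with labeled internal nodes.
tree = ("n1", [("A", []), ("n2", [("B", []), ("C", [])])])
print(to_newick(tree) + ";")
print(to_table(tree))
```

The Newick form is compact and widely supported, while the table form keeps one node per row, which is easier to filter or join with other tabular data.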