NAME Mashtree

SYNOPSIS

Helps run a mashtree analysis to make rapid trees for genomes. Please see github.com/lskatz/Mashtree for more information.

mashtree executables

This document covers the Mashtree library, but the highlight of the Mashtree package is the executable `mashtree`. See github.com/lskatz/Mashtree for more information.

Fast method:

mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

More accurate method:

mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd

Bootstrapping and jackknifing

mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd
mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.jackknife.dnd

VARIABLES

$VERSION
$MASHTREE_VERSION (same value as $VERSION)
@fastqExt = qw(.fastq.gz .fastq .fq .fq.gz)
@fastaExt = qw(.fasta .fna .faa .mfa .fas .fsa .fa)
@bamExt = qw(.sorted.bam .bam)
@vcfExt = qw(.vcf.gz .vcf)
@mshExt = qw(.msh)
@richseqExt = qw(.gb .gbank .genbank .gbk .gbs .gbf .embl .ebl .emb .dat .swiss .sp)
$fhStick :shared

Used to mark whether a file is being read, so that Mashtree limits disk I/O

METHODS

$SIG{'__DIE__'}

Overrides how `die` works, so that the error message references the caller.
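
A minimal sketch of this kind of handler, assuming the goal is to prefix the error with the name of the subroutine that called `die`. The subroutine `risky_step` is hypothetical and exists only to trigger the handler; this is not necessarily the exact handler Mashtree installs.

```perl
use strict;
use warnings;

# Hypothetical handler: prefix the message with the name of the
# subroutine that called die(). Not necessarily Mashtree's exact code.
$SIG{__DIE__} = sub {
  my $error = shift;
  # caller(0) describes this handler; caller(1) describes the sub that called die().
  my $caller = (caller(1))[3] // "main";
  die "$caller: $error";
};

# Hypothetical subroutine used only to trigger the handler.
sub risky_step {
  die "could not read input\n";
}

eval { risky_step() };
print STDERR $@;   # the message now names risky_step
```

Perl disables the `__DIE__` handler while it runs, so calling `die` inside it does not recurse.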

logmsg

Prints a message to STDERR with the thread number and the program name, with a trailing newline.

openFastq

Opens a fastq file in a thread-safe way.

_truncateFilename

Removes the fastq extension and the directory name from a filename.

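
A minimal sketch of this kind of truncation, using the core module File::Basename and the extension list shown for `@fastqExt` above. The helper name `truncate_filename_sketch` is hypothetical and the real method may do more than this.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Extension list copied from @fastqExt in the VARIABLES section.
my @fastqExt = qw(.fastq.gz .fastq .fq .fq.gz);

# Hypothetical helper: strip the directory and any fastq extension.
sub truncate_filename_sketch {
  my ($path) = @_;
  # basename() drops the directory and treats each suffix as a literal string.
  return basename($path, @fastqExt);
}

print truncate_filename_sketch("/data/reads/sample1.fastq.gz"), "\n";
```
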
distancesToPhylip

1. Read the mash distances
2. Create a phylip file

Arguments: hash of distances, output directory, settings hash
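
The second step can be sketched as follows, assuming the distances are held in a nested hash (`$dist{A}{B}`) and returning the PHYLIP text as a string rather than writing a file. The function name, the hash layout, and the example values are illustrative assumptions, not Mashtree's own.

```perl
use strict;
use warnings;

# Hypothetical sketch: turn a nested hash of pairwise distances into
# PHYLIP distance-matrix text. The hash layout is an assumption.
sub distances_to_phylip_text {
  my ($dist) = @_;
  my @name = sort keys %$dist;
  my $text = sprintf("%5d\n", scalar @name);   # header line: number of taxa
  for my $row (@name) {
    my @value;
    for my $col (@name) {
      # A genome is at distance zero from itself.
      my $d = $row eq $col ? 0 : $$dist{$row}{$col};
      die "Missing distance between $row and $col\n" unless defined $d;
      push @value, sprintf("%.6f", $d);
    }
    $text .= join(" ", sprintf("%-10s", $row), @value) . "\n";
  }
  return $text;
}

# Made-up distances for three genomes.
my %dist = (
  A => { B => 0.01, C => 0.03 },
  B => { A => 0.01, C => 0.02 },
  C => { A => 0.03, B => 0.02 },
);
print distances_to_phylip_text(\%dist);
```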

sortNames

Sorts names.

Arguments:

1. $name - array of names
2. $settings - options
   * $$settings{'sort-order'} is either "abc", "random", or "input-order"
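
A minimal sketch of the three sort orders, using the core module List::Util. The function name `sort_names_sketch` and the default of "abc" are assumptions made for this illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical sketch of the three documented sort orders.
sub sort_names_sketch {
  my ($name, $settings) = @_;
  my $order = $$settings{'sort-order'} // 'abc';   # default is an assumption
  my @sorted;
  if ($order eq 'abc') {
    @sorted = sort { $a cmp $b } @$name;   # alphabetical
  } elsif ($order eq 'random') {
    @sorted = shuffle(@$name);             # random permutation
  } elsif ($order eq 'input-order') {
    @sorted = @$name;                      # leave as given
  } else {
    die "Unknown sort-order: $order\n";
  }
  return @sorted;
}

my @name = qw(c a b);
print join(" ", sort_names_sketch(\@name, { 'sort-order' => 'abc' })), "\n";
```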

createTreeFromPhylip($phylip, $outdir, $settings)

Creates a tree file with Quicktree, with BioPerl as a backup.

treeDist($treeObj1, $treeObj2)

Lee's implementation of a tree distance. The objective is to return zero if two trees are the same.

mashDist($file1, $file2, $k, $settings)

Finds the distance between two mash sketch files or, alternatively, between two hash lists.
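
For orientation, the sketch below converts a Jaccard estimate into a distance using the formula from the Mash publication, D = -1/k * ln(2j / (1 + j)). It assumes this method reports that standard Mash distance; the helper name `mash_distance_from_jaccard` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Mash distance from a Jaccard estimate $j and
# k-mer length $k, using D = -1/k * ln(2j / (1 + j)).
sub mash_distance_from_jaccard {
  my ($j, $k) = @_;
  return 1 if $j <= 0;   # no shared hashes: maximum distance
  return 0 if $j >= 1;   # identical sketches
  my $d = -1 / $k * log(2 * $j / (1 + $j));
  return $d > 1 ? 1 : $d;
}

# Half of the compared hashes are shared, with 21-mers.
printf("%.6f\n", mash_distance_from_jaccard(0.5, 21));
```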

mashHashes($sketch)

Returns an array of hashes, the kmer length, and the estimated genome length.

raw_mash_distance_unequal_sizes($hashes1, $hashes2)

Compare unequal sized hashes. Treat the first set of hashes as the reference (denominator) set.
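
One way to read "reference (denominator) set" is sketched below: count how many reference hashes also occur in the second set, and divide by the size of the reference set. The function name and the return values are assumptions; the actual method may count differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: count the reference hashes that also occur in the
# second set, with the reference set size as the denominator.
sub containment_counts_sketch {
  my ($ref_hashes, $query_hashes) = @_;
  my %in_query = map { $_ => 1 } @$query_hashes;
  my $shared   = grep { $in_query{$_} } @$ref_hashes;   # count in scalar context
  return ($shared, scalar @$ref_hashes);
}

# Made-up hash values: a 4-hash reference against a 6-hash query.
my ($shared, $ref_size) = containment_counts_sketch([1, 3, 5, 7], [3, 5, 8, 9, 11, 12]);
print "$shared/$ref_size\n";
```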

raw_mash_distance($hashes1, $hashes2)

Returns the number of kmers in common and the total number compared. Inspired by https://github.com/onecodex/finch-rs/blob/master/src/distance.rs#L34
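
A merge-style comparison of two sorted hash lists can be sketched as below. This is an illustration under assumptions (the function name, and capping the total at the size of the first list), not a copy of the Mashtree or finch-rs code.

```perl
use strict;
use warnings;

# Hypothetical sketch of a merge-style comparison of two hash lists.
# Returns (shared hashes, hashes compared), capped at the size of the first list.
sub raw_counts_sketch {
  my ($hashes1, $hashes2) = @_;
  my @s1 = sort { $a <=> $b } @$hashes1;
  my @s2 = sort { $a <=> $b } @$hashes2;
  my $limit = scalar @s1;
  my ($i, $j, $common, $total) = (0, 0, 0, 0);
  while ($total < $limit && $i < @s1 && $j < @s2) {
    my $cmp = $s1[$i] <=> $s2[$j];
    if    ($cmp < 0) { $i++ }                    # hash only in the first list
    elsif ($cmp > 0) { $j++ }                    # hash only in the second list
    else             { $i++; $j++; $common++ }   # hash in both lists
    $total++;
  }
  # If one list ran out early, leftover hashes still count, up to the cap.
  $total += (@s1 - $i) + (@s2 - $j);
  $total = $limit if $total > $limit;
  return ($common, $total);
}

# Made-up hash values.
my ($common, $total) = raw_counts_sketch([1, 3, 5, 7], [3, 4, 5, 8]);
print "shared $common of $total\n";
```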

transfer_bootstrap_expectation

Title   : transfer_bootstrap_expectation
Usage   : my $tree_with_bs = transfer_bootstrap_expectation(\@bs_trees,$guide_tree);
Function: Calculates the Transfer Bootstrap Expectation (TBE) for internal nodes based on 
          the methods outlined in Lemoine et al, Nature, 2018.
          Currently experimental.
Returns : L<Bio::Tree::TreeI>
Args    : Arrayref of L<Bio::Tree::TreeI>s
          Guide tree, a L<Bio::Tree::TreeI>
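
The idea behind TBE can be sketched on bipartitions alone. For a reference branch whose smaller side holds p taxa, the support is 1 minus the mean transfer index over the bootstrap trees, divided by (p - 1). The sketch below makes several assumptions: each branch is a hypothetical string of 0/1 flags (one per taxon, in a fixed order), each bootstrap tree is given as a list of its branches (only internal ones are listed here for brevity), and the function names are made up. It does not use BioPerl tree objects and is not the implementation behind this method.

```perl
use strict;
use warnings;
use List::Util qw(min sum);

# Transfer distance: fewest taxa to move so that two bipartitions match.
sub transfer_distance {
  my ($b1, $b2) = @_;
  my $n    = length $b1;
  my $diff = grep { substr($b1, $_, 1) ne substr($b2, $_, 1) } 0 .. $n - 1;
  return min($diff, $n - $diff);   # the two sides of a branch are interchangeable
}

# TBE support for one reference branch, given the branches of each bootstrap tree.
sub tbe_support {
  my ($ref_branch, $bootstrap_trees) = @_;
  my $n    = length $ref_branch;
  my $ones = ($ref_branch =~ tr/1//);   # count the 1 flags
  my $p    = min($ones, $n - $ones);    # size of the smaller side
  die "TBE is undefined for terminal branches\n" if $p < 2;
  my @index;
  for my $tree (@$bootstrap_trees) {
    # Transfer index: distance to the closest branch of this bootstrap tree.
    push @index, min(map { transfer_distance($ref_branch, $_) } @$tree);
  }
  return 1 - (sum(@index) / @index) / ($p - 1);
}

# Two made-up bootstrap trees on six taxa.
my @bootstrap_trees = (
  [qw(110000 111000 000011)],   # contains the reference branch exactly
  [qw(011000 111000 000011)],   # closest branch is one taxon away
);
printf("%.2f\n", tbe_support("110000", \@bootstrap_trees));
```

With these inputs the transfer indices are 0 and 1, so the support for the branch is 0.50.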