NAME

Mashtree::Db - functions for Mashtree databasing

SYNOPSIS

use strict;
use warnings;
use Mashtree::Db;

my $dbFile = "mashtree.tsv";
my $db=Mashtree::Db->new($dbFile);

# Add 10 distances from genome "test" to other genomes
my %distHash;
for(my $dist=0;$dist<10;$dist++){
  my $otherGenome = "genome" . $dist;
  $distHash{"test"}{$otherGenome} = $dist;
}
$db->addDistancesFromHash(\%distHash);

my $firstDistance = $db->findDistance("test", "genome0");
# => 0

DESCRIPTION

This helper module is how Mashtree reads from and writes to its internal database. It is usually not used directly.

METHODS

Mashtree::Db->new($dbFile, \%settings)

Create a new Mashtree::Db object.

The database file is a tab-separated file and will be created if it doesn't exist. If it does exist, then it will be read into memory.

Arguments:

* $dbFile - a file path
* $settings - a hash of key/values (currently unused)

$db->selectDb

Selects a database. If it doesn't exist, then it will be created. Then, it sets the object property `dbFile` to the file path.

$db->readDatabase

Reads the database from the `dbFile` set by `selectDb`. Returns a hash of distances, e.g., genome1 => {genome2=>dist}.

Then, this hash of distances is set in the object property `cache`.
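The create-or-read behavior described for `new`, `selectDb`, and `readDatabase` can be sketched in standalone Perl. This is a minimal illustration, not the module's implementation: the helper name `openOrCreate` is hypothetical, and the on-disk layout (one `genome1`, `genome2`, `distance` row per line, tab-separated) is an assumption based on the 3-column `tsv` format listed under `toString`.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper, not part of Mashtree::Db.
# Creates $path if it is missing; otherwise reads it into a nested hash.
# ASSUMPTION: rows are tab-separated "genome1 genome2 distance".
sub openOrCreate {
  my ($path) = @_;
  my %cache;

  if (!-e $path) {
    # New database: create an empty file and return an empty cache
    open(my $fh, ">", $path) or die "ERROR: could not create $path: $!";
    close $fh;
    return \%cache;
  }

  open(my $fh, "<", $path) or die "ERROR: could not read $path: $!";
  while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;          # skip blank lines
    my ($g1, $g2, $d) = split /\t/, $line;
    next if !defined $d;               # skip malformed rows
    $cache{$g1}{$g2} = $d;
  }
  close $fh;
  return \%cache;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/mashtree.tsv";

# First call: the file does not exist yet, so it is created
my $empty = openOrCreate($path);

# Append one row by hand, then read it back
open(my $out, ">>", $path) or die "ERROR: could not append to $path: $!";
print $out join("\t", "genome1", "genome2", 0.059), "\n";
close $out;

my $cache = openOrCreate($path);
print $cache->{genome1}{genome2}, "\n";
```

The nested hash returned here has the same genome1 => {genome2=>dist} shape that `readDatabase` is documented to return.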

$db->addDistancesFromHash

Add distances from a Perl hash, $distHash. The format of $distHash is { $genome1 => {$genome2 => $dist} }.

$db->addDistances

Add distances from a TSV file. The TSV file should be a mash distances file in this format, e.g.:

# query t/lambda/sample1.fastq.gz
t/lambda/sample2.fastq.gz 0.059
t/lambda/sample3.fastq.gz 0.061
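To make the mash distances format above concrete, here is a standalone sketch that parses such text into the nested hash shape used elsewhere in this document. It is not the module's own parser: `parseMashDistances` is a hypothetical name, and treating each `# query` line as the start of a new block of subject/distance rows is an assumption drawn from the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mashtree::Db.
# Returns { query => { subject => distance } }.
sub parseMashDistances {
  my ($text) = @_;
  my %dist;
  my $query;

  for my $line (split /\n/, $text) {
    $line =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
    next if $line eq "";

    # ASSUMPTION: a "# query <name>" line starts a new block
    if ($line =~ /^#\s*query\s+(\S+)/) {
      $query = $1;
      next;
    }

    next if !defined $query;           # ignore rows before any header
    my ($subject, $d) = split /\s+/, $line;
    next if !defined $d;               # ignore malformed rows
    $dist{$query}{$subject} = $d;
  }
  return \%dist;
}

my $text = join("\n",
  "# query t/lambda/sample1.fastq.gz",
  "t/lambda/sample2.fastq.gz\t0.059",
  "t/lambda/sample3.fastq.gz\t0.061",
) . "\n";

my $dist = parseMashDistances($text);
for my $subject (sort keys %{ $dist->{"t/lambda/sample1.fastq.gz"} }) {
  print "$subject\t$dist->{'t/lambda/sample1.fastq.gz'}{$subject}\n";
}
```

A hash built this way has the same shape that `addDistancesFromHash` is documented to accept.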

$db->findDistance

Find the distance between any two genomes. Return undef if not found.
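The lookup can be pictured as a query against the nested hash of distances. The sketch below is an illustration only: `lookupDistance` and `%cache` are hypothetical stand-ins, and checking both orientations of the genome pair is an assumption, not confirmed behavior of `findDistance`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the distances held by the object
my %cache = (
  test => { genome0 => 0, genome1 => 1 },
);

# Hypothetical helper, not part of Mashtree::Db.
# Returns the distance, or undef if the pair is not present.
sub lookupDistance {
  my ($cache, $g1, $g2) = @_;

  # Test the outer key first so the lookup does not autovivify it
  return $cache->{$g1}{$g2}
    if exists $cache->{$g1} && exists $cache->{$g1}{$g2};

  # ASSUMPTION: a distance stored as g2 => g1 also answers g1 => g2
  return $cache->{$g2}{$g1}
    if exists $cache->{$g2} && exists $cache->{$g2}{$g1};

  return;   # undef in scalar context
}

my $d = lookupDistance(\%cache, "genome1", "test");
print defined($d) ? "$d\n" : "not found\n";
```

Because 0 is a valid distance, callers should test the result with `defined` and not for truthiness.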

$db->findDistances

Find all distances from one genome to all others. Return undef if not found.

$db->toString

Turn the database into a string representation.

Arguments:

* genomeArray - list of genomes to include, or undef for all genomes
* format - can be a string of one of these values:
  * tsv - 3-column format (default)
  * matrix - all-vs-all tsv format
  * phylip - Phylip matrix format
* sortBy - can be:
  * abc (default)
  * rand
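As a rough picture of how the `tsv` and `matrix` formats differ, the following standalone sketch renders a small nested hash both ways with genomes sorted alphabetically. It does not reproduce the module's exact output: the helper names, the `.` corner cell, the `NA` placeholder for missing pairs, and the 0 self-distance are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical distances, stored once per pair
my %dist = (
  A => { B => 0.1, C => 0.2 },
  B => { C => 0.3 },
);

# Look up a pair in either orientation (ASSUMPTION: symmetric, self = 0)
sub pairDistance {
  my ($dist, $g1, $g2) = @_;
  return 0 if $g1 eq $g2;
  return $dist->{$g1}{$g2} if exists $dist->{$g1} && exists $dist->{$g1}{$g2};
  return $dist->{$g2}{$g1} if exists $dist->{$g2} && exists $dist->{$g2}{$g1};
  return;
}

# 3-column layout: one "genome1 genome2 distance" row per stored pair
sub toTsv {
  my ($dist, $genomes) = @_;
  my $str = "";
  for my $g1 (@$genomes) {
    for my $g2 (@$genomes) {
      next if !exists $dist->{$g1} || !exists $dist->{$g1}{$g2};
      $str .= join("\t", $g1, $g2, $dist->{$g1}{$g2}) . "\n";
    }
  }
  return $str;
}

# All-vs-all layout: header row of genomes, then one row per genome
sub toMatrix {
  my ($dist, $genomes) = @_;
  my $str = join("\t", ".", @$genomes) . "\n";
  for my $g1 (@$genomes) {
    my @row = map {
      my $d = pairDistance($dist, $g1, $_);
      defined $d ? $d : "NA";          # ASSUMPTION: placeholder for gaps
    } @$genomes;
    $str .= join("\t", $g1, @row) . "\n";
  }
  return $str;
}

my @genomes = sort qw(A B C);          # alphabetical, like sortBy "abc"
print toTsv(\%dist, \@genomes);
print toMatrix(\%dist, \@genomes);
```

The 3-column form grows with the number of stored pairs, while the matrix form always has one row and one column per genome.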