NAME
Mashtree::Db - functions for Mashtree databasing
SYNOPSIS
use strict;
use warnings;
use Mashtree::Db;
my $dbFile = "mashtree.tsv";
my $db=Mashtree::Db->new($dbFile);
# Add 10 distances from genome "test" to other genomes
my %distHash;
for(my $dist=0;$dist<10;$dist++){
  my $otherGenome = "genome" . $dist;
  $distHash{"test"}{$otherGenome} = $dist;
}
$db->addDistancesFromHash(\%distHash);
my $firstDistance = $db->findDistance("test", "genome0");
# => 0
DESCRIPTION
This is a helper module that is usually not used directly. Mashtree uses it to read from and write to its internal database.
METHODS
- Mashtree::Db->new($dbFile, \%settings)
-
Create a new Mashtree::Db object.
The database file is a tab-separated file and will be created if it doesn't exist. If it does exist, then it will be read into memory.
Arguments:
* $dbFile - a file path
* $settings - a hash of key/values (currently unused)
- $db->selectDb
-
Selects a database. If it doesn't exist, then it will be created. Then, it sets the object property `dbFile` to the file path.
- $db->readDatabase
-
Reads the database from the `dbFile` set by `selectDb`. Returns a hash of distances, e.g., { genome1 => {genome2 => dist} }.
This hash of distances is then stored in the object property `cache`.
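The nested-hash shape returned by `readDatabase` can be illustrated with a small standalone sketch. It assumes the database file holds three tab-separated columns (genome1, genome2, distance), matching the default `tsv` output of `toString`; `read_distance_tsv` is a hypothetical helper written for this example and is not part of Mashtree::Db.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mashtree::Db: parse three-column TSV
# text (genome1, genome2, distance) into { genome1 => {genome2 => dist} }.
# Assumption: the file has no header line.
sub read_distance_tsv {
  my ($tsvText) = @_;
  my %cache;
  for my $line (split /\n/, $tsvText) {
    next if $line =~ /^\s*$/;   # skip blank lines
    my ($genome1, $genome2, $dist) = split /\t/, $line;
    $cache{$genome1}{$genome2} = $dist;
  }
  return \%cache;
}

my $cache = read_distance_tsv("genome1\tgenome2\t0.059\ngenome1\tgenome3\t0.061\n");
print $cache->{genome1}{genome2}, "\n";   # prints 0.059
```

Keying the outer hash by the first genome keeps every distance for one genome in a single inner hash, so a lookup needs only two hash accesses.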
- $db->addDistancesFromHash($distHash)
-
Add distances from a Perl hash reference, $distHash. $distHash has the form { genome1 => {genome2 => dist} }.
- $db->addDistances
-
Add distances from a TSV file. The TSV file should be a mash distances TSV file in the format of, e.g.,
# query t/lambda/sample1.fastq.gz
t/lambda/sample2.fastq.gz 0.059
t/lambda/sample3.fastq.gz 0.061
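As a rough illustration of the file format described above, the following standalone sketch parses such text into the same nested-hash shape used elsewhere in this module. `parse_mash_distances` is a hypothetical helper for this example only, and the assumption that every distance line belongs to the most recent `# query` header is ours, not something this document states.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mashtree::Db: parse mash-style distance
# text whose "# query" header line names the query genome.
sub parse_mash_distances {
  my ($text) = @_;
  my %dist;
  my $query;
  for my $line (split /\n/, $text) {
    next if $line =~ /^\s*$/;               # skip blank lines
    if ($line =~ /^#\s*query\s+(\S+)/) {    # header: remember the query genome
      $query = $1;
      next;
    }
    die "distance line seen before a '# query' header\n" if !defined $query;
    my ($subject, $d) = split ' ', $line;   # subject genome, then distance
    $dist{$query}{$subject} = $d;
  }
  return \%dist;
}

my $text = join("\n",
  "# query t/lambda/sample1.fastq.gz",
  "t/lambda/sample2.fastq.gz\t0.059",
  "t/lambda/sample3.fastq.gz\t0.061",
) . "\n";
my $dist = parse_mash_distances($text);
print $dist->{"t/lambda/sample1.fastq.gz"}{"t/lambda/sample2.fastq.gz"}, "\n";   # prints 0.059
```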
- $db->toString
-
Turn the database into a string representation.
Arguments:
* genomeArray - list of genomes to include, or undef for all genomes
* format - can be a string of one of these values:
  * tsv - 3-column format (default)
  * matrix - all-vs-all tsv format
  * phylip - Phylip matrix format
* sortBy - can be:
  * abc (default)
  * rand
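To make the three formats concrete, here is a minimal standalone sketch that renders a distance hash as `tsv`, `matrix`, or `phylip` text. `distances_to_string` is a hypothetical helper and not the module's implementation. The layout details are assumptions made for illustration: the `.` corner cell of the matrix, the `NA` marker for missing pairs, treating distances as symmetric, and the unpadded Phylip names. It also omits the `genomeArray` filter.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper, not part of Mashtree::Db. Layout details (corner
# cell, "NA" marker, Phylip name padding) are assumptions.
sub distances_to_string {
  my ($dist, $format, $sortBy) = @_;
  $format ||= "tsv";
  $sortBy ||= "abc";

  # Collect every genome name seen as an outer or inner key.
  my %seen;
  for my $g1 (keys %$dist) {
    $seen{$g1} = 1;
    $seen{$_} = 1 for keys %{ $dist->{$g1} };
  }
  my @genome = sort keys %seen;                     # "abc" order
  @genome = shuffle(@genome) if $sortBy eq "rand";  # random order

  # Look up a distance in either direction; a genome vs itself is 0.
  my $lookup = sub {
    my ($g1, $g2) = @_;
    return 0 if $g1 eq $g2;
    return $dist->{$g1}{$g2} if exists $dist->{$g1} && exists $dist->{$g1}{$g2};
    return $dist->{$g2}{$g1} if exists $dist->{$g2} && exists $dist->{$g2}{$g1};
    return "NA";   # assumption: marker for a missing pair
  };

  my $str = "";
  if ($format eq "tsv") {
    # One line per stored pair: genome1, genome2, distance.
    for my $g1 (@genome) {
      next if !exists $dist->{$g1};
      for my $g2 (sort keys %{ $dist->{$g1} }) {
        $str .= join("\t", $g1, $g2, $dist->{$g1}{$g2}) . "\n";
      }
    }
  }
  elsif ($format eq "matrix") {
    # Header row of genome names, then one row per genome.
    $str .= join("\t", ".", @genome) . "\n";
    for my $g1 (@genome) {
      $str .= join("\t", $g1, map { $lookup->($g1, $_) } @genome) . "\n";
    }
  }
  elsif ($format eq "phylip") {
    # First line is the number of genomes, then one row per genome.
    $str .= scalar(@genome) . "\n";
    for my $g1 (@genome) {
      $str .= join(" ", $g1, map { $lookup->($g1, $_) } @genome) . "\n";
    }
  }
  else {
    die "unknown format: $format\n";
  }
  return $str;
}

my %distHash = (
  test    => { genome1 => 0.1, genome2 => 0.2 },
  genome1 => { genome2 => 0.3 },
);
print distances_to_string(\%distHash, "matrix", "abc");
```

Strict Phylip readers often expect fixed-width or length-limited names, so real output would likely need padding or truncation that this sketch leaves out.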