NAME
Bio::Kmer - Helper module for Kmer Analysis.
SYNOPSIS
A module for helping with kmer analysis.
use strict;
use warnings;
use Bio::Kmer;
my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
my $kmerHash=$kmer->kmers();
my $countOfCounts=$kmer->histogram();
my $minimizers = $kmer->minimizers();
my $minimizerCluster = $kmer->minimizerCluster();
The BioPerl way
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Kmer;
# Load up any Bio::SeqIO object. Quality values will be
# faked internally to help with compatibility even if
# a fastq file is given.
my $seqin = Bio::SeqIO->new(-file=>"input.fasta");
my $kmer=Bio::Kmer->new($seqin);
my $kmerHash=$kmer->kmers();
my $countOfCounts=$kmer->histogram();
DESCRIPTION
A module for helping with kmer analysis. The basic methods count kmers
and can produce a count of counts (a histogram of kmer frequencies).
Currently this module only supports fastq format. Although this module
can count kmers in pure perl, it is recommended to use a dedicated kmer
counter such as Jellyfish through the kmercounter option.
DEPENDENCIES
* BioPerl
* Jellyfish >=2
* Perl threads
* Perl >=5.10
VARIABLES
$Bio::Kmer::iThreads
Boolean describing whether the module instance is using threads
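The flag can only be true if the running perl supports ithreads. The sketch below shows one plausible way such a boolean could be set. It assumes a check of $Config{useithreads} plus a guarded load of the threads module. Bio::Kmer's own detection logic is not described here and may differ.

```perl
use strict;
use warnings;
use Config;

# Hypothetical sketch of setting a thread-availability flag.
# $Config{useithreads} is true only if this perl was compiled with ithreads.
our $iThreads = 0;
if ($Config{useithreads}) {
  # Loading the thread modules can still fail, so trap errors instead of dying.
  $iThreads = eval { require threads; require Thread::Queue; 1 } ? 1 : 0;
}
print "Using threads: ", ($iThreads ? "yes" : "no"), "\n";
```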
METHODS
Bio::Kmer->new($filename, \%options)
Create a new instance of the kmer counter, one object per file.
$filename can be either a file path or a Bio::SeqIO object.
Applicable arguments for \%options:
Argument      Default  Description
kmercounter   perl     Which kmer counter software to use.
                       Choices: perl, jellyfish.
kmerlength|k  21       Kmer length.
numcpus       1        Number of CPUs. Used for perl
                       multithreading with the pure perl
                       counter, or passed to other
                       software such as Jellyfish.
gt            1        Ignore kmers whose count is fewer
                       than this value. This might speed
                       up analysis if you do not care
                       about low-count kmers.
sample        1        Retain only a fraction of kmers:
                       1 is 100%; 0 is 0%. Only works
                       with the perl kmer counter.
verbose       0        Print more messages.
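To make the gt and sample options concrete, here is a minimal sketch that applies both filters to a kmer => count hash. It is an illustration under assumptions, not Bio::Kmer internals. The helper filter_kmers is hypothetical, and it samples distinct kmers after counting, whereas the module may sample at a different stage.

```perl
use strict;
use warnings;

# Hypothetical helper: filter a kmer => count hash the way the
# gt and sample options are described (assumed semantics).
sub filter_kmers {
  my ($counts, $gt, $sample) = @_;
  my %kept;
  for my $kmer (keys %$counts) {
    # gt: drop kmers whose count is fewer than the threshold
    next if $counts->{$kmer} < $gt;
    # sample: keep each kmer with probability $sample (1 keeps all, 0 keeps none)
    next if rand() >= $sample;
    $kept{$kmer} = $counts->{$kmer};
  }
  return \%kept;
}

my %counts = (AAC => 5, ACG => 1, CGT => 2);
my $filtered = filter_kmers(\%counts, 2, 1);
print join(",", sort keys %$filtered), "\n";   # prints AAC,CGT
```

Because rand() returns a value in [0, 1), a sample of 1 always keeps a kmer and a sample of 0 never does.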
Examples:
my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
$kmer->ntcount()
Returns the number of base pairs counted. In some cases, such as when
counting with Jellyfish, that number is not calculated directly;
instead it is derived from the total length of the kmers. Internally,
this number is stored as $kmer->{_ntcount}.
Note: internally runs $kmer->histogram() if $kmer->{_ntcount} has not
yet been set.
Arguments: None
Returns: integer
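The fallback described above can be sketched from a count-of-counts array. This assumes that "total length of kmers" means every kmer occurrence multiplied by the kmer length. Because kmers overlap, this overstates the true number of base pairs. estimate_ntcount is a hypothetical helper, not part of Bio::Kmer.

```perl
use strict;
use warnings;

# Hypothetical helper: $hist->[$f] is the number of distinct kmers seen
# exactly $f times. Summing $f * $hist->[$f] gives total kmer occurrences;
# multiplying by $k gives the total length of all kmers (overlaps included).
sub estimate_ntcount {
  my ($hist, $k) = @_;
  my $total = 0;
  for my $f (0 .. $#$hist) {
    $total += $f * ($hist->[$f] // 0) * $k;
  }
  return $total;
}

# 3 kmers seen once and 1 kmer seen twice: (1*3 + 2*1) * 21 = 105
print estimate_ntcount([0, 3, 1], 21), "\n";   # prints 105
```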
$kmer->count()
Count kmers. This method is called as soon as new() is called, so you
should never have to run it yourself. Internally caches the kmer counts
in RAM.
Arguments: None
Returns: None
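For readers curious what pure perl counting involves, here is a minimal sliding-window sketch. It assumes forward-strand counting only (no reverse-complement merging) and skips any window containing a non-ACGT character. Bio::Kmer's pure perl counter may differ in both respects. count_kmers is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: count every length-$k window in each sequence.
# Forward strand only; windows with non-ACGT characters are skipped.
sub count_kmers {
  my ($seqs, $k) = @_;
  my %count;
  for my $seq (@$seqs) {
    my $s = uc $seq;    # copy so the caller's data is not modified
    for my $i (0 .. length($s) - $k) {
      my $kmer = substr($s, $i, $k);
      next if $kmer =~ /[^ACGT]/;
      $count{$kmer}++;
    }
  }
  return \%count;
}

my $counts = count_kmers(["ACGTACGT"], 4);
print "$_\t$counts->{$_}\n" for sort keys %$counts;
# ACGT 2, CGTA 1, GTAC 1, TACG 1
```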
$kmer->clearCache
Clears kmer counts and histogram counts. You should probably never
use this method.
Arguments: None
Returns: None
$kmer->query($queryString)
Look up the count of a single kmer.
Arguments: query (string)
Returns: Count of the kmer.
0 indicates that the kmer was not found.
-1 indicates an invalid kmer (e.g., invalid length).
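The three documented return values can be illustrated against a plain kmer => count hash. This is a sketch. query_kmer is hypothetical, it only checks length for validity, and it assumes kmers are stored in uppercase.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring query()'s documented return values:
# -1 for an invalid kmer (only a wrong length is checked here),
# 0 if the kmer was never seen, otherwise its count.
sub query_kmer {
  my ($counts, $k, $query) = @_;
  return -1 if length($query) != $k;
  return $counts->{uc $query} // 0;    # assumes uppercase keys
}

my %counts = (ACGT => 2, CGTA => 1);
print query_kmer(\%counts, 4, "ACGT"), "\n";   # prints 2
print query_kmer(\%counts, 4, "TTTT"), "\n";   # prints 0
print query_kmer(\%counts, 4, "ACG"), "\n";    # prints -1
```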
$kmer->histogram()
Count the frequency of kmers, i.e., build the count of counts.
Internally caches the histogram in RAM.
Arguments: None
Returns: Reference to an array of counts. Each index is a kmer
frequency, and the value at that index is the number of kmers
seen that many times.
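The shape of the returned array is easiest to see by building one. This minimal sketch derives a count of counts from a kmer => count hash. count_of_counts is a hypothetical helper, and filling unused slots with 0 is a choice made here for readability.

```perl
use strict;
use warnings;

# Hypothetical helper: index $f holds the number of distinct kmers
# that were seen exactly $f times.
sub count_of_counts {
  my ($counts) = @_;
  my @hist;
  $hist[$_]++ for values %$counts;
  # Fill gaps (including index 0) so every slot is defined.
  for my $i (0 .. $#hist) {
    $hist[$i] //= 0;
  }
  return \@hist;
}

my $hist = count_of_counts({ ACGT => 2, CGTA => 1, GTAC => 1, TACG => 1 });
print join(" ", @$hist), "\n";   # prints 0 3 1
```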
$kmer->kmers
Returns the counted kmers.
Arguments: None
Returns: Reference to a hash of kmers and their counts
$kmer->minimizers(5)
Finds the minimizer of each kmer.
Arguments: length of minimizer (default: 5)
Returns: hash ref of kmer => minimizer, e.g., with a minimizer length of 3, $hash = {AAAAA=>AAA, TAGGG=>AGG,...}
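A minimizer sketch helps explain the example values. It assumes that the minimizer is the lexicographically smallest substring of the given length, which matches the examples in this entry. The module's actual ordering (for example, any reverse-complement handling) is not documented here. find_minimizer and minimizer_map are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: lexicographically smallest length-$l substring.
sub find_minimizer {
  my ($kmer, $l) = @_;
  my $min;
  for my $i (0 .. length($kmer) - $l) {
    my $sub = substr($kmer, $i, $l);
    $min = $sub if !defined($min) || $sub lt $min;
  }
  return $min;
}

# Hypothetical helper: map every kmer in a kmer => count hash to its minimizer.
sub minimizer_map {
  my ($counts, $l) = @_;
  $l //= 5;    # same default length as documented above
  my %min;
  $min{$_} = find_minimizer($_, $l) for keys %$counts;
  return \%min;
}

my $m = minimizer_map({ AAAAA => 1, TAGGG => 3 }, 3);
print "$_ => $m->{$_}\n" for sort keys %$m;
# AAAAA => AAA
# TAGGG => AGG
```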
$kmer->minimizerCluster(5)
Groups kmers by their minimizer.
Arguments: length of minimizer (default: 5).
Internally, calls $kmer->minimizers($l).
If $kmer->minimizers has already been called, this parameter is ignored.
Returns: hash ref of minimizer => array of kmers, e.g., $hash = {AAA=>[TAAAT, AAAGG,...], ATT=>[GATTC,...]}
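Clustering is an inversion of the kmer => minimizer hash. This sketch, with a hypothetical cluster_by_minimizer helper, produces the minimizer => [kmers] shape shown above. Sorting the kmers within each cluster is a choice made here so the output is deterministic.

```perl
use strict;
use warnings;

# Hypothetical helper: invert kmer => minimizer into minimizer => [kmers].
sub cluster_by_minimizer {
  my ($min) = @_;
  my %cluster;
  for my $kmer (sort keys %$min) {
    push @{ $cluster{ $min->{$kmer} } }, $kmer;
  }
  return \%cluster;
}

my $clusters = cluster_by_minimizer({ TAAAT => 'AAA', AAAGG => 'AAA', GATTC => 'ATT' });
print "$_: @{ $clusters->{$_} }\n" for sort keys %$clusters;
# AAA: AAAGG TAAAT
# ATT: GATTC
```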
$kmer->union($kmer2)
Finds the union of two sets of kmers
Arguments: Another Bio::Kmer object
Returns: List of kmers
$kmer->intersection($kmer2)
Finds the intersection between two sets of kmers
Arguments: Another Bio::Kmer object
Returns: List of kmers
$kmer->subtract($kmer2)
Finds the kmers present in this Bio::Kmer object but not in $kmer2.
Arguments: Another Bio::Kmer object
Returns: List of kmers
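The three set operations reduce to comparisons of hash keys. This sketch uses plain kmer => count hash refs instead of Bio::Kmer objects. The helpers kmer_union, kmer_intersection, and kmer_subtract are hypothetical, and the sorted output is a choice made here, not a documented guarantee.

```perl
use strict;
use warnings;

# Hypothetical helpers operating on the keys of two kmer => count hashes.
sub kmer_union {
  my ($x, $y) = @_;
  my %seen;
  $seen{$_} = 1 for (keys %$x, keys %$y);
  return sort { $a cmp $b } keys %seen;
}

sub kmer_intersection {
  my ($x, $y) = @_;
  return sort { $a cmp $b } grep { exists $y->{$_} } keys %$x;
}

# Kmers in $x that are absent from $y.
sub kmer_subtract {
  my ($x, $y) = @_;
  return sort { $a cmp $b } grep { !exists $y->{$_} } keys %$x;
}

my %first  = (AAC => 1, ACG => 2);
my %second = (ACG => 1, CGT => 4);
print join(",", kmer_union(\%first, \%second)), "\n";         # AAC,ACG,CGT
print join(",", kmer_intersection(\%first, \%second)), "\n";  # ACG
print join(",", kmer_subtract(\%first, \%second)), "\n";      # AAC
```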
$kmer->close()
Cleans the temporary directory and removes this object from RAM.
Useful when counting kmers for many inputs while keeping your
overhead low.
Arguments: None
Returns: 1
COPYRIGHT AND LICENSE
MIT license. Go nuts.
AUTHOR
Author: Lee Katz <lkatz@cdc.gov>
For additional help, go to https://github.com/lskatz/Bio--Kmer
CPAN module at http://search.cpan.org/~lskatz/Bio-Kmer/