NAME
Bio::GenomeMap - Data structure to store and query genomically indexed data efficiently using SQLite's R*Tree.
VERSION
version 0.03
SYNOPSIS
my $gm = Bio::GenomeMap->new(sqlite_file => 'gm.sqlite3', ro => BOOL);
$gm->bulk_insert(sub{
my ($inserter) = @_;
while (my $line = <ARGV>){ # get line
# parse into seqid, start, end, data.
chomp $line;
my ($seqid, $start, $end, $data) = split /\t/, $line;
$inserter->($seqid, $start, $end, $data);
}
});
$gm->iter_overlaps('chr1', 10000, 20000, sub {
my ($start, $end, $data) = @_;
...
});
METHODS
$gm->bulk_insert($code)
Main insertion method. $code is called with a single argument, an $inserter coderef, which should in turn be called with a $seqid, $start coordinate, $end coordinate, and $data. $data can be either a scalar or a more complex Perl structure, which will be frozen with Storable (and thawed automatically when retrieved with the iter_* and slurp_* methods). This method commits to the underlying database periodically (currently hardcoded to once every 50000 insertions).
$gm->bulk_insert(sub{
my ($inserter) = @_;
# iterate over file/whatever and call $inserter on the parsed data:
while (...){ # get line
# parse into seqid, start, end, data.
$inserter->($seqid, $start, $end, $data);
}
});
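The freeze/thaw round trip described above can be illustrated with Storable on its own. This is a minimal sketch that does not touch Bio::GenomeMap; the record and its field names are hypothetical, and how the module stores the frozen string internally is not shown here.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical annotation record; the field names are illustrative only.
my $data = { name => 'geneA', strand => '+', tags => ['exon', 'coding'] };

# Serialize the structure to a byte string, suitable for a database column.
my $frozen = freeze($data);

# Restore an equivalent (deep-copied) structure from the byte string.
my $thawed = thaw($frozen);

print "$thawed->{name} $thawed->{strand} @{ $thawed->{tags} }\n";
```

Since the module performs this step itself, a hash or array reference like $data would simply be passed as the fourth argument to $inserter.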
$gm->iter_overlaps($seqid, $start, $end, $code)
Iterate over all entries on $seqid overlapping interval [$start, $end]. $code is called for each matching entry with arguments $start, $end, and $data:
$gm->iter_overlaps('chr1', 10000, 20000, sub {
my ($start, $end, $data) = @_;
});
$gm->iter_surrounding($seqid, $start, $end, $code)
Iterate over all entries on $seqid surrounding interval [$start, $end]. $code is called for each matching entry with arguments $start, $end, and $data:
$gm->iter_surrounding('chr1', 10000, 20000, sub {
my ($start, $end, $data) = @_;
});
$gm->iter_within($seqid, $start, $end, $code)
Iterate over all entries on $seqid within interval [$start, $end]. $code is called for each matching entry with arguments $start, $end, and $data:
$gm->iter_within('chr1', 10000, 20000, sub {
my ($start, $end, $data) = @_;
});
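The three interval queries above differ only in how a stored entry relates to the query interval. The sketch below shows one plausible reading in plain Perl. It assumes closed (inclusive) intervals and takes "surrounding" to mean that the entry fully contains the query; both are assumptions for illustration, not confirmed behavior of the module. The predicate names and sample coordinates are hypothetical.

```perl
use strict;
use warnings;

# Each predicate compares a stored entry [$s, $e] with a query [$qs, $qe].
# Closed (inclusive) intervals are assumed throughout.
sub overlaps    { my ($s, $e, $qs, $qe) = @_; return $s <= $qe && $e >= $qs }
sub within      { my ($s, $e, $qs, $qe) = @_; return $s >= $qs && $e <= $qe }
sub surrounding { my ($s, $e, $qs, $qe) = @_; return $s <= $qs && $e >= $qe }

# Hypothetical entries and query interval.
my @entries = ([5000, 12000], [12000, 15000], [9000, 25000], [30000, 31000]);
my ($qs, $qe) = (10000, 20000);

for my $entry (@entries) {
    my ($s, $e) = @$entry;
    my @hits;
    push @hits, 'overlaps'    if overlaps($s, $e, $qs, $qe);
    push @hits, 'within'      if within($s, $e, $qs, $qe);
    push @hits, 'surrounding' if surrounding($s, $e, $qs, $qe);
    printf "[%d, %d]: %s\n", $s, $e, @hits ? join(', ', @hits) : 'none';
}
```

Under these assumptions, every entry matched by the "within" or "surrounding" predicate is also matched by "overlaps".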
$gm->iter_all($code)
Iterate over everything. $code is called for each entry with arguments $seq, $start, $end, $data:
$gm->iter_all(sub{
my ($seq, $start, $end, $data) = @_;
});
$gm->slurp_overlaps($seqid, $start, $end)
$gm->slurp_overlaps('Chr1', 30000, 32000);
Returns an array reference whose elements each have the form [$start, $end, $data].
$gm->slurp_within($seqid, $start, $end)
$gm->slurp_within('Chr1', 30000, 32000);
Returns an array reference whose elements each have the form [$start, $end, $data].
$gm->slurp_surrounding($seqid, $start, $end)
$gm->slurp_surrounding('Chr1', 30000, 32000);
Returns an array reference whose elements each have the form [$start, $end, $data].
$gm->slurp_all()
Returns an array reference whose elements each have the form [$seqid, $start, $end, $data]:
my $res = $gm->slurp_all();
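Because the result is a plain array reference, it can be post-processed with ordinary Perl. The sketch below groups rows by sequence ID. To stay self-contained it uses hypothetical sample data shaped like the documented return value instead of calling slurp_all() on a real database.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the documented return value:
# an array reference of [seqid, start, end, data] rows.
my $res = [
    ['chr1', 100, 200, 'a'],
    ['chr1', 150, 400, 'b'],
    ['chr2',  10,  50, 'c'],
];

# Group the rows by sequence ID, keeping [start, end, data] for each.
my %by_seqid;
for my $row (@$res) {
    my ($seqid, $start, $end, $data) = @$row;
    push @{ $by_seqid{$seqid} }, [$start, $end, $data];
}

# Report how many entries each sequence has.
for my $seqid (sort keys %by_seqid) {
    printf "%s: %d entries\n", $seqid, scalar @{ $by_seqid{$seqid} };
}
```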
$gm->search($search_term)
Searches the data column textually and returns a list of [$seqid, $start, $end, $data]. Up to $limit results are returned, starting from $start.
my @results = $gm->search('chromatin');
AUTHOR
T. Nishimura <tnishimura@fastmail.jp>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by T. Nishimura.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.