NAME

Bio::GenomeMap - Data structure to store and query genomically indexed data efficiently using SQLite's R*Tree.

VERSION

version 0.03

SYNOPSIS

my $gm = Bio::GenomeMap->new(sqlite_file => 'gm.sqlite3', ro => BOOL);

$gm->bulk_insert(sub{
    my ($inserter) = @_;

    while (my $line = <ARGV>){ # get line
       # parse into seqid, start, end, data.
       chomp $line;
       my ($seqid, $start, $end, $data) = split /\t/, $line;
       $inserter->($seqid, $start, $end, $data);
    }
});

$gm->iter_overlaps('chr1', 10000, 20000, sub {
   my ($start, $end, $data) = @_;
   ...
});

METHODS

$gm->bulk_insert($code)

Main insertion method. $code is called with a single argument, an $inserter coderef, which in turn should be called with a $seqid, $start coordinate, $end coordinate, and $data. $data can be either a scalar or a more complicated Perl structure, which will be frozen with Storable (and thawed automatically when retrieved with the iter_* and slurp_* methods). This method commits to the underlying database periodically (currently hardcoded to one commit per 50000 insertions).

$gm->bulk_insert(sub{
    my ($inserter) = @_;

    # iterate over file/whatever and call $inserter on the parsed data:
    while (...){ # get line
       # parse into seqid, start, end, data.
       $inserter->($seqid, $start, $end, $data);
    }
});
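
The callback-with-inserter pattern described above can be sketched without a database. The following is a minimal illustration, not the module's actual implementation: bulk_insert_sketch is a hypothetical stand-in that keeps rows in memory, counts "commits" instead of performing them, and uses a small batch size in place of the hardcoded 50000. The only real API used is Storable's freeze and thaw.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-in for bulk_insert: rows go into an in-memory array
# instead of SQLite, and a "commit" is only counted, not performed.
sub bulk_insert_sketch {
    my ($code, $batch_size) = @_;
    my @rows;
    my ($pending, $commits) = (0, 0);

    my $inserter = sub {
        my ($seqid, $start, $end, $data) = @_;
        # References are serialized with Storable; plain scalars are kept as-is.
        my $is_frozen = ref $data ? 1 : 0;
        push @rows, [$seqid, $start, $end,
                     $is_frozen ? freeze($data) : $data, $is_frozen];
        if (++$pending >= $batch_size) {
            $commits++;      # a real implementation would commit here
            $pending = 0;
        }
    };

    $code->($inserter);
    $commits++ if $pending;  # flush the final partial batch
    return (\@rows, $commits);
}

my ($rows, $commits) = bulk_insert_sketch(sub {
    my ($inserter) = @_;
    $inserter->('chr1', 100, 200, 'plain scalar');
    $inserter->('chr1', 150, 300, { gene => 'abc1', score => 0.5 });
    $inserter->('chr2', 10,  20,  [1, 2, 3]);
}, 2);

# Thaw on the way out, mirroring what the iter_* and slurp_* methods
# are documented to do for frozen structures.
my @data = map { $_->[4] ? thaw($_->[3]) : $_->[3] } @$rows;
print "commits: $commits, gene: $data[1]{gene}\n";
```

The caller never sees transactions or serialization; it only calls $inserter once per record, which is the point of passing a coderef into the callback.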

$gm->iter_overlaps($seqid, $start, $end, $code)

Iterate over all entries on $seqid overlapping interval [$start, $end]. $code is called for each matching entry with arguments $start, $end, and $data:

$gm->iter_overlaps('chr1', 10000, 20000, sub {
   my ($start, $end, $data) = @_;
});

$gm->iter_surrounding($seqid, $start, $end, $code)

Iterate over all entries on $seqid surrounding interval [$start, $end]. $code is called for each matching entry with arguments $start, $end, and $data:

$gm->iter_surrounding('chr1', 10000, 20000, sub {
   my ($start, $end, $data) = @_;
});

$gm->iter_within($seqid, $start, $end, $code)

Iterate over all entries on $seqid within interval [$start, $end]. $code is called for each matching entry with arguments $start, $end, and $data:

$gm->iter_within('chr1', 10000, 20000, sub {
   my ($start, $end, $data) = @_;
});
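
The three interval queries differ only in the comparison applied to each entry. The sketch below spells out one plausible reading, assuming inclusive bounds, that "within" means the entry lies entirely inside the query interval, and that "surrounding" means the entry fully encloses it. These semantics and the helper names are assumptions for illustration; the module's own boundary handling may differ.

```perl
use strict;
use warnings;

# Entry interval is [$s, $e]; query interval is [$qs, $qe].
# All bounds are treated as inclusive (an assumption).
sub entry_overlaps    { my ($s, $e, $qs, $qe) = @_; return $s <= $qe && $e >= $qs }
sub entry_within      { my ($s, $e, $qs, $qe) = @_; return $s >= $qs && $e <= $qe }
sub entry_surrounding { my ($s, $e, $qs, $qe) = @_; return $s <= $qs && $e >= $qe }

# Hypothetical entries in the [$start, $end, $data] shape used by the callbacks.
my @entries = (
    [ 5000, 12000, 'a'],   # straddles the query start
    [12000, 15000, 'b'],   # entirely inside the query
    [ 9000, 21000, 'c'],   # encloses the query
    [30000, 31000, 'd'],   # disjoint from the query
);
my ($qs, $qe) = (10000, 20000);

my @overlapping = map { $_->[2] } grep { entry_overlaps($_->[0], $_->[1], $qs, $qe) }    @entries;
my @inside      = map { $_->[2] } grep { entry_within($_->[0], $_->[1], $qs, $qe) }      @entries;
my @enclosing   = map { $_->[2] } grep { entry_surrounding($_->[0], $_->[1], $qs, $qe) } @entries;

print "overlaps: @overlapping\n";
print "within: @inside\n";
print "surrounding: @enclosing\n";
```

Under these assumptions, every "within" or "surrounding" match is also an "overlaps" match, so iter_overlaps is the broadest of the three queries.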

$gm->iter_all($code)

Iterate over everything. $code is called for each entry with arguments $seq, $start, $end, $data:

$gm->iter_all(sub{
    my ($seq, $start, $end, $data) = @_;
});

$gm->slurp_overlaps($seqid, $start, $end)

$gm->slurp_overlaps('Chr1', 30000, 32000);

Returns an array reference, each element of the form [$start, $end, $data].

$gm->slurp_within($seqid, $start, $end)

$gm->slurp_within('Chr1', 30000, 32000);

Returns an array reference, each element of the form [$start, $end, $data].

$gm->slurp_surrounding($seqid, $start, $end)

$gm->slurp_surrounding('Chr1', 30000, 32000);

Returns an array reference, each element of the form [$start, $end, $data].
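
Because the slurp_* methods return a plain array reference, the result can be processed with ordinary list operations. The sketch below uses a hand-written $hits structure as a hypothetical stand-in for a slurp_overlaps return value, so the coordinates and data are invented for illustration. It makes no assumption about the order in which the module returns entries and sorts them explicitly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for:
#   my $hits = $gm->slurp_overlaps('Chr1', 30000, 32000);
my $hits = [
    [31500, 31900, 'exon'],
    [29800, 30400, { name => 'gene1' }],   # data may be a thawed structure
    [30100, 30200, 'snp'],
];

# Sort by start coordinate, breaking ties by end coordinate.
my @by_start = sort { $a->[0] <=> $b->[0] or $a->[1] <=> $b->[1] } @$hits;

for my $hit (@by_start) {
    my ($start, $end, $data) = @$hit;
    # Data is either a plain scalar or a reference, depending on what was inserted.
    my $label = ref $data eq 'HASH' ? $data->{name} : $data;
    printf "%d-%d\t%s\n", $start, $end, $label;
}
```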

$gm->slurp_all()

Returns an array reference, each element of the form [$seqid, $start, $end, $data]:

my $res = $gm->slurp_all();

$gm->search($search_term)

Search the data column textually. Returns a list of [$seqid, $start, $end, $data]. Up to $limit results are returned, starting from $start.

my @results = $gm->search('chromatin');

AUTHOR

T. Nishimura <tnishimura@fastmail.jp>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by T. Nishimura.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.