NAME

Compress::BGZF::Reader - Performs blocked GZIP (BGZF) decompression

SYNOPSIS

use Compress::BGZF::Reader;

# Use as filehandle
my $fh_bgz = Compress::BGZF::Reader->new_filehandle( $bgz_filename );

# you can do this, but it's probably faster just to pipe gunzip
while (my $line = <$fh_bgz>) {
    print $line;
}

# here's the random-access goodness
# fetch 32 bytes from uncompressed offset 1001
seek $fh_bgz, 1001, 0;
read $fh_bgz, my $data, 32;
print $data;

# Use as object
my $reader = Compress::BGZF::Reader->new( $bgz_filename );

# Move to a virtual offset (somehow pre-calculated) and read 32 bytes
$reader->move_to_vo( $virt_offset );
$data = $reader->read_data(32);
print $data;

$reader->write_index( $fn_idx );

DESCRIPTION

Compress::BGZF::Reader is a module implementing random access to the BGZIP file format. While it can do sequential/streaming reads, there is really no point in using it for this purpose over standard GZIP tools/libraries, since BGZIP is GZIP-compatible.

There are two main modes of construction - as an object (using new()) and as a filehandle glob (using new_filehandle()). The filehandle mode is straightforward for general use (emulating seek/read/tell functionality and passing to other classes/methods that expect a filehandle). The object mode has additional features such as seeking to virtual offsets and dumping the offset index to file.

METHODS

Filehandle Functions

new_filehandle
my $fh_bgzf = Compress::BGZF::Reader->new_filehandle( $input_fn );

Create a new Compress::BGZF::Reader engine and tie it to an IO::File handle, which is returned. Takes a single mandatory argument - the name of the file to be read from.

<>
readline
seek
read
tell
eof
my $line = <$fh_bgzf>;
my $line = readline $fh_bgzf;
seek $fh_bgzf, 256, 0;
read $fh_bgzf, my $buffer, 32;
my $loc = tell $fh_bgzf;
print "End of file\n" if eof($fh_bgzf);

These functions emulate the built-in Perl functions of the same names.

Object-oriented Methods

new
my $reader = Compress::BGZF::Reader->new( $fn_in );

Create a new Compress::BGZF::Reader engine. Requires a single argument - the name of the BGZIP file to be read from.

move_to
$reader->move_to( 493, 0 );

Seeks to the given uncompressed offset. Takes two arguments - the requested offset and the position that the offset is relative to (0: file start, 1: current position, 2: file end).
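
The three values listed above match the whence argument of Perl's built-in seek, for which the core Fcntl module exports named constants. The following is a minimal sketch of those semantics using an ordinary in-memory filehandle as a stand-in; it assumes move_to() interprets its second argument the same way, which the values above suggest but this example does not verify.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);   # exports SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)

# Stand-in data source: an in-memory filehandle, not a BGZF file
my $content = 'abcdefghij';
open my $fh, '<', \$content or die "open failed: $!";

seek $fh, 4, SEEK_SET;    # 4 bytes from the start of the data
my $pos_set = tell $fh;

seek $fh, 2, SEEK_CUR;    # 2 bytes forward from the current position
my $pos_cur = tell $fh;

seek $fh, -3, SEEK_END;   # 3 bytes before the end of the data
my $pos_end = tell $fh;

print "$pos_set $pos_cur $pos_end\n";
```

Under that assumption, a call such as $reader->move_to( 493, SEEK_SET ) would be equivalent to $reader->move_to( 493, 0 ).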

move_to_vo
$reader->move_to_vo( $virt_offset );

Like move_to, but takes as a single argument a virtual offset. Virtual offsets are described more in the top-level documentation for Compress::BGZF.
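
As a rough illustration of what a virtual offset encodes: in the BGZF format, a virtual offset is a single 64-bit value that holds the compressed-file offset of the start of a block in its upper 48 bits and the offset into that block's uncompressed data in its lower 16 bits. The helper names below (vo_pack, vo_unpack) are hypothetical and are not part of Compress::BGZF; the sketch also assumes a Perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of Compress::BGZF.
# Assumes a Perl built with 64-bit integers.

# Combine a compressed block offset and a within-block offset
sub vo_pack {
    my ($block_offset, $within_block) = @_;
    die "within-block offset must be below 65536\n"
        if $within_block >= 65536;
    return ( $block_offset << 16 ) | $within_block;
}

# Split a virtual offset back into its two components
sub vo_unpack {
    my ($vo) = @_;
    return ( $vo >> 16, $vo & 0xFFFF );
}

my $vo = vo_pack( 18_500, 1001 );
my ($block, $within) = vo_unpack($vo);
printf "vo=%d block=%d within=%d\n", $vo, $block, $within;
```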

get_vo
$reader->get_vo();

Returns the virtual offset of the current read position.

read_data
my $data = $reader->read_data( 32 );

Read uncompressed data from the current location. Takes a single argument - the number of bytes to be read - and returns the data read or undef if at EOF.

getline
my $line = $reader->getline();

Reads one line of uncompressed data from the current location, shifting the current file offset accordingly. Returns the line read or undef if currently at EOF.

usize
my $size = $reader->usize();

Returns the uncompressed size of the file, as calculated during indexing.

write_index
$reader->write_index( $fn_index );

Writes the compressed index to file. The index format (as defined by htslib) consists of little-endian int64-coded values. The first value is the number of offsets in the index. The rest of the values consist of pairs of block offsets relative to the compressed and uncompressed data. The first offset (always 0,0) is not included. The index files written by Compress::BGZF should be compatible with those of the htslib bgzip software, and vice versa.
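
The layout described above can be decoded with Perl's unpack. The following is a minimal sketch, not the module's own code: parse_index is a hypothetical helper, the sample values are made up for the demonstration, and the 'q' template requires a Perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of Compress::BGZF.
# Decodes index data laid out as described above: a count followed by
# (compressed offset, uncompressed offset) pairs, all little-endian int64.
sub parse_index {
    my ($raw) = @_;
    my ($n, @vals) = unpack 'q<*', $raw;
    die "truncated or malformed index\n"
        unless defined $n && @vals == 2 * $n;

    my @offsets = ( [ 0, 0 ] );   # implicit first entry, not stored in the file
    while ( my ($c_off, $u_off) = splice @vals, 0, 2 ) {
        push @offsets, [ $c_off, $u_off ];
    }
    return \@offsets;
}

# Made-up sample: two stored entries after the implicit (0,0) entry
my $raw = pack 'q<*', 2, 18_500, 65_280, 37_210, 130_560;
my $offsets = parse_index($raw);
printf "%d\t%d\n", @$_ for @$offsets;
```

To apply this to a file produced by write_index, the file contents would need to be read in binary mode first, for example by opening the file with the '<:raw' layer and reading it in full.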

rebuild_index
$reader->rebuild_index;

Clears the in-memory index and rebuilds it from scratch.

NEWLINES

Note that when using the tied filehandle interface, the behavior of the module will replicate that of a file opened in raw mode. That is, none of the Perl magic concerning platform-specific newline conversions will be performed. It's expected that users of this module will generally be seeking to predetermined byte offsets in a file (such as read from an index), and operations such as seek, read, and <> are not reliable in a cross-platform way on files opened in 'text' mode. In other words, seeking to and reading from a specific offset in 'text' mode may return different results depending on the platform Perl is running on. This isn't an issue specific to this module but to Perl in general. Users should simply be aware that any data read using this module will retain its original line endings, which may not be the same as those of the current platform.

For a further discussion, see http://perldoc.perl.org/perlport.html#Newlines.
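
Because line endings are returned unchanged, code that may encounter files written on other platforms can strip them explicitly rather than relying on chomp. The sketch below uses an in-memory filehandle opened in raw mode as a stand-in for a Compress::BGZF::Reader filehandle; the sample content is made up for the demonstration.

```perl
use strict;
use warnings;

# Stand-in for a tied BGZF filehandle: in-memory data read in raw mode,
# containing a mix of CRLF and LF line endings
my $content = "first\r\nsecond\nthird\r\n";
open my $fh, '<:raw', \$content or die "open failed: $!";

my @lines;
while ( my $line = <$fh> ) {
    # Remove either LF or CRLF; chomp alone would leave a trailing "\r"
    $line =~ s/\r?\n\z//;
    push @lines, $line;
}
close $fh;

print join( '|', @lines ), "\n";
```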

CAVEATS AND BUGS

This code is in the alpha testing stage and the API is not guaranteed to be stable.

Please report bugs to the author.

AUTHOR

Jeremy Volkening <jdv *at* base2bio.com>

COPYRIGHT AND LICENSE

Copyright 2015-2016 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.