NAME

Bio::ToolBox::db_helper::bigbed

DESCRIPTION

This module supports the use of bigBed files in the biotoolbox scripts. It is used to collect dataset scores from a binary bigBed file (.bb), which may be local or remote.

Scores may be restricted by strand by specifying the desired strandedness. For example, to collect transcription data over a gene, pass the strandedness value 'sense'. The data for a bed feature is then collected only if its strand matches the strand of the region database object (representing the gene).
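The strand test described above can be sketched in plain Perl as follows. This is only an illustration of the documented behavior, not the module's own code: the helper name wanted_strand is hypothetical, and keeping features when either strand is 0 (unstranded) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a bed feature should be collected.
# Both strands use the BioPerl values -1, 0, or 1.
# Assumption: if either strand is 0 (unstranded), the feature is kept.
sub wanted_strand {
    my ($region_strand, $feature_strand, $strandedness) = @_;
    return 1 if $strandedness eq 'all';
    return 1 if $region_strand == 0 or $feature_strand == 0;
    my $same = $region_strand == $feature_strand ? 1 : 0;
    return $same     if $strandedness eq 'sense';
    return 1 - $same if $strandedness eq 'antisense';
    die "unknown strandedness '$strandedness'\n";
}

# A gene on the reverse strand, tested against a feature on each strand
for my $feature_strand (-1, 1) {
    printf "feature strand %2d: sense=%d antisense=%d\n",
        $feature_strand,
        wanted_strand(-1, $feature_strand, 'sense'),
        wanted_strand(-1, $feature_strand, 'antisense');
}
```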

For loading bigBed files into a Bio::DB database, see the biotoolbox perl script 'big_file2gff3.pl'.

USAGE

The module requires Lincoln Stein's Bio::DB::BigBed to be installed.

Load the module at the beginning of your program.

use Bio::ToolBox::db_helper::bigbed;

It will automatically export the names of its subroutines.

collect_bigbed_scores()

This subroutine will collect only the data values from a binary bigBed file for the specified database region. The positional information of the scores is not retained, so the values are best summarized with a statistical method (mean, median, etc.).

The subroutine is passed seven or more arguments in the following order:

1. The chromosome or seq_id
2. The start position of the segment to collect
3. The stop or end position of the segment to collect
4. The strand of the feature or segment.

The BioPerl strand values must be used, i.e. -1, 0, or 1.

5. The strandedness of the bed elements to collect.

A scalar value representing the desired strandedness of the data to be collected. Acceptable values include "sense", "antisense", or "all". Only those scores which match the indicated strandedness are collected.

6. The value type of the data to collect.

Acceptable values include score, count, pcount, and length.

score returns the score of each bed element within the region. Make sure the BigBed elements contain a score column.

count returns the number of elements that overlap the search region.

pcount, or precise count, returns the count of elements that fall entirely within the search region and do not extend beyond it.

length returns the lengths of all overlapping elements.

7. The paths to one or more BigBed files

Always provide the BigBed path. Opened BigBed file objects are cached. Both local and remote files are supported.
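As a rough illustration of how the four value types in item 6 differ, the following plain Perl sketch applies each one to a small in-memory list of features. It does not read a BigBed file and is not the module's implementation. The subroutine name values_for and the feature hashes are hypothetical, coordinates are assumed to be 1-based and inclusive, and returning count and pcount as a single number is an assumption made for readability.

```perl
use strict;
use warnings;

# Hypothetical stand-in for bed features near a search region.
sub values_for {
    my ($type, $start, $stop, @features) = @_;

    # keep only the features that overlap the search region
    my @overlap =
        grep { $_->{end} >= $start and $_->{start} <= $stop } @features;

    if ($type eq 'score') {
        return map { $_->{score} } @overlap;
    }
    elsif ($type eq 'count') {
        return scalar(@overlap);
    }
    elsif ($type eq 'pcount') {
        # count only the features contained entirely within the region
        my @inside =
            grep { $_->{start} >= $start and $_->{end} <= $stop } @overlap;
        return scalar(@inside);
    }
    elsif ($type eq 'length') {
        return map { $_->{end} - $_->{start} + 1 } @overlap;
    }
    die "unknown value type '$type'\n";
}

my @features = (
    { start => 90,  end => 120, score => 5 },     # overlaps the left edge
    { start => 110, end => 150, score => 10 },    # entirely inside
    { start => 190, end => 230, score => 2 },     # overlaps the right edge
    { start => 300, end => 350, score => 7 },     # outside
);

for my $type (qw(score count pcount length)) {
    my @values = values_for($type, 100, 200, @features);
    print "$type: @values\n";
}
```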

The subroutine returns an array of the defined dataset values found within the region of interest.
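A call following the argument order listed above might look like the sketch below. The file path and coordinates are hypothetical, and the mean helper is not part of the module. The module is loaded at run time and the subroutine is called by its fully qualified name, so the script simply reports and skips when the file or the module is unavailable.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical inputs, for illustration only
my $file = 'example_data/alignments.bb';
my ($chromo, $start, $stop, $strand) = ('chr1', 10_000, 12_000, 1);

# Simple mean; returns undef for an empty list
sub mean {
    return @_ ? sum(@_) / scalar(@_) : undef;
}

my $message;
if ( -e $file and eval { require Bio::ToolBox::db_helper::bigbed; 1 } ) {
    my @scores = Bio::ToolBox::db_helper::bigbed::collect_bigbed_scores(
        $chromo, $start, $stop, $strand,
        'sense',    # strandedness
        'score',    # value type
        $file,      # one or more BigBed paths
    );

    # positions are not retained, so summarize the values
    my $mean = mean(@scores);
    $message = defined $mean ? "mean score: $mean" : "no scores found";
}
else {
    $message = "skipped: file or module not available";
}
print "$message\n";
```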

collect_bigbed_position_scores()

This subroutine will collect the score values from a binary bigBed file for the specified database region keyed by position.

The subroutine is passed the same arguments as collect_bigbed_scores().

The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. The feature midpoint is used as the key position. When multiple features are found at the same position, a simple mean (for score or length data methods) or sum (for count methods) is returned.
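The midpoint keying and the handling of features that share a position can be illustrated with the plain Perl sketch below. It works on in-memory features rather than a BigBed file and is not the module's own code. The name position_scores and the feature hashes are hypothetical, and rounding the midpoint down to an integer is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sketch: key feature values by feature midpoint.
# Assumption: the midpoint is rounded down to an integer position.
sub position_scores {
    my ($method, @features) = @_;

    # group the value of each feature under its midpoint
    my %by_pos;
    for my $f (@features) {
        my $mid = int( ( $f->{start} + $f->{end} ) / 2 );
        my $value =
              $method eq 'score'  ? $f->{score}
            : $method eq 'length' ? $f->{end} - $f->{start} + 1
            :                       1;    # count methods
        push @{ $by_pos{$mid} }, $value;
    }

    # combine values at the same position: mean for score or length,
    # sum for count methods
    my %result;
    for my $pos ( keys %by_pos ) {
        my @v = @{ $by_pos{$pos} };
        if ( $method eq 'score' or $method eq 'length' ) {
            $result{$pos} = sum(@v) / scalar(@v);
        }
        else {
            $result{$pos} = sum(@v);
        }
    }
    return %result;
}

my @features = (
    { start => 100, end => 200, score => 4 },
    { start => 140, end => 160, score => 8 },    # same midpoint as above
    { start => 300, end => 310, score => 1 },
);

my %scores = position_scores( 'score', @features );
print "$_\t$scores{$_}\n" for sort { $a <=> $b } keys %scores;
```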

open_bigbed_db()

This subroutine will open a BigBed database connection. Pass either the local path to a bigBed file (.bb extension) or the URL of a remote bigBed file. It will return the opened database object.

The opened BigBed object is cached for later use. If you do not want this (for example, when forking), pass a second true argument.
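The two calling styles might be used as in the sketch below. The path is hypothetical, and treating a false return value as a failure to open is an assumption, since the return value on failure is not described here.

```perl
use strict;
use warnings;

# Hypothetical path, for illustration only
my $path = 'example_data/alignments.bb';
my $status;

if ( -e $path and eval { require Bio::ToolBox::db_helper::bigbed; 1 } ) {
    # default call: the opened object is cached by the module
    my $db = Bio::ToolBox::db_helper::bigbed::open_bigbed_db($path);

    # second true argument: do not cache, for example in a forked child
    my $child_db =
        Bio::ToolBox::db_helper::bigbed::open_bigbed_db( $path, 1 );

    # assumption: a false value indicates the file could not be opened
    $status = ( $db and $child_db ) ? 'opened' : 'failed to open';
}
else {
    $status = 'skipped';
}
print "$path: $status\n";
```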

sum_total_bigbed_features()

This subroutine will sum the total number of bed features present in a BigBed file. This may be useful, for example, in calculating fragments (reads) per million mapped values when the bigBed file represents sequence alignments.

Pass either the name of a local or remote bigBed file (.bb) or an opened BigBed database object. A scalar value of the total number of features is returned.
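As a sketch of the per-million scaling mentioned above, the returned total can be used to normalize a raw count. The per_million helper, the file path, and the region count of 250 are all hypothetical. The module call is guarded so the script skips it when the file or the module is unavailable.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a raw count to a per-million value
sub per_million {
    my ($count, $total) = @_;
    die "total must be positive\n" unless $total and $total > 0;
    return $count * 1_000_000 / $total;
}

# Hypothetical path and region count, for illustration only
my $file         = 'example_data/alignments.bb';
my $region_count = 250;

if ( -e $file and eval { require Bio::ToolBox::db_helper::bigbed; 1 } ) {
    my $total =
        Bio::ToolBox::db_helper::bigbed::sum_total_bigbed_features($file);
    printf "%.3f fragments per million\n",
        per_million( $region_count, $total );
}
else {
    print "skipped: file or module not available\n";
}
```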

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.