NAME
Bio::ToolBox::db_helper::bigbed
DESCRIPTION
This module provides support for binary BigBed files to the Bio::ToolBox package.
USAGE
This module requires Bio::DB::BigBed, which in turn requires the UCSC Kent C library.
In general, this module should not be used directly. Use the methods available in Bio::ToolBox::db_helper or Bio::ToolBox::Data.
All subroutines are exported by default.
Available subroutines
- open_bigbed_db
This subroutine will open a BigBed database connection. Pass either the local path to a bigBed file (.bb or .bigbed extension) or the URL of a remote bigBed file. It will return the opened database object.
The opened BigBed object is cached for later use. If you do not want this (for example, when forking), pass a true value as the second argument.
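The caching behaviour can be illustrated with a minimal sketch. This is not the module's code. The %OPENED hash, open_cached(), and the stub opener are hypothetical names, and the stub stands in for the real Bio::DB::BigBed call so that the example runs on its own. The sketch shows a cached handle being reused for the same path, while a true second argument returns a fresh, uncached handle, which is what a forked child would need.

```perl
use strict;
use warnings;

# Hypothetical cache of opened handles, keyed by file path or URL.
my %OPENED;

# Sketch of a cached opener. $opener is a hypothetical stand-in for the
# call that really opens the file; it is injected so this runs without
# Bio::DB::BigBed.
sub open_cached {
    my ( $path, $no_cache, $opener ) = @_;

    # Reuse an existing handle unless caching was declined.
    return $OPENED{$path} if !$no_cache && exists $OPENED{$path};

    my $db = $opener->($path);

    # Remember the handle only when caching is allowed.
    $OPENED{$path} = $db unless $no_cache;
    return $db;
}

# Stub opener that counts how many times a file is actually opened.
my $opens = 0;
my $stub  = sub { $opens++; return { path => $_[0], id => $opens } };

my $first  = open_cached( 'example.bb', 0, $stub );    # opens the file
my $second = open_cached( 'example.bb', 0, $stub );    # reuses the cache
my $fresh  = open_cached( 'example.bb', 1, $stub );    # bypasses the cache
print "opens=$opens reused=", ( $first == $second ? 1 : 0 ),
    " fresh=", ( $first == $fresh ? 0 : 1 ), "\n";
```

Running it prints "opens=2 reused=1 fresh=1": only the uncached request triggers a second open.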
- collect_bigbed_scores
This subroutine will collect only the data values from a binary bigBed file for the specified database region. The positional information of the scores is not retained.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
The subroutine returns an array or array reference of the requested dataset values found within the region of interest.
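Returning "an array or array reference" most likely follows Perl calling context. The sketch below assumes that wantarray decides the return type. The feature hashes and collect_values() are hypothetical. The sketch also shows positions being dropped so that only the values remain.

```perl
use strict;
use warnings;

# Hypothetical features found in a region (1-based coordinates).
my @features = (
    { start => 101, end => 150, score => 5 },
    { start => 140, end => 190, score => 7 },
    { start => 170, end => 220, score => 9 },
);

# Sketch: keep only the values, discarding positions. Returns a list in
# list context or an array reference in scalar context (assumed behaviour).
sub collect_values {
    my @found  = @_;
    my @scores = map { $_->{score} } @found;
    return wantarray ? @scores : \@scores;
}

my @list = collect_values(@features);    # list context
my $ref  = collect_values(@features);    # scalar context
print "@list | @{$ref}\n";               # prints "5 7 9 | 5 7 9"
```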
- collect_bigbed_position_scores
This subroutine will collect the score values from a binary bigBed file for the specified database region keyed by position.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. The feature midpoint is used as the key position. When multiple features are found at the same position, a simple mean (for score or length data methods) or sum (for count methods) is returned.
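The midpoint keying and collision rule described above can be sketched as follows. position_scores() and the feature hashes are hypothetical. Two details are assumptions, not taken from the module's source: the midpoint is rounded down with int(), and count and pcount are treated as the summed methods.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sketch: key feature values by midpoint and combine collisions.
sub position_scores {
    my ( $method, @features ) = @_;
    my %by_pos;
    for my $f (@features) {
        # Integer midpoint of the feature (rounding is an assumption).
        my $mid = int( ( $f->{start} + $f->{end} ) / 2 );
        push @{ $by_pos{$mid} }, $f->{value};
    }
    my %result;
    for my $pos ( keys %by_pos ) {
        my @v = @{ $by_pos{$pos} };
        # Count methods are summed; score and length methods are averaged.
        $result{$pos} = $method =~ /^p?count$/ ? sum(@v) : sum(@v) / @v;
    }
    return %result;
}

# Hypothetical features; the first two share the midpoint 150.
my @feats = (
    { start => 100, end => 200, value => 4 },
    { start => 120, end => 180, value => 8 },
    { start => 300, end => 310, value => 1 },
);

my %score = position_scores( 'score', @feats );
# For a count method, each feature contributes 1.
my %count = position_scores( 'count', map { +{ %$_, value => 1 } } @feats );

print join( ' ', map { "$_=$score{$_}" } sort { $a <=> $b } keys %score ), "\n";
print join( ' ', map { "$_=$count{$_}" } sort { $a <=> $b } keys %count ), "\n";
```

The score output is "150=6 305=1" (the mean of 4 and 8 at position 150). The count output is "150=2 305=1".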
- sum_total_bigbed_features
This subroutine will sum the total number of BED features present in a BigBed file. This may be useful, for example, for calculating fragments (reads) per million mapped values when the bigBed file represents sequence alignments.
Pass either the path of a bigBed file (.bb), local or remote, or an opened BigBed database object. The total number of features is returned as a scalar value.
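As a worked example of the per-million calculation, the total returned by sum_total_bigbed_features serves as the denominator: per-million value = count * 1,000,000 / total. The per_million() helper and the numbers below are hypothetical.

```perl
use strict;
use warnings;

# Scale a raw count by library size. $total would come from
# sum_total_bigbed_features; the values here are made up.
sub per_million {
    my ( $count, $total ) = @_;
    die "total must be positive\n" unless $total > 0;
    return $count * 1_000_000 / $total;
}

my $total = 25_000_000;              # hypothetical total feature count
my $fpm   = per_million( 50, $total );
printf "%.2f\n", $fpm;               # 50 * 1e6 / 25e6 = 2.00
```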
Data Collection Parameters Reference
The data collection subroutines are passed an array reference of parameters. The recommended way to collect data is through the get_segment_score method in Bio::ToolBox::db_helper.
The parameters array reference includes these items:
- 1. chromosome
- 2. start coordinate
- 3. stop coordinate
Coordinates use the BioPerl-style 1-based system.
- 4. strand
Use the standard BioPerl representation: -1, 0, or 1.
- 5. strandedness
A scalar value representing the desired strandedness of the data to be collected. Acceptable values are "sense", "antisense", or "all". Only scores that match the indicated strandedness are collected.
- 6. score method
Acceptable values include score, count, pcount, and ncount.
* score returns the basepair coverage of alignments over the region of interest.
* count returns the number of alignments that overlap the search region.
* pcount, or precise count, returns the count of alignments whose start and end both fall within the region.
* ncount, or named count, returns an array of alignment read names. Use this to avoid double-counting paired-end reads by counting only unique names. Reads are taken if they overlap the search region.
- 7. database
Not used here.
- 8. path to BigBed file
Additional BigBed files may be provided as further list items. Opened BigBed file objects are cached. Both local and remote files are supported.
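The parameter list can be made concrete with a sketch. It assumes the items occupy the listed order starting at array index 0; check Bio::ToolBox::db_helper for the exact layout in your version. The helpers strand_ok(), overlaps(), and contained(), along with the feature hashes and example.bb, are hypothetical. The sketch shows three things:

* the conversion from BioPerl 1-based closed coordinates to the 0-based half-open intervals used by the BED and bigBed formats;
* a strandedness filter;
* the difference between count (any overlap) and pcount (start and end both inside the region).

```perl
use strict;
use warnings;

# Hypothetical parameter array in the documented order (index layout assumed).
my @param = (
    'chr1',          # chromosome
    1001,            # start coordinate (1-based)
    2000,            # stop coordinate (1-based, inclusive)
    1,               # strand: -1, 0, or 1
    'sense',         # strandedness: sense, antisense, or all
    'count',         # score method
    undef,           # database (not used)
    'example.bb',    # path to a BigBed file (hypothetical)
);

# BED/bigBed intervals are 0-based and half-open, so the 1-based closed
# interval [start, stop] becomes [start - 1, stop).
my ( $bed_start, $bed_end ) = ( $param[1] - 1, $param[2] );

# Does a feature's strand satisfy the requested strandedness? (assumed rules)
sub strand_ok {
    my ( $stranded, $query, $feature ) = @_;
    return 1                   if $stranded eq 'all';
    return $feature == $query  if $stranded eq 'sense';
    return $feature == -$query if $stranded eq 'antisense';
    die "unknown strandedness '$stranded'\n";
}

# count: any overlap with the half-open region [$s, $e).
sub overlaps {
    my ( $f, $s, $e ) = @_;
    return $f->{start} < $e && $f->{end} > $s;
}

# pcount: feature start and end both inside the region.
sub contained {
    my ( $f, $s, $e ) = @_;
    return $f->{start} >= $s && $f->{end} <= $e;
}

# Hypothetical features, already in 0-based half-open coordinates.
my @features = (
    { start => 900,  end => 1100, strand => 1 },     # crosses the left edge
    { start => 1200, end => 1300, strand => 1 },     # fully inside
    { start => 1400, end => 1500, strand => -1 },    # inside, opposite strand
    { start => 2500, end => 2600, strand => 1 },     # outside
);

my @kept   = grep { strand_ok( $param[4], $param[3], $_->{strand} ) } @features;
my $count  = grep { overlaps( $_, $bed_start, $bed_end ) } @kept;
my $pcount = grep { contained( $_, $bed_start, $bed_end ) } @kept;
print "count=$count pcount=$pcount\n";    # prints "count=2 pcount=1"
```

The opposite-strand feature is removed by the sense filter. The edge-crossing feature counts toward count but not toward pcount.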
SEE ALSO
Bio::ToolBox::Data::Feature, Bio::ToolBox::db_helper, Bio::DB::BigBed
AUTHOR
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.