NAME
Bio::ToolBox::db_helper::bigwig
DESCRIPTION
This module provides support for binary BigWig files to the Bio::ToolBox package. It also supports a directory of one or more bigWig files as a combined database, known as a BigWigSet.
USAGE
The module requires Bio::DB::BigWig to be installed, which in turn requires the UCSC Kent C library to be installed.
In general, this module should not be used directly. Use the methods available in Bio::ToolBox::db_helper or Bio::ToolBox::Data.
All subroutines are exported by default.
Available subroutines
- open_bigwig_db
-
This subroutine will open a BigWig database connection. Pass either the local path to a bigWig file (.bw or .bigwig extension) or the URL of a remote bigWig file. It will return the opened database object.
- open_bigwigset_db
-
This subroutine will open a BigWigSet database connection using a directory of BigWig files and one metadata index file, as described in Bio::DB::BigWigSet. Essentially, this treats a directory of BigWig files as a single database with each BigWig file representing a different feature with unique attributes (type, source, strand, etc).
Pass the subroutine a scalar value representing the local path to the directory. It presumes a feature_type of 'region', as expected by the other Bio::ToolBox db_helper subroutines and modules. It will return the opened database object.
- collect_bigwig_score
-
This subroutine will collect a single value from a binary bigWig file. It uses the low-level summary method to collect the statistical information and is therefore significantly faster than the other methods, which rely upon parsing individual data points across the region.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
The subroutine will return either a valid score or a null value.
- collect_bigwigset_score
-
Similar to "collect_bigwig_score" but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
- collect_bigwig_scores
-
This subroutine will collect only the score values from a binary BigWig file for the specified database region. The positional information of the scores is not retained.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
The subroutine returns an array or array reference of the requested dataset values found within the region of interest.
- collect_bigwigset_scores
-
Similar to "collect_bigwig_scores" but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
- collect_bigwig_position_scores
-
This subroutine will collect the score values from a binary BigWig file for the specified database region keyed by position.
The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.
The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. Note that only one value is returned per position, regardless of the number of dataset features passed. Usually this isn't a problem as only one dataset is examined at a time.
- collect_bigwigset_position_scores
-
Similar to "collect_bigwig_position_scores" but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.
The subroutine is passed a parameter array reference. See below for details.
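The following sketch illustrates how the two multi-value return types described above might be consumed: a list of scores without positions, and a hash of scores keyed by position. The data are made-up stand-ins rather than output from a real bigWig file, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Made-up stand-in for the position => score hash returned by
# collect_bigwig_position_scores (no real bigWig file is read here).
my %pos2score = (
    1050 => 2.5,
    1000 => 1.0,
    1100 => 4.0,
);

# Hash keys come back in no particular order, so sort the
# positions numerically before walking across the region.
my @positions = sort { $a <=> $b } keys %pos2score;

# Made-up stand-in for the plain score list returned by
# collect_bigwig_scores, where positional information is not retained.
my @scores = map { $pos2score{$_} } @positions;

# Summarize the scores; guard against an empty region.
my $mean = @scores ? sum(@scores) / @scores : undef;
my $peak = @scores ? max(@scores) : undef;

printf "%d\t%s\n", $_, $pos2score{$_} for @positions;
printf "mean=%.2f max=%.1f\n", $mean, $peak if defined $mean;
```

Sorting the keys is the only step specific to the position-keyed form; the score list can be passed directly to any summary function.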
Data Collection Parameters Reference
The data collection subroutines are passed an array reference of parameters. The recommended method for data collection is to use the get_segment_score method in Bio::ToolBox::db_helper.
The parameters array reference includes these items:
- 1. chromosome
- 2. start coordinate
- 3. stop coordinate
-
Coordinates are in the BioPerl-style 1-based system.
- 4. strand
-
Should be standard BioPerl representation: -1, 0, or 1.
- 5. strandedness
-
A scalar value representing the desired strandedness of the data to be collected. Acceptable values include "sense", "antisense", or "all". Only those scores which match the indicated strandedness are collected.
- 6. score method
-
Acceptable values include mean, min, max, stddev, sum, and count. Used when collecting a single value over a genomic segment.
Note: methods of pcount and ncount are technically supported, but are treated the same as count.
- 7. A database object.
-
Pass the opened Bio::DB::BigWigSet database object when working with BigWigSets. Otherwise, pass undef for BigWig files.
- 8. Dataset name
-
For BigWig files, pass the path of a local bigWig file or the URL of a remote bigWig file. Opened BigWig objects are cached.
For BigWigSet databases, pass the name of the dataset within the BigWigSet database to use. Either the name or type may be used.
Additional dataset items may be added to the list when merging data.
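As an illustration of the parameter order listed above, here is a minimal sketch that assembles the array reference for a single bigWig file. The helper build_params, its argument names, its defaults, and the file path are all hypothetical and are not part of this module. The call to collect_bigwig_score is attempted only when the module is installed and the file exists, so the sketch runs harmlessly otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): assemble the
# parameters in the documented order. Defaults are assumptions.
sub build_params {
    my (%arg) = @_;
    return [
        $arg{chromo},                   # 1. chromosome
        $arg{start},                    # 2. start coordinate (1-based)
        $arg{stop},                     # 3. stop coordinate
        $arg{strand}       // 0,        # 4. strand: -1, 0, or 1
        $arg{strandedness} // 'all',    # 5. sense, antisense, or all
        $arg{method}       // 'mean',   # 6. score method
        $arg{db},                       # 7. undef for plain bigWig files
        @{ $arg{datasets} },            # 8. dataset name(s)
    ];
}

my $params = build_params(
    chromo   => 'chr1',
    start    => 1000,
    stop     => 2000,
    datasets => ['/path/to/example.bw'],    # hypothetical path
);

# Attempt the collection only if the file exists and the module loads.
my $score;
if ( -e $params->[7]
    and eval { require Bio::ToolBox::db_helper::bigwig; 1 } )
{
    $score = Bio::ToolBox::db_helper::bigwig::collect_bigwig_score($params);
}
print defined $score ? "score: $score\n" : "no score collected\n";
```

For a BigWigSet, the seventh element would instead hold the opened database object and the eighth the dataset name or type. In most cases the get_segment_score method in Bio::ToolBox::db_helper remains the recommended entry point.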
SEE ALSO
Bio::ToolBox::Data::Feature, Bio::ToolBox::db_helper, Bio::DB::BigWig, Bio::DB::BigWigSet
AUTHOR
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.