NAME
Bio::ToolBox::db_helper::seqfasta
DESCRIPTION
This module supports opening Bio::DB::SeqFeature::Store and Bio::DB::Fasta BioPerl database adaptors. It also supports collecting feature scores from Bio::DB::SeqFeature::Store databases. Other BioPerl-style database adaptors that are not explicitly supported but implement the same generic methods may also be used, although success may vary.
Opening databases
For Fasta databases, either a single fasta file or a directory of fasta files may be provided.
For SeqFeature Store databases, the connection parameters are stored in a configuration file, .biotoolbox.cfg. Multiple database containers are supported, including MySQL, SQLite, and in-memory.
Collecting scores
Scores from seqfeature objects stored in the database may be retrieved. The scores may be collected as is, or they may be associated with genomic positions (indexed scores). Scores may be restricted by strand by specifying the desired strandedness. For example, to collect transcription data over a gene, pass the strandedness value 'sense'. The data is then collected only if the strand of the region database object (representing the gene) matches the strand of the wig file data feature.
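The strand test described above can be sketched in plain Perl. This is an illustration only, not the module's own code: strand_matches is a hypothetical helper, and the handling of unstranded (0) regions or features, as well as the reading of 'antisense' as "opposite strands", are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: decide whether a data
# feature is kept for a region, given the requested strandedness.
# Strands are given as -1, 0, or 1.
sub strand_matches {
    my ($region_strand, $feature_strand, $strandedness) = @_;

    # Anything other than 'sense' or 'antisense' (e.g. 'none', 'no')
    # means no strand filtering at all
    return 1
        unless ($strandedness eq 'sense' || $strandedness eq 'antisense');

    # Assumption: an unstranded region or feature never excludes data
    return 1 if ($region_strand == 0 || $feature_strand == 0);

    if ($strandedness eq 'sense') {
        return $region_strand == $feature_strand ? 1 : 0;
    }

    # antisense: keep only data on the opposite strand
    return $region_strand != $feature_strand ? 1 : 0;
}

# Try each combination for a forward-strand gene
for my $mode (qw(sense antisense none)) {
    for my $feature_strand (1, -1) {
        printf "region 1, feature %2d, %-9s => %d\n",
            $feature_strand, $mode,
            strand_matches(1, $feature_strand, $mode);
    }
}
```

With 'sense', only data on the same strand as the gene passes the test; with 'none', strand is ignored.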
Legacy wig file support uses GFF SeqFeature databases to store the file paths of the binary wiggle (.wib) files. If the seqfeature objects returned from the database include the wigfile attribute, then these objects are forwarded on to the Bio::ToolBox::db_helper::wiggle adaptor for appropriate score collection.
USAGE
The module requires the BioPerl adaptors Bio::DB::SeqFeature::Store and Bio::DB::Fasta.
Load the module at the beginning of your program.
use Bio::ToolBox::db_helper::seqfasta;
It will automatically export the names of the following subroutines.
- collect_store_scores
This subroutine will collect only the score values from database features for the specified database region. The positional information of the scores is not retained, and the values may be further processed through some statistical method (mean, median, etc.).
The subroutine is passed eight or more arguments in the following order:
1) The opened database object. A database name or file is not ok.
2) The chromosome name
3) The start position of the segment to collect from
4) The stop or end position of the segment to collect from
5) The strand of the original feature (or region), -1, 0, or 1.
6) A scalar value representing the desired strandedness of the data to be collected. Only those scores which match the indicated strandedness are collected. Acceptable values include "sense", "antisense", "none" or "no".
7) The type of data collected. Acceptable values include 'score' (returns the score), 'count' (the number of defined positions with scores), or 'length' (the wig step is used here).
8) One or more feature types or primary_tags to perform the database search. If nothing is provided, then usually everything in the database is returned!
The subroutine returns an array of the defined dataset values found within the region of interest.
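As a minimal sketch of the further processing mentioned above, the returned array can be reduced to a single statistic. The @scores array here is a made-up stand-in for the subroutine's return value, and mean and median are hypothetical helpers written for this example, not functions of the module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Stand-in for the array returned by collect_store_scores();
# the values are invented for illustration.
my @scores = (1.5, 3.0, 2.5, 4.0);

# Arithmetic mean; undef when no scores were found in the region
sub mean {
    my @values = @_;
    return undef unless @values;
    return sum(@values) / scalar(@values);
}

# Median; averages the two middle values for an even count
sub median {
    my @values = sort { $a <=> $b } @_;
    return undef unless @values;
    my $mid = int(scalar(@values) / 2);
    return scalar(@values) % 2
        ? $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
}

printf "n = %d, mean = %s, median = %s\n",
    scalar(@scores), mean(@scores), median(@scores);
```

Checking for an empty array first matters because a region may contain no defined scores at all.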
- collect_store_position_scores
This subroutine will collect the score values from features in the database for the specified region, keyed by position.
The subroutine is passed the same arguments as collect_store_scores().
The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. Note that only one value is returned per position, regardless of the number of dataset features passed.
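A position-keyed hash of this kind is typically walked in coordinate order, or shifted so that positions are relative to a reference point such as a feature start. The sketch below assumes a made-up hash standing in for the return value; relative_scores is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Stand-in for the returned hash: genomic position => score.
# The values are invented for illustration.
my %pos2score = (1005 => 2.0, 1001 => 0.5, 1010 => 3.5);

# Hypothetical helper: re-key the scores relative to a reference coordinate
sub relative_scores {
    my ($reference, %scores) = @_;
    my %relative;
    for my $pos (keys %scores) {
        $relative{ $pos - $reference } = $scores{$pos};
    }
    return %relative;
}

# Hash keys are unordered, so sort numerically before printing
my %relative = relative_scores(1000, %pos2score);
for my $pos (sort { $a <=> $b } keys %relative) {
    print "$pos\t$relative{$pos}\n";
}
```

Because the hash holds one value per position, re-keying by a constant offset cannot cause two scores to collide.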
AUTHOR
Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.