NAME
Bio::DB::GFF::Adaptor::dbi::mysqlopt -- Optimized Bio::DB::GFF adaptor for mysql
SYNOPSIS
See Bio::DB::GFF
DESCRIPTION
This adaptor is similar to Bio::DB::GFF::Adaptor::dbi::mysql, except that it implements several optimizations:
- 1. Binning
It uses a hierarchical binning scheme to dramatically accelerate feature queries that use positional information.
- 2. DNA fetching
Because mysql is slow when fetching substrings out of large text BLOBs, this adaptor uses Bio::DB::Fasta to fetch DNA segments rapidly out of FASTA files.
- 3. An ACEDB interface
Features can be linked to ACEDB objects, allowing this module to be used as a replacement for the Ace::Sequence module.
The schema is identical to Bio::DB::GFF::Adaptor::dbi, except for the fdata table:
fid            feature ID (integer)
fref           reference sequence name (string)
fstart         start position relative to reference (integer)
fstop          stop position relative to reference (integer)
fbin           bin containing this feature (float)
ftypeid        feature type ID (integer)
fscore         feature score (float); may be null
fstrand        strand; one of "+" or "-"; may be null
fphase         phase; one of 0, 1 or 2; may be null
gid            group ID (integer)
ftarget_start  for similarity features, the target start position (integer)
ftarget_stop   for similarity features, the target stop position (integer)
The only difference is the "fbin" field, which indicates the interval in which the feature is contained. This module uses a hierarchical set of bins, the smallest of which is 1 kb and the largest 100 megabases.
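A minimal sketch of how a feature could be assigned an fbin value. It assumes that bin sizes grow by a factor of ten starting from the 1 kb minimum, and that fbin encodes the bin size in its integer part and the zero-padded bin index in its fractional part. The helper name feature_bin is hypothetical; this illustrates the idea and is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: find the smallest tier (bin size) in which the
# feature [$start, $stop] fits inside a single bin. Starts at $minbin and
# grows by a factor of 10 (an assumption of this sketch).
sub feature_bin {
    my ($start, $stop, $minbin) = @_;
    my $tier = $minbin;
    # Move up one tier while the start and stop fall in different bins.
    while (int($start / $tier) != int($stop / $tier)) {
        $tier *= 10;
    }
    # Integer part: the tier; fractional part: the zero-padded bin index.
    return sprintf('%d.%06d', $tier, int($start / $tier));
}

print feature_bin(1_500, 1_800, 1_000), "\n";   # 1000.000001
print feature_bin(900, 1_100, 1_000), "\n";     # 10000.000000
```

A 300 bp feature that sits inside one 1 kb window lands in the smallest tier, while a feature that straddles a 1 kb boundary is pushed up to the 10 kb tier, even if it is short.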
In the call to initialize() you can set the following options:
-minbin               minimum value to use for binning
-maxbin               maximum value to use for binning
-straight_join_limit  size of range over which it is faster to force mysql to use the range for indexing
-minbin and -maxbin indicate the minimum and maximum sizes of features, and are important for range query optimization. They are set at reasonable values; in particular, the maximum bin size is set to 100 megabases. Do not change them unless you know what you are doing.
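To see why -minbin and -maxbin matter for range queries, here is a minimal sketch that lists, for each bin size between the two limits, the range of fbin values that could hold a feature overlapping a query interval. It assumes the same factor-of-ten tiers and fbin encoding as a simple binning scheme, and that no stored feature is larger than the -maxbin bin. The helper name candidate_bin_ranges is hypothetical; the SQL it prints only illustrates the idea.

```perl
use strict;
use warnings;

# Hypothetical helper: list the fbin ranges that could hold a feature
# overlapping the query [$qstart, $qstop]. Assumes tiers grow by a factor
# of 10 from $minbin to $maxbin and every stored feature fits in one
# $maxbin-sized bin.
sub candidate_bin_ranges {
    my ($qstart, $qstop, $minbin, $maxbin) = @_;
    my @ranges;
    for (my $tier = $minbin; $tier <= $maxbin; $tier *= 10) {
        # A feature in bin b of this tier lies within [b*tier, (b+1)*tier),
        # so it can only overlap the query if b is between these indexes.
        my $low  = sprintf('%d.%06d', $tier, int($qstart / $tier));
        my $high = sprintf('%d.%06d', $tier, int($qstop  / $tier));
        push @ranges, [$low, $high];
    }
    return @ranges;
}

# Each pair could become one SQL condition on the fbin column.
for my $r (candidate_bin_ranges(1_500, 12_000, 1_000, 100_000_000)) {
    print "fbin BETWEEN $r->[0] AND $r->[1]\n";
}
```

Each tier contributes one range condition, so lowering -minbin or raising -maxbin adds conditions to every positional query, while setting -maxbin below the size of the largest features would cause such features to be missed.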
new
Title : new
Usage : $db = Bio::DB::GFF->new(@args)
Function: create a new adaptor
Returns : a Bio::DB::GFF object
Args : see below
Status : Public
The new() constructor is identical to the "dbi" adaptor's new() method, except that the prefix "dbi:mysql" is added to the database DSN identifier automatically if it is not there already.
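The DSN rule described above can be sketched as follows. The helper name normalize_dsn is hypothetical, and the exact test the adaptor uses to detect an existing prefix may differ; this version assumes any string beginning with "dbi:" (in any case) is already a full DBI data source.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend "dbi:mysql:" to a bare database name and
# leave a string that already looks like a DBI DSN untouched.
sub normalize_dsn {
    my ($dsn) = @_;
    return $dsn =~ /^dbi:/i ? $dsn : "dbi:mysql:$dsn";
}

print normalize_dsn('ens0040'), "\n";            # dbi:mysql:ens0040
print normalize_dsn('dbi:mysql:ens0040'), "\n";  # dbi:mysql:ens0040
```

In practice the adaptor is selected through Bio::DB::GFF itself, for example Bio::DB::GFF->new(-adaptor => 'dbi::mysqlopt', -dsn => 'ens0040'), so either form of -dsn can be passed.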
Argument        Description
--------        -----------
-dsn            the DBI data source, e.g. "dbi:mysql:ens0040" or "ens0040"
-fasta          path to a directory containing FASTA files for this database
                (e.g. "/usr/local/share/fasta")
-acedb          an ACEDB URL to use when converting features into ACEDB
                objects (e.g. sace://localhost:2005)
-user           username for authentication
-pass           password for authentication
-minbin         minimum value to use for binning
-maxbin         maximum value to use for binning
The directory given by -fasta must be writable by the current process, because an index of the FASTA files has to be built there.
freshen_ace
Title   : freshen_ace
Usage   : $flag = $db->freshen_ace;
Function: refresh the internal ACEDB handle
Returns : a true value if the handle was refreshed
Args : none
Status : Public
ACEDB has an annoying way of timing out, leaving dangling database handles. This method invokes the ACEDB reopen() method, which refreshes the dangling handle. It has no effect if you are not using ACEDB to create ACEDB objects.