NAME
Bio::Search::BlastUtils - Utility functions for Bio::Search:: BLAST objects
SYNOPSIS
# This module is just a collection of subroutines, not an object.
See Bio::Search::Hit::BlastHit.
DESCRIPTION
The BlastUtils.pm module is a collection of subroutines used primarily by Bio::Search::Hit::BlastHit objects for some of their additional functionality, such as HSP tiling. For now, BlastUtils is just a collection of subroutines, not an object, and it is tightly coupled to Bio::Search::Hit::BlastHit. A goal for the future is to generalize it to work against the Bio::Search interfaces, so that it can work with any object that implements them.
AUTHOR
Steve Chervitz <sac@bioperl.org>
tile_hsps
Usage : tile_hsps($sbjct);
: This is called automatically by Bio::Search::Hit::BlastHit
: during object construction or
: as needed by methods that rely on having tiled data.
Purpose : Collect statistics about the aligned sequences in a set of HSPs.
: Calculates the following data across all HSPs:
: -- total alignment length
: -- total identical residues
: -- total conserved residues
Returns : n/a
Argument : A Bio::Search::Hit::BlastHit object
Throws : n/a
Comments :
: This method is *strongly* coupled to Bio::Search::Hit::BlastHit
: (it accesses BlastHit data members directly).
: TODO: Rewrite this to use the Bio::Search::Hit::HitI interface.
:
: This method performs more careful summing of data across
: all HSPs in the Sbjct object. Only HSPs that are in the same strand
: and frame are tiled. Simply summing the data from all HSPs
: in the same strand and frame will overestimate the actual
: length of the alignment if there is overlap between different HSPs
: (often the case).
:
: The strategy is to tile the HSPs and sum over the
: contigs, collecting data separately from overlapping and
: non-overlapping regions of each HSP. To facilitate this, the
: HSP.pm object now permits extraction of data from sub-sections
: of an HSP.
:
: Additional useful information is collected from the results
: of the tiling. It is possible that sub-sequences in
: different HSPs will overlap significantly. In this case, it
: is impossible to create a single unambiguous alignment by
: concatenating the HSPs. The ambiguity may indicate the
: presence of multiple, similar domains in one or both of the
: aligned sequences. This ambiguity is recorded using the
: ambiguous_aln() method.
:
: This method does not attempt to discern biologically
: significant vs. insignificant overlaps. The allowable amount of
: overlap can be set with the overlap() method or with the -OVERLAP
: parameter used when constructing the Blast & Sbjct objects.
:
: For a given hit, both the query and the sbjct sequences are
: tiled independently.
:
: -- If only query sequence HSPs overlap,
: this may suggest multiple domains in the sbjct.
: -- If only sbjct sequence HSPs overlap,
: this may suggest multiple domains in the query.
: -- If both query & sbjct sequence HSPs overlap,
: this suggests multiple domains in both.
: -- If neither query nor sbjct sequence HSPs overlap,
: this suggests either no multiple domains in either
: sequence OR that both sequences have the same
: distribution of multiple similar domains.
:
: This method can deal with the special case of multiple
: HSPs that overlap exactly.
:
: Efficiency concerns:
: Speed will be an issue for sequences with numerous HSPs.
:
Bugs : Currently, tile_hsps() does not properly account for
: the number of non-tiled but overlapping HSPs, which becomes a problem
: as overlap() grows. Large values of overlap() may thus lead to
: incorrect statistics for some hits. For best results, keep overlap()
: below 5 (DEFAULT IS 2). For more about this, see the "HSP Tiling and
: Ambiguous Alignments" section in L<Bio::Search::Hit::BlastHit>.
See Also : _adjust_contigs(), Bio::Search::Hit::BlastHit
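The overestimation described in the comments above can be illustrated with a minimal sketch. This is not the module's implementation: merge_ranges() and total_length() are hypothetical helpers, HSPs are reduced to plain [start, end] coordinate pairs (1-based, inclusive), and strand, frame, the overlap() tolerance, and the identical/conserved counts are all ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BlastUtils): merge [start, end] ranges
# into non-overlapping contigs. Coordinates are 1-based and inclusive.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @contigs;
    for my $r (@sorted) {
        if (@contigs && $r->[0] <= $contigs[-1][1]) {
            # Range overlaps the current contig: extend the contig if needed.
            $contigs[-1][1] = $r->[1] if $r->[1] > $contigs[-1][1];
        }
        else {
            # No overlap: start a new contig (copy so the input is untouched).
            push @contigs, [ @$r ];
        }
    }
    return @contigs;
}

# Hypothetical helper: sum the lengths of a list of [start, end] ranges.
sub total_length {
    my $sum = 0;
    $sum += $_->[1] - $_->[0] + 1 for @_;
    return $sum;
}

# Three made-up HSP ranges; the first two overlap by 11 positions.
my @hsps  = ([1, 100], [90, 150], [200, 250]);
my $naive = total_length(@hsps);                 # counts the overlap twice
my $tiled = total_length(merge_ranges(@hsps));   # counts each position once
print "naive=$naive tiled=$tiled\n";             # prints "naive=212 tiled=201"
```

The difference between the two totals is exactly the doubly counted overlap, which is what tiling is meant to remove.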
_adjust_contigs
Usage : n/a; called automatically during object construction.
Purpose : Builds HSP contigs for a given BLAST hit.
: Utility method called by tile_hsps()
Returns :
Argument :
Throws : Exceptions propagated from Bio::Search::Hit::BlastHSP::matches()
: for invalid sub-sequence ranges.
Status : Experimental
Comments : This method does not currently support gapped alignments.
: Also, it does not keep track of the number of HSPs that
: overlap within the amount specified by overlap().
: This will lead to significant tracking errors for large
: overlap values.
See Also : tile_hsps(), Bio::Search::Hit::BlastHSP::matches
get_exponent
Usage : &get_exponent( number );
Purpose : Determines the power of 10 exponent of an integer, float,
: or scientific notation number.
Example : &get_exponent("4.0e-206");
: &get_exponent("0.00032");
: &get_exponent("10.");
: &get_exponent("1000.0");
: &get_exponent("e+83");
Argument : Float, Integer, or scientific notation number
Returns : Integer representing the exponent part of the number (+ or -).
: If argument == 0 (zero), return value is "-999".
Comments : Exponents are rounded up (less negative) if the mantissa is >= 5.
: Exponents are rounded down (more negative) if the mantissa is <= -5.
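As a rough illustration of the behaviour documented above, the following sketch derives an exponent by normalizing the input with sprintf('%e'). The function exponent_sketch() is hypothetical and is not how get_exponent() itself is written. It assumes that a bare exponent such as "e+83" implies a mantissa of 1, and it applies the rounding rule from the Comments field to every input form, which may differ from the real subroutine in edge cases.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): power-of-10 exponent of a
# number given as an integer, float, or scientific-notation string.
sub exponent_sketch {
    my ($value) = @_;

    # Assumption: a bare exponent like "e+83" has an implied mantissa of 1.
    $value = "1$value" if $value =~ /^[eE]/;

    # Documented special case: zero yields -999.
    return -999 if $value == 0;

    # '%e' normalizes any numeric form to d.dddddde[+-]dd.
    my ($mantissa, $exp) = split /e/, sprintf('%e', $value);
    $exp += 0;                      # numify strings such as "-206" or "+01"

    # Rounding rule from the Comments field, applied here to all inputs.
    $exp++ if $mantissa >= 5;
    $exp-- if $mantissa <= -5;
    return $exp;
}

print exponent_sketch($_), "\n"
    for "4.0e-206", "0.00032", "10.", "1000.0", "e+83";
```

Under these assumptions the five documented example inputs give -206, -4, 1, 3 and 83 respectively.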
collapse_nums
Usage : @cnums = collapse_nums( @numbers );
Purpose : Collapses a list of numbers into a set of ranges of consecutive terms.
: Useful for condensing long lists of consecutive numbers.
: EXPANDED:
: 1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
: COLLAPSED:
: 1-6 10 12-15 17 18 20-22 24 26 30-32
Argument : List of numbers sorted numerically.
Returns : List of numbers mixed with ranges of numbers (see above).
Throws : n/a
See Also : Bio::Search::Hit::BlastHit::seq_inds()
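The EXPANDED/COLLAPSED example above can be reproduced with a short sketch. The function collapse_sketch() is a hypothetical stand-in, not the module's code. It assumes, as the documented example suggests (17 and 18 stay separate), that only runs of three or more consecutive numbers become a range, and that the input is already sorted numerically.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): collapse a numerically
# sorted list of integers into single numbers and "from-to" ranges.
sub collapse_sketch {
    my @nums = @_;
    my @out;
    my $i = 0;
    while ($i < @nums) {
        my $j = $i;
        # Advance $j to the last element of the consecutive run starting at $i.
        $j++ while $j < $#nums && $nums[$j + 1] == $nums[$j] + 1;

        if ($j - $i >= 2) {
            # Three or more consecutive terms: emit a range.
            push @out, "$nums[$i]-$nums[$j]";
        }
        else {
            # One or two terms: emit them individually.
            push @out, @nums[$i .. $j];
        }
        $i = $j + 1;
    }
    return @out;
}

my @numbers = qw(1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32);
print join(' ', collapse_sketch(@numbers)), "\n";
# prints "1-6 10 12-15 17 18 20-22 24 26 30-32"
```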
strip_blast_html
Usage : $boolean = &strip_blast_html( string_ref );
: This method is exported.
Purpose : Removes HTML formatting from a supplied string.
: Attempts to restore the Blast report to enable
: parsing by Bio::SearchIO::blast.pm
Returns : Boolean: true if string was stripped, false if not.
Argument : string_ref = reference to a string containing the whole Blast
: report containing HTML formatting.
Throws : Croaks if the argument is not a scalar reference.
Comments : Based on code originally written by Alex Dong Li
: (ali@genet.sickkids.on.ca).
: This method does some Blast-specific stripping
: (adds back a '>' character in front of each HSP
: alignment listing).
:
: THIS METHOD IS VERY SENSITIVE TO BLAST FORMATTING CHANGES!
:
: Removal of the HTML tags and accurate reconstitution of the
: non-HTML-formatted report is highly dependent on the structure of
: the HTML-formatted version. For example, it assumes that the first
: line of each alignment section (HSP listing) starts with a
: <a name=..> anchor tag. This permits the reconstruction of the
: original report in which these lines begin with a ">".
: This is required for parsing.
:
: If the structure of the Blast report itself is not intended to
: be a standard, the structure of the HTML-formatted version
: is even less so. Therefore, using this method to
: reconstitute parsable Blast reports from HTML-format versions
: should be considered a temporary solution.
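The contract described above (scalar reference in, in-place edit, boolean out, '>' restored at anchor lines) can be sketched as follows. strip_html_sketch() is a hypothetical illustration and not the code of strip_blast_html(). The regular expressions are assumptions chosen for a small made-up sample; they will not handle every HTML-formatted Blast report.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sketch (not the module's code): strip HTML from a report
# held in a scalar reference. Returns 1 if the text changed, 0 otherwise.
sub strip_html_sketch {
    my ($string_ref) = @_;
    croak "Argument must be a scalar reference"
        unless ref($string_ref) eq 'SCALAR';

    my $before = $$string_ref;

    # Assumption: each alignment listing starts with an <a name=..> anchor;
    # replace that anchor with the '>' the plain-text report would have.
    $$string_ref =~ s/^[ \t]*<a\s+name=[^>]*>/>/gim;

    # Remove every remaining tag.
    $$string_ref =~ s/<[^>]+>//g;

    return $$string_ref ne $before ? 1 : 0;
}

# Made-up fragment standing in for part of an HTML-formatted report.
my $report = "<pre>\n<a name=12345></a>sp|P12345| Example protein\n"
           . "          Length = 100\n</pre>\n";
my $changed = strip_html_sketch(\$report);
print "changed=$changed\n$report";
```

Because the string is edited through the reference, the caller's copy is modified and only the boolean is returned, which matches the Usage line above.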