NAME
Bio::Polloc::GroupCriteria - Rules to group loci
DESCRIPTION
Takes loci and returns groups of loci based on certain rules. If created via .bme (.cfg) files, it is defined in the [ RuleGroup ] and [ GroupExtension ] namespaces.
AUTHOR - Luis M. Rodriguez-R
Email lmrodriguezr at gmail dot com
LICENSE
This package is licensed under the Artistic License - see LICENSE.txt
IMPLEMENTS OR EXTENDS
APPENDIX - Methods
Methods provided by the package
new
Generic initialization method
Arguments
Returns
The Bio::Polloc::GroupCriteria object.
source
Sets/gets the type of source loci (see Bio::Polloc::LocusI->family).
target
Sets/gets the type of target loci (see Bio::Polloc::LocusI->family).
locigroup
Gets/sets the input Bio::Polloc::LociGroup object containing all the loci to evaluate.
condition
Sets/gets the conditions set to evaluate.
evaluate
Compares two loci based on the defined conditions
Arguments
The first locus (a Bio::Polloc::LocusI object)
The second locus (a Bio::Polloc::LocusI object)
Returns
Boolean
Throws
Bio::Polloc::Polloc::Error if unexpected input or undefined condition, source or target
get_loci
Gets the stored loci
Note
The stored loci can also be obtained with $object->locigroup->loci, but this function ensures a consistent order in the loci for its evaluation.
get_locus
Get the locus with the specified index.
Arguments
The index (int, mandatory).
Returns
A Bio::Polloc::LocusI object or undef.
Note
This is a lazy method, and should be used ONLY after get_loci() has been called at least once. Otherwise, the order might not be the expected one, and the results would be unpredictable.
extension
Sets the conditions for group extensions.
Arguments
Array, hash or string with -key => value pairs. Supported values are:
- -function str
  context: Searches the flanking regions in the target sequence.
- -upstream int
  Extension in number of residues upstream of the feature.
- -downstream int
  Extension in number of residues downstream of the feature.
- -detectstrand bool (int)
  Should the proper strand be detected? Otherwise, the stored strand is trusted. This is useful for non-directed features like repeats, whose context is actually directed.
- -alldetected bool (int)
  Include all detected features (even those overlapping with input features).
- -feature bool (int)
  Should the feature region be included in the search? 0 by default.
- -lensd float
  Number of standard deviations (SD) tolerated as half of the range of lengths for a feature. The average (Avg) and the SD of the length are calculated from all the stored features, and Avg+(SD*lensd) is taken as the largest possible new feature. No minimum length constraint is applied unless explicitly set with -minlen. This argument is ignored if -maxlen is explicitly set. Default is 1.5.
- -maxlen int
  Maximum length of a new feature in number of residues. If zero (0), -lensd is evaluated instead. Default is 0.
- -minlen int
  Minimum length of a new feature in number of residues. Default is 0.
- -similarity float
  Minimum fraction of similarity to include a found region. 0.8 by default.
- -oneside bool (int)
  Should features with only one of the sides be considered? Takes effect only if both -upstream and -downstream are defined. 0 by default.
- -algorithm str
  blast: Use BLAST to search (after multiple alignment and consensus calculation of queries). Default algorithm.
  hmmer: Use HMMer to search (after multiple alignment and hmmbuild of query sequences).
- -score int
  Minimum score for either algorithm (blast or hmmer). 20 by default.
- -consensusperc float
  Minimum percentage at which a residue must appear in order to be included in the consensus used as query. 60 by default. Only if -algorithm blast.
- -e float
  If -algorithm blast, maximum e-value. 0.1 by default.
- -p str
  If -algorithm blast, program used ([t]blast[npx]). blastn by default.
Throws
Bio::Polloc::Polloc::Error if unexpected input.
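The interplay between -maxlen and -lensd described above can be sketched in plain Perl. This is only an illustration, not the module's code: the helper name max_feature_length is hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption, since the documentation does not say which estimator is used.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Bio::Polloc): upper length bound for a
# new feature, following the -maxlen / -lensd rules described above.
sub max_feature_length {
    my ($lengths, %opt) = @_;
    my $maxlen = $opt{-maxlen} // 0;      # documented default: 0
    my $lensd  = $opt{-lensd}  // 1.5;    # documented default: 1.5
    return $maxlen if $maxlen > 0;        # an explicit -maxlen overrides -lensd
    my $n = scalar @$lengths;
    die "No stored features to measure" unless $n;
    my $avg = sum(@$lengths) / $n;
    # Assumption: sample standard deviation; 0 when there is a single feature.
    my $sd = $n > 1
        ? sqrt(sum(map { ($_ - $avg) ** 2 } @$lengths) / ($n - 1))
        : 0;
    return $avg + $sd * $lensd;
}

my @lengths = (90, 100, 110);
printf "%.1f\n", max_feature_length(\@lengths);                 # 115.0
printf "%d\n",   max_feature_length(\@lengths, -maxlen => 150); # 150
```

With lengths 90, 100 and 110 the average is 100 and the sample SD is 10, so the default -lensd of 1.5 gives an upper bound of 115 residues.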
extend
Extends a group based on the arguments provided by Bio::Polloc::GroupCriteria->extension.
Arguments
- -loci Bio::Polloc::LociGroup
  The Bio::Polloc::LociGroup containing the loci in the group to extend.
Returns
A Bio::Polloc::LociGroup object containing the updated group, i.e. the original group PLUS the extended features.
Throws
Bio::Polloc::Polloc::Error if unexpected input or weird extension definition.
build_bin
Compares all the included loci and returns the identity matrix
Arguments
Returns
A reference to a boolean 2-dimensional array (only left-down triangle)
Note
WARNING! The order of the output is not always the same as that of the input. Please use get_loci() instead, as source features MUST be after target features in the array. Otherwise, it is not possible to have the full picture without building the full matrix (instead of half).
bin_build_groups
Builds groups of loci based on a binary matrix
Arguments
A matrix as returned by Bio::Polloc::GroupCriteria->build_bin
Returns
A 2-D arrayref.
Note
This method is intended to build groups while providing information on all-vs-all comparisons. If you do not need this information, use the much more efficient Bio::Polloc::GroupCriteria->build_groups method, which relies on the transitive property of groups to avoid unnecessary comparisons. Please note that this function also relies on transitivity, but gives you the option to examine all the paired comparisons and even write your own grouping function.
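As an illustration of writing your own grouping function over such a matrix, the following is a minimal sketch that merges loci transitively with a union-find structure. It is not the module's implementation: the function name groups_from_triangle is hypothetical, and it assumes that $bin->[$i][$j] is defined for $j < $i (the lower-left triangle) and is true when loci $i and $j were judged equivalent.

```perl
use strict;
use warnings;

# Hypothetical stand-alone function, not part of Bio::Polloc.
# Input: lower-left triangular boolean matrix (arrayref of arrayrefs).
# Output: 2-D arrayref, one inner arrayref of locus indices per group.
sub groups_from_triangle {
    my ($bin) = @_;
    my $n = scalar @$bin;
    my @parent = (0 .. $n - 1);           # each locus starts as its own group

    # Follow parent links up to the representative of a group.
    my $find = sub {
        my $x = shift;
        $x = $parent[$x] while $parent[$x] != $x;
        return $x;
    };

    # Merge the groups of every pair marked as equivalent (transitivity).
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $i - 1) {
            next unless $bin->[$i][$j];
            my ($ri, $rj) = ($find->($i), $find->($j));
            $parent[$ri] = $rj if $ri != $rj;
        }
    }

    # Collect members per representative, in order of first appearance.
    my (%members, @order);
    for my $i (0 .. $n - 1) {
        my $r = $find->($i);
        push @order, $r unless $members{$r};
        push @{ $members{$r} }, $i;
    }
    return [ map { $members{$_} } @order ];
}

my $bin = [ [], [1], [0, 0], [0, 0, 1] ];
my $groups = groups_from_triangle($bin);
print join(' | ', map { join(',', @$_) } @$groups), "\n";   # 0,1 | 2,3
```

Because the full triangle is available, a different rule (for example, requiring every member of a group to match every other member) could be substituted for the merge step.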
build_groups
This is the main method, creates groups of loci.
Arguments
- -cpus int
  If defined, attempts to distribute the work among the specified number of cores. Warning: this parameter is experimental and relies on Parallel::ForkManager. It can be used in production with some confidence, but it will most likely NOT run in parallel (to avoid errors, this method ignores the option at ANY possible error). Unimplemented: this argument is currently ignored. Some algorithmic considerations must be addressed before using it (TODO).
- -advance coderef
  A reference to a function to call at every new pair. The function is called with three arguments: the index of the first locus, the index of the second locus, and the total number of loci. Note that this function is called BEFORE running the comparison.
Returns
An arrayref of Bio::Polloc::LociGroup objects, each containing one consistent group of loci.
Note
This method is faster than combining build_bin() and bin_build_groups(), and it should be used whenever transitivity can be freely assumed and you do not need the all-vs-all matrix for further evaluation (for example, manual inspection).
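A callback for -advance could look like the following minimal sketch. The names $advance and $pairs_seen are hypothetical, and $criteria in the comment stands for an already configured Bio::Polloc::GroupCriteria object; the callback is invoked by hand here only to show the three-argument calling convention described above.

```perl
use strict;
use warnings;

# Hypothetical progress reporter suitable for -advance. It receives the index
# of the first locus, the index of the second locus and the total number of loci.
my $pairs_seen = 0;
my $advance = sub {
    my ($i, $j, $total) = @_;
    $pairs_seen++;
    printf STDERR "Comparing locus %d vs %d (of %d loci)\n", $i, $j, $total;
};

# With a configured object in $criteria, the callback would be passed as:
#   my $groups = $criteria->build_groups(-advance => $advance);
# Manual calls, for illustration only:
$advance->(1, 0, 3);
$advance->(2, 0, 3);
```

Since the callback runs before each comparison, it is suited to progress reporting rather than to inspecting results.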
genomes
Gets the genomes of the base group of loci. This function is similar to calling locigroup()->genomes(), but is read-only.
INTERNAL METHODS
Methods intended to be used only within the scope of Bio::Polloc::*
_detect_border_pairs
_next_group_id
Returns an incremental ID that attempts to identify the group used as basis of extension. Please note that this method DOES NOT check if the group's ID is the right one, and it is basically intended to keep track of how many times the extend function has been called.
_build_subseq
Arguments
All the following arguments are mandatory and must be passed in that order. The strand will be determined by the relative position of from/to:
The sequence (Bio::Seq object).
The from position (int).
The to position (int).
Returns
A Bio::Seq object.
Comments
This method should be located at a higher hierarchy module (Root?).
This method is static.
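The idea of deriving the strand from the relative position of from/to can be illustrated on a plain DNA string. This is a hypothetical analogue, not the method itself, which works on Bio::Seq objects. It assumes 1-based inclusive coordinates and that from > to denotes the reverse strand, returned as the reverse complement.

```perl
use strict;
use warnings;

# Hypothetical plain-string analogue of the from/to convention.
# Assumptions: 1-based inclusive coordinates; from > to means reverse strand.
sub subseq_by_coords {
    my ($seq, $from, $to) = @_;
    my ($start, $end) = $from <= $to ? ($from, $to) : ($to, $from);
    die "Coordinates out of range" if $start < 1 || $end > length($seq);
    my $sub = substr($seq, $start - 1, $end - $start + 1);
    if ($from > $to) {
        $sub = reverse $sub;               # scalar context: reverses the string
        $sub =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    }
    return $sub;
}

print subseq_by_coords('AACCGGTT', 2, 5), "\n";   # ACCG
print subseq_by_coords('AACCGGTT', 5, 2), "\n";   # CGGT
```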
_search_aln_seqs
Uses an alignment to search in the sequences of the collection of genomes
Arguments
A Bio::SimpleAlign object
Returns
A 2D arrayref, where the first key is an incremental index and the second key preserves the order in the structure: ["genome-key:acc", from, to, strand, score]
_feat_index2obj
Takes an index 2D matrix and returns it as the equivalent Bio::Polloc::LocusI objects
Arguments
2D matrix of integers (arrayref)
Returns
2D matrix of Bio::Polloc::LocusI objects (ref)
_grouprules_cleanup
_initialize