NAME
Bio::Grep::Backends::BackendI.pm - Superclass for all back-ends
SYNOPSIS
See the back-end modules for example code.
DESCRIPTION
This is the superclass for all back-ends. Don't use this class directly.
METHODS
new()
-
This function constructs a Backend object.
$sbe->settings()
-
Get the settings. This is a Bio::Grep::Container::SearchSettings object.
    # search for the reverse complement and allow 4 mismatches
    $sbe->settings->database('ATH1.cdna');
    $sbe->settings->query('UGAACAGAAAGCUCAUGAGCC');
    $sbe->settings->reverse_complement(1);
    $sbe->settings->mismatches(4);
$sbe->results()
-
Get the results. This is an array of Bio::Grep::Container::SearchResults objects.
    # output the search results with alignments
    foreach my $res (@{$sbe->results}) {
        print $res->sequence->id . "\n";
        print $res->alignment_string() . "\n\n";
    }
This method is DEPRECATED. The new syntax is:
    # output the search results with alignments
    while ( my $res = $sbe->next_res ) {
        print $res->sequence->id . "\n";
        print $res->alignment_string() . "\n\n";
    }
If you need an array with all search results, use the following code:
    my @results;
    while ( my $res = $sbe->next_res ) {
        push @results, $res;
    }
$sbe->features()
-
Get available features. This is a hash. Valid features are MISMATCHES, GUMISMATCHES, EDITDISTANCE, INSERTIONS, DELETIONS, FILTERS, NATIVE_ALIGNMENTS, PROTEINS, UPSTREAM, DOWNSTREAM, MAXHITS, COMPLETE, QUERYFILE, SHOWDESC, QSPEEDUP, EVALUE and PERCENT_IDENTITY.
    if (defined($sbe->features->{GUMISMATCHES})) {
        # $sbe->settings->gumismatches(0);
        $sbe->settings->gumismatches(0.5);
    }
    else {
        print "\nBack-end does not support wobble pairs\n";
    }
$sbe->get_alphabet_of_database($db)
-
Returns 'dna' if the specified database is a DNA database, 'protein' otherwise.
ABSTRACT METHODS
Every back-end must implement these methods.
$sbe->search
-
This function searches for the query specified in the Bio::Grep::Container::SearchSettings object $sbe->settings.
    $sbe->search();
$sbe->next_res
-
Returns the next result as a Bio::Grep::Container::SearchResult object.
    while ( my $res = $sbe->next_res ) {
        # output result
    }
$sbe->get_sequences
-
This function returns all sequences with the ids in the specified array reference as a Bio::SeqIO object.
    use IO::String;
    use Bio::SeqIO;

    my $seqio = $sbe->get_sequences([$id]);
    my $string;
    my $stringio = IO::String->new($string);
    my $out = Bio::SeqIO->new('-fh' => $stringio, '-format' => 'fasta');
    while ( my $seq = $seqio->next_seq() ) {
        # write the sequences in a string
        $out->write_seq($seq);
    }
    print $string;
$sbe->get_databases
-
Returns a hash with all available databases. The keys are the filenames, the values are descriptions (or the filename if no description is available).
Descriptions can be set in info files. For example, if you indexed file ATH1.cdna, Vmatch and HyPa construct a lot of ATH1.cdna.* files. Now simply create a file ATH1.cdna.nfo and write a description in that file. The function generate_database_out_of_fastafile will create this file for you if you add a description as second argument.
    my %local_dbs_description = $sbe->get_databases();
    my @local_dbs = sort keys %local_dbs_description;
    # take first available database
    $sbe->settings->database($local_dbs[0]);
$sbe->generate_database_out_of_fastafile($fastafile)
-
Copies the specified file into the datapath directory ($sbe->settings->datapath) and generates a database (HyPa/Vmatch: a suffix array). You can get the available databases with $sbe->get_databases(). You have to do this only once. Vmatch and HyPa need a lot of RAM for the construction of their enhanced suffix arrays.
    $sbe->generate_database_out_of_fastafile('ATH1.cdna', 'AGI Transcripts');
$sbe->available_sort_modes()
-
Returns a hash with the available result sort modes. The keys are the modes you can set with $sbe->settings->sort($mode), the values are short descriptions.
INTERNAL METHODS
Only back-ends should call them directly.
_check_search_settings
-
Performs some basic error checking. These checks are important for security because we use system(), so we have to verify that the settings contain what we expect.
Because every back-end should call this function at the top of its search function, it also cleans up state such as old search results.
_prepare_query
-
Another important method that every back-end must call. It prepares the query (for example, by calculating the reverse complement if necessary) and returns the prepared query. The value of settings->query is left unchanged.
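As an illustration of the kind of preparation described here, the following minimal sketch computes a reverse complement without modifying its input. The function name reverse_complement_sketch is hypothetical and this is not the module's actual implementation; it assumes a plain DNA query (A, C, G, T in upper or lower case).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Grep: returns the reverse
# complement of a DNA string. The caller's variable is left unchanged.
sub reverse_complement_sketch {
    my ($query) = @_;
    my $prepared = reverse $query;         # reversed copy of the query
    $prepared =~ tr/ACGTacgt/TGCAtgca/;    # complement each base
    return $prepared;
}

my $query    = 'ATGC';
my $prepared = reverse_complement_sketch($query);
print "$query -> $prepared\n";    # prints "ATGC -> GCAT"
```

Working on a copy mirrors the behaviour documented above: the stored query stays as the user entered it, and only the string handed to the search program is transformed.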
_copy_fasta_file_and_create_nfo
-
The generate_database_out_of_fastafile implementation of your back-end class should use this function to copy the specified Fasta file to the data directory and to generate an info file, containing the description of the Fasta file.
_get_alignment( $seq_query, $seq_subject )
-
Calculates and returns an alignment of two Bio::Seq objects. Requires EMBOSS and bioperl-run.
_get_databases($suffix)
-
This function searches the data directory for files ending with $suffix and returns the files found together with their descriptions (the example assigns the result to a hash).
For each file found, $suffix is replaced with .nfo and the function looks for an info file with that name. The content of that file is used as the description. When no info file is found, the description is the filename without the suffix:
    %dbs = _get_databases('.al1');    # finds file ATH1.cdna.al1, searches for ATH1.cdna.nfo
    print $dbs{'ATH1.cdna'};          # prints content of ATH1.cdna.nfo or 'ATH1.cdna'
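The lookup described above can be sketched with core Perl modules only. This is a simplified illustration under stated assumptions, not the module's implementation: the function get_databases_sketch and the file name TAIR.cdna.al1 are hypothetical, and the data directory is passed in explicitly instead of being read from the settings.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for _get_databases: scans $dir for files ending
# in $suffix and maps each database name to its description.
sub get_databases_sketch {
    my ( $dir, $suffix ) = @_;
    my %dbs;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    for my $file ( readdir $dh ) {
        next if $file !~ m{\A (.+) \Q$suffix\E \z}xms;
        my $name = $1;
        my $nfo  = File::Spec->catfile( $dir, "$name.nfo" );
        my $desc = $name;    # fallback: file name without the suffix
        if ( -e $nfo ) {
            open my $in, '<', $nfo or die "Cannot read $nfo: $!";
            $desc = do { local $/; <$in> };    # read the whole info file
            close $in;
            $desc =~ s/\s+\z//;                # drop trailing newline
        }
        $dbs{$name} = $desc;
    }
    closedir $dh;
    return %dbs;
}

# Build a throwaway data directory with two index files and one info file.
my $dir = tempdir( CLEANUP => 1 );
for my $file ( 'ATH1.cdna.al1', 'TAIR.cdna.al1' ) {
    open my $out, '>', File::Spec->catfile( $dir, $file ) or die $!;
    close $out;
}
open my $nfo, '>', File::Spec->catfile( $dir, 'ATH1.cdna.nfo' ) or die $!;
print {$nfo} "AGI Transcripts\n";
close $nfo;

my %dbs = get_databases_sketch( $dir, '.al1' );
print "$_: $dbs{$_}\n" for sort keys %dbs;
```

In this sketch ATH1.cdna gets the description stored in its info file, while the second database falls back to its own name because no info file exists for it.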
_get_sequences_from_bio_index($id)
-
The Hypa and Agrep back-ends use Bio::Index for sequence id queries (implemented in this method). Returns a Bio::SeqIO object, as the abstract method get_sequences should.
DIAGNOSTICS
Bio::Root::IOException
-
It was not possible to copy the Fasta file in generate_database_out_of_fastafile into the data directory or it was not possible to write to the Fasta file in the data directory. Check permissions, data path and free disk space.
Bio::Root::FileOpenException
-
It was not possible to open the Fasta file in the data directory for writing. Check permissions.
Bio::Root::BadParameter
-
You started the search with invalid search settings.
Sort mode not valid
-
The specified sort mode ($sbe->settings->sort) is not valid. You can get all valid sort modes with $sbe->available_sort_modes(). See Bio::Grep::Backends::Vmatch, Bio::Grep::Backends::Hypa, Bio::Grep::Backends::Agrep for details.
Database not defined
-
You forgot to define a database. You have to build a database with $sbe->generate_database_out_of_fastafile (once) and set it with $sbe->settings->database(). Example:
    $sbe->generate_database_out_of_fastafile('ATH1.cdna');
    $sbe->settings->database('ATH1.cdna');
Database not valid (insecure characters)
-
The database name is not valid. Allowed characters are 'a-z', 'A-Z', '0-9', '.', '-' and '_'.
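A check of this kind can be sketched as a whitelist regular expression. The function name is_valid_database_name is hypothetical and this is not the module's own code; it assumes the allowed set is ASCII letters, digits, '.', '-' and '_'.

```perl
use strict;
use warnings;

# Hypothetical check, not taken from Bio::Grep: accepts only non-empty
# names built from a-z, A-Z, 0-9, '.', '-' and '_'.
sub is_valid_database_name {
    my ($name) = @_;
    return 0 if !defined $name;
    return $name =~ m{\A [a-zA-Z0-9._-]+ \z}xms ? 1 : 0;
}

for my $name ( 'ATH1.cdna', 'ATH1.cdna; rm -rf /' ) {
    my $verdict = is_valid_database_name($name) ? 'valid' : 'not valid';
    print "'$name' is $verdict\n";
}
```

Anchoring the pattern with \A and \z matters here: without the anchors, a name that merely contains an allowed character would pass, which defeats the purpose of the check before a system() call.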
FILES
Requires EMBOSS and Bio::Factory::EMBOSS for the Needleman-Wunsch global alignment implementation from EMBOSS. The internal method _get_alignment($seq_a, $seq_b) can then calculate an alignment for back-ends that do not generate an alignment (like Hypa, agrep).
SEE ALSO
Bio::Grep::Container::SearchSettings Bio::Grep::Container::SearchResults
AUTHOR
Markus Riester, <mriester@gmx.de>
LICENCE AND COPYRIGHT
Based on Weigel::Search v0.13
Copyright (C) 2005-2006 by Max Planck Institute for Developmental Biology, Tuebingen.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.