NAME
dbfetch - generic CGI program to retrieve biological database entries in various formats and styles (using SRS)
SYNOPSIS
# URL examples:
# prints the interactive page with the HTML form
http://www.ebi.ac.uk/cgi-bin/dbfetch
# for backward compatibility, implements <ISINDEX>
# single entry queries defaulting to EMBL sequence database
http://www.ebi.ac.uk/cgi-bin/dbfetch?J00231
# retrieves one or more entries in default format
# and default style (html)
# returns nothing for IDs which are not valid
http://www.ebi.ac.uk/cgi-bin/dbfetch?id=J00231.1,hsfos,bum
# retrieve entries in fasta format without html tags
http://www.ebi.ac.uk/cgi-bin/dbfetch?format=fasta&style=raw&id=J00231,hsfos,bum
# retrieve a raw Ensembl entry
http://www.ebi.ac.uk/cgi-bin/dbfetch?db=ensembl&style=raw&id=AL122059
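The URL patterns above can also be assembled programmatically. The following is a minimal sketch, not part of dbfetch itself: the helper names dbfetch_url and escape are hypothetical, and the parameter order (db, format, style, id) is chosen only to reproduce the examples above.

```perl
use strict;
use warnings;

# Base URL taken from the examples above.
my $BASE = 'http://www.ebi.ac.uk/cgi-bin/dbfetch';

# Percent-encode everything except unreserved URL characters.
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# dbfetch_url is a hypothetical helper (an assumption of this sketch).
# It appends the optional db, format and style parameters in a fixed
# order and joins the IDs with commas.
sub dbfetch_url {
    my (%arg) = @_;
    my @ids = @{ $arg{id} || [] };
    die "at least one ID is required\n" unless @ids;

    my @pairs;
    for my $key (qw(db format style)) {
        push @pairs, "$key=" . escape($arg{$key}) if defined $arg{$key};
    }
    push @pairs, 'id=' . join(',', map { escape($_) } @ids);
    return $BASE . '?' . join('&', @pairs);
}

print dbfetch_url(format => 'fasta', style => 'raw',
                  id => [qw(J00231 hsfos bum)]), "\n";
```

Called this way, the sketch rebuilds the fasta example URL shown above; omitting format and style leaves the server defaults in effect.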
DESCRIPTION
This program generates a page allowing a web user to retrieve database entries from a local SRS in two styles: html and raw. Other database engines can be used to implement the same interface.
At this stage, only unique identifier queries are supported. Free text searches returning more than one entry per query term are not in these specs.
In its default setup, type one or more EMBL accession numbers (e.g. J00231), entry names (e.g. BUM) or sequence versions into the search dialog to retrieve hypertext-linked entries.
Note that for practical reasons only the first 50 identifiers submitted are processed.
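The 50-identifier limit could be enforced along the following lines. This is only a sketch: the helper name first_ids and the choice of separators (commas and whitespace) are assumptions, not taken from the dbfetch source.

```perl
use strict;
use warnings;

my $MAX_IDS = 50;   # limit stated in the description above

# first_ids is a hypothetical helper: split a query string on commas
# or whitespace and keep only the first $MAX_IDS identifiers.
sub first_ids {
    my ($query) = @_;
    my @ids = grep { length } split /[\s,]+/, $query;
    splice @ids, $MAX_IDS if @ids > $MAX_IDS;   # drop everything past the limit
    return @ids;
}

# Submit 60 made-up identifiers; only the first 50 survive.
my @ids = first_ids(join ',', map { "ID$_" } 1 .. 60);
print scalar(@ids), " identifiers kept, last one is $ids[-1]\n";
```

Truncating silently mirrors the behaviour described above; a client that needs more than 50 entries would have to send several requests.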
Additional input is needed to change the sequence format or suppress the HTML tags. The styles are html and raw. In future there might be additional styles (e.g. xml). Currently XML is a 'raw' format used by Medline. Each style is implemented as a separate subroutine.
MAINTENANCE
A new database can be added simply by adding a new entry to the global hash %IDS. Additionally, if the database defines new formats, add an entry for each of them to the hash %IDMATCH. After modifying the hashes, run this script from the command line with the parameter debug set to true (e.g. dbfetch debug=1) for some sanity checks.
Finally, the user interface needs to be updated in the print_prompt subroutine.
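To illustrate the kind of sanity check described above, here is a sketch under loudly stated assumptions: the layout of %IDS shown here (keys default, formats and version) and the helper check_ids are hypothetical and may not match the real hash in dbfetch.

```perl
use strict;
use warnings;

# HYPOTHETICAL layout: the real %IDS in dbfetch may be organised
# differently. Each database lists its formats and a default format.
my %IDS = (
    embl   => { default => 'embl',    formats => [qw(embl fasta)], version => 1 },
    refseq => { default => 'genbank', formats => [qw(genbank fasta)] },
);

# check_ids (hypothetical) returns a list of problems found in the hash.
# The 'version' key is deliberately not checked.
sub check_ids {
    my ($ids) = @_;
    my @errors;
    for my $db (sort keys %$ids) {
        my $entry   = $ids->{$db};
        my @formats = @{ $entry->{formats} || [] };
        push @errors, "$db has no formats" unless @formats;
        if (!defined $entry->{default}) {
            push @errors, "$db has no default format";
        }
        elsif (!grep { $_ eq $entry->{default} } @formats) {
            push @errors, "$db has a default format that is not listed";
        }
    }
    return @errors;
}

my @errors = check_ids(\%IDS);
print @errors ? join("\n", @errors) . "\n" : "all databases look consistent\n";
```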
VERSIONS
Version 3 uses EBI SRS server 6.1.3. That server is able to merge release and update libraries automatically which makes this script simpler. The other significant change is the way sequence versions are indexed. They used to be indexed together with the string accession (e.g. 'J00231.1'). Now they are indexed as integers (e.g. '1').
Version 3.1 changes the command line interface. To get the debug information use attribute 'debug' set to true. Also, it uses File::Temp module to create temporary files securely.
Version 3.2 fixes fasta format parsing to get the entry id.
Version 3.3 adds RefSeq to the database list.
Version 3.4 makes this script compliant with the BioFetch specs.
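As a sketch of the secure temporary file handling mentioned for version 3.1, the core File::Temp module can be used as follows. The helper name write_temp and the template 'dbfetchXXXXXX' are assumptions made for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write_temp is a hypothetical helper. tempfile() picks an unpredictable
# name in the system temp directory and opens the file exclusively;
# UNLINK => 1 removes the file when the program exits.
sub write_temp {
    my ($text) = @_;
    my ($fh, $filename) = tempfile('dbfetchXXXXXX', TMPDIR => 1, UNLINK => 1);
    print {$fh} $text;
    close $fh or die "cannot close $filename: $!";
    return $filename;
}

my $file = write_temp(">J00231\nACGT\n");
print "temporary file holds ", (-s $file), " bytes\n";
```

Letting File::Temp choose the name avoids the race conditions that come with hand-built names such as /tmp/dbfetch.$$.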
AUTHOR - Heikki Lehvaslaiho
Email: heikki@ebi.ac.uk
Address:
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
print_prompt
Title : print_prompt
Usage :
Function: Prints the default page with the query form
to STDOUT (Web page)
Args :
Returns :
protect
Title : protect
Usage : $value = protect($q->param('id'));
Function:
Removes potentially dangerous characters from the input
string. At the same time, converts word separators into a
single space character.
Args : scalar, string with one or more IDs or accession numbers
Returns : scalar
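The behaviour described for protect can be sketched as below. This is not the original subroutine: the name sanitize_ids, the set of separators and the character whitelist are all assumptions.

```perl
use strict;
use warnings;

# sanitize_ids is a hypothetical stand-in for protect(). The whitelist
# (letters, digits, underscore, dot, hyphen) is an assumption.
sub sanitize_ids {
    my ($value) = @_;
    return '' unless defined $value;
    $value =~ s/[\s,;]+/ /g;            # word separators -> one space
    $value =~ s/[^A-Za-z0-9_.\- ]//g;   # drop everything not whitelisted
    $value =~ s/ {2,}/ /g;              # removals may leave double spaces
    $value =~ s/^ | $//g;               # trim leading and trailing space
    return $value;
}

print sanitize_ids("J00231.1,\thsfos ; bum|"), "\n";
```

A whitelist is used here rather than a blacklist because it fails safe: any character not known to occur in an identifier is dropped.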
input_error
Title : input_error
Usage : input_error($q, 'html', "Error message");
Function: Standard error message behaviour
Args : reference to the CGI object
scalar, string to display on input error.
Returns : scalar
no_entries
Title : no_entries
Usage : no_entries($q, "Message");
Function: Standard behaviour when no entries found
Args : reference to the CGI object
scalar, string to display when no entries are found.
Returns : scalar
raw
Title : raw
Usage :
Function: Retrieves a single database entry in plain text
Args : scalar, an ID
scalar, format
Returns : scalar
html
Title : html
Usage :
Function: Retrieves a single database entry with HTML
hypertext links in place. Limits retrieved entries to
ones with correct version if the string has '.' in it.
Args : scalar, a UID
scalar, format
Returns : scalar
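Because sequence versions are indexed as integers separately from the accession, an identifier such as 'J00231.1' has to be split before querying. A minimal sketch of that step follows; split_version is a hypothetical helper and not the code used in html.

```perl
use strict;
use warnings;

# split_version is a hypothetical helper: it separates an identifier
# into its accession and, if a '.' plus digits is present, an integer
# version. Without a version it returns undef in the second slot.
sub split_version {
    my ($id) = @_;
    if ($id =~ /^([^.]+)\.(\d+)$/) {
        return ($1, $2 + 0);
    }
    return ($id, undef);
}

for my $id (qw(J00231.1 J00231)) {
    my ($acc, $version) = split_version($id);
    printf "%s -> accession %s, version %s\n",
        $id, $acc, defined $version ? $version : 'any';
}
```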
xml
Title : xml
Usage :
Function: Retrieves an entry formatted as XML
Args : array, UID
scalar, format
Returns : scalar
debugging
Title : debugging
Usage : 'perl dbfetch'
Function:
Performs sanity checks on global hash %IDS when this script
is run from the command line. %IDS holds the description of
formats and other crucial info for each database accessible
through the program.
Note that hash key 'version' is not tested as it should
only be in sequence databases.
Args : none
Returns : error messages to STDOUT