NAME

dbfetch - generic CGI program to retrieve biological database entries in various formats and styles (using SRS)

SYNOPSIS

# URL examples:

# prints the interactive page with the HTML form
http://www.ebi.ac.uk/cgi-bin/dbfetch

# for backward compatibility, implements <ISINDEX>
# single entry queries defaulting to EMBL sequence database
http://www.ebi.ac.uk/cgi-bin/dbfetch?J00231

# retrieves one or more entries in default format
# and default style (html)
# returns nothing for IDs which are not valid
http://www.ebi.ac.uk/cgi-bin/dbfetch?id=J00231.1,hsfos,bum

# retrieve entries in fasta format without html tags
http://www.ebi.ac.uk/cgi-bin/dbfetch?format=fasta&style=raw&id=J00231,hsfos,bum

# retrieve a raw Ensembl entry
http://www.ebi.ac.uk/cgi-bin/dbfetch?db=ensembl&style=raw&id=AL122059
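
The URL patterns above can also be assembled programmatically. The following is a minimal sketch, not part of dbfetch itself: the helper name dbfetch_url is hypothetical, and it assumes that all values are plain alphanumeric strings that need no URL escaping.

```perl
use strict;
use warnings;

# Base URL taken from the examples above.
my $BASE = 'http://www.ebi.ac.uk/cgi-bin/dbfetch';

# Hypothetical helper (not part of dbfetch): builds a query URL from
# named parameters. Identifiers are joined with commas, as in the examples.
# Assumption: values contain no characters that require URL escaping.
sub dbfetch_url {
    my (%arg) = @_;
    my @pairs;
    for my $key (qw(db format style)) {
        push @pairs, "$key=$arg{$key}" if defined $arg{$key};
    }
    push @pairs, 'id=' . join(',', @{ $arg{id} });
    return $BASE . '?' . join('&', @pairs);
}

print dbfetch_url(format => 'fasta', style => 'raw',
                  id => [qw(J00231 hsfos bum)]), "\n";
```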

DESCRIPTION

This program generates a page allowing a web user to retrieve database entries from a local SRS in two styles: html and raw. Other database engines can be used to implement the same interface.

At this stage, only unique identifier queries are supported. Free text searches returning more than one entry per query term are not in these specs.

In its default setup, type one or more EMBL accession numbers (e.g. J00231), entry names (e.g. BUM) or sequence versions into the search dialog to retrieve hypertext-linked entries.

Note that for practical reasons only the first 50 identifiers submitted are processed.
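
The 50-identifier limit could be applied along the following lines. This is only a sketch: the helper name first_ids and the choice of commas and whitespace as separators are assumptions, not taken from the dbfetch source.

```perl
use strict;
use warnings;

my $MAX_IDS = 50;    # limit stated in the text above

# Hypothetical helper: splits the input on commas and whitespace and
# keeps only the first $MAX_IDS identifiers, silently dropping the rest.
sub first_ids {
    my ($input) = @_;
    my @ids = grep { length } split /[\s,]+/, $input;
    splice @ids, $MAX_IDS if @ids > $MAX_IDS;
    return @ids;
}

my @ids = first_ids('J00231, hsfos bum');
print scalar(@ids), " identifiers kept\n";
```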

Additional input is needed to change the sequence format or suppress the HTML tags. The styles are html and raw. In future there might be additional styles (e.g. xml). Currently XML is a 'raw' format used by Medline. Each style is implemented as a separate subroutine.
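
Since each style is a separate subroutine, the style parameter can select one through a dispatch table. The sketch below illustrates the idea with placeholder renderers; the names raw_stub, html_stub and render_entry are hypothetical, and the real html and raw subroutines query SRS instead of returning fixed strings.

```perl
use strict;
use warnings;

# Placeholder renderers (assumptions for illustration only).
sub raw_stub  { my ($id, $format) = @_; return "$id [$format]\n" }
sub html_stub { my ($id, $format) = @_; return "<pre>$id [$format]</pre>\n" }

# Maps a style name to the subroutine that implements it.
my %STYLES = (
    raw  => \&raw_stub,
    html => \&html_stub,
);

# Looks up the handler for the requested style and calls it.
sub render_entry {
    my ($style, $id, $format) = @_;
    my $handler = $STYLES{$style}
        or die "Unknown style '$style'\n";
    return $handler->($id, $format);
}

print render_entry('raw', 'J00231', 'fasta');
```

With this layout, adding a new style (e.g. xml) means writing one subroutine and adding one key to the table.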

MAINTENANCE

A new database can be added simply by adding a new entry to the global hash %IDS. Additionally, if the database defines new formats, add an entry for each of them to the hash %IDMATCH. After modifying the hashes, run this script from the command line with the parameter debug set to true (e.g. dbfetch debug=1) to perform some sanity checks.

Finally, the user interface needs to be updated in the print_prompt subroutine.

VERSIONS

Version 3 uses EBI SRS server 6.1.3. That server is able to merge release and update libraries automatically, which makes this script simpler. The other significant change is the way sequence versions are indexed. They used to be indexed together with the accession as a string (e.g. 'J00231.1'). Now they are indexed as integers (e.g. '1').

Version 3.1 changes the command line interface. To get the debug information, set the attribute 'debug' to true. Also, it uses the File::Temp module to create temporary files securely.

Version 3.2 fixes fasta format parsing to get the entry id.

Version 3.3 adds RefSeq to the database list.

Version 3.4 makes this script compliant with the BioFetch specs.
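
The secure temporary file handling mentioned for version 3.1 might look roughly like this. The sketch uses the standard File::Temp tempfile function; the helper name write_temp and the file name template are assumptions, not taken from the dbfetch source.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: writes $content to a securely created temporary
# file in the system temp directory and returns the file name.
# UNLINK => 1 asks File::Temp to remove the file when the program exits.
sub write_temp {
    my ($content) = @_;
    my ($fh, $filename) = tempfile('dbfetchXXXXXX', TMPDIR => 1, UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

my $file = write_temp("J00231\n");
print "wrote $file\n";
```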

AUTHOR - Heikki Lehvaslaiho

Email: heikki@ebi.ac.uk

Address:

EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom

print_prompt

Title   : print_prompt
Usage   :
Function: Prints the default page with the query form
          to STDOUT (Web page)
Args    :
Returns :

protect

 Title   : protect
 Usage   : $value = protect($q->param('id'));
 Function:

           Removes potentially dangerous characters from the input
           string.  At the same time, converts word separators into a
           single space character.

 Args    : scalar, string with one or more IDs or accession numbers
 Returns : scalar
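
The behaviour described for protect can be approximated as follows. This is a sketch under stated assumptions, not the actual implementation: the name protect_sketch is hypothetical, and the set of characters treated as safe (word characters, '.', '-', plus commas and whitespace as separators) is a guess at a conservative whitelist.

```perl
use strict;
use warnings;

# Hypothetical stand-in for protect(): strips everything outside an
# assumed whitelist, then collapses runs of separators to one space.
sub protect_sketch {
    my ($value) = @_;
    return '' unless defined $value;
    $value =~ s/[^\w.\-,\s]//g;    # drop characters outside the whitelist
    $value =~ s/[\s,]+/ /g;        # separators become a single space
    $value =~ s/^ | $//g;          # trim a leading or trailing space
    return $value;
}

print protect_sketch("J00231.1,hsfos,  bum`"), "\n";
```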

input_error

Title   : input_error
Usage   : input_error($q, 'html', "Error message");
Function: Standard error message behaviour
Args    : reference to the CGI object
          scalar, string to display on input error.
Returns : scalar

no_entries

Title   : no_entries
Usage   : no_entries($q, "Message");
Function: Standard behaviour when no entries found
Args    : reference to the CGI object
          scalar, string to display on input error.
Returns : scalar

raw

Title   : raw
Usage   :
Function: Retrieves a single database entry in plain text
Args    : scalar, an ID
          scalar, format
Returns : scalar

html

Title   : html
Usage   :
Function: Retrieves a single database entry with HTML
          hypertext links in place. Limits retrieved entries to
          ones with the correct version if the string has '.' in it.
Args    : scalar, a UID
          scalar, format
Returns : scalar
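
The version check described for html relies on separating the accession from the version, which SRS indexes as an integer (see VERSIONS). A minimal sketch of that split is shown below; the helper name split_version is hypothetical, and the pattern assumes the version is the run of digits after the last '.'.

```perl
use strict;
use warnings;

# Hypothetical helper: splits 'J00231.1' into ('J00231', 1).
# Returns (id, undef) when the string carries no version suffix.
sub split_version {
    my ($id) = @_;
    if ($id =~ /^([^.]+)\.(\d+)$/) {
        return ($1, $2 + 0);    # numeric version, matching integer indexing
    }
    return ($id, undef);
}

my ($acc, $version) = split_version('J00231.1');
print "accession=$acc version=$version\n";
```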

xml

Title   : xml
Usage   : 
Function: Retrieves an entry formatted as XML
Args    : array, UID
          scalar, format 
Returns : scalar

debugging

Title   : debugging
Usage   : 'perl dbfetch'
Function:

          Performs sanity checks on global hash %IDS when this script
          is run from command line. %IDS holds the description of
          formats and other crucial info for each database accessible
          through the program.

          Note that hash key 'version' is not tested as it should 
          only be in sequence databases.

Args    : none
Returns : error messages to STDOUT
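
The kind of sanity check described for debugging can be sketched as below. The actual layout of %IDS is not shown in this document, so the key names name, default_format and formats, and the sample entries, are assumptions made purely for illustration. Only the 'version' key is mentioned in the text, and as described there it is not tested.

```perl
use strict;
use warnings;

# Assumed layout: the real %IDS in dbfetch may use different keys.
my %IDS = (
    embl => {
        name           => 'EMBL',
        default_format => 'embl',
        formats        => [qw(embl fasta)],
        version        => 1,    # optional, sequence databases only
    },
    exampledb => {              # hypothetical, deliberately incomplete entry
        name    => 'Example',
        formats => [qw(default)],
    },
);

# Returns a list of error messages; an empty list means all checks passed.
sub check_ids {
    my ($ids) = @_;
    my @errors;
    for my $db (sort keys %$ids) {
        my $entry = $ids->{$db};
        for my $key (qw(name default_format formats)) {
            push @errors, "$db: missing key '$key'"
                unless exists $entry->{$key};
        }
        next unless ref($entry->{formats}) eq 'ARRAY'
            && defined $entry->{default_format};
        push @errors, "$db: default format not in format list"
            unless grep { $_ eq $entry->{default_format} } @{ $entry->{formats} };
    }
    return @errors;
}

print "$_\n" for check_ids(\%IDS);
```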