NAME
Bio::Tools::WWW - Bioperl manager for web resources related to biology.
SYNOPSIS
Object Creation
use Bio::Tools::WWW qw(:obj);
$pdb = $BioWWW->home_url('pdb');
There is no need to create a new Bio::Tools::WWW.pm object when the :obj
tag is used. This tag imports the static $BioWWW object created by Bio::Tools::WWW.pm into your namespace, which saves you from having to call new Bio::Tools::WWW.
You are free to omit the :obj tag and create the object yourself, but a Bio::Tools::WWW object is not configurable, so any given script needs only a single copy.
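The :obj mechanism described above can be sketched with Perl's standard Exporter module. This is only an illustration of the technique, not the Bio::Tools::WWW source: the package My::WWW and the variable $MyWWW are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a module that exports one shared object
    # through an ':obj' export tag.
    package My::WWW;
    use Exporter 'import';
    our ($MyWWW, @EXPORT_OK, %EXPORT_TAGS);
    BEGIN {
        @EXPORT_OK   = qw($MyWWW);
        %EXPORT_TAGS = ( obj => [qw($MyWWW)] );
        $MyWWW       = bless {}, __PACKAGE__;   # the single shared instance
    }
    sub new { my $class = shift; return bless {}, $class }
}

# Equivalent to "use My::WWW qw(:obj);" when the package lives in its own file.
BEGIN { My::WWW->import(':obj') }

# $MyWWW is now visible in the caller's namespace without calling new().
print ref($MyWWW), "\n";   # prints "My::WWW"
```

Because the imported variable is an alias for the module's own package variable, every script that uses the tag shares the same object.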
INSTALLATION
This module is included with the central Bioperl distribution:
http://bio.perl.org/Core/Latest
ftp://bio.perl.org/pub/DIST
You also need to define URLs for the following variables in this package:
$Not_found_url : Generic page to show in place of a 404 error.
$Tmp_url : Web-accessible site that is used for scripts that
need to generate temporary, web-accessible files.
The files need not necessarily be HTML files, but
being on the same disk as the server will permit
faster IO from server scripts.
DESCRIPTION
Bio::Tools::WWW is primarily a URL broker for a select set of sites related to bioinformatics/genome analysis. It definitely represents a biased, non-exhaustive set. It might be more accurate to call this module "Bio::Tools::URL.pm", but it does handle some non-URL things and may do more of this in the future. Having one module cover all biologically relevant web utilities is convenient, especially at this early stage of development.
Maintaining accurate URLs over time can be challenging as new web sites spring up and old sites are re-organized. Because of this fact, the URLs in this module are not guaranteed to be correct or exhaustive and will require periodic updating.
URL Management
By keeping URL management within Bio::Tools::WWW.pm, other generic modules can easily access a variety of different web sites without having to know about a potential multitude of specific modules specialized for one database or another. An alternative approach would be to have addresses defined within modules specialized for different web sites. This, however, may create maintenance headaches when updating these addresses.
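The broker idea can be sketched as a single lookup table behind one accessor. The table contents, the example.org addresses, the function name home_url_for, and the choice to return undef for an unknown site are all assumptions made for illustration; they are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical site table; names and addresses are placeholders.
my %Home_url = (
    pdb  => 'http://pdb.example.org/',
    pfam => 'http://pfam.example.org/',
);

# Look up a site by name; warn and return undef when it is unknown.
sub home_url_for {
    my ($site) = @_;
    my $url = $Home_url{$site};
    warn "No home URL registered for '$site'\n" unless defined $url;
    return $url;
}

print home_url_for('pdb'), "\n";
```

With this layout, updating an address means editing one table entry rather than hunting through several database-specific modules.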
Complex Websites
Websites with complex datasets may require special treatment within this module. As an example, URLs for the Saccharomyces Genome Database are clustered separately in this module, due to (1) the different ways to access information at this database and (2) the familiarity of the developer with this database. The Bio::SGD::WWW.pm module inherits from Bio::Tools::WWW.pm to permit access to the URLs provided by Bio::Tools::WWW.pm and to SGD-specific HTML and images.
The organization of Bio::Tools::WWW.pm is expected to evolve as websites get born, die, and mutate their APIs.
SEE ALSO
http://bio.perl.org/Projects/modules.html - Online module documentation
http://bio.perl.org/ - Bioperl Project Homepage
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://www.bioperl.org/MailList.shtml - About the mailing lists
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
AUTHOR
Steve Chervitz, sac@bioperl.org
VERSION
Bio::Tools::WWW.pm, 0.014
COPYRIGHT
Copyright (c) 1996-98 Steve Chervitz. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
APPENDIX
Methods beginning with a leading underscore are considered private and are intended for internal use by this module. They are not considered part of the public interface and are described here for documentation purposes only.
home_url
Usage : $BioWWW->home_url(<string>)
Purpose : To obtain the homepage URL for a biological database or resource.
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments are:
: bioperl bioperl-schema biomoo bsm ebi emotif entrez
: expasy mips mmdb ncbi pir pfam pdb geneQuiz
: molMov pubmed sacch3d sgd scop swissProt webmol ypd
Throws : Warns if argument cannot be resolved to a URL.
Comments : The URLs listed here do not represent a complete list.
: Expect this to evolve and grow with time.
See Also : search_url()
search_url
Usage : $BioWWW->search_url(<string>)
Purpose : To provide a URL stem for a search engine at a biological database
: or resource.
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments are:
: 3db embl cath ec1 ec2 ec3 emotif_id entrez gb1 gb2
: gb3 gb4 gb5 pdb medline mmdb pdb_coord pfam pir_acc
: pdbSum molMov swpr swModel swprSearch scop scop_pdb scop_data
: ypd
Throws : Warns if argument cannot be resolved to a URL.
Comments : Unlike the homepage URLs, this method does not return a complete
: URL but a stem which must be further modified, typically by
: appending data to it, before it can be used. The data appended
: depends on the specific URL; typically, it is a database ID or
: other unique identifier.
: The requirements for each URL will be described here eventually.
:
: The URLs listed here do not represent a complete list.
: Expect this to evolve and grow with time.
:
: Given this complexity, it may be useful to provide special methods
: for these different URLs. This would however result in an
: explosion of methods that might make this module less
: maintainable and harder to use.
See Also : home_url()
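A minimal sketch of completing a search stem, assuming the stem simply needs an identifier appended. The helper complete_url, the stem shown, and the identifier are hypothetical; the percent-encoding step is an added precaution, not something the module is documented to do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::WWW): append an identifier
# to a search stem, percent-encoding characters that are unsafe in a URL.
sub complete_url {
    my ($stem, $id) = @_;
    (my $safe = $id) =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $stem . $safe;
}

# Placeholder stem; a real one would come from $BioWWW->search_url(...).
my $stem = 'http://db.example.org/cgi-bin/lookup?id=';
print complete_url($stem, '1ABC'), "\n";
```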
stem_url
Usage : $BioWWW->stem_url(<string>)
Purpose : To obtain the minimal stem URL for searching a biological database or resource.
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments are:
: emotif entrez pdb
Throws : Warns if argument cannot be resolved to a URL.
Comments : The URL stems returned by this method are much more minimal than
: those provided by search_url(). Use of these stems requires knowledge
: of the CGI scripts which they invoke.
See Also : search_url()
viewer_url
Usage : $BioWWW->viewer_url(<string>)
Purpose : To obtain the stem URL for a 3D viewer (RasMol, WebMol, Cn3D)
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments are:
: rasmol webmol cn3d java (java is an alias for webmol)
Throws : Warns if argument cannot be resolved to a URL.
Comments : The 4-character Brookhaven PDB identifier must be appended to the
: URL provided by this method.
: The URLs listed here do not represent a complete list.
: Expect this to evolve and grow with time.
not_found_url
Usage : $BioWWW->not_found_url()
Purpose : To obtain the URL for a web page to be shown in place of a 404 error.
Returns : String containing the URL (including "http://")
Argument : n/a
Throws : n/a
Comments : This URL should be customized as desired.
tmp_url
Usage : $BioWWW->tmp_url()
Purpose : To obtain the URL for a temporary, web-accessible directory.
Returns : String containing the URL (including "http://")
Argument : n/a
Throws : n/a
Comments : This URL should be customized as desired.
search_link
Usage : $BioWWW->search_link(<site>, <value>, <text>)
Purpose : Wrapper for search_url() that returns the URL within an HTML anchor.
Returns : String containing the HTML anchor ( qq|<A HREF="http://...">text</A>| )
Argument : <site> = string to be used as argument for search_url()
: <value> = string to be appended to the search URL stem.
: <text> = string to be shown as the link text (default = <value>).
Throws : n/a
Status : Experimental
See Also : search_url()
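The wrapper pattern used by the *_link methods can be sketched as follows. The function make_link and the stem passed to it are hypothetical; only the behavior of defaulting the link text to the appended value is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a completed URL in an HTML anchor.
# $text defaults to $value, mirroring the documented default.
sub make_link {
    my ($stem, $value, $text) = @_;
    $text = $value unless defined $text;
    return qq|<A HREF="$stem$value">$text</A>|;
}

# Placeholder stem; a real one would come from search_url() or viewer_url().
print make_link('http://db.example.org/lookup?id=', '1ABC'), "\n";
print make_link('http://db.example.org/lookup?id=', '1ABC', 'view entry'), "\n";
```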
viewer_link
Usage : $BioWWW->viewer_link(<site>, <value>, <text>)
Purpose : Wrapper for viewer_url() that returns the complete URL within an HTML anchor.
Returns : String containing the HTML anchor ( qq|<A HREF="http://...">text</A>| )
Argument : <site> = string to be used as argument for viewer_url()
: <value> = string to be appended to the viewer URL stem.
: <text> = string to be shown as the link text (default = <value>).
Throws : n/a
Status : Experimental
See Also : viewer_url()
html
Usage : $BioWWW->html(<string>)
Purpose : To obtain HTML-formatted text for frequently needed web-page messages.
Returns : String containing the HTML-formatted message (may include a mailto: anchor)
Argument : String.
: Currently acceptable arguments are:
: authority (mailto: link for webmaster; shows e-mail address as link)
: notify (wraps mailto:authority link with text for link "please notify us")
: ourFault ("this problem is our fault. If it persists <notify-link>")
: trouble (same as ourFault but doesn't blame us for the problem)
: techDiff ("we are experiencing technical difficulties. Please stand by.")
Throws : n/a
Comments : The authority (webmaster) is imported from the Bio::Root::Global.pm
: module. The value for $AUTHORITY should be set there, or
: customize this module so that it doesn't use Bio::Root::Global.pm.
sgd_url
Usage : $BioWWW->sgd_url(<string>)
Purpose : To obtain the webpage URL or search stem for SGD.
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments (TODO).
Throws : Warns if argument cannot be resolved to a URL.
Comments : This accessor is specialized for the Saccharomyces Genome Database.
: It is possible that it will be moved to SGD::WWW.pm in the future.
See Also : search_url()
s3d_url
Usage : $BioWWW->s3d_url(<string>)
Purpose : To obtain the webpage URL or search stem for Sacch3D.
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments (TODO).
Throws : Warns if argument cannot be resolved to a URL.
Comments : This accessor is specialized for the Saccharomyces Genome Database.
: It is possible that it will be moved to SGD::WWW.pm in the future.
See Also : search_url()
sgd_stem_url
Usage : $BioWWW->sgd_stem_url(<string>)
Purpose : To obtain the minimal stem URL for a SGD/Sacch3D CGI script.
Returns : String containing the URL (including "http://")
Argument : String
: Currently acceptable arguments (TODO).
Throws : Warns if argument cannot be resolved to a URL.
Comments : This accessor is specialized for the Saccharomyces Genome Database.
: It is possible that it will be moved to SGD::WWW.pm in the future.
See Also : search_url()
s3d_link
Usage : $BioWWW->s3d_link(<site>, <value>, <text>)
Purpose : Wrapper for s3d_url() that returns the complete URL within an HTML anchor.
Returns : String containing the HTML anchor ( qq|<A HREF="http://...">text</A>| )
Argument : <site> = string to be used as argument for s3d_url()
: <value> = string to be appended to the s3d URL stem.
: <text> = string to be shown as the link text (default = <value>).
Throws : n/a
Status : Experimental
Comments : This accessor is specialized for the Saccharomyces Genome Database.
: It is possible that it will be moved to SGD::WWW.pm in the future.
See Also : s3d_url(), sgd_link()
sgd_link
Usage : $BioWWW->sgd_link(<site>, <value>, <text>)
Purpose : Wrapper for sgd_url() that returns the complete URL within an HTML anchor.
Returns : String containing the HTML anchor ( qq|<A HREF="http://...">text</A>| )
Argument : <site> = string to be used as argument for sgd_url()
: <value> = string to be appended to the sgd URL stem.
: <text> = string to be shown as the link text (default = <value>).
Throws : n/a
Status : Experimental
Comments : This accessor is specialized for the Saccharomyces Genome Database.
: It is possible that it will be moved to SGD::WWW.pm in the future.
See Also : sgd_url(), s3d_link()
start_html
Usage : $BioWWW->start_html()
Purpose : Prints the "Content-type: text/html\n\n<HTML>\n" header.
Returns : n/a; This method prints the Content-type string shown above.
Argument : n/a
Throws : n/a
Status : Experimental
Comments : This method prevents redundant invocations, thus avoiding the
: accidental printing of the "content-type..." on the page.
: If using L. Stein's CGI.pm, this is similar to $query->header()
: (Does CGI.pm prevent redundant invocation?)
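One way to guard against redundant invocation is to record a flag on the object the first time the header is produced. The class My::Page below is hypothetical, and unlike the documented method it returns the header text instead of printing it, which keeps the sketch easy to check.

```perl
use strict;
use warnings;

{
    package My::Page;   # hypothetical class, not Bio::Tools::WWW
    sub new { return bless {}, shift }

    # Returns the header text the first time and an empty string afterwards.
    sub start_html {
        my $self = shift;
        return '' if $self->{_started_html};
        $self->{_started_html} = 1;
        return "Content-type: text/html\n\n<HTML>\n";
    }
}

my $page = My::Page->new;
print $page->start_html;   # header emitted
print $page->start_html;   # nothing emitted
```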
redirect
Usage : $BioWWW->redirect(<string>)
Purpose : Prints the header needed to redirect a web browser to a supplied URL.
Returns : n/a; Prints the redirection header.
Argument : String containing the URL to be redirected to.
Throws : n/a
Status : Experimental
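For orientation, a CGI script usually redirects a browser by emitting a Location header followed by a blank line. The sketch below assumes that convention; the function name redirect_header and the target URL are hypothetical, and the exact header text printed by this module is not shown in this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI redirection header for the given URL.
sub redirect_header {
    my ($url) = @_;
    return "Location: $url\n\n";
}

print redirect_header('http://www.example.org/moved.html');
```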
pre
Usage : $BioWWW->pre("text to be pre-formatted");
Purpose : To produce HTML for text that is not to be formatted by the browser.
Returns : String containing the "<pre>" formatted html.
Argument : String containing the text to be pre-formatted.
Throws : n/a
Status : Experimental
strip_html
Usage : $boolean = &strip_html( string_ref, [fast] );
Purpose : Removes HTML formatting from a supplied string.
Returns : Boolean: true if string was stripped, false if not.
Argument : string_ref = reference to a string containing the whole
: web page to be stripped.
: fast = a non-zero value. Optional. If set, a faster
: but perhaps less thorough procedure is used for
: stripping. Default = not fast.
Throws : Exception if the argument is not a scalar reference.
Comments : Based on code originally written by Alex Dong Li
: (ali@genet.sickkids.on.ca).
: This is a more generic version of the function that appears
: in Bio::Tools::Blast::HTML.pm
: This version does not perform any Blast-specific stripping.
:
: This employs a simple method for removing tags that
: will fail under following conditions:
: 1) if quoted > appears in a tag (does this ever happen?)
: 2) if a tag is split over multiple lines and this method is
: used to process one line at a time.
:
: Without fast mode, large HTML files can take exceedingly long times to
: strip (e.g., a 1 MB file with many tags can take 10 minutes versus 5 seconds
: in fast mode; try the swissprot yeast table). If you know the HTML to be
: well-behaved (i.e., tags are not split across multiple lines), use fast
: mode for large, dense files.
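The simple tag-removal approach and its first limitation can be sketched with a single substitution. This is a hypothetical re-implementation for illustration (the name strip_tags is invented) and does not reproduce the module's fast and thorough modes.

```perl
use strict;
use warnings;

# Hypothetical sketch of simple tag removal on a referenced string.
sub strip_tags {
    my ($string_ref) = @_;
    die "Expected a scalar reference\n" unless ref($string_ref) eq 'SCALAR';
    # Delete anything that looks like <...>; true if something was removed.
    my $count = ($$string_ref =~ s/<[^>]*>//g);
    return $count ? 1 : 0;
}

my $html = '<B>YAL001C</B> <A HREF="x">TFC3</A>';
strip_tags(\$html);
print "$html\n";   # prints "YAL001C TFC3"
```

A tag containing a quoted ">" defeats this pattern: the match ends at the first ">", so the rest of the tag is left behind in the text.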
FOR DEVELOPERS ONLY
Data Members
An instance of Bio::Tools::WWW.pm is a blessed reference to a hash containing all or some of the following fields:
FIELD VALUE
--------------------------------------------------------------
_started_html Defined on the initial invocation of start_html()
to avoid printing the "Content-type..." header more than once.