NAME

Bio::DB::Query::HIVQuery - Query interface to the Los Alamos HIV Sequence Database

SYNOPSIS

    $q = new Bio::DB::Query::HIVQuery(" C[subtype] ZA[country] CXCR4[coreceptor] ");
    $q = new Bio::DB::Query::HIVQuery(
         -query=>{'subtype'=>'C', 
                  'country'=>'ZA', 
                  'coreceptor'=>'CXCR4'});

    $ac = $q->get_annotations_by_id(($q->ids)[0]);
    $ac->get_value('Geo', 'country');                   # returns 'SOUTH AFRICA'

    $db = new Bio::DB::HIV();
    $seqio = $db->get_Stream_by_query($q);              # returns annotated Bio::Seqs 

    # get subtype C sequences from South Africa and Brazil, 
    # with associated info on patient health, coreceptor use, and 
    # infection period:

    $q = new Bio::DB::Query::HIVQuery(
         -query => {
                    'query' => {'subtype'=>'C',
                                'country'=>['ZA', 'BR']},
                    'annot' => ['patient_health', 
                                'coreceptor', 
                                'days_post_infection']
                    });
	

DESCRIPTION

Bio::DB::Query::HIVQuery provides a query-like interface to the CGI-based Los Alamos National Laboratory (LANL) HIV Sequence Database. It uses Bioperl facilities to retrieve both sequences and annotations in batch, in an automated and computable way. Use it with Bio::DB::HIV to create Bio::Seq objects and annotated Bio::SeqIO streams.

Query format

The interface implements a simple query language emulation that understands AND, OR, and parenthetical nesting. The basic query unit is

(match1 match2 ...)[fieldname]

Sequences are returned for which fieldname equals match1 OR match2 OR .... These units can be combined with AND, OR and parentheses. For example:

(B, C)[subtype] AND (2000, 2001, 2002, 2003)[year] AND ((CN)[country] OR (ZA)[country])

which can be shortened to

(B C)[subtype] (2000 2001 2002 2003)[year] (CN ZA)[country]

The user can specify annotation fields that do not restrict the query but arrange for the associated field data to be returned with each sequence. Specify annotation fields between curly braces, as in:

(B C)[subtype] 2000[year] {country cd4_count cd8_count}

Annotations can be accessed from the query object using the methods described in the APPENDIX.
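To make the unit structure concrete, the following is a minimal sketch of how the shortened form breaks down into field units and annotation fields. The helper split_units is hypothetical and is not part of this module (the module uses its own query language emulator); it handles only the shortened form, not explicit AND, OR, or nested parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::DB::Query::HIVQuery: split the
# shortened query form into its field units and annotation fields.
sub split_units {
    my ($str) = @_;
    my (%match, @annot);
    # Units look like "(m1 m2)[field]" or "m1[field]". Matches within a
    # unit are ORed; adjacent units are implicitly ANDed.
    while ($str =~ /(?:\(([^)]*)\)|([^\s()\[\]{}]+))\[(\w+)\]/g) {
        my ($list, $field) = (defined $1 ? $1 : $2, $3);
        push @{ $match{$field} }, grep { length } split /[\s,]+/, $list;
    }
    # Annotation fields sit between curly braces.
    if ($str =~ /\{([^}]*)\}/) {
        @annot = grep { length } split /[\s,]+/, $1;
    }
    return (\%match, \@annot);
}

my ($match, $annot) =
    split_units('(B C)[subtype] 2000[year] {country cd4_count cd8_count}');
for my $field (sort keys %$match) {
    my $alternatives = join ' OR ', @{ $match->{$field} };
    print "$field: $alternatives\n";
}
print "annotations: @$annot\n";
```

Each printed field line lists the alternatives that are ORed within one unit; the annotation line lists fields that are returned but do not restrict the query.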

Hash specifications for query construction

Single query specifications can be made as hash references provided to the -query argument of the constructor. There are two forms:

-query => { 'country'=>'BR', 'phenotype'=>'NSI', 'cd4_count'=>'Any' }

equivalent to

-query => [ 'country'=>'BR', 'phenotype'=>'NSI', 'cd4_count'=>'Any' ]

or

-query => { 'query' => {'country'=>'BR', 'phenotype'=>'NSI'},
            'annot' => ['cd4_count'] }

In both cases, the CD4 count is included in the annotations returned, but does not restrict the rest of the query.

To 'OR' multiple values of a field, use an anonymous array ref:

-query => { 'country'=>['ZA','BR','NL'], 'subtype'=>['A', 'C', 'D'] }
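As a rough illustration of how a hash specification relates to the string form described under "Query format", here is a small sketch. The helper spec_to_string is hypothetical (it is not provided by this module), it covers only the 'query'/'annot' form, and it assumes that hash keys combine with AND in the same way adjacent units do in the shortened string form.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::DB::Query::HIVQuery: render a
# { query => {...}, annot => [...] } specification in the string form.
sub spec_to_string {
    my ($spec) = @_;
    my $query = $spec->{query} || {};
    my @annot = @{ $spec->{annot} || [] };
    my @units;
    for my $field (sort keys %$query) {
        my $value   = $query->{$field};
        # An array ref ORs several values of one field.
        my @matches = ref($value) eq 'ARRAY' ? @$value : ($value);
        push @units, '(' . join(' ', @matches) . ")[$field]";
    }
    # Annotation fields go between curly braces and restrict nothing.
    push @units, '{' . join(' ', @annot) . '}' if @annot;
    return join ' ', @units;
}

my $string = spec_to_string({
    query => { subtype => 'C', country => ['ZA', 'BR'] },
    annot => ['patient_health', 'coreceptor', 'days_post_infection'],
});
print "$string\n";
```

Fields are emitted in sorted order only to make the output stable; the order of units carries no meaning in this sketch.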

Valid query field names

An attempt was made to make the query field names natural and easy to remember. Aliases are specified in an XML file (lanl-schema.xml) that is part of the distribution. Custom field aliases can be set up by modifying this file.

An HTML cheatsheet with valid field names, aliases, and match data can be generated from the XML by using $hiv_query->help('help.html'). A query can also be validated locally before it is sent to the server; see "Delayed/partial query runs" below.

Annotations

LANL DB annotations have been organized into a number of natural groupings, tagged Geo, Patient, Virus, and StdMap. After a successful query, each id is associated with a tree of Bio::Annotation::SimpleValue objects. These can be accessed with methods get_value() and put_value() described in APPENDIX.

Delayed/partial query runs

Accessing the LANL DB involves multiple HTTP requests. The query can be instructed to proceed through all (the default) or only some of them, using the named parameter RUN_OPTION.

To validate a query locally, use

$q = new Bio::DB::Query::HIVQuery( -query => {...}, -RUN_OPTION=>0 )

which will throw an exception if a field name or option is invalid.

To get a query count only, you can save a server hit by using

$q = new Bio::DB::Query::HIVQuery( -query => {...}, -RUN_OPTION=>1 )

and asking for $q->count. To finish the query, do

$q->_do_query(2)

which picks up where you left off.

-RUN_OPTION=>2, the default, runs the full query, returning ids and annotations.
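The staged workflow above might be wrapped as follows. This is only a sketch: staged_run is a hypothetical helper, it assumes Bio::DB::Query::HIVQuery is installed, and levels 1 and 2 additionally assume the LANL server is reachable. Any failure (missing module, invalid field, network error) is captured rather than rethrown.

```perl
use strict;
use warnings;

# Hypothetical helper: run a query in stages up to $max_level (0, 1, or 2)
# and report what was achieved. Errors are captured in $result{error}.
sub staged_run {
    my ($spec, $max_level) = @_;
    $max_level = 2 unless defined $max_level;
    my %result;
    my $ok = eval {
        require Bio::DB::Query::HIVQuery;
        # Level 0: local validation only; throws on an invalid field or option.
        my $q = Bio::DB::Query::HIVQuery->new(-query      => $spec,
                                              -RUN_OPTION => 0);
        $result{validated} = 1;
        if ($max_level >= 1) {
            $q->_do_query(1);                   # sequence count only
            $result{count} = $q->count;
        }
        if ($max_level >= 2) {
            $result{level} = $q->_do_query(2);  # picks up where level 1 left off
        }
        1;
    };
    $result{error} = "$@" unless $ok;
    return \%result;
}

# Validate only; no HTTP requests are made at level 0.
my $r = staged_run({ subtype => 'C', country => 'ZA' }, 0);
print $r->{error} ? "not run: $r->{error}\n" : "query validated locally\n";
```

Stopping at level 1 is a cheap way to check the size of a result set before committing to the full download of ids and annotations.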

Query re-use

You can clear the query results, retaining the same LANL session and query spec, by doing $q->_reset. Change the query, and rerun with $q->_do_query($YOUR_RUN_OPTION).

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.

bioperl-l@bioperl.org                  - General discussion
http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

http://bugzilla.open-bio.org/

AUTHOR - Mark A. Jensen

Email maj@fortinbras.us

CONTRIBUTORS

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded by an underscore (_).

Constructor

new

Title   : new
Usage   : my $hiv_query = new Bio::DB::Query::HIVQuery();
Function: Builds a new Bio::DB::Query::HIVQuery object,
          running a sequence query against the Los Alamos
          HIV sequence database
Returns : an instance of Bio::DB::Query::HIVQuery
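Args    : query string, or -query => hashref or arrayref
          (see SYNOPSIS and DESCRIPTION)
          -RUN_OPTION => 0, 1, or 2 (optional; default is 2)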

QueryI compliance

count

Title   : count
Usage   : $hiv_query->count($newval)
Function: return number of sequences found
Example : 
Returns : value of count (a scalar)
Args    : on set, new value (a scalar or undef, optional)
Note    : count warns if it is accessed for reading before query
          has been executed to at least level 1

ids

Title   : ids
Usage   : $hiv_query->ids($newval)
Function: LANL ids of returned sequences 
Example : 
Returns : value of ids (an arrayref of sequence accessions/ids)
Args    : on set, new value (an arrayref or undef, optional)

query

Title   : query
Usage   : $hiv_query->query
Function: Get/set the submitted query hash or string
Example :
Returns : hashref or string
Args    : query in hash or string form (see DESCRIPTION)

Bio::DB::Query::HIVQuery specific methods

help

Title   : help
Usage   : $hiv_query->help("help.html")
Function: get html-formatted listing of valid fields/aliases/options
          based on current schema xml
Example : perl -MBio::DB::Query::HIVQuery -e "new Bio::DB::Query::HIVQuery()->help" | lynx -stdin
Returns : HTML
Args    : optional filename; otherwise prints to stdout

Annotation manipulation methods

get_annotations_by_ids

Title   : get_annotations_by_ids (or ..._by_id)
Usage   : $ac = $hiv_query->get_annotations_by_ids(@ids)
Function: Get the Bio::Annotation::Collection for these sequence ids
Example :
Returns : A Bio::Annotation::Collection object
Args    : an array of sequence ids

add_annotations_for_id

Title   : add_annotations_for_id
Usage   : $hiv_query->add_annotations_for_id( $id ) to create a new
          empty collection for $id
          $hiv_query->add_annotations_for_id( $id, $ac ) to associate
          $ac with $id
Function: Associate a Bio::Annotation::Collection with this sequence id
Example :
Returns : a Bio::Annotation::Collection object
Args    : sequence id [, Bio::Annotation::Collection object]

remove_annotations_for_ids

Title   : remove_annotations_for_ids (or ..._for_id)
Usage   : $hiv_query->remove_annotations_for_ids( @ids)
Function: Remove annotation collections for these sequence ids
Example :
Returns : An array of the previous annotation collections for these ids
Args    : an array of sequence ids

remove_annotations

Title   : remove_annotations
Usage   : $hiv_query->remove_annotations()
Function: Remove all annotation collections for this object
Example :
Returns : The previous annotation collection hash for this object
Args    : none

get_value

Title   : get_value
Usage   : $ac->get_value($tagname) -or-
          $ac->get_value( $tag_level1, $tag_level2,... )
Function: access the annotation value associated with the given tags
Example :
Returns : a scalar
Args    : an array of tagnames that descend into the annotation tree
Note    : this is a L<Bio::AnnotationCollectionI> method added in 
          L<Bio::DB::HIV::HIVQueryHelper>

put_value

Title   : put_value
Usage   : $ac->put_value($tagname, $value) -or-
          $ac->put_value([$tag_level1, $tag_level2, ...], $value) -or-
          $ac->put_value( [$tag_level1, $tag_level2, ...] )
Function: create a node in an annotation tree, and assign a scalar value to it
          if a value is specified
Example :
Returns : scalar or a Bio::Annotation::Collection object
Args    : $tagname, $value scalars (can be specified as -KEYS=>$tagname,
          -VALUE=>$value) -or- 
          \@tagnames, $value (or as -KEYS=>\@tagnames, -VALUE=>$value )
Notes   : This is a L<Bio::AnnotationCollectionI> method added in 
          L<Bio::DB::HIV::HIVQueryHelper>.
          If intervening nodes do not exist, put_value creates them, replacing
          existing nodes. For example, after $ac->put_value('x', 10), a later
          $ac->put_value(['x', 'y'], 20) discards the original value of 'x',
          and $ac->get_value('x') then returns the annotation collection
          with tagname 'y'.
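The replacement behaviour described in the notes can be modelled with plain nested hashes. This is an analogy only: the real tree is made of Bio::Annotation::Collection objects, and put_value_model and get_value_model are hypothetical names used for illustration.

```perl
use strict;
use warnings;

# Plain-hash model of the tag tree (an analogy, not the module's code).
sub put_value_model {
    my ($tree, $tags, $value) = @_;
    my @tags = ref($tags) ? @$tags : ($tags);
    my $leaf = pop @tags;
    my $node = $tree;
    for my $tag (@tags) {
        # A missing or scalar intervening node is replaced by a new branch.
        $node->{$tag} = {} unless ref($node->{$tag}) eq 'HASH';
        $node = $node->{$tag};
    }
    $node->{$leaf} = $value;
    return $tree;
}

sub get_value_model {
    my ($tree, @tags) = @_;
    my $node = $tree;
    for my $tag (@tags) {
        return unless ref($node) eq 'HASH';
        $node = $node->{$tag};
    }
    return $node;
}

my %ac;
put_value_model(\%ac, 'x', 10);
my $before = get_value_model(\%ac, 'x');       # the scalar 10
put_value_model(\%ac, ['x', 'y'], 20);
my $after  = get_value_model(\%ac, 'x');       # now a branch holding 'y'
my $leaf   = get_value_model(\%ac, 'x', 'y');  # 20
print "before: $before\n";
print "after is a branch: ", (ref($after) eq 'HASH' ? 'yes' : 'no'), "\n";
print "x/y: $leaf\n";
```

The point of the model is that writing to a deeper path silently discards a scalar stored at a shallower one, so tag paths should be planned before values are stored.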

GenBank accession manipulation methods

get_accessions

Title   : get_accessions
Usage   : $hiv_query->get_accessions()
Function: Return an array of GenBank accessions associated with these
          sequences (available only after a query is subjected to a
          full run, i.e., when $RUN_OPTION == 2)
Example :
Returns : array of gb accession numbers, or () if none found for this query
Args    : none

get_accessions_by_ids

Title   : get_accessions_by_ids (or ..._by_id)
Usage   : $hiv_query->get_accessions_by_ids(@ids)
Function: Return an array of GenBank accessions associated with these
          LANL ids (available only after a query is subjected to a
          full run, i.e., when $RUN_OPTION == 2)
Example :
Returns : array of gb accession numbers, or () if none found for this query
Args    : an array of sequence ids

Query control methods

_do_query

Title   : _do_query
Usage   : $hiv_query->_do_query or $hiv_query->_do_query($run_level)
Function: Execute the query according to the argument or $RUN_OPTION,
          and set _RUN_LEVEL. The extent of the query reflects the
          value of the argument:
           0 : validate only (no HTTP action)
           1 : return sequence count only
           2 : return sequence ids (full query, returns with annotations)
          No-op if the current _RUN_LEVEL of the query is >= the argument
          or $RUN_OPTION.
Example :
Returns : actual _RUN_LEVEL (0, 1, or 2) achieved
Args    : desired run level (optional, global $RUN_OPTION is default)

_reset

Title   : _reset
Usage   : $hiv_query->_reset
Function: Resets query storage, count, and ids, while retaining session id, 
          original query string, and db schema
Example : 
Returns : void
Args    : none

_session_id

Title   : _session_id
Usage   : $hiv_query->_session_id($newval)
Function: Get/set HIV db session id (initialized in _do_lanl_request)
Example : 
Returns : value of _session_id (a scalar)
Args    : on set, new value (a scalar or undef, optional)

_run_option

Title   : _run_option
Usage   : $hiv_query->_run_option($newval)
Function: Get/set HIV db query run option (see _do_query for values)
Example : 
Returns : value of _run_option (a scalar)
Args    : on set, new value (a scalar or undef, optional)

Internals

add_id

Title   : add_id
Usage   : $hiv_query->add_id($id)
Function: Add new id to ids
Example : 
Returns : the new id
Args    : a sequence id

map_db

Title   : map_db
Usage   : $obj->map_db($newval)
Function: 
Example : 
Returns : value of map_db (a scalar)
Args    : on set, new value (a scalar or undef, optional)

make_search_if

Title   : make_search_if
Usage   : $obj->make_search_if($newval)
Function: 
Example : 
Returns : value of make_search_if (a scalar)
Args    : on set, new value (a scalar or undef, optional)

search_

Title   : search_
Usage   : $obj->search_($newval)
Function: 
Example : 
Returns : value of search_ (a scalar)
Args    : on set, new value (a scalar or undef, optional)

_map_db_uri

Title   : _map_db_uri
Usage   :
Function: return the full map_db uri ("Database Map")
Example :
Returns : scalar string
Args    : none

_make_search_if_uri

Title   : _make_search_if_uri
Usage   :
Function: return the full make_search_if uri ("Make Search Interface")
Example :
Returns : scalar string
Args    : none

_search_uri

Title   : _search_uri
Usage   :
Function: return the full search cgi uri ("Search Database")
Example :
Returns : scalar string
Args    : none

_schema_file

Title   : _schema_file
Usage   : $hiv_query->_schema_file($newval)
Function: 
Example : 
Returns : value of _schema_file (an XML string or filename)
Args    : on set, new value (an XML string or filename, or undef, optional)

_schema

Title   : _schema
Usage   : $hiv_query->_schema($newVal)
Function: 
Example : 
Returns : value of _schema (an HIVSchema object in package 
          L<Bio::DB::HIV::HIVQueryHelper>)
Args    : none (field set directly in new())

_lanl_query

Title   : _lanl_query
Usage   : $hiv_query->_lanl_query(\@query_parms)
Function: pushes \@query_parms onto @{$self->{'_lanl_query'}}
Example : 
Returns : value of _lanl_query (an arrayref)
Args    : on set, new value (an arrayref or undef, optional)

_lanl_response

Title   : _lanl_response
Usage   : $hiv_query->_lanl_response($response)
Function: pushes $response onto @{$hiv_query->{'_lanl_response'}}
Example : 
Returns : value of _lanl_response (an arrayref of HTTP::Response objects)
Args    : on set, new value (an HTTP::Response object or undef, optional)

_create_lanl_query

Title   : _create_lanl_query
Usage   : $hiv_query->_create_lanl_query()
Function: validate query hash or string, prepare for _do_lanl_request
Example : 
Returns : 1 if successful; throws exception on invalid query
Args    :

_do_lanl_request

Title   : _do_lanl_request
Usage   : $hiv_query->_do_lanl_request()
Function: Perform search request on _create_lanl_query-validated query
Example : 
Returns : 1 if successful
Args    : 

_parse_lanl_response

Title   : _parse_lanl_response
Usage   : $hiv_query->_parse_lanl_response()
Function: Parse the tab-separated-value response obtained by _do_lanl_request
          for sequence ids, accessions, and annotations
Example : 
Returns : 1 if successful
Args    : 

_parse_query_string

Title   : _parse_query_string
Usage   : $hiv_query->_parse_query_string($str)
Function: Parses a query string using query language emulator QRY
          in L<Bio::DB::HIV::HIVQueryHelper>
Example : 
Returns : arrayref of hash structures suitable for passing to _create_lanl_query
Args    : a string scalar

Dude, sorry-

_sorry

Title   : _sorry
Usage   : $hiv_query->_sorry("-president=>Powell")
Function: Throws an exception for unsupported option or parameter
Example :
Returns : 
Args    : scalar string