NAME
GO::AppHandle - Gene Ontology Data API handle
SYNOPSIS
use GO::AppHandle;
my $dbname = "go";
# connect to a database on a specific host
$apph = GO::AppHandle->connect(-dbname=>$dbname, -dbhost=>$mysqlhost);
# EXAMPLE 1
# fetching a GO term from the datasource
$term = $apph->get_term({acc=>"GO:0003677"});
printf
"GO term; name=%s GO ID=%s\n",
$term->name(), $term->public_acc();
# EXAMPLE 2
# fetching a list of associations to the ER
# (and all the GO terms that are subtypes of the ER, or
# located within the ER)
# for which there is reasonably good evidence
# (traceable author / direct assay)
$assocs = $apph->get_associations({name=>"endoplasmic reticulum"},
{evcodes=>["TAS", "IDA"]});
foreach my $assoc (@$assocs) {
printf
"Gene: %s evidence for association: %s %s",
$assoc->gene_product->symbol,
$assoc->evidence->code(),
$assoc->evidence->xref->xref_key();
}
# EXAMPLE 3
# fetching a subgraph of GO
$graph = $apph->get_graph(-acc=>3677, -depth=>3);
foreach my $term (@ {$graph->get_all_nodes}) {
printf
"GO term; name=%s GO ID=%s\n",
$term->name(), $term->public_acc();
}
# EXAMPLE 4
# fetching a subgraph of GO,
# and using a graph iterator to
# display the graph
$graph = $apph->get_graph_by_search("DNA helicase*");
$it = $graph->create_iterator;
while (my $ni = $it->next_node_instance) {
$depth = $ni->depth;
$term = $ni->term;
printf
"%s Term = %s (%s) // n_assocs=%s // depth=%d\n",
"----" x $depth,
$term->name,
$term->public_acc,
$term->n_associations || 0,
$depth;
}
# EXAMPLE 5
# fetching a subgraph of GO,
# constraining by gene products
# get all terms that were used to annotate these two SGD genes
$terms = $apph->get_terms({products=>["Eip63F-1", "Krt1-13"]});
# build a graph all the way to the leaf nodes
# from the above terms
$graph = $apph->get_graph_by_terms($terms, -1);
# create an iterator on the graph
$it = $graph->create_iterator;
# iterate through every node in graph
while (my $ni = $it->next_node_instance) {
$depth = $ni->depth;
$term = $ni->term;
printf
"%s Term = %s (%s) // ASSOCS=%s\n",
"----" x $depth,
$term->name,
$term->public_acc,
join("; ",
map {$_->gene_product->acc} @{$term->association_list});
}
DESCRIPTION
This is a module for accessing Gene Ontology data sources, e.g the GO relational database. It defines a set of methods that provide a consistent interface independent of the way the GO data is stored.
For an explanation of the GO project, please visit hþtp://www.geneontology.org
If you are developing GO applications in perl, this is your main way into the data. You only need to read this page, and possibly the perldocs for GO::Model::Term, GO::Model::Association, GO::Model::Graph, etc
e.g.
perldoc GO/Model/Term.pm
perldoc GO/Model/Relationship.pm
perldoc GO/Model/Graph.pm
Or, if you reading this from the web
You can view the object model diagram at http://www.godatabase.org/dev/go-perl/doc/go-perl-doc.html
if you have installed the GO perl modules, you should have manpages already - e.g. try "man GO::Model::Graph"
PUBLIC METHODS - AppHandle
connect
Usage - $apph = AppHandle->connect(-dbname=>"go");
Usage - $apph = AppHandle->connect(-ior_url=>$url);
Usage - $apph = AppHandle->connect(-dbname=>"go",
-dbiproxy=>"hostname=blahblah.geneontology.org;port=3335");
Usage - $apph = AppHandle->connect(\@my_args);
Returns - an object implementing GO::AppHandle
Args - either array or reference to an array
This is the call you make to receive an API handle.
The argument array should be passed as alternate key/value pairs, with keys preceeded by a hyphen. if an array *reference* is passed as an argument, all the key/value pairs that are recognised will be processed and removed from the array. This means you can write unix command line scripts like this:
# usage: myscript.pl [-dbname db] [-ior_url url] [etc] ACCESSION
$apph=AppHandle->connect(\@ARGV);
my $go_id = shift @ARGV;
print $apph->get_term($go_id)->description();
and defer the decision as to how to connect to the user.
You can also specify default settings in a file $HOME/.geneontologyrc e.g.
dbname go
dbhost gomysql.geneontology.org
connection parameters
These are the parameters that are currently recognised:
- -dbname [or -d]
-
name of database; usually "go" but you may want point at a test/dvlp database
- -dbuser
-
name of user to connect to database as; optional
- -dbauth
-
password to connect to database with; optional
- -dbh
-
if you like, you can pass in your own DBI handle object; it is recommended you dont and instead let the connect() method create this for you
- -dbhost [or -h]
-
name of server where the database server lives; see http://www.godatabase.org/dev/database for details of which servers are active. or you can just specify "localhost" if you have go-mysql installed locally
- -dbiproxy
-
address of proxy server; if you wish to connect remotely, and there is a go proxy server running you can use this in combination with -dbname.
[in order to use this, you will need DBI installed]
currently there is no stable proxy server running
- -ior_url
-
url serving the IOR (internet orb reference) for a GO corba server
[in order to use this, you will need an orb, such as orbit, installed]
- -impl
-
API Handle implementation; currently either "sql" or "corba". This parameter is optional, as the implementation is inferred by the presence of the parameters above
DATA SOURCE QUERYING METHODS
timestamp
Usage - my $time = $apph->timestamp;
Usage - my $pp_time = localtime($apph->timestamp);
returns the timestamp for the data; eg if the datasource is an sql database loaded from the flatfiles, this returns the time at which the load was initiated
get_term
Usage - my $term = $apph->get_term({acc=>3677})
Usage - my $term = $apph->get_term({search=>"apoptos*"})
Returns - GO::Model::Term
Args - constraints [hashref], attributes [array ref], template
See GO::Model::Term
get_terms
Usage - my $term_l = $apph->get_terms({search=>"apoptos*"})
Usage - my $term_l = $apph->get_terms({product=>"ninaA"})
Returns - arrayref of GO::Model::Term
Args - constraints [hashref], attributes [array ref], template
fetches a term or list of terms from the database
See GO::Model::Term
specify the term by the constraints hashref; the keys can be any of:
- name
-
my $term = $apph->get_term({name=>"DNA binding"})
fetches a term by it's name/description
- type
-
my $terms = $apph->get_terms({type=>"biological_process"})
constrains search by ontology
- synonym
-
my $term = $apph->get_term({synonym=>"RNAi"})
fetches a term by it's synonym
- subset
-
my $term = $apph->get_term({subset=>"goslim_plant"})
Finds all terms in goslim_plant.
The goslims themsleves can be retrieved as terms, like this:
my $goslim_plant = $apph->get_term({acc=>"goslim_plant"})
You can get all slims like this:
my $slims = $apph->get_terms({term_type=>"subset"})
- acc
-
my $term = $apph->get_term({acc=>3677})
fetches a term by its GO ID/accession (expressed as an integer, without the GO: prefix)
You can specify multiple accs as an arrayref
- search
-
my $term_l = $apph->get_terms({search=>"apoptos*"})
fetches a term or terms by doing a search on name/description, synonyms, definition, xrefs (eg swissprot keywords), comments, obsoletes
search can have "*" as wildcards
ADVANCED SEARCH OPTIONS: you can also specify a list of fields to search, eg:
# all terms with carbohydrate in name or synonym field my $term_l = $apph->get_terms({search=>"carbohydrate*", search_fields=>"name,synonym"}); # search all fields except definition my $term_l = $apph->get_terms({search=>"carbohydrate*", search_fields=>"!definition"}); # equivalent to the above my $term_l = $apph->get_terms({search=>"carbohydrate*", search_fields=>"name,synonym,dbxrefs,comments"});
(NOTE: dont leave spaces between commas)
- product
-
my $term_l = $apph->get_terms({product=>"ninaA"})
fetches terms for which there is an association to the specified gene product.
product can either be expressed as a gene product symbol, or a GO::Model::GeneProduct object or hashreference
my $term_l = $apph->get_terms({product=>{full_name=>"heat shock protein, DNAJ-like 3"}})
fetches all terms for which there is associations for products with this full_name
- product_accs
-
my $term_l = $apph->get_terms({products_accs=>["S0004660", "S0004661"]})
- products
-
my $term_l = $apph->get_terms({products=>["mygene1", "mygene2", ....]})
fetches terms for which there is an association to one of the specified gene products.
product can either be expressed as a list of gene product symbols, or a list of GO::Model::GeneProduct object or hashreference
my $term_l = $apph->get_terms({products=>[{acc=>"FBgn0000001"}, {acc=>"FBgn0000002"}]})
fetches all terms for which there is associations for products with these gene product accessions
my $term_l = $apph->get_terms({products=>{full_name=>"endothelial cell-selective adhesion molecule"}});
finds the gene product with full_name "endothelial cell-selective adhesion molecule" and finds the GO terms used to annotate that product
my $term_l = $apph->get_terms({products=>{synonym=>"HUF*"}, is_not=>0});
this finds all terms that have products with a synonym matching the wildcard HUF*. negative annotations are filtered out.
NOTE: when you constrain the list of terms using a product or list of products, the resulting terms will be adorned with these products, they can be accessed via
$term->selected_association_list
Rationale: say we have a bunch of proteins that we have clustered eg via expression data or by sequence analysis; we want to see how that cluster jives with the GO categorizations. we can just query terms by the product list and show how the products are adorned on the tree
note: constraints can also be passed in as an array of name/value pairs
get_terms_with_associations
Usage - my $term_l = $apph->get_terms_with_associations({acc=>3677})
Returns - arrayref of GO::Model::Term
Args - constraints [hashref], attributes [array ref], template
See GO::Model::Term
This will fetch a list of terms, including all the ones specified by the constraints hash, and also including any child terms of these terms. It will also populate $term->association_list for each of these. Any terms that do not have any associations are filtered out.
Rationale: often we want to fetch a list of gene products for any particular term, and also fetch gene products beneath this term. We could use $term->deep_products() but we would lose information on how each term is associated to each product.
The following piece of code illustrates how this may be used:
# fetch all terms with associations that are DNA Binding (GO:0003677)
# fetches all subtypes of DNA binding, so long as they have
# associations attached
my $tl = $apph->get_terms_with_associations({acc=>3677});
foreach my $t (@$tl) {
my $al = $t->association_list;
foreach my $a (@$al) {
printf(
"%s %20s %s %s %s\n",
$t->public_acc,
$t->name,
$a->gene_product->symbol,
);
}
}
filters: $apph->filters will be respected in constructing this query
developes note: see test t290 for specification of this behaviour
TEMPLATES (optional):
the term object is attached to other objects like this:
GO::Model::Term --->[n] GO::Model::Association --->[1] GO::Model::Product
|
|
------------------->[n] GO::Model::Evidence
you can specify that only a subset of this info is retrieved via templates, like this:
# this just gets the GO::Model::Term object, no associations
$term = $apph->get_term({acc=>3677}, "shallow");
# this just gets the accession and definition fields
$term = $apph->get_term({acc=>3677}, {acc=>1, definition=>1});
get_term_by_acc
Usage - my $term = $apph->get_term_by_acc(3677)
Or - my $term = $apph->get_term_by_acc("GO:0003677")
Returns - GO::Model::Term
Args - accession (GO ID) + same args as get_term
See GO::Model::Term
get_terms_by_search
Usage - my $term = $apph->get_term_by_search("*membrane*")
Returns - GO::Model::Term
Args - search term + same args as get_term
use asterisk as the wildcard
See GO::Model::Term
get_root_term
Usage - my $term = $apph->get_root_term;
returns GO::Model::Term for top node in entire complete ontology
(ie Gene_Ontology)
See GO::Model::Term
get_ontology_root_terms
Usage - my $terms = $apph->get_ontology_root_terms;
returns GO::Model::Term list for top nodes in individual ontologies
(ie process, function, compinent)
See GO::Model::Term
get_relationships
Usage - my $rel_l = $apph->get_relationships({parent_acc=>3677});
Returns - list reference of GO::Model::Relationship objects
Args - constraints hashref
constraints: parent_acc (integer) GO ID of parent term (ie will return all arcs pointing down) child_acc (integer) GO ID of child term (ie will return all arcs pointing up) parent (GO::Model::Term) all rels for which this is a parent child (GO::Model::Term) all rels for which this is a child
TODO: constrain by type
See GO::Model::Term
get_parent_terms
Usage - my $term_lref = $apph->get_parent_terms($term);
Returns -
Args -
See GO::Model::Term
get_child_terms
Usage - my $term_lref = $apph->get_child_terms($term);
Returns -
Args -
See GO::Model::Term
get_associations
Usage - $assocs = $apph->get_associations(-term=>{acc=>3677},
-options=>{direct=>1});
Or - $assocs = $apph->get_associations({name=>"DNA supercoiling"});
Or - $assocs = $apph->get_associations({name=>"DNA supercoiling"},
{evcodes=>["IEA", "ISS"]});
Returns - listref of GO::Model::Association
Args - -term => term constraints (or GO::Model::Term object)
-constraints => other constraints hashref
-template => template
-options => hashref
this will fetch a list of associations for any term. it will also get associations for subtypes of this term. for instance
my $apph = GO::AppHandle->connect($connect_params);
my $term = $apph->get_term({name=>"DNA binding"});
my $assocs = $apph->get_associations(-term=>$term,
-options=>{direct=>1});
foreach my $assoc (@$assocs) {
printf " gene product:%s %s:%s\n",
$assoc->gene_product->symbol,
$assoc->gene_product->speciesdb,
$assoc->gene_product->acc;
}
will fetch and print all genes associated with DNA binding *plus* all genes associated with different kinds of DNA binding (eg DNA supercoiling)
the default is to descend the GO graph; if direct=>1 is specified then only gene associations *specifically with that term* are fetched. (Or you can also use one of the methods below)
get_all_associations
Usage - $assocs = $apph->get_associations({acc=>3677});
Or - $assocs = $apph->get_associations({name=>"DNA supercoiling"});
same as get_associations()
ie this fetches all associations directly attached to a term plus all descendants of that term. (for example if the term is "receptor", associations attached to "trasmembrane receptor" *WILL* be fetched.
get_direct_associations
Usage - $assocs = $apph->get_associations({acc=>3677});
Or - $assocs = $apph->get_associations({name=>"DNA supercoiling"});
same as get_associations() with direct=>1
ie this fetches all associations directly attached to a term (for example, if the term is "receptor", associations attached to "trasmembrane receptor" will *NOT* be fetched.
get_product
Usage - $product = $apph->get_product({symbol=>"Cyp1a1"});
Or - $product = $apph->get_product({synonym=>"HUF*"});
Or - $product = $apph->get_product({acc=>"FBgn0002936"});
Or - $products = $apph->get_products({speciesdb=>"MGI"});
Or - $products = $apph->get_products({taxid=>[7227]});
Or - $products = $apph->get_products({qualifier_taxid=>[7227]});
Or - $product = $apph->get_product({term=>3677});
Or - $products = $apph->get_products({terms=>[@terms]});
Returns - GO::Model::GeneProduct
Args - constraints attributes
constraints: symbol, acc, speciesdb, taxid, term, terms
give the constraint 'deep' if you want to fetch all the products for a term an its subterms
eg
# fetch all products attached to this node
# and all children of this node; exclude NOT associations
$prods =
$apph->get_products({deep=>1,
is_not=>0,
term=>{name=>"carbohydrate metabolism"}});
bear in mind that the above search is constrained by the evidence codes filter (which is !IEA by default). to get all products attached to carbohydrate metabolism or its children for which the association is IDA or IPI do this first:
$apph->filters({evcodes=>["IDA", "IPI"]});
MULTIPLE TERMS
by default, "or" is used to combine terms; eg
$prods = $apph->get_products({terms=>[6955, 5887]})
gets all products that are annotated to immune response OR integral membrane protein
if you want all products that are annotated to immune response AND integral membrane protein, then do this:
$prods = $apph->get_products({terms=>[6955, 5887], operator=>"and"})
get_products
Usage - as get_product
Returns - array ref of GO::Model::Product
Args - as get_product
get_deep_products
Usage - as get_product
e.g $apph->get_deep_products({term=>"transmembrane receptor"});
Returns - array ref of GO::Model::Product
Args - as get_product
fetches all products attached to a term *and any of its children*
this is exactly the same as calling get_products() with the deep=>1 constraint
for example, the above queries gets gene products that are any kind of transmembrane receptor - e.g. GPCR
if you have set the filters in using the filters() method then these filters will be used in the query, unless you override them
get_product_count
Usage - $apph->get_product_count({term=>$term});
Usage - $apph->get_product_count({term=>$term,
evcodes=>["!IEA"],
speciesdbs=>["SGD", "MGI", "WormBase"]});
Usage - $apph->get_product_count({term=>$term,
taxids=>[7227, 9606]});
Usage - $apph->get_product_count({term=>$term,
taxids=>[5691], # parasite
qualifier_taxids=>[9606,9313], # hosts
});
Returns - int
Args - constraints
gets the count for the number of gene products annotated at BUT NOT BELOW this level. if you have set the filters in using the filters() method then these filters will be used in determining the count, unless they are overridden by consteraints you pass in
term should be a GO::Model::Term object, or a term constraint, for example:
$apph->get_product_count({term=>{name=>"compound eye morphogenesis"}})
get_deep_product_count
Usage - $apph->get_deep_product_count({term=>$term});
Usage - $apph->get_deep_product_count({term=>$term,
evcodes=>["!IEA"],
speciesdbs=>["SGD", "MGI", "WormBase"]});
Returns - int
Args - constraints
gets the count for the number of gene products annotated at OR BELOW this level. if you have set the filters using the filters() method then these filters will be used in determining the count, unless they are overridden by consteraints you pass in
get_node_graph
Usage - my $graph = $apph->get_node_graph($acc, $depth)
or my $graph = $apph->get_node_graph(-acc=>$acc, -depth=>$depth)
or my $graph = $apph->get_node_graph(-acc=>$acc,
-depth=>$depth,
-template=>{terms=>$ttmpl})
Returns - GO::Model::Graph object
Args - acc, depth, template
See GO::Model::Graph
use this whenever you want to get a subgraph of the whole GO graph to a particular depth
the default action is to populate the graph up to 2 down, and all the way to the top.
get_graph
synonym for get_node_graph
get_graph_by_acc
synonym for get_node_graph
get_graph_by_search
Usage - $graph = $apph->get_graph_by_search("*binding*", 3)
Returns - GO::Model::Graph
Args - search term, depth [optional - 2 if omitted], template
finds all the terms that satisfy the search constraints, builds a subgraph of the whole GO graph that contains all these terms, then populates the graph downwards to the specified depth, and populates the graph with all paths from the terms up to the root term.
See GO::Model::Graph
get_graph_by_terms
Usage - $graph = $apph->get_graph_by_terms(\@terms, 3)
Usage - $graph = $apph->get_graph_by_terms(-terms=>\@terms,
-depth=>3,
-template=>{traverse_up=>0,
traverse_down=>1})
Returns - GO::Model::Graph
Args - GO::Model::Term list, depth [optional - 2 if omitted], template
Builds a subgraph of the whole GO graph that contains all the input terms, then populates the graph downwards to the specified depth, and populates the graph with all paths from the terms up to the root term.
See GO::Model::Graph
extend_graph
Usage - $apph->extend_graph($graph, $acc, $depth)
Returns -
Args - GO::Model::Graph, acc, depth, template
See GO::Model::Graph
get_paths_to_top
Usage - $path_l = $apph->get_paths_to_top({acc=>'GO:0003677'})
Returns - array ref of GO::Model::Path
Args - same constraints as get_term(...)
returns all the different paths to the root of the GO graph from any point
See GO::Model::Path
get_species_list
Usage - $list = $apph->get_species_list
Returns - arrayref of GO::Model::Species
Args -
returns a list of species for which there is at least one annotation
get_speciesdbs
Usage - $list = $apph->get_speciesdb_dict
Returns - arrayref of speciesdbs
Args - [optional constraints hashref]
Returns a list of speciesdbs. A speciesdb is a database that contributes associations; eg MGD, FB etc.
Speciesdb to species is often not one-to-one; eg SWISS-PROT may in future contribute associations for all species
get_speciesdb_dict
Usage - $sd = $apph->get_speciesdb_dict
Returns - dictionary/hashref of speciesdb->species objects
Args - [optional constraints hashref]
returns a lookup table keyed by species database name that point to Bio::Species objects (this is a bioperl object, you will need bioperl installed to make this call - see http://www.bioperl.org)
it is important to make the distinction between species and the speciesdb/datasource. associations are grouped in GO according to their source (eg SGD, FlyBase, MGI, Compugen). Currently there is a 1<->1 mapping between the source and species but this need not always be the case.
If you just want a list of sources/contributing databases, do this:
$sd = $apph->get_speciesdb_dict;
@sources = keys %$sd;
or you can get the database and the species like this:
$sd = $apph->get_speciesdb_dict;
foreach my $src (keys %$sd) {
my $species = $sd->{$src};
printf "source:%s common_name:%s\n",
$src, $species->common_name
}
see the bioperl docs for the Bio::Species object
(not all attributes are currently filled)
get_seq
Usage - $list = $apph->get_seq({display_id=>"Q9XHP0"})
Returns - GO::Model::Seq
Args -
get_dbs
Usage - $dbs = $apph->get_dbs({name=>"ZFIN"});
Returns - arrayref of L<GO::Model::DB>
Args - [optional constraints hashref]
Must have dbs loaded into database; this is typically sourced from http://www.geneontology.org/doc/GO.xrf_abbs
Any column from the db table can be used as a constraint
acc2name_h
Usage - $n = $apph->acc2name_h->{$go_id}
Returns - string
Args - acc string
returns a hash mapping between term IDs and term names derived from db
get_statistics
Usage -
Returns - GO::Stats object
Args -
filters
Usage - $apph->filters({evcodes=>["!IEA"]});
Returns -
Args - hashref of filter types, each value is an arrayref
filter types: speciesdb, evcodes
gets/sets default filters for querying data; when an AppHandle is initialized, the default filter should be ["!IEA"] (this is because there are so many IEAs it makes things disproportionately slower, and this will also discourage circular annotations)
Any value can be negated with the exclamation mark
# only get associations that are direct assays or
# traceable author statements
$apph->filters({evcodes=>["IDA", "TAS"]});
$graph = $apph->get_graph(5054);
$graph->to_text_output(1);
# only get associations that are
# in FB (flybase) or MGD (mouse)
# default ev code filter !IEA will be used
$apph->filters({filters=>["FB", "MGD"]});
$graph = $apph->get_graph(5054);
$graph->to_text_output(1);
It is very important to understand GO evidence codes. See http://www.geneontology.org for details
See GO::Model::Evidence and GO::Model::Association
evidence_codes
Usage - @codes = $apph->evidence_codes;
Returns -
Args -
ANALYSIS METHODS
get_enriched_term_hash
Usage - $eh = $apph->get_enriched_term_hash( $products )
Returns - hash
Args - listref of L<GO::Model::Product> OR listref of product constraint hashes
NOT YET FULLY TESTED
Requires GO::TermFinder (separate CPAN distribution)
Performs a term enrichment analysis. Uses hypergeometric distribution, takes entire DAG into account.
First the database will be queried for matching gene products. Any filters in place will be applied (or you can pass in a list of gene products previously fetched, eg using $apph->get_products).
The matching products count as the *sample*. This is compared against the gene products in the database that match any pre-set filters (statistics may be more meaningful when a filter is set to a particular taxon or speciesdb-source).
We then examine terms that have been used to annotate these gene products. Filters are taken into account (ie if !IEA is set, then no IEA associations will count). The DAG is also taken into account - so anything annotated to a process will count as being annotated to biological_process. This means the fake root "all" will always have p-val=1. Currently the entire DAG is traversed, relationship types are ignored (in future it may be possible to specify deduction rules - this will be useful when the number of relations in GO progresses beyond 2, or when this code is used with other ontologies)
Results are returned as a hash-of-hashes, outer hash keyed by term acc, inner hash specifying the fields:
- term
-
a GO::Model::Term object
- n_gps_in_sample_annotated
-
number of the initial product list that was fed in that are annotated (by transitivity, including filters) to this term
- n_gps_in_sample
-
number in the initial product list
- n_gps_in_database_annotated
-
number of products in database that are annotated (by transitivity) to this term
filters are applied
- n_gps_in_database
-
number of products in database (after filters are applied)
- gps_in_sample_annotated
-
a listref that is a subset of the original GO::Model::GeneProduct list; the ones that are annotated (by transitivity, including filters) to this term
- p_value
-
probability that this term occurs by chance
See http://genome-www5.stanford.edu/help/GO-TermFinder/GO_TermFinder_help.shtml
for a full explanation
-
$apph->filters({speciesdb=>"SGD",evcodes=>["!IEA"]}); my @pqs = map { {synonym=>$_} } qw(YNL116W YNL030W YNL126W); my $eh = $apph->get_enriched_term_hash(\@pqs); my @erows = sort { $a->{p_value} <=> $b->{p_value} } values %$eh; foreach (@erows) { next unless $_->{p_value} <= 0.1; next if $_->{n_gps_in_sample_annotated} < 2; printf("%s sample:%d/%d database:%d/%d P-value:%s \"%s\" Genes: %s\n", $_->{term}->acc, $_->{n_gps_in_sample_annotated}, $_->{n_gps_in_sample}, $_->{n_gps_in_database_annotated}, $_->{n_gps_in_database}, $_->{p_value}, $_->{term}->name, join('; ',map {sprintf("%s[%s]", $_->symbol, $_->acc)} @{$_->{gps_in_sample_annotated}})) }
Use GO::TermFinder - you must have this installed to use this method (you can still use the rest of this module without GO::TermFinder)
See also http://genome-www5.stanford.edu/help/GO-TermFinder/GO_TermFinder_help.shtml
Not implemented
Bonferroni method
AUDIT METHODS
source_audit
Usage -
Returns -
Args -
returns a listref of hashes
[
{source_type => 'file',
source_path => 'function.ontology',
source_mtime => 1233456787
},
{source_type => 'file',
source_path => 'process.ontology',
source_mtime => 1233456999
},
]
times are unixtimes
instance_data
Usage -
Returns -
Args -
returns a hash
{release_name => 'go_200212',
release_type => 'seqdb',
release_notes => ''
}
get_term_loadtime
Usage -
Returns -
Args - acc
returns unixtime for when the term was loaded into the db
DATA SOURCE SPECIFIC METHODS
These only work on the SQL implementation. You only need to call these if you are populating a GO database with your own data.
add_root
Usage - $apph->add_root
Returns -
Args - name of root node [optional]
GO::AppHandle assumes that there is exactly one global root node
The OBO file format allows unrooted terms - after loading an ontology, this method should be called. It creates a new global root (default name "all"), and de-roots the existing root nodes and places them under this node
fill_path_table
Usage - $apph->fill_path_table
Returns -
Args -
Builds the transitive closure table; see also http://www.godatabase.org/dev for docs
Once you have finished loading all your *terms* and their relationships into your GO database instance, you can call this method to populate the *path* table. the SQL implementation of AppHandle recognises when the path table is populated, and will use it to make queries (involving graphs, etc) more efficient.
Note: normally you dont have to worry about using this call yourself, if you use the scripts::load-go-into-db script this will get called for you.
CURRENT LOGIC
We express the current implementation using horn clause logical rules:
Rule 1 - reflexivity; every term is reflexively related to itself with distance 0
graph_path(Term,Term,0).
Rule 2 - direct relationships; every term is linked to its parent with distance 1
graph_path(Term,Parent,1) <- term2term(Term,Parent)
Rule 2 - transitive relationships; note recursive definition.
graph_path(Term,Ancestor,D+1) <- term2term(Term,Parent), graph_path(Parent,Ancestor,D)
To see the actual implementation, look at the source. See also GO::Model::GraphIterator
FUTURE PLANS
The current implementation builds a transitive closure over ALL relations indiscriminitely
In future we will want to build a deductive closure on a per-relation basis (use case: when querying for gene expression results, the user may want to see genes that are expressed in subtypes or parts, but NOT following the develops_from relation)
fill_count_table
Usage - $apph->fill_count_table
Returns -
Args - evcode array (optional), reltype array (optional)
Once you have finished loading all your *associations* into your GO database instance, you can call this method to populate the *gene_product_count* table. The SQL implementation of AppHandle recognises when the gene_product_count table is populated, and will use it to make GO::Model::Term->n_deep_product() calls more efficient.
Note: normally you dont have to worry about using this call yourself, if you use the load-assocs.pl script this will get called for you.
Currently the default is to give recursive product counts for non-IEA evidence codes; if you want to get the count for ALL annotations, do this:
$apph->fill_count_table([""]);
which is a little abstruse
Currently the default is to give recursive product counts for all relationship types.
To get recursive product counts for only is_a and part_of and non_IEA evidence codes, do this: $apph->fill_count_table(undef, ['is_a', 'part_of']);
In the future it may be desirable to have seperate recursive product counts for different evidence codes. the time/space tradeoff becomes more expensive here, ie you would have to wait a long time for the table to fill, and it would take up a lot of space. Currently this is not an option, but the code could easily be modified to do this, if desired. One difficulty is that the counts divided by evidence code are *not* additive, unlike the counts divided by speciesdb (this is because a single product cannot exist in >1 speciesdb, but a single product can be annotated with multiple associations with differing evidence codes)
once this has completed, you will be able to do this
$term = $apph->get_term({acc=>"GO:0003677"});
$recursive_product_count = $term->n_deep_products();
[this will be *very* slow unless you have filled the count table]
INHERITANCE
This class inherits from GO::ObjFactory, in the go-perl package. This class inherits all the methods of this class, but you should not need to use them directly
HOW IT WORKS
GO::AppHandle 'dispatches' the method calls to an actual
implementation object for actual execution. The GO::AppHandle object
is also responsible for the dynamic loading of implementations,
checking/handling and other duties. (Users of the DBI database access
module should find this familiar).
.-. .---------------------------.
.-------. | |---| AppHandleSqlImpl |--- DBI
| Perl | | | `---------------------------'
| script| |A| |A| .---------------------------.
| using |--|P|--|p|---| CorbaClient::Session |--- IIOP
| | |I| |p| `---------------------------'
| API | |H|...
|methods| |d|... Other implementations
-------' |l|... (e.g C language wrapper)
`-' `
GO::AppHandle hides the implementation specific details, and provides a robust, consistent interface which should be reasonably constant in the face of changes in relational tables, distribution mechanism etc.
If you write tools in perl that use AppHandle, the exact same code will (in theory) work, whether the tool is deployed as a server-side application with direct database connectivity, or as a client-side application using corba as a distribution mechanism.
It also allows us to plug in different implementations (e.g. object database, xml database, prolog predicate list or lisp knowledge base, ...)
TEMPLATES (optional)
whenever you ask for an object from the database, this API will return that object and some other associated objects. for instance, if you ask for a GO::Model::Term object, you will receive attached to that object a list of GO::Model::Xrefs, definitions, synonyms etc.
This behaviour may not always be desirable. if you are doing a search purely for GO accessions, for instance, you dont want the extra SQL overhead of fetching synonyms etc.
You can ask for only a subset of possible data to be returned by specifying an object "template".
# this just gets the GO::Model::Term object, no associations
$term = $apph->get_term({acc=>3677}, "shallow");
# this just gets the accession and definition fields
$term = $apph->get_term({acc=>3677}, {acc=>1, definition=>1});
# no template specified; the current default behaviour
# is to fetch everything except the full association list
# note that the association count is prefetched, so you can
# say $term->n_associations()
$term = $apph->get_term({acc=>3677});
not all implementations respect the templates; the default will generally be in favour of getting too much data.
as of April 2001, this API has been implemented such that some data such as associations are fetched on-demand.
this means you can ignore templates and just use the default, eg
$term = $apph->get_term({acc=>3677});
then when you enquire about the associations attribute like this
foreach my $assoc (@{$term->association_list}) {
....
}
the associations will be fetched;
you can still ask for all the associations up-front; in some circumstances this will be faster. generally you can choose to ignore this.
FEEDBACK
Email cjm AT fruitfly.berkeley.edu
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself
3 POD Errors
The following errors were encountered while parsing the POD:
- Around line 113:
Non-ASCII character seen before =encoding in 'hþtp://www.geneontology.org'. Assuming CP1252
- Around line 1183:
'=item' outside of any '=over'
- Around line 1215:
You forgot a '=back' before '=head3'