NAME
GO::OntologyProvider::OboParser - Provides an API for retrieving data from a Gene Ontology obo file.
SYNOPSIS
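use strict;
use warnings;
use GO::OntologyProvider::OboParser;
# aspect is one of 'P' (process), 'F' (function) or 'C' (component)
my $ontology = GO::OntologyProvider::OboParser->new(ontologyFile => "gene_ontology.obo",
                                                    aspect       => 'P');
print "The ancestors of GO:0006177 are:\n";
# nodeFromId returns undef for an unknown GOID, so check before using it
my $node = $ontology->nodeFromId("GO:0006177")
    or die "GO:0006177 was not found in the ontology\n";
foreach my $ancestor ($node->ancestors){
    print $ancestor->goid, " ", $ancestor->term, "\n";
}
$ontology->printOntology();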
DESCRIPTION
GO::OntologyProvider::OboParser implements the interface defined by GO::OntologyProvider and parses Gene Ontology (GO) files in the plain-text obo format (not XML). These files can be obtained from the Gene Ontology Consortium web site, http://www.geneontology.org/. From the information in the file, it builds a directed acyclic graph (DAG) in memory. GO terms are therefore arranged in a tree-like structure in which each GO node can have multiple parent nodes as well as multiple child nodes. The file MUST be named with a .obo suffix.
This data structure can be used in conjunction with annotation files, in which genes are annotated to the corresponding GO nodes.
Each GO ID (e.g. "GO:1234567") has an associated GO node. That node contains the name of the GO term, a list of the nodes directly above it ("parent nodes"), and a list of the nodes directly below it ("child nodes"). The "ancestor nodes" of a node are all of the nodes that lie on any path from that node to the root of the ontology, with repetitions removed.
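To make the definition of ancestor nodes concrete, here is a minimal sketch in plain Perl that does not use this module. It assumes a hypothetical toy DAG stored as a hash of direct-parent lists (the IDs GO:ROOT, GO:A and so on are made up), and collects every node reachable by walking upward, keeping each node only once even when it is reached through more than one path.

```perl
use strict;
use warnings;

# Hypothetical toy DAG: each ID maps to the IDs of its direct parents.
# The IDs are made up for illustration; GO:C has two parents.
my %parents = (
    'GO:ROOT' => [],
    'GO:A'    => ['GO:ROOT'],
    'GO:B'    => ['GO:ROOT'],
    'GO:C'    => ['GO:A', 'GO:B'],
    'GO:D'    => ['GO:C'],
);

# Collect every node on any path from $id up to the root, each only once.
sub ancestors_of {
    my ($id) = @_;
    my %seen;
    my @stack = @{ $parents{$id} || [] };
    while (@stack) {
        my $p = pop @stack;
        next if $seen{$p}++;    # already collected via another path
        push @stack, @{ $parents{$p} || [] };
    }
    return sort keys %seen;
}

print join(' ', ancestors_of('GO:D')), "\n";
```

GO:ROOT is reachable from GO:D through both GO:A and GO:B, but the %seen hash ensures it is listed once, which is the "repetitions removed" part of the definition.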
Two example [Term] stanzas in this format are shown below:
[Term]
id: GO:0000006
name: high affinity zinc uptake transporter activity
namespace: molecular_function
def: "Catalysis of the reaction: Zn2+(out) = Zn2+(in), probably powered by proton motive force." [TC:2.A.5.1.1]
xref_analog: TC:2.A.5.1.1
is_a: GO:0005385 ! zinc ion transporter activity

[Term]
id: GO:0000005
name: ribosomal chaperone activity
namespace: molecular_function
def: "OBSOLETE. Assists in the correct assembly of ribosomes or ribosomal subunits in vivo, but is not a component of the assembled ribosome when performing its normal biological function." [GOC:jl, PMID:12150913]
comment: This term was made obsolete because it refers to a class of gene products and a biological process rather than a molecular function. To update annotations, consider the molecular function term 'unfolded protein binding ; GO:0051082' and the biological process term 'ribosome biogenesis and assembly ; GO:0042254' and its children.
is_obsolete: true
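The stanza layout above (a [Term] header followed by one "tag: value" pair per line) can be read with a very small parser. The sketch below is not this module's parser: it is a hypothetical, deliberately incomplete reader that keeps only the id, name, namespace, is_a and is_obsolete tags, strips the "! comment" after an is_a value, and ignores every other tag and every non-[Term] stanza.

```perl
use strict;
use warnings;

# Minimal sketch of reading [Term] stanzas from obo text; it handles only
# a few tags (id, name, namespace, is_a, is_obsolete) and ignores the rest.
sub parse_obo_terms {
    my ($text) = @_;
    my (%terms, $cur, $in_term);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\[(\w+)\]\s*$/) {    # start of a new stanza
            $in_term = ($1 eq 'Term');
            $cur = undef;
            next;
        }
        next unless $in_term;
        next unless $line =~ /^(\w+):\s*(.*?)\s*$/;
        my ($tag, $value) = ($1, $2);
        if ($tag eq 'id') {
            $cur = $terms{$value} = { id => $value, is_a => [] };
        } elsif (!$cur) {
            next;                            # tag seen before any id
        } elsif ($tag eq 'is_a') {
            $value =~ s/\s*!.*$//;           # drop the trailing "! comment"
            push @{ $cur->{is_a} }, $value;
        } elsif ($tag =~ /^(name|namespace|is_obsolete)$/) {
            $cur->{$tag} = $value;
        }
    }
    return \%terms;
}

my $obo = <<'OBO';
[Term]
id: GO:0000006
name: high affinity zinc uptake transporter activity
namespace: molecular_function
is_a: GO:0005385 ! zinc ion transporter activity

[Term]
id: GO:0000005
name: ribosomal chaperone activity
namespace: molecular_function
is_obsolete: true
OBO

my $terms = parse_obo_terms($obo);
for my $id (sort keys %$terms) {
    my $t = $terms->{$id};
    print "$id\t$t->{name}\tparents: @{ $t->{is_a} }\n";
}
```

The is_a values are the edges that turn the flat list of stanzas into the DAG described above; an obsolete term such as GO:0000005 carries no is_a line at all.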
Instance Constructor
new
This is the constructor for an OboParser object. The constructor expects one of two arguments: either an 'ontologyFile' argument or an 'objectFile' argument. When instantiated with an ontologyFile argument, it expects that file to be an obo file created by the GO Consortium, following their file format, and it also requires an 'aspect' argument. When instantiated with an objectFile argument, it expects to open a previously created OboParser object that has been serialized to disk (see serializeToDisk).
Usage:
my $ontology = GO::OntologyProvider::OboParser->new(ontologyFile => $ontologyFile,
                                                    aspect       => $aspect);
my $ontology = GO::OntologyProvider::OboParser->new(objectFile => $objectFile);
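The argument rules for new can be summarized as a small check. The helper below, check_new_args, is hypothetical and is not part of GO::OntologyProvider::OboParser; it only encodes the rules stated in this documentation (exactly one of ontologyFile or objectFile, a .obo suffix on ontologyFile, and an aspect of P, F or C) and returns an empty string when the arguments look acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of GO::OntologyProvider::OboParser) that
# checks the constructor rules described above. Returns '' when the
# arguments look acceptable, otherwise a short error message.
sub check_new_args {
    my (%args) = @_;
    my $has_obo = exists $args{ontologyFile} ? 1 : 0;
    my $has_obj = exists $args{objectFile}   ? 1 : 0;
    return 'supply exactly one of ontologyFile or objectFile'
        if $has_obo == $has_obj;
    if ($has_obo) {
        return 'ontologyFile must end in .obo'
            unless defined $args{ontologyFile} && $args{ontologyFile} =~ /\.obo$/;
        return 'aspect must be one of P, F or C'
            unless defined $args{aspect} && $args{aspect} =~ /^[PFC]$/;
    }
    return '';
}

for my $args (
    [ ontologyFile => 'gene_ontology.obo', aspect => 'P' ],
    [ ontologyFile => 'gene_ontology.obo' ],
    [ objectFile   => 'gene_ontology.obo.obj' ],
) {
    my $err = check_new_args(@$args);
    print $err eq '' ? "ok\n" : "error: $err\n";
}
```

Running such a check before calling new is one way to report a missing aspect or a misnamed file with a clear message rather than relying on whatever the constructor does in that case.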
Instance Methods
printOntology
This prints the ontology, with redundancies, to STDOUT. It does not yet print all of the ontology information (such as relationship types). This method will likely be removed in a future version, so it should not be relied upon.
Usage:
$ontologyParser->printOntology;
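"With redundancies" follows from the DAG structure: a node with several parents is reached along several paths. The sketch below illustrates that idea only; it is not the output format of printOntology. It assumes a hypothetical toy DAG stored as child lists and walks it depth-first from the root, so a node reachable by two paths is printed twice.

```perl
use strict;
use warnings;

# Hypothetical toy DAG stored as child lists; GO:C has two parents (A and B).
my %children = (
    'GO:ROOT' => ['GO:A', 'GO:B'],
    'GO:A'    => ['GO:C'],
    'GO:B'    => ['GO:C'],
    'GO:C'    => [],
);

# Depth-first walk from a node; because every path is followed, a node
# reachable by several paths is emitted once per path ("with redundancies").
sub tree_lines {
    my ($id, $depth) = @_;
    my @lines = (('  ' x $depth) . $id);
    push @lines, tree_lines($_, $depth + 1) for @{ $children{$id} || [] };
    return @lines;
}

print "$_\n" for tree_lines('GO:ROOT', 0);
```

Here GO:C appears once under GO:A and once under GO:B. In a large ontology, where many terms have multiple parents, this repetition makes such a printout much longer than the number of distinct nodes.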
allNodes
This method returns an array of all the GO::Node objects that have been created.
Usage:
my @nodes = $ontologyParser->allNodes;
rootNode
This returns the root node in the ontology.
Usage:
my $rootNode = $ontologyParser->rootNode;
nodeFromId
This public method takes a GOID and returns the GO::Node that it corresponds to.
Usage:
my $node = $ontologyParser->nodeFromId($goid);
If the GOID does not correspond to a GO node, undef is returned. Calling any method on undef causes a fatal runtime error, so unless you can guarantee that every GOID you supply is valid, check that the return value of this method is defined.
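A minimal sketch of the defined check described above, written so it runs without the module: the lookup is passed in as a code reference, and the usage below substitutes a hypothetical hash (with a made-up, unknown ID GO:9999999) for the real method. With an actual parser object the lookup would be sub { $ontologyParser->nodeFromId($_[0]) }.

```perl
use strict;
use warnings;

# Look up each GOID and skip (with a warning) any that have no node,
# instead of calling methods on undef.
# $lookup: code ref that takes a GOID and returns a node or undef.
sub nodes_for_ids {
    my ($lookup, @goids) = @_;
    my @found;
    for my $goid (@goids) {
        my $node = $lookup->($goid);
        if (!defined $node) {
            warn "No GO node for $goid; skipping\n";
            next;
        }
        push @found, $node;
    }
    return @found;
}

# Hypothetical lookup table standing in for $ontologyParser->nodeFromId.
my %fake_nodes = ('GO:0006177' => 'node for GO:0006177');
my @nodes = nodes_for_ids(sub { $fake_nodes{ $_[0] } }, 'GO:0006177', 'GO:9999999');
print scalar(@nodes), " node(s) found\n";
```

Filtering up front keeps the later loop (for example, calling ancestors on each node) free of defined checks.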
numNodes
This public method returns the number of nodes that exist in the ontology.
Usage:
my $numNodes = $ontologyParser->numNodes;
serializeToDisk
Saves the current state of the OboParser object to a file, using the Storable package. The data are saved in network byte order for portability. Returns the name of the file written. If no filename is provided, the name of the file used for object construction (including its directory, if one was given) is used, with .obj appended. If the object was instantiated from a file with a .obj suffix and no filename is provided, that same filename is used.
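The default-filename rule can be written as a short function. default_object_file is a hypothetical helper that restates the rule as described in this paragraph; it is not taken from the module's source, and it assumes that "a file with a .obj suffix" simply means a name ending in ".obj".

```perl
use strict;
use warnings;

# Sketch of the default-filename rule described above (hypothetical helper,
# not the module's code): use the explicit name if given; otherwise reuse the
# construction file, appending ".obj" unless it already ends in ".obj".
sub default_object_file {
    my ($construction_file, $filename) = @_;
    return $filename if defined $filename;
    return $construction_file if $construction_file =~ /\.obj$/;
    return "$construction_file.obj";
}

print default_object_file('data/gene_ontology.obo'), "\n";
print default_object_file('data/gene_ontology.obo.obj'), "\n";
```

Because the directory part of the construction file is kept, the serialized object lands next to the obo file it was built from unless an explicit filename says otherwise.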
This method currently causes a segmentation fault on MacOSX (at least versions 10.1.5 through 10.2.3) with perl 5.6 and Storable 1.0.14 when storing the process ontology. The failure occurs with either store or nstore. It has not been investigated whether this is a perl problem or a Storable problem (Storable contains a large amount of C code). The same operation does not cause a segmentation fault on Solaris with perl 5.6.1 and Storable 1.0.13, which still leaves open whether the problem lies with MacOSX, perl, or Storable. Newer versions of both perl and Storable exist, and the code should be tested with those as well.
Usage:
my $objectFile = $ontologyParser->serializeToDisk(filename=>$filename);
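For readers unfamiliar with Storable, the round trip below shows the two calls involved in saving and reloading data in network byte order: nstore writes a Perl data structure to a file, and retrieve reads it back. It uses a plain nested hash in a temporary directory as a stand-in for the parser object, so it illustrates Storable itself rather than the internals of serializeToDisk or new(objectFile => ...).

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# A plain hash standing in for the parser object; nstore writes it in
# network byte order, and retrieve reads it back into a new structure.
my $state = {
    nodes => {
        'GO:A'    => { parents => ['GO:ROOT'] },
        'GO:ROOT' => { parents => [] },
    },
};

my $dir  = tempdir(CLEANUP => 1);    # removed automatically at exit
my $file = File::Spec->catfile($dir, 'gene_ontology.obo.obj');

nstore($state, $file) or die "Could not store to $file: $!";
my $restored = retrieve($file);

my @p = @{ $restored->{nodes}{'GO:A'}{parents} };
print "restored parents of GO:A: @p\n";
```

retrieve returns a fresh copy rather than the original reference, and Storable also restores blessed references into their original packages, which is what allows a serialized object to be reloaded later without re-parsing the obo file.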
Authors
Gavin Sherlock; sherlock@genome.stanford.edu
Elizabeth Boyle; ell@mit.edu
Shuai Weng; shuai@genome.stanford.edu