NAME
Bio::DOOP::Util::Run::GeneMerge - GeneMerge based GO analyzer
VERSION
Version 0.02
SYNOPSIS
#!/usr/bin/perl -w
use Bio::DOOP::DOOP;
$test = Bio::DOOP::Util::Run::GeneMerge->new();
if ($test->getDescFile("GO/use/GO.BP.use") < 0) {
    print "Desc error\n";
}
if ($test->getAssocFile("GO/assoc/A_thaliana.converted.BP") < 0) {
    print "Assoc error\n";
}
if ($test->getPopFile("GO/pop.500") < 0) {
    print "Pop error\n";
}
if ($test->getStudyFile("GO/study.500/combined1314.list") < 0) {
    print "Study error\n";
}
$results = $test->getResults();
foreach $res (@{$results}) {
    print $$res{'GOterm'}, " ", $$res{'RawEs'}, "\n";
}
DESCRIPTION
This is a module based on GeneMerge v1.2.
The original program is described in:
Cristian I. Castillo-Davis and Daniel L. Hartl. GeneMerge - post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, Vol. 19, no. 7, 2003, pages 891-892.
The original program is poorly suited to large-scale analysis because its design relies on a lot of file I/O. This version loads everything into memory at startup.
AUTHORS
Tibor Nagy, Godollo; Endre Sebestyen, Martonvasar
METHODS
new
Creates a new GeneMerge object.
$genemerge = Bio::DOOP::Util::Run::GeneMerge->new;
getAssocFile
The method loads the GO association file and stores it in memory. The file format is as follows: each line starts with a cluster id, followed by whitespace and then the associated GO ids, separated by semicolons.
81001020 GO:0016020;GO:0003674;GO:0008150
81001110 GO:0005739;GO:0003674
$genemerge->getAssocFile('/tmp/assoc.txt');
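The association format described above can be read with a few lines of Perl. The following is only a minimal sketch of one way to parse such lines; `parse_assoc_lines` is a hypothetical helper written for illustration, not a function of this module, and the module's own parser may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DOOP): parse lines of the form
# "<cluster id><whitespace><GO id>;<GO id>;..." into a hash of arrayrefs.
sub parse_assoc_lines {
    my @lines = @_;
    my %assoc;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        my ($cluster, $go_list) = split /\s+/, $line, 2;
        next unless defined $go_list;        # no GO ids on this line
        $go_list =~ s/\s+$//;                # drop trailing whitespace
        $assoc{$cluster} = [ grep { length } split /;/, $go_list ];
    }
    return \%assoc;
}

my $assoc = parse_assoc_lines(
    "81001020 GO:0016020;GO:0003674;GO:0008150\n",
    "81001110 GO:0005739;GO:0003674\n",
);
print "$_: @{ $assoc->{$_} }\n" for sort keys %$assoc;
```

Keeping the GO ids in an arrayref per cluster id makes the later frequency counts a simple loop over the hash values.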
getPopFile
The method loads the population file and stores it in memory. The file format is as follows: each line contains exactly one cluster id.
81001020
81001110
$genemerge->getPopFile('/tmp/pop.txt');
popFreq
The method calculates the population frequency. Do not use it directly.
getDescFile
The method loads the GO description file. The file format is as follows: each line starts with a GO id, followed by a tab and the description of that GO id.
GO:0000007	low-affinity zinc ion transporter activity
GO:0000008	thioredoxin
$genemerge->getDescFile('/tmp/desc.txt');
getStudyFile
The method loads the study data set, counts GO frequencies, calculates P values from the hypergeometric distribution, and corrects the P values with the Bonferroni method.
The file format of the study file is as follows: each line contains exactly one cluster id.
81001020
81001110
$genemerge->getStudyFile('/tmp/study.txt');
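Two of the steps named above, counting GO frequencies in the study set and applying the Bonferroni correction, can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `%assoc`, `@study` and `bonferroni` are hypothetical names, and the number of tests passed to `bonferroni` is left to the caller because the text does not say how the module determines it.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical inputs: cluster id => GO ids, and the study cluster ids.
my %assoc = (
    '81001020' => [ 'GO:0016020', 'GO:0003674', 'GO:0008150' ],
    '81001110' => [ 'GO:0005739', 'GO:0003674' ],
);
my @study = ('81001020', '81001110');

# Count how many study clusters are annotated with each GO id.
my %study_freq;
for my $cluster (@study) {
    $study_freq{$_}++ for @{ $assoc{$cluster} || [] };
}

# Bonferroni correction: multiply the raw P value by the number of
# tests performed and cap the result at 1.
sub bonferroni {
    my ($raw_p, $n_tests) = @_;
    return min(1, $raw_p * $n_tests);
}

printf "%s occurs in %d study clusters\n", $_, $study_freq{$_}
    for sort keys %study_freq;
printf "corrected P: %g\n", bonferroni(0.125, 4);
```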
getResults
The method returns all the results as an arrayref of hashes.
$results = $genemerge->getResults();
foreach $result (@{$results}) {
    $goterm = $$result{'GOterm'};
    $popfreq = $$result{'PopFreq'};
    $popfrac = $$result{'PopFrac'};
    $studyfrac = $$result{'StudyFrac'};
    $studyfracall = $$result{'StudyFracAll'};
    $raw_escore = $$result{'RawEs'};
    $escore = $$result{'EScore'};
    $desc = $$result{'Desc'};
    @contrib = @{$$result{'Contrib'}};
}
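Because the results are an arrayref of hashes, they can be filtered and sorted with ordinary Perl list operations. The sketch below uses made-up data shaped like the structure above so that it runs without the module. It assumes that `EScore` is numeric and that smaller values are more significant; the cutoff of 0.05 is likewise only an assumption.

```perl
use strict;
use warnings;

# Hypothetical data shaped like the arrayref returned by getResults();
# the scores are invented for illustration only.
my $results = [
    { GOterm => 'GO:0016020', EScore => 0.20, Desc => 'membrane' },
    { GOterm => 'GO:0005739', EScore => 0.01, Desc => 'mitochondrion' },
];

my $cutoff = 0.05;    # assumed significance threshold

# Keep entries below the cutoff, most significant first.
my @significant =
    sort { $a->{EScore} <=> $b->{EScore} }
    grep { $_->{EScore} < $cutoff } @$results;

for my $res (@significant) {
    print join("\t", $res->{GOterm}, $res->{EScore}, $res->{Desc}), "\n";
}
```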
hypergeometric
This is an internal function to calculate the hypergeometric distribution. Do not use it directly.
logNchooseK
Another internal function used by the statistical calculation; as its name indicates, it computes the logarithm of the binomial coefficient "N choose K". Do not use it directly.
lFactorial
Internal function that calculates factorials. Do not use it directly.
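To show how helpers of this kind usually fit together, here is a minimal sketch of a hypergeometric tail probability computed in log space. It is not the module's implementation: `log_factorial`, `log_n_choose_k` and `hypergeom_tail` are hypothetical names, and the choice of an upper tail, P(X >= k), is an assumption made for the example.

```perl
use strict;
use warnings;

# log(n!) as a sum of logarithms; avoids overflow for large n.
sub log_factorial {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 2 .. $n;
    return $sum;
}

# Logarithm of the binomial coefficient C(n, k).
sub log_n_choose_k {
    my ($n, $k) = @_;
    return log_factorial($n) - log_factorial($k) - log_factorial($n - $k);
}

# P(X >= k): probability of seeing at least $k annotated clusters when
# $study clusters are drawn without replacement from a population of
# $pop clusters, $pop_hits of which carry the annotation.
sub hypergeom_tail {
    my ($pop, $pop_hits, $study, $k) = @_;
    my $max = $pop_hits < $study ? $pop_hits : $study;
    my $p = 0;
    for my $i ($k .. $max) {
        next if $study - $i > $pop - $pop_hits;    # impossible draw
        $p += exp( log_n_choose_k($pop_hits, $i)
                 + log_n_choose_k($pop - $pop_hits, $study - $i)
                 - log_n_choose_k($pop, $study) );
    }
    return $p;
}

printf "%.6f\n", hypergeom_tail(10, 5, 5, 5);
```

Working with logarithms keeps the intermediate values small even when the population contains thousands of clusters, where plain factorials would overflow.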