geoCancerDiagnosticDatasetsRetriever
GEO Cancer Diagnostic Datasets Retriever is a bioinformatics tool for cancer diagnostic dataset retrieval from the GEO website.
Summary
Gene Expression Omnibus (GEO) Cancer Diagnostic Datasets Retriever is a bioinformatics tool for retrieving cancer diagnostic datasets from the GEO database. It requires a GEO DataSets input file, downloaded from the GEO database, that lists all GSE dataset entries for a specific cancer (for example, myelodysplastic syndrome). The tool applies keyword filters to each GSE dataset entry listed in the input file. The first diagnostic text filter flags entries whose title or abstract contains diagnostic keywords used by clinical science researchers (for example, “diagnosis” or “health”). A second diagnostic text filter then examines each flagged dataset for diagnostic keywords in the "Overall design" section of its GSE entry. If keywords are found, the tool outputs the GSE code of the likely diagnostic dataset. If the second filter finds none, a more intensive filtering stage follows: the tool runs an R script (healthyControlsPresentInputParams.r) that searches the dataset's .SOFT file for the desired keywords to determine whether it is a likely diagnostic dataset.
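The two text-filter stages described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual Perl and R code. The function names, the dictionary keys, and the returned labels are assumptions made for this example, and the keyword list contains only the two keywords named in the text.

```python
# Illustrative sketch only; the real tool is implemented in Perl and R.
# Only "diagnosis" and "health" are named in this README; the tool's
# actual keyword list is not reproduced here.
DIAGNOSTIC_KEYWORDS = ("diagnosis", "health")


def contains_keyword(text, keywords=DIAGNOSTIC_KEYWORDS):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def classify_entry(entry):
    """Apply the two text filters to one GSE entry.

    `entry` is a dict with the hypothetical keys "title", "abstract",
    and "overall_design".
    """
    # First filter: diagnostic keywords in the title or abstract.
    title_and_abstract = entry["title"] + " " + entry["abstract"]
    if not contains_keyword(title_and_abstract):
        return "not flagged"
    # Second filter: diagnostic keywords in the "Overall design" section.
    if contains_keyword(entry["overall_design"]):
        return "likely diagnostic"
    # Otherwise the dataset goes to the more intensive .SOFT file stage.
    return "needs SOFT file check"


if __name__ == "__main__":
    example_entry = {
        "title": "Gene expression in myelodysplastic syndrome",
        "abstract": "Expression profiles to support diagnosis.",
        "overall_design": "Bone marrow samples from patients and healthy controls.",
    }
    print(classify_entry(example_entry))
```

Substring matching is used here so that "health" also matches "healthy"; whether the tool matches whole words or substrings is not stated in this README.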
Installation
geoCancerDiagnosticDatasetsRetriever can be used on any Linux or macOS machine. To run the program, you need the following programs installed on your computer:
- Perl (version 5.8.0 or later)
- cURL (version 7.68.0 or later)
- Lynx (version 2.9.0dev.5 or later)
- R programming language (version 4 or later)
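Before installing, it can help to confirm that these programs are available. The following is a small sketch that checks whether each one can be found on the PATH. The executable names (perl, curl, lynx, R) are assumptions about a typical installation, and the script does not check version numbers.

```python
import shutil

# Executable names are assumptions about how each program is usually installed.
REQUIRED_PROGRAMS = ("perl", "curl", "lynx", "R")


def missing_programs(programs=REQUIRED_PROGRAMS):
    """Return the names of the programs that cannot be found on the PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required programs were found on PATH.")
```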
Manual install:
perl Makefile.PL
make
make install
On Ubuntu Linux, you might need to run the last command as a superuser (sudo make install). You might also need to install the libfile-homedir-perl package manually (sudo apt-get install -y libfile-homedir-perl) if it is not already part of your Perl 5 installation.
CPAN install:
cpanm App::geoCancerDiagnosticDatasetsRetriever
To uninstall:
cpanm --uninstall App::geoCancerDiagnosticDatasetsRetriever
On Ubuntu Linux, you might need to run the two previous CPAN commands as a superuser (sudo cpanm App::geoCancerDiagnosticDatasetsRetriever and sudo cpanm --uninstall App::geoCancerDiagnosticDatasetsRetriever).
Data file
The required input file is a GEO DataSets file, obtained as a download from GEO DataSets when a particular cancer (for example, myelodysplastic syndrome) is queried in geoCancerDiagnosticDatasetsRetriever.
Execution instructions
Run geoCancerDiagnosticDatasetsRetriever with the following command:
geoCancerDiagnosticDatasetsRetriever -d "CANCER_TYPE" -p "PLATFORM_CODES"
An example command using "myelodysplastic syndrome" as a query:
geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome" -p "GPL570"
The input and output files of geoCancerDiagnosticDatasetsRetriever will be found in the ~/geoCancerDiagnosticDatasetsRetriever_files/data/ and ~/geoCancerDiagnosticDatasetsRetriever_files/results/ directories, respectively.
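After a run, the output directory named above can be inspected with a short script such as the one below. This is only a sketch: the function name is hypothetical, and the names and formats of the files the tool writes are not described in this README, so the script simply lists whatever files it finds.

```python
from pathlib import Path

# Results directory as stated in this README.
RESULTS_DIR = Path.home() / "geoCancerDiagnosticDatasetsRetriever_files" / "results"


def list_result_files(results_dir=RESULTS_DIR):
    """Return the sorted names of the files in the results directory.

    Returns an empty list if the directory does not exist yet.
    """
    if not results_dir.is_dir():
        return []
    return sorted(path.name for path in results_dir.iterdir() if path.is_file())


if __name__ == "__main__":
    for name in list_result_files():
        print(name)
```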
Help information can be read by typing the following command:
geoCancerDiagnosticDatasetsRetriever -h
This command will print the following instructions:
Usage: geoCancerDiagnosticDatasetsRetriever -h
Mandatory arguments:
CANCER_TYPE type of the cancer as query search term
PLATFORM_CODES list of GPL platform codes
Optional arguments:
-h show help message and exit
Copyright and License
Copyright 2021 by Abbas Alameer (Kuwait University) and Davide Chicco (University of Toronto)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2 (GPLv2).
Contact
geoCancerDiagnosticDatasetsRetriever was developed by:
Abbas Alameer (Kuwait University) and Davide Chicco (University of Toronto)
For information, please contact Abbas Alameer at abbas.alameer(AT)ku.edu.kw or Davide Chicco at davidechicco(AT)davidechicco.it