NAME

geoCancerDiagnosticDatasetsRetriever - GEO Cancer Diagnostic Datasets Retriever is a bioinformatics tool for cancer diagnostic dataset retrieval from the GEO website.

SYNOPSIS

    Usage: geoCancerDiagnosticDatasetsRetriever -d "CANCER_TYPE" -p "PLATFORM_CODES"

An example command using "myelodysplastic syndrome" as a query:

    $ geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome" -p "GPL570"

The input and output files of geoCancerDiagnosticDatasetsRetriever will be found in the ~/geoCancerDiagnosticDatasetsRetriever_files/data/ and ~/geoCancerDiagnosticDatasetsRetriever_files/results/ directories, respectively.
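
The two directories named above can be located from a script with a few lines of Python. This is only a sketch: the helper names `tool_directories` and `list_results` are hypothetical and are not part of geoCancerDiagnosticDatasetsRetriever, and the names of the files the tool writes are not assumed.

```python
from pathlib import Path

# Directory name taken from the paths given in this README.
BASE_DIR_NAME = "geoCancerDiagnosticDatasetsRetriever_files"


def tool_directories(home=None):
    """Return the (data, results) directories under the given home directory."""
    home_dir = Path.home() if home is None else Path(home)
    base = home_dir / BASE_DIR_NAME
    return base / "data", base / "results"


def list_results(home=None):
    """Return the sorted file names in the results directory, or [] if it does not exist."""
    _, results_dir = tool_directories(home)
    if not results_dir.is_dir():
        return []
    return sorted(path.name for path in results_dir.iterdir())


if __name__ == "__main__":
    data_dir, results_dir = tool_directories()
    print("Input files: ", data_dir)
    print("Output files:", results_dir)
    print("Results found:", list_results())
```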

DESCRIPTION

Gene Expression Omnibus (GEO) Cancer Diagnostic Datasets Retriever is a bioinformatics tool for retrieving cancer diagnostic datasets from the GEO database. It requires a GEO DataSets input file listing all GSE dataset entries for a specific cancer (for example, myelodysplastic syndrome), obtained as a download from the GEO database.

The tool works by applying keyword filters to each GSE dataset entry listed in the GEO DataSets input file. The first diagnostic text filter flags entries whose title or abstract contains diagnostic keywords (for example, “diagnosis” or “health”) used by clinical science researchers. A second diagnostic text filter then examines each flagged dataset for diagnostic keywords in the "Overall design" section of the GSE dataset. If any are found, the tool outputs the GSE code of the likely diagnostic dataset. If none are found by the second filter, a more intensive filtering stage is performed: the tool runs an R script (healthyControlsPresentInputParams.r) that searches the dataset's .SOFT file for the desired keywords to determine whether it is a likely diagnostic dataset.
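
The filter cascade described above can be illustrated with a minimal Python sketch. It is not the tool's implementation (which is written in Perl and R). The `GseEntry` container, the function names, and the plain substring matching are assumptions made for illustration. Only the keywords "diagnosis" and "health" come from the text above; the tool's full keyword list is not reproduced here. The sketch also assumes that entries not flagged by the first filter are not examined further.

```python
from dataclasses import dataclass

# Only these two keywords are named in the README; the real list may be longer.
DIAGNOSTIC_KEYWORDS = ("diagnosis", "health")


@dataclass
class GseEntry:
    """Hypothetical container for the text fields of one GSE dataset entry."""
    gse_code: str
    title: str
    abstract: str
    overall_design: str
    soft_text: str = ""  # stand-in for the contents of the .SOFT file


def has_keyword(text, keywords=DIAGNOSTIC_KEYWORDS):
    """Case-insensitive substring match against any of the keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def classify(entry):
    """Return the name of the stage that decided the outcome for this entry."""
    # Stage 1: title/abstract filter. Unflagged entries go no further (assumption).
    if not has_keyword(entry.title + " " + entry.abstract):
        return "not flagged"
    # Stage 2: "Overall design" filter.
    if has_keyword(entry.overall_design):
        return "overall design"
    # Stage 3: stand-in for the R script's search of the .SOFT file.
    if has_keyword(entry.soft_text):
        return "soft file"
    return "not diagnostic"


if __name__ == "__main__":
    # Made-up entry for illustration; GSE0000001 is not a real accession.
    example = GseEntry(
        gse_code="GSE0000001",
        title="Diagnosis of myelodysplastic syndrome",
        abstract="",
        overall_design="Patients compared with healthy controls",
    )
    print(example.gse_code, classify(example))
```

Returning the deciding stage rather than a plain yes/no makes it easy to see which entries needed the more intensive third stage.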

INSTALLATION

geoCancerDiagnosticDatasetsRetriever can be used on any Linux or macOS machine. To run the program, you need cURL (version 7.68.0 or later), Lynx (version 2.9.0dev.5 or later), and the R programming language (version 4 or later) installed on your computer.

Perl is installed by default on Linux and macOS operating systems. Likewise, cURL is installed on all macOS versions. However, cURL may not be installed on Linux, R may not be installed on Linux or macOS, and Lynx may not be installed on macOS. Any missing program needs to be installed manually through your operating system's software centre. On Linux Ubuntu, geoCancerDiagnosticDatasetsRetriever installs cURL and Lynx automatically.
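
Before installing, it can be useful to check which of the required programs are already on your PATH. The following Python sketch does this with the standard library's `shutil.which`. The command names `curl`, `lynx`, and `Rscript` are assumptions about how cURL, Lynx, and R are exposed on your system, and the sketch does not check the minimum versions listed above.

```python
import shutil

# Display name -> command name. The command names are assumptions.
REQUIRED_COMMANDS = {"cURL": "curl", "Lynx": "lynx", "R": "Rscript"}


def missing_tools(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the display names of the tools whose command is not found on PATH."""
    return [name for name, command in commands.items() if which(command) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("cURL, Lynx, and R were all found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is sufficient.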

Manual install:

    $ perl Makefile.PL
    $ make
    $ make install

On Linux Ubuntu, you might need to run the last command as a superuser (sudo make install) and manually install the libfile-homedir-perl package (sudo apt-get install -y libfile-homedir-perl) if it is not already installed in your Perl 5 configuration.

CPAN install:

    $ cpanm App::geoCancerDiagnosticDatasetsRetriever

To uninstall:

    $ cpanm --uninstall App::geoCancerDiagnosticDatasetsRetriever

On Linux Ubuntu, you might need to run the two previous CPAN commands as a superuser (sudo cpanm App::geoCancerDiagnosticDatasetsRetriever and sudo cpanm --uninstall App::geoCancerDiagnosticDatasetsRetriever).

DATA FILE

The required input file is a GEO DataSets file, obtained as a download from GEO DataSets by querying for the particular cancer given to geoCancerDiagnosticDatasetsRetriever (for example, myelodysplastic syndrome).

HELP

Help information can be read by typing the following command:

    $ geoCancerDiagnosticDatasetsRetriever -h

This command will print the following instructions:

    Usage: geoCancerDiagnosticDatasetsRetriever -h

    Mandatory arguments:
      CANCER_TYPE           type of the cancer as query search term
      PLATFORM_CODES        list of GPL platform codes

    Optional arguments:
      -h                    show help message and exit
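
When the tool is launched from a script, the two mandatory arguments can be checked first. The Python sketch below builds the command line from the SYNOPSIS and validates a single platform code against the GEO accession pattern (GPL followed by digits). The function names are hypothetical, and because the separator used for several platform codes is not documented here, the sketch handles one code only.

```python
import re
import subprocess

# GEO platform accessions have the form "GPL" followed by digits, e.g. GPL570.
GPL_CODE = re.compile(r"GPL\d+")


def build_command(cancer_type, platform_code):
    """Return the argument list for one query, or raise ValueError on bad input."""
    if not cancer_type.strip():
        raise ValueError("CANCER_TYPE must not be empty")
    if not GPL_CODE.fullmatch(platform_code):
        raise ValueError("not a GPL platform code: " + repr(platform_code))
    return [
        "geoCancerDiagnosticDatasetsRetriever",
        "-d", cancer_type,
        "-p", platform_code,
    ]


def run_query(cancer_type, platform_code):
    """Run the tool and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(cancer_type, platform_code), check=True)


if __name__ == "__main__":
    # Prints the argument list without running the tool.
    print(build_command("myelodysplastic syndrome", "GPL570"))
```

Passing the arguments as a list avoids shell quoting problems with multi-word cancer names.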

AUTHORS

Abbas Alameer (Kuwait University) and Davide Chicco (University of Toronto)

For information, please contact Abbas Alameer at abbas.alameer(AT)ku.edu.kw or Davide Chicco at davidechicco(AT)davidechicco.it

COPYRIGHT AND LICENSE

Copyright 2021 by Abbas Alameer (Kuwait University) and Davide Chicco (University of Toronto)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2 (GPLv2).