NAME
Microarray::GEO::SOFT - Reading microarray data in SOFT format from the GEO database.
SYNOPSIS
use Microarray::GEO::SOFT;
use strict;
# initialize
my $soft = Microarray::GEO::SOFT->new;
# download
$soft->download("GSE19513");
$soft->download("GPL6793");
$soft->download("GDS3718");
# or else you can read local data
$soft = Microarray::GEO::SOFT->new(file => "GSE19513.soft");
# parse
# $data will be an object of class Microarray::GEO::SOFT::GSE, Microarray::GEO::SOFT::GPL
# or Microarray::GEO::SOFT::GDS
my $data = $soft->parse;
# meta info
$data->meta;
$data->title;
$data->platform;
$data->field;
# GPL belongs to GSE
my $gpl = $data->list("GPL")->[0];
# merge GSMs that belong to the same GPL into one data set
my $g = $data->merge->[0];
# transform the uid from probe id to gene symbol
$g->id_convert($gpl, "Gene Symbol");
# transform into Microarray::ExprSet class object
my $e = $g->soft2exprset;
# eliminate the blank lines
$e->remove_empty_feature;
# make all symbols unique
$e->unique_feature;
# obtain the expression matrix
$e->matrix;
DESCRIPTION
GEO (Gene Expression Omnibus) is one of the largest public databases of gene expression profile data. This module provides methods to download and parse files from the GEO database and to transform them into a format suitable for common use.
GEO has four types of records: GSE, GPL, GSM and GDS.
GPL: Platform of the microarray, like Affymetrix U133A
GSM: A single microarray
GSE: A complete microarray experiment, containing one or more GSMs and one or more GPLs
GDS: Manually curated data sets built from GSE records, each with only one platform
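The accession prefix is what distinguishes these record types. As a rough illustration only, the sketch below maps an accession to the class named in the synopsis for that record type. The helper class_for_accession is hypothetical and is not part of this module; how the module dispatches internally is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Microarray::GEO::SOFT): map an accession
# such as "GSE19513" to the class the synopsis names for that record type.
sub class_for_accession {
    my ($acc) = @_;
    my %class_of = (
        GSE => 'Microarray::GEO::SOFT::GSE',
        GPL => 'Microarray::GEO::SOFT::GPL',
        GDS => 'Microarray::GEO::SOFT::GDS',
    );
    # Anything that is not GSE/GPL/GDS followed by digits yields undef.
    return undef unless defined $acc && $acc =~ /^(GSE|GPL|GDS)\d+$/i;
    return $class_of{ uc $1 };
}

for my $acc (qw(GSE19513 GPL6793 GDS3718 GSM1)) {
    my $class = class_for_accession($acc);
    printf "%s -> %s\n", $acc, defined $class ? $class : 'unknown';
}
```

GSM accessions fall through to "unknown" in this sketch because the synopsis names classes only for GSE, GPL and GDS.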
Data in GEO is stored in several formats. This module provides methods to parse the most commonly used one: SOFT formatted family files. The original data is downloaded from the GEO FTP site.
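For readers unfamiliar with SOFT, the format is line oriented: a line starting with ^ opens an entity, a line starting with ! carries an attribute or a table begin/end marker, a line starting with # describes a table column, and other lines are tab-separated table rows. The sketch below classifies lines on that basis. It is a minimal illustration, not the parser used by this module; classify_soft_line and the sample lines are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one line of SOFT text by its leading character.
sub classify_soft_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # strip the line ending
    if ($line =~ /^\^(\w+)\s*=\s*(.*)$/) {
        return { type => 'entity', key => $1, value => $2 };
    }
    if ($line =~ /^!(\w+)\s*=\s*(.*)$/) {
        return { type => 'attribute', key => $1, value => $2 };
    }
    if ($line =~ /^!(\w+_table_(?:begin|end))\s*$/) {
        return { type => 'table_marker', key => $1 };
    }
    if ($line =~ /^#(.+?)\s*=\s*(.*)$/) {
        return { type => 'column', key => $1, value => $2 };
    }
    # Everything else is treated as a tab-separated table row.
    return { type => 'row', fields => [ split /\t/, $line, -1 ] };
}

# Made-up sample lines in the shape of a GPL record.
my @sample = (
    "^PLATFORM = GPL6793",
    "!Platform_title = Made-up title",
    "#ID = Probe identifier",
    "!platform_table_begin",
    "ID\tGene Symbol",
    "1001_at\tTP53",
    "!platform_table_end",
);

for my $line (@sample) {
    my $info = classify_soft_line($line);
    printf "%-12s %s\n", $info->{type}, $line;
}
```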
Subroutines
new(file => $file)
-
Initializes a Microarray::GEO::SOFT class object. The only argument is either the path to a microarray data file in SOFT format or a file handle that has already been opened.
$soft->download(ACC, %options)
-
Downloads a GEO record from the NCBI website. The first argument is the accession number (GSExxx, GPLxxx or GDSxxx). You can set the timeout and proxy via %options. The proxy should be given as http://username:password@server-addr:port.
$soft->parse
-
The parsing method is selected according to the accession number of the GEO record. For example, if a GSExxx record is requested, the GSE part is parsed and a Microarray::GEO::SOFT::GSE class object is returned.
$data->meta
-
Gets the meta information. More specific meta information is available via platform, title, field and accession.
$data->platform
-
Gets the accession number of the platform. If a record has multiple platforms, the function returns a reference to an array.
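Because the return value may be either a single accession or an array reference, calling code may want to normalise it before looping. The following is a small sketch of that idea, assuming the single-platform case yields a plain string as described above. The helper as_list is hypothetical, and the accessions are sample values only.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either one accession string or a reference
# to an array of accessions, and always return a flat list.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# Sample values standing in for what $data->platform might return.
my @one  = as_list('GPL6793');
my @many = as_list([ 'GPL96', 'GPL97' ]);

print scalar(@one),  " platform(s): @one\n";
print scalar(@many), " platform(s): @many\n";
```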
$data->title
-
Title of the record
$data->field
-
Description of each field in the data matrix
$data->accession
-
Accession number for the record
$gds->id_convert($gpl, id)
-
Changes the primary ID for genes, which by default is the row names. The mapping information is provided in the GPL record. The first argument is the GPL record corresponding to the GDS record; the id argument is one of the column names in the GPL record. Use $gpl->field or $gpl->colnames to find the ID names available for conversion.
$gds->soft2exprset
-
Transforms a Microarray::GEO::SOFT class object into a Microarray::ExprSet class object.
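To picture what the ID conversion in id_convert amounts to, the sketch below relabels a list of probe IDs using a lookup table of the kind a GPL record provides. It is an illustration under stated assumptions, not the module's implementation: convert_ids, the probe IDs and the mapping are all made up. Probes without a symbol become empty strings here, which is consistent with the synopsis calling remove_empty_feature and unique_feature after conversion.

```perl
use strict;
use warnings;

# Made-up probe-to-symbol table standing in for two columns of a GPL record.
my %symbol_of = (
    '1001_at' => 'TP53',
    '1002_at' => 'BRCA1',
    '1003_at' => '',        # probe present but with no symbol assigned
);

# Hypothetical helper: relabel row names through the mapping. Unmapped
# probes become empty strings; repeated symbols are left as they are.
sub convert_ids {
    my ($rownames, $map) = @_;
    return [ map { defined $map->{$_} ? $map->{$_} : '' } @$rownames ];
}

my $converted = convert_ids(
    [ '1001_at', '1002_at', '1003_at', '1004_at' ],
    \%symbol_of,
);
print join(',', @$converted), "\n";
```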
AUTHOR
Zuguang Gu <jokergoo@gmail.com>
COPYRIGHT AND LICENSE
Copyright 2012 by Zuguang Gu
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.