
NAME

Microarray::GEO::SOFT - Reading microarray data in SOFT format from GEO database.

SYNOPSIS

  use Microarray::GEO::SOFT;
  use strict;
  
  # initialize
  my $soft = Microarray::GEO::SOFT->new; 
  
  # download
  $soft->download("GSE19513");
  $soft->download("GPL6793");
  $soft->download("GDS3718");
  
  # or else you can read local data
  $soft = Microarray::GEO::SOFT->new(file => "GSE19513.soft");
  
  # parse
  # $data is an object of class Microarray::GEO::SOFT::GSE,
  # Microarray::GEO::SOFT::GPL or Microarray::GEO::SOFT::GDS
  my $data = $soft->parse;
  
  # meta info
  $data->meta;
  $data->title;
  $data->platform;
  $data->field;
  
  # GPL belongs to GSE
  my $gpl = $data->list("GPL")->[0];
  
  # merge GSMs belonging to the same GPL into one data set
  my $g = $data->merge->[0];
  
  # transform the uid from probe id to gene symbol
  $g->id_convert($gpl, "Gene Symbol");
  
  # transform into Microarray::ExprSet class object
  my $e = $g->soft2exprset;
  
  # eliminate the blank lines
  $e->remove_empty_feature;
  
  # make all symbols unique
  $e->unique_feature;
  
  # obtain the expression matrix
  $e->matrix;   

DESCRIPTION

GEO (Gene Expression Omnibus) is one of the largest public databases of gene expression profile data. This module provides methods to download and parse files from the GEO database and to transform them into a format for common use.

GEO has four types of records: GSE, GPL, GSM and GDS.

GPL: The platform of the microarray, such as Affymetrix U133A

GSM: A single microarray

GSE: A complete microarray experiment, which typically contains multiple GSMs and may involve multiple GPLs

GDS: A manually curated data set derived from GSE records, with only one platform

Data in the GEO database is stored in several formats. This module provides methods to parse the most commonly used one: SOFT formatted family files. The original data is downloaded from the GEO FTP site.
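As a rough illustration of what a SOFT file looks like, the sketch below classifies lines by their leading character, following the general GEO SOFT convention (a caret starts an entity, an exclamation mark an attribute, a hash sign a column description, and anything else is treated here as a table row). The helper classify_soft_line is hypothetical and is not part of Microarray::GEO::SOFT; the sample lines are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Microarray::GEO::SOFT:
# classify one line of a SOFT file by its leading character.
sub classify_soft_line {
    my ($line) = @_;
    return 'entity'    if $line =~ /^\^/;   # e.g. ^PLATFORM = GPL6793
    return 'attribute' if $line =~ /^!/;    # e.g. !Platform_title = ...
    return 'column'    if $line =~ /^#/;    # e.g. #ID = description of a column
    return 'data';                          # anything else: treated as a table row
}

# Made-up sample lines, for illustration only
my @sample = (
    '^PLATFORM = GPL6793',
    '!Platform_title = Example platform',
    '#ID = probe identifier',
    "probe_1\tGENE_A",
);

printf "%-9s %s\n", classify_soft_line($_), $_ for @sample;
```

A real parser also has to track table begin and end markers and group lines by entity, which this sketch leaves out.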

Subroutines

new(file => $file)

Initialize a Microarray::GEO::SOFT class object. The only argument is the path to the microarray data file in SOFT format, or a file handle that has already been opened.

$soft->download(ACC, %options)

Download a GEO record from the NCBI website. The first argument is the accession number (GSExxx, GPLxxx or GDSxxx). You can set the timeout and proxy via %options. The proxy should be given as http://username:password@server-addr:port.
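The proxy string format described above can be assembled as in the following minimal sketch. The helper build_proxy_url and its argument names are hypothetical, and the option keys shown in the commented-out download call are assumptions that should be checked against the module source.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a proxy string in the documented form
# http://username:password@server-addr:port
sub build_proxy_url {
    my (%arg) = @_;
    for my $key (qw(user password host port)) {
        die "missing '$key'\n" unless defined $arg{$key};
    }
    return sprintf 'http://%s:%s@%s:%d', @arg{qw(user password host port)};
}

# Placeholder credentials and host, for illustration only
my $proxy = build_proxy_url(
    user     => 'alice',
    password => 'secret',
    host     => 'proxy.example.org',
    port     => 8080,
);
print "$proxy\n";

# Assumed option names (not confirmed by this document):
# $soft->download("GSE19513", proxy => $proxy, timeout => 30);
```

Credentials that contain characters such as @ or : would need percent-encoding, which this sketch does not do.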

$soft->parse

The parsing method is selected according to the accession number of the GEO record. For example, if a GSExxx record is requested, the parser uses the GSE parsing method and returns a Microarray::GEO::SOFT::GSE class object.
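The selection by accession number can be pictured with the sketch below, which maps an accession prefix to one of the three class names listed in the SYNOPSIS. The helper class_for_accession is hypothetical and only illustrates the idea; it is not how the module is necessarily implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a class name from the accession prefix.
sub class_for_accession {
    my ($acc) = @_;
    if ($acc =~ /^(GSE|GPL|GDS)\d+$/) {
        return "Microarray::GEO::SOFT::$1";
    }
    die "unsupported accession: $acc\n";
}

print class_for_accession($_), "\n" for qw(GSE19513 GPL6793 GDS3718);
```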

$data->meta

Get meta information. More detailed meta information is available via platform, title, field and accession.

$data->platform

Get the accession number of the platform. If a record has multiple platforms, the function returns a reference to an array.
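Because the return value may be either a single accession or an array reference, calling code may want to normalize it before looping. The sketch below shows one way to do that; as_list is a hypothetical helper, and the literal values stand in for what $data->platform might return.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a scalar-or-array-reference value into a flat list.
sub as_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Stand-ins for the two possible shapes of the return value
my @single   = as_list('GPL6793');
my @multiple = as_list(['GPL96', 'GPL97']);

print "single: @single\n";
print "multiple: @multiple\n";
```

With such a helper, a call like as_list($data->platform) could be iterated the same way in both cases.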

$data->title

Title of the record

$data->field

Description of each field in the data matrix

$data->accession

Accession number for the record

$gds->id_convert($gpl, id)

Change the primary ID for genes, which is the row names by default. The mapping information is provided in the GPL record. The first argument is the GPL record corresponding to the GDS record; the id argument is one of the column names in the GPL record. Use $gpl->field or $gpl->colnames to find the ID names available for conversion.
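Conceptually, the conversion is a lookup from each row name into a column of the platform table. The sketch below illustrates that idea with plain Perl data structures; convert_ids, the probe IDs and the gene names are all hypothetical, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace probe IDs with the value of another platform column.
# $rownames : array ref of probe IDs (the default row names)
# $map      : hash ref from probe ID to the target ID, e.g. a gene symbol
sub convert_ids {
    my ($rownames, $map) = @_;
    # Probes without a mapping become empty strings
    return [ map { defined $map->{$_} ? $map->{$_} : '' } @$rownames ];
}

# Placeholder mapping, for illustration only
my %probe_to_symbol = (probe_1 => 'GENE_A', probe_2 => 'GENE_B');

my $converted = convert_ids(['probe_1', 'probe_2', 'probe_3'], \%probe_to_symbol);
print join(',', @$converted), "\n";
```

Unmapped probes end up with empty names in this sketch, which is presumably why the SYNOPSIS follows the conversion with remove_empty_feature and unique_feature.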

$gds->soft2exprset

Transform a Microarray::GEO::SOFT class object into a Microarray::ExprSet class object.

AUTHOR

Zuguang Gu <jokergoo@gmail.com>

COPYRIGHT AND LICENSE

Copyright 2012 by Zuguang Gu

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

Microarray::ExprSet