NAME

Microarray::ExprSet - Simple description of microarray data

SYNOPSIS

use Microarray::ExprSet;

my $mat = [[1, 2, 3, 4, 5, 6],
           [7, 8, 9, 10, 11, 12],
           [13, 14, 15, 16, 17, 18],
           [19, 20, 21, 22, 23, 24],
           [25, 26, 27, 28, 29, 30],
           [31, 32, 33, 34, 35, 36]];
my $probe = ["gene1", "gene2", "gene2", "gene3", "", "gene4"];
my $sample = ["treatment", "treatment", "treatment", "control", "control", "control"];

my $expr = Microarray::ExprSet->new();
$expr->set_matrix($mat);
$expr->set_feature($probe);
$expr->set_phenotype($sample);
# or simplified as
$expr->set_matrix($mat)->set_feature($probe)->set_phenotype($sample);

# check whether the data are valid
$expr->is_valid;  # 1 or 0

# do some preprocessing
$expr->remove_empty_features();
# combine duplicated features; the order of features is not preserved
$expr->unique_features("mean");  # you can use "median" too

# now you can get the contents of the object
my $new_mat = $expr->matrix;
my $new_probe = $expr->feature;
my $new_sample = $expr->phenotype;
my $n_probe = $expr->n_feature;
my $n_sample = $expr->n_phenotype;

# save to a file
$expr->save("some-file");

DESCRIPTION

A Microarray::ExprSet object describes the data structure of microarray data. It contains three elements: 1) a data matrix that stores the expression values; 2) an array of features, which are the probe names or gene IDs; 3) an array of phenotypes, which describe the samples (e.g. control vs. treatment). Other information about the microarray experiment, such as the protocol or sample preparation, is not included in this object. This module aims to provide the minimum information needed to describe a microarray data set.

Usually the Microarray::ExprSet object is created by other modules such as Microarray::GEO::SOFT.

Subroutines

new

Initialize or reset a Microarray::ExprSet object.

$expr->set_matrix(MATRIX)

The argument is the expression value matrix, stored as a reference to an array of array references.

$expr->set_feature(ARRAY_REF)

Set the feature names. The number of features should equal the number of rows of the expression value matrix. You can think of each feature as a probe or a gene.

$expr->set_phenotype(ARRAY_REF)

Set the phenotype names. The number of phenotypes should equal the number of columns of the expression value matrix. You can think of the phenotypes as the experimental sample names.
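The relation between the matrix, the features and the phenotypes can be illustrated in plain Perl, without the module. This is only a sketch of the layout described above (rows are features, columns are samples); the variable names are chosen for illustration.

```perl
use strict;
use warnings;

# 2 features (rows) x 3 samples (columns)
my $mat       = [[1, 2, 3], [4, 5, 6]];
my $feature   = ["gene1", "gene2"];
my $phenotype = ["treatment", "treatment", "control"];

my $n_row = scalar @$mat;              # should equal the number of features
my $n_col = scalar @{ $mat->[0] };     # should equal the number of phenotypes
print "$n_row x $n_col\n";

# the value of feature $i in sample $j is $mat->[$i][$j]
my ($i, $j) = (1, 2);
print "$feature->[$i] in $phenotype->[$j]: $mat->[$i][$j]\n";
```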

$expr->matrix

Get the expression value matrix.

$expr->feature

Get feature names, array reference.

$expr->phenotype

Get phenotype names, array reference.

$expr->n_feature

Get the number of features.

$expr->n_phenotype

Get the number of phenotypes.

$expr->is_valid

Check whether the object is valid. It returns 0 if the expression matrix is not a well-formed matrix, if feature names are defined but their number differs from the number of matrix rows, or if phenotype names are defined but their number differs from the number of matrix columns. Otherwise it returns 1.
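The three checks can be sketched in plain Perl as follows. check_expr_data is a hypothetical helper written for illustration and is not the module's implementation; in particular, treating an empty matrix as invalid is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Microarray::ExprSet: returns 1 if the
# matrix is rectangular and the name arrays (when defined) match its shape.
sub check_expr_data {
    my ($mat, $feature, $phenotype) = @_;

    # the matrix must be a non-empty array of array references (assumption)
    return 0 unless ref($mat) eq 'ARRAY' && @$mat;
    return 0 if grep { ref($_) ne 'ARRAY' } @$mat;

    # every row must have the same number of columns
    my $n_col = scalar @{ $mat->[0] };
    return 0 if grep { scalar(@$_) != $n_col } @$mat;

    # feature names, if given, must match the number of rows
    return 0 if defined($feature) && scalar(@$feature) != scalar(@$mat);

    # phenotype names, if given, must match the number of columns
    return 0 if defined($phenotype) && scalar(@$phenotype) != $n_col;

    return 1;
}

my $mat = [[1, 2, 3], [4, 5, 6]];
print check_expr_data($mat, ["gene1", "gene2"], ["a", "b", "c"]), "\n";  # 1
print check_expr_data($mat, ["gene1"], undef), "\n";                     # 0
print check_expr_data([[1, 2], [3]], undef, undef), "\n";                # 0
```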

$expr->remove_empty_features

Some features may have no name. This method removes those unnamed features.
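As an illustration of the idea, the following plain-Perl sketch drops the rows whose feature name is empty. drop_unnamed_rows is a hypothetical helper, not the module's code, and it assumes that the matching matrix rows are removed together with the names.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the rows whose feature name is defined and non-empty
sub drop_unnamed_rows {
    my ($mat, $feature) = @_;

    # indices of the features that have a name
    my @keep = grep { defined($feature->[$_]) && $feature->[$_] ne '' } 0 .. $#$feature;

    # slice both the matrix rows and the names with the same indices
    return ([ @{$mat}[@keep] ], [ @{$feature}[@keep] ]);
}

my $mat     = [[1, 2], [3, 4], [5, 6]];
my $feature = ["gene1", "", "gene2"];
my ($new_mat, $new_feature) = drop_unnamed_rows($mat, $feature);
print join(",", @$new_feature), "\n";   # gene1,gene2
print scalar(@$new_mat), "\n";          # 2
```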

$expr->unique_features('mean' | 'median')

Features are often measured repeatedly, especially after probe IDs have been mapped to gene IDs. Some analysis procedures need unique features. The argument chooses the method used to merge duplicated features. Note that the order of features is not preserved.
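A minimal sketch of such a merge in plain Perl is given below. merge_duplicates, mean and median are hypothetical helpers written for illustration; merging column by column is an assumption about how duplicated rows are combined, and the features are sorted here only to make the output reproducible.

```perl
use strict;
use warnings;

# Mean of a list of numbers
sub mean {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum / @_;
}

# Median of a list of numbers
sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Hypothetical helper: merge rows that share a feature name, column by column
sub merge_duplicates {
    my ($mat, $feature, $method) = @_;
    my $fun = $method eq 'median' ? \&median : \&mean;

    # group row indices by feature name
    my %rows;
    push @{ $rows{ $feature->[$_] } }, $_ for 0 .. $#$feature;

    my (@new_mat, @new_feature);
    for my $name (sort keys %rows) {
        my @idx   = @{ $rows{$name} };
        my $n_col = scalar @{ $mat->[ $idx[0] ] };
        push @new_feature, $name;
        # apply the merge function to each column of the grouped rows
        push @new_mat, [ map { my $j = $_; $fun->(map { $mat->[$_][$j] } @idx) } 0 .. $n_col - 1 ];
    }
    return (\@new_mat, \@new_feature);
}

my $mat     = [[1, 2], [3, 6], [5, 10]];
my $feature = ["gene1", "gene2", "gene2"];
my ($m, $f) = merge_duplicates($mat, $feature, "mean");
print join(",", @$f), "\n";           # gene1,gene2
print join(",", @{ $m->[1] }), "\n";  # 4,8
```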

$expr->save(filename)

Save the data to the given file as tables.

AUTHOR

Zuguang Gu <jokergoo@gmail.com>

COPYRIGHT AND LICENSE

Copyright 2012 by Zuguang Gu

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

Microarray::GEO::SOFT