NAME
Microarray::ExprSet - Simple description of microarray data
SYNOPSIS
use Microarray::ExprSet;
my $mat = [[1, 2, 3, 4, 5, 6],
           [7, 8, 9, 10, 11, 12],
           [13, 14, 15, 16, 17, 18],
           [19, 20, 21, 22, 23, 24],
           [25, 26, 27, 28, 29, 30],
           [31, 32, 33, 34, 35, 36]];
my $probe = ["gene1", "gene2", "gene2", "gene3", "", "gene4"];
my $sample = ["treatment", "treatment", "treatment", "control", "control", "control"];
my $expr = Microarray::ExprSet->new();
$expr->set_matrix($mat);
$expr->set_feature($probe);
$expr->set_phenotype($sample);
# or simplified as
$expr->set_matrix($mat)->set_feature($probe)->set_phenotype($sample);
# check whether the data are valid
$expr->is_valid; # 1 or 0
# do some preprocessing
$expr->remove_empty_features();
# combine duplicated features, order of features is shuffled
$expr->unique_features("mean"); # you can use "median" too
# now you can get content of the object
my $new_mat = $expr->matrix;
my $new_probe = $expr->feature;
my $new_sample = $expr->phenotype;
my $n_probe = $expr->n_feature;
my $n_sample = $expr->n_phenotype;
# save into file
$expr->save("some-file");
DESCRIPTION
The Microarray::ExprSet class object describes the data structure of microarray data. It contains three elements: 1) a data matrix that stores the expression values; 2) an array of features, which are the probe names or gene IDs; 3) an array of phenotypes, which are the settings of the samples (e.g. control vs treatment). Other information about the microarray experiment, such as the protocol or sample preparation, is not included in this object. This module aims to hold the minimum information needed to describe a microarray data set.
Usually the Microarray::ExprSet object is created by other modules such as Microarray::GEO::SOFT.
Subroutines
new
-
Initialize or reset a Microarray::ExprSet class object.
$expr->set_matrix(MATRIX)
-
The argument is the expression value matrix, stored as a reference to an array of array references. Each inner array is one row of the matrix.
$expr->set_feature(ARRAY_REF)
-
Set the feature names. The number of features should be equal to the number of rows of the expression value matrix. You can think of each feature as a probe or a gene.
$expr->set_phenotype(ARRAY_REF)
-
Set the phenotype names. The number of phenotypes should be equal to the number of columns of the expression value matrix. You can think of the phenotypes as the experimental sample names.
$expr->matrix
-
Get the expression value matrix.
$expr->feature
-
Get the feature names as an array reference.
$expr->phenotype
-
Get the phenotype names as an array reference.
$expr->n_feature
-
Get the number of features.
$expr->n_phenotype
-
Get the number of phenotypes.
$expr->is_valid
-
Check whether the object is valid. It returns 0 if the expression matrix is not in a standard matrix format, if feature names are defined but their number differs from the number of matrix rows, or if phenotype names are defined but their number differs from the number of matrix columns. Otherwise it returns 1.
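The three rules can be illustrated with a small stand-alone sketch. It does not use or reproduce the module's own code: check_valid is a hypothetical helper, and reading "standard matrix format" as "a non-empty array of equally long array references" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Microarray::ExprSet: returns 1 or 0
# following the three rules described above.
sub check_valid {
    my ($mat, $feature, $phenotype) = @_;

    # Assumption: a "standard" matrix is a non-empty array of array references ...
    return 0 unless ref $mat eq 'ARRAY' && @$mat;
    return 0 if grep { ref $_ ne 'ARRAY' } @$mat;

    # ... whose rows all have the same number of columns.
    my $n_col = scalar @{ $mat->[0] };
    return 0 if grep { @$_ != $n_col } @$mat;

    # Names are optional, but if defined they must match the dimensions.
    return 0 if defined $feature   && @$feature   != @$mat;
    return 0 if defined $phenotype && @$phenotype != $n_col;

    return 1;
}

my $example = [[1, 2, 3], [4, 5, 6]];
print check_valid($example, ["gene1", "gene2"], ["a", "b", "c"]), "\n";  # prints 1
print check_valid($example, ["gene1"], undef), "\n";                     # prints 0
print check_valid([[1, 2], [3]], undef, undef), "\n";                    # prints 0
```

Passing undef for the names mirrors the case where feature or phenotype names have not been set.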
$expr->remove_empty_features
-
Some features may have no name. This method removes those unnamed features from the object.
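As a rough illustration of the idea, the sketch below drops the rows whose feature name is empty. drop_empty_features is a hypothetical helper, and treating both undef and the empty string as "no name" is an assumption rather than the module's documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only rows whose feature name is defined and non-empty.
sub drop_empty_features {
    my ($mat, $feature) = @_;

    # Indices of the rows that have a usable name.
    my @keep = grep { defined $feature->[$_] && $feature->[$_] ne '' } 0 .. $#$feature;

    # Slice the matrix rows and the names with the same index list.
    return ([ @{$mat}[@keep] ], [ @{$feature}[@keep] ]);
}

my ($kept_mat, $kept_feature) = drop_empty_features(
    [[1, 2], [3, 4], [5, 6]],
    ["gene1", "", "gene2"],
);
print join(",", @$kept_feature), "\n";   # prints gene1,gene2
print scalar(@$kept_mat), "\n";          # prints 2
```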
$expr->unique_features('mean' | 'median')
-
Features are often measured repeatedly, especially after probe IDs have been mapped to gene IDs, while some analysis procedures need unique features. The argument selects the method used to merge the rows of a duplicated feature. Note that the order of the features is shuffled afterwards.
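Merging duplicated features can be sketched as grouping the rows by feature name and combining each column with the chosen statistic. The helper names mean, median and merge_duplicates are hypothetical, and the sketch is not the module's implementation. It sorts the names only to make the output reproducible, whereas the module gives no guarantee about order.

```perl
use strict;
use warnings;
use List::Util qw(sum);

sub mean {
    my $n = @_;
    return sum(@_) / $n;
}

sub median {
    my @s   = sort { $a <=> $b } @_;
    my $n   = @s;
    my $mid = int($n / 2);
    return $n % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# Hypothetical helper: merge rows with the same feature name, column by column.
sub merge_duplicates {
    my ($mat, $feature, $method) = @_;
    my $combine = $method eq 'median' ? \&median : \&mean;

    # Group row indices by feature name.
    my %rows;
    push @{ $rows{ $feature->[$_] } }, $_ for 0 .. $#$feature;

    my (@new_mat, @new_feature);
    for my $name (sort keys %rows) {   # sorted only for a deterministic example
        my @idx   = @{ $rows{$name} };
        my $n_col = scalar @{ $mat->[ $idx[0] ] };
        push @new_feature, $name;
        push @new_mat, [
            map { my $c = $_; $combine->(map { $mat->[$_][$c] } @idx) } 0 .. $n_col - 1
        ];
    }
    return (\@new_mat, \@new_feature);
}

my ($merged, $names) = merge_duplicates(
    [[1, 2], [7, 8], [13, 14]],
    ["gene1", "gene2", "gene2"],
    "mean",
);
print join(",", @$names), "\n";             # prints gene1,gene2
print join(",", @{ $merged->[1] }), "\n";   # prints 10,11
```

With only two duplicated rows the mean and the median coincide; they differ once a feature has three or more rows.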
$expr->save(filename)
-
Save to file as tables.
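The exact file layout is not specified here. Purely as an illustration, the sketch below writes one plausible table form: a tab-delimited file with the phenotype names in the header row and one row per feature. write_table is a hypothetical helper and the layout is an assumption, not the documented output of save.

```perl
use strict;
use warnings;

# Hypothetical helper: the tab-delimited layout is an assumption,
# not the documented output format of save().
sub write_table {
    my ($fh, $mat, $feature, $phenotype) = @_;

    # Header row: empty corner cell followed by the phenotype names.
    print {$fh} join("\t", "", @$phenotype), "\n";

    # One line per feature: name followed by its expression values.
    for my $i (0 .. $#$mat) {
        print {$fh} join("\t", $feature->[$i], @{ $mat->[$i] }), "\n";
    }
}

# Write to an in-memory filehandle so the example leaves no file behind.
open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";
write_table($fh, [[1, 2], [3, 4]], ["gene1", "gene2"], ["treatment", "control"]);
close $fh;
print $buffer;
```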
AUTHOR
Zuguang Gu <jokergoo@gmail.com>
COPYRIGHT AND LICENSE
Copyright 2012 by Zuguang Gu
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.1 or, at your option, any later version of Perl 5 you may have available.