NAME

Bio::ViennaNGS::Expression - An object-oriented interface for read-count based gene expression

SYNOPSIS

use Bio::ViennaNGS::Expression;

my $expression = Bio::ViennaNGS::Expression->new();

# parse read counts from an extended BED12 file
$expression->parse_readcounts_bed12($bed12);

# compute normalized expression of the ith sample in Transcripts per Million (TPM)
$expression->computeTPM($i, $readlength);

# write extended BED12 file with TPM for each condition past
# the 12th column
$expression->write_expression_bed12("TPM", $dest, $basename);

DESCRIPTION

This module provides a Moose interface for computation of gene / transcript expression from read counts.

METHODS

parse_readcounts_bed12
Title : parse_readcounts_bed12

Usage : $obj->parse_readcounts_bed12($file)

Function: Parses a bedtools multicov (multiBamCov) file, i.e. an
          extended BED12 file, into an Array of Hash of Hashes data
          structure (C<@{$self->data}>).

 Args : C<$file> is the input file, i.e. an extended BED12 file
        where each column past the 12th lists read counts for this
        bedline's feature(s) for a specific sample/condition.

 Returns :

 Notes: This method evaluates the number of samples/conditions
        present in the input, i.e. the number of columns extending
        the canonical BED12 columns in the input multicov file and
        populates C<$self->conds>. Also populates
        C<$self->nr_features> with the number of genes/features
        present in the input (evidently, this should be the same for
        each sample/condition in the input).
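
The parsing step described above can be illustrated with a minimal sketch. It is not the module's implementation: the subroutine name, the hash keys C<length> and C<count>, and the choice to take feature length as the sum of the BED12 block sizes are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ViennaNGS::Expression): split one
# extended BED12 line into one hash per sample/condition.
sub parse_extended_bed12_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    die "expected an extended BED12 line with more than 12 columns\n"
        if @fields <= 12;

    # Column 4 is the feature name, column 11 the comma-separated block sizes.
    my $name       = $fields[3];
    my @blocksizes = split /,/, $fields[10];
    my $length     = 0;
    $length += $_ for @blocksizes;    # assumption: length = sum of block sizes

    # Every column past the 12th holds the read count of one sample.
    my @data;
    for my $count (@fields[12 .. $#fields]) {
        push @data, { $name => { length => $length, count => $count } };
    }
    return (\@data, scalar @data);    # per-sample hashes, number of samples
}
```

Called on each line of a multicov file, such a helper yields the number of samples/conditions as its second return value, which corresponds to the quantity stored in C<$self->conds>.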
computeTPM
Title : computeTPM

Usage : $obj->computeTPM($sample, $readlength)

Function : Computes expression of each gene/feature present in
           C<$self->data> in Transcripts per Million (TPM) [Wagner
           et al. Theory Biosci. (2012)]. C<$featCount_sample> is a
           reference to a Hash of Hashes data structure where keys are
           feature names and values hold a hash that must at least
           contain length and raw read counts. Practically,
           C<$featCount_sample> is represented by _one_ element of
           C<@featCount>, which is populated from a multicov file by
           C<parse_multicov()>.

 Args : C<$sample> is the sample index of C<@{$self->data}>. This is
        especially handy if one is only interested in computing
        normalized expression values for a specific sample, rather
        than all samples in multicov BED12 file. C<$readlength> is
        the read length of the RNA-seq sequencing experiment.

 Returns : The mean TPM of the processed sample, which is
           invariant among samples: TPM values sum to 10^6, so the
           mean equals 10^6 divided by the number of features. (TPM
           models relative molar concentration and thus fulfills the
           invariant average criterion.)
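
As a rough illustration of the TPM measure of Wagner et al., the sketch below computes TPM_i = (r_i * rl / fl_i) * 10^6 / T with T = sum_j (r_j * rl / fl_j), where r is the raw read count, rl the read length and fl the feature length. The subroutine name and the hash keys C<length>, C<count> and C<TPM> are assumptions; the module's internal layout may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): compute TPM for one sample.
# $sample is a hash reference: feature name => { length => ..., count => ... }.
sub compute_tpm_sketch {
    my ($sample, $readlength) = @_;

    # T: total of length-normalized read contributions over all features.
    my $T = 0;
    for my $feat (keys %$sample) {
        $T += $sample->{$feat}{count} * $readlength / $sample->{$feat}{length};
    }
    die "no reads assigned to any feature\n" if $T == 0;

    # Store TPM per feature and accumulate the sum for the mean.
    my $sum = 0;
    for my $feat (keys %$sample) {
        my $f = $sample->{$feat};
        $f->{TPM} = ($f->{count} * $readlength / $f->{length}) * 1e6 / $T;
        $sum += $f->{TPM};
    }
    return $sum / scalar(keys %$sample);    # mean TPM = 1e6 / number of features
}
```

Because the TPM values of a sample always add up to 10^6, the returned mean depends only on the number of features, which is why it can serve as a quick consistency check across samples.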
write_expression_bed12
Title : write_expression_bed12

Usage : $obj->write_expression_bed12($measure, $dest, $basename)

Function : Writes normalized expression data to a bedtools multicov
           (multiBamCov)-type BED12 file.

Args : C<$measure> specifies the type in which normalized expression
       data from C<@{$self->data}> is dumped, i.e. TPM or
       RPKM. These values must have been computed and inserted into
       C<@{$self->data}> beforehand by
       e.g. C<$self->computeTPM()>. C<$dest> and C<$basename> give
       path and base name of the output file, respectively.

Returns : None. The output is a position-sorted extended BED12 file.
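
The output layout can be sketched as follows. This is an illustration only: the subroutine name and the record layout (C<bed12> and C<values> keys) are hypothetical, and sorting by chromosome name and then start coordinate is an assumed reading of "position-sorted".

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): build extended BED12 lines.
# Each record is { bed12 => [12 BED fields], values => [one value per sample] }.
sub format_expression_bed12 {
    my (@records) = @_;

    # Sort by chromosome (string order), then by start position (numeric).
    my @sorted = sort {
        $a->{bed12}[0] cmp $b->{bed12}[0]
            || $a->{bed12}[1] <=> $b->{bed12}[1]
    } @records;

    # Append the per-sample expression values past the 12th column.
    return map { join("\t", @{ $_->{bed12} }, @{ $_->{values} }) } @sorted;
}
```

The returned lines can then be printed to a file handle opened on the desired output file.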

DEPENDENCIES

Moose
Carp
Path::Class
namespace::autoclean

SEE ALSO

Bio::ViennaNGS
Bio::ViennaNGS::Bed
Bio::ViennaNGS::Util

AUTHOR

Michael T. Wolfinger, <michael@wolfinger.eu>

COPYRIGHT AND LICENSE

Copyright (C) 2015 by Michael T. Wolfinger

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.16.3 or, at your option, any later version of Perl 5 you may have available.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.