NAME
Bio::ViennaNGS::Peak - An object-oriented interface for characterizing peaks in RNA-seq data
SYNOPSIS
use Bio::ViennaNGS::Peak;
# get an instance of Bio::ViennaNGS::Peak
my $peaks = Bio::ViennaNGS::Peak->new();
# parse coverage for [+] and [-] strand from Bio::ViennaNGS::FeatureIO objects
$peaks->populate_data($filep,$filen);
# identify regions covered by RNA-seq signal ('raw peaks')
$peaks->raw_peaks($dest,$prefix,$log);
# characterize final peaks
$peaks->final_peaks($dest,$prefix,$log);
DESCRIPTION
This module provides a Moose interface for characterization of peaks in RNA-seq coverage data.
METHODS
- populate_data
Title : populate_data
Usage : $obj->populate_data($filep,$filen);
Function : Parses RNA-seq coverage for positive and negative strand into @{$self->data}, a Hash of Arrays data structure.
Args : $filep and $filen are instances of Bio::ViennaNGS::FeatureIO.
Returns : None.
Notes : The memory footprint of this method is rather high. From the Bio::ViennaNGS::FeatureIO input objects it builds a Hash of Arrays data structure of roughly the size of the underlying genome: chromosomes are the hash keys, and each hash value references an array holding coverage information for every genomic position.
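To illustrate why the footprint scales with genome size, here is a minimal sketch of a Hash of Arrays coverage layout. It does not use Bio::ViennaNGS::FeatureIO; the helpers add_interval and fill_gaps, the 0-based half-open intervals and the use of one such hash per strand are assumptions made for illustration only, not the module's actual internals.

```perl
use strict;
use warnings;

# Hypothetical helper: record a coverage value for every position of the
# half-open interval [start, end) on one chromosome. Chromosome names are
# hash keys, positions are array indices.
sub add_interval {
    my ($cov, $chr, $start, $end, $value) = @_;
    $cov->{$chr}[$_] = $value for $start .. $end - 1;
    return;
}

# Hypothetical helper: positions never touched by an interval get zero
# coverage, so each array is dense from index 0 to the last covered position.
sub fill_gaps {
    my ($cov) = @_;
    for my $chr (keys %$cov) {
        my $arr = $cov->{$chr};
        $arr->[$_] //= 0 for 0 .. $#$arr;
    }
    return;
}

my %coverage;    # assumed: one such hash per strand
add_interval(\%coverage, 'chr1', 2, 5, 7.5);
add_interval(\%coverage, 'chr1', 8, 10, 3);
fill_gaps(\%coverage);

printf "%s: %d positions\n", $_, scalar @{ $coverage{$_} }
    for sort keys %coverage;
```

Because every position up to the last covered one occupies an array slot, memory grows with chromosome length rather than with the number of covered intervals.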
- raw_peaks
Title : raw_peaks
Usage : $obj->raw_peaks($dest,$prefix,$log);
Function : This method identifies genomic regions ('raw peaks') covered by RNA-seq signal by means of a sliding window approach. RNA-seq coverage is read from @{$self->data} (which is populated by e.g. the populate_data method). The sliding window approach processes the [+] and [-] strand of all chromosomes in 5' -> 3' direction, using the mean value of each window as the representative for that window. In this way the start and end coordinates as well as the position of the maximum elevation are identified. The end position of a covered region is defined as the coordinate of the window whose mean is less than a certain value (i.e. $self->threshold * peak maximum).
Raw peaks are stored in %{$self->data}->{peaks}.
Args : $dest contains the output path for results, $prefix the prefix used for all output file names. $log is the name of a log file, or undef if no logging is required.
Returns : None. The output is a position-sorted BED6 file containing all raw peaks.
Notes : It is highly recommended to use normalized input data in order to allow for multiple calls of this method with the same set of parameters on different samples.
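The sliding window scan described above can be sketched for a single strand of a single chromosome as follows. This is an illustration under stated assumptions, not the module's implementation: the function name raw_peaks_sketch and the parameters $winsize, $interval and $mincov are hypothetical, and only $threshold corresponds to an attribute named in this documentation.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical sketch of a raw peak scan over one coverage array.
#   $cov       array ref with per-position coverage
#   $winsize   window width (assumed parameter)
#   $interval  step between consecutive windows (assumed parameter)
#   $mincov    window mean required to open a region (assumed parameter)
#   $threshold fraction of the running maximum below which a region ends
sub raw_peaks_sketch {
    my ($cov, $winsize, $interval, $mincov, $threshold) = @_;
    my (@peaks, $cur);
    for (my $i = 0; $i + $winsize <= @$cov; $i += $interval) {
        my $mean = sum(@{$cov}[$i .. $i + $winsize - 1]) / $winsize;
        if (!$cur) {
            # open a region at the first window with sufficient coverage
            $cur = { start => $i, max => $mean, maxpos => $i }
                if $mean >= $mincov;
            next;
        }
        if ($mean > $cur->{max}) {
            # new maximum elevation inside the open region
            @{$cur}{qw(max maxpos)} = ($mean, $i);
        }
        elsif ($mean < $threshold * $cur->{max}) {
            # window mean dropped below threshold * maximum: close region
            $cur->{end} = $i;
            push @peaks, $cur;
            undef $cur;
        }
    }
    # close a region that is still open at the end of the array
    if ($cur) {
        $cur->{end} = scalar @$cov;
        push @peaks, $cur;
    }
    return \@peaks;
}

my @cov   = (0, 0, 0, 0, 10, 10, 20, 20, 10, 10, 0, 0, 0, 0);
my $peaks = raw_peaks_sketch(\@cov, 2, 2, 1, 0.1);
printf "start=%d end=%d maxpos=%d\n", @{$_}{qw(start end maxpos)} for @$peaks;
```

Since the end of a region depends on a fraction of its own maximum, the same threshold only gives comparable regions across samples when coverage is normalized.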
- final_peaks
Title : final_peaks
Usage : $obj->final_peaks($dest,$prefix,$log);
Function : This method characterizes final peaks from RNA-seq coverage found in %{$self->data}->{peaks}. The latter is supposed to have been populated by $self->raw_peaks.
The procedure for finding final peaks is as follows: For each raw peak found in %{$self->data}->{peaks} the window of maximum coverage is retrieved and a (second) sliding window approach is then applied to regions both upstream and downstream of the maximum. Peak boundaries are set at the position where the mean coverage of the respective window is lower than $self->threshold * peak maximum.
Peaks are reported if their total length (as determined by this routine) is not longer than $self->length.
Args : $dest contains the output path for results, $prefix the prefix used for all output file names. $log is the name of a log file, or undef if no logging is required.
Returns : None. The output is a position-sorted BED6 file containing all candidate peaks.
Notes :
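The boundary refinement described above can be sketched as follows. This is a rough illustration under assumptions rather than the module's code: window_mean and refine_peak are hypothetical names, windows are moved one position at a time, coordinates are 0-based and half-open, and $threshold and $maxlen stand in for $self->threshold and $self->length.

```perl
use strict;
use warnings;
use List::Util qw(sum min);

# Mean coverage of the window starting at $start, truncated at the array end.
sub window_mean {
    my ($cov, $start, $winsize) = @_;
    my $end = min($start + $winsize - 1, $#$cov);
    return sum(@{$cov}[$start .. $end]) / ($end - $start + 1);
}

# Hypothetical sketch: starting at the window of maximum coverage ($maxpos),
# slide upstream and downstream until the window mean falls below
# $threshold * maximum, then report the peak only if it is short enough.
sub refine_peak {
    my ($cov, $maxpos, $winsize, $threshold, $maxlen) = @_;
    my $cutoff = $threshold * window_mean($cov, $maxpos, $winsize);
    my ($left, $right) = ($maxpos, $maxpos);

    # extend upstream while the next window still reaches the cutoff
    $left--
        while $left > 0
        && window_mean($cov, $left - 1, $winsize) >= $cutoff;

    # extend downstream while the next full window still reaches the cutoff
    $right++
        while $right + $winsize <= $#$cov
        && window_mean($cov, $right + 1, $winsize) >= $cutoff;

    my $start = $left;
    my $end   = min($right + $winsize, scalar @$cov);    # half-open end
    return if $end - $start > $maxlen;                   # too long: discard
    return { start => $start, end => $end };
}

my @cov  = (0, 0, 0, 0, 10, 10, 20, 20, 10, 10, 0, 0, 0, 0);
my $peak = refine_peak(\@cov, 6, 2, 0.1, 10);
printf "start=%d end=%d\n", @{$peak}{qw(start end)} if $peak;
```

Because boundaries are decided on window means, a reported peak may include a few flanking positions whose individual coverage is below the cutoff.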
DEPENDENCIES
SEE ALSO
AUTHOR
Michael T. Wolfinger, <michael@wolfinger.eu>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2017 by Michael T. Wolfinger
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.