NAME

ShatterProof - a script for analyzing next-generation sequencing data

SYNOPSIS

use Shatterproof;

DESCRIPTION

ShatterProof is a tool for analyzing next-generation sequencing data for signs of chromothripsis. Link to publication will be posted soon.

README

Input File Types

ShatterProof takes four types of input file. See the scripts/conversion_scripts directory for Perl scripts that convert the output of some common tools to the required input formats.

Translocation Input Files (.spt)

Tab-delimited columns. The first line is a header line:

#chr1 start end chr2 start end quality

Example data entry line:

1 1000 2000 4 4000 5000 78

If no value is available for quality, use a “.”, e.g.:

1 1000 2000 4 4000 5000 .
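As an illustration of the .spt layout, the following is a minimal sketch of reading one data line into named fields. The helper parse_spt_line and the field names start1, end1, start2 and end2 are hypothetical and not part of ShatterProof; the sketch assumes exactly seven tab-separated columns and treats a “.” quality as missing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ShatterProof): split one .spt data line
# into named fields, mapping a "." quality to undef.
sub parse_spt_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line !~ /\S/;    # skip header and blank lines
    my @fields = split /\t/, $line;
    die "expected 7 tab-separated columns, got " . scalar(@fields) . "\n"
        unless @fields == 7;
    my %rec;
    @rec{qw(chr1 start1 end1 chr2 start2 end2 quality)} = @fields;
    $rec{quality} = undef if $rec{quality} eq '.';
    return \%rec;
}

my $rec = parse_spt_line("1\t1000\t2000\t4\t4000\t5000\t.\n");
printf "%s:%d-%d -> %s:%d-%d quality=%s\n",
    @{$rec}{qw(chr1 start1 end1 chr2 start2 end2)},
    defined $rec->{quality} ? $rec->{quality} : 'NA';
```

Chromosome names are printed as strings so that names such as X or Y would also pass through unchanged.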

Copy-Number Input Files (.spc)

Tab-delimited columns. The first line is a header line:

#chr start end number quality

Example data entry line:

12 2000 3000 2 63

If no value is available for quality, use a “.”, e.g.:

12 2000 3000 2 .
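Going the other way, a conversion script has to emit the header line and substitute “.” for a missing quality. The sketch below shows one way to do that for .spc output; format_spc and the hash keys are hypothetical names chosen for illustration, not taken from the ShatterProof conversion scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: format copy-number records as .spc text.
# Each record is a hash ref with keys chr, start, end, number and an
# optional quality; a missing quality is written as ".".
sub format_spc {
    my @records = @_;
    my @lines = (join("\t", '#chr', qw(start end number quality)));
    for my $r (@records) {
        my $quality = defined $r->{quality} ? $r->{quality} : '.';
        push @lines, join("\t", @{$r}{qw(chr start end number)}, $quality);
    }
    return join("\n", @lines) . "\n";
}

print format_spc(
    { chr => 12, start => 2000, end => 3000, number => 2, quality => 63 },
    { chr => 12, start => 5000, end => 6000, number => 1 },    # no quality
);
```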

Loss of Heterozygosity Input Files (.spl)

Tab-delimited columns. The first line is a header line:

#chr start end quality

Example data entry line:

12 2000 3000 63

If no value is available for quality, use a “.”, e.g.:

12 2000 3000 .

Insertion Input Files (.vcf)

Additionally, ShatterProof accepts insertion calls in VCF files as input. See http://www.1000genomes.org/node/101 for details on the VCF file format. ShatterProof analyzes the CHROM and POS fields of these files.
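To make clear which part of a VCF record matters here, the sketch below pulls the CHROM and POS columns (the first two tab-separated fields) out of VCF text lines, skipping the “##” meta-information lines and the “#CHROM” header line. The helper vcf_chrom_pos and the sample record are hypothetical; this is not how ShatterProof itself reads the files.

```perl
use strict;
use warnings;

# Hypothetical helper: return [CHROM, POS] pairs from VCF text lines,
# skipping "##" meta-information lines and the "#CHROM" header line.
sub vcf_chrom_pos {
    my @pairs;
    for my $line (@_) {
        next if $line =~ /^#/ or $line !~ /\S/;
        my ($chrom, $pos) = split /\t/, $line;
        push @pairs, [$chrom, $pos];
    }
    return @pairs;
}

# Made-up sample input for illustration only.
my @lines = (
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "12\t2500\t.\tA\tACGT\t50\tPASS\t.\n",
);
print join("\t", @$_), "\n" for vcf_chrom_pos(@lines);
```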

Installing ShatterProof

To install this module, type the following:

perl Makefile.PL
make
make test
make install

Configuring ShatterProof

See the config.pl file in the scripts directory for a sample ShatterProof configuration file.

$bin_size: number (integer) of base pairs to include in each bin of the sliding window analysis

$localization_window_size: number (integer) of bins to include in each window of the sliding window analysis

$expected_mutation_density: a reference value (double) used in determining whether the concentration of translocation events on a particular chromosome is higher than expected

$collapse_regions: flag variable

value 1: merge overlapping CNV regions that have the same copy number

value 0: do not merge overlapping CNV regions that have the same copy number. If such regions are found, an error is thrown.

$outlier_deviations: the number of standard deviations away from the mean a value has to be in order to be considered non-significant. Used to identify highly mutated regions.

$genome_localization_weight: weight given to the localization of mutations to one chromosome hallmark

$chromosome_localization_weight: weight given to the localization of mutations to one area of a particular chromosome hallmark

$cnv_weight: weight given to the concentrated CNV hallmark

$translocation_weight: weight given to the concentrated translocations hallmark

$insertion_breakpoint_weight: weight given to the short breakpoint insertions hallmark

$loh_weight: weight given to the loss/retention of heterozygosity hallmark

$tp53_mutated_weight: weight given to the TP53 mutation hallmark
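For orientation, a configuration file covering the variables above might look like the sketch below. The variable names come from the list above; every value is a placeholder chosen only to show the expected type, not a recommended default, and the use of "our" declarations is an assumption. The config.pl shipped in the scripts directory remains the authoritative template.

```perl
use strict;
use warnings;

# Sliding window analysis (placeholder values, not recommendations)
our $bin_size                 = 10000;    # base pairs per bin (integer)
our $localization_window_size = 5;        # bins per window (integer)
our $expected_mutation_density = 0.001;   # reference density (double)

# 1 = merge overlapping CNV regions with the same copy number,
# 0 = do not merge; such regions then raise an error
our $collapse_regions = 1;

# Standard deviations from the mean used to identify highly mutated regions
our $outlier_deviations = 2;

# Hallmark weights (placeholders: all hallmarks weighted equally here)
our $genome_localization_weight     = 1;
our $chromosome_localization_weight = 1;
our $cnv_weight                     = 1;
our $translocation_weight           = 1;
our $insertion_breakpoint_weight    = 1;
our $loh_weight                     = 1;
our $tp53_mutated_weight            = 1;

1;    # a Perl file loaded with do/require should end with a true value
```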

Running ShatterProof

From the scripts directory, run the shatterproof.pl file using Perl.

Main Usage:

perl -w shatterproof.pl --cnv <dir> --trans <dir> [--insrt <dir>] [--loh <dir>] [--tp53] --config <path> --output <dir>

Arguments:

--cnv     Define the path to the directory containing the CNV input files
--trans   Define the path to the directory containing the Translocation input files
--insrt   Define the path to the directory containing the insertion VCF input files
--loh     Define the path to the directory containing the LOH input files
--tp53    Indicate that TP53 should be considered mutated, regardless of data
--config  Define the path to the ShatterProof config file
--output  Define the path to the directory where output should be placed

<dir>     Path to a directory
<path>    Path to a file
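When ShatterProof is launched from another Perl script, the command line above can be assembled as a list, which avoids shell quoting problems with unusual paths. The sketch below only builds and prints the command; all directory and file names are placeholders, and the optional --insrt argument is left out.

```perl
use strict;
use warnings;

# Placeholder paths: replace with your own directories and config file.
my %opt = (
    cnv    => 'input/cnv',
    trans  => 'input/trans',
    loh    => 'input/loh',     # optional
    config => 'config.pl',
    output => 'results',
);

my @cmd = ('perl', '-w', 'shatterproof.pl');
push @cmd, "--$_", $opt{$_} for qw(cnv trans loh config output);
push @cmd, '--tp53';           # optional flag, takes no value

print join(' ', @cmd), "\n";

# To actually launch the run from the scripts directory, uncomment:
# system(@cmd) == 0 or die "shatterproof.pl exited with status $?\n";
```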

PREREQUISITES

strict; warnings; Carp; Switch; File::Basename; List::Util qw[min max]; Statistics::Distributions;

any

CPAN
