NAME

get_actual_nuc_sizes.pl

A script to pull out actual nucleosome fragments and enumerate their sizes.

SYNOPSIS

get_actual_nuc_sizes.pl --in <file1.txt> --bam <file2.bam> [--options]

Options:
--in <filename>
--bam <filename.bam>
--min <integer>
--max <integer>
--win <integer>
--at
--gff
--type <gff_type>
--source <gff_source>
--out <filename>
--version
--help

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify the file name of a nucleosome data file generated by the script map_nucleosomes.pl. Other data files will likely not work.

--bam <filename>

Specify the file name of a binary BAM file containing the original paired-end sequence alignment pairs representing nucleosome fragments. The file should be sorted and indexed.

--min <integer>

Optionally specify the minimum size of fragment to include when determining fragment lengths.

--max <integer>

Optionally specify the maximum size of fragment to include when determining fragment lengths.

--win <integer>

Optionally specify the window size when searching for corresponding sequence alignment pairs. The window is determined as the mapped nucleosome midpoint +/- the specified value. The default value is the calculated fuzziness value determined when mapping the nucleosome.
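The window test described above can be sketched in a few lines of Python. This is only an illustration, not the script's own Perl code: the function name in_window, the integer midpoint calculation, and the inclusive window edges are all assumptions made for the example.

```python
def in_window(pair_start, pair_end, nuc_midpoint, win):
    """Return True if the fragment midpoint lies within nuc_midpoint +/- win.

    Treating both window edges as inclusive is an assumption of this sketch.
    """
    # Integer midpoint of the aligned read-pair fragment
    frag_mid = (pair_start + pair_end) // 2
    return nuc_midpoint - win <= frag_mid <= nuc_midpoint + win


# A nucleosome mapped at position 1000 with --win 10 covers 990..1010
print(in_window(926, 1072, 1000, 10))  # fragment midpoint is 999
print(in_window(950, 1110, 1000, 10))  # fragment midpoint is 1030
```

A larger window collects more read-pairs per nucleosome but increases the chance of picking up fragments that belong to a neighboring nucleosome.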

--at

Boolean option to indicate that only fragments whose paired sequence reads end in an A or T nucleotide should be included in the output GFF file. Micrococcal nuclease (MNase) cuts predominantly, if not exclusively, at AT dinucleotides; this option makes it more likely that each retained fragment is derived from an MNase cut. The default is false, in which case all fragments are taken.
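One possible reading of this filter is sketched below in Python. It assumes that the fragment sequence is available as a string and that both terminal bases must be A or T; the script itself may apply the test differently (for example, per read rather than per fragment). The function name has_at_ends is hypothetical. Because A and T are complements of each other, the result does not depend on which strand the sequence is read from.

```python
def has_at_ends(fragment_seq):
    """Return True if both terminal bases of the fragment are A or T.

    Requiring both ends to match is an assumption of this sketch.
    """
    if not fragment_seq:
        return False
    seq = fragment_seq.upper()
    return seq[0] in "AT" and seq[-1] in "AT"


# Hypothetical fragment sequences, shortened for readability
fragments = ["ATGCCGTA", "GTTACCAT", "TTGACGCA", "acgtgcat"]
kept = [f for f in fragments if has_at_ends(f)]
print(kept)
```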

--gff

Indicate whether a GFF file should be written in addition to the standard text data file. The GFF file version is 3. Default is false (no GFF written).

--type <gff_type>

Provide the text to be used as the GFF type (or method) used in writing the GFF file. The default value is the Sequence Ontology term 'histone_binding_site'.

--source <gff_source>

Provide the text to be used as the GFF source used in writing the GFF file. The default value is the name of this program.
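To show where the --type and --source values end up, here is a small Python sketch that builds one GFF3 feature line. The nine tab-separated columns follow the GFF3 layout (seqid, source, type, start, end, score, strand, phase, attributes). The function name gff3_line, the exact spelling of the default source string, the unstranded "." strand, and the Name attribute are assumptions for illustration; the lines the script actually writes may differ.

```python
def gff3_line(seqid, start, end, name, score=".",
              source="get_actual_nuc_sizes.pl",
              gff_type="histone_binding_site"):
    """Build one tab-delimited GFF3 feature line for a nucleosome."""
    fields = [
        seqid,          # column 1: chromosome or contig name
        source,         # column 2: value of --source
        gff_type,       # column 3: value of --type
        str(start),     # column 4: start coordinate
        str(end),       # column 5: end coordinate
        str(score),     # column 6: score, "." if none
        ".",            # column 7: strand (unstranded, an assumption)
        ".",            # column 8: phase, not applicable here
        f"Name={name}", # column 9: attributes
    ]
    return "\t".join(fields)


print("##gff-version 3")
print(gff3_line("chrI", 925, 1075, "Nuc1"))
```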

--out <filename>

Provide a new output file name. By default, the input file is overwritten.

--version

Print the version number.

--help

Display the POD documentation.

DESCRIPTION

This program determines actual nucleosome fragment sizes from the original paired-end sequence alignments. It searches a BAM file for all aligned read-pairs whose midpoints fall within a window centered on the midpoint of a mapped nucleosome. The input is the set of mapped nucleosomes identified by the script map_nucleosomes.pl. The window size may be specified explicitly; by default, the fuzziness value calculated by the mapping program is used. Once the read-pairs are identified, the mean length of all fragments is determined, and the nucleosome start and stop coordinates are updated to reflect that length. The midpoint coordinate is neither updated nor checked for accuracy; it is assumed to be accurately mapped.
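The coordinate update can be pictured with the following Python sketch. It assumes the new start and stop are placed symmetrically around the unchanged midpoint at half the mean fragment length; how the script rounds and whether it handles 1-based coordinates this way is not stated here, so treat the arithmetic as an assumption. The function name resize_nucleosome is hypothetical.

```python
def resize_nucleosome(midpoint, fragment_lengths):
    """Return (start, stop, mean_length) for a nucleosome.

    The midpoint is kept as given; start and stop are set half the
    mean fragment length to either side (an assumption of this sketch).
    Returns None when no fragments were found.
    """
    if not fragment_lengths:
        return None
    mean_len = sum(fragment_lengths) / len(fragment_lengths)
    half = int(mean_len) // 2  # whole bases on each side of the midpoint
    return midpoint - half, midpoint + half, mean_len


# Three read-pair fragments found around a nucleosome mapped at 1000
print(resize_nucleosome(1000, [146, 150, 154]))
```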

In addition to updating the start and stop coordinates, three additional columns of data are appended to the data table. These include the count of sequence read-pair fragments and the standard deviation of the fragment lengths. Besides writing a new data file, the program can optionally write a GFF3 file.
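As a rough illustration of the per-nucleosome statistics, the Python sketch below applies the optional --min and --max size limits and then reports the fragment count and the standard deviation of the lengths. Whether the script uses the sample or the population standard deviation, and whether the size limits are inclusive, is not stated in this document; the sample form and inclusive limits used here are assumptions, as is the function name fragment_stats.

```python
import statistics


def fragment_stats(lengths, min_len=None, max_len=None):
    """Return (count, standard deviation) of the fragment lengths kept.

    Lengths outside min_len..max_len (inclusive, an assumption) are dropped.
    """
    kept = [
        n for n in lengths
        if (min_len is None or n >= min_len)
        and (max_len is None or n <= max_len)
    ]
    count = len(kept)
    # The sample standard deviation needs at least two values
    std = statistics.stdev(kept) if count > 1 else 0.0
    return count, std


# The 300 bp fragment is excluded by the maximum size limit
count, std = fragment_stats([146, 150, 154, 300], min_len=100, max_len=200)
print(count, std)
```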

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.