NAME
data2frequency.pl
A script to convert data into a frequency distribution, useful for graphing.
SYNOPSIS
data2frequency.pl --bins <integer> --size <number> <filename>
data2frequency.pl --bins <integer> --max <number> <filename>
data2frequency.pl --size <number> --max <number> <filename>
Options:
--in <filename>
--bins <integer>
--size <number>
--index <list|range>
--min <number>
--max <number>
--out <filename>
--version
--help
OPTIONS
The command line flags and descriptions:
- --in <filename>
Specify the file name of an input data file. The file should be a tab-delimited text file. The file may be compressed with gzip.
- --bins <integer>
Specify the number of bins or partitions into which the data will be grouped. This argument is optional if --max and --size are provided.
- --size <number>
Specify the size of each bin or partition. A decimal number may be provided. This argument is optional if --bins and --max are provided.
- --min <number>
Optionally specify the minimum bin value, which is used as the starting value when generating the list of bins. All values less than this value are ignored. The default is 0. A negative number must be given in the equals-sign form, for example --min=-1.
- --max <number>
Specify the maximum bin value. All values greater than this value will be ignored. This argument is optional if --bins and --size are provided.
- --index <list|range>
Specify the datasets in the input data file to convert to a distribution, using their 0-based column numbers. Multiple datasets may be given as a comma-delimited list, as a consecutive range (start-stop), or as a combination of both (for example 1,3-5). Do not include spaces. If no datasets are provided, the program interactively presents a list of available datasets from which to choose.
- --out <filename>
Specify the output file name. By default, '_frequency' is appended to the input file's base name. The output is written in the tim data file format.
- --version
Print the version number.
- --help
Display this help.
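The --index syntax (single columns, start-stop ranges, or a mix, with no spaces) can be illustrated with a small parser. This is a minimal Python sketch of the documented syntax, not the script's actual implementation; the function name parse_index is hypothetical.

```python
from typing import List


def parse_index(spec: str) -> List[int]:
    """Expand a column spec such as "0,2-4" into [0, 2, 3, 4].

    Hypothetical helper illustrating the --index syntax; not the script's code.
    """
    if " " in spec:
        # The documentation forbids spaces in the list.
        raise ValueError("spaces are not allowed in the index list")
    columns: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            # Consecutive range written as start-stop (inclusive).
            start, stop = part.split("-", 1)
            columns.extend(range(int(start), int(stop) + 1))
        else:
            # Single 0-based column number.
            columns.append(int(part))
    return columns


print(parse_index("1,3-5"))  # [1, 3, 4, 5]
```

Ranges are treated as inclusive at both ends, matching the "start-stop" wording above.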
DESCRIPTION
This program converts one or more datasets in a data file into a frequency distribution, which may then be used to conveniently plot a histogram with a program such as 'graph_profile.pl'.
Set the distribution parameters with the --bins and --size arguments, which set the number of bins and the size of each bin, respectively. Alternatively, provide any two of --bins, --size, and --max, as shown in the SYNOPSIS. The starting (minimum) and maximum bin values may optionally be set with --min and --max.
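Because the number of bins, the bin size, and the maximum are interdependent, any two of them determine the third. The sketch below assumes the relationship max = min + bins × size and rounds the bin count up when the range is not an exact multiple of the size. Both behaviors are assumptions, and resolve_bins is a hypothetical Python helper, not code from the script.

```python
import math
from typing import Optional, Tuple


def resolve_bins(bins: Optional[int], size: Optional[float],
                 max_value: Optional[float],
                 min_value: float = 0.0) -> Tuple[int, float, float]:
    """Fill in whichever of bins, size and max_value is missing.

    Assumes max_value = min_value + bins * size (hypothetical helper).
    """
    if sum(v is not None for v in (bins, size, max_value)) < 2:
        raise ValueError("provide at least two of --bins, --size and --max")
    if max_value is None:
        max_value = min_value + bins * size
    elif size is None:
        size = (max_value - min_value) / bins
    elif bins is None:
        # Round up so the bins cover max_value even when the range
        # is not an exact multiple of size.
        bins = math.ceil((max_value - min_value) / size)
    # If all three were given, they are returned unchanged (not cross-checked).
    return bins, size, max_value


print(resolve_bins(10, 0.5, None))   # (10, 0.5, 5.0)
print(resolve_bins(None, 0.5, 5.0))  # (10, 0.5, 5.0)
```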
One or more datasets within the data file may be converted. They may be specified on the command line with --index or chosen interactively from a list presented to the user.
The output is a data text file. The first column lists the bin values, and each subsequent column lists, for one requested dataset, the number of data points that fall within each bin.
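The counting step and the output layout described above can be sketched as follows. This minimal Python version makes several assumptions about conventions that may differ from the real script. Bins are taken as half-open [lower, upper), except that a value equal to the maximum is counted in the last bin. Each bin is labeled by its upper edge, and values below the minimum or above the maximum are skipped. The function name frequency_table and the "Bin" header are hypothetical.

```python
import csv
import sys
from typing import Dict, List, Sequence, Union

Row = List[Union[str, float, int]]


def frequency_table(datasets: Dict[str, Sequence[float]], bins: int,
                    size: float, min_value: float = 0.0) -> List[Row]:
    """Count values per bin for each dataset (hypothetical sketch).

    Assumes bins are half-open [lower, upper), except that max_value is
    counted in the last bin, and that each bin is labeled by its upper edge.
    """
    max_value = min_value + bins * size
    counts = {name: [0] * bins for name in datasets}
    for name, values in datasets.items():
        for v in values:
            if v < min_value or v > max_value:
                continue  # out-of-range values are ignored
            # Clamp so that v == max_value (or a rounding edge case)
            # lands in the last bin.
            i = min(int((v - min_value) // size), bins - 1)
            counts[name][i] += 1
    # Header row: bin label column followed by one column per dataset.
    rows: List[Row] = [["Bin"] + list(datasets)]
    for i in range(bins):
        upper = min_value + (i + 1) * size
        rows.append([upper] + [counts[name][i] for name in datasets])
    return rows


table = frequency_table({"score": [0, 0.4, 0.5, 1.0, 1.2, -1]}, bins=2, size=0.5)
csv.writer(sys.stdout, delimiter="\t", lineterminator="\n").writerows(table)
```

With these inputs, 1.2 and -1 fall outside the 0 to 1.0 range and are ignored, leaving two values in each of the two bins.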
AUTHOR
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.