NAME

convert_yeast_genome_version.pl

A script to convert genomic data between different S. cerevisiae genome versions

SYNOPSIS

convert_yeast_genome_version.pl [options] <file1> <file2> ...

Options:
--in <file>
--convert <integer>
--roman
--db <text>
--version
--help

OPTIONS

The command line flags and descriptions:

--in <file>

Provide the input file to be converted. Supported formats include GFF, BED, BedGraph, SGR, and custom text files with chromosome, start, stop, and/or position columns. Columns in custom text files must be labeled with a header line. BigWig files are also supported through an intermediate conversion step (see --db).

More than one file may be provided. Files may be gzipped.
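For custom text files, the coordinate columns have to be located from the header line before any shifting can happen. The sketch below illustrates that lookup, plus opening a file that may or may not be gzipped. The accepted column names in COLUMN_ALIASES, the tab delimiter, and the helper names are hypothetical; the script's own rules for recognizing headers may differ.

```python
import gzip
from typing import Dict, TextIO

# Hypothetical header aliases; the script's real accepted names may differ.
COLUMN_ALIASES = {
    "chrom": ("chromosome", "chrom", "chr", "seq_id"),
    "start": ("start", "begin"),
    "stop": ("stop", "end"),
    "position": ("position", "pos", "midpoint"),
}


def open_maybe_gzipped(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_coordinate_columns(header_line: str) -> Dict[str, int]:
    """Map each coordinate role to its 0-based column in a tab-delimited header."""
    names = [name.strip().lower() for name in header_line.rstrip("\n").split("\t")]
    found: Dict[str, int] = {}
    for role, aliases in COLUMN_ALIASES.items():
        for index, name in enumerate(names):
            if name in aliases:
                found[role] = index
                break
    # A usable file needs a chromosome plus either a start or a position column.
    if "chrom" not in found or ("start" not in found and "position" not in found):
        raise ValueError("header needs a chromosome column and a start or position column")
    return found
```

Matching case-insensitively against a small alias list keeps the requirement simple: any header that names the columns in a recognizable way is accepted, and a file without one fails early instead of being converted incorrectly.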

--convert <integer>

If you already know the index number of the desired conversion (as listed under AVAILABLE CONVERSIONS), provide it here to skip the interactive prompt. Otherwise, the program interactively presents a list of available conversions from which to select.

--roman

Boolean flag indicating whether chromosome names should be converted from Arabic numerals to the Roman numerals used by SGD. The default is false.

--db <text>

When converting a bigWig file, provide a database name from which to collect chromosome names and sizes.

--version

Print the version number.

--help

Display this POD documentation.

DESCRIPTION

This script shifts coordinates to account for indels between one genome release and another. It accepts the formats listed under --in, including GFF, BED, and bigWig files. It was designed primarily for genomic data (microarray values, etc.) and simple segments, not complex features. Both start and stop coordinates are converted; strand is ignored. Changes in sequence or coding potential are not taken into account.
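One way to picture the coordinate shift is as a lookup in a per-chromosome breakpoint table. The sketch below assumes a hypothetical table of (first affected old-release coordinate, cumulative shift from there on) pairs with invented values; it is not the script's actual data format or implementation. Start and stop are shifted independently, mirroring the strand-agnostic behavior described above. Positions falling inside a deleted region are not treated specially in this sketch.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical breakpoint table: (first old-release coordinate affected,
# cumulative shift applied from that coordinate onward). Values are invented.
ShiftTable = List[Tuple[int, int]]


def shift_position(table: ShiftTable, position: int) -> int:
    """Return the new-release coordinate for an old-release position."""
    starts = [start for start, _ in table]
    index = bisect_right(starts, position) - 1
    if index < 0:
        return position  # upstream of every indel: unchanged
    return position + table[index][1]


def convert_interval(table: ShiftTable, start: int, stop: int) -> Tuple[int, int]:
    """Shift start and stop independently; strand is ignored."""
    return shift_position(table, start), shift_position(table, stop)


# Example: a 5 bp insertion at 1000, then a 3 bp deletion at 5000 (net +2).
example_table: ShiftTable = [(1000, 5), (5000, 2)]
```

Binary search over sorted breakpoints keeps each lookup logarithmic in the number of indels, which matters when converting millions of microarray probe positions.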

The chromosome name may also be converted to standard Roman numerals as used by SGD. Multiple chromosome name styles are supported: chr1, chr01, chrI, 1, or I.
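The name normalization described above could be sketched as follows. This sketch assumes that an optional "chr" prefix is stripped before parsing and that the Roman output keeps the prefix (chrI); the script's exact output style is not stated here. Non-nuclear sequences such as the mitochondrial genome are simply rejected in this sketch.

```python
import re

# The 16 nuclear chromosomes of S. cerevisiae, in order.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]


def chromosome_number(name: str) -> int:
    """Parse chr1, chr01, chrI, 1, or I (case-insensitive) into 1..16."""
    core = re.sub(r"^chr", "", name.strip(), flags=re.IGNORECASE)
    if core.isdigit():
        number = int(core)  # handles zero-padded names like chr01
    elif core.upper() in ROMAN:
        number = ROMAN.index(core.upper()) + 1
    else:
        raise ValueError(f"unrecognized chromosome name: {name}")
    if not 1 <= number <= 16:
        raise ValueError(f"chromosome number out of range: {name}")
    return number


def to_sgd_roman(name: str) -> str:
    """Return the Roman-numeral form, assuming a 'chr' prefix is wanted."""
    return "chr" + ROMAN[chromosome_number(name) - 1]
```

Parsing every style into one integer first, then formatting, avoids writing a separate conversion rule for each input style.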

A new converted file is written in the original format, with the date of the target genome version appended to the file name.

Only a limited number of version conversions are available. New conversions may be generated by comparing the genomic sequences of two releases with the EMBOSS utility diffseq, using a word size of 8 or 10 bp, and then processing the output to identify the indels and calculate the shifts, often with manual inspection of the alignments. This is not a trivial process.
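Once the indels have been identified, calculating the shifts amounts to a running sum. The sketch below assumes indels have already been extracted (by whatever means) as hypothetical (old-release position, length change) pairs, with positive values for insertions in the new release and negative values for deletions; parsing diffseq output is not shown.

```python
from typing import List, Tuple


def build_shift_table(indels: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Turn (old position, length change) indels into cumulative breakpoints.

    Each output entry is (old position, total shift from that position onward).
    Positive changes are insertions in the new release; negative are deletions.
    """
    table: List[Tuple[int, int]] = []
    total = 0
    for position, change in sorted(indels):
        total += change
        if table and table[-1][0] == position:
            table[-1] = (position, total)  # merge indels at the same coordinate
        else:
            table.append((position, total))
    return table
```

Sorting first means the indels can be collected in any order, for example chromosome region by region during manual inspection.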

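Check SGD (http://www.yeastgenome.org) to identify and obtain the available yeast genome version releases. In some cases, multiple conversions may be necessary to reach the desired version; for example, data from 20031001 (UCSC SacCer1) can be brought to 20110203 (UCSC SacCer3) by applying conversions 4, 7, and 10 in turn.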

AVAILABLE CONVERSIONS

1) 20050806 (SGD R43) to 20070113 (SGD R55)

2) 20051106 (SGD R46) to 20070113 (SGD R55)

3) 20060204 (SGD R52) to 20070113 (SGD R55)

4) 20031001 (SGD R27, UCSC SacCer1) to 20070113 (SGD R55)

5) 200601xx (SGD R52?) to 20070113 (SGD R55)

6) 20060916 (Pugh's GeneTrack, SGD R53) to 20070113 (SGD R55)

7) 20070113 (SGD R55) to 20100109 (SGD R63)

8) 20050806 (Perocchi...Steinmetz 2007, SGD R43?) to 20100109 (SGD R63)

9) 20070901 (Xu...Steinmetz 2009, SGD R56) to 20100109 (SGD R63)

10) 20100109 (SGD R63) to 20110203 (SGD R64, UCSC SacCer3)

11) 20080606 (SGD R61, UCSC SacCer2) to 20110203 (SGD R64, UCSC SacCer3)
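Because some targets can only be reached through several conversions, finding a usable sequence is a shortest-path search over the list above. The sketch below encodes the listed conversions by their release dates and runs a breadth-first search. Keying releases by date string is an assumption: entries sharing a date (conversions 1 and 8 both start from 20050806, with 8 marked as uncertain) are treated as the same release, which should be checked before relying on a chain.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Conversion index -> (source release date, target release date), per the list above.
CONVERSIONS: Dict[int, Tuple[str, str]] = {
    1: ("20050806", "20070113"),
    2: ("20051106", "20070113"),
    3: ("20060204", "20070113"),
    4: ("20031001", "20070113"),
    5: ("200601xx", "20070113"),
    6: ("20060916", "20070113"),
    7: ("20070113", "20100109"),
    8: ("20050806", "20100109"),
    9: ("20070901", "20100109"),
    10: ("20100109", "20110203"),
    11: ("20080606", "20110203"),
}


def conversion_chain(source: str, target: str) -> Optional[List[int]]:
    """Breadth-first search for the shortest sequence of conversion indices."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        release, path = queue.popleft()
        if release == target:
            return path
        for index in sorted(CONVERSIONS):
            start, end = CONVERSIONS[index]
            if start == release and end not in seen:
                seen.add(end)
                queue.append((end, path + [index]))
    return None  # target not reachable with the available conversions
```

Conversions only run forward in time here, so a request to go from a newer release back to an older one returns None.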

AUTHOR

Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.