NAME

gff3_to_ucsc_table.pl

A script to convert a GFF3 file to a UCSC style refFlat table

SYNOPSIS

gff3_to_ucsc_table.pl [--options...] <filename>

Options:
--in <filename>
--out <filename> 
--alias
--gz
--verbose
--version
--help

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify the input GFF3 file. The file may be compressed with gzip.

--out <filename>

Specify the output filename. By default it uses the input file base name appended with '.refFlat'.

--alias

Specify that any additional aliases, including the primary_ID, should be appended to the gene name. They are concatenated using the pipe "|" symbol.

--gz

Specify whether (or not) the output file should be compressed with gzip. The default is to mimic the status of the input file

--verbose

Specify that extra information be printed as the GFF3 file is parsed.

--version

Print the version number.

--help

Display this POD documentation.

DESCRIPTION

This program will convert a GFF3 annotation file to a UCSC-style gene table, using the refFlat format. This includes transcription and translation start and stops, as well as exon start and stops, but does not include coding exon frames. See the documentation at http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat for more information.

The program assumes the input GFF3 file includes standard parent->child relationships using primary IDs and primary tags, including gene, mRNA, exon, CDS, and UTRs. Non-standard genes, including non-coding RNAs, will also be processed too. Chromosomes, contigs, and embedded sequence are ignored. Non-pertinent features are safely ignored but reported. Most pragmas are ignored, except for close feature pragmas (###), which will aid in processing very large files. Multiple parentage and shared features, for example exons common to multiple alternative transcripts, are properly handled. See the documentation for the GFF3 file format at http://www.sequenceontology.org/resources/gff3.html for more information.

Previous versions of this script attempted to export in the UCSC genePredExt table format, often with inaccurate results. Users who need this format should investigate the gff3ToGenePred program available at http://hgdownload.cse.ucsc.edu/admin/exe/.

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.