NAME

gadfly_to_gff.pl - Massage Gadfly's GFF format into a form suitable for Bio::DB::GFF

SYNOPSIS

perl gadfly_to_gff.pl /path/to/gadfly/release/files > gadfly.gff

DESCRIPTION

This script massages the FlyBase/Gadfly GFF files located at ftp://ftp.fruitfly.org/pub/genomic/gadfly/ into the "correct" version of the GFF format for use with Bio::DB::GFF. This lets you view the Drosophila annotations with the Generic Genome Browser (http://www.gmod.org).

To use this script, get the Gadfly GFF distribution archive that is organized by GenBank accession unit (for example, "RELEASE2GFF.tar.gz"). Unpacking it yields a directory named after the release, e.g. RELEASE2.

Give that directory as the argument to this script, and capture the script's output to a file:

% gadfly_to_gff.pl ./RELEASE2 > gadfly.gff

The gadfly.gff file can then be loaded into a Bio::DB::GFF database using the following command:

% bulk_load_gff.pl -d <databasename> gadfly.gff

The resulting database will have the following feature types (represented as "method:source"):

Component:arm              A chromosome arm
Component:scaffold         A chromosome scaffold (accession #)
Component:gap              A gap in the assembly
clone:clonelocator         A BAC clone
gene:gadfly                A gene accession number
transcript:gadfly          A transcript accession number
translation:gadfly         A translation
codon:gadfly               Significance unknown
exon:gadfly                An exon
symbol:gadfly              A classical gene symbol
similarity:blastn          A BLASTN hit
similarity:blastx          A BLASTX hit
similarity:sim4            EST->genome using SIM4
similarity:groupest        EST->genome using GROUPEST
similarity:repeatmasker    A repeat
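
As a quick sanity check before loading, the feature types present in gadfly.gff can be tallied from the shell. The following is a minimal sketch and not part of this script: it assumes the standard tab-delimited GFF layout, in which column 2 holds the source and column 3 holds the method, and the function name count_gff_types is hypothetical.

```shell
# Tally features by "method:source" in a GFF file.
# Assumption: tab-delimited GFF with source in column 2 and method in
# column 3; lines starting with "#" are comments.
count_gff_types() {
    awk -F'\t' '
        /^#/ || NF < 3 { next }      # skip comments and malformed lines
        { n[$3 ":" $2]++ }           # key is method:source
        END { for (t in n) printf "%d\t%s\n", n[t], t }
    ' "$1" | sort -k2
}

# Usage: count_gff_types gadfly.gff
```

Each output line is a count followed by a type name. If the conversion worked as described, the names should match entries in the table above.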

AUTHOR

Lincoln Stein <lstein@cshl.org>