NAME
gadfly_to_gff.pl - Massage Gadfly's GFF format into a form suitable for Bio::DB::GFF
SYNOPSIS
perl gadfly_to_gff.pl /path/to/gadfly/release/files > gadfly.gff
DESCRIPTION
This script massages the FlyBase/Gadfly GFF files located at ftp://ftp.fruitfly.org/pub/genomic/gadfly/ into the "correct" version of the GFF format for use with Bio::DB::GFF. This lets you view the Drosophila annotations with the Generic Genome Browser (http://www.gmod.org).
To use this script, get the Gadfly GFF distribution archive that is organized by GenBank accession unit (e.g., "RELEASE2GFF.tar.gz"). Unpacking it will yield a directory named after the release, e.g., RELEASE2.
Give that directory as the argument to this script, and capture the script's output to a file:
% gadfly_to_gff.pl ./RELEASE2 > gadfly.gff
The gadfly.gff file can then be loaded into a Bio::DB::GFF database using the following command:
% bulk_load_gff.pl -d <databasename> gadfly.gff
The resulting database will have the following feature types (represented as "method:source"):
  Component:arm              A chromosome arm
  Component:scaffold         A chromosome scaffold (accession #)
  Component:gap              A gap in the assembly
  clone:clonelocator         A BAC clone
  gene:gadfly                A gene accession number
  transcript:gadfly          A transcript accession number
  translation:gadfly         A translation
  codon:gadfly               Significance unknown
  exon:gadfly                An exon
  symbol:gadfly              A classical gene symbol
  similarity:blastn          A BLASTN hit
  similarity:blastx          A BLASTX hit
  similarity:sim4            EST->genome using SIM4
  similarity:groupest        EST->genome using GROUPEST
  similarity:repeatmasker    A repeat
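Before loading, it can be useful to check which feature types the converted file actually contains. The following is a minimal sketch, not part of this script: it assumes the output follows the usual tab-separated GFF layout, with the source in column 2 and the method in column 3, and the function name count_feature_types is hypothetical.

```shell
# Hypothetical helper: count features per "method:source" pair in a GFF file.
# Assumes tab-separated GFF columns where 2 = source and 3 = method.
count_feature_types() {
    awk -F '\t' '
        /^#/ || NF < 8 { next }      # skip comment lines and short lines
        { n[$3 ":" $2]++ }           # key is method:source
        END { for (k in n) printf "%s\t%d\n", k, n[k] }
    ' "$1" | sort
}
```

Calling count_feature_types gadfly.gff prints one line per type, each with the "method:source" label and the number of features found, which can be compared against the list above.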
AUTHOR
Lincoln Stein <lstein@cshl.org>