NAME
process_gadfly.pl - Massage Gadfly/FlyBase GFF files into a version suitable for the Generic Genome Browser
SYNOPSIS
% process_gadfly.pl ./RELEASE2 > gadfly.gff
DESCRIPTION
This script massages the FlyBase/Gadfly GFF files located at ftp://ftp.fruitfly.org/pub/genomic/gadfly/ into the "correct" version of the GFF format, that is, one that can be loaded into a Bio::DB::GFF database for use with the Generic Genome Browser.
To use this script, get the Gadfly GFF distribution archive that is organized by GenBank accession unit (e.g. "RELEASE2GFF.tar.gz"). Unpacking it yields a directory named after the release, e.g. RELEASE2.
Give that directory as the argument to this script, and capture the script's output to a file:
% process_gadfly.pl ./RELEASE2 > gadfly.gff
The gadfly.gff file can then be loaded into a Bio::DB::GFF database using the following command:
% bulk_load_gff.pl -d <databasename> gadfly.gff
The resulting database will have the following feature types (represented as "method:source"):
    Component:arm              A chromosome arm
    Component:scaffold         A chromosome scaffold (accession #)
    Component:gap              A gap in the assembly
    clone:clonelocator         A BAC clone
    gene:gadfly                A gene accession number
    transcript:gadfly          A transcript accession number
    translation:gadfly         A translation
    codon:gadfly               Significance unknown
    exon:gadfly                An exon
    symbol:gadfly              A classical gene symbol
    similarity:blastn          A BLASTN hit
    similarity:blastx          A BLASTX hit
    similarity:sim4            EST->genome using SIM4
    similarity:groupest        EST->genome using GROUPEST
    similarity:repeatmasker    A repeat
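Before loading, it can be useful to check which of these feature types actually appear in gadfly.gff. The following is a minimal Python sketch, not part of this distribution, that tallies "method:source" pairs. It assumes the output is standard tab-delimited GFF, with the source in column 2 and the method in column 3; the function name count_feature_types is hypothetical.

```python
import sys
from collections import Counter


def count_feature_types(lines):
    """Tally "method:source" pairs from tab-delimited GFF lines.

    Assumes standard GFF column order: column 2 is the source and
    column 3 is the method (feature type).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A feature line has at least 8 columns; ignore anything shorter.
        if len(fields) < 8:
            continue
        source, method = fields[1], fields[2]
        counts[f"{method}:{source}"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_types.py gadfly.gff
    with open(sys.argv[1]) as handle:
        for feature_type, n in sorted(count_feature_types(handle).items()):
            print(f"{feature_type}\t{n}")
```

Run against gadfly.gff, this prints one line per feature type with its count, which can be compared against the list above.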
SEE ALSO
Bio::DB::GFF, bulk_load_gff.pl, load_gff.pl
AUTHOR
Lincoln Stein <lstein@cshl.org>.
Copyright (c) 2002 Cold Spring Harbor Laboratory
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty.