NAME

process_gadfly.pl - Massage Gadfly/FlyBase GFF files into a version suitable for the Generic Genome Browser

SYNOPSIS

% process_gadfly.pl ./RELEASE2 > gadfly.gff

DESCRIPTION

This script massages the FlyBase/Gadfly GFF files located at ftp://ftp.fruitfly.org/pub/genomic/gadfly/ into the flavor of GFF expected by Bio::DB::GFF and the Generic Genome Browser.

To use this script, download the Gadfly GFF distribution archive, which is organized by GenBank accession unit (e.g. "RELEASE2GFF.tar.gz"). Unpacking it yields a directory named after the release, e.g. RELEASE2.

Pass that directory to the script as its argument, and redirect the script's output to a file:

% process_gadfly.pl ./RELEASE2 > gadfly.gff

The gadfly.gff file can then be loaded into a Bio::DB::GFF database using the following command:

% bulk_load_gff.pl -d <databasename> gadfly.gff

The resulting database will have the following feature types (represented as "method:source"):

Component:arm              A chromosome arm
Component:scaffold         A chromosome scaffold (accession #)
Component:gap              A gap in the assembly
clone:clonelocator         A BAC clone
gene:gadfly                A gene accession number
transcript:gadfly          A transcript accession number
translation:gadfly         A translation
codon:gadfly               Significance unknown
exon:gadfly                An exon
symbol:gadfly              A classical gene symbol
similarity:blastn          A BLASTN hit
similarity:blastx          A BLASTX hit
similarity:sim4            EST->genome using SIM4
similarity:groupest        EST->genome using GROUPEST
similarity:repeatmasker    A repeat
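Each "method:source" pair above is built from two GFF columns: the method (column 3) and the source (column 2). As a quick sanity check before or after bulk loading, a minimal Perl sketch like the following can tally the types actually present in gadfly.gff. It assumes the output is tab-delimited GFF with at least eight columns per feature line; the script name count_types.pl and its subroutines are hypothetical and are not part of the Bio::DB::GFF distribution.

```perl
#!/usr/bin/perl
# count_types.pl (hypothetical helper, not shipped with Bio::DB::GFF):
# tally the "method:source" feature types found in one or more GFF files.
use strict;
use warnings;

# Return "method:source" for one GFF line, or nothing for comments,
# blank lines, and lines with fewer than eight tab-separated columns.
# Columns: seqname, source, method, start, end, score, strand, phase, group.
sub feature_type {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;
    my @cols = split /\t/, $line;
    return unless @cols >= 8;
    return "$cols[2]:$cols[1]";
}

# Read GFF lines from an open filehandle and count each feature type.
sub tally_types {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        my $type = feature_type($line);
        $count{$type}++ if defined $type;
    }
    return \%count;
}

# Usage (assumed): perl count_types.pl gadfly.gff
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = tally_types($fh);
    close $fh;
    printf "%-28s %d\n", $_, $count->{$_} for sort keys %$count;
}
```

Running it on gadfly.gff prints one line per feature type with its count; the types should match the list above, though the counts depend on the release being processed.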

SEE ALSO

Bio::DB::GFF, bulk_load_gff.pl, load_gff.pl

AUTHOR

Lincoln Stein <lstein@cshl.org>.

Copyright (c) 2002 Cold Spring Harbor Laboratory

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty.