buildGFF3FromEnsembl.pl [-h|--f] [--output <output_file>] [--est] <genome>
The mandatory argument is a genome which is indexed in Ensembl GB.
For example:
'Homo sapiens' for Human (default),
'Pan troglodytes' for Chimpanzee,
'Mus musculus' for Mouse,
'Macaca mulatta' for Macaque,
'Pongo pygmaeus' for Orangutan,
etc. (see http://www.ensembl.org/info/about/species.html)
--output: name of the file to write the GFF3 output to (STDOUT by default)
--est: build the GFF3 from the Ensembl OtherFeatures DB through the Ensembl API (Core DB by default)
OPTIONS
-h, --help, --fullhelp
--output=I<output_file>
--est
Writes a GFF3 file to <output_file>, with the following columns:
column 1: <seqname>
The name of the sequence. Commonly, this is the chromosome ID or
contig ID. Note that the coordinates used must be unique within
each sequence name in all GTFs for an annotation set.
column 2: <source>
The source column should be a unique label indicating where the
annotations came from; here, Ensembl.
column 3: <feature>
exon, cds, five, three, gene or mRNA
column 4: <start exon>
Start coordinates of the feature relative to the beginning of the
sequence named in <seqname>.
column 5: <end exon>
End coordinates of the feature relative to the beginning of the
sequence named in <seqname>.
column 6: <score>
Always "." (no score is reported).
column 7: <strand>
Strand of the exon relative to the genome, i.e. - or +.
column 8: <frame>
Always "." (no frame is reported).
column 9: a list of <key "value"> pairs separated by a semicolon ";".
Every record ends with the same three mandatory attributes
(all other attributes are optional):
-ID=value A globally unique identifier for the feature.
-Parent=value1,...,valueN A list of identifier(s) for the parent(s) of the feature.
-Name=value The HGNC name of the gene
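As an illustration of how these attributes can be read back, here is a minimal Python sketch that splits an attribute column into a dictionary. It assumes the column is written as semicolon-separated key=value pairs, as the list above suggests; the function name parse_attributes and the sample identifiers are hypothetical and do not come from the script.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (assumes key=value pairs)."""
    attributes = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate a trailing semicolon
        key, _, value = field.partition("=")
        attributes[key] = value.strip('"')
    # Parent may hold several comma-separated identifiers.
    if "Parent" in attributes:
        attributes["Parent"] = attributes["Parent"].split(",")
    return attributes


# Hypothetical attribute column, made up for illustration only.
example = "ID=exon1;Parent=tx1,tx2;Name=GENE1"
print(parse_attributes(example))
```

Parent is returned as a list because a feature may have more than one parent; ID and Name stay plain strings.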
This script defines the following attributes:
-transcripts_nb=value The number of transcripts contained in the gene
-exons_nb=value The number of exons contained in the transcript/gene
-exon_rank=value The rank of the exon contained in the gene
-type "prefix:value" The nature of the mRNA where the "prefix"
represents a first class level (protein_coding,
small_ncRNA, lincRNA, other_lncRNA, other_noncodingRNA)
and "value" is the biotype defined by Ensembl.
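The "prefix:value" layout of the type attribute can be taken apart with a few lines of Python. This is only a sketch: the helper name split_type is hypothetical, and the sample values are illustrative guesses at what the script might emit, not actual output.

```python
def split_type(type_value):
    """Return (first_class_level, biotype) from a 'prefix:value' string."""
    # Drop surrounding double quotes, if the value was written quoted.
    cleaned = type_value.strip().strip('"')
    prefix, separator, biotype = cleaned.partition(":")
    if not separator:
        raise ValueError("expected 'prefix:value', got %r" % type_value)
    return prefix, biotype


# Illustrative values only; the real pairing of class and biotype
# is decided by the script and by Ensembl.
for value in ('"protein_coding:protein_coding"', "small_ncRNA:miRNA"):
    first_class, biotype = split_type(value)
    print(first_class, biotype)
```

Splitting on the first colon only keeps the biotype intact even if it were to contain a colon itself.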
REQUIRES
Perl5.
Bio::EnsEMBL
Getopt::Long
Pod::Usage
AUTHOR
Nicolas PHILIPPE <nicolas.philippe@inserm.fr>