NAME
Bio::ToolBox::db_helper::gff3_parser
DESCRIPTION
This module parses a GFF3 file into SeqFeature objects. Child features are attached to their parents as sub-SeqFeature objects, provided that each child carries a Parent attribute that correctly matches the unique ID attribute of its parent. Any feature without a Parent attribute is assumed to be a top-level (parent) feature. Child features that reference a parent feature that has not been loaded may be lost.
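The ID/Parent linking described above can be illustrated with plain Perl hashes. This is a minimal sketch, not the module's implementation: the records are hypothetical hash refs with ID and Parent keys rather than Bio::SeqFeature::Lite objects, and assemble_hierarchy() is a made-up helper. It indexes IDs first, then attaches children, and collects children whose parent is missing as orphans.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gff3_parser): link records to their
# parents by matching each Parent value against the ID values seen.
sub assemble_hierarchy {
    my @records = @_;

    # First pass: index every record that carries an ID.
    my %by_id;
    foreach my $rec (@records) {
        $rec->{children} = [];
        $by_id{ $rec->{ID} } = $rec if defined $rec->{ID};
    }

    # Second pass: records without a Parent are top-level; others are
    # attached to each listed parent (GFF3 allows comma-separated parents).
    my (@top, @orphans);
    foreach my $rec (@records) {
        if (not defined $rec->{Parent}) {
            push @top, $rec;
            next;
        }
        foreach my $pid (split /,/, $rec->{Parent}) {
            if (exists $by_id{$pid}) {
                push @{ $by_id{$pid}{children} }, $rec;
            }
            else {
                push @orphans, $rec;    # parent was never loaded
            }
        }
    }
    return (\@top, \@orphans);
}

my @input = (
    { ID => 'gene1' },
    { ID => 'mRNA1', Parent => 'gene1' },
    { Parent => 'mRNA1' },    # an exon without an ID
    { Parent => 'gene9' },    # its parent is not in the input
);
my ($top, $orphans) = assemble_hierarchy(@input);
printf "%d top-level, %d orphaned\n", scalar @$top, scalar @$orphans;
# 1 top-level, 1 orphaned
```

Indexing all IDs before linking means a child may appear before its parent within the same batch of records; a child whose parent is absent from the batch ends up in the orphan list, which is the situation the description warns about.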
Embedded FASTA sequences are ignored, as are most comment and pragma lines.
Including close directives (###) in the GFF3 file is strongly encouraged, because they limit how much of the file is parsed at a time; without them, the entire file will be slurped into memory. Refer to the GFF3 specification at http://www.sequenceontology.org for more details.
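Why close directives bound memory use can be sketched with a reader that stops at each ### line, so that only one group of lines is held at a time. next_chunk() is a hypothetical helper, not part of this module; it skips comments and pragmas and simply discards any ##FASTA section, which is a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper: return an array ref of feature lines up to the next
# "###" close directive or end of file, or undef once input is exhausted.
sub next_chunk {
    my ($fh) = @_;
    return undef if eof $fh;
    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line =~ /^###\s*$/;              # close directive ends the group
        if ($line =~ /^##FASTA/) {                # embedded sequence: discard rest
            1 while <$fh>;
            last;
        }
        next if $line =~ /^#/ or $line !~ /\S/;   # comments, pragmas, blanks
        push @lines, $line;
    }
    return \@lines;
}

my $gff = "##gff-version 3\n"
        . "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n"
        . "###\n"
        . "chr1\t.\tgene\t200\t300\t.\t-\t.\tID=g2\n";
open my $fh, '<', \$gff or die "cannot open string: $!";

my @chunks;
while (my $chunk = next_chunk($fh)) {
    push @chunks, $chunk;
}
print scalar(@chunks), " chunks\n";
```

Returning an array ref, rather than a list, lets an empty group between two consecutive ### lines be distinguished from end of input.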
The SeqFeature objects that are returned are Bio::SeqFeature::Lite objects. Refer to that documentation for more information.
SYNOPSIS
use Bio::ToolBox::db_helper::gff3_parser;
my $filename = 'file.gff3';
my $parser = Bio::ToolBox::db_helper::gff3_parser->new($filename) or
    die "unable to open gff file!\n";
while (my @top_features = $parser->top_features()) {
    while (@top_features) {
        my $feature = shift @top_features;
        # each $feature is a Bio::SeqFeature::Lite object
        my @children = $feature->get_SeqFeatures();
    }
}
METHODS
- new()
- new($file)
Initialize a new gff3_parser object.
Optionally pass the name of the GFF3 file, and it will be automatically opened by calling parse_file().
- parse_file($file)
Pass the name of a GFF3 file to be parsed. The file must have a .gff or .gff3 extension, and may optionally be gzipped (.gz extension).
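The naming rule above can be expressed as a single regular expression. The following is a hedged sketch with hypothetical helpers: is_gff3_filename() checks the documented extensions, and open_gff3() opens the file. How the module itself decompresses .gz input is not stated here; using the core IO::Uncompress::Gunzip module is an assumption of this sketch.

```perl
use strict;
use warnings;
use IO::File;
use IO::Uncompress::Gunzip;

# Hypothetical check of the documented rule: the name must end in
# .gff or .gff3, optionally followed by .gz.
sub is_gff3_filename {
    my ($file) = @_;
    return $file =~ /\.gff3?(?:\.gz)?\z/ ? 1 : 0;
}

# Hypothetical opener (assumption): gzipped files are read through
# IO::Uncompress::Gunzip, plain files through IO::File.
sub open_gff3 {
    my ($file) = @_;
    die "not a .gff/.gff3 file name: $file\n" unless is_gff3_filename($file);
    my $fh = $file =~ /\.gz\z/
        ? IO::Uncompress::Gunzip->new($file)
        : IO::File->new($file, '<');
    die "cannot open $file\n" unless $fh;
    return $fh;
}

foreach my $name (qw(genes.gff3 genes.gff.gz genes.gtf)) {
    printf "%-13s %s\n", $name, is_gff3_filename($name) ? 'accepted' : 'rejected';
}
```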
- fh()
- fh($filehandle)
This method returns the IO::File object for the opened GFF file. A different file may be parsed by passing an already opened IO::File object, or any other object that inherits IO::Handle methods.
- next_feature()
This method returns a Bio::SeqFeature::Lite object representing the next feature in the file. Parent-child relationships are NOT assembled, so this method is best used with simple GFF files that contain no feature hierarchies. It may be called in a while loop until the end of the file is reached. Pragmas are ignored, and comment lines and sequences are skipped automatically.
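The one-feature-at-a-time behaviour can be sketched with plain Perl. next_record() is a hypothetical helper that returns the nine raw columns of the next feature line instead of a Bio::SeqFeature::Lite object; like next_feature(), it makes no attempt to link children to parents. Skipping lines that do not have nine columns is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical reader: return the nine columns of the next feature line as
# an array ref, or undef at the end of input or at a ##FASTA section.
sub next_record {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        return undef if $line =~ /^##FASTA/;      # sequence section: stop
        next if $line =~ /^#/ or $line !~ /\S/;   # comments, pragmas, ###, blanks
        my @cols = split /\t/, $line;
        next unless @cols == 9;                   # skip malformed lines
        return \@cols;
    }
    return undef;
}

my $gff = "##gff-version 3\n"
        . "# a comment\n"
        . "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n"
        . "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n"
        . "##FASTA\n"
        . ">chr1\n"
        . "ACGT\n";
open my $fh, '<', \$gff or die "cannot open string: $!";

my @types;
while (my $cols = next_record($fh)) {
    push @types, $cols->[2];    # column 3 is the feature type
}
print "@types\n";   # gene mRNA
```

Note that the mRNA comes back as an independent record even though it names g1 as its parent.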
- top_features()
This method returns an array of the top-level (parent) features defined in the GFF3 file. The file is parsed progressively from the beginning until either a close-features directive (###) or the end of the file is reached. Features with a Parent attribute are attached to the corresponding parent feature, if that parent was loaded.
When close pragmas are present in the file, call this method repeatedly to finish parsing the remainder of the file.
- from_gff3_string($string)
This method parses a GFF3-formatted string or line of text and returns a Bio::SeqFeature::Lite object.
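The structure of a GFF3 line can be shown with a hypothetical parse_gff3_line() that returns a plain hash instead of a Bio::SeqFeature::Lite object. It splits the nine tab-separated columns, then splits column 9 into tag=value pairs on ";" and multiple values on ",". Attribute values are left percent-encoded here; decoding them is the job of unescape().

```perl
use strict;
use warnings;

# Hypothetical parser for one GFF3 line; returns a hash ref, not a
# Bio::SeqFeature::Lite object. Attribute values stay percent-encoded.
sub parse_gff3_line {
    my ($string) = @_;
    chomp $string;
    my @cols = split /\t/, $string;
    die "expected 9 tab-separated columns\n" unless @cols == 9;

    my %feature;
    @feature{qw(seq_id source type start end score strand phase)} = @cols[0 .. 7];

    # Column 9: tag=value pairs separated by ";", values separated by ","
    foreach my $pair (split /;/, $cols[8]) {
        next unless $pair =~ /=/;    # skips "." (no attributes)
        my ($tag, $value) = split /=/, $pair, 2;
        $feature{attributes}{$tag} = [ split /,/, $value ];
    }
    return \%feature;
}

my $f = parse_gff3_line("chr1\t.\texon\t10\t50\t.\t+\t.\tID=e1;Parent=m1,m2");
print "$f->{type} $f->{start}-$f->{end} parents: @{ $f->{attributes}{Parent} }\n";
# exon 10-50 parents: m1 m2
```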
- unescape($text)
This method unescapes special characters in a text string. Certain characters, including ";" and "=", are reserved for GFF3 formatting and may not appear literally in attribute values; they must instead be escaped using URL-style percent encoding (for example, %3B for ";" and %3D for "=").
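Percent decoding itself is a one-line substitution. The sketch below uses a hypothetical unescape_gff3() function to illustrate the technique; it is not the module's own code. Each %XX hexadecimal escape is replaced by the character with that code.

```perl
use strict;
use warnings;

# Minimal percent-decoding sketch: replace every %XX hex escape with
# the corresponding character.
sub unescape_gff3 {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print unescape_gff3('Dbxref%3Dfoo%3Bbar%2Cbaz'), "\n";   # Dbxref=foo;bar,baz
```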
AUTHOR
Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.