NAME

get_feature_info.pl

A script to collect feature information from a BioPerl SeqFeature::Store database.

SYNOPSIS

get_feature_info.pl <filename>

Options:
--in <filename> 
--db <name>
--attrib <attribute1,attribute2,...>
--type <primary_tag>
--out <filename>
--gz
--version
--help

Attributes include:
 Chromosome
 Start
 Stop
 Strand
 Score
 Name
 Alias
 Note
 Type
 Primary_tag
 Source
 Length
 Midpoint
 Phase
 RNA_count
 Exon_count
 Transcript_length (sum of exon lengths)
 Parent (name)
 Primary_ID
 <tag>

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify the file name of a previously generated feature dataset. It should be in the tim data format that is generated by this program and others, although other tab-delimited text data formats may be usable. See the file description in Bio::ToolBox::file_helper.

--db <name>

Specify the name of the BioPerl GFF database to use as the source. This is required when generating new feature data files. For pre-existing input data files, this argument is optional; if given, it overrides the database listed in the file, which is useful for collecting data from multiple databases.

--attrib <attribute>

Specify the attribute to collect for each feature. Multiple attributes may be given as a comma-delimited list. Standard GFF attributes may be collected, as well as values from specific group tags, which are found in the group (ninth) column of the source GFF file. Standard attributes include the following:

- Chromosome
- Start
- Stop
- Strand
- Score
- Name
- Alias
- Note
- Type
- Primary_tag
- Source
- Length
- Midpoint
- Phase
- RNA_count (number of RNA subfeatures)
- Exon_count (number of exon, or CDS, subfeatures)
- Transcript_length
- Parent (name)
- Primary_ID
- <tag>

If --attrib is not specified on the command line, an interactive list is presented to the user for selection. This is especially useful when you cannot remember the feature's tag keys in the database.
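
The derived attributes Length, Midpoint, and Transcript_length can be illustrated with a minimal sketch. This is not taken from the program's source: the subroutine names and the feature record are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF, and truncating the midpoint to an integer is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical feature record with 1-based, inclusive coordinates.
my %feature = (
    start => 1001,
    stop  => 2000,
    exons => [ [ 1001, 1200 ], [ 1501, 2000 ] ],
);

# Length: number of bases spanned, counting both end points.
sub feature_length {
    my ($start, $stop) = @_;
    return $stop - $start + 1;
}

# Midpoint: assumed to be the integer-truncated mean of start and stop.
sub feature_midpoint {
    my ($start, $stop) = @_;
    return int( ($start + $stop) / 2 );
}

# Transcript_length: sum of the exon lengths.
sub transcript_length {
    my @exons = @_;
    return sum( 0, map { feature_length(@$_) } @exons );
}

printf "Length: %d\n",   feature_length( @feature{qw(start stop)} );
printf "Midpoint: %d\n", feature_midpoint( @feature{qw(start stop)} );
printf "Transcript_length: %d\n", transcript_length( @{ $feature{exons} } );
```

Note that Transcript_length excludes introns, so it is generally smaller than Length for a spliced transcript.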

--type <primary_tag>

When the input file does not have a type column, a type or primary_tag may be provided. This is especially useful to restrict the database search when there are multiple features with the same name.

--out <filename>

Optionally specify an alternate output file name. The default is to overwrite the input file.

--gz

Indicate whether the output file should (not) be compressed by gzip. If compressed, the extension '.gz' is appended to the filename. If a compressed file is opened, the compression status is preserved unless specified otherwise.
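
The naming rule described above can be sketched as follows. The subroutine output_name and its arguments are hypothetical and do not come from the program; removing an existing '.gz' extension when compression is turned off is an assumption.

```perl
use strict;
use warnings;

# Decide the output file name from the requested name and compression state.
# $gz is a hypothetical option value: 1 (compress), 0 (do not compress),
# or undef (not specified, so the input file's compression status is kept).
sub output_name {
    my ($name, $input_was_gz, $gz) = @_;
    my $compress = defined $gz ? $gz : $input_was_gz;
    $name =~ s/\.gz$//;              # start from the uncompressed name
    $name .= '.gz' if $compress;     # append the extension when compressing
    return $name;
}

print output_name('features.txt',    0, 1),     "\n";   # compression requested
print output_name('features.txt.gz', 1, undef), "\n";   # status preserved
print output_name('features.txt.gz', 1, 0),     "\n";   # compression turned off
```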

--version

Print the version number.

--help

Display this help.

DESCRIPTION

This program collects attributes for a list of features from the database. The attributes may be general attributes, such as chromosome, start, stop, and strand, or feature-specific attributes stored in the group field of the original source GFF file.
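
To show where feature-specific attributes come from, here is a minimal sketch that splits a group (ninth) column into tags and values. It assumes GFF3-style "tag=value" pairs separated by semicolons, with commas separating multiple values; the subroutine parse_group_field and the example line are hypothetical. The program itself reads these values from the database and does not parse the GFF file in this way.

```perl
use strict;
use warnings;

# Parse a GFF3-style group field into a hash of tag => [ values ].
# Older GFF2 group fields use a different syntax and are not handled here.
sub parse_group_field {
    my ($group) = @_;
    my %tags;
    for my $pair ( split /;/, $group ) {
        next unless $pair =~ /\S/;                 # skip empty fragments
        my ( $tag, $value ) = split /=/, $pair, 2;
        next unless defined $value;                # skip malformed pairs
        $tag =~ s/^\s+|\s+$//g;                    # trim stray whitespace
        for my $v ( split /,/, $value ) {
            $v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo URL escaping
            push @{ $tags{$tag} }, $v;
        }
    }
    return \%tags;
}

# Hypothetical group field for a single feature.
my $tags = parse_group_field(
    'ID=gene01;Name=YFG1;Alias=ABC1,XYZ2;Note=hypothetical%20protein');

for my $tag ( sort keys %$tags ) {
    print "$tag: ", join( ', ', @{ $tags->{$tag} } ), "\n";
}
```

Any tag key that appears in this column, such as Alias or Note in the example, corresponds to the <tag> entry in the attribute list.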

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.