NAME

get_intersecting_features.pl

A script to pull out overlapping features from the database.

SYNOPSIS

get_intersecting_features.pl [--options] <filename>

Options:
--in <filename>
--db <database>
--feature <text>
--start <integer>
--stop <integer>
--extend <integer>
--ref [start | mid]
--out <filename>
--gz
--version
--help

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify an input file containing either a list of database features or genomic coordinates for which to collect data. The file should be a tab-delimited text file, one row per feature, with columns representing feature identifiers, attributes, coordinates, and/or data values. The first row should be column headers. BED files are acceptable, as are text files generated by other BioToolBox scripts. Files may be gzip compressed.

--db <database>

Specify the name of a Bio::DB::SeqFeature::Store annotation database from which gene or feature annotation may be derived. A database is required for generating new data files with features. This option may be skipped when using coordinate information from an input file (e.g. BED file), or when using an existing input file with the database indicated in the metadata. For more information about using annotation databases, see https://code.google.com/p/biotoolbox/wiki/WorkingWithDatabases.

--feature <text>

Specify the name of the target features to search for in the database that intersect with the list of reference features. The type may be either a GFF "type" or a "type:method" string. If not specified, the database will be queried for potential GFF types and a list presented to the user for selection.

--start <integer>, --stop <integer>

Optionally specify the relative start and stop positions from the 5' end (or start coordinate for non-stranded features) with which to restrict the region when searching for target features. For example, specify "--start=-200 --stop=0" to restrict to the promoter region of genes. Both positions must be specified. Default is to take the entire region of the reference feature.
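
The strand-aware adjustment described above can be sketched as follows. This is an illustrative Python sketch, not the script's own Perl code; the function name restrict_region, the 1-based inclusive coordinates, and the strand encoding (+1, 0, -1) are assumptions made for the example.

```python
def restrict_region(start, stop, strand, rel_start, rel_stop):
    """Return the (start, stop) search region relative to the 5' end.

    Assumed conventions (not taken from the script itself):
    coordinates are 1-based inclusive, strand is +1, 0 (non-stranded), or -1.
    """
    if strand >= 0:
        # Forward or non-stranded: the 5' end is the start coordinate.
        return start + rel_start, start + rel_stop
    # Reverse strand: the 5' end is the stop coordinate, so offsets are mirrored.
    return stop - rel_stop, stop - rel_start


# A forward-strand gene at 1000..2000 with --start=-200 --stop=0
print(restrict_region(1000, 2000, 1, -200, 0))   # (800, 1000)
# The same gene on the reverse strand
print(restrict_region(1000, 2000, -1, -200, 0))  # (2000, 2200)
```

In both cases the resulting region covers the 200 bp upstream of the gene's 5' end, which is why the offsets are mirrored on the reverse strand.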

--extend <integer>

Optionally specify the number of bp to extend the reference feature's region on each side. Useful when you have small reference regions and you want to include a larger search region.
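
As a rough illustration of the extension, the following Python sketch widens a region by the same number of bp on each side. The function name extend_region is hypothetical, and clamping the start at position 1 is an assumption; the sketch does not check the chromosome length, and the script's actual boundary handling may differ.

```python
def extend_region(start, stop, extend):
    """Widen a 1-based inclusive region by `extend` bp on each side.

    Assumption: the start is clamped at 1; the upper bound is not checked
    because the chromosome length is not known in this sketch.
    """
    new_start = max(1, start - extend)
    new_stop = stop + extend
    return new_start, new_stop


# A small 101 bp reference region searched with --extend 1000
print(extend_region(5000, 5100, 1000))  # (4000, 6100)
# Near the chromosome start, the region is clamped at position 1
print(extend_region(500, 600, 1000))    # (1, 1600)
```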

--ref [start | mid]

Indicate the reference point from which to calculate the distance between the reference and target features. The same reference point is used for both features. Valid options include "start" (or 5' end for stranded features) and "mid" (for midpoint). Default is "start".
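
The two reference points can be sketched in Python as shown below. The function names, the integer midpoint rounding, and the sign convention (target point minus reference point) are assumptions for illustration only; the script may orient or round the distance differently.

```python
def reference_point(start, stop, strand, ref="start"):
    """Return the coordinate used for distance measurement.

    "start" is the start coordinate, or the 5' end for stranded features;
    "mid" is the midpoint (integer division is an assumption).
    """
    if ref == "mid":
        return (start + stop) // 2
    if ref == "start":
        return stop if strand < 0 else start
    raise ValueError(f"unknown reference point: {ref!r}")


def distance(reference, target, ref="start"):
    """Distance from reference to target, using the same point for both.

    Each feature is a (start, stop, strand) tuple. The sign convention
    (target minus reference) is an assumption.
    """
    return reference_point(*target, ref) - reference_point(*reference, ref)


reference = (1000, 2000, 1)   # forward strand, 5' end at 1000
target = (1500, 2500, -1)     # reverse strand, 5' end at 2500
print(distance(reference, target, "start"))  # 1500
print(distance(reference, target, "mid"))    # 500
```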

--out <filename>

Optionally specify a new filename. A standard tim data text file is written. The default is to rewrite the input file.

--gz

Specify whether the output file should (not) be compressed with gzip.

--version

Print the version number.

--help

Display the POD documentation.

DESCRIPTION

This program will take a list of reference features and identify target features which intersect them. The reference features may be either named features (name and type) or genomic regions (chromosome, start, stop). By default, the search region for each reference feature is the entire feature, but it may be restricted or expanded in size with the appropriate modifiers (--start, --stop, --extend). The target features are specified as specific types.

Several attributes of the found features are appended to the original input file data. First, the number of target features is reported. If more than one is found, the feature with the most overlap with the reference feature is preferentially listed. The name, type, and strand of the selected target feature are reported. Finally, the distance from the reference feature to the target feature is reported. The reference points for measuring the distance are by default the start or 5' end of the features, or optionally the midpoints. Note that the distance measurement is relative to the coordinates after adjustment with the --start, --stop, and --extend options.
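
The selection of a single target when several intersect can be sketched as follows. This Python sketch is an illustration under stated assumptions, not the script's implementation: the function names and the dictionary layout of a target feature are hypothetical, coordinates are taken as 1-based inclusive, and ties are resolved in favor of the first feature encountered.

```python
def overlap(a_start, a_stop, b_start, b_stop):
    """Number of bp shared by two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(a_stop, b_stop) - max(a_start, b_start) + 1)


def pick_target(region, targets):
    """Return (count, best) for the targets intersecting the search region.

    `region` is a (start, stop) tuple; each target is a dict with "name",
    "start", and "stop" keys (a hypothetical layout for this sketch).
    """
    def shared(t):
        return overlap(region[0], region[1], t["start"], t["stop"])

    hits = [t for t in targets if shared(t) > 0]
    if not hits:
        return 0, None
    # max() keeps the first feature on ties; this tie rule is an assumption.
    return len(hits), max(hits, key=shared)


targets = [
    {"name": "A", "start": 900, "stop": 1100},   # 101 bp shared
    {"name": "B", "start": 1500, "stop": 1900},  # 401 bp shared
    {"name": "C", "start": 3000, "stop": 3100},  # no overlap
]
count, best = pick_target((1000, 2000), targets)
print(count, best["name"])  # 2 B
```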

A standard tim data text file is written.

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.