NAME
Bio::ToolBox::Data::Feature - Objects representing rows in a data table
DESCRIPTION
A Bio::ToolBox::Data::Feature is an object representing a row in the data table. Usually, this in turn represents an annotated feature or segment in the genome. As such, this object provides convenient methods for accessing and manipulating the values in a row, as well as methods for working with the represented genomic feature.
In many cases, the row may represent a database feature. In this case, many of the methods will automatically retrieve the feature from the database to fulfill the request, be it attribute lookup or score collection. Database features typically presume a Bio::DB::SeqFeature::Store database.
This class should not be used directly by the user. Rather, Feature objects are generated from a Bio::ToolBox::Data::Iterator object (generated itself from the row_stream() function in Bio::ToolBox::Data), or the iterate() function in Bio::ToolBox::Data. Please see the documentation for Bio::ToolBox::Data for more information.
Examples of working with a stream object and with the iterate method:
    my $Data = Bio::ToolBox::Data->new(file => $file);

    # stream method
    my $stream = $Data->row_stream;
    while (my $row = $stream->next_row) {
        # each $row is a Bio::ToolBox::Data::Feature object
        # representing the row in the data table
        my $value = $row->value($index);
        # do something with $value
    }

    # iterate method
    $Data->iterate( sub {
        my $row = shift;
        my $number = $row->value($index);
        my $log_number = log($number);
        $row->value($index, $log_number);
    } );
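The iterate example assumes every value in the column is a positive number; Perl's built-in log() dies on zero and is undefined for negative or non-numeric input. A minimal sketch of a guarded transform follows. The helper name safe_log and the '.' placeholder for missing values are assumptions made for illustration and are not part of Bio::ToolBox.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of Bio::ToolBox): return the natural log of
# a value, or a placeholder when the value cannot be log-transformed.
sub safe_log {
    my ($value, $null) = @_;
    $null = '.' unless defined $null;    # assumed placeholder for missing data
    return $null unless defined $value and looks_like_number($value);
    return $null unless $value > 0;      # log() fails on zero or negative input
    return log($value);
}
```

Inside the iterate callback, a call such as $row->value($index, safe_log($row->value($index))) would then mark cells that cannot be transformed instead of aborting the run.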
METHODS
General information methods
- row_index
-
Returns the index position of the current data row within the data table. Useful for tracking your position within the table.
- feature_type
-
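Returns one of three specific values describing the contents of the data table, inferred from the presence of specific column names. This provides a clue as to whether the table features represent genomic regions (defined by coordinate positions) or named database features. The return values include:
- coordinate: the table includes at least chromosome and start columns
- named: the table includes name, type, and/or Primary_ID columns
- unknown: the table contents are not recognized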
Methods to access row feature attributes
These methods return the corresponding value, if present in the data table, based on the column header name. For rows representing database features, the feature will be automatically retrieved from the database, and the attribute returned.
- seq_id
-
The name of the chromosome the feature is on.
- start
- end
- stop
-
The coordinates of the feature or segment. All coordinates are 1-based.
- strand
-
The strand of the feature or segment. Returns -1, 0, or 1. Default is 0.
- name
-
The display_name of the feature.
- type
-
The type of feature. Typically either primary_tag or primary_tag:source_tag. In a GFF3 file, this represents columns 3 and 2, respectively.
- id
-
Here, this represents the primary_ID in the database. Note that this number is unique to a specific database, and not portable between databases.
- length
-
The length of the feature or segment.
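The attribute conventions described above can be sketched in plain Perl. The helper names below are hypothetical and are not part of Bio::ToolBox; the length calculation assumes that 1-based coordinates are also inclusive at both ends, which is the usual GFF convention but is not stated explicitly here.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of Bio::ToolBox.

# Length of a region, assuming 1-based start and end that are both inclusive.
sub feature_length {
    my ($start, $end) = @_;
    return $end - $start + 1;
}

# Split a type string into its primary tag and optional source tag.
sub split_type {
    my ($type) = @_;
    my ($primary, $source) = split /:/, $type, 2;
    return ($primary, $source);    # $source is undef when no colon is present
}

# Map common strand notations to -1, 0, or 1, defaulting to 0.
sub normalize_strand {
    my ($strand) = @_;
    return 0 unless defined $strand;
    return 1  if $strand eq '+' or $strand eq '1' or $strand eq '+1';
    return -1 if $strand eq '-' or $strand eq '-1';
    return 0;
}
```

For a type of gene:SGD, split_type would yield the primary tag gene and the source tag SGD, matching GFF3 columns 3 and 2.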
Accessing and setting values in the row.
- value($index)
- value($index, $new_value)
-
Returns or sets the value at a specific column index in the current data row.
- row_values
-
Returns an array or array reference representing all the values in the current data row.
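The phrase "an array or array reference" suggests that the return form depends on the calling context. A minimal sketch of that common Perl idiom, using wantarray, is shown here with a hypothetical stand-in function and made-up row contents; the actual implementation in Bio::ToolBox may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a row_values-like method; the real
# implementation in Bio::ToolBox may differ.
sub row_values_sketch {
    my @values = ('chr1', 100, 200, 'geneA');    # made-up row contents
    return wantarray ? @values : \@values;
}

my @list = row_values_sketch();    # list context: a list of the values
my $ref  = row_values_sketch();    # scalar context: an array reference
print scalar(@list), " values; first via reference: $ref->[0]\n";
```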
Convenience Methods to database functions
The following functions are convenience methods for using the attributes in the current data row to interact with databases. They are wrappers to methods in the Bio::ToolBox::db_helper module.
- feature
-
Returns a SeqFeature object from the database using the name and type values in the current Data table row. The SeqFeature object is requested from the database named in the general metadata. If an alternate database is desired, you should change it first using the $Data->database() method. If the feature name or type is not present in the table, then nothing is returned.
See Bio::DB::SeqFeature and Bio::SeqFeatureI for more information about working with these objects.
- segment
-
Returns a database Segment object corresponding to the coordinates defined in the Data table row. If a named feature and type are present instead of coordinates, then the feature is first automatically retrieved and a Segment returned based on its coordinates. The database named in the general metadata is used to establish the Segment object. If a different database is desired, it should be changed first using the general database() method.
See Bio::DB::SeqFeature::Segment and Bio::RangeI for more information about working with Segment objects.
- get_score(%args)
-
This is a convenience method for the Bio::ToolBox::db_helper::get_chromo_region_score() method. It will return a single score value for the region defined by the coordinates or typed named feature in the current data row. If the Data table has coordinates, then those will be automatically used. If the Data table has typed named features, then the coordinates will automatically be looked up for you by requesting a SeqFeature object from the database.
The name of the dataset from which to collect the data must be provided. This may be a GFF type in a SeqFeature database, a BigWig member in a BigWigSet database, or a path to a BigWig, BigBed, Bam, or USeq file. Additional parameters may also be specified; please see the Bio::ToolBox::db_helper::get_chromo_region_score() method for full details.
If you wish to override coordinates that are present in the Data table, for example to extend or shift the given coordinates by some amount, then simply pass the new start and end coordinates as options to this method.
Here is an example of collecting mean values from a BigWig and adding the scores to the Data table.
    my $index = $Data->add_column('MyData');
    my $stream = $Data->row_stream;
    while (my $row = $stream->next_row) {
        my $score = $row->get_score(
            'method'  => 'mean',
            'dataset' => '/path/to/MyData.bw',
        );
        $row->value($index, $score);
    }
- get_position_scores(%args)
-
This is a convenience method for the Bio::ToolBox::db_helper::get_region_dataset_hash() method. It will return a hash of positions => scores over the region defined by the coordinates or typed named feature in the current data row. The coordinates for the interrogated region will be automatically provided.
Just like the get_score() method, the dataset from which to collect the scores must be provided, along with any other optional arguments. See the documentation for the Bio::ToolBox::db_helper::get_region_dataset_hash() method for more details.
If you wish to override coordinates that are present in the Data table, for example to extend or shift the given coordinates by some amount, then simply pass the new start and end coordinates as options to this method.
Here is an example for collecting positioned scores around the 5 prime end of a feature from a BigWigSet directory.
    my $stream = $Data->row_stream;
    while (my $row = $stream->next_row) {
        my %position2score = $row->get_position_scores(
            'ddb'      => '/path/to/BigWigSet/',
            'dataset'  => 'MyData',
            'position' => 5,
        );
        # do something with %position2score
    }
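Both get_score() and get_position_scores() accept replacement start and end coordinates. A minimal sketch of computing an extended region to pass in follows. The helper name extend_coords is hypothetical and not part of Bio::ToolBox.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ToolBox): extend a 1-based region by
# $extend bases on each side, never letting the start drop below 1.
sub extend_coords {
    my ($start, $end, $extend) = @_;
    my $new_start = $start - $extend;
    $new_start = 1 if $new_start < 1;
    my $new_end = $end + $extend;    # not clamped; chromosome length is unknown here
    return ($new_start, $new_end);
}
```

The resulting pair could then be passed along with the dataset, for instance as 'start' => $new_start and 'end' => $new_end. Those option key names are assumed from the description above, so check the Bio::ToolBox::db_helper documentation before relying on them.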
AUTHOR
Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.