NAME
Data::Range::Compare::Stream - Compute intersections of Ranges with Iterators
SYNOPSIS
use Data::Range::Compare::Stream;
use Data::Range::Compare::Stream::Iterator::Array;
use Data::Range::Compare::Stream::Iterator::Consolidate;
use Data::Range::Compare::Stream::Iterator::Compare::Asc;
# create the iterator for column_a's Consolidation iterator
my $column_a=Data::Range::Compare::Stream::Iterator::Array->new();
$column_a->create_range(3,11);
$column_a->create_range(17,19);
# create the iterator for column_b's Consolidation iterator
my $column_b=Data::Range::Compare::Stream::Iterator::Array->new();
$column_b->create_range(0,0);
$column_b->create_range(1,3);
$column_b->create_range(5,7);
$column_b->create_range(6,9);
$column_b->create_range(11,15);
$column_b->create_range(17,33);
# sort columns a and be in consolidate order
$column_a->prepare_for_consolidate_asc;
$column_b->prepare_for_consolidate_asc;
# create the consolidator object for column_a our iterator to it
my $column_a_consolidator=Data::Range::Compare::Stream::Iterator::Consolidate->new($column_a);
# create the consolidator object for column_b our iterator to it
my $column_b_consolidator=Data::Range::Compare::Stream::Iterator::Consolidate->new($column_b);
# create the object that will compare columns a and b
my $compare=new Data::Range::Compare::Stream::Iterator::Compare::Asc;
# add column a for processing
$compare->add_consolidator($column_a_consolidator);
# add column b for processing
$compare->add_consolidator($column_b_consolidator);
# now we can compute the intersections of our objects
while($compare->has_next) {
# fetch our current result object
my $row=$compare->get_next;
# if no ranges overlap with this row move on
next if $row->is_empty;
# now we can output the current range
my $common_range=$row->get_common;
my $overlap_count=$row->get_overlap_count;
print "A total of: [$overlap_count] Ranges intersected with Common range: $common_range\n";
my $overlap_ids=$row->get_overlap_ids;
for each my $consolidator_id (@{$overlap_ids}) {
if($consolidator_id==0) {
my $result=$row->get_consolidator_result_by_id($consolidator_id);
print " Column a contained the following overlaps $result\n";
} elsif($consolidator_id==1) {
my $result=$row->get_consolidator_result_by_id$consolidator_id);
print " Column b contained the following overlaps $result\n";
}
}
print "\n";
}
DESCRIPTION
This library implements an algorithm that can be used to compute gaps and intersections across multiple sets of 2 dimensional ranges, from both a vertical and horizontal perspective.
Overview
Given 3 complex sets of data we will outline the process used to compute the horizontal intersections and vertical gaps in these sets of ranges. See Figure 1 for the example sets of data.
Figure 1 ( Example Data )
Numeric Range set: A
+----------+
| 1 - 11 |
| 13 - 44 |
| 17 - 23 |
| 55 - 66 |
+----------+
Numeric Range set: B
+----------+
| 0 - 1 |
| 2 - 29 |
| 88 - 133 |
+----------+
Numeric Range set: C
+-----------+
| 17 - 29 |
| 220 - 240 |
| 241 - 250 |
+-----------+
Looking over data sets A,B, and C we can see several problems: Set A contains ranges that overlap with each other, and sets A through C do not align on consistent boundaries.
Part of the process of comparing sets of data involves making sure there are no duplicates or overlaps in each individual source of data. The first action taken is the removal of duplicate and consolidation of overlapping ranges, this process can be seen by looking at the resulting conversion for "Numeric Range set: A" seen in see: Figure 2. The overlapping range "17 - 23" is contained by a larger range in the set "13 - 44" thus "17 - 33" needs to be removed.
Figure 2 ( Numeric range set: A, post consolidation )
Consolidated Numeric Range set: A
+----------+
| 1 - 11 |
| 13 - 44 |
| 55 - 66 |
+----------+
The next step in the comparison process is iterating through our data and figuring out where the gaps are in each data set. Each set of data contains gaps between ranges, and those gaps need be calculated in order for a proper comparison of the data to begin.
Figure 3 ( Numeric Ranges, gaps filled )
Numeric Range set: A
+----------+
| 1 - 11 |
12 - 12 -- No Data
| 13 - 44 |
45 - 54 -- No Data
| 55 - 66 |
+----------+
Numeric Range set: B
+----------+
| 0 - 1 |
| 2 - 29 |
30 - 87 -- No Data
| 88 - 133 |
+----------+
Numeric Range set: C
+-----------+
| 17 - 29 |
30 - 219 -- No Data
| 220 - 240 |
| 241 - 250 |
+-----------+
The intersecting range represents overlapping points between ranges or the "Common Range". The concept is not just limited to missing columns, the algorithm itself will also compute missing rows. In Figure 4 row 0 the Common Range is defined as "0 - 0", this is because the only column with overlapping data is in "Numeric Range B" 0 - 1. 0 - 1 also overlaps with "Numeric Column A: 1 - 11", this means the first common range is "0 - 0", and the second common range is defined as "1 - 1". The relation ship of the common range is defined not just by the current ranges being compared, but also by the next common range to be computed as well.
Three result rows stand out became they contain no data in sets A,B or C: "45 - 54","67 - 87","134 - 219" these 3 rows represent a horizontal gap in our 3 sources of data, and can be exclude from the result set to make the data more consistent for some one reviewing just the intersections of the data from sets A,B, and C see: Figure 5.
Figure 4 ( Results )
+--------------------------------------------------------------------+
| Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
+--------------------------------------------------------------------+
| 0 - 0 | No Data | 0 - 1 | No Data |
| 1 - 1 | 1 - 11 | 0 - 1 | No Data |
| 2 - 11 | 1 - 11 | 2 - 29 | No Data |
| 12 - 12 | No Data | 2 - 29 | No Data |
| 13 - 16 | 13 - 44 | 2 - 29 | No Data |
| 17 - 29 | 13 - 44 | 2 - 29 | 17 - 29 |
| 30 - 44 | 13 - 44 | No Data | No Data |
| 45 - 54 | No Data | No Data | No Data |
| 55 - 66 | 55 - 66 | No Data | No Data |
| 67 - 87 | No Data | No Data | No Data |
| 88 - 133 | No Data | 88 - 133 | No Data |
| 134 - 219 | No Data | No Data | No Data |
| 220 - 240 | No Data | No Data | 220 - 240 |
| 241 - 250 | No Data | No Data | 241 - 250 |
+--------------------------------------------------------------------+
Looking at the results form the compare process we can see the "Common Range" explains the overlap. Figure 4 shows not only sets of data we have, but it also shows sets of data that we don't have: these additional ranges are a bi-product of the comparison process and have been filtered out of the result set in Figure 5.
Figure 5 ( Filtered results )
+--------------------------------------------------------------------+
| Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
+--------------------------------------------------------------------+
| 0 - 0 | No Data | 0 - 1 | No Data |
| 1 - 1 | 1 - 11 | 0 - 1 | No Data |
| 2 - 11 | 1 - 11 | 2 - 29 | No Data |
| 12 - 12 | No Data | 2 - 29 | No Data |
| 13 - 16 | 13 - 44 | 2 - 29 | No Data |
| 17 - 29 | 13 - 44 | 2 - 29 | 17 - 29 |
| 30 - 44 | 13 - 44 | No Data | No Data |
| 55 - 66 | 55 - 66 | No Data | No Data |
| 88 - 133 | No Data | 88 - 133 | No Data |
| 220 - 240 | No Data | No Data | 220 - 240 |
| 241 - 250 | No Data | No Data | 241 - 250 |
+--------------------------------------------------------------------+
The final stage, only ranges derived from our original source data is presented and we have excluded any rows that contain none of our original data.
Getting Started
The internals of Data::Range::Compare::Stream only support dealing with integers by default: This section covers how to Extend Data::Range::Compare::Stream to support your data types.
Creating IPV4 Range Support See:
Data::Range::Compare::Stream::Cookbook::COMPARE_IPV4
Creating DateTime Range Support See:
Data::Range::Compare::Stream::Cookbook::COMPARE_DateTime
OO Methods
This section covers the OO Methods in the package.
my $range=new Data::Range::Compare::Stream($range_start,$range_end);
my $range=new Data::Range::Compare::Stream($range_start,$range_end,$data);
Object constructor:
Creates a new instance of Data::Range::Compare::Stream Arguments an their meanings: $range_start -- Required Represents the start of this given range $range_end -- Required Represents the end of this range. $data -- Optional Used to tag this range with your data
my $value=$range->range_start
Returns the object that represents the start of this range.
my $string=$range->range_start_to_string
Returns a string that represents the start of the range.
my $value=$range->range_end
Returns the object that represents the end of the rage.
my $string=$range->range_end_to_string;
Returns a string that represents the end of the range.
my $new_value=$range->sub_one($value);
Computes and returns the object that came before this $value
my $new_value=$range->add_one($value)
Computes and returns the object that comes after this $value
my $cmp=$range->cmp_values($value_a,$value_b)
Returns -1,0,1 similar to <=> or cmp.
my $next_range_start_value=$range->next_range_start
Returns the starting value of the range that will come after this range.
my $previous_range_end_value=$range->previous_range_end
Returns a value that represents the end of the range that precedes this one.
my $data=$range->data($optional_value);
Used to get and set the data value for this range. Sets when called with an argument Example: $range->data('some value'); Gets the current data value when called without any arguments Example: my $value=$range->data;
my $class=$range->NEW_FROM_CLASS;
Returns the name of the class new objects will be constructed from.
my $new_range=$get_common_range([$range_a,$range_b,$range_c]);
Given an array reference of ranges that overlap $new_range will be the smallest intersecting range;
my $new_range=$range->get_overlapping_range([$range_a,$range_b,$range_c]);
Given an array reference of ranges: $new_range will contain all of the ranges listed in the array reference
my $cmp=$range_a->cmp_range_start($range_b);
Compares the starting values of $range_a and $range_b Returns: -1 0 1,see perlop: <=> or cmp
my $cmp=$range_a->cmp_range_end($range_b);
Compares the ending values of $range_a and $range_b Returns: -1 0 1, see perlop: <=> or cmp
my $cmp=$range_a->cmp_range_start_to_range_end($range_b);
Compares the start of $range_a to the end of $range_b Returns: -1 0 1, see perlop: <=> or cmp
if($range->contains_value($value)) { do something }
Returns true if $range contains $value
if($range_a->contiguous_check($range_b)) { do something }
Returns true if $range_a is immediately followed by $range_b.
my $cmp=$range_a->cmp_ranges($range_b);
Compares $range_a to $range_b in Ascending order. Returns: -1 0 1, see perlop: <=> or cmp
if($range_a->overlap($range_b) { do something }
Returns true if $range_a overlaps with $range_b
my ($start,$end)=Data::Range::Compare::Stream->find_smallest_outer_ranges($array_ref_ranges);
Returns the smallest outter most ranges as $start and $end
SEE ALSO
Data::Range::Compare::Stream::Cookbook
AUTHOR
Michael Shipper
Source-Forge Project
As of version 0.001 the Project has been moved to Source-Forge.net
Data Range Compare https://sourceforge.net/projects/data-range-comp/
COPYRIGHT
Copyright 2011 Michael Shipper. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.