NAME
Data::Range::Compare::Stream - Compute intersections of Ranges with Iterators
SYNOPSIS
use Data::Range::Compare::Stream;
use Data::Range::Compare::Stream::Iterator::Array;
use Data::Range::Compare::Stream::Iterator::Consolidate;
use Data::Range::Compare::Stream::Iterator::Compare::Asc;
# create the iterator for column_a's Consolidation iterator
my $column_a=Data::Range::Compare::Stream::Iterator::Array->new();
$column_a->create_range(3,11);
$column_a->create_range(17,19);
# create the iterator for column_b's Consolidation iterator
my $column_b=Data::Range::Compare::Stream::Iterator::Array->new();
$column_b->create_range(0,0);
$column_b->create_range(1,3);
$column_b->create_range(5,7);
$column_b->create_range(6,9);
$column_b->create_range(11,15);
$column_b->create_range(17,33);
# sort columns a and b in consolidate order
$column_a->prepare_for_consolidate_asc;
$column_b->prepare_for_consolidate_asc;
# create the consolidator object for column_a and pass our iterator to it
my $column_a_consolidator=Data::Range::Compare::Stream::Iterator::Consolidate->new($column_a);
# create the consolidator object for column_b and pass our iterator to it
my $column_b_consolidator=Data::Range::Compare::Stream::Iterator::Consolidate->new($column_b);
# create the object that will compare columns a and b
my $compare=new Data::Range::Compare::Stream::Iterator::Compare::Asc;
# add column a for processing
$compare->add_consolidator($column_a_consolidator);
# add column b for processing
$compare->add_consolidator($column_b_consolidator);
# now we can compute the intersections of our objects
while($compare->has_next) {
  # fetch our current result object
  my $row=$compare->get_next;
  # if no ranges overlap with this row move on
  next if $row->is_empty;
  # now we can output the current range
  my $common_range=$row->get_common;
  my $overlap_count=$row->get_overlap_count;
  print "A total of: [$overlap_count] Ranges intersected with Common range: $common_range\n";
  # ids of the consolidators that overlap with the common range
  my $overlap_ids=$row->get_overlap_ids;
  foreach my $consolidator_id (@{$overlap_ids}) {
    if($consolidator_id==0) {
      my $result=$row->get_consolidator_result_by_id($consolidator_id);
      print " Column a contained the following overlaps $result\n";
    } elsif($consolidator_id==1) {
      my $result=$row->get_consolidator_result_by_id($consolidator_id);
      print " Column b contained the following overlaps $result\n";
    }
  }
  print "\n";
}
DESCRIPTION
This library implements an algorithm that can be used to compute gaps and intersections across multiple sets of 2 dimensional ranges, from both a vertical and horizontal perspective.
OverView
Given 3 complex sets of data, we will outline the process used to compute the horizontal intersections and vertical gaps in these sets of ranges. See Figure 1 for the example sets of data.
Figure 1 ( Example Data )
Numeric Range set: A
+----------+
| 1 - 11 |
| 13 - 44 |
| 17 - 23 |
| 55 - 66 |
+----------+
Numeric Range set: B
+----------+
| 0 - 1 |
| 2 - 29 |
| 88 - 133 |
+----------+
Numeric Range set: C
+-----------+
| 17 - 29 |
| 220 - 240 |
| 241 - 250 |
+-----------+
Looking over data sets A, B, and C we can see several problems: Set A contains ranges that overlap with each other, and sets A through C do not align on consistent boundaries.
Part of the process of comparing sets of data involves making sure there are no duplicates or overlaps in each individual source of data. The first action taken is the removal of duplicates and the consolidation of overlapping ranges. This process can be seen in the resulting conversion of "Numeric Range set: A" shown in Figure 2. The overlapping range "17 - 23" is contained by a larger range in the set, "13 - 44", thus "17 - 23" needs to be removed.
Figure 2 ( Numeric range set: A, post consolidation )
Consolidated Numeric Range set: A
+----------+
| 1 - 11 |
| 13 - 44 |
| 55 - 66 |
+----------+
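The consolidation step can be sketched in plain Perl. This is only an illustration of the idea, assuming ranges are closed [start, end] integer pairs; consolidate_ranges is a hypothetical helper and not part of this module, whose own consolidation is done by Data::Range::Compare::Stream::Iterator::Consolidate.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): sort [start, end] integer
# pairs in ascending order, then fold each overlapping range into the
# range that precedes it.
sub consolidate_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $b->[1] <=> $a->[1] } @_;
    my @merged;
    for my $range (@sorted) {
        if (@merged && $range->[0] <= $merged[-1][1]) {
            # Overlaps the previous range: extend its end only if needed.
            $merged[-1][1] = $range->[1] if $range->[1] > $merged[-1][1];
        }
        else {
            # No overlap: start a new consolidated range (copy the pair).
            push @merged, [@$range];
        }
    }
    return @merged;
}

# "Numeric Range set: A" from Figure 1.
my @set_a        = ([1, 11], [13, 44], [17, 23], [55, 66]);
my @consolidated = consolidate_ranges(@set_a);
print join(', ', map { "$_->[0] - $_->[1]" } @consolidated), "\n";
# prints: 1 - 11, 13 - 44, 55 - 66
```

Ranges that merely touch, such as "0 - 1" and "2 - 29" in set B, are left separate in this sketch, which matches how they appear in the figures.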
The next step in the comparison process is iterating through our data and figuring out where the gaps are in each data set. Each set of data contains gaps between ranges, and those gaps need to be calculated before a proper comparison of the data can begin.
Figure 3 ( Numeric Ranges, gaps filled )
Numeric Range set: A
+----------+
| 1 - 11 |
12 - 12 -- No Data
| 13 - 44 |
45 - 54 -- No Data
| 55 - 66 |
+----------+
Numeric Range set: B
+----------+
| 0 - 1 |
| 2 - 29 |
30 - 87 -- No Data
| 88 - 133 |
+----------+
Numeric Range set: C
+-----------+
| 17 - 29 |
30 - 219 -- No Data
| 220 - 240 |
| 241 - 250 |
+-----------+
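The "No Data" entries in Figure 3 can be derived with a short loop. The following is a minimal sketch, assuming the input is already consolidated, sorted ascending, and made of integer [start, end] pairs; find_gaps is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given consolidated, ascending [start, end] pairs,
# return the "No Data" gaps that lie between neighbouring ranges.
sub find_gaps {
    my @ranges = @_;
    my @gaps;
    for my $i (1 .. $#ranges) {
        my $gap_start = $ranges[$i - 1][1] + 1;    # one past the previous end
        my $gap_end   = $ranges[$i][0] - 1;        # one before the next start
        # Contiguous ranges (for example 0 - 1 and 2 - 29) leave no gap.
        push @gaps, [$gap_start, $gap_end] if $gap_start <= $gap_end;
    }
    return @gaps;
}

my @gaps_a = find_gaps([1, 11], [13, 44], [55, 66]);
my @gaps_b = find_gaps([0, 1], [2, 29], [88, 133]);
print 'A: ', join(', ', map { "$_->[0] - $_->[1]" } @gaps_a), "\n";
print 'B: ', join(', ', map { "$_->[0] - $_->[1]" } @gaps_b), "\n";
# prints:
# A: 12 - 12, 45 - 54
# B: 30 - 87
```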
The intersecting range, or "Common Range", represents the points shared between overlapping ranges. The concept is not limited to missing columns: the algorithm also computes missing rows. In Figure 4, row 0, the Common Range is defined as "0 - 0" because the only column with data at that point is "Numeric Range B", with "0 - 1". That range also overlaps with "1 - 11" in "Numeric Range A", so the first common range is "0 - 0" and the second common range is "1 - 1". The common range is therefore defined not just by the current ranges being compared, but also by the next common range to be computed.
Figure 4 ( Results )
+--------------------------------------------------------------------+
| Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
+--------------------------------------------------------------------+
| 0 - 0 | No Data | 0 - 1 | No Data |
| 1 - 1 | 1 - 11 | 0 - 1 | No Data |
| 2 - 11 | 1 - 11 | 2 - 29 | No Data |
| 12 - 12 | No Data | 2 - 29 | No Data |
| 13 - 16 | 13 - 44 | 2 - 29 | No Data |
| 17 - 29 | 13 - 44 | 2 - 29 | 17 - 29 |
| 30 - 44 | 13 - 44 | No Data | No Data |
| 45 - 54 | No Data | No Data | No Data |
| 55 - 66 | 55 - 66 | No Data | No Data |
| 67 - 87 | No Data | No Data | No Data |
| 88 - 133 | No Data | 88 - 133 | No Data |
| 134 - 219 | No Data | No Data | No Data |
| 220 - 240 | No Data | No Data | 220 - 240 |
| 241 - 250 | No Data | No Data | 241 - 250 |
+--------------------------------------------------------------------+
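The rows of Figure 4 can be reproduced with a simple boundary sweep. This sketch is not the module's streaming implementation: it assumes all ranges fit in memory as integer [start, end] pairs, and range_containing is a hypothetical helper. Every range start, and every value one past a range end, is treated as a point where the set of overlapping ranges may change.

```perl
use strict;
use warnings;

# Consolidated sets from Figures 2 and 3, as [start, end] integer pairs.
my %sets = (
    A => [[1, 11],  [13, 44],   [55, 66]],
    B => [[0, 1],   [2, 29],    [88, 133]],
    C => [[17, 29], [220, 240], [241, 250]],
);

# Collect the cut points: each start, and each end + 1.
my %cuts;
for my $ranges (values %sets) {
    for my $range (@$ranges) {
        $cuts{ $range->[0] }     = 1;
        $cuts{ $range->[1] + 1 } = 1;
    }
}
my @cuts = sort { $a <=> $b } keys %cuts;

# Hypothetical helper: the range in a set that contains $value, or undef.
sub range_containing {
    my ($ranges, $value) = @_;
    for my $range (@$ranges) {
        return $range if $range->[0] <= $value && $value <= $range->[1];
    }
    return;
}

# Each pair of neighbouring cut points yields one common range.
my @rows;
for my $i (0 .. $#cuts - 1) {
    my ($start, $end) = ($cuts[$i], $cuts[$i + 1] - 1);
    my %row = (common => [$start, $end]);
    $row{$_} = range_containing($sets{$_}, $start) for sort keys %sets;
    push @rows, \%row;
}

for my $row (@rows) {
    my @cells = map {
        my $r = $row->{$_};
        $r ? "$r->[0] - $r->[1]" : 'No Data';
    } qw(common A B C);
    print join(' | ', @cells), "\n";
}
# first line printed: 0 - 0 | No Data | 0 - 1 | No Data
```

The loop produces 14 rows, one per row of Figure 4, including the rows where every column reports "No Data".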
Looking at the results from the compare process, we can see that the "Common Range" explains the overlap. Figure 4 shows not only the sets of data we have, but also the sets of data that we don't have: these additional ranges are a by-product of the comparison process and can be filtered out before presentation.
Figure 5 ( Filtered results )
+--------------------------------------------------------------------+
| Common Range | Numeric Range A | Numeric Range B | Numeric Range C |
+--------------------------------------------------------------------+
| 0 - 0 | No Data | 0 - 1 | No Data |
| 1 - 1 | 1 - 11 | 0 - 1 | No Data |
| 2 - 11 | 1 - 11 | 2 - 29 | No Data |
| 12 - 12 | No Data | 2 - 29 | No Data |
| 13 - 16 | 13 - 44 | 2 - 29 | No Data |
| 17 - 29 | 13 - 44 | 2 - 29 | 17 - 29 |
| 30 - 44 | 13 - 44 | No Data | No Data |
| 55 - 66 | 55 - 66 | No Data | No Data |
| 88 - 133 | No Data | 88 - 133 | No Data |
| 220 - 240 | No Data | No Data | 220 - 240 |
| 241 - 250 | No Data | No Data | 241 - 250 |
+--------------------------------------------------------------------+
In the final stage, only ranges derived from our original source data are presented; any rows that contain none of our original data have been excluded.
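The filtering step amounts to dropping rows in which every column is empty. A minimal sketch follows, using an illustrative row layout (a hash with a common range and a list of cells) that is an assumption made for this example and not the module's result object. In the SYNOPSIS, the same effect is achieved by skipping rows for which $row->is_empty is true.

```perl
use strict;
use warnings;

# Illustrative rows: a common range plus one cell per column.
# An undef cell stands for "No Data".
my @rows = (
    { common => [30, 44], cells => [[13, 44], undef, undef] },
    { common => [45, 54], cells => [undef,    undef, undef] },
    { common => [55, 66], cells => [[55, 66], undef, undef] },
    { common => [67, 87], cells => [undef,    undef, undef] },
);

# Keep only rows where at least one column contributed a range.
my @filtered = grep {
    my $row = $_;
    grep { defined $_ } @{ $row->{cells} };
} @rows;

print "$_->{common}[0] - $_->{common}[1]\n" for @filtered;
# prints:
# 30 - 44
# 55 - 66
```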
Getting Started
The internals of Data::Range::Compare::Stream only support integers by default. This section covers how to extend Data::Range::Compare::Stream to support your own data types.
Creating IPV4 Range Support See:
Data::Range::Compare::Stream::Cookbook::COMPARE_IPV4
Creating DateTime Range Support See:
Data::Range::Compare::Stream::Cookbook::COMPARE_DateTime
OO Methods
This section covers the OO Methods in the package.
my $range=new Data::Range::Compare::Stream($range_start,$range_end);
my $range=new Data::Range::Compare::Stream($range_start,$range_end,$data);
Object constructor:
Creates a new instance of Data::Range::Compare::Stream.
Arguments and their meanings:
$range_start -- Required. Represents the start of this range.
$range_end -- Required. Represents the end of this range.
$data -- Optional. Used to tag this range with your data.
my $value=$range->range_start
Returns the object that represents the start of this range.
my $string=$range->range_start_to_string
Returns a string that represents the start of the range.
my $value=$range->range_end
Returns the object that represents the end of the range.
my $string=$range->range_end_to_string;
Returns a string that represents the end of the range.
my $new_value=$range->sub_one($value);
Computes and returns the object that comes before this $value
my $new_value=$range->add_one($value)
Computes and returns the object that comes after this $value
my $cmp=$range->cmp_values($value_a,$value_b)
Returns -1,0,1 similar to <=> or cmp.
my $next_range_start_value=$range->next_range_start
Returns the starting value of the range that will come after this range.
my $previous_range_end_value=$range->previous_range_end
Returns a value that represents the end of the range that precedes this one.
my $data=$range->data($optional_value);
Used to get and set the data value for this range.
Sets the data value when called with an argument. Example: $range->data('some value');
Gets the current data value when called without any arguments. Example: my $value=$range->data;
my $class=$range->NEW_FROM_CLASS;
Returns the name of the class new objects will be constructed from.
my $new_range=$range->get_common_range([$range_a,$range_b,$range_c]);
Given an array reference of ranges that overlap, $new_range will be the smallest intersecting range.
my $new_range=$range->get_overlapping_range([$range_a,$range_b,$range_c]);
Given an array reference of ranges, $new_range will be a range that contains all of the ranges listed in the array reference.
my $cmp=$range_a->cmp_range_start($range_b);
Compares the starting values of $range_a and $range_b. Returns -1, 0, or 1; see perlop: <=> or cmp
my $cmp=$range_a->cmp_range_end($range_b);
Compares the ending values of $range_a and $range_b. Returns -1, 0, or 1; see perlop: <=> or cmp
my $cmp=$range_a->cmp_range_start_to_range_end($range_b);
Compares the start of $range_a to the end of $range_b. Returns -1, 0, or 1; see perlop: <=> or cmp
if($range->contains_value($value)) { do something }
Returns true if $range contains $value
if($range_a->contiguous_check($range_b)) { do something }
Returns true if $range_a is immediately followed by $range_b.
my $cmp=$range_a->cmp_ranges($range_b);
Compares $range_a to $range_b in ascending order. Returns -1, 0, or 1; see perlop: <=> or cmp
if($range_a->overlap($range_b)) { do something }
Returns true if $range_a overlaps with $range_b
my ($start,$end)=Data::Range::Compare::Stream->find_smallest_outer_ranges($array_ref_ranges);
Returns the smallest outermost ranges as $start and $end
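To make the semantics of several of the methods above concrete, here is a sketch of how overlap, contiguous_check, get_common_range and get_overlapping_range could behave for closed integer ranges. These stand-alone functions are hypothetical stand-ins working on [start, end] pairs. They reflect one reading of the descriptions above (common range = largest start to smallest end, overlapping range = smallest start to largest end) and are not the module's own code.

```perl
use strict;
use warnings;

# True when the two closed ranges share at least one value.
sub ranges_overlap {
    my ($x, $y) = @_;
    return $x->[0] <= $y->[1] && $y->[0] <= $x->[1];
}

# True when $y starts exactly one value after $x ends.
sub ranges_contiguous {
    my ($x, $y) = @_;
    return $x->[1] + 1 == $y->[0];
}

# Smallest intersecting range: largest start to smallest end.
# Only meaningful when all of the ranges overlap one another.
sub common_range {
    my @ranges = @_;
    my ($start) = sort { $b <=> $a } map { $_->[0] } @ranges;
    my ($end)   = sort { $a <=> $b } map { $_->[1] } @ranges;
    return [$start, $end];
}

# Range that contains every input range: smallest start to largest end.
sub overlapping_range {
    my @ranges = @_;
    my ($start) = sort { $a <=> $b } map { $_->[0] } @ranges;
    my ($end)   = sort { $b <=> $a } map { $_->[1] } @ranges;
    return [$start, $end];
}

# The three ranges that intersect in row "17 - 29" of Figure 4.
my @overlapping = ([13, 44], [2, 29], [17, 29]);
my $common = common_range(@overlapping);
my $outer  = overlapping_range(@overlapping);
print "common: $common->[0] - $common->[1]\n";    # prints: common: 17 - 29
print "outer: $outer->[0] - $outer->[1]\n";       # prints: outer: 2 - 44
```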
SEE ALSO
Data::Range::Compare::Stream::Cookbook
AUTHOR
Michael Shipper
Source-Forge Project
As of version 0.001 the Project has been moved to Source-Forge.net
Data Range Compare https://sourceforge.net/projects/data-range-comp/
COPYRIGHT
Copyright 2011 Michael Shipper. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.