NAME
Data::Range::Compare::Stream::Iterator::File::MergeSortAsc - On Disk Merge Sort for really big data sets!
SYNOPSIS
use Data::Range::Compare::Stream;
use Data::Range::Compare::Stream::Iterator::File;
use Data::Range::Compare::Stream::Iterator::File::MergeSortAsc;
my $iterator=Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(
  filename=>'somefile.csv',
);
while($iterator->has_next) {
  my $next_range=$iterator->get_next;
  print $next_range,"\n";
}
DESCRIPTION
This module extends Data::Range::Compare::Stream::Iterator::Base and provides an on-disk merge sort for objects that implement or extend Data::Range::Compare::Stream::Iterator::Base.
OO Methods
my $iterator=new Data::Range::Compare::Stream::Iterator::File::MergeSortAsc(key=>value);
Instance constructor.
At least one of the following arguments is required:
filename=>'source_file.csv'
  # the file is assumed to be an absolute or relative path to the file location.
file_list=>[]
  # An array ref of file names in absolute or relative paths
iterator_list=>[]
  # an array ref of objects that implement or extend Data::Range::Compare::Stream::Iterator::Base
Optional Arguments:
auto_prepare=>0|1
  # Default: 0, If set to 1 sort operations happen on object creation.
unlink_result_file=>1|0
  # Default: 1, If set to 0 the sorted result file will not be deleted
bucket_size=>4000
  # sets the number of ranges to be pre-sorted
  # 2 buckets are created, so the number of objects loaded into memory is bucket_size * 2
NEW_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::File'
  # sets the file iterator class to be used when loading spooled files for merging
  # make sure you load or require the class being passed in as an argument!
NEW_ARRAY_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::Array'
  # sets the array iterator class
NEW_FROM=>'Data::Range::Compare::Stream',
  # deprecated but still supported, see factory_instance.
  # sets the object class new ranges will be created from
  # This argument is passed to objects being constructed from: NEW_ITERATOR_FROM
factory_instance=>$obj
  # defines the object that implements $obj->factory($start,$end,$data).
  # new ranges are constructed from the factory interface. If a factory interface
  # is not provided, an instance of Data::Range::Compare::Stream is assumed.
parse_line=>undef|code_ref
  # Default: undef, Sets the code ref to be used when parsing a line
  # if not set the default internals will be used
  # This argument is passed to objects being constructed from: NEW_ITERATOR_FROM
result_to_line=>undef|code_ref
  # Default: undef, Sets the code ref used to convert a result to a line that can be parsed
  # if not set the default internals will be used
  # This argument is passed to objects being constructed from: NEW_ITERATOR_FROM
sort_func=>undef|code_ref
  # Default: undef, Sets the code ref used for comparing objects in the sort process
  # if not set the default internals are used.
tmpdir=>undef|'/some/folder'
  # If tmpdir is defined its value is passed to File::Temp->new(DIR=>$self->{tmpdir});
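The bucket_size option describes a pre-sort step followed by a merge. The sketch below illustrates that general idea on plain numbers held in memory; it is not the module's implementation, which spools its buckets to temporary files. The function names presort_buckets and merge_pair are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Split the input into buckets of at most $bucket_size items and sort each one.
sub presort_buckets {
  my ($bucket_size,@items)=@_;
  my @buckets;
  while(@items) {
    my @bucket=splice(@items,0,$bucket_size);
    push @buckets,[sort { $a <=> $b } @bucket];
  }
  return @buckets;
}

# Merge two sorted lists into one sorted list.
sub merge_pair {
  my ($left,$right)=@_;
  my @left=@$left;
  my @right=@$right;
  my @merged;
  while(@left && @right) {
    push @merged,($left[0] <= $right[0] ? shift(@left) : shift(@right));
  }
  # Whatever remains on either side is already sorted.
  return [@merged,@left,@right];
}

my @buckets=presort_buckets(3,9,4,7,1,8,2,6);

# Keep merging pairs of sorted buckets until only one remains.
while(@buckets>1) {
  my $left=shift @buckets;
  my $right=shift @buckets;
  push @buckets,merge_pair($left,$right);
}

my @sorted=@{$buckets[0]};
print "@sorted\n";
```

Only two buckets need to be held at once during each merge, which is why a small bucket_size keeps memory use low at the cost of more merge passes.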
my $class=$iterator->NEW_FROM;
Returns the Class that new Range objects are constructed from.
my $class=$iterator->NEW_ITERATOR_FROM;
$class will contain the name of the class new file Iterators are to be constructed from.
my $class=$iterator->NEW_ARRAY_ITERATOR_FROM;
$class will contain the name of the class new array Iterators are constructed from.
while($iterator->has_next) { ... }
Returns true when there are more rows to fetch.
my $result=$iterator->get_next;
Returns the next $result from the given source file.
my $line=$iterator->result_to_line($range);
Given a $result from $iterator->get_next, this interface converts its range object into a line that can be parsed by $iterator->parse_line($line). Think of this function as a data serializer for range objects generated by an $iterator object. When overloading this function or using a callback, make sure the output of result_to_line can be parsed by parse_line.
sub result_to_line {
  my ($self,$result)=@_;
  return $self->{result_to_line}->($result) if defined($self->{result_to_line});
  my $range=$result->get_common;
  my $line=$range->range_start_to_string.' '.$range->range_end_to_string."\n";
  return $line;
}
my $ref=$iterator->parse_line($line);
Given a $line, returns the arguments required to construct an object that extends or implements Data::Range::Compare::Stream. When overloading this function or passing in a callback as a constructor argument, make sure result_to_line produces the line format that parse_line expects.
sub parse_line {
  my ($self,$line)=@_;
  return $self->{parse_line}->($line) if defined($self->{parse_line});
  chomp $line;
  [split /\s+/,$line];
}
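As a minimal sketch of a matching callback pair, the example below serializes a result to a comma-separated line and parses it back. Callbacks of this shape would be passed as the result_to_line and parse_line constructor arguments. My::StubResult is a hypothetical stand-in so the example runs without the module installed; real results come from $iterator->get_next.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object, for illustration only.
package My::StubResult;
sub new { my ($class,$start,$end)=@_; return bless {start=>$start,end=>$end},$class }
sub get_common { return $_[0] }
sub range_start_to_string { return $_[0]->{start} }
sub range_end_to_string { return $_[0]->{end} }

package main;

# Serializer: result object in, one comma-separated line out.
my $result_to_line=sub {
  my ($result)=@_;
  my $range=$result->get_common;
  return join(',',$range->range_start_to_string,$range->range_end_to_string)."\n";
};

# Parser: one comma-separated line in, array ref of constructor arguments out.
my $parse_line=sub {
  my ($line)=@_;
  chomp $line;
  return [split /,/,$line];
};

# Round trip: what the serializer writes, the parser must be able to read.
my $line=$result_to_line->(My::StubResult->new(10,20));
my $args=$parse_line->($line);
print "line=$line";
print "start=$args->[0] end=$args->[1]\n";
```

The round trip at the end is the property to check whenever either callback is replaced.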
my $cmp=$iterator->sort_method($left_range,$right_range);
This is the internal object compare function used when sorting.
sub sort_method {
  my ($self,$left_range,$right_range)=@_;
  return $self->{sort_func}->($left_range,$right_range) if $self->{sort_func};
  my $cmp=sort_in_consolidate_order_asc($left_range->get_common,$right_range->get_common);
  return $cmp;
}
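A custom comparator for the sort_func argument might look like the sketch below, which orders by range start and then by range end, both ascending. This ordering is an illustration and is not claimed to match the default. My::StubRange is a hypothetical stand-in class, and the range_start and range_end accessors are assumed here rather than taken from the text above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a range object, for illustration only.
package My::StubRange;
sub new { my ($class,$start,$end)=@_; return bless {start=>$start,end=>$end},$class }
sub get_common { return $_[0] }
sub range_start { return $_[0]->{start} }
sub range_end { return $_[0]->{end} }

package main;

# Comparator with the same calling convention as sort_func:
# two objects in, a negative, zero, or positive number out.
my $sort_func=sub {
  my ($left,$right)=@_;
  my ($l,$r)=($left->get_common,$right->get_common);
  return $l->range_start <=> $r->range_start
      || $l->range_end   <=> $r->range_end;
};

my @ranges=map { My::StubRange->new(@$_) } ([5,9],[1,4],[1,2]);
my @sorted=sort { $sort_func->($a,$b) } @ranges;
my @pairs=map { $_->range_start.'-'.$_->range_end } @sorted;
print join(' ',@pairs),"\n";
```

The comparator follows the usual Perl sort contract, so it can be exercised with the built-in sort before being handed to the constructor.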
SEE ALSO
Data::Range::Compare::Stream::Cookbook
AUTHOR
Michael Shipper
Source-Forge Project
As of version 0.001 the Project has been moved to Source-Forge.net
Data Range Compare https://sourceforge.net/projects/data-range-comp/
COPYRIGHT
Copyright 2011 Michael Shipper. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.