NAME
Data::Range::Compare::Stream::Iterator::File::MergeSortAsc - On Disk Merge Sort for really big data sets!
SYNOPSIS
  use Data::Range::Compare::Stream::Iterator::File::MergeSortAsc;

  my $iterator=Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(
    filename=>'somefile.csv',
  );

  while($iterator->has_next) {
    my $next_range=$iterator->get_next;
    print $next_range,"\n";
  }
DESCRIPTION
This module extends Data::Range::Compare::Stream::Iterator::Base and provides an on-disk merge sort for range data read from files or from objects that implement or extend Data::Range::Compare::Stream::Iterator::Base.
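As a rough illustration of the technique behind the module's name, here is a standalone sketch of an external (on-disk) merge sort in plain Perl. Fixed-size buckets are sorted in memory and spooled to temporary files, then the spooled files are merged while only one line per file is held in memory. The helper names external_sort_lines and read_line and the comparator are hypothetical; this is not the module's own code. The "start end" line format matches the default result_to_line output documented below.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper illustrating an external (on-disk) merge sort.
# $lines is an array ref of "start end" strings, $cmp a comparator
# returning a negative, zero or positive number.
sub external_sort_lines {
    my ($lines, $bucket_size, $cmp) = @_;

    # Phase 1: sort one bucket at a time in memory and spool it to a temp file.
    my @spools;
    for (my $i = 0; $i < @$lines; $i += $bucket_size) {
        my $last = $i + $bucket_size - 1;
        $last = $#$lines if $last > $#$lines;
        my @bucket = sort { $cmp->($a, $b) } @{$lines}[$i .. $last];
        my $fh = File::Temp->new;    # removed automatically when the object is destroyed
        print {$fh} "$_\n" for @bucket;
        close $fh or die "close failed: $!";
        push @spools, $fh;
    }

    # Phase 2: k-way merge, holding only the current head line of each spool.
    my @readers = map {
        open(my $in, '<', $_->filename) or die "open failed: $!";
        $in;
    } @spools;
    my @heads = map { read_line($_) } @readers;

    # Output is collected in memory only so the sketch is easy to check;
    # a real implementation would write it to a result file instead.
    my @out;
    while (1) {
        my $best;
        for my $k (0 .. $#heads) {
            next unless defined $heads[$k];
            $best = $k if !defined $best || $cmp->($heads[$k], $heads[$best]) < 0;
        }
        last unless defined $best;
        push @out, $heads[$best];
        $heads[$best] = read_line($readers[$best]);
    }
    return \@out;
}

# Read one line from a handle without its newline; undef at end of file.
sub read_line {
    my ($fh) = @_;
    my $line = readline($fh);
    chomp $line if defined $line;
    return $line;
}

# Example comparator: numeric start, then numeric end, both ascending.
my $by_start_then_end = sub {
    my @x = split ' ', $_[0];
    my @y = split ' ', $_[1];
    return $x[0] <=> $y[0] || $x[1] <=> $y[1];
};

my @input  = ('5 9', '1 3', '4 4', '1 2', '7 8');
my $sorted = external_sort_lines(\@input, 2, $by_start_then_end);
print "$_\n" for @$sorted;
```

With a bucket size of 2 the five input lines are spooled into three files before the merge. In general the bucket size bounds how many lines are sorted in memory at once, trading memory use against the number of spool files that have to be merged.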
OO Methods
my $iterator=Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(key=>value);
Instance constructor. At least one of the source arguments below is required; all other arguments are optional.
At least one of the following arguments is required:
  filename=>'source_file.csv'
    # Absolute or relative path to the file to be sorted.

  file_list=>[]
    # An array ref of file names, given as absolute or relative paths.

  iterator_list=>[]
    # An array ref of objects that implement or extend Data::Range::Compare::Stream::Iterator::Base.
Optional arguments:

  auto_prepare=>0|1
    # Default: 0. If set to 1, the sort is performed when the object is created.

  unlink_result_file=>1|0
    # Default: 1. If set to 0, the sorted result file is not deleted.

  bucket_size=>4000
    # Sets the number of ranges pre-sorted in memory per bucket.
    # Two buckets are created, so up to bucket_size * 2 ranges are loaded into memory.

  NEW_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::File'
    # Sets the file iterator class used when loading spooled files for merging.
    # Make sure you load or require the class being passed in as an argument!

  NEW_ARRAY_ITERATOR_FROM=>'Data::Range::Compare::Stream::Iterator::Array'
    # Sets the array iterator class.

  NEW_FROM=>'Data::Range::Compare::Stream'
    # Deprecated but still supported; see factory_instance.
    # Sets the class new ranges will be created from.
    # This argument is passed to objects constructed from NEW_ITERATOR_FROM.

  factory_instance=>$obj
    # Defines the object that implements $obj->factory($start,$end,$data).
    # New ranges are constructed through this factory interface. If no
    # factory_instance is given, an instance of Data::Range::Compare::Stream is assumed.

  parse_line=>undef|code_ref
    # Default: undef. Sets the code ref used to parse a line.
    # If not set, the default internals are used.
    # This argument is passed to objects constructed from NEW_ITERATOR_FROM.

  result_to_line=>undef|code_ref
    # Default: undef. Sets the code ref used to convert a result into a line that can be parsed.
    # If not set, the default internals are used.
    # This argument is passed to objects constructed from NEW_ITERATOR_FROM.

  sort_func=>undef|code_ref
    # Default: undef. Sets the code ref used to compare objects during the sort.
    # If not set, the default internals are used.

  tmpdir=>undef|'/some/folder'
    # If tmpdir is defined, its value is passed to File::Temp->new(DIR=>$self->{tmpdir}),
    # so temporary files are created in that directory.
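A factory_instance only needs to provide a factory($start,$end,$data) method. The sketch below is hypothetical: My::RangeFactory and My::Range are made-up names, and the stand-in range class exists only so the example runs without the module installed. In real use, factory() should return an object that extends or implements Data::Range::Compare::Stream.

```perl
use strict;
use warnings;

# Hypothetical factory class. Only the factory($start,$end,$data) method is
# required by the factory_instance argument.
package My::RangeFactory;

sub new { my ($class) = @_; return bless {}, $class }

sub factory {
    my ($self, $start, $end, $data) = @_;
    # A real factory should return an object that extends or implements
    # Data::Range::Compare::Stream; My::Range is a stand-in for this sketch.
    return My::Range->new($start, $end, $data);
}

# Hypothetical stand-in range class so the sketch runs on its own.
package My::Range;

sub new {
    my ($class, $start, $end, $data) = @_;
    return bless { start => $start, end => $end, data => $data }, $class;
}
sub range_start { return $_[0]{start} }
sub range_end   { return $_[0]{end} }
sub data        { return $_[0]{data} }

package main;

my $factory = My::RangeFactory->new;
my $range   = $factory->factory(1, 5, 'row 7');

# With the module installed, the factory would be passed like this:
# my $iterator = Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(
#     filename         => 'somefile.csv',
#     factory_instance => $factory,
# );
```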
my $class=$iterator->NEW_FROM;
Returns the class that new range objects are constructed from.
my $class=$iterator->NEW_ITERATOR_FROM;
Returns the name of the class that new file iterators are constructed from.
my $class=$iterator->NEW_ARRAY_ITERATOR_FROM;
Returns the name of the class that new array iterators are constructed from.
while($iterator->has_next) { ... }
Returns true when there are more rows to fetch.
my $result=$iterator->get_next;
Returns the next $result in ascending sorted order.
my $line=$iterator->result_to_line($result);
Given a $result from $iterator->get_next, converts it into a line that can be parsed by $iterator->parse_line($line). Think of this method as a serializer for the results produced by an $iterator object. When overriding this method or supplying a result_to_line callback, make sure its output can be parsed by parse_line.
  sub result_to_line {
    my ($self,$result)=@_;
    # Use the result_to_line callback from the constructor, if one was given.
    return $self->{result_to_line}->($result) if defined($self->{result_to_line});
    # Default format: "start end\n", built from the result's common range.
    my $range=$result->get_common;
    my $line=$range->range_start_to_string.' '.$range->range_end_to_string."\n";
    return $line;
  }
my $ref=$iterator->parse_line($line);
Given a $line, returns an array ref of the arguments required to construct an object that extends or implements Data::Range::Compare::Stream. When overriding this method or supplying a parse_line callback, make sure it can parse the lines that result_to_line produces.
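  sub parse_line {
    my ($self,$line)=@_;
    # Use the parse_line callback from the constructor, if one was given.
    return $self->{parse_line}->($line) if defined($self->{parse_line});
    # Default: strip the newline and split on whitespace into an array ref of fields.
    chomp $line;
    [split /\s+/,$line];
  }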
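Because the two methods must agree, a custom parse_line callback is best written together with its result_to_line counterpart. Below is a minimal sketch for comma-separated "start,end" lines, assuming the calling conventions visible in the method bodies above: a parse_line callback receives the line, and a result_to_line callback receives the result. $parse_csv and $csv_line are hypothetical names, and Mock::Range and Mock::Result are stand-ins that implement only the methods the default result_to_line uses (get_common, range_start_to_string, range_end_to_string).

```perl
use strict;
use warnings;

# Hypothetical parse_line callback for "start,end" lines. Like the default
# parser, it returns an array ref of constructor arguments.
my $parse_csv = sub {
    my ($line) = @_;
    chomp $line;
    return [ split /,/, $line ];
};

# Hypothetical result_to_line callback that writes lines $parse_csv can read.
my $csv_line = sub {
    my ($result) = @_;
    my $range = $result->get_common;
    return join(',', $range->range_start_to_string, $range->range_end_to_string) . "\n";
};

# Stand-ins for a result and its common range, so the round trip can be
# checked without the module installed.
package Mock::Range;
sub new { my ($class, $start, $end) = @_; return bless { start => $start, end => $end }, $class }
sub range_start_to_string { return $_[0]{start} }
sub range_end_to_string   { return $_[0]{end} }

package Mock::Result;
sub new { my ($class, $range) = @_; return bless { range => $range }, $class }
sub get_common { return $_[0]{range} }

package main;

my $line = $csv_line->(Mock::Result->new(Mock::Range->new(3, 9)));
my $args = $parse_csv->($line);    # constructor arguments read back from the line

# With the module installed, both callbacks would be passed together:
# Data::Range::Compare::Stream::Iterator::File::MergeSortAsc->new(
#     filename       => 'somefile.csv',
#     parse_line     => $parse_csv,
#     result_to_line => $csv_line,
# );
```

Note that split /,/ does not handle quoted fields; files with quoted values would need a real CSV parser such as Text::CSV.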
my $cmp=$iterator->sort_method($left_range,$right_range);
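The comparison method used internally when sorting. It is used like a Perl sort comparator: the result is negative, zero or positive depending on whether $left_range sorts before, equal to, or after $right_range.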
  sub sort_method {
    my ($self,$left_range,$right_range)=@_;
    # Use the sort_func callback from the constructor, if one was given.
    return $self->{sort_func}->($left_range,$right_range) if $self->{sort_func};
    # Default: compare the common ranges in ascending consolidate order.
    my $cmp=sort_in_consolidate_order_asc($left_range->get_common,$right_range->get_common);
    return $cmp;
  }
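A sort_func callback receives the same two arguments as sort_method and must return a negative, zero or positive number. The sketch below orders results by numeric start, then numeric end. It is a hypothetical example, not the module's default ordering (which comes from sort_in_consolidate_order_asc). It assumes the common range exposes range_start and range_end accessors; Mock::Range is a stand-in that acts as its own common range so the comparator can be exercised on its own.

```perl
use strict;
use warnings;

# Hypothetical sort_func: numeric start, then numeric end, both ascending.
# Assumes get_common returns an object with range_start and range_end.
my $by_start_then_end = sub {
    my ($left, $right) = @_;
    my ($l, $r) = ($left->get_common, $right->get_common);
    return $l->range_start <=> $r->range_start
        || $l->range_end   <=> $r->range_end;
};

# Stand-in for results from get_next; it acts as its own common range.
package Mock::Range;
sub new         { my ($class, $start, $end) = @_; return bless { start => $start, end => $end }, $class }
sub range_start { return $_[0]{start} }
sub range_end   { return $_[0]{end} }
sub get_common  { return $_[0] }

package main;

my @results = map { Mock::Range->new(@{$_}) } ([4, 6], [1, 9], [1, 3]);
my @sorted  = sort { $by_start_then_end->($a, $b) } @results;

# With the module installed, the comparator would be passed as:
#   sort_func => $by_start_then_end
```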
SEE ALSO
Data::Range::Compare::Stream::Cookbook
AUTHOR
Michael Shipper
Source-Forge Project
As of version 0.001 the Project has been moved to Source-Forge.net
Data Range Compare https://sourceforge.net/projects/data-range-comp/
COPYRIGHT
Copyright 2011 Michael Shipper. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.