NAME
Sort::MonsterSort - Sort huge collections
VERSION
0.01
WARNING
This is ALPHA release software. The interface may change.
SYNOPSIS
my $monster = Sort::MonsterSort->new;
while (<HUGEFILE>) {
$monster->feed( $_ );
}
$monster->burp( -outfile => "sorted.txt");;
DESCRIPTION
Use Sort::MonsterSort when you have a collection which is too large to sort in-memory.
METHODS
new()
my $sorter = sub { $Sort::MonsterSort::b <=> $Sort::MonsterSort::a }';
my $monster = Sort::MonsterSort->new(
-sortsub => $sorter, # default sort: standard lexical
-working_dir => $temp_directory, # default: see below
-line_separator => $special_string, # default: "\n"
-cache_size => 100_000, # default: 10_000;
);
Create a Sort::MonsterSort object.
- -sortsub
-
A sorting subroutine. Avoid using a sortsub if you can -- MonsterSort is faster if it can use standard lexical sorting. Be advised that you MUST use $Sort::MonsterSort::a and $Sort::MonsterSort::b instead of $a and $b in your sub.
- -working_dir
-
The directory where the temporary sortfiles will reside. By default, this directory is generated using File::Temp's tempdir() command.
- -line_separator
-
The delimiter which terminates every item. MonsterSort leaves it up to you to tack the right line_separator onto each item (feeding MonsterSort is analogous to print()ing to a self-sorting filehandle). See the perlvar documentation for $/.
- -cache_size
-
The size of MonsterSort's cache, in sortable items. Set this as high as you can for faster performance.
feed()
$monster->feed( @items );
Feed a list of sortable items to your Sort::MonsterSort object. Periodic pauses occur normally during feeding as MonsterSort digests what it has been fed so far.
burp()
while (my $stuff = $monster->burp) {
chomp $stuff;
&do_stuff_with( $stuff );
}
# or...
$monster->burp( -outfile => 'sorted.txt' );
If you call burp() with no arguments, it returns items one by one in sorted order. It may take a little while for the first item to appear.
If you call burp with the parameter -outfile specified, it will write all items to that outfile in sorted order.
DISCUSSION
The monstersort algorithm
Cache sortable items in memory. Every X items, sort the cache and empty it into a temporary sortfile. As sortfiles accumulate, consolidate them periodically by interleaving their contents into larger sortfiles. Use caching during the interleaving process to minimize disk I/O. Complete the sort by emptying the cache then interleaving the contents of all existing sortfiles.
BUGS
Please report any bugs or feature requests to bug-sort-monstersort@rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sort-MonsterSort.
AUTHOR
Marvin Humphrey <marvin at rectangular dot com> http://www.rectangular.com
COPYRIGHT
Copyright (c) 2005 Marvin Humphrey. All rights reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.