NAME

Sort::MonsterSort - Sort huge collections

VERSION

0.01

WARNING

This is ALPHA release software. The interface may change.

SYNOPSIS

my $monster = Sort::MonsterSort->new;
while (<HUGEFILE>) {
    $monster->feed( $_ );
}
$monster->burp( -outfile => "sorted.txt");;

DESCRIPTION

Use Sort::MonsterSort when you have a collection which is too large to sort in-memory.

METHODS

new()

my $sorter = sub { $Sort::MonsterSort::b <=> $Sort::MonsterSort::a }';
my $monster = Sort::MonsterSort->new(
    -sortsub         => $sorter,          # default sort: standard lexical
    -working_dir     => $temp_directory,  # default: see below
    -line_separator  => $special_string,  # default: "\n"
    -cache_size      => 100_000,          # default: 10_000;
    );

Create a Sort::MonsterSort object.

-sortsub

A sorting subroutine. Avoid using a sortsub if you can -- MonsterSort is faster if it can use standard lexical sorting. Be advised that you MUST use $Sort::MonsterSort::a and $Sort::MonsterSort::b instead of $a and $b in your sub.

-working_dir

The directory where the temporary sortfiles will reside. By default, this directory is generated using File::Temp's tempdir() command.

-line_separator

The delimiter which terminates every item. MonsterSort leaves it up to you to tack the right line_separator onto each item (feeding MonsterSort is analogous to print()ing to a self-sorting filehandle). See the perlvar documentation for $/.

-cache_size

The size of MonsterSort's cache, in sortable items. Set this as high as you can for faster performance.

feed()

$monster->feed( @items );

Feed a list of sortable items to your Sort::MonsterSort object. Periodic pauses occur normally during feeding as MonsterSort digests what it has been fed so far.

burp()

while (my $stuff = $monster->burp) {
    chomp $stuff;
    &do_stuff_with( $stuff );
}
# or...
$monster->burp( -outfile => 'sorted.txt' );

If you call burp() with no arguments, it returns items one by one in sorted order. It may take a little while for the first item to appear.

If you call burp with the parameter -outfile specified, it will write all items to that outfile in sorted order.

DISCUSSION

The monstersort algorithm

Cache sortable items in memory. Every X items, sort the cache and empty it into a temporary sortfile. As sortfiles accumulate, consolidate them periodically by interleaving their contents into larger sortfiles. Use caching during the interleaving process to minimize disk I/O. Complete the sort by emptying the cache then interleaving the contents of all existing sortfiles.

BUGS

Please report any bugs or feature requests to bug-sort-monstersort@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sort-MonsterSort.

AUTHOR

Marvin Humphrey <marvin at rectangular dot com> http://www.rectangular.com

COPYRIGHT

Copyright (c) 2005 Marvin Humphrey. All rights reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.