NAME
List::MergeSorted::XS - Fast merger of presorted lists
SYNOPSIS
use List::MergeSorted::XS 'merge';
# merge plain integer lists
@lists = ([1, 3, 5], [2, 6, 8], [4, 7, 9]);
merge(\@lists); # [1..9]
# return only beginning of union
merge(\@lists, limit => 4); # [1..4]
# remove duplicates
@lists = ([1, 2], [0, 2, 3], [3, 4]);
merge(\@lists, uniq_cb => sub { $_[0] }); # [0..4]
# merge complicated objects based on accompanying integer keys
@lists = ([
[1 => 'x'], [3 => {t => 1}]
],
[
[2 => bless {}, 'C'], [4 => 5]
]);
$sorted = merge(\@lists, key => sub { $_[0][0] });
DESCRIPTION
This module takes a set of presorted lists and returns the sorted union of those lists.
To maximize speed, an appropriate algorithm is chosen heuristically based on the size of the input data; additionally, efficient C implementations of the algorithms are used.
FUNCTIONS
- $merged = merge(\@list_of_lists, %opts)
-
Computes the sorted union of a set of lists of integers. The first parameter must be an array reference which itself contains a number of array references. The result set is returned in an array reference.
The constituent lists must meet these preconditions for correct behavior:
either each element of each list is an integer or an integer may be computed from the element using the
key_cb
parameter beloweach list is pre-sorted in ascending order
merge
's behavior may be modified by additional options passed after the list:limit
Specifies a maximum number of elements to return. By default all elements are returned.
key_cb
Specifies a callback routine which will be passed an element of an inner list through @_. The routine must return the integer value by which the element will be sorted. In effect, the default callback is
sub {$_[0]}
. This allows more complicated structures to be merged.uniq_cb
Specifies a callback routine which will be passed an element of an inner list through @_. The routine must return the integer value which identifies the element in some sense. Elements with the same identity value will not be duplicated in the output. Elements with the same identity must also have the same key.
If no uniq_cb is passed, duplicates are allowed in the output.
method
Specifies the algorithm to use to merge the lists. Is provided, the value must be one of the constants listed below under ALGORITHM.
If no method is given, one is chosen automatically based upon the input data. This is generally recommended.
NOTES
Only ascending order is supported. To merge lists which are sorted in descending order, use key_cb => sub { -$_[0] }
.
EXPORTS
None by default, merge
at request.
ALGORITHM
The algorithm used to merge the lists is heuristically chosen based on the number of lists (N), the total number of elements in the lists (M), and the requested limit (L). (The heuristic constants were determined by analysis of benchmarks on a 2.5GHz Intel Xeon where all data fit in memory.)
When there are many lists and the element limit is a significant fraction of the total element count (L/M > 1/4), perl's built-in sort
is used to order the concatenated lists. The time complexity is O(M log M)
. Since this method always processes the full list, it cannot short-circuit in the highly-limited case (as the priority queue methods do).
When L is a smaller fraction of M, a priority queue is used to track the list heads. For small N, this is implemented as a linked list kept in sorted order (using linear-time insertion), yielding time complexity of O(L N)
. For large N, a Fibonacci heap is used, for a time complexity of O(L log N)
. The linked list has less bookkeeping overhead than the heap, so it is more efficient for fewer lists.
To force a particular implementation, pass the method
parameter to merge
with one of these constants:
List::MergeSorted::XS::SORT
List::MergeSorted::XS::PRIO_LINEAR
List::MergeSorted::XS::PRIO_FIB
TODO
* Support comparative orderings, where no mapping from elements to integers exists but a well-defined ordering exists for which a two-element comparison callback can be provided.
* Allow modification of the heuristics (perhaps based on local benchmarks).
AUTHOR
Adam Thomason, <athomason@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2010 by Say Media Inc <cpan@saymedia.com>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.9 or, at your option, any later version of Perl 5 you may have available.