NAME
List::BinarySearch - Binary Search within a sorted array.
SYNOPSIS
This module performs a binary search on an array passed by reference, or on an array or list passed as a flat list.
Examples:
use List::BinarySearch qw( :all );
use List::BinarySearch qw( binsearch binsearch_pos binsearch_range );
@num_array = ( 100, 200, 300, 400, 500 );
@str_array = qw( Bach Beethoven Brahms Mozart Schubert );
# Find the lowest index of a matching element.
$index = binsearch {$a <=> $b} 300, @num_array;
$index = binsearch {$a cmp $b} 'Mozart', @str_array; # Stringy cmp.
$index = binsearch {$a <=> $b} 42, @num_array; # not found: undef
# Find the lowest index of a matching element, or best insert point.
$index = binsearch_pos {$a cmp $b} 'Chopin', @str_array; # Insert at [3].
$index = binsearch_pos 600, @num_array; # Insert at [5].
splice @num_array, $index, 1, 600
if( $num_array[$index] != 600 ); # Insertion at [5]
$index = binsearch_pos { $a <=> $b } 200, @num_array; # Matched at [1].
# The following functions return an inclusive range.
my( $low_ix, $high_ix )
= binsearch_range { $a cmp $b } 'Beethoven', 'Mozart', @str_array;
# Returns ( 1, 3 ), meaning ( 1 .. 3 ).
my( $low_ix, $high_ix )
= binsearch_range { $a <=> $b } 200, 400, @num_array;
DESCRIPTION
A binary search searches sorted lists using a divide and conquer technique. On each iteration the search domain is cut in half, until the result is found. The computational complexity of a binary search is O(log n).
The binary search algorithm implemented in this module is known as a Deferred Detection variant on the traditional Binary Search. Deferred Detection provides stable searches. Stable binary search algorithms have the following characteristics, contrasted with their unstable binary search cousins:
In the case of non-unique keys, a stable binary search will always return the lowest-indexed matching element. An unstable binary search would return the first one found, which may not be the chronological first.
Best and worst case time complexity is always O(log n). Unstable searches may find the target in fewer iterations in the best case, but in the worst case would still be O(log n). In practical terms, this difference is usually not meaningful.
Stable binary searches only require one relational comparison of a given pair of data elements per iteration, where unstable binary searches require two comparisons per iteration.
The net result is that although an unstable binary search might have a better "best case" time complexity, the fact that a stable binary search gets away with fewer comparisons per iteration gives it better performance in the worst case, and approximately equal performance in the average case. By trading away slightly better "best case" performance, the stable search gains the guarantee that the element found will always be the lowest-indexed element in a range of non-unique keys.
This module has a companion "XS" module: List::BinarySearch::XS which users are strongly encouraged to install as well. If List::BinarySearch::XS is also installed, binsearch
and binsearch_pos
will use XS code. This behavior may be overridden by setting $ENV{List_BinarySearch_PP}
to a true value.
RATIONALE
Quoting from Wikipedia: When Jon Bentley assigned it as a problem in a course for professional programmers, he found that an astounding ninety percent failed to code a binary search correctly after several hours of working on it, and another study shows that accurate code for it is only found in five out of twenty textbooks. Furthermore, Bentley's own implementation of binary search, published in his 1986 book Programming Pearls, contains an error that remained undetected for over twenty years.
So the answer to the question "Why use a module for this?" is "So that you don't have to write, test, and debug your own implementation."
Nevertheless, before using this module the user should weigh the other options: linear searches ( grep
or List::Util::first
), or hash based searches. A binary search only makes sense if the data set is already sorted in ascending order, and if it is determined that the cost of a linear search, or the linear-time conversion to a hash-based container is too inefficient or demands too much memory. So often, it just doesn't make sense to try to optimize beyond what Perl's tools natively provide.
However, there are cases where a binary search may be an excellent choice. Finding the first matching element in a list of 1,000,000 items with a linear search would have a worst-case of 1,000,000 iterations, whereas the worst case for a binary search of 1,000,000 elements is about 20 iterations. In fact, if many lookups will be performed on a seldom-changed list, the savings of O(log n) lookups may outweigh the cost of sorting or performing occasional ordered inserts.
Profile, then benchmark, then consider (and benchmark) the options, and finally, optimize.
EXPORT
Nothing is exported by default. Upon request will export binsearch
, binsearch_pos
, and binsearch_range
.
Or import all functions by specifying :all
.
SUBROUTINES/METHODS
WHICH SEARCH ROUTINE TO USE
binsearch
: Returns lowest index where match is found, or undef.binsearch_pos
: Returns lowest index where match is found, or the index of the best insert point for needle if the needle isn't found.binsearch_range
: Performs a search for both low and high needles. Returns a pair of indices refering to a range of elements corresponding to low and high needles.
binsearch CODE NEEDLE ARRAY_HAYSTACK
$first_found_ix = binsearch { $a cmp $b } $needle, @haystack;
Pass a code block, a search target, and an array to search. Uses the supplied code block $needle
to test the needle against elements in @haystack
.
See the section entitled The Callback Block, below, for an explanation of how the comparator works (hint: very similar to sort { $a <=> $b } ...
).
Return value will be the lowest index of an element that matches target, or undef if target isn't found.
binsearch_pos CODE NEEDLE ARRAY_HAYSTACK
$first_found_ix = binsearch_pos { $a cmp $b } $needle, @haystack;
The only difference between this function and binsearch
is its return value upon failure. binsearch
returns undef upon failure. binsearch_pos
returns the index of a valid insert point for $needle
.
Pass a code block, a search target, and an array to search. Uses the code block to test $needle
against elements in @haystack
.
Return value is the index of the first element equaling $needle
. If no element is found, the best insert-point for $needle
is returned.
binsearch_range CODE LOW_NEEDLE HIGH_NEEDLE ARRAY_HAYSTACK
my( $low, $high )
= binsearch_range { $a <=> $b }, $low_needle, $high_needle, @haystack;
Given $low_needle
and $high_needle
, returns a set of indices that represent the range of elements fitting within $low_needle
and $high_needle
's bounds. This might be useful, for example, in finding all transations that occurred between 02012013 and 02292013.
This algorithm was adapted from Mastering Algorithms with Perl, page 172 and 173.
The callback block (The comparator)
Comparators in List::BinarySearch are used to compare the target (needle) with individual haystack elements, returning the result of the relational comparison of the two values. A good example would be the code block in a sort
function.
Basic comparators might be defined like this:
# Numeric comparisons:
binsearch { $a <=> $b } $needle, @haystack;
# Stringwise comparisons:
binsearch { $a cmp $b } $needle, @haystack;
# Unicode Collation Algorithm comparisons
$Collator = Unicode::Collate->new;
binsearch { $Collator->( $a, $b ) } $needle, @haystack;
$a
represents the target. $b
represents the contents of the haystack element being tested. This leads to an asymmetry that might be prone to "gotchas" when writing custom comparators for searching complex data structures. As an example, consider the following data structure:
my @structure = (
[ 100, 'ape' ],
[ 200, 'cat' ],
[ 300, 'dog' ],
[ 400, 'frog' ]
);
A numeric custom comparator for such a data structure would look like this:
sub{ $a <=> $b[0] }
In this regard, the callback is unlike sort
, because sort
is always comparing to elements, whereas binsearch
is comparing a target with an element.
DEPRECATED FUNCTIONS
The following have been deprecated and should not be used. They will be eliminated in a future version of List::BinarySearch. The reason for their deprecation is that List::BinarySearch::XS provides enough performance gain as to make numeric-specialized and string-specialized versions of the binary search routine obsolete. If you need high performance, install List::BinarySearch::XS.
bsearch_custom
Replaced by binsearch. Same syntax.
bsearch_custom_pos
Replaced by binsearch_pos. Same syntax.
bsearch_custom_range
Replaced by binsearch_range. Same syntax.
bsearch_str
Instead use binsearch { $a cmp $b } $needle, @haystack;
.
bsearch_str_pos
Instead use binsearch_pos { $a cmp $b } $needle, @haystack;
.
bsearch_str_range
Instead use binsearch_range { $a cmp $b } $needle, @haystack;
.
bsearch_num
Instead use binsearch { $a <=> $b } $needle, @haystack;
.
bsearch_num_pos
Instead use binsearch_pos { $a <=> $b } $needle, @haystack;
.
bsearch_num_range
Instead use binsearch_range { $a <=> $b } $needle, @haystack;
.
bsearch_transform
Instead use binsearch { $comparator->($a,$b) } $needle, @haystack;
.
Also, the old "pass by @_" callbacks have been deprecated in favor of callbacks that use $a
and $b
. If you have been using: bsearch_custom {$_[0] <=>$_[1]} $needle, @haystack;
, instead use: binsearch { $a <=> $b } $needle, @haystack;
. It's much easier on the eyes.
DATA SET REQUIREMENTS
A well written general algorithm should place as few demands on its data as practical. The three requirements that these Binary Search algorithms impose are:
The list must be in ascending sorted order.
This is a big one. The best sort routines run in O(n log n) time. It makes no sense to sort a list in O(n log n) time, and then perform a single O(log n) binary search when List::Util
first
could accomplish the same thing in O(n) time without sorting.The list must be in ascending sorted order.
A Binary Search consumes O(log n) time. We don't want to waste linear time verifying the list is sordted, so there is no validity checking. You have been warned.
These functions are prototyped as (&$\@) or ($\@).
What this implementation detail means is that
@haystack
is implicitly passed by reference. This is the price we pay for a familiar user interface, cleaner calling syntax, and the automatic efficiency of pass-by-reference.Objects in the search lists must be capable of being evaluated for relationaity.
I threw that in for C++ folks who have spent some time with Effective STL. For everyone else don't worry; if you know how to
sort
you know how tobinsearch
.
UNICODE SUPPORT
Lists sorted according to the Unicode Collation Algorithm must be searched using the same Unicode Collation Algorithm, Here's an example using Unicode::Collate's $Collator->cmp($a,$b)
:
my $found_index = binsearch { $Collator->cmp($a, $b) } $needle, @haystack;
CONFIGURATION AND ENVIRONMENT
By installing List::BinarySearch::XS, the pure-Perl versions of binsearch
and binsearch_pos
will be automatically replaced with XS versions for markedly improved performance. binsearch_range
also benefits from the XS plug-in, since internally it makes calls to binsearch_pos
.
Users are strongly advised to install List::BinarySearch::XS. If, after installing List::BinarySearch::XS, one wishes to disable the XS plugin, setting $ENV{List_BinarySearch_PP}
to a true value will prevent the XS module from being used by List::BinarySearch. This setting will have no effect on users who use List::BinarySearch::XS directly.
For the sake of code portability, it's recommended to use List::BinarySearch, as it will automatically and portably downgrade to the pure-Perl version if the XS module can't be loaded.
DEPENDENCIES
This module uses Exporter, and automatically makes use of List::BinarySearch::XS if it's installed on the user's system.
This module is tested on Perl 5.8 and newer. It may also be compatible with Perl 5.6 in its pure-Perl form, but hasn't been tested in that environment.
INCOMPATIBILITIES
The XS plugin for this module is not compatible with Perl 5.6.
AUTHOR
David Oswald, <davido at cpan.org>
If the documentation fails to answer your question, or if you have a comment or suggestion, send me an email.
DIAGNOSTICS
BUGS AND LIMITATIONS
Please report any bugs or feature requests to bug-list-binarysearch at rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/Bugs.html?Dist=List-BinarySearch. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc List::BinarySearch
This module is maintained in a public repo at Github. You may look for information at:
Github: Development is hosted on Github at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
Thank-you to Max Maischein (Corion) for being a willing and helpful sounding board on API issues, and for spotting some POD problems.
Mastering Algorithms with Perl, from O'Reilly: for the inspiration (and much of the code) behind the positional and ranged searches. Quoting MAwP: "...the binary search was first documented in 1946 but the first algorithm that worked for all sizes of array was not published until 1962." (A summary of a passage from Knuth: Sorting and Searching, 6.2.1.)
Necessity, who is the mother of invention. -- plato.
Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky... -- Donald Knuth
LICENSE AND COPYRIGHT
Copyright 2012 David Oswald.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.