NAME

Algorithm::BinarySearch::Vec - binary search functions and basic set operations for vec()-vectors with fast XS implementations

SYNOPSIS

use Algorithm::BinarySearch::Vec;

##-------------------------------------------------------------
## Constants
my $NOKEY   = $Algorithm::BinarySearch::Vec::KEY_NOT_FOUND;
my $is_fast = $Algorithm::BinarySearch::Vec::HAVE_XS;

##-------------------------------------------------------------
## Search: element-wise
$index = vbsearch   ($v,$key,$nbits,$lo,$hi);	##-- match only
$index = vbsearch_lb($v,$key,$nbits,$lo,$hi);	##-- lower bound
$index = vbsearch_ub($v,$key,$nbits,$lo,$hi);	##-- upper bound

##-------------------------------------------------------------
## Search: array-wise
$indices = vabsearch   ($v,\@keys,$nbits,$lo,$hi); ##-- match only
$indices = vabsearch_lb($v,\@keys,$nbits,$lo,$hi); ##-- lower bound
$indices = vabsearch_ub($v,\@keys,$nbits,$lo,$hi); ##-- upper bound

##-------------------------------------------------------------
## Search: vector-wise
$ixvec = vvbsearch   ($v,$keyvec,$nbits,$lo,$hi); ##-- match only
$ixvec = vvbsearch_lb($v,$keyvec,$nbits,$lo,$hi); ##-- lower bound
$ixvec = vvbsearch_ub($v,$keyvec,$nbits,$lo,$hi); ##-- upper bound

##-------------------------------------------------------------
## Set Operations
$cv = vunion($av,$bv,$nbits);          ##-- set union
$cv = vintersect($av,$bv,$nbits);      ##-- set intersection
$cv = vsetdiff($av,$bv,$nbits);        ##-- set difference

##-------------------------------------------------------------
## Debugging
$val  = vget($vec,$i,$nbits);
undef = vset($vec,$i,$nbits, $newval);
$vals = vec2array($vec,$nbits);

DESCRIPTION

The Algorithm::BinarySearch::Vec perl module provides binary search functions and basic set operations for vec()-vectors, including fast XS implementations in the package Algorithm::BinarySearch::Vec::XS. The XS implementations are used by default if available, otherwise pure-perl fallbacks are provided. You can check whether the XS implementations are available on your system by examining the boolean scalar $Algorithm::BinarySearch::Vec::HAVE_XS.

Data Conventions

All API functions provided by this module assume that the elements of the vec()-style vector arguments are sorted in strictly ascending order. The user is responsible for assuring that this is the case, since no additional checking is done by this module.

Exports

This module support the following export tags:

:api

Exports all API functions (default).

:const

Exports the following constant(s):

$KEY_NOT_FOUND

Constant returned by search functions indicating that the requested key was not found, or the requested bound is not within the data vector.

:debug

Exports debugging subs for the XS module (vget(), vset()).

:all

Exports everything available.

Search: element-wise

vbsearch($v,$key,$nbits,?$ilo,?$ihi)

Binary search for $key in the vec()-style vector $v, which contains elements of $nbits bits each, sorted in ascending order. $ilo and $ihi if specified are indices to limit the search. $ilo defaults to 0, $ihi defaults to (8*$nbits/bytes::length($v)), i.e. the entire vector is to be searched. Returns the index $i of an element in $v matching $key (vec($v,$i,$nbits)==$key, with ($ilo <= $i < $ihi)), or $KEY_NOT_FOUND if no such element exists.

vbsearch_lb($v,$key,$nbits,?$ilo,?$ihi)

Binary search for the lower-bound of $key in the vec()-style vector $v. Arguments are as for vbsearch().

Returns the maximum index $i such that vec($v,$i,$nbits) <= $key and vec($v,$j,$nbits) < $key for all $j with $ilo <= $j < $i, or $KEY_NOT_FOUND if no such $i exists (i.e. if vec($v,$ilo,$nbits) > $key). In other words, returns the least index of a match for $key in $v whenever a match exists, otherwise the greatest index whose value in $v is strictly less than $key if that exists, and $KEY_NOT_FOUND if all values in $v are strictly greater than $key.

This is equivalent to (but usually much faster than):

return $KEY_NOT_FOUND if (vec($v,$ilo,$nbits) > $key);
for (my $i=$ilo; $i < $ihi; $i++) {
  return $i   if (vec($v,$i,$nbits) == $key);
  return $i-1 if (vec($v,$i,$nbits) >  $key);
}
return ($ihi-1);

Note that the semantics of this function differ substantially from the C++ STL function lower_bound().

vbsearch_ub($v,$key,$nbits,?$ilo,?$ihi)

Binary search for the upper-bound of $key in the vec()-style vector $v. Arguments are as for vbsearch().

Returns the minimum index $i such that vec($v,$i,$nbits) >= $key and vec($v,$j,$nbits) > $key for all $j with $i < $j < $ihi, or $KEY_NOT_FOUND if no such $i exists (i.e. if vec($v,$ihi-1,$nbits) < $key). In other words, returns the greatest index of a match for $key in $v whenever a match exists, otherwise the least index whose value in $v is strictly greater than $key if that exists, and $KEY_NOT_FOUND if all values in $v are strictly less than $key.

This is equivalent to (but usually much faster than):

return $KEY_NOT_FOUND if (vec($v,$ihi-1,$nbits) < $key);
for ($i=$ihi-1; $i >= 0; $i--) {
  return $i   if (vec($v,$i,$nbits) == $key)
  return $i+1 if (vec($v,$i,$nbits) <  $key)
}
return $ilo;

Note that the semantics of this function differ substantially from the C++ STL function upper_bound().

Search: array-wise

vabsearch($v,\@keys,$nbits,?$ilo,?$ihi)

Binary search for each value in the ARRAY-ref \@keys in the vec()-style vector $v. Other arguments are as for vbsearch(). Returns an ARRAY-ref of indices. This is equivalent to (but usually much faster than):

$indices = [map {vbsearch($v,$_,$nbits,$ilo,$ihi)} @keys];
vabsearch_lb($v,\@keys,$nbits,?$ilo,?$ihi)

Binary search for the lower-bound of each value in the ARRAY-ref \@keys in the vec()-style vector $v. Other arguments are as for vbsearch(). Returns an ARRAY-ref of indices. This is equivalent to (but usually much faster than):

$indices = [map {vbsearch_lb($v,$_,$nbits,$ilo,$ihi)} @keys];
vabsearch_ub($v,\@keys,$nbits,?$ilo,?$ihi)

Binary search for the upper-bound of each value in the ARRAY-ref \@keys in the vec()-style vector $v. Other arguments are as for vbsearch(). Returns an ARRAY-ref of indices. This is equivalent to (but usually much faster than):

$indices = [map {vbsearch_ub($v,$_,$nbits,$ilo,$ihi)} @keys];

Search: vec-wise

vvbsearch($v,$keyvec,$nbits,?$ilo,?$ihi)

Binary search for each key in the key-vector $keyvec in the "haystack"-vector $v. Other arguments are as for vbsearch(). Returns a vec()-vector of 32-bit indices. This is equivalent to (but usually faster than):

$ixvec = pack('N*', @{vabsearch($v,vec2array($keyvec,$nbits),$nbits,$ilo,$ihi)});
vvbsearch_lb($v,$keyvec,$nbits,?$ilo,?$ihi)

Binary lower-bound search for each key in the key-vector $keyvec in the "haystack"-vector $v. Other arguments are as for vbsearch(). Returns a vec()-vector of 32-bit indices. This is equivalent to (but usually faster than):

$ixvec = pack('N*', @{vabsearch_lb($v,vec2array($keyvec,$nbits),$nbits,$ilo,$ihi)});
vvbsearch_ub($v,$keyvec,$nbits,?$ilo,?$ihi)

Binary upper-bound search for each key in the key-vector $keyvec in the "haystack"-vector $v. Other arguments are as for vbsearch(). Returns a vec()-vector of 32-bit indices. This is equivalent to (but usually faster than):

$ixvec = pack('N*', @{vabsearch_ub($v,vec2array($keyvec,$nbits),$nbits,$ilo,$ihi)});

Set Operations

The set operations supported by this module assume that the vec()-style vector sets are sorted in ascending order, contain no duplicates, and are encoded with $nbits >= 8; i.e. every element-boundary must lie on a byte-boundary. The vector-sets returned by these API functions should also conform to these conventions whenever the parameters do.

vunion($av,$bv,$nbits)

Computes the union of two sorted vec()-style sets $av and $bv and returns the result as a sorted vector-set. Complexity is O($a + $b)>.

vintersect($av,$bv,$nbits)

Computes the intersection of two sorted vec()-style sets $av and $bv and returns the result as a sorted vector-set. Complexity is O($A * log $B), where $A is the shorter and $B the longer of the argument vectors $a and $b.

vsetdiff($av,$bv,$nbits)

Computes the difference of two sorted vec()-style sets $av and $bv and returns the result as a sorted vector-set. Complexity is O($A * log $B), where $A is the shorter and $B the longer of the argument vectors $a and $b.

Debugging and Miscellaneous

vget($vec,$i,$nbits)

Debugging XS-wrapper equivalent to vec($vec,$i,$nbits).

vset($vec,$i,$nbits,$newval)

Debugging XS-wrapper equivalent to vec($vec,$i,$nbits)=$newval.

vec2array($vec,$nbits)

Debugging utility, equivalent to

[map {vec($vec,$_,$nbits)} (0..(length($vec)*8/$nbits-1))]

SEE ALSO

vec() in perlfunc(1), PDL(3perl), perl(1).

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2012-2016 by Bryan Jurish

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.