NAME
List::SkipList - Perl implementation of skip lists
REQUIREMENTS
The following non-standard modules are used:
enum
Carp::Assert is no longer required. However, the assertions can be uncommented for debugging.
SYNOPSIS
my $list = new List::SkipList();
$list->insert( 'key1', 'value' );
$list->insert( 'key2', 'another value' );
$value = $list->find('key2');
$list->delete('key1');
DESCRIPTION
This is an implementation of skip lists in Perl.
Skip lists are similar to linked lists, except that they have random links at various levels that allow searches to skip over sections of the list, like so:
4 +---------------------------> +----------------------> +
| | |
3 +------------> +------------> +-------> +-------> +--> +
| | | | | |
2 +-------> +--> +-------> +--> +--> +--> +-------> +--> +
| | | | | | | | |
1 +--> +--> +--> +--> +--> +--> +--> +--> +--> +--> +--> +
A B C D E F G H I J NIL
A search would start at the top level: if the link to the right exceeds the target key, then it descends a level.
Skip lists generally perform as well as balanced trees for searching but do not have the overhead with respect to inserting new items. See the included file Benchmark.txt
for a comparison of performance with other Perl modules.
For more information on skip lists, see the "SEE ALSO" section below.
Only alphanumeric keys are supported "out of the box". To use numeric or other types of keys, see "Customizing the Node Class" below.
Methods
A detailed description of the methods used is below.
- new
-
$list = new SkipList();
Creates a new skip list.
If you need to use a different node class for using customized comparison routines, you will need to specify a different class:
$list = new SkipList( node_class => 'MyNodeClass' );
See the "Customizing the Node Class" section below.
Specialized internal parameters may be configured:
$list = new SkipList( max_level => 32 );
Defines a different maximum list level, or "max_level". (The default is 32.) It is generally a good idea to leave this value alone unless you are using small lists.
The initial list (see the "list" method) will be a random number of levels, and will increase over time if inserted nodes have higher levels.
You can also control the probability used to determine level sizes for each node by setting the P value:
$list = new SkipList( p => 0.25 );
The value defaults to
0.25
, which appears to be optimial. Smaller values may reduce the average number of pointers per node but also reduce the efficiency of the algorithm.For more information on what these values mean, consult the references below in the "SEE ALSO" section.
- insert
-
$list->insert( $key, $value );
Inserts a new node into the list.
You may also use a search finger with insert, provided that the finger is for a key that occurs earlier in the list:
$list->insert( $key, $value, $finger );
Using fingers for inserts is not recommended since there is a risk of producing corrupted lists.
- exists
-
if ($list->exists( $key )) { ... }
Returns true if there exists a node associated with the key, false otherwise.
This may also be used with search fingers:
if ($list->exists( $key, $finger )) { ... }
- find_with_finger
-
$value = $list->find_with_finger( $key );
Searches for the node associated with the key, and returns the value. If the key cannot be found, returns
undef
.Search fingers may also be used:
$value = $list->find_with_finger( $key, $finger );
To obtain the search finger for a key, call "find_with_finger" in a list context:
($value, $finger) = $list->find_with_finger( $key );
- find
-
$value = $list->find( $key ); $value = $list->find( $key, $finger );
Searches for the node associated with the key, and returns the value. If the key cannot be found, returns
undef
.This method is slightly faster than "find_with_finger" since it does not return a search finger when called in list context.
- search
-
Search is an alias to "find".
- first_key
-
$key = $list->first_key;
Returns the first key in the list.
If called in a list context, will return a search finger:
($key, $finger) = $list->first_key;
A call to "first_key" implicitly calls "reset".
- next_key
-
$key = $list->next_key( $last_key );
Returns the key following the previous key. List nodes are always maintained in sorted order.
Search fingers may also be used to improve performance:
$key = $list->next_key( $last_key, $finger );
If called in a list context, will return a search finger:
($key, $finger) = $list->next_key( $last_key, $finger );
If no arguments are called,
$key = $list->next_key;
then the value of "last_key" is assumed:
$key = $list->next_key( $list->last_key );
- next
-
($key, $value) = $list->next( $last_key, $finger );
Returns the next key-value pair.
$last_key
and$finger
are optional.This is an autoloading method.
- last_key
-
$key = $list->last_key; ($key, $finger, $value) = $list->last_key;
Returns the last key or the last key and finger returned by a call to "first_key" or "next_key".
Deletions and inserts will invalidate the "last_key" value, although they may not "reset" the last key.
Values for "last_key" can also be set by including parameters, however this feature is meant for internal use only:
$list->last_key( $key, $finger, $value );
No checking is done to make sure that the
$key
and$value
pairs match, or that the$finger
is valid. - reset
-
$list->reset;
Resets the "last_key" to
undef
. - delete
-
$value = $list->delete( $key );
Deletes the node associated with the key, and returns the value. If the key cannot be found, returns
undef
.Search fingers may also be used:
$value = $list->delete( $key, $finger );
Calling "delete" in a list context will not return a search finger.
- clear
-
$list->clear;
Erases existing nodes and resets the list.
- size
-
$size = $list->size;
Returns the number of nodes in the list.
- copy
-
$list2 = $list1->copy;
Makes a copy of a list. The "p", "max_level" and node class are copied, although the exact structure of node levels is not copied.
This is an autoloading method.
- merge
-
$list1->merge( $list2 );
Merges two lists. If both lists share the same key, then the valie from
$list1
will be used.Both lists should have the same node class.
This is an autoloading method.
- append
-
$list1->append( $list2 );
Appends
$list2
after$list1
. The last key of$list1
must be less than the first key of$list2
.Both lists should have the same node class.
This method affects both lists. The "header" of the last node of
$list1
points to the first node of$list2
, so changes to one list may affect the other list.If you do not want this entanglement, use the "merge" or "copy" methods instead:
$list1->merge( $list2 );
or
$list1->append( $list2->copy );
This is an autoloading method.
- least
-
($key, $value) = $list->least;
Returns the least key and value in the list, or
undef
if the list is empty.This is an autoloading method.
- greatest
-
($key, $value) = $list->greatest;
Returns the greatest key and value in the list, or
undef
if the list is empty.This is an autoloading method.
- keys
-
@keys = $list->keys;
Returns a list of keys (in sorted order).
This is an autoloading method.
- values
-
@values = $list->values;
Returns a list of values (corresponding to the keys returned by the "keys" method).
This is an autoloading method.
Internal Methods
Internal methods are documented below. These are intended for developer use only. These may change in future versions.
- _search_with_finger
-
($node, $finger, $cmp) = $list->_search_with_finger( $key );
Searches for the node with a key. If the key is found, that node is returned along with a "header". If the key is not found, the previous node from where the node would be if it existed is returned.
Note that the value of
$cmp
$cmp = $node->key_cmp( $key )
is returned because it is already determined by "_search".
Search fingers may also be specified:
($node, $finger, $cmp) = $list->_search_with_finger( $key, $finger );
Note that the "header" is actually a search finger.
- _search
-
($node, $finger, $cmp) = $list->_search( $key, [$finger] );
Same as "_search_with_finger", only that a search finger is not returned. (Actually, an initial "dummy" finger is returned.)
This is useful for searches where a finger is not needed. The speed of searching is improved.
- p
-
$plevel = $list->p;
Returns the P value. Intended for internal use only.
- max_level
-
$max = $list->max_level;
Returns the maximum level that "_new_node_level" can generate.
- _new_node_level
-
$level = $list->_new_node_level;
This is an internal function for generating a random level for new nodes.
Levels are determined by the P value. The probability that a node will have 1 level is P; the probability that a node will have 2 levels is P^2; the probability that a node will have 3 levels is P^3, et cetera.
The value will never be greater than "max_level".
Note: in earlier versions it was called
_random_level
. - list
-
$node = $list->list;
Returns the initial node in the list, which is a
List::SkipList::Node
(See below.)The key and value for this node are undefined.
- level
-
$level = $list->level;
Returns the number of levels in the list. It is the same as
$level = $list->list->level;
- _first_node
-
($node, $finger) = _first_node;
Returns the first node with a key (the second node) in a list and the finger. This is used by the "merge" method.
This is an autoloading method.
- _node_class
-
$node_class_name = $list->_node_class;
Returns the name of the node class used. By default this is the
List::SkipList::Node
, which is discussed below. - _set_node_class
- _set_max_level
- _set_p
-
These methods are used only during initialization of the object. Do not call these methods after the object has been created!
- _debug
-
$list->_debug;
Used for debugging skip lists by developer. The output of this function is subject to change.
Node Methods
Methods for the List::SkipList::Node
object are listed below. They are for internal use by the main Lists::SkipList
module.
- new
-
$node = new List::SkipList::Node( $key, $value, $header );
Creates a new node for the list. The parameters are optional.
Note that the versions 0.42 and earlier used a different calling convention.
- key
-
$key = $node->key;
Returns the node's key.
$node->key( $key );
When used with an argument, sets the node's key.
- key_cmp
-
if ($node->key_cmp( $key ) != 0) { ... }
Compares the node key with the parameter. Equivalent to using
if (($node->key cmp $key) != 0)) { ... }
without the need to deal with the node key being
undef
.By default the comparison is a string comparison. If you need a different form of comparison, use a custom node class.
- validate_key
-
if ($node->validate_key( $key )) { ... }
Used by "value" to validate that a key is valid. Returns true if it is ok, false otherwise.
By default this is a dummy routine that is only called when assertions are enabled.
- value
-
$value = $node->value;
Returns the node's value.
$node->value( $value );
When used with an argument, sets the node's value.
- validate_value
-
if ($node->validate_value( $value )) { ... }
Used by "value" to validate that value is valid. Returns true if it is ok, false otherwise.
By default this is a dummy routine that is only called when assertions are enabled.
- header
-
$header_ref = $node->header;
Returns the forward list array of the node. This is an array of nodes which point to the node returned, where each index in the array refers to the level.
$node->header( $header_ref );
When used with an argument, sets the forward list. It does not check if list elements are of the correct type.
Note that the interface has changed. This method only accepts or returns header references.
- level
-
$levels = $node->level;
Returns the number of levels in the node.
SPECIAL FEATURES
Tied Hashes
Hashes can be tied to List::SkipList
objects:
tie %hash, 'List::SkipList';
$hash{'foo'} = 'bar';
$list = tied %hash;
print $list->find('foo'); # returns bar
See the perltie manpage for more information.
Customizing the Node Class
The default node may not handle specialized data types. To define your own custom class, you need to derive a child class from List::SkipList::Node
.
Below is an example of a node which redefines the default type to use numeric instead of string comparisons:
package NumericNode;
use Carp::Assert; # this is required since the parent uses this
our @ISA = qw( List::SkipList::Node );
sub key_cmp {
my $self = shift;
assert( UNIVERSAL::isa($self, __PACKAGE__) ), if DEBUG;
my $left = $self->key; # node key
my $right = shift; # value to compare the node key with
# We should gracefully handle $left being undefined
unless (defined $left) { return -1; }
return ($left <=> $right);
}
sub validate_key {
my $self = shift;
assert( UNIVERSAL::isa($self, __PACKAGE__) ), if DEBUG;
my $key = shift;
return ($key =~ s/\-?\d+(\.\d+)?$/); # test if key is numeric
}
To use this, we say simply
$number_list = new List::SkipList( node_class => 'NumericNode' );
This skip list should work normally, except that the keys must be numbers.
For another example of customized nodes, see Tie::RangeHash version 1.00_b1 or later.
About Search Fingers
A side effect of the search function is that it returns a finger to where the key is or should be in the list.
We can use this finger for future searches if the key that we are searching for occurs after the key that produced the finger. For example,
($value, $finger) = $list->find('Turing');
If we are searching for a key that occurs after 'Turing' in the above example, then we can use this finger:
$value = $list->find('VonNeuman', $finger);
If we use this finger to search for a key that occurs before 'Turing' however, it may fail:
$value = $list->find('Goedel', $finger); # this may not work
Therefore, use search fingers with caution.
Search fingers are specific to particular instances of a skip list. The following should not work:
($value1, $finger) = $list1->find('bar');
$value2 = $list2->find('foo', $finger);
One useful feature of fingers is with enumerating all keys using the "first_key" and "next_key" methods:
($key, $finger) = $list->first_key;
while (defined $key) {
...
($key, $finger) = $list->next_key($key, $finger);
}
See also the "keys" method for generating a list of keys.
Similarities to Tree Classes
This module intentionally has a subset of the interface in the Tree::Base and other tree-type data structure modules, since skip lists can be used in place of trees.
Because pointers only point forward, there is no prev
method to point to the previous key.
Some of these methods (least, greatest) are autoloading because they are not commonly used.
One thing that differentiates this module from other modules is the flexibility in defining a custom node class.
See the included Benchmark.txt file for performance comparisons.
TODO
The following features may be added in future versions:
- Accessing list nodes by index number as well as key
-
The ability to tie a list to an array as well as a hash, probably as a subclass since to implement it efficiently would require some extra bookkeeping.
- Splitting lists
-
The ability to split a list into multiple segments.
- Deterministic Skip Lists
-
An additional module (probably a subclass of List::SkipList) to implement deterministic skip lists (DSLs), probably as a 1-2-3 skip list.
CAVEATS
Skip lists are non-deterministic. Because of this, bugs in programs that use this module may be subtle and difficult to reproduce without many repeated attempts. This is especially true if there are bugs in a custom node.
AUTHOR
Robert Rothenberg <rrwo at cpan.org>
Acknowledgements
Carl Shapiro <cshapiro at panix.com> for introduction to skip lists.
Suggestions and Bug Reporting
Feedback is always welcome. Please use the CPAN Request Tracker at http://rt.cpan.org to submit bug reports.
LICENSE
Copyright (c) 2003-2004 Robert Rothenberg. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
See the article by William Pugh, "A Skip List Cookbook" (1989), or similar ones by the author at http://www.cs.umd.edu/~pugh/ which discuss skip lists.
Another article worth reading is by Bruce Schneier, "Skip Lists: They're easy to implement and they work", Doctor Dobbs Journal, January 1994.
If you need a keyed list that preserves the order of insertion rather than sorting keys, see List::Indexed.