NAME
Tree::RB::XS - Red/Black Tree and LRU Cache implemented in C
SYNOPSIS
Basic dictionary features:
my $tree= Tree::RB::XS->new;
$tree->put($_ => 0) for 'a'..'z'; # store key/value, overwrite
say $tree->get('a'); # get value by key
$tree->get('a')++ # 'get' returns lvalues
if $tree->exists('a');
$tree->delete('a'); # delete by key
$tree->clear; # efficiently delete all nodes
Tree-specific features:
use Tree::RB::XS qw/ :get /;
$tree->put(a => 1);
$tree->put(m => 13);
$tree->put(z => 26);
$nd= $tree->get_node('f', GET_GE); # returns node of 'm'
$nd= $tree->get_node('f', GET_LT); # returns node of 'a'
$nd= $tree->nth_node(2); # returns node of 'z', O(log N) time
$tree->max_node->index; # returns 2, also O(log N) time
my $iter= $tree->iter; # iterates in key order
$tree->delete('f','w'); # delete a range, deletes 'm'
$tree->delete($iter, $tree->max_node); # delete using nodes or iterators
$tree->put(b => 2);
$tree->min_node->prune; # manipulate tree via node methods
Support duplicate keys:
use Tree::RB::XS qw/ :cmp /;
$tree= Tree::RB::XS->new(
compare_fn => CMP_NUMSPLIT, # string-of-numbers comparison
allow_duplicates => 1,
);
$tree->insert('192.168.0.1' => time); # 'insert' instead of 'put'
$tree->insert('192.168.0.40' => time);
$tree->insert('192.168.0.1' => time);
# analyze subnet
$first= $tree->get_node_ge('192.168.0.0');
$last= $tree->get_node_le('192.168.0.255');
say $last->index - $first->index + 1; # 3 in subnet
LRU Cache feature:
$tree= Tree::RB::XS->new(
track_recent => 1, # Remember order of added nodes
);
$tree->put($_,$_) for 1,3,2;
say $tree->newest->key; # 2
say $tree->oldest->key; # 1
@insertion_order=
$tree->iter_newer->next_keys(1e99); # (1,3,2)
$tree->get_node(1)->mark_newest; # 'touch' a node to end of list
$tree->iter_newer->next_keys(1e99); # (3,2,1)
$tree->iter_older->next_keys(1e99); # (1,2,3)
@removed= $tree->truncate_recent(2); # returns (3), leaves (2,1) in tree
Fancy iterators:
# iterator has 'current position'. inspect it, then step
for (my $i= $tree->iter; !$i->done; $i->step) {
if ($i->key ...) { $i->value .... }
}
# Call iterator as a function that returns the next node
my $i= $tree->rev_iter;
while (my $node= &$i) {
$node->prune if $node->key =~ ...;
}
# Return batches of values from iterator, to reduce loop overhead
my $i= $tree->iter;
while (my @batch= $i->next_values(100)) {
...
}
# If you delete the node an iterator is on, it moves to the next
$tree= Tree::RB::XS->new(
track_recent => 1,
kv => [ a => 1, c => 3, b => 2 ],
);
$middle_node= $tree->nth(1); # node of 'b'
$forward= $middle_node->iter; # iterate to higher keys
$reverse= $middle_node->rev_iter; # iterate to lower keys
$newer= $middle_node->iter_newer; # iterate to more recent keys
$older= $middle_node->iter_older; # iterate to less recent keys
$middle_node->prune; # remove node from tree
say $_->key for $forward, $reverse; # c, a
say $_->key for $newer, $older; # undef, c
DESCRIPTION
This is a feature-rich Red/Black Tree implemented in C. Special features (above and beyond the basics you'd expect from a treemap) include:
Optimized storage and comparisons of keys (speed)
O(log N)
Nth-node lookup (which allows the tree to act as an array)Smart bi-directional iterators that advance when you delete the current node.
Option to allow duplicate keys while preserving insertion order.
Optional linked-list of "recent" order, to facilitate LRU or MRU caches.
CONSTRUCTOR
new
my $tree= Tree::RB::XS->new( %OPTIONS );
...->new( $compare_fn );
If new
is given a single parameter, it is assumed to be the compare_fn
.
Options:
compare_fn
Choose a custom key-compare function. This can be the ID of an optimized function, a coderef, or the name of one of the optimized IDs like "int" for
CMP_INT
. See below for details.allow_duplicates
Whether to allow two nodes with the same key. Defaults to false.
compat_list_get
Whether to enable full compatibility with Tree::RB's list-context behavior for "get". Defaults to false.
track_recent
Whether to keep track of the insertion order of nodes, by default. Defaults to false. You may toggle this attribute after construction.
lookup_updates_recent
Whether "lookup" and "get" methods automatically mark a node as the most recent.
kv
An initial arrayref of
key,value
pairs to initialize the tree with. If allow_duplicates is requested, this uses "insert_multi", else it uses "put_multi" (so later duplicate keys replace the values of earlier ones).keys
An arrayref of keys to use to initialize the tree. If
values
are not provided, the value of each node will beundef
.values
An arrayref of values to use to initialize the tree. If provided, it must be the same length as
keys
.recent
Specifies a list of integers which initialize the list used by the "track_recent" feature, overriding the order seen in
keys
orkv
. The integer refers to the "nth" node of the assembled tree. The list does not need to include all the nodes.
ATTRIBUTES
compare_fn
Specifies the function that compares keys. Read-only; pass as an option to the constructor.
This is one of: "CMP_PERL" (default), "CMP_INT", "CMP_FLOAT", "CMP_MEMCMP", "CMP_UTF8", "CMP_NUMSPLIT", or a coderef. CMP_INT
and CMP_FLOAT
are the most efficient, and internally store the key as a number. CMP_MEMCMP
and CMP_UTF8
copy the key into an internal buffer, and offer moderate speed gains over CMP_PERL
. CMP_PERL
is Perl's own cmp
operator.
If set to a coderef, it should take two parameters and return an integer indicating their order in the same manner as Perl's cmp
. Beware that using a custom coderef throws away most of the speed gains from using this XS variant over plain Tree::RB. If speed is important, try pre-processing your keys in a way that allows you to use one of the built-in ones.
Patches welcome, for anyone who wants to expand the list of optimized built-in comparison functions.
allow_duplicates
Boolean, read/write. Controls whether "insert" will allow additional nodes with keys that already exist in the tree. This does not change the behavior of "put", only "insert". If you set this to false, it does not remove duplicates that already exist in the tree. The initial value is false.
compat_list_get
Boolean, read/write. Controls whether "get" returns multiple values in list context. I wanted to match the API of Tree::RB
, but I can't bring myself to make an innocent-named method like 'get' change its behavior in list context. So, by deault, this attribute is false and get
always returns one value. But if you set this to true, get
changes in list context to also return the Node, like is done in "lookup" in Tree::RB.
track_recent
Boolean, read/write. Enabling this causes all nodes added (afterward) with "put" or "insert" to be added to an insertion-order linked list. You can then inspect or iterate them with related methods. Note that each node can be tracked or un-tracked individually, and this setting just changes the default for new nodes. This allows you to differentiate more permanent data points vs. temporary ones that you might want to expire over time.
See also: "oldest_node", "newest_node", "recent_count", "iter_newer", "iter_older", "truncate_recent", and Node methods "newer", "older", "mark_newest", and "recent_tracked".
lookup_updates_recent
Whether "lookup" and "get" methods automatically mark a node as the most recent. This defaults to false, so only 'put' methods (including insert) mark a node recent. Even when true, 'exists' does not mark a node as recent, nor do iterators, min_node, max_node, nth_node, newest_node or oldest_node, as it is assumed using those methods are more about inspecting the state of the tree than representing access patterns of important keys.
key_type
The key-storage strategy used by the tree. Read-only; pass as an option to the constructor. This is an implementation detail that may be removed in a future version.
size
Returns the number of elements in the tree.
recent_count
Returns the number of nodes with insertion-order tracking enabled. See "track_recent".
root_node
Get the root node of the tree, or undef
if the tree is empty.
Alias: root
min_node
Get the tree node with minimum key. Returns undef if the tree is empty.
Alias: min
max_node
Get the tree node with maximum key. Returns undef if the tree is empty.
Alias: max
nth_node
Get the Nth node in the sequence from min to max. N is a zero-based index. You may use negative numbers to count down form max.
Alias: nth
oldest_node
$oldest= $tree->oldest_node;
$tree->oldest_node($node);
The earliest node that was inserted with "track_recent" enabled/applied. undef
if no nodes have insertion-order tracking.
Alias: oldest
newest_node
$newest= $tree->newest_node;
$tree->newest_node($node);
The most recent node that was inserted with "track_recent" enabled/applied. undef
if no nodes have insertion-order tracking.
Alias: newest
METHODS
get
my $val= $tree->get($key);
...->get($key, $mode);
$tree->get($key) += 5; # also, they're lvalues
Fetch a value from the tree, by its key. Unlike "get" in Tree::RB, this returns a single value, regardless of list context. But, you can set compat_list_get to make get
an alias for lookup
.
Mode can be used to indicate something other than an exact match: "GET_EQ", "GET_EQ_LAST", "GET_LE", "GET_LE_LAST", "GET_LT", "GET_GE", "GET_GT". (described below) It can also be "GET_OR_ADD" to automatically create a node with the key if one didn't exist.
Aliases with built-in mode constants:
- get_or_add
-
Handy for things like
( $tree->get_or_add($key) //= '') .= "test"
get_node
Same as "get", but returns the node instead of the value. In trees with duplicate keys, this returns the first matching node. (nodes with identical keys are preserved in the order they were added)
Aliases with built-in mode constants:
- get_node_last
- get_node_le
- get_node_le_last
- get_node_lt
- get_node_ge
- get_node_gt
get_key
Returns the key closest to the comparison criteria. This is a shortcut for get_node(...)->key
but avoids the undef
check and avoids inflating the tree node to a perl object.
Aliases with built-in mode constants:
- get_key_le
- get_key_lt
- get_key_ge
- get_key_gt
get_all
my @values= $tree->get_all($key);
In trees with duplicate keys, this method is useful to return the values of all nodes that match the key. This can be more efficient than stepping node-to-node for small numbers of duplicates, but beware that large numbers of duplicate could have an adverse affect on Perl's stack.
lookup
Provided for compatibility with Tree::RB. Same as "get" in scalar context, but if called in list context it returns both the value and the node from "get_node". You can also use Tree::RB's lookup-mode constants of "LUEQUAL", etc.
exists
$count= $tree->exists($key);
$count= $tree->exists(@keys);
Check whether a key exists (or multiple keys exist) in the tree, returning the total count of nodes having these keys.
put
my $old_val= $tree->put($key, $new_val);
Associate the key with a new value. If the key previously existed, this returns the old value, and updates the tree to reference the new value. If the tree allows duplicate keys, this will remove all but one node having this key and then set its value. Only the first old value will be returned.
put_multi
$added_count= $tree->put_multi($k, $v, $k, $v, ...);
$added_count= $tree->put_multi([ $k, $v, $k, $v, ... ]);
Put multiple keys and values into the tree. If duplicate keys are supplied and "allow_duplicates" is false, earlier (k,v) will be overwritten by later conflicting (k,v) in the list, the same way that happens when assigning this list to a perl hash. If allow_duplicates
is true, all key/value pairs will get added to the tree.
The return value is the number of new keys added to the tree, not counting overwrites.
insert
my $idx= $tree->insert($key, $value);
Insert a new node into the tree, and return the index at which it was inserted. If "allow_duplicates" is not enabled, and the node already existed, this returns -1 and does not change the tree. If allow_duplicates
is enabled, this adds the new node after all nodes of the same key, preserving the insertion order.
insert_multi
$added_count= $tree->insert_multi($k, $v, $k, $v, ...);
$added_count= $tree->insert_multi([ $k, $v, $k, $v, ... ]);
Perform multiple insertions, and return the number of items which got added. Like "insert" when "allow_duplicates" is false, this does not replace existing values if the key already exists.
delete
my $count= $tree->delete($key);
...->delete($key1, $key2);
...->delete($node1, $node2);
...->delete($start, $tree->get_node_lt($limit));
Delete any node with a key identical to $key
, and return the number of nodes removed. If you supply two keys (or two nodes) this will delete those nodes and all nodes inbetween; $key1
is searched with mode GET_GE
and $key2
is searched with mode GET_LE_LAST
, so the keys themselves do not need to be found in the tree. The keys (or nodes) must be given in ascending order, else no nodes are deleted.
If you want to delete a range *exclusive* of one or both ends of the range, just use the "get_node" method with the desired mode to look up each end of the nodes that you do want removed.
truncate_recent
my @nodes= $tree->truncate_recent($max_count);
Reduce the number of "recent" nodes (those with insertion-order tracking enabled) to $max_count
. (See "track_recent") The pruned nodes are returned as a list.
The intent here is that you may have some "permanent" nodes that stay in the tree, but more transient ones that you add on demand, and then you might want to purge the oldest of those when they exceed a threshold.
If there are fewer than $max_count
nodes with insertion-order tracking, this has no effect.
clear
my $count= $tree->clear();
This is the fastest way to remove all nodes from the tree. It gets to destroy all the nodes without worrying about the tree structure or shifting iterators aside.
iter
my $iter= $tree->iter; # from min_node
...->iter($from_key, $get_mode=GET_GE); # from get_node
...->iter($from_node); # from existing node
Return an iterator object that traverses the tree from min to max, or from the key or node you provide up to max.
rev_iter
Like iter
, but the ->next
and ->step
methods walk backward to smaller key values, and the default $get_mode
is "GET_LE_LAST".
iter_newer
Return an iterator that iterates the insertion-order from oldest to newest. This only iterates nodes with insertion-order tracking enabled. See "track_recent".
iter_older
Return an iterator that iterates the insertion-order from newest to oldest. This only iterates nodes with insertion-order tracking enabled. See "track_recent".
NODE OBJECTS
ITERATOR OBJECTS
Iterators are similar to Nodes, but they hold a strong reference to the tree, and if a node they point to is removed from the tree they advance to the next node. (and obviously they iterate, where node objects do not)
The iterator references a "current node" which you can inspect the key and value of. You can call 'step' to move to a new current node, and you can call 'next' which returns the current node while switching the reference to the next node.
Note that if you avoid referencing the Node, and stick to the attributes and methods of the iterator, the tree can avoid allocating the Perl object to represent the Node. This gives a bit of a performance boost for large tree operations.
Iterator Attributes
- node
-
The current node.
- key
-
The key of the current node.
- value
-
The value of the current node. Note that this returns an lvalue, which in an aliased context allows you to modify the value stored in the tree.
$_++ for $iter->value;
- index
-
The index of the current node.
- tree
-
A reference back to the Tree. Note that each iterator holds a strong reference to the tree.
- done
-
True if the iterator has reached the end of its sequence, and no longer references a current Node.
Iterator Methods
- next
-
my $nodes= $iter->next; my @nodes= $iter->next($count); my @nodes= $iter->next('*' || inf);
Return the current node (as a node object) and advance to the following node in the sequence. After the end of the sequence, calls to
next
returnundef
. If you pass the optional$count
, it will return up to that many nodes, as a list. It will also return an empty list at the end of the sequence instead of returningundef
. You can use the string'*'
for the count to indicate all the rest of the nodes in the sequence. Likewise, any numeric value larger than the number of nodes in the tree (like builtin::inf) will return them all. - next_key
-
Same as
next
, but return the keys of the nodes. - next_value
-
Same as
next
, but return the values of the nodes. Like "value", these are also aliases, and can be modified.$_++ for $iter->next_value('*');
- next_kv
-
my %x= $iter->next_kv('*');
Same as
next
, but return pairs of key and value for each node. This is useful for dumping them into a hash. (unless you have duplicate keys enabled, then don't dump them into a hash or you would lose elements) - step
-
$iter->step; # advance by one node $iter->step(10); # advance by 10 nodes $iter->step(-4); # back up 4 nodes
This moves the iterator by one or more nodes in the forward or backward direction. For a reverse-iterator, positive numbers move toward the minimum key and negative numbers move toward the maximum key. If the offset would take the iterator beyond the last node, the current node becomes
undef
. If the offset would take the iterator beyond the first node, the first node becomes the current node. - delete
-
Delete the current node, return its value, and advance to the next node.
for (my $i= $tree->iter; !$i->done;) { if ($i->key =~ ...) { say "Removing ".$i->key." = ".$i->delete; } else { $i->step; } }
This is useful when combined with the
key
andvalue
attributes of the iterator, but not so much when you are looping usingnext
, becausenext
has already moved to the next node beyond the one it returned to you. When usingnext
, calldelete
on the node, not the iterator. - clone
-
Returns a new iterator of the same direction pointing at the same node.
TIE HASH INTERFACE
This class implements the required methods needed for tie
:
my %hash
my $tree= tie %hash, 'Tree::RB::XS';
$hash{$_}= $_ for 1..10;
delete $hash{3};
$_ += 1 for values %hash;
tied(%hash)->hseek(5);
say each %hash; # 5
But you get better performance by using the tree's API directly. This should only be used when you need to integrate with code that isn't aware of the tree.
- hseek
-
tied(%hash)->hseek( $key ); ->hseek({ -reverse => $bool }); ->hseek( $key, { -reverse => $bool });
This is a method of the tree, but only relevant to the tied hash interface. It controls the behavior of the next call to
each %hash
orkeys %hash
, causing the first element to be the node at or after the$key
. (or before, if you change to a reverse iterator)This method differs from "hseek" in Tree::RB in that
Tree::RB
will change the logical first node of the iteration *indefinitely* such that repeated calls tokeys
do not see any element less than$key
. Thishseek
only applies to the next iteration. (which I'm guessing was the intent in Tree::RB?)
EXPORTS
Comparison Functions
Export all with ':cmp'
- CMP_PERL
-
Use Perl's
cmp
function. This forces the keys of the nodes to be stored as Perl Scalars. - CMP_INT
-
Compare keys directly as whatever integer type Perl was compiled with. (i.e. 32-bit or 64-bit) This is the fastest option.
- CMP_FLOAT
-
Compare the keys directly as whatever floating-point type Perl was compiled with. (i.e. 64-bit double or 80-bit long double)
- CMP_UTF8
-
Compare the keys as UTF8 byte strings, using Perl's internal
bytes_cmp_utf8
function. - CMP_MEMCMP
-
Compare the keys using C's
memcmp
function. - CMP_NUMSPLIT
-
Compare using the equivalent of this coderef:
sub { my @a_parts= split /([0-9]+)/, $_[0]; my @b_parts= split /([0-9]+)/, $_[1]; for (my $i= 0; $i < @a_parts || $i < @b_parts; $i++) { no warnings 'uninitialized'; my $cmp= ($i & 1)? ($a_parts[$i] <=> $b_parts[$i]) : ($a_parts[$i] cmp $b_parts[$i]); return $cmp if $cmp; } return 0; }
except the XS implementation is not limited by the integer size of perl, and operates directly on the strings without splitting anything. (i.e. much faster)
This results in a sort where integer portions of a string are sorted numerically, and any non-digit segment is compared as a string. This produces sort-orders like the following:
2020-01-01 2020-4-7 2020-10-12
or
14.4.2 14.14.0
If the
key_type
isKEY_TYPE_BSTR
this will sort the string portions usingmemcmp
, else they are sorted with Perl's unicode-aware sort. - cmp_numsplit
-
use Tree::RB::XS 'cmp_numsplit'; $cmp= cmp_numsplit('192.168.10.1', '192.168.4.255');
You can export the comparison function itself, for use elsewhere. It's rather useful. Maybe it should be its own module?
Key Types
Export all with ':key_type';
- KEY_TYPE_ANY
-
This
key_type
causes the tree to store whole Perl scalars for each node. Its default comparison function is Perl's owncmp
operator. - KEY_TYPE_INT
-
This
key_type
causes the tree to store keys as Perl's integers, which are either 32-bit or 64-bit depending on how Perl was compiled. Its default comparison function puts the numbers in non-decreasing order. - KEY_TYPE_FLOAT
-
This
key_type
causes the tree to store keys as Perl's floating point type, which are either 64-bit doubles or 80-bit long-doubles. Its default comparison function puts the numbers in non-decreasing order. - KEY_TYPE_BSTR
-
This
key_type
causes the tree to store keys as byte strings. The default comparison function is the standard Libcmemcmp
. - KEY_TYPE_USTR
-
Same as
KEY_TYPE_BSTR
but reads the bytes from the supplied key as UTF-8 bytes. The default comparison function is alsomemcmp
even though this does not sort Unicode correctly. (for correct unicode, useKEY_TYPE_ANY
, but it's slower...)
Lookup Mode
Export all with ':get'
- GET_EQ
-
This specifies a node with a key equal to the search key. If duplicate keys are enabled, this specifies the left-most match (least recently added). Has alias
LUEQUAL
to match Tree::RB. - GET_EQ_LAST
-
Same as
GET_EQ
, but if duplicate keys are enabled, this specifies the right-most match (most recently inserted). - GET_OR_ADD
-
Look up the key, and if it doesn't exist, insert a node for it into the tree. When getting the value, this provides an lvalue which you can assign to.
++($tree->get("a", GET_OR_ADD) //= 0); ($tree->get("b", GET_OR_ADD) //= '') .= "example";
- GET_GE
-
This specifies the same node of
GET_EQ
, unless there are no matches, then it falls back to the left-most node with a key greater than the search key. Has aliasLUGTEQ
to match Tree:RB. - GET_LE
-
This specifies the same node of
GET_EQ
, unless there are no matches, then it falls back to the right-most node with a key less than the search key. Has aliasLULTEQ
to match Tree::RB. - GET_LE_LAST
-
This specifies the same node of
GET_EQ_LAST
, unless there are no matches, then it falls back to the right-most node with a key less than the search key. - GET_GT
-
Return the first node greater than the key, or
undef
if the key is greater than any node. Has aliasLUGREAT
to match Tree::RB. - GET_LT
-
Return the right-most node less than the key, or
undef
if the key is less than any node. Has aliasLULESS
to match Tree::RB. - GET_NEXT
-
Look for the last node matching the specified key (returning
undef
if not found) then return$node->next
. This is the same asGET_GT
except it ensures the key existed. Has aliasLUNEXT
to match Tree::RB. - GET_PREV
-
Look for the first node matching the specified key (returning
undef
if not found) then return$node->prev
. This is the same asGET_LT
except it ensures the key existed. Has aliasLUPREV
to match Tree::RB.
SEE ALSO
- Tree::RB
-
The fastest pure-perl tree module on CPAN. Implemented as blessed arrayrefs.
Tree::RB::XS was originally just an XS version of this module's API, with a few important differences:
The
get
method in Tree::RB::XS is not affected by array context, unless you request "compat_list_get".resort
is not implemented in Tree::RB::XSTree structure is not mutable via the attributes of Node, nor can nodes be created independent from a tree.
Many methods have official names changed, but aliases are provided for compatibility.
- AVLTree
-
Another XS-based tree module. About 6%-70% slower than Tree::RB::XS depending on whether you use coderef comparisons or optimized comparisons.
- Tree::AVL
-
An AVL tree implementation in pure perl. The API is perhaps more convenient, with the ability to add your object to the tree with a callback that derives the key from that object. However, it runs significantly slower than Tree::RB.
- Tie::Hash::Indexed
-
Not a tree, but the second-fastest module on CPAN if you want a hash that preserves insertion order, such as for an LRU cache. Technically a hash table should out-perform a binary tree on massive collections (O(1) lookup time vs. O(log N) lookup time), but currently Tree::RB::XS benchmarks quite a bit faster.
- Hash::Ordered
-
The fastest pure-perl module on CPAN for ordered hashes / LRU caches.
VERSION
version 0.14
AUTHOR
Michael Conrad <mike@nrdvana.net>
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.