NAME
Tree::Trie - An implementation of the Trie data structure in Perl
SYNOPSIS
use Tree::Trie;
use strict;
my($trie) = new Tree::Trie;
$trie->add(qw[aeode calliope clio erato euterpe melete melpomene mneme
polymnia terpsichore thalia urania]);
my(@all) = $trie->lookup("");
my(@ms) = $trie->lookup("m");
$" = "--";
print "All muses: @all\nMuses beginning with 'm': @ms\n";
my(@deleted) = $trie->remove(qw[calliope thalia doc]);
print "Deleted muses: @deleted\n";
DESCRIPTION
This module implements a trie data structure. The term "trie" comes from the word retrieval, but is generally pronounced like "try". A trie is a tree structure (or directed acyclic graph), the nodes of which represent letters in a word. For example, the final lookup for the word 'bob' would look something like $ref->{'b'}{'o'}{'b'}{HASH(0x80c6bbc)}
(the HASH being an end marker). Only nodes which would represent words in the trie exist, making the structure slightly smaller than a hash of the same data set.
The advantages of the trie over other data storage methods is that lookup times are O(1) WRT the size of the index. For sparse data sets, it is probably not as efficient as performing a binary search on a sorted list, and for small files, it has a lot of overhead. The main advantage (at least from my perspective) is that it provides a relatively cheap method for finding a list of words in a large, dense data set which begin with a certain string.
As of version 0.3 of this module, the term "word" in this documentation can refer to one of two things: either a refeence to an array of strings, or a scalar which is not an array ref. In the case of the former, each element of the array is treated as a "letter" of the "word". In the case of the latter, the scalar is evaluated in string context and it is split into its component letters. Return values of methods match the values of what is passed in -- that is, if you call lookup() with an array reference, the return value will be an array reference (if appropriate).
METHODS
- new([\%options])
-
This is the constructor method for the class. You may optionally pass it a hash reference with a set of option => value pairs. Currently, the only option available is 'deepsearch' and its valid values are 'boolean', 'choose' or 'count'. The documentation on the 'lookup' method describes the effects of these different values.
- add(word0 [, word1...wordN])
-
This method attempts to add words 0 through N to the trie. Returns, in list context, the words successfully added to the trie. In scalar context, returns the number of words successfully added. As of this release, the only reason a word would fail to be added is if it is already in the trie.
- remove(word0 [, word1...wordN])
-
This method attempts to remove words 0 through N from the trie. Returns, in list context, the words successfully removed from the trie. In scalar context, returns the number of words successfully removed. As of this release, the only reason a word would fail to be removed is if it is not already in the trie.
- lookup(word0)
-
This method performs lookups on the trie. In list context, it returns a complete list of words in the trie which begin with word0. In scalar context, the value returned depends on the setting of the 'deepsearch' option. You can set this option while creating your Trie object, or by using the deepsearch method. If deepsearch is set to 'boolean', it will return a true value if any word in the trie begins with word0. This setting is the fastest. If deepsearch is 'choose', it will return one word in the trie that begins with word0, or undef if nothing is found. If word0 exists in the trie exactly, it will be returned. Finally, if deepsearch is set to 'count', it will return a count of the words in the trie that begin with word0. This operation requires walking the entire tree, so can possibly be significantly slower than the other two options. For reasons of backwards compatibilty, 'choose' is the default value of this option.
To get a list of all words in the trie, use
lookup("")
in list context. - deepsearch([option])
-
If option os specified, sets the deepsearch parameter. Option may be one of: 'boolean', 'choose', 'count'. Please see the documentation for the lookup method for the details of what these options mean. Returns the current value of the deepsearch parameter.
Future Work
The ability to associate data with each word in a trie may be added, eventually.
There are a few methods of compression that allow you same some amount of space in the trie. I have to figure out which ones are worth implemeting. I may end up making the different compression methods configurable.
I have now made one of them the default. It's the least effective one, of course.
The ability to have Tree::Trie store its internal hash as a TIEd object of some sort.
AUTHOR
Copyright 2002 Avi Finkel <avi@finkel.org>
This package is free software and is provided "as is" without express or implied warranty. It may be used, redistributed and/or modified under the terms of the Perl Artistic License (see http://www.perl.com/perl/misc/Artistic.html)