NAME
Text::Scan - Fast search for very large numbers of keys in a body of text.
SYNOPSIS
use Text::Scan;
$dict = Text::Scan->new;
%terms = ( dog  => 'canine',
           bear => 'ursine',
           pig  => 'porcine' );
# load the dictionary with keys and values
# (values can be any scalar, keys must be strings)
while( ($key, $val) = each %terms ){
    $dict->insert( $key, $val );
}
# Scan a document for matches
%found = $dict->scan( $document );
# Or, to count the number of occurrences of a given key,
# use an array. This gives you a flat list of
# key => value pairs, one pair per match.
@found = $dict->scan( $document );
# Check for membership ($val is true)
$val = $dict->has('pig');
# Retrieve all keys
@keys = $dict->keys();
DESCRIPTION
This module provides fast searching of arbitrarily long texts for arbitrarily many search keys. The basic object behaves somewhat like a Perl hash, except that retrieval works on a superstring of the stored keys: scan a string as shown above and you get back a Perl hash (or list) of all keys found in the string, along with their associated values. Longest-first order is observed (as in Perl regular expressions).
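To make the longest-first behaviour and the flat key => value list concrete, here is a minimal pure-Perl sketch. It is not the module's implementation (Text::Scan uses a ternary search tree written in C); it uses a nested-hash trie instead. The names %trie and scan_text and the extra key 'pig iron' are illustrative assumptions, as are the choices to match at any character position and to resume scanning after the end of each match.

```perl
use strict;
use warnings;

# Illustrative only: a nested-hash trie standing in for the module's
# C ternary search tree. 'pig iron' is an extra, made-up key that
# shows longest-first matching.
my %terms = ( dog => 'canine', bear => 'ursine',
              pig => 'porcine', 'pig iron' => 'metal' );

# One hash level per character; a key's value is stored under the
# empty-string slot, which no single character can collide with.
my %trie;
for my $key (keys %terms) {
    my $node = \%trie;
    for my $ch (split //, $key) {
        $node = $node->{$ch} ||= {};
    }
    $node->{''} = $terms{$key};
}

# Return a flat list of key => value pairs, one pair per match.
# At each position, walk the trie as far as the text allows and
# remember the longest key seen (assumption: matches do not overlap).
sub scan_text {
    my ($text) = @_;
    my @found;
    my $len = length $text;
    my $pos = 0;
    while ($pos < $len) {
        my ($node, $end, $val) = (\%trie);
        for (my $i = $pos; $i < $len; $i++) {
            $node = $node->{ substr($text, $i, 1) } or last;
            ($end, $val) = ($i + 1, $node->{''}) if exists $node->{''};
        }
        if (defined $end) {
            push @found, substr($text, $pos, $end - $pos), $val;
            $pos = $end;    # resume after the match
        } else {
            $pos++;         # no key starts here
        }
    }
    return @found;
}

# Count occurrences by consuming the flat list two items at a time.
my @found = scan_text('a pig and a dog saw pig iron and a pig');
my %count;
while (my ($key, $val) = splice @found, 0, 2) {
    $count{$key}++;
}
print "$_: $count{$_}\n" for sort keys %count;
```

Assigning the same list to a hash would collapse repeated keys, which is why counting needs the array form.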
CREDITS
Except for the actual scanning part, plus the node-rotation for self-adjusting optimization, this code is heavily borrowed from both Bentley & Sedgewick and Leon Brocard's additions to it for Tree::Ternary_XS. The C code interface was created using Ingerson's Inline.
Many test scripts come directly from Rogaski's Tree::Ternary module.
SEE ALSO
Bentley & Sedgewick "Fast Algorithms for Sorting and Searching Strings", Proceedings ACM-SIAM (1997)
Bentley & Sedgewick "Ternary Search Trees", Dr. Dobb's Journal (1998)
Sleator & Tarjan "Self-Adjusting Binary Search Trees", Journal of the ACM (1985)
Tree::Ternary
Tree::Ternary_XS
AUTHOR
Ira Woodhead, bunghole@pobox.com