NAME
Search::Tools::Snipper - extract keywords in context
SYNOPSIS
    use Search::Tools::Snipper;

    my $query = [ qw/ quick dog / ];
    my $text  = 'the quick brown fox jumped over the lazy dog';

    my $s = Search::Tools::Snipper->new(
        occur     => 3,
        context   => 8,
        word_len  => 5,
        max_chars => 300,
        query     => $query,
    );

    print $s->snip( $text );
DESCRIPTION
Search::Tools::Snipper extracts keywords and their context from a larger block of text. The larger block may be plain text or HTML/XML.
METHODS
new( query => query )
Instantiate a new object. query must be a scalar string, a reference to an array of strings, or a Search::Tools::RegExp::Keywords object.
Many of the following methods are also available as key/value pairs to new().
occur
The number of snippets that should be returned by snip().
Available via new().
context
The number of context words to include in the snippet.
Available via new().
max_chars
The maximum number of characters (not bytes, under Perl >= 5.8) to return in a snippet. NOTE: This is only used to test whether text is worth snipping at all, or if no keywords are found (see show()).
Available via new().
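The difference between characters and bytes matters for non-ASCII text. The following is a minimal sketch in core Perl, independent of this module, and it assumes the text has already been decoded into a character string:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A 10-character string; the two accented letters each take
# 2 bytes once the string is encoded as UTF-8.
my $text = "caf\x{e9} na\x{ef}ve";

my $chars = length $text;                       # counts characters
my $bytes = length encode( 'UTF-8', $text );    # counts encoded bytes

print "characters: $chars, bytes: $bytes\n";
```

A limit such as max_chars is compared against the first figure, not the second.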
word_len
The estimated average word length used in combination with context(). You can usually ignore this value.
Available via new().
show
Boolean flag indicating whether snip() should succeed no matter what, or give up if no snippets are found. Default is 1 (true).
Available via new().
escape
Boolean flag indicating whether snip() should escape any HTML/XML markup in the resulting snippet or not. Default is 0 (false).
Available via new().
snipper
The CODE ref used by the snip() method for actually extracting snippets. You can use your own snipper function if you want (though if you have a better snipper algorithm than the ones in this module, why not share it?). If you go this route, have a look at the source code for snip() to see how snipper() is used.
Available via new().
snipper_name
The name of the internal snipper function used. In case you're curious.
snipper_force
Boolean flag indicating whether the snipper() value should always be used, regardless of the type of query keyword. Default is 0 (false).
Available via new().
count
The number of snips made by the Snipper object.
collapse_whitespace
Boolean flag indicating whether multiple whitespace characters should be collapsed into a single space. A whitespace character is defined as anything that Perl's \s pattern matches, plus the nobreak space (\xa0). Default is 1 (true).
Available via new().
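The whitespace definition above can be expressed as a single substitution. This is only a sketch of the idea: collapse_ws() is a hypothetical helper written for illustration, not a function exported by this module, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Tools::Snipper):
# collapse any run of \s characters or nobreak spaces (\xa0)
# into a single ordinary space.
sub collapse_ws {
    my ($text) = @_;
    $text =~ s/[\s\xa0]+/ /g;
    return $text;
}

my $raw = "the  quick\t\tbrown\n\nfox\x{a0}\x{a0}jumped";
print collapse_ws($raw), "\n";
```

The \xa0 is listed explicitly because \s does not match it in every context.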
snip( text )
Return a snippet from text that matches query, plus context() words of context. Matches are case-insensitive.
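To illustrate the general idea of keyword-in-context extraction, here is a minimal standalone sketch. It is not the algorithm this module uses: kwic() is a hypothetical function, it splits on whitespace only, and it assumes the context count applies to each side of the match.

```perl
use strict;
use warnings;

# Hypothetical illustration only; NOT the module's snipper.
# Returns one snippet per case-insensitive match of $keyword,
# with up to $context words on each side (an assumption).
sub kwic {
    my ( $text, $keyword, $context ) = @_;
    my @words = split ' ', $text;
    my @snips;
    for my $i ( 0 .. $#words ) {
        next unless $words[$i] =~ /^\Q$keyword\E$/i;
        my $start = $i - $context;
        $start = 0 if $start < 0;
        my $end = $i + $context;
        $end = $#words if $end > $#words;
        push @snips, join ' ', @words[ $start .. $end ];
    }
    return @snips;
}

my $text = 'the quick brown fox jumped over the lazy dog';
print join( "\n", kwic( $text, 'QUICK', 2 ) ), "\n";
```

The real snip() also handles phrases, markup, and the occur() limit, none of which this sketch attempts.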
rekw
Returns the internal Search::Tools::RegExp::Keywords object.
AUTHOR
Peter Karman perl@peknet.com
Based on the HTML::HiLiter regular expression building code, originally by the same author, copyright 2004 by Cray Inc.
Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.
COPYRIGHT
Copyright 2006 by Peter Karman. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
SWISH::HiLiter