NAME
Search::Tools::HeatMap - locate the best matches in a snippet extract
SYNOPSIS
use strict;
use warnings;
use Search::Tools::Tokenizer;
use Search::Tools::HeatMap;
my $my_string = 'some text with an interesting word or two';
my $tokenizer = Search::Tools::Tokenizer->new();
my $tokens    = $tokenizer->tokenize( $my_string, qr/^(interesting)$/ );
my $heatmap   = Search::Tools::HeatMap->new(
    tokens      => $tokens,
    window_size => 20,
);
my $occur = 5;    # maximum number of snippets to show
if ( $heatmap->has_spans ) {
    # collect the string form of each span
    my @snips = map { $_->{str} } @{ $heatmap->spans };
    # keep at most $occur snippets
    if ( @snips > $occur ) {
        @snips = @snips[ 0 .. $occur - 1 ];
    }
    printf( "%s\n", join( ' ... ', @snips ) );
}
DESCRIPTION
Search::Tools::HeatMap implements a simple algorithm for locating the densest clusters of unique, hot terms in a TokenList.
HeatMap is used internally by Snipper but documented here in case someone wants to abuse and/or improve it.
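The following is a minimal, self-contained sketch of the general idea in plain Perl, with no Search::Tools dependency. It is not the algorithm HeatMap actually uses: the function name find_clusters, the cluster hash keys (start, end, unique, str), the ranking order and the greedy non-overlap rule are all assumptions made for illustration. Each hot token starts a candidate cluster that grows until the next hot token would no longer fit in the window; candidates with more unique hot terms win.

```perl
use strict;
use warnings;

# Hypothetical sketch: rank clusters of "hot" tokens by how many *unique*
# hot terms fit inside a window of $window_size tokens. This is NOT the
# Search::Tools::HeatMap implementation, only an illustration of the idea.
sub find_clusters {
    my ( $tokens, $is_hot, $window_size ) = @_;

    # Positions of every hot token.
    my @hot = grep { $is_hot->( $tokens->[$_] ) } 0 .. $#$tokens;

    # Start one candidate cluster at each hot position and grow it
    # while the next hot token still fits in the window.
    my @candidates;
    for my $i ( 0 .. $#hot ) {
        my $start = $hot[$i];
        my $end   = $start;
        my %seen;
        for my $j ( $i .. $#hot ) {
            last if $hot[$j] - $start + 1 > $window_size;
            $seen{ lc $tokens->[ $hot[$j] ] } = 1;
            $end = $hot[$j];
        }
        push @candidates,
          { start => $start, end => $end, unique => scalar keys %seen };
    }

    # Best first: most unique hot terms, then tightest, then earliest.
    my @ranked = sort {
             $b->{unique} <=> $a->{unique}
          || ( $a->{end} - $a->{start} ) <=> ( $b->{end} - $b->{start} )
          || $a->{start} <=> $b->{start}
    } @candidates;

    # Greedily keep clusters that do not overlap an already chosen one.
    my ( @picked, %used );
    for my $c (@ranked) {
        next if grep { $used{$_} } $c->{start} .. $c->{end};
        $used{$_} = 1 for $c->{start} .. $c->{end};
        $c->{str} = join ' ', @{$tokens}[ $c->{start} .. $c->{end} ];
        push @picked, $c;
    }
    return @picked;
}

my @words = split ' ',
  'the quick brown fox jumps over the lazy dog while another fox sleeps near the brown dog';
for my $c ( find_clusters( \@words, sub { $_[0] =~ /^(?:fox|dog)$/i }, 6 ) ) {
    printf "%d unique: %s\n", $c->{unique}, $c->{str};
}
```

With a window of 6 this prints "2 unique: dog while another fox" followed by "1 unique: dog". Because the selection is greedy, a hot term that only appears in an overlapping, lower-ranked candidate (here the first "fox") ends up in no returned cluster; a real implementation may well trade that off differently.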
METHODS
new( tokens => TokenList )
Create a new HeatMap. The TokenList object may be either a Search::Tools::TokenList or Search::Tools::TokenListPP object.
init
Builds the HeatMap object. Called internally by new().
window_size
The maximum width of a span, in tokens, including the matching tokens. Defaults to 20.
Set this in new(). You can read it later, but note that the spans have already been created by new().
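To illustrate what a window_size limit can mean in practice, here is a hypothetical helper, pad_to_window, that widens a cluster of matching token positions to window_size tokens of surrounding context. It is a sketch under that assumption, not HeatMap's code; the helper name and the even left/right split are arbitrary choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a cluster of token positions [$start, $end]
# to $window_size tokens of context, split as evenly as possible on both
# sides and shifted to stay inside a list of $n_tokens tokens.
# Assumes the cluster already fits in the window; wider input is
# returned unchanged.
sub pad_to_window {
    my ( $start, $end, $window_size, $n_tokens ) = @_;
    my $width = $end - $start + 1;
    return ( $start, $end ) if $width >= $window_size;

    my $extra = $window_size - $width;
    my $left  = int( $extra / 2 );
    my ( $from, $to ) = ( $start - $left, $end + ( $extra - $left ) );

    # Slide the window right if it starts before the first token ...
    if ( $from < 0 ) { $to -= $from; $from = 0; }

    # ... or left if it ends after the last one, never below 0.
    if ( $to > $n_tokens - 1 ) {
        $from -= $to - ( $n_tokens - 1 );
        $to   = $n_tokens - 1;
        $from = 0 if $from < 0;
    }
    return ( $from, $to );
}

my @words = split ' ', 'a b c d e f g h i j k l m n o p q';    # 17 tokens
my ( $from, $to ) = pad_to_window( 16, 16, 6, scalar @words );
print join( ' ', @words[ $from .. $to ] ), "\n";    # l m n o p q
```

A match on the last token cannot be centred, so the window slides left and still covers 6 tokens instead of running off the end of the list.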
spans
Returns an array ref of matching clusters. Each span in the array is a hash ref with the following keys:
- cluster
- pos
- heat
- str
- str_w_pos
- unique
has_spans
Returns the number of spans found.
AUTHOR
Peter Karman <karman at cpan dot org>
ACKNOWLEDGEMENTS
The idea of the HeatMap comes from KinoSearch, though the implementation here is original.
BUGS
Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Search::Tools
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT
Copyright 2009 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
KinoSearch