NAME
SWISH::HiLiter - simple interface to SWISH::API and HTML::HiLiter
VERSION
0.04
SYNOPSIS
my $query = "foo OR bar";
require SWISH::API;
my $swish = SWISH::API->new( 'my_index' );
require SWISH::HiLiter;
# create an object
my $hiliter = SWISH::HiLiter->new( swish=>$swish, query=>$query );
# search and display highlighted results
my $results = $swish->Query( $query );
while ( my $result = $results->NextResult ) {
    my $path = $result->Property( "swishdocpath" );
    my $title = $hiliter->light(
        $result->Property( "swishtitle" )
    );
    my $snip = $hiliter->light(
        $hiliter->snip(
            $result->Property( "swishdescription" )
        )
    );
    my $rank = $result->Property( "swishrank" );
    my $file = $result->Property( "swishreccount" );
    print join("\n", $file, $path, $title, $rank, $snip );
}
DESCRIPTION
SWISH::HiLiter is a simple interface to the HTML::HiLiter module. It is designed to work specifically with the SWISH::API module for searching SWISH indexes and displaying snippets of highlighted text from the stored SWISH properties.
SWISH::HiLiter is NOT a drop-in replacement for the highlighting modules that come with the SWISH-E distribution. Instead, it is intended to be used when programming with SWISH::API.
REQUIREMENTS
HTML::HiLiter of course, which can also be used for full-page highlighting.
If you intend to use full-page highlighting, also install HTML::Parser and its required modules.
Perl 5.6.1 or later.
SWISH::API 0.04 or later.
VARIABLES
You may set the following package variables. The default values are listed after each.
$SWISH::HiLiter::Debug [ 0 ]
$SWISH::HiLiter::Max_Chars [ 300 ]
$SWISH::HiLiter::Occurrences [ 5 ]
$SWISH::HiLiter::Context [ 7 ]
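A minimal sketch of overriding these defaults. It assumes the variables should be set after the module is loaded (loading it presumably assigns the defaults) and before any SWISH::HiLiter object is created. The values are arbitrary illustrations, and the eval guard is only there so the sketch also runs where SWISH::HiLiter is not installed.

```perl
use strict;
use warnings;

# Load the module first; loading presumably assigns the defaults listed above.
# The eval guard only keeps this sketch runnable where the module is missing.
my $have_hiliter = eval { require SWISH::HiLiter; 1 } ? 1 : 0;

{
    no warnings 'once';    # fully qualified names may be seen only once here
    $SWISH::HiLiter::Max_Chars   = 200;    # illustrative value (default 300)
    $SWISH::HiLiter::Occurrences = 3;      # illustrative value (default 5)
    $SWISH::HiLiter::Context     = 10;     # illustrative value (default 7)
    $SWISH::HiLiter::Debug       = 1;      # non-zero presumably enables debugging
}
```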
METHODS
new()
Create a SWISH::HiLiter object. The new() method takes either a single parameter (a SWISH::API object), or a hash of parameter key/values. Available parameters include:
- swish
-
A SWISH::API object. Version 0.03 or newer. [ Required ]
- colors
-
A reference to an array of HTML color names.
- query
-
The query string you want highlighted.
- metanames
-
A reference to an array of SWISH metanames in the index you are searching. This list is used in setq() to accommodate searches like 'metafoo=bar'. If you do not provide a list, it will be automatically retrieved from the index you are searching using the SWISH::API object you pass to new().
- occurrences
-
How many query matches to display when running snip(). See $SWISH::HiLiter::Occurrences.
- max_chars
-
Maximum number of characters to return in snip(). See $SWISH::HiLiter::Max_Chars.
- context
-
Number of words around match to return in snip(). See $SWISH::HiLiter::Context.
- noshow
-
Bashful setting. If snip() fails to match any of your query (as can happen if the match is beyond the range of SwishDescription as set in your index), don't show anything. The default is to show the first $Max_Chars characters of the text. See also snip=>'dumb'.
- snip
-
There are three different snipping approaches. Each has its advantages.
- dumb
-
Use the brute force dumb_snip approach in snip(). This is the fastest, but it likely won't show any matches to your query unless they occur in the first $Max_Chars characters of your text.
- re
-
Uses the regular expression snip. It's slower but a little smarter. NOTE: The regexp snip is used by default if any phrases are in your query.
- loop [ default ]
-
Splits your text into 'swish words' and compares each against your query. A medium (speed and accuracy) approach for non-phrase queries.
- escape
-
Your text is assumed not to contain HTML markup and so it is HTML-escaped by default. If you have included markup in your text and want it left as-is, set 'escape' to 0. Highlighting should still work, but snip() might break...
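As an illustration of the hash form, the sketch below collects several of the parameters above and passes them to new(). The index name 'my_index' comes from the SYNOPSIS; the color names and the other values are made-up examples. The guard around the constructor calls is only there so the sketch does nothing where the modules or the index are missing.

```perl
use strict;
use warnings;

my $index = 'my_index';    # index name borrowed from the SYNOPSIS

# Options for SWISH::HiLiter->new(); every value here is illustrative.
my %opts = (
    query       => 'foo OR bar',
    colors      => [ 'yellow', 'lime' ],    # HTML color names
    occurrences => 3,                       # show at most 3 matches per snippet
    noshow      => 1,                       # show nothing if snip() finds no match
    snip        => 're',                    # regular expression snipping
    escape      => 1,                       # text is plain, so HTML-escape it
);

my $hiliter;
if ( -e $index and eval { require SWISH::API; require SWISH::HiLiter; 1 } ) {
    my $swish = SWISH::API->new($index);
    $hiliter = SWISH::HiLiter->new( swish => $swish, %opts );
}
```

The single-parameter form described above would simply be SWISH::HiLiter->new( $swish ), with the query set later through setq().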
stem( word )
Return the stemmed version of a word. Only works if the first index in your SWISH::API object used Fuzzy Mode.
This method is just a wrapper around SWISH::API::Fuzzify.
NOTE: stem() requires SWISH::API version 0.04 or newer.
light( text )
Returns highlighted text. See new() for ways to control context, length, etc.
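To make the idea concrete, here is a rough, self-contained sketch of what highlighting plain text involves: escape the text for HTML, then wrap each query term in a colored tag. It is not the code used by SWISH::HiLiter or HTML::HiLiter. The helper names escape_html() and highlight(), the <span> markup, and the color list are all assumptions made for the illustration, and terms containing HTML-special characters are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: wrap each term in a <span>, cycling through the colors.
sub highlight {
    my ( $text, $terms, $colors ) = @_;
    my $html = escape_html($text);
    return $html unless @$terms;

    my %color_for;
    for my $i ( 0 .. $#$terms ) {
        $color_for{ lc $terms->[$i] } = $colors->[ $i % @$colors ];
    }

    # Longest terms first so a phrase wins over a word it contains.
    my $alt = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @$terms;

    # One pass over the text, so inserted markup is never re-matched.
    $html =~ s{\b($alt)\b}{ sprintf '<span style="background:%s">%s</span>', $color_for{ lc $1 }, $1 }gie;
    return $html;
}

my $out = highlight( 'Foo & bar walk into a bar', [ 'foo', 'bar' ], [ 'yellow', 'lime' ] );
print "$out\n";
```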
setq( query )
Set the query in the highlighting object. Called automatically by new() if 'query' is present in the new() call.
With no query, returns the parsed query string (scalar) currently in the object. Otherwise, returns the array of query terms that will be highlighted. Identical to calling HTML::HiLiter::Queries in array context.
You should only call setq() if changing the query value (as under mod_perl or in a loop) or to simply see what the parsed query looks like.
NOTE: setq() is not the same as the ParsedWords() SWISH::API method. The chief differences:
- stemming
-
ParsedWords() returns the query as it was actually used for the search, which means that the words were stemmed if your index used stemming. SWISH::HiLiter needs the query as you entered it, not as it was stemmed. SWISH::HiLiter will handle the stemming internally, by abusing some regexp trickery.
- phrases
-
Phrases are kept together in setq(), while they are broken up by white space in ParsedWords().
Example:
my (@q) = $hiliter->setq( 'my query' );
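The phrase difference can be illustrated without SWISH at all. The sketch below is not how setq() is implemented; split_query() is a hypothetical helper that keeps double-quoted phrases together, shown next to a naive whitespace split.

```perl
use strict;
use warnings;

# Hypothetical helper: split a query into terms, keeping "quoted phrases" whole.
sub split_query {
    my ($query) = @_;
    my @terms;
    while ( $query =~ /"([^"]+)"|(\S+)/g ) {
        push @terms, defined $1 ? $1 : $2;
    }
    return @terms;
}

my $query = 'perl "search engine" index';

my @kept  = split_query($query);    # the phrase stays one term
my @naive = split ' ', $query;      # the phrase is broken at white space

print join( ' | ', @kept ),  "\n";
print join( ' | ', @naive ), "\n";
```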
snip( text )
Returns a snippet of text that contains the query matches plus N words of surrounding context. N is the context setting (see $SWISH::HiLiter::Context).
Gives you the google(tm)-like context for queries in search results.
NOTE: This method can be a real bottleneck. Consider the snip=>'dumb' option in new() if you see it slowing down your code.
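For readers curious what "N words of context" means in practice, here is a simplified, self-contained sketch in the spirit of the 'loop' approach: split the text into words, find the words that match a query term, and keep N words on either side. It is an illustration only, not the module's code. context_snip() and its arguments are hypothetical, and the sketch ignores stemming, phrases, and the Max_Chars limit.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $max_hits snippets joined by ' ... ',
# each holding a matching word plus $n words of context on either side.
sub context_snip {
    my ( $text, $terms, $n, $max_hits ) = @_;
    my %want  = map { lc($_) => 1 } @$terms;
    my @words = split ' ', $text;
    my @snips;
    my $last_end = -1;    # index of the last word already placed in a snippet

    for my $i ( 0 .. $#words ) {
        ( my $bare = lc $words[$i] ) =~ s/\W+//g;    # compare without punctuation
        next unless $want{$bare};
        next if $i <= $last_end;                     # already shown as context

        my $start = $i - $n;
        $start = $last_end + 1 if $start <= $last_end;    # no overlap, no negatives
        my $end = $i + $n;
        $end = $#words if $end > $#words;

        push @snips, join( ' ', @words[ $start .. $end ] );
        $last_end = $end;
        last if @snips >= $max_hits;
    }
    return join ' ... ', @snips;
}

my $text = 'The quick brown fox jumps over the lazy dog while the cat sleeps in the sun.';
my $snip = context_snip( $text, [ 'fox', 'cat' ], 2, 5 );
print "$snip\n";
```

Even this toy version walks every word of the text, which hints at why snipping long properties can become a bottleneck.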
LIMITATIONS
If your text contains HTML markup and escape is set to 0, snip() may fail to return valid HTML. I don't consider this a bug, but I am listing it here in case it happens to you.
Stemming and regular expression building consider only the first index's header values from your SWISH::API object. If those header values differ between indexes (for example, WordCharacters is defined differently), be aware that only the first index from SWISH::API::IndexNames is used.
REMINDER: Use HTML::HiLiter to highlight HTML markup; use SWISH::HiLiter to highlight plain text.
AUTHOR
Peter Karman, karman@cray.com
Thanks to the SWISH-E developers, in particular Bill Moseley for graciously sharing time, advice and code examples.
Comments and suggestions are welcome.
COPYRIGHT
###############################################################################
# CrayDoc 4
# Copyright (C) 2004 Cray Inc swpubs@cray.com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
###############################################################################
SUPPORT
Send email to swpubs@cray.com.