NAME

SWISH::HiLiter -- simple interface to SWISH::API and HTML::HiLiter

VERSION

0.01

SYNOPSIS

   my $query = "foo OR bar";
   
   require SWISH::API;
   my $swish = SWISH::API->new( 'my_index' );
   
   require SWISH::HiLiter;
   
   # create an object
   
   my $hiliter = SWISH::HiLiter->new( swish=>$swish, query=>$query );
      
   # search and display highlighted results
   
   my $results = $swish->Query( $query );
   
   while ( my $result = $results->NextResult ) {

       my $path  = $result->Property( "swishdocpath" );
       my $title = $hiliter->light(
                       $result->Property( "swishtitle" )
                   );
       my $snip  = $hiliter->light(
                       $hiliter->snip(
                           $result->Property( "swishdescription" )
                       )
                   );
       my $rank  = $result->Property( "swishrank" );
       my $num   = $result->Property( "swishreccount" );   # result record number

       print join( "\n", $num, $path, $title, $rank, $snip ), "\n";

   }
   

DESCRIPTION

SWISH::HiLiter is a simple interface to the HTML::HiLiter module. It is designed to work specifically with the SWISH::API module for searching SWISH indexes and displaying snippets of highlighted text from the stored SWISH properties.

SWISH::HiLiter is NOT a drop-in replacement for the highlighting modules that come with the SWISH-E distribution. Instead, it is intended to be used when programming with SWISH::API.

REQUIREMENTS

HTML::HiLiter of course, which can also be used for full-page highlighting.

If you intend to use full-page highlighting, you also need HTML::Parser and its required modules.

Perl 5.6.1 or later.

SWISH::API 0.03 or later.

VARIABLES

You may set the following package variables. The default values are listed after each.

  • $SWISH::HiLiter::Debug [ 0 ]

  • $SWISH::HiLiter::Max_Chars [ 300 ]

  • $SWISH::HiLiter::Occurrences [ 5 ]

  • $SWISH::HiLiter::Context [ 7 ]
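As a minimal sketch, the package variables can be overridden with ordinary assignments. The values below are illustrative only. It is an assumption, not something this document states, that the defaults are read when new() is called, so the safest place for these lines is after the module is loaded and before the object is created.

```perl
use strict;

# Run this after SWISH::HiLiter has been loaded (require/use), so that the
# module's own defaults do not overwrite these values.
$SWISH::HiLiter::Max_Chars   = 200;   # default 300
$SWISH::HiLiter::Occurrences = 3;     # default 5
$SWISH::HiLiter::Context     = 5;     # default 7
$SWISH::HiLiter::Debug       = 1;     # default 0
```

The per-object parameters to new() described under METHODS are usually the better choice when different objects need different settings.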

METHODS

new()

Create a SWISH::HiLiter object. The new() method takes either a single parameter (a SWISH::API object), or a hash of parameter key/values. Available parameters include:

swish

A SWISH::API object. Version 0.03 or newer. [ Required ]

colors

A reference to an array of HTML color names.

query

The query string you want highlighted.

metanames

A reference to an array of SWISH metanames in the index you are searching. This list is used in setq() to accommodate searches like 'metafoo=bar'. If you do not provide a list, it is retrieved automatically from the index you are searching, using the SWISH::API object you pass to new().

occurrences

How many query matches to display when running snip(). See $SWISH::HiLiter::Occurrences.

max_chars

Maximum number of characters to return in snip(). See $SWISH::HiLiter::Max_Chars.

context

Number of words around match to return in snip(). See $SWISH::HiLiter::Context.

noshow

Bashful setting. If snip() fails to match any of your query (as can happen if the match is beyond the range of SwishDescription as set in your index), don't show anything. The default is to show the first $Max_Chars of the text. See also snip=>'dumb'.

snip

There are three different snipping approaches. Each has its advantages.

dumb

Use the brute-force dumb_snip approach in snip(). This is the fastest option, but it is unlikely to show any matches to your query unless they occur in the first $Max_Chars of your text.

re

Uses the regular expression snip. It's slower but a little smarter. NOTE: The regexp snip is used by default if any phrases are in your query.

loop [ default ]

Splits your text into 'swish words' and compares each against your query. A medium (speed and accuracy) approach for non-phrase queries.

escape

Your text is assumed not to contain HTML markup and so it is HTML-escaped by default. If you have included markup in your text and want it left as-is, set 'escape' to 0. Highlighting should still work, but snip() might break...
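The hash form of new() might be used as in the sketch below. The parameter names are the ones listed above; every value, the index name 'my_index' and the sample text are illustrative assumptions. The eval guard is only there so the sketch still runs on a machine where the modules are not installed.

```perl
use strict;
use warnings;

# Parameter names come from the list above; every value is illustrative.
my %options = (
    query       => 'foo OR "bar baz"',
    colors      => [ 'yellow', 'lightgreen' ],    # any HTML color names
    occurrences => 3,
    max_chars   => 200,
    snip        => 're',     # the query contains a phrase
    escape      => 1,        # text is plain, so let it be HTML-escaped
    noshow      => 1,        # show nothing if snip() finds no match
);

# Guarded so the sketch degrades gracefully where the modules are missing.
if ( eval { require SWISH::API; require SWISH::HiLiter; 1 } ) {
    my $swish = SWISH::API->new('my_index');      # hypothetical index file
    $swish->AbortLastError if $swish->Error;

    my $hiliter = SWISH::HiLiter->new( swish => $swish, %options );
    print $hiliter->light('some foo text'), "\n";
}
else {
    print "SWISH::API or SWISH::HiLiter is not installed\n";
}
```

Since 'metanames' is omitted here, the list would be read from the index, as described under that parameter.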

stem( word )

Return the stemmed version of a word. This only works if the first index in your SWISH::API object was built with Fuzzy Mode.

This method is just a wrapper around SWISH::API::Fuzzy.

light( text )

Returns highlighted text. See new() for ways to control context, length, etc.

setq( query )

Set the query in the highlighting object. Called automatically by new() if 'query' is present in the new() call.

With no query, returns the parsed query string (scalar) currently in the object. Otherwise, returns the array of query terms that will be highlighted. Identical to calling HTML::HiLiter::Queries in array context.

You should only call setq() if changing the query value (as under mod_perl or in a loop) or to simply see what the parsed query looks like.

NOTE: setq() is not the same as the ParsedWords() SWISH::API method. The chief differences:

stemming

ParsedWords() returns the query as it was actually used for the search, which means that the words were stemmed if your index used stemming. SWISH::HiLiter needs the query as you entered it, not as it was stemmed. SWISH::HiLiter will handle the stemming internally, by abusing some regexp trickery.

phrases

Phrases are kept together in setq(), while they are broken up by white space in ParsedWords().

Example:

my (@q) = $hiliter->setq( 'my query' );
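The phrase difference can be illustrated in plain Perl. This is not the parser used by setq() or by ParsedWords(); split_keeping_phrases() is a hypothetical helper written only to show what "kept together" versus "broken up by white space" means for a query that contains a quoted phrase.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of SWISH::HiLiter: keeps double-quoted
# phrases together and drops the boolean operators.
sub split_keeping_phrases {
    my ($query) = @_;
    my @terms;
    while ( $query =~ /"([^"]+)"|(\S+)/g ) {
        my $term = defined $1 ? $1 : $2;
        next if $term =~ /^(?:AND|OR|NOT)$/i;
        push @terms, $term;
    }
    return @terms;
}

my $query = 'perl AND "search engine"';

my @by_space  = split ' ', $query;                # phrase broken apart
my @by_phrase = split_keeping_phrases($query);    # phrase kept whole

print join( ' | ', @by_space ),  "\n";    # perl | AND | "search | engine"
print join( ' | ', @by_phrase ), "\n";    # perl | search engine
```

Keeping the phrase whole matters for highlighting, because the two words should only be marked when they appear next to each other.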

snip( text )

Return a snippet of text that matches the query, plus N words of context around each match. N is set by the context value in the object's configuration (see $SWISH::HiLiter::Context).

This gives you Google(tm)-like context for queries in search results.

NOTE: This method can be a real bottleneck. Consider the snip=>'dumb' option in new() if you see it slowing down your code.
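The idea of "match plus N words of context" can be sketched in a few lines of plain Perl. This is not the code SWISH::HiLiter uses; context_snip() is a hypothetical helper that handles a single term and the first match only, and it ignores phrases, stemming and the Max_Chars and Occurrences limits.

```perl
use strict;
use warnings;

# Hypothetical helper, not SWISH::HiLiter's code: return the first match of
# $term plus up to $context words on either side of it.
sub context_snip {
    my ( $text, $term, $context ) = @_;
    my @words = split ' ', $text;
    for my $i ( 0 .. $#words ) {
        next unless $words[$i] =~ /\b\Q$term\E\b/i;
        my $start = $i - $context < 0       ? 0       : $i - $context;
        my $end   = $i + $context > $#words ? $#words : $i + $context;
        my $snip  = join ' ', @words[ $start .. $end ];
        $snip = "... $snip" if $start > 0;
        $snip = "$snip ..." if $end < $#words;
        return $snip;
    }
    return '';    # no match; compare the noshow option in new()
}

my $text = 'the quick brown fox jumps over the lazy dog near the river bank';
print context_snip( $text, 'lazy', 2 ), "\n";
# prints: ... over the lazy dog near ...
```

Even this simple version walks every word of the text, which gives a sense of why snipping can dominate the run time on long descriptions.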

LIMITATIONS

If your text contains HTML markup and escape is set to 0, snip() may fail to return valid HTML. I don't consider this a bug, but I list it here in case it happens to you.

Stemming and regular expression building consider only the first index's header values from your SWISH::API object. If those header values differ between indexes (for example, WordCharacters is defined differently), be aware that only the first index from SWISH::API::IndexNames is used.

REMINDER: Use HTML::HiLiter to highlight HTML markup; use SWISH::HiLiter to highlight plain text.

AUTHOR

Peter Karman, karman@cray.com

Thanks to the SWISH-E developers, in particular Bill Moseley for graciously sharing time, advice and code examples.

Comments and suggestions are welcome.

COPYRIGHT

###############################################################################
#    CrayDoc 4
#    Copyright (C) 2004 Cray Inc swpubs@cray.com
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
###############################################################################

SUPPORT

Send email to swpubs@cray.com.

SEE ALSO

HTML::HiLiter, SWISH::API