NAME
WE_Frontend::Indexer::Htdig - interface to the htdig search engine
SYNOPSIS
use WE_Frontend::Indexer::Htdig;
my $results = WE_Frontend::Indexer::Htdig::search(-words => "word");
DESCRIPTION
This is an interface to the htdig search engine. The search function returns a Perl hash reference containing the results.
FUNCTIONS
search(%args)
Arguments are:
- -words
-
A string with the words to search. Multiple words are space-separated. This argument is required.
- -conf
-
Specify a different htdig configuration file; otherwise the default htdig.conf is used.
- -lang
-
(Optional) Specify a language. The configuration file name given by -conf may contain %{lang} placeholders, which are replaced by the value of this argument.
- -debug
-
Output some diagnostics to stderr.
- -httpshack
-
Set to a true value if operating on an https server. htdig does not handle SSL, so a parallel http server should be set up for the indexing. With the https hack, the URLs in the search result list are translated at template display time.
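The URL translation behind -httpshack can be pictured with a minimal sketch. The helper name to_https is hypothetical, and the sketch assumes that the parallel http server and the https server share host name and paths; the module's own translation at template display time may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WE_Frontend::Indexer::Htdig): rewrite an
# http:// URL, as indexed on the parallel plain-http server, into its
# https:// counterpart. Assumes identical host name and paths on both servers.
sub to_https {
    my ($url) = @_;
    (my $translated = $url) =~ s{^http://}{https://}i;
    return $translated;
}

print to_https("http://www.example.com/doc/index.html"), "\n";
```

URLs that already use https are returned unchanged.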
The result is a hash reference with the following keys:
- logical_words
- matches_per_page
- max_stars
- page
- pages
- list
-
Holds an array with the search results. See below.
- nomatch
-
This variable is set to a true value if the search produces no results. Also detectable by an empty result list.
- pageurllist
-
A list of URLs for the 1 .. 10 result pages.
- pagenumberlist
-
The corresponding numbers for the pageurllist. Please note that perl/Template arrays start with index 0 (which would be page 1).
- prevpageurl
- nextpageurl
-
Hold the URLs for the previous and the next result page, respectively.
- prevpagenumber
- nextpagenumber
-
Usually not needed: the numbers of the previous and the next result page, respectively. In practice you would label these links "Prev"/"Next" or "<"/">".
- ...
There are more keys. For a complete list refer to the htdig documentation at http://www.htdig.org (htsearch, Templates). Note that the original template variable names are converted to lowercase.
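As a rough illustration of how the paging keys fit together, the following sketch pairs pageurllist with pagenumberlist and adds "Prev"/"Next" entries. The helper pager_links and the sample data are made up for this example, and both lists are assumed to be array references.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each entry of pagenumberlist with the URL at the
# same index in pageurllist. Both keys are assumed to hold array references.
sub pager_links {
    my ($results) = @_;
    my @urls    = @{ $results->{pageurllist}    || [] };
    my @numbers = @{ $results->{pagenumberlist} || [] };
    my @links;
    for my $i (0 .. $#urls) {
        # Index 0 holds page 1, hence the separate number list.
        push @links, { label => $numbers[$i], url => $urls[$i] };
    }
    # Add "Prev"/"Next" entries only when the corresponding URL is set.
    unshift @links, { label => "Prev", url => $results->{prevpageurl} }
        if $results->{prevpageurl};
    push @links, { label => "Next", url => $results->{nextpageurl} }
        if $results->{nextpageurl};
    return \@links;
}

# Made-up sample data in the documented shape, not real htdig output.
my $sample = {
    pageurllist    => [ "/search?page=1", "/search?page=2" ],
    pagenumberlist => [ 1, 2 ],
    nextpageurl    => "/search?page=2",
};
print "$_->{label}: $_->{url}\n" for @{ pager_links($sample) };
```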
The value of list is an array reference with the matches. Each match is a hash reference with the following keys:
- url
-
The URL of the page. See also the -httpshack option above.
- title
-
The title of the page, as specified by the <title> html tag.
- anchor
- excerpt
-
The first lines of text in the document.
- score
- percent
- modified
-
The date and time the document was last modified. See also the documentation of the iso_8601 config variable in htdig.conf.
- ...
The complete list is also in the htdig documentation at http://www.htdig.org (htsearch, Templates).
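A minimal sketch of walking the result structure described above might look as follows. The helper format_matches and the sample hash are hypothetical; a real result hash would come from the search function and contain more keys.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a result hash into printable lines, using only
# the documented keys nomatch, list, title, percent and url.
sub format_matches {
    my ($results) = @_;
    my @matches = @{ $results->{list} || [] };
    return ("No matches found.") if $results->{nomatch} || !@matches;
    my @lines;
    for my $match (@matches) {
        push @lines, sprintf("%s (%s%%)\n  %s",
            $match->{title}, $match->{percent}, $match->{url});
    }
    return @lines;
}

# Made-up sample data in the documented shape, not real htdig output.
my $sample_results = {
    list => [
        { url => "http://www.example.com/a.html", title => "Page A", percent => 87 },
    ],
};
print "$_\n" for format_matches($sample_results);
```

Checking both nomatch and the length of list mirrors the two ways of detecting an empty result that are mentioned above.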
CONFIGURATION FILES
It is best to just use the original conf/htdig.tpl.conf file found in the webeditor distribution. The indexing program in webeditor will use the template file and fill it with the configuration found in WEsiteinfo. For a first-time installation and configuration, see also htdig.txt in the webeditor/doc directory.
WEsiteinfo configuration:
To override the searchindexer path (default is "rundig" without a path):
$searchengine->searchindexer("/usr/local/bin/rundig");
To set the template htdig and target htdig configuration files (these settings are highly recommended):
$searchengine->htdigconftemplate($paths->uprootdir . "/conf/htdig.tpl.conf");
$searchengine->htdigconf($paths->uprootdir . "/conf/htdig.%{lang}.conf");
where $paths is the WEsiteinfo::Paths object documented in WE_Frontend::Info. If the configuration file should not be language dependent, then use
$searchengine->htdigconf($paths->uprootdir . "/conf/htdig.conf");
instead.
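To make the %{lang} placeholder concrete, here is a small sketch of the substitution. The helper expand_lang is hypothetical and only illustrates the idea; the module's actual substitution code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every %{lang} placeholder in a configuration
# path with the given language code.
sub expand_lang {
    my ($path, $lang) = @_;
    $path =~ s/%\{lang\}/$lang/g;
    return $path;
}

# prints /conf/htdig.de.conf
print expand_lang("/conf/htdig.%{lang}.conf", "de"), "\n";
```

A path without a placeholder, such as /conf/htdig.conf, passes through unchanged.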
Own htdig.conf
If you decide to make your own htdig.conf, put at least the following lines into the configuration file:
template_map: Long long ${common_dir}/long.html \
Short short ${common_dir}/short.html \
Perl perl ${common_dir}/perl/match.pl
template_name: perl
search_results_header: ${common_dir}/perl/header.pl
search_results_footer: ${common_dir}/perl/footer.pl
nothing_found_file: ${common_dir}/perl/nomatch.pl
${common_dir}/perl should be a link to the directory .../lib/WE_Frontend/Indexer/htdig_common.
INSTALLING HTDIG
htdig is available e.g. from this location: http://www.htdig.org/files/snapshots/htdig-3.2.0b5-20040404.tar.gz.
To compile and install htdig from scratch, the following configure line could be used to create a path layout similar to the RedHat one:
sh configure --prefix=/usr --with-search-dir=/usr/share/htdig --with-image-dir=/usr/share/htdig --with-cgi-bin-dir=/usr/bin --with-config-dir=/etc --with-database-dir=/usr/share/htdig
CAVEATS
Many. Mind the permissions. In particular, rundig may use the default database directory (/usr/local/share/htdig/database or similar) as the temporary directory for sorting, which will fail if the apache user (usually nobody or www) has no permission to write to this directory. In this case, change the TMPDIR definition in rundig or set appropriate write permissions.
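A quick way to check for the permission problem is a small script like the sketch below. The helper dir_is_writable is hypothetical, and the default path is only the example from the text. The check is meaningful only when the script runs as the same user as the web server.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a directory exists and is writable by
# the user running this script.
sub dir_is_writable {
    my ($dir) = @_;
    return (-d $dir && -w _) ? 1 : 0;
}

# The fallback path is the example default from the text; adjust as needed.
my $dbdir = $ENV{TMPDIR} || "/usr/local/share/htdig/database";
print "$dbdir is ", (dir_is_writable($dbdir) ? "" : "not "), "writable\n";
```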
AUTHOR
Slaven Rezic - slaven@rezic.de