NAME

bin/mkinv.pl - build ROADS database index

SYNOPSIS

  bin/mkinv.pl [-adhu] [-i directory] [-m minsize]
    [-s directory] [-t directory] [-x stoplist]
    [-y stopattr] [-z alltemps] [-I indexspecial]
    [-P percentencode] [handle1 handle2 ... handleN]

DESCRIPTION

The mkinv.pl program generates an index of IAFA templates which can be searched using the search.pl and admin.pl CGI programs. The index is used by these programs to rapidly match keywords and boolean expressions in a large number of IAFA templates.
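An inverted index maps each term to the set of templates that contain it. A keyword or Boolean query then becomes a few set operations instead of a scan of every template. The following is a minimal Python sketch of that general idea, not the ROADS on-disk index format. The function names `build_index`, `search_and` and `search_or` and the sample handles are hypothetical.

```python
from collections import defaultdict


def build_index(templates):
    """Map each lower-cased term to the set of template handles containing it.

    `templates` is a hypothetical dict of handle -> {attribute: value}.
    """
    index = defaultdict(set)
    for handle, attrs in templates.items():
        for value in attrs.values():
            for term in value.lower().split():
                index[term].add(handle)
    return index


def search_and(index, *terms):
    """Handles whose templates contain every term (Boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Handles whose templates contain at least one term (Boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Hypothetical sample templates keyed by handle.
templates = {
    "doc1": {"Title": "Perl tutorial", "Description": "Learning Perl"},
    "doc2": {"Title": "Python tutorial", "Description": "Learning Python"},
}
index = build_index(templates)
print(sorted(search_and(index, "tutorial", "perl")))  # ['doc1']
print(sorted(search_or(index, "perl", "python")))     # ['doc1', 'doc2']
```

A query touches only the entries for its own terms, which is why this structure stays fast as the number of templates grows.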

OPTIONS

The mkinv.pl program accepts the following options, most of which control where it reads and writes its files:

-a

Index all the templates in the specified source directory.

-d

Turn on debugging mode.

-h

Print online help and exit.

-i directory

Set the absolute pathname of the directory in which the resulting inverted index is to be placed.

-m minsize

Do not index terms shorter than minsize characters. The default is two characters.

-s directory

Set the absolute pathname of the directory containing the source IAFA templates.

-t directory

Set the absolute pathname of the directory to be used for intermediate temporary files. Use this option if the system default temporary directory runs out of space during particularly large indexing runs.

-u

Unlink (delete) temporary files when in debug mode. This gives you the debugging output without leaving lots of temporary files lying around.

-x stoplist

The absolute pathname of a file containing a list of terms which should not be indexed.

-y stopattr

The absolute pathname of a file containing a list of attributes which should not be indexed.

-z alltemps

The absolute pathname of the file in which the list of template handle to filename mappings should be saved.

-I indexspecial

The absolute pathname of a file containing a list of regular expressions which override the standard tokenisation algorithm used by the ROADS indexer - see below for examples. This is needed when indexing URIs, since the standard ROADS tokeniser would otherwise break each URI into small fragments.

-P percentencode

The absolute pathname of a file containing URI::Escape patterns on a per-attribute basis - see below for examples. This is needed to percent escape URIs in the ROADS index so that they can be matched by WHOIS++ searches.

If the -a option is not used, the mkinv.pl script expects one or more filenames containing IAFA templates to be given. These files are then processed, and all the templates in them are indexed.
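Taken together, the -m, -x and -y options decide which terms actually reach the index. The Python sketch below shows one plausible filtering pipeline. The tokenising regular expression, the one-entry-per-line file format assumed by `load_list`, and the case-insensitive matching are all assumptions, not taken from the ROADS source. The function names are hypothetical.

```python
import re


def load_list(path):
    """Read a stoplist/stopattr-style file, assumed to hold one entry per line."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def index_terms(template, stoplist, stopattr, minsize=2):
    """Yield (attribute, term) pairs that would be indexed.

    Assumed rules: skip attributes named in stopattr, split values on
    non-alphanumeric characters, then drop empty terms, terms shorter
    than minsize, and terms listed in stoplist.
    """
    for attr, value in template.items():
        if attr.lower() in stopattr:
            continue
        for term in re.split(r"[^A-Za-z0-9]+", value.lower()):
            if not term or len(term) < minsize or term in stoplist:
                continue
            yield attr, term


# Hypothetical example template, stoplist and stopattr contents.
template = {
    "Title": "A Guide to the Web",
    "Record-Last-Modified-Date": "Mon, 1 Jun 1998",
}
stoplist = {"the", "to"}
stopattr = {"record-last-modified-date"}
print(list(index_terms(template, stoplist, stopattr)))
# [('Title', 'guide'), ('Title', 'web')]
```

Here "a" falls below the default two-character minimum, "to" and "the" are stopwords, and the date attribute is skipped entirely.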

FILES

config/indexspecial - list of attributes (as regular expression pattern matches) whose values should be indexed whole, overriding the standard tokenisation, e.g.

  URI:\s+
  Description-URI:\s+

config/percentencode - lists the characters which should be percent escaped for a particular attribute, e.g.

  URI:^-A-Za-z0-9

causes all characters in URI values other than ASCII letters, digits and the hyphen (-) to be percent escaped when the index is built.

config/stopattr - default list of attributes to exclude from the index.

config/stoplist - default list of terms to exclude from the index.

guts/index* - the index files themselves.

guts/alltemps - list of template handle to filename mappings.

source - the source templates themselves.
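The config/indexspecial and config/percentencode examples can be illustrated with a short Python sketch. A line whose text matches an indexspecial pattern keeps its value whole instead of being tokenised, and characters matching that attribute's percentencode character class are then percent escaped. In Perl, URI::Escape's `uri_escape` takes a similar character-class argument; the Python below only imitates that behaviour with `re.sub`. The parsing rule (split on the first colon), anchoring the patterns at the start of the line, and the function names are assumptions, not the ROADS implementation.

```python
import re


def parse_percentencode(lines):
    """Parse 'Attribute:charclass' lines into {attribute: compiled regex}.

    Assumes the attribute name ends at the first colon and the rest is a
    URI::Escape-style character class such as ^-A-Za-z0-9.
    """
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        attr, charclass = line.split(":", 1)
        rules[attr] = re.compile("[" + charclass + "]")
    return rules


def percent_escape(value, unsafe):
    """Replace each character matched by `unsafe` with %XX escapes of its UTF-8 bytes."""
    return unsafe.sub(
        lambda m: "".join("%{:02X}".format(b) for b in m.group(0).encode("utf-8")),
        value,
    )


def index_line(line, special_patterns, rules):
    """Return the terms for one 'Attribute: value' template line.

    Lines matching an indexspecial pattern (assumed anchored at the start
    of the line) keep their value whole; other lines are split on
    non-alphanumerics as a stand-in for the normal tokeniser.
    """
    attr, _, value = line.partition(":")
    value = value.strip()
    if any(re.match(p, line) for p in special_patterns):
        unsafe = rules.get(attr)
        return [percent_escape(value, unsafe) if unsafe else value]
    return [t for t in re.split(r"[^A-Za-z0-9]+", value) if t]


special = [r"URI:\s+", r"Description-URI:\s+"]
rules = parse_percentencode(["URI:^-A-Za-z0-9"])
print(index_line("URI: http://example.org/", special, rules))
# ['http%3A%2F%2Fexample%2Eorg%2F']
print(index_line("Title: Web resources", special, rules))
# ['Web', 'resources']
```

Keeping the URI as a single escaped term is what lets an exact WHOIS++ match succeed; tokenised, it would only be findable through its fragments.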

SEE ALSO

"admin.pl" in admin-cgi, "deindex.pl" in bin, "deindex.pl" in admin-cgi, "search.pl" in cgi-bin, "mktemp.pl" in admin-cgi

BUGS

The indexer will only correctly index IAFA templates that have a Template-Type attribute first and a Handle attribute second. All other attributes can be in any order. All templates generated by the ROADS software are in this format, but the IAFA Internet Draft itself is not as strict. If you are processing templates from outside the ROADS system, make sure these conditions hold before indexing them with mkinv.pl.
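Externally sourced templates can be checked for this ordering before indexing. The Python sketch below is a hypothetical pre-flight check, not part of ROADS. It assumes templates are separated by blank lines, that each attribute line has the form "Name: value", and that attribute names compare case-insensitively. Continuation lines that happen to contain a colon are not handled.

```python
import re


def check_template_order(text):
    """Report templates whose first two attributes are not Template-Type, Handle.

    Assumes blank-line-separated templates and 'Name: value' attribute lines;
    attribute names are compared case-insensitively (an assumption).
    """
    problems = []
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    for n, block in enumerate(blocks, start=1):
        names = [line.split(":", 1)[0].strip().lower()
                 for line in block.splitlines() if ":" in line]
        if names[:2] != ["template-type", "handle"]:
            problems.append(f"template {n}: starts with {names[:2]}")
    return problems


good = "Template-Type: DOCUMENT\nHandle: 123\nTitle: Example\n"
bad = "Handle: 456\nTemplate-Type: DOCUMENT\nTitle: Swapped\n"
for problem in check_template_order(good + "\n" + bad):
    print(problem)
# template 2: starts with ['handle', 'template-type']
```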

COPYRIGHT

Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.

AUTHORS

Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>