NAME

bin/z3950_centroid.pl - extract centroid from NWI/EWI objects

SYNOPSIS

  bin/z3950_centroid.pl [-d] [-h hashtemp1] [-H hashtemp2]
    [-s serverhandle] < filename

DESCRIPTION

This Perl program creates a WHOIS++ compatible centroid from the attributes and values in a collection of NWI/EWI index objects, as created by the Combine harvester. Note that you should give a server handle when invoking this program, or the default value of 'undefined' will be used.

The Combine harvester creates its database in a two-level directory hierarchy, with a separate file for each indexed object. You can concatenate these files and feed them to this program with a simple find invocation:

  find HDB/hdb -type f -exec cat {} \; | z3950_centroid.pl -s test01

Or perhaps something more complicated!
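As one slightly more elaborate possibility, the sketch below wraps the find invocation in a shell function and uses the "+" form of -exec, so that find hands many files to each cat process rather than starting one cat per object. The function name concat_objects is made up for illustration and is not part of the distribution.

```shell
# Concatenate every regular file under a directory tree to stdout.
# "-exec cat {} +" batches file names, unlike "-exec cat {} \;",
# which starts a new cat process for every single file.
concat_objects() {
  # $1 = top of the object hierarchy, e.g. HDB/hdb
  find "$1" -type f -exec cat {} +
}

# Typical use (assumes z3950_centroid.pl is on your PATH):
#   concat_objects HDB/hdb | z3950_centroid.pl -s test01
```

The order in which find visits the files is unspecified, so the objects may reach the program in any order.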

OPTIONS

-d

Turn on debugging output - very verbose!

-h hashtemp1

Filename of the temporary DB hash database that holds the list of document titles being indexed while the centroid is constructed. Defaults to hashtemp1.

-H hashtemp2

Filename of the temporary DB hash database that holds the list of terms in the document text being indexed while the centroid is constructed. Defaults to hashtemp2.

-s serverhandle

Server handle to use when generating the centroid. Defaults to 'undefined'.
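The options can be combined so that both temporary hash databases live in a private scratch directory instead of the current directory. The following is only a sketch: the function name run_centroid and the CENTROID_CMD override are hypothetical, and whether the program removes its own temporary files is not stated here, so the sketch cleans up the scratch directory itself.

```shell
# Run the centroid builder with both temporary hash databases kept in a
# scratch directory, which is removed once the program has finished.
# CENTROID_CMD is a hypothetical override; it defaults to the script name.
run_centroid() {
  handle=$1                      # server handle to pass with -s
  scratch=$(mktemp -d) || return 1
  "${CENTROID_CMD:-z3950_centroid.pl}" \
    -h "$scratch/hashtemp1" -H "$scratch/hashtemp2" -s "$handle"
  status=$?
  rm -rf "$scratch"
  return "$status"
}

# Typical use, reading the concatenated index objects on stdin:
#   find HDB/hdb -type f -exec cat {} \; | run_centroid test01
```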

BUGS

We could traverse the filesystem and look at the timestamps on the index objects - this would let us do a relative centroid.
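Until something like that exists, a rough approximation is to let find do the timestamp comparison and feed the program only the objects changed since the previous run. This is a sketch under assumptions: the function name changed_objects and the stamp file names are hypothetical, and the result is simply a centroid built from the changed objects, not a true relative centroid.

```shell
# Emit only the index objects modified more recently than a stamp file.
changed_objects() {
  # $1 = top of the object hierarchy, $2 = stamp file from the previous run
  find "$1" -type f -newer "$2" -exec cat {} +
}

# Typical use (stamp file names are hypothetical). The new stamp is made
# before the run so that objects written during the run are picked up
# next time:
#   touch next.stamp
#   changed_objects HDB/hdb last_run.stamp | z3950_centroid.pl -s test01
#   mv next.stamp last_run.stamp
```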

We don't do anything special about character sets/encodings.

Not up to date with current CIP specifications - this is really intended for use with a WHOIS++ server which speaks the old RFC 1913 indexing protocol.

SEE ALSO

"harvest_centroid.pl" in bin, RFC 1913

COPYRIGHT

Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.

AUTHOR

Martin Hamilton <martinh@gnu.org>