NAME
Net::Z3950::Tutorial - tutorial for the Net::Z3950 module
SYNOPSIS
Apparently, every POD document has to have a SYNOPSIS. So here's one.
DESCRIPTION
Net::Z3950 is a Perl module for writing Z39.50 clients. (If you want to write a Z39.50 server, you want the Net::Z3950::SimpleServer module.)
Its goal is to hide all the messy details of the Z39.50 protocol - at least by default - while providing access to all of its glorious power. Sometimes, this involves revealing the messy details after all, but at least this is the programmer's choice. The result is that writing Z39.50 clients works the way it should according to my favourite of the various Perl mottos: ``Simple things should be simple, and difficult things should be possible.''
If you don't know what Z39.50 is, then the best place to find out is at http://lcweb.loc.gov/z3950/agency/ , the web site of the Z39.50 Maintenance Agency. Among its many other delights, this site contains a complete downloadable soft-copy of the standard itself. In briefest summary, Z39.50 is the international standard for distributed searching and retrieval.
A VERY SIMPLE CLIENT
The Net::Z3950 distribution includes a couple of sample clients in the samples directory. The simplest of them, trivial.pl, reads as follows:
use Net::Z3950;
$conn = new Net::Z3950::Connection('indexdata.dk', 210,
databaseName => 'gils');
$rs = $conn->search('mineral');
print "found ", $rs->size(), " records:\n";
my $rec = $rs->record(1);
print $rec->render();
This complete program retrieves from the database called ``gils'' on the Z39.50 server on port 210 of indexdata.dk the first record matching the search ``mineral'', and renders it in human-readable form. Typical output would look like this:
6 fields:
(1,1) 1.2.840.10003.13.2
(1,14) "2"
(2,1) {
(1,19) "UTAH EARTHQUAKE EPICENTERS"
(3,Acronym) "UUCCSEIS"
}
(4,52) "UTAH GEOLOGICAL AND MINERAL SURVEY"
(4,1) "ESDD0006"
(1,16) "198903"
HOW IT WORKS
Let's pick the trivial client apart line by line (it won't take long!)
use Net::Z3950;
This line simply tells Perl to pull in the Net::Z3950 module - a prerequisite for using types like Net::Z3950::Connection.
$conn = new Net::Z3950::Connection('indexdata.dk', 210,
databaseName => 'gils');
Creates a new connection to the Z39.50 server on port 210 of the host indexdata.dk, noting that searches on this connection will default to the database called ``gils''. A reference to the new connection is stored in $conn.
$rs = $conn->search('mineral');
Performs a single-word search on the connection referenced by $conn (in the previously established default database, ``gils''). In response, the server generates a result set, notionally containing all the matching records; a reference to the new result set is stored in $rs.
print "found ", $rs->size(), " records:\n";
Prints the number of records in the new result set $rs.
my $rec = $rs->record(1);
Fetches from the server the first record in the result set $rs, requesting the default record syntax (GRS-1) and the default element set (brief, ``b''); a reference to the newly retrieved record is stored in $rec.
print $rec->render();
Prints a human-readable rendition of the record $rec. The exact format of the rendition is dependent on issues like the record syntax of the record that the server sent.
MORE COMPLEX BEHAVIOUR
Searching
Searches may be specified in one of several different syntaxes. The default syntax is the so-called Prefix Query Notation, or PQN, a bespoke format invented by Index Data to map simply to the Z39.50 type-1 query structure. A second is the Common Command Language (CCL), an international standard query language often used in libraries. The third is the Common Query Language (CQL), the query language used by SRW and SRU.
CCL queries may be interpreted on the client side and translated into a type-1 query which is forwarded to the server; or they may be sent ``as is'' for the server to interpret as it may. CQL queries may only be passed ``as is''.
The interpretation of the search string may be specified by passing an argument of -prefix, -ccl, -ccl2rpn or -cql to the search() method before the search string itself, as follows:
Prefix Queries
$rs = $conn->search(-prefix => '@or rock @attr 1=21 mineral');
Prefix Query Notation is fully described in section 8.1 (Query Syntax Parsers) of the Yaz toolkit documentation, YAZ User's Guide and Reference.
Briefly, however, keywords begin with an @-sign, and all other words are interpreted as search terms. Keywords include the binary operators @and and @or, which join together the two operands that follow them, and @attr, which introduces a type=value expression specifying an attribute to be applied to the following term.
So:
fruit
searches for the term ``fruit'',
@and fruit fish
searches for records containing both ``fruit'' and ``fish'',
@or fish chicken
searches for records containing either ``fish'' or ``chicken'' (or both),
@and fruit @or fish chicken
searches for records containing both ``fruit'' and at least one of ``fish'' or ``chicken''.
@or rock @attr 1=21 mineral
searches for records containing either ``rock'' or ``mineral'', but with the ``mineral'' search term carrying an attribute of type 1, with value 21 (typically interpreted to mean that the search term must occur in the ``subject'' field of the record).
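Because the operators are prefix and strictly binary, PQN strings are easy to assemble programmatically. The following is only a sketch: the helper names term(), attr(), pqn_and() and pqn_or() are invented for illustration and are not part of Net::Z3950. Each one returns plain text that could be passed to search() with the -prefix argument.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Net::Z3950) that assemble PQN strings.

# A bare search term; quote it if it contains whitespace.
sub term {
    my ($word) = @_;
    return $word =~ /\s/ ? qq{"$word"} : $word;
}

# "@attr type=value" applies an attribute to the operand that follows.
sub attr {
    my ($type, $value, $operand) = @_;
    return "\@attr $type=$value $operand";
}

# Binary operators join exactly the two operands that follow them.
sub pqn_and { my ($left, $right) = @_; return "\@and $left $right" }
sub pqn_or  { my ($left, $right) = @_; return "\@or $left $right" }

my $query = pqn_and(term('fruit'), pqn_or(term('fish'), term('chicken')));
print "$query\n";      # @and fruit @or fish chicken

my $subject = pqn_or(term('rock'), attr(1, 21, term('mineral')));
print "$subject\n";    # @or rock @attr 1=21 mineral
```

Nesting the helper calls mirrors the nesting of the query itself, so no parentheses are ever needed in the resulting string.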
CCL Queries
$rs = $conn->search(-ccl2rpn => 'rock or su=mineral');
$rs = $conn->search(-ccl => 'rock or su=mineral');
CCL is formally specified in the international standard ISO 8777 (Commands for interactive text searching) and also described in section 8.1 (Query Syntax Parsers) of the Yaz toolkit documentation, YAZ User's Guide and Reference.
Briefly, however, there is a set of well-known keywords including and, or and not. Words other than these are interpreted as search terms. Operator grouping (precedence) is specified by parentheses, and the semantics of a search term may be modified by prepending one or more comma-separated qualifiers and an equals sign.
So:
fruit
searches for the term ``fruit'',
fruit and fish
searches for records containing both ``fruit'' and ``fish'',
fish or chicken
searches for records containing either ``fish'' or ``chicken'' (or both),
fruit and (fish or chicken)
searches for records containing both ``fruit'' and at least one of ``fish'' or ``chicken''.
rock or su=mineral
searches for records containing either ``rock'' or ``mineral'', but with the ``mineral'' search term modified by the qualifier ``su'' (typically interpreted to mean that the search term must occur in the ``subject'' field of the record).
For CCL searches sent directly to the server (query type ccl), the exact interpretation of the qualifiers is the server's responsibility. For searches compiled on the client side (query type ccl2rpn), the interpretation of the qualifiers in terms of type-1 attributes is determined by the contents of a file called ### not yet implemented. The format of this file is described in the Yaz documentation.
CQL Queries
$rs = $conn->search(-cql => 'au=(kernighan and ritchie)');
CQL syntax is very similar to that of CCL.
Setting Search Defaults
As an alternative to explicitly specifying the query type when invoking the search() method, you can change the connection's default query type using its option() method:
$conn->option(querytype => 'prefix');
$conn->option(querytype => 'ccl');
$conn->option(querytype => 'ccl2rpn');
The connection's current default query type can be retrieved using option() with no ``value'' argument:
$qt = $conn->option('querytype');
The option() method can be used to set and get numerous other defaults described in this document and elsewhere; this method exists not only on connections but also on managers (q.v.) and result sets.
Another important option is databaseName, whose value specifies which database is to be searched.
Retrieval
By default, records are requested from the server one at a time; this can be quite slow when retrieving several records. There are two ways of improving this. First, the present() method can be used to explicitly precharge the cache. Its parameters are a start record and record count. In the following example, the present() is optional and merely makes the code run faster:
$rs->present(11, 5) or die ".....";
foreach my $i (11..15) {
my $rec = $rs->record($i);
...
}
The second way is with the prefetch option. Setting this to a positive integer makes the record() method fetch the next N records and place them in the cache if the current record isn't already there. So the following code would cause two bouts of network activity, each retrieving 10 records.
$rs->option(prefetch => 10);
foreach my $i (1..20) {
my $rec = $rs->record($i);
...
}
In asynchronous mode, present() and prefetch merely cause the records to be scheduled for retrieval.
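To see why the prefetch example costs only two round trips, here is a toy model of the caching rule. It is not the module's real code: %cache, fetch_from_server() and this stand-in record() are all invented for the sketch, and the ``server'' is just a counter.

```perl
use strict;
use warnings;

# Toy model of the caching rule described above; not the module's code.
my %cache;                 # record number => record
my $round_trips = 0;       # how many times we "went to the server"
my $prefetch    = 10;      # plays the role of the prefetch option
my $size        = 20;      # pretend the result set holds 20 records

# Stand-in for one network request returning $count records from $start.
sub fetch_from_server {
    my ($start, $count) = @_;
    $round_trips++;
    my $last = $start + $count - 1;
    $last = $size if $last > $size;
    $cache{$_} = "record $_" for $start .. $last;
    return;
}

# Stand-in for record(): only goes to the server on a cache miss.
sub record {
    my ($i) = @_;
    fetch_from_server($i, $prefetch) unless exists $cache{$i};
    return $cache{$i};
}

record($_) for 1 .. 20;
print "round trips: $round_trips\n";    # round trips: 2
```

Records 1 and 11 miss the cache and each trigger a batch of ten; the other eighteen requests are served locally.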
Element Set
The default element set is ``b'' (brief). To change this, set the result set's elementSetName option:
$rs->option(elementSetName => "f");
Record Syntax
The default record syntax preferred by the Net::Z3950 module is GRS-1 (the One True Record syntax). If, however, you need to ask the server for a record using a different record syntax, then the way to do this is to set the preferredRecordSyntax option of the result set from which the record is to be fetched:
$rs->option(preferredRecordSyntax => "SUTRS");
The record syntaxes which may be requested are listed in the Net::Z3950::RecordSyntax enumeration in the file Net/Z3950.pm; they include Net::Z3950::RecordSyntax::GRS1, Net::Z3950::RecordSyntax::SUTRS, Net::Z3950::RecordSyntax::USMARC, Net::Z3950::RecordSyntax::TEXT_XML, Net::Z3950::RecordSyntax::APPLICATION_XML and Net::Z3950::RecordSyntax::TEXT_HTML.
(As always, option() may also be invoked with no ``value'' parameter to return the current value of the option.)
Scanning
### Note to self - write this section!
WHAT TO DO WITH YOUR RECORDS
Once you've retrieved a record, what can you do with it?
There are two broad approaches. One is just to display it to the user: this can always be done with the render() method, as used in the sample code above, whatever the record syntax of the record.
The more sophisticated approach is to perform appropriate analysis and manipulation of the raw record according to the record syntax. The raw data is retrieved using the rawdata() method, and the record syntax can be determined using the universal isa() method:
$raw = $rec->rawdata();
if ($rec->isa('Net::Z3950::Record::GRS1')) {
    process_grs1_record($raw);
} elsif ($rec->isa('Net::Z3950::Record::USMARC')) {
    process_marc_record($raw);
} # etc.
MARC RECORDS
For further manipulation of MARC records, we recommend the existing MARC module in Ed Summers's directory at CPAN, http://cpan.valueclick.com/authors/id/E/ES/ESUMMERS/
GRS-1 RECORDS
The raw data of GRS-1 records in the Net::Z3950 module closely follows the structure of physical GRS-1 records - see Appendices REC.5 (Generic Record Syntax 1), TAG (TagSet Definitions and Schemas) and RET (Z39.50 Retrieval) of the standard for more details.
The raw GRS-1 data is intended to be more or less self-describing, but here is a summary.
The raw data is a reference to an array of elements, each representing one of the fields of the record.
Each element is a Net::Z3950::APDU::TaggedElement object. These objects support the accessor methods tagType(), tagValue(), tagOccurrence() and content(); the first three of these return numeric values, or strings in the less common case of string tag-values.
The content() of an element is an object of type Net::Z3950::ElementData. Its which() method returns a constant indicating the type of the content, which may be any of the following:
Net::Z3950::ElementData::Numeric indicates that the content is a number; access it via the numeric() method.
Net::Z3950::ElementData::String indicates that the content is a string of characters; access it via the string() method.
Net::Z3950::ElementData::OID indicates that the content is an OID, represented as a string with the components separated by periods (``.''); access it via the oid() method.
Net::Z3950::ElementData::Subtree is a reference to another Net::Z3950::Record::GRS1 object, enabling arbitrary recursive nesting; access it via the subtree() method.
In the future, we plan to take you away from all this by introducing a Net::Z3950::Data module which provides a DOM-like interface for walking hierarchically structured records independently of their record syntax. Keep watchin', kids!
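Until then, the GRS-1 structure described above can be walked with a short recursive routine. The sketch below runs without a server by using two mock classes, Mock::Element and Mock::Data, which are invented here: only the accessor names (tagType(), tagValue(), content(), which(), numeric(), string(), oid(), subtree()) come from the description above. It also assumes that a subtree can be dereferenced as an array of elements, which you should check against the real module before relying on it. With real records, the constants would be the Net::Z3950::ElementData ones listed above.

```perl
use strict;
use warnings;

# Mock stand-ins so the sketch runs without a server. Only the accessor
# names are taken from the tutorial text; everything else is made up.
package Mock::Data {
    use constant { Numeric => 1, String => 2, OID => 3, Subtree => 4 };
    sub new {
        my ($class, $which, $value) = @_;
        return bless { which => $which, value => $value }, $class;
    }
    sub which   { $_[0]{which} }
    sub numeric { $_[0]{value} }
    sub string  { $_[0]{value} }
    sub oid     { $_[0]{value} }
    sub subtree { $_[0]{value} }
}

package Mock::Element {
    sub new {
        my ($class, $type, $value, $data) = @_;
        return bless { type => $type, value => $value, data => $data }, $class;
    }
    sub tagType  { $_[0]{type} }
    sub tagValue { $_[0]{value} }
    sub content  { $_[0]{data} }
}

package main;

# Walk an array of elements, returning one indented line per field.
# Assumption: a subtree can be dereferenced as an array of elements.
sub describe {
    my ($fields, $depth) = @_;
    $depth //= 0;
    my $pad = '  ' x $depth;
    my @lines;
    foreach my $elem (@$fields) {
        my $tag   = '(' . $elem->tagType() . ',' . $elem->tagValue() . ')';
        my $data  = $elem->content();
        my $which = $data->which();
        if ($which == Mock::Data::Subtree()) {
            push @lines, "$pad$tag {";
            push @lines, describe($data->subtree(), $depth + 1);
            push @lines, "$pad}";
        } elsif ($which == Mock::Data::Numeric()) {
            push @lines, "$pad$tag " . $data->numeric();
        } elsif ($which == Mock::Data::OID()) {
            push @lines, "$pad$tag " . $data->oid();
        } else {
            push @lines, "$pad$tag \"" . $data->string() . '"';
        }
    }
    return @lines;
}

my $record = [
    Mock::Element->new(1, 14, Mock::Data->new(Mock::Data::String(), '2')),
    Mock::Element->new(2, 1, Mock::Data->new(Mock::Data::Subtree(), [
        Mock::Element->new(1, 19,
            Mock::Data->new(Mock::Data::String(), 'UTAH EARTHQUAKE EPICENTERS')),
    ])),
];
print "$_\n" for describe($record);
```

The output has the same general shape as the render() listing shown for the trivial client, with nested fields indented inside braces.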
CHANGING SESSION PARAMETERS
As with customising searching or retrieval behaviour, whole-session behaviour is customised by setting options. However, this needs to be done before the session is created, because the Z39.50 protocol doesn't provide a method for changing (for example) the preferred message size of an existing connection.
In the Net::Z3950 module, this is done by creating a manager - a controller for one or more connections. Then the manager's options can be set; then connections which are opened through the manager use the specified values for those options.
As a matter of fact, every connection is made through a manager. If one is not specified in the connection constructor, then the ``default manager'' is used; it's automatically created the first time it's needed, then re-used for any other connections that need it.
Make or Find a Manager
A new manager is created as follows:
$mgr = new Net::Z3950::Manager();
Once the manager exists, a new connection can be made through it by specifying the manager reference as the first argument to the connection constructor:
$conn = new Net::Z3950::Connection($mgr, 'indexdata.dk', 210);
Or equivalently,
$conn = $mgr->connect('indexdata.dk', 210);
In order to retrieve the manager through which a connection was made, whether it was the implicit default manager or not, use the manager() method:
$mgr = $conn->manager();
Set the Parameters
There are two ways to set parameters. One we have already seen: the option() method can be used to get and set option values for managers just as it can for connections and result sets:
$pms = $mgr->option('preferredMessageSize');
$mgr->option(preferredMessageSize => $pms*2);
Alternatively, options may be passed to the manager constructor when the manager is first created:
$mgr = new Net::Z3950::Manager(
preferredMessageSize => 100*1024,
maximumRecordSize => 10*1024*1024,
preferredRecordSyntax => "GRS-1");
This is exactly equivalent to creating a ``vanilla'' manager with new Net::Z3950::Manager(), then setting the three options with the option() method.
Message Size Parameters
The preferredMessageSize and maximumRecordSize parameters can be used to specify values of the corresponding parameters which are proposed to the server at initialisation time (although the server is not bound to honour them). See sections 3.2.1.1.4 (Preferred-message-size and Exceptional-message-size) and 3.3 (Message/Record Size and Segmentation) of the Z39.50 standard itself for details.
Both options default to one megabyte.
Implementation Identification
The implementationId, implementationName and implementationVersion options can be used to control the corresponding parameters in the initialisation request sent to the server to identify the client. The default values are listed below in the section OPTION INHERITANCE.
Authentication
The user, pass and group options can be specified for a manager so that they are passed as identification tokens at initialisation time to any connections opened through that manager. The three options are interpreted as follows:
If user is not specified, then authentication is omitted (which is more or less the same as ``anonymous'' authentication).
If user is specified but not pass, then the value of the user option is passed as an ``open'' authentication token.
If both user and pass are specified, then their values are passed in an ``idPass'' authentication structure, together with the value of group if it is specified.
By default, all three options are undefined, so no authentication is used.
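The three rules amount to a small decision table. The sketch below restates them as a function; auth_kind() is a hypothetical helper written for this tutorial, not something the module exports, since Net::Z3950 applies these rules internally at initialisation time.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three rules above; not part of
# Net::Z3950. Returns the kind of authentication that would be sent.
sub auth_kind {
    my (%opt) = @_;
    return 'none' unless defined $opt{user};    # no user: omit authentication
    return 'open' unless defined $opt{pass};    # user only: ``open'' token
    return 'idPass';                            # user and pass (group optional)
}

print auth_kind(), "\n";                                   # none
print auth_kind(user => 'guest'), "\n";                    # open
print auth_kind(user => 'guest', pass => 'secret'), "\n";  # idPass
```

Note that group never changes which kind of authentication is used; it only rides along inside the ``idPass'' structure when it is set.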
Character set and language negotiation
The charset and language options can be used to negotiate the character set and language to be used for connections opened through that manager. If these options are set, they are passed to the server in a character-negotiation otherInfo package attached to the initialisation request.
OPTION INHERITANCE
The values of options are inherited from managers to connections, result sets and finally to records.
This means that when a record is asked for an option value (whether by an application invoking its option() method, or by code inside the module that needs to know how to behave), that value is looked for first in the record's own table of options; then, if it's not specified there, in the options of the result set from which the record was retrieved; then if it's not specified there, in those of the connection across which the result set was found; and finally, if not specified there either, in the options for the manager through which the connection was created.
Similarly, option values requested from a result set are looked up (if not specified in the result set itself) in the connection, then the manager; and values requested from a connection fall back to its manager.
This is why it made sense in an earlier example (see the section Set the Parameters) to specify a value for the preferredRecordSyntax option when creating a manager: the result of this is that, unless overridden, it will be the preferred record syntax when any record is retrieved from any result set retrieved from any connection created through that manager. In effect, it establishes a global default. Alternatively, one might specify different defaults on two different connections.
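The lookup chain can be modelled in a few lines. This is a toy illustration only: the hash layout, the parent links and the lookup() function are invented for the sketch and say nothing about how the module stores its options internally.

```perl
use strict;
use warnings;

# Toy model of option inheritance; the data layout is invented.
my %defaults = (querytype => 'prefix', elementSetName => 'b');

my $manager    = { options => { preferredRecordSyntax => 'GRS-1' } };
my $connection = { options => { databaseName => 'gils' }, parent => $manager };
my $result_set = { options => { elementSetName => 'f' },  parent => $connection };
my $record     = { options => {},                         parent => $result_set };

# Look in the object's own options, then walk up through its parents,
# falling back to a hard-wired default if nobody sets the option.
sub lookup {
    my ($obj, $name) = @_;
    for (my $o = $obj; $o; $o = $o->{parent}) {
        return $o->{options}{$name} if exists $o->{options}{$name};
    }
    return $defaults{$name};
}

print lookup($record, 'elementSetName'), "\n";          # f      (from the result set)
print lookup($record, 'preferredRecordSyntax'), "\n";   # GRS-1  (from the manager)
print lookup($connection, 'elementSetName'), "\n";      # b      (hard-wired default)
print lookup($record, 'querytype'), "\n";               # prefix (hard-wired default)
```

The connection never sees the result set's elementSetName, because lookups only ever travel upwards towards the manager.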
In all cases, if the manager doesn't have a value for the requested option, then a hard-wired default is used. The defaults are as follows. (Please excuse the execrable formatting - that's what pod2html does, and there's no sensible way around it.)
die_handler - undef
A function to invoke if die() is called within the main event loop.
timeout - undef
The maximum number of seconds a manager will wait when its wait() method is called. If the timeout elapses, wait() returns an undefined value. Cannot be set on a per-connection basis.
async - 0
(Determines whether a given connection is in asynchronous mode.)
preferredMessageSize - 1024*1024
maximumRecordSize - 1024*1024
user - undef
pass - undef
group - undef
implementationId - 'Mike Taylor (id=169)'
implementationName - 'Net::Z3950.pm (Perl)'
implementationVersion - $Net::Z3950::VERSION
charset - undef
language - undef
querytype - 'prefix'
databaseName - 'Default'
smallSetUpperBound - 0
(This and the next four options provide flexible control for run-time details such as what record syntax to use when returning records. See sections 3.2.2.1.4 (Small-set-element-set-names and Medium-set-element-set-names) and 3.2.2.1.6 (Small-set-upper-bound, Large-set-lower-bound, and Medium-set-present-number) of the Z39.50 standard itself for details.)
largeSetLowerBound - 1
mediumSetPresentNumber - 0
smallSetElementSetName - 'f'
mediumSetElementSetName - 'b'
preferredRecordSyntax - 'GRS-1'
responsePosition - 1
(Indicates the one-based position of the start term in the set of terms returned from a scan.)
stepSize - 0
(Indicates the number of terms between each of the terms returned from a scan.)
numberOfEntries - 20
(Indicates the number of terms to return from a scan.)
elementSetName - 'b'
namedResultSets - 1
(Indicating boolean true.) This option tells the client to use a new result set name for each new result set generated, so that old ResultSet objects remain valid. For the benefit of old, broken servers, this option may be set to 0, indicating that the same result-set name, default, should be used for each search, so that each search invalidates all existing ResultSets.
Any other option's value is undefined.
ASYNCHRONOUS MODE
I don't propose to discuss this at the moment, since I think it's more important to get the Tutorial out there with the synchronous stuff in place than to write the asynchronous stuff. I'll do it soon, honest. In the mean time, let me be clear: the asynchronous code itself is done and works (the synchronous interface is merely a thin layer on top of it) - it's only the documentation that's not yet here.
### Note to self - write this section!
NOW WHAT?
This tutorial is only an overview of what can be done with the Net::Z3950 module. If you need more information than it provides, then you need to read the more technical documentation on the individual classes that make up the module - Net::Z3950 itself, Net::Z3950::Manager, Net::Z3950::Connection, Net::Z3950::ResultSet and Net::Z3950::Record.
AUTHOR
Mike Taylor <mike@indexdata.com>
First version Sunday 28th January 2001.