NAME
Alvis::Pipeline - Perl extension for passing XML documents along the Alvis pipeline
SYNOPSIS
    use Alvis::Pipeline;
    my $in = Alvis::Pipeline::Read->new(host => "harvester.alvis.info",
                                        port => 16716,
                                        spooldir => "/home/alvis/spool");
    my $out = Alvis::Pipeline::Write->new(port => 29168);
    # read(1) blocks until each document arrives.
    # process() stands for the component's own transformation routine.
    while (my $xmlDOM = $in->read(1)) {
        my $transformed = process($xmlDOM);
        $out->write($transformed);
    }
DESCRIPTION
This module provides a simple means for components in the Alvis pipeline to pass documents between themselves without needing to know about the underlying transfer protocol. Pipe objects may be created either for reading or for writing; components in the middle of the pipeline will create one of each. Apart from the general-purpose option() and close() methods, each pipe supports exactly one data-transfer method: read() for read-pipes and write() for write-pipes. The unit of reading and writing is the whole XML document; neither smaller fragments nor larger aggregates can be transferred.
The documents expected to pass through this pipeline are those acquired for, and being analysed by, Alvis. They are XML documents constructed according to the specifications described in the Metadata Format for Enriched Documents. However, although this is the pipeline that motivated the creation of this module, there is no reason why other kinds of documents should not also be passed through a pipeline using this software.
METHODS
new()
    my $in = Alvis::Pipeline::Read->new(host => "harvester.alvis.info",
                                        port => 16716,
                                        spooldir => "/home/alvis/spool");
    my $out = Alvis::Pipeline::Write->new(port => 29168);
Creates a new pipe, either for reading or for writing. Any number of name-value pairs may be passed as parameters; most are optional, but some are mandatory.
Read-pipes must specify both the host and port of the component they will read from, and spooldir, a directory that is writable by the user the process runs as. (When documents become available by being written down a write-pipe, they are immediately read in the background, then stored in the specified spool directory until picked up by a reader.)
Pipes of either kind may also specify loglevel [default 0]; higher levels provide more commentary on under-the-hood behaviour.
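The parameter rules for new() can be sketched as a small validation routine. This is a minimal illustration, not code taken from Alvis::Pipeline: the helper name check_read_params is hypothetical, and the exact checks the real constructor performs are an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Alvis::Pipeline): validates the
# constructor arguments for a read-pipe and fills in the documented
# default for loglevel.
sub check_read_params {
    my (%params) = @_;

    # host, port and spooldir are mandatory for read-pipes.
    for my $key (qw(host port spooldir)) {
        die "read-pipe: missing mandatory parameter '$key'\n"
            unless defined $params{$key};
    }

    # The spool directory must exist and be writable by this process.
    # "_" reuses the stat result from the preceding -d test.
    die "read-pipe: spooldir '$params{spooldir}' is not a writable directory\n"
        unless -d $params{spooldir} && -w _;

    # loglevel is optional and defaults to 0.
    $params{loglevel} = 0 unless defined $params{loglevel};

    return \%params;
}
```

Failing early with a clear message keeps a misconfigured component from starting a background reader that has nowhere to spool its documents.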
option()
    my $old = $pipe->option("foo");
    $pipe->option(bar => 23);
With a single argument, returns the current value of the named option; with a name-value pair, sets that option to the given value.
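The get/set convention of option() can be illustrated with a stand-in class. Sketch::Pipe is hypothetical, and returning the previous value when setting is an assumption suggested only by the variable name $old in the example; the real method may behave differently.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the get/set convention of option();
# Alvis::Pipeline's real implementation may differ.
package Sketch::Pipe;

sub new {
    my ($class, %opts) = @_;
    # loglevel defaults to 0, as documented for new().
    return bless { options => { loglevel => 0, %opts } }, $class;
}

# option($name)         -> returns the current value of $name
# option($name, $value) -> sets $name to $value, returns the previous value
sub option {
    my ($self, $name, @value) = @_;
    my $old = $self->{options}{$name};
    # Testing @value (not $value[0]) lets callers set an option to undef.
    $self->{options}{$name} = $value[0] if @value;
    return $old;
}

package main;
```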
read()
    # Read-pipes only
    my $xmlDOM = $in->read($block);
Reads an XML document from the specified inbound pipe and returns a DOM tree representing it. If no document is ready to read, the behaviour depends on $block: if it is omitted or false, read() immediately returns an undefined value; if it is true, read() blocks until a document becomes available. read() throws an exception if an error occurs.
Once a document has been read in this way, it is no longer available to subsequent read() calls, so a sequence of read() calls will read all the available documents one at a time.
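The blocking and consume-once semantics of read() can be sketched over a spool directory using core Perl only. The storage layout (one file per document, named by a numeric sequence number) and the helper read_next are assumptions made for illustration; the module's actual spool format is not documented here, and the real read() returns a DOM tree rather than raw text.

```perl
use strict;
use warnings;

# Hypothetical sketch of read()'s semantics over a spool directory.
# Assumes (not confirmed by the module docs) that each document is
# stored as its own file named by an increasing sequence number.
sub read_next {
    my ($spooldir, $block) = @_;
    while (1) {
        opendir(my $dh, $spooldir) or die "can't open '$spooldir': $!\n";
        # Consider only purely numeric file names, oldest first.
        my @seq = sort { $a <=> $b } grep { /^\d+\z/ } readdir($dh);
        closedir($dh);

        if (@seq) {
            my $file = "$spooldir/$seq[0]";
            open(my $fh, '<', $file) or die "can't read '$file': $!\n";
            my $doc = do { local $/; <$fh> };
            close($fh);
            # Delete the file so each document is consumed exactly once.
            unlink($file) or die "can't remove '$file': $!\n";
            return $doc;
        }

        # Nothing spooled: non-blocking callers get undef at once;
        # blocking callers poll until a document appears.
        return undef unless $block;
        sleep 1;
    }
}
```

A caller could drain whatever is currently spooled, without blocking, with `while (defined(my $doc = read_next($dir, 0))) { ... }`. Polling with sleep is the simplest choice for a sketch; a real implementation might instead be woken by the background reader.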
write()
    # Write-pipes only
    $out->write($xmlDocument);
Writes an XML document to the specified outbound pipe. The document may be passed in either as a DOM tree (XML::LibXML::Element) or as a string containing the text of the document. Throws an exception if an error occurs.
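Because write() accepts either a DOM tree or a string, an implementation has to reduce both to text before transmission. A minimal sketch, assuming the DOM object provides a toString() method (as XML::LibXML nodes do); the helper name document_text is hypothetical and is not part of Alvis::Pipeline.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: turns write()'s argument into text for transmission.
# A DOM node (e.g. an XML::LibXML::Element) is serialised with its
# toString() method; a plain string is passed through unchanged.
sub document_text {
    my ($doc) = @_;
    die "write: no document given\n" unless defined $doc;
    if (blessed($doc)) {
        die "write: object of class " . ref($doc) . " cannot be serialised\n"
            unless $doc->can('toString');
        return $doc->toString();
    }
    # Unblessed references (arrays, hashes, ...) are not documents.
    die "write: unsupported reference type " . ref($doc) . "\n" if ref $doc;
    return $doc;
}
```

Checking with can() rather than isa() accepts any node-like object, which keeps the sketch independent of whether XML::LibXML is installed.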
close()
    $pipe->close();
Closes a pipe, after which no further reading or writing may be done on it. This is important for read-pipes, as it frees up the Internet port that the server is listening on.
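Since read() and write() throw exceptions, a component should close its pipes even when processing fails, so that the listening port is released. One way to do this is with eval; Sketch::ReadPipe is a hypothetical in-memory stand-in for a read-pipe, used only so the pattern can run without a live pipeline, and drain is likewise a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a read-pipe: serves documents from memory
# and refuses to read once closed.
package Sketch::ReadPipe;
sub new {
    my ($class, @docs) = @_;
    return bless { docs => [@docs], closed => 0 }, $class;
}
sub read {
    my ($self) = @_;
    die "read on closed pipe\n" if $self->{closed};
    return shift @{ $self->{docs} };
}
sub close { $_[0]{closed} = 1 }

package main;

# Process every available document, close the pipe whether or not an
# exception escapes, then re-throw any error to the caller.
sub drain {
    my ($in, $handler) = @_;
    my $ok = eval {
        while (defined(my $doc = $in->read(0))) { $handler->($doc) }
        1;
    };
    my $err = $@;
    $in->close();
    die $err unless $ok;
}
```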
SEE ALSO
Alvis Task T3.2 - Metadata Format for Enriched Documents. Milestone M3.2 - Month 12 (December 2004). Includes a useful overview of the Alvis processing pipeline. http://www.miketaylor.org.uk/alvis/t3-2/m3-2.html
AUTHOR
Mike Taylor, <mike@indexdata.com>
COPYRIGHT AND LICENSE
Copyright (C) 2005 by Index Data ApS.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.