NAME

PICA::Parser - Parse PICA+ data

SYNOPSIS

use PICA::Parser;

PICA::Parser->parsefile( $filename_or_handle ,
    Field => \&field_handler,
    Record => \&record_handler
);

PICA::Parser->parsedata( $string_or_function ,
    Field => \&field_handler,
    Record => \&record_handler,
    Limit => 5
);

$parser = PICA::Parser->new(
    Record => \&record_handler,
    Proceed => 1
);
$parser->parsefile( $filename );
$parser->parsedata( $picadata );
print $parser->counter() . " records read.\n";

You can also import parsedata and parsefile and call them as functions:

use PICA::Parser qw(parsefile);

parsefile( $filename, Record => sub {
    my $record = shift;
    print $record->to_string() . "\n";
});

Both functions return the parser, so you can use constructs like

my @records = parsefile($filename)->records();

To parse just one record, use

my ($record) = parsefile($filename, Limit => 1)->records();

DESCRIPTION

This module can be used to parse normalized PICA+ and PICA+ XML. The concrete parsers are implemented in PICA::PlainParser and PICA::XMLParser.

CONSTRUCTOR

new ( [ %params ] )

Creates a parser to store common parameters (see below). These parameters will be used as defaults when calling parsefile or parsedata. Note that you do not have to use the constructor to use PICA::Parser. The following two forms are equivalent:

my $parser = PICA::Parser->new( %params );
$parser->parsefile( $file );

PICA::Parser->parsefile( $file, %params );

Common parameters that are passed to the specific parser are:

Field

Reference to a handler function for parsed PICA+ fields. The function is passed a PICA::Field object and should return it to the parser. You can use this function as a simple filter by returning a modified field. If no PICA::Field object is returned, the field is skipped.

Record

Reference to a handler function for parsed PICA+ records. The function is passed a PICA::Record object. If the function returns a record, this record will be stored in an array that is passed to Collection. You can use this handler as a filter by returning a modified record.

Error

This handler is called if an error occurred while parsing, for instance if the data does not look like PICA+. By default, errors are ignored.

TODO: Count errors and return the number of errors in the errors method.

Offset

Skip a given number of records. Default is zero.

Limit

Stop after a given number of records. Non-positive numbers mean no limit.

Dumpformat

If set to true, parse dumpformat (no newlines).

Proceed

By default, the internal counters are reset and all records read so far are forgotten before each call of parsefile and parsedata. If you set the Proceed parameter to a true value, the same parser will be reused without resetting the counters or discarding the records already read.
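The Field and Record handlers described above might look like the following minimal sketch. It assumes that PICA::Field offers a tag accessor (not stated in this document) and that returning undef from a handler causes the item to be skipped. The tag whitelist and the every-second-record rule are purely illustrative.

```perl
use strict;
use warnings;

# Illustrative whitelist of PICA+ field tags to keep
my %wanted = map { $_ => 1 } qw(003@ 021A);

# Field handler: return the field to keep it, undef to skip it.
# Assumption: the field object provides a tag() accessor.
sub field_handler {
    my $field = shift;
    return $wanted{ $field->tag } ? $field : undef;
}

# Record handler: count every record, but keep only every second one
# (the 1st, 3rd, 5th, ...). Records for which undef is returned are
# not collected by the parser.
my $seen = 0;

sub record_handler {
    my $record = shift;
    $seen++;
    return $seen % 2 ? $record : undef;
}
```

The handlers would then be passed as Field => \&field_handler and Record => \&record_handler to new, parsefile, or parsedata.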

METHODS

parsefile ( $filename-or-handle [, %params ] )

Parses PICA+ data from a file, specified by a filename or filehandle. The default parser is PICA::PlainParser. If the filename extension is .xml or .xml.gz, or if the Format parameter is set to xml, then PICA::XMLParser is used instead.

PICA::Parser->parsefile( "data.picaplus", Field => \&field_handler );
PICA::Parser->parsefile( \*STDIN, Field => \&field_handler, Format => 'XML' );
PICA::Parser->parsefile( "data.xml", Record => sub { ... } );

See the constructor new for a description of parameters.
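The parser selection rule described above can be sketched as a small standalone function. Note that choose_parser is a hypothetical helper and not part of PICA::Parser; comparing the Format value case-insensitively is an assumption based on the example that passes 'XML', and the case-sensitive extension match is likewise an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PICA::Parser): returns the name of the
# parser class that the documented rule would select.
sub choose_parser {
    my ($file, %params) = @_;

    # Assumption: the Format value is compared case-insensitively
    my $format = defined $params{Format} ? lc $params{Format} : '';
    return 'PICA::XMLParser' if $format eq 'xml';

    # A filehandle (a reference) has no filename extension to inspect
    return 'PICA::XMLParser'
        if !ref($file) && $file =~ /\.xml(?:\.gz)?$/;

    return 'PICA::PlainParser';
}
```

For a filehandle such as \*STDIN there is no extension to look at, so in this sketch XML input from a handle is only recognized when Format is given.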

parsedata ( $data [, %params ] )

Parses data from a string, array reference, or function and returns the PICA::Parser that was used. See parsefile and the parsedata method of PICA::PlainParser and PICA::XMLParser for a description of parameters. By default PICA::PlainParser is used unless the Format parameter is set to xml.

PICA::Parser->parsedata( $picastring, Field => \&field_handler );
PICA::Parser->parsedata( \@picalines, Field => \&field_handler );

# called as a function
my @records = parsedata( $picastring )->records();

If data is a PICA::Record object, it is directly passed to the record handler without re-parsing. See the constructor new for a description of parameters.

records ( )

Get an array of the read records (as returned by the record handler which can thus be used as a filter). If no record handler was specified, records will be collected unmodified. For large record sets it is recommended not to collect the records but directly use them with a record handler.
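For large inputs, a record handler that processes each record immediately and returns nothing is one way to avoid collecting records. The following is a minimal sketch: make_streaming_handler is a hypothetical helper, and it assumes that a handler which returns no record leaves nothing for the parser to store.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a record handler that writes each record
# to the given output handle instead of letting the parser collect it.
sub make_streaming_handler {
    my ($out) = @_;
    return sub {
        my $record = shift;
        print {$out} $record->to_string(), "\n";
        return;    # no record returned, so nothing is collected
    };
}
```

It could be used as PICA::Parser->parsefile( $filename, Record => make_streaming_handler( \*STDOUT ) ); afterwards, records would be expected to return an empty list while counter still reports the number of records read.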

counter ( )

Get the number of records read so far. Note that the number of records returned by the records method may be lower, because the record handler may have filtered out some records.

INTERNAL METHODS

_getparser ( [ %params ] )

Internal method to get a new parser or the internal parser of this object. By default, it returns a PICA::PlainParser unless you specify the Format parameter. Single parameters override the default parameters specified in the constructor (except the Proceed parameter).

TODO

Better logging needs to be added, for instance a status message every n records. This may be implemented with multiple (piped?) handlers per record. Error handling of broken records should also be improved.

AUTHOR

Jakob Voss <jakob.voss@gbv.de>

LICENSE

Copyright (C) 2007-2009 by Verbundzentrale Goettingen (VZG) and Jakob Voss

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.