NAME

parsepica - parse PICA+ data and print summary information

SYNOPSIS

parsepica [options] [file(s) or SRU server(s) and query(s) ...]

OPTIONS

-help          brief help message
-man           full documentation with examples
-log FILE      write log messages to FILE ('-': STDOUT, the default)
-input FILE    read input file names from FILE, one per line ('-': STDIN)
-output FILE   write all valid records to FILE ('-': STDOUT)
-xml           write records in XML
-pretty        pretty-print output (useful for PICA XML)
-quiet         suppress logging
-select FIELD  select a specific field or subfield (XML output not yet supported)

Not fully implemented yet:

-sru SRU       fetch records via SRU; command line arguments are CQL queries instead of files
-z3950         fetch records via Z39.50

DESCRIPTION

This script demonstrates how to use the Perl PICA module. It can be used to check and count records. Input files can be given as arguments or listed in an input file. Compressed files (.gz) can be read directly. If no input file is specified, input is read from STDIN.
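The checking and counting described above can be sketched in Python. This is not the API of the Perl PICA module. It is a minimal illustration that assumes plain PICA+ input with the following structure:

- Each field sits on its own line.
- A field line starts with a tag such as '021A' or '003@', optionally followed by a '/01' occurrence, then a space, then '$'-prefixed subfields.
- Records are separated by blank lines.

All function names are hypothetical. The well-formedness check is deliberately simple and ignores any escaping of literal '$' characters.

```python
import gzip
import re
import sys
from typing import Iterator, List, TextIO, Tuple

# Assumed plain PICA+ field syntax (a simplification, not the module's grammar):
# 3 digits + uppercase letter or '@', optional '/NN' occurrence, a space,
# then one or more subfields, each '$' + one-character code + value.
FIELD_RE = re.compile(r"^\d{3}[A-Z@](/\d{2})? (\$[0-9A-Za-z][^$]*)+$")


def open_input(path: str) -> TextIO:
    """Open an input file; '-' means STDIN and '.gz' files are decompressed."""
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def iter_records(stream: TextIO) -> Iterator[List[str]]:
    """Yield records as lists of field lines; blank lines separate records."""
    record: List[str] = []
    for line in stream:
        line = line.rstrip("\r\n")
        if line.strip():
            record.append(line)
        elif record:
            yield record
            record = []
    if record:  # last record may not be followed by a blank line
        yield record


def is_wellformed(record: List[str]) -> bool:
    """A record counts as well-formed if every line matches FIELD_RE."""
    return all(FIELD_RE.match(line) for line in record)


def count_records(stream: TextIO) -> Tuple[int, int]:
    """Return (total, well-formed) record counts for one input stream."""
    total = valid = 0
    for record in iter_records(stream):
        total += 1
        if is_wellformed(record):
            valid += 1
    return total, valid


def summarize(paths: List[str]) -> None:
    """Print counts per input; with no paths, read STDIN as parsepica does."""
    for name in paths or ["-"]:
        fh = open_input(name)
        try:
            total, valid = count_records(fh)
        finally:
            if fh is not sys.stdin:
                fh.close()
        print(f"{name}: {total} records, {valid} well-formed")
```

For example, summarize(["picadata.gz"]) prints a single count line for that file. Treating a '.gz' suffix as the signal for decompression mirrors the behaviour the description states. Sniffing the gzip magic bytes would be an alternative.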

Logging information is printed to STDOUT (unless quiet mode is set) or to a specified log file. Records that were read can be written back to a given file or to STDOUT ('-'). Records that cannot be parsed produce error messages on STDERR.

Selecting fields with parsepica is roughly half as fast as using grep, but grep does not actually parse the records or check them for well-formedness.
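The difference from grep is that field selection here works on parsed fields rather than raw lines. The following Python sketch is a minimal illustration under assumptions, not parsepica's implementation:

- It assumes plain PICA+ lines of the form 'TAG[/NN] $aValue$bValue'.
- It assumes records are already split into lists of lines.
- The 'TAG$CODE' selector syntax for subfields is hypothetical, and so are the function names.

A plain "grep '^021A '" would return the matching lines of a clean file, but it could not extract individual subfield values.

```python
from typing import Iterable, List, Tuple


def parse_field(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split 'TAG $aX$bY' into ('TAG', [('a', 'X'), ('b', 'Y')])."""
    tag, _, rest = line.partition(" ")
    # Everything after each '$' is one subfield: first char = code, rest = value.
    subfields = [(chunk[0], chunk[1:]) for chunk in rest.split("$")[1:] if chunk]
    return tag, subfields


def select(records: Iterable[List[str]], selector: str) -> List[str]:
    """Return whole fields for 'TAG', or subfield values for 'TAG$CODE'.

    The 'TAG$CODE' syntax is a hypothetical choice for this sketch.
    Occurrence suffixes such as '/01' are ignored when matching the tag.
    """
    tag, _, code = selector.partition("$")
    result: List[str] = []
    for record in records:
        for line in record:
            field_tag, subfields = parse_field(line)
            if field_tag.split("/")[0] != tag:
                continue
            if not code:
                result.append(line)
            else:
                result.extend(value for c, value in subfields if c == code)
    return result
```

For instance, select(records, "021A") returns all '021A' field lines, and select(records, "021A$a") returns only their '$a' values.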

EXAMPLES

parsepica picadata -o checkedrecords

Read records from 'picadata' and write all parseable records to 'checkedrecords'.

parsepica picadata -s 021A -o - -q

Select all '021A' fields from 'picadata', write them to STDOUT, and suppress logging.

parsepica http://gso.gbv.de/sru/DB=2.1/ pica.isb=3-423-31039-1

Get records with ISBN 3-423-31039-1 via SRU.
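An SRU lookup like this boils down to an HTTP GET of a searchRetrieve request whose 'query' parameter carries the CQL statement. The Python sketch below only builds such a URL. It assumes SRU version 1.1 request parameters and a hypothetical 'sru_url' helper. Whether the GBV server expects a particular 'recordSchema' value is also an assumption, so it is left optional.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def sru_url(base: str, cql: str, schema: Optional[str] = None,
            maximum: int = 10) -> str:
    """Build an SRU searchRetrieve URL for a CQL query (SRU 1.1 parameters assumed)."""
    params: Dict[str, str] = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,  # urlencode percent-encodes '=' and spaces in the CQL
        "maximumRecords": str(maximum),
    }
    if schema:  # server-specific record schema name; not assumed by default
        params["recordSchema"] = schema
    separator = "&" if "?" in base else "?"
    return base + separator + urlencode(params)
```

The resulting URL could then be fetched with urllib.request.urlopen and the XML response parsed into records. That step is omitted here because the response format depends on the server.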

TODO

Error handling needs to be implemented to collect broken records.

Examples to implement:

parsepica -b errors picadata

Parse records in picadata. The number of records will be reported.

parsepica -out checked -quiet picadata.gz

Parse records in picadata.gz. Write well-formed records to 'checked' and all other records to 'errors'. Suppress all messages.

AUTHOR

Jakob Voss jakob.voss@gbv.de