NAME
parsepica - fetch, parse and transform PICA+ data
VERSION
version 0.585
SYNOPSIS
parsepica [options] [input file(s) or SRU server(s) and query(s)]
DESCRIPTION
This script provides a simple command line client to fetch and transform PICA+ records. You can parse and transform local files (compressed .gz files can be read directly) or query records from a server via various protocols. You can also specify a configuration file for PICA::Source which includes a pointer to an SRU, Z39.50, PSI, or unAPI source.
The records can then be written to a file or STDOUT in PICA+ or PICA/XML format. Instead of writing full records you can select single PICA+ fields. Selecting fields with parsepica is around half as fast as using grep, but grep neither parses the records nor checks them for well-formedness.
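The difference between field selection and a plain grep can be illustrated with a minimal Python sketch. It is not how parsepica is implemented. It assumes the plain PICA+ layout of one field per line (a tag such as 021A, an optional /occurrence, a space, then subfields introduced by '$' and a one-character code), and it ignores escaped '$' characters inside values. The names parse_field and select_fields are hypothetical.

```python
import re

# Assumed plain PICA+ field layout: TAG[/OCCURRENCE] $cVALUE$cVALUE...
FIELD_RE = re.compile(
    r"^(?P<tag>[0-2][0-9]{2}[A-Z@])(?:/(?P<occ>[0-9]{2,3}))? (?P<body>\$.+)$"
)


def parse_field(line):
    """Return (tag, occurrence, [(code, value), ...]) or None if malformed."""
    match = FIELD_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    # '$aTitle$hAuthor' -> [('a', 'Title'), ('h', 'Author')]
    subfields = [
        (chunk[0], chunk[1:])
        for chunk in match.group("body").split("$")[1:]
        if chunk
    ]
    return match.group("tag"), match.group("occ"), subfields


def select_fields(lines, tag):
    """Yield only well-formed fields with the given tag."""
    for line in lines:
        field = parse_field(line)
        if field is not None and field[0] == tag:
            yield field
```

Unlike grep 021A, which would also match a damaged line that merely contains the tag, this selection skips lines that do not parse as a field.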
By default input is read from STDIN and written to STDOUT ('-') without logging. On request, logging information is printed to STDOUT or to a specified logfile. Records that cannot be parsed produce error messages on STDERR.
OPTIONS
 -input FILE     file with input files on each line ('-': STDIN)
 -files FILE     read input files from another file ('-': STDIN)
 -output FILE    print all valid records to a given file ('-': STDOUT)
 -xml [FILE]     print records in XML
 -pxml [FILE]    print records in pretty XML (with linebreaks)
 -pretty [FILE]  print records in pretty format
 -null           suppress record output
 -quiet          suppress logging
 -select FIELD   select a specific field or subfield (not if XML output)
 -count          print simple statistics
 -stats 0|1|2    print full statistics (1: fields, 2: subfields)
 -config FILE    read configuration from a file ('-': search default file)
 -auto           use default config file $PICASOURCE or ./pica.conf
 -log [FILE]     print logging to a given file ('-': STDOUT, default)
 -help           brief help message
 -limit N        limit the result set to N records (only for SRU)
 -man            full documentation with examples
EXAMPLES
- parsepica file1 -o file2
  Read from 'file1' and print parseable records to 'file2'.
- parsepica file1 -px file2.xml
  Parse 'file1' and pretty-print XML format to 'file2.xml'.
- parsepica http://gso.gbv.de/sru/DB=2.1/ pica.isb=3-423-31039-1
  Get records with ISBN 3-423-31039-1 via SRU.
- parsepica -c pica.isb=3-423-31039-1
  Get records with ISBN 3-423-31039-1 via SRU if the default config file contains SRU=http://gso.gbv.de/sru/DB=2.1/.
- parsepica -se 021A -o - -q picadata
  Select all fields '021A' from 'picadata' and write to STDOUT.
- parsepica -log -count -null file1
  Parse 'file1' and count fields.
- parsepica -log -stat 2 file1
  Parse 'file1' and print detailed statistics.
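For orientation, the SRU examples above correspond roughly to a searchRetrieve request against the server's base URL. The following Python sketch only builds such a URL and does not fetch anything. The parameter names (version, operation, query, maximumRecords, recordSchema) come from the SRU standard, while the schema name "picaxml", the helper name sru_search_url, and the mapping of -limit to maximumRecords are assumptions, not a description of parsepica's internals.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, query, limit=None, record_schema="picaxml"):
    """Build an SRU searchRetrieve URL (record_schema is an assumed default)."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "recordSchema": record_schema,
    }
    if limit is not None:
        # Assumed counterpart of a record limit on the SRU side.
        params["maximumRecords"] = str(limit)
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


url = sru_search_url(
    "http://gso.gbv.de/sru/DB=2.1/", "pica.isb=3-423-31039-1", limit=10
)
print(url)
```

The query is percent-encoded by urlencode, so the '=' inside pica.isb=... does not collide with the parameter syntax.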
LIMITATIONS
Error handling for broken records is not fully implemented. If you want to parse PICA+ records downloaded via WinIBW, you may need to first clean them with the script winibw2pica.
The limit parameter should also be implemented for sources other than SRU, and an offset parameter would be useful. Fetching records via protocols other than SRU has not been tested. The statistics method can be improved a lot.
AUTHOR
Jakob Voß <voss@gbv.de>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Verbundzentrale Goettingen (VZG) and Jakob Voss.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.