Tutorial for MARC::MIR DSL
Warnings
for the moment, everything is found in MARC::MIR namespace, this will change. Also t/* is empty (this is clearly the next step) the scripts that uses MARC::MIR are yet working.
What is MARC::MIR
I dealt with lot of MARC records in the past (mainly from/to iso2709 files) and was really annoyed by the existing libraries. A MARC record is a very simple structure. every library i saw missed this point by wrapping records into painfull OO approach, this make the MARC manipulation anoying and slow. Perl is awesome for manipulate datastructures: i wanted those power and simplicity back!
Simple datastructure
A MIR record is an array containing a leader and the MIR field_collection
[ $leader, [@fields] ]
A MIR field_collection is a collection of either data field or control field.
A MIR control field is a tag and a value.
[ '001', '1231313145' ]
A MIR data field is a tag, a MIR subfield_collection and an optionnal MIR indicator. The MIR indicator is a 2 char string or a 2 elements array. so all those MIR field_collection are valid.
[ $tag, [@subfield] ]
[ $tag, [@subfield], " " ]
[ $tag, [@subfield], [' ',' '] ]
a MIR subfield_collection is a list of pairs tag/value.
This is an example of a complete MIR record:
[ "Header" =>
[ [ '001' => '2344564564' ] # this is the ID
, [ '856'
, [ [ q => "jpeg" ]
, [ z => "cover from original version" ]
, [ u => "http://localhost/img/" ]
]
]
]
]
the DSL
to make things more readable and less error prone, we also add a DSL. Every keywords of this DSL works the same way. FIXME : explain.
also, iso2709_records_of is an helper that stream the records of an ISO2709 formatted file.
some examples
the perfect boilerplate
use autodie;
use Modern::Perl;
use MARC::MIR;
print all the ids of the records (assuming the id is in 001, the common case)
now { say record_id from_iso2709 } iso2709_records_of "biblio.marc";
or
marawk { say $ID } "biblio.marc";
remove every 9.. fields
now {
$_ = from_iso2709;
with_fields { @$_ = grep { (tag) !~ /^9/ } @$_ };
print to_iso2709;
} iso2709_records_of "biblio.marc";
every 856$q must be jpeg
now {
$_ = from_iso2709;
map_fields {
tag eq '856' and map_subfields {
(tag) eq 'z' and with_value { $_ = 'jpeg' }
}
}
with_fields { @$_ = grep_fields { (tag) !~ /^9/ } @$_ };
} iso2709_records_of "biblio.marc";
or
marawk { map_values { $_ = 'jpeg' } [qw< 856 z >] } "biblio.marc"
collect every 856$z by id
use Modern::Perl;
use YAML;
use MARC::MIR;
my %seen;
marawk {
map_values { push @{ $seen{$ID} }, $_ } [qw< 856 z >]
} "data/*.RAW";
say YAML::Dump \%seen;