Tutorial for MARC::MIR DSL

Warnings

for the moment, everything is found in MARC::MIR namespace, this will change. Also t/* is empty (this is clearly the next step) the scripts that uses MARC::MIR are yet working.

What is MARC::MIR

I dealt with lot of MARC records in the past (mainly from/to iso2709 files) and was really annoyed by the existing libraries. A MARC record is a very simple structure. every library i saw missed this point by wrapping records into painfull OO approach, this make the MARC manipulation anoying and slow. Perl is awesome for manipulate datastructures: i wanted those power and simplicity back!

Simple datastructure

A MIR record is an array containing a leader and the MIR field_collection

[ $leader, [@fields] ]

A MIR field_collection is a collection of either data field or control field.

A MIR control field is a tag and a value.

[ '001', '1231313145' ]

A MIR data field is a tag, a MIR subfield_collection and an optionnal MIR indicator. The MIR indicator is a 2 char string or a 2 elements array. so all those MIR field_collection are valid.

[ $tag, [@subfield] ]
[ $tag, [@subfield], "  " ]
[ $tag, [@subfield], [' ',' '] ]

a MIR subfield_collection is a list of pairs tag/value.

This is an example of a complete MIR record:

    [ "Header" => 
	[ [ '001' => '2344564564' ] # this is the ID
	,   [ '856'
	    , [ [ q => "jpeg" ]
	      , [ z => "cover from original version" ] 
	      , [ u => "http://localhost/img/" ] 
	      ]
	    ]
	]
    ]

the DSL

to make things more readable and less error prone, we also add a DSL. Every keywords of this DSL works the same way. FIXME : explain.

also, iso2709_records_of is an helper that stream the records of an ISO2709 formatted file.

some examples

the perfect boilerplate

use autodie;
use Modern::Perl;
use MARC::MIR;

print all the ids of the records (assuming the id is in 001, the common case)

now    { say record_id from_iso2709 } iso2709_records_of "biblio.marc";

or

marawk { say $ID } "biblio.marc";

remove every 9.. fields

    now {
	$_ = from_iso2709;
	with_fields { @$_ = grep { (tag) !~ /^9/ } @$_ };
	print to_iso2709;
    } iso2709_records_of "biblio.marc";

every 856$q must be jpeg

    now {
	$_ = from_iso2709;
	map_fields {
	    tag eq '856' and map_subfields {
		(tag) eq 'z' and with_value { $_ = 'jpeg' }
	    }
	}
	with_fields { @$_ = grep_fields { (tag) !~ /^9/ } @$_ };
    } iso2709_records_of "biblio.marc";

or

marawk { map_values { $_ = 'jpeg' } [qw< 856 z >] } "biblio.marc"

collect every 856$z by id

    use Modern::Perl;
    use YAML;
    use MARC::MIR;

    my %seen;
    marawk {
	map_values { push @{ $seen{$ID} }, $_ } [qw< 856 z >]
    } "data/*.RAW";
    say YAML::Dump \%seen;

marawk