NAME
MARC::Transform - Perl module to transform a MARC record using a YAML configuration file
VERSION
Version 0.001001
SYNOPSIS
Perl script:
use strict;
use warnings;
use MARC::Record;
use MARC::Transform;
use YAML;
# For this synopsis, we create a small record:
my $record = MARC::Record->new();
$record->insert_fields_ordered( MARC::Field->new(
    '501', '', '',
    'a' => 'foo',
    'b' => '1',
    'c' => 'bar' ) );
print "--init record--\n". $record->as_formatted ."\n";
# Here we load our YAML configuration file:
open my $yamls, '<', 'conf.yaml' or die "can't open file: $!";
my @yaml = YAML::LoadFile($yamls);
# And we transform our record with our YAML:
$record = MARC::Transform->new( $record, \@yaml );
print "\n--transformed record--\n". $record->as_formatted ."\n";
conf.yaml:
---
condition : $f501a eq "foo"
create :
  f502a : New 502a subfield's value
update :
  $f501b : \&LUT("$this")
LUT :
  1 : first
  2 : second value in this LUT (LookUp Table)
---
delete : f501c
Result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _b1
       _cbar
--transformed record--
LDR
501    _afoo
       _bfirst
502    _aNew 502a subfield's value
DESCRIPTION
This is a Perl module to transform a MARC record using a YAML configuration file.
It allows you to create, update and delete fields and subfields of a record, and to duplicate fields. You can also use scripts and lookup tables, and you can specify conditions that must be met for these actions to be executed.
All conditions, actions, functions and lookup tables are defined in the YAML.
MARC::Transform uses MARC::Record.
METHOD
new()
$record = MARC::Transform->new($record,\@yaml);
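This is the only method you'll use. It takes a MARC::Record object and a reference to an array of YAML documents (as returned by YAML::LoadFile in the synopsis) as arguments, and returns the transformed record.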
Verbose mode
Each YAML rule (see Basis below for what a rule is) generates a Perl script. That script is evaluated on the record once for each field and subfield named in the rule's condition, if the rule has a condition. Passing 1 as a third argument to new() displays the generated scripts, which can help you understand what is happening:
$record = MARC::Transform->new($record,\@yaml,1);
YAML
Basis
- The YAML is divided into rules (separated by ---). Rules are executed one after the other, and a rule without a condition is always executed:
---
condition : $f501a eq "foo"
create :
  f600a : new field value
---
delete : f501c
---
- Conditions are written in Perl, which allows great flexibility. They must be introduced by the key condition :
condition : ($f501a=~/foo/ and $f503a=~/bar/) or ($f102a eq "bib")
# true if 501$a contains foo and 503$a contains bar, or if 102$a equals bib
- Conditions test the record field by field, and only for the fields named in the condition.
For example, if the record has several '501' fields and the condition is $f501a eq "foo" and $f501b eq "bar", the condition is true only if a single '501' field has both an 'a' subfield equal to "foo" AND a 'b' subfield equal to "bar". It is false if one '501' field has an 'a' subfield equal to "foo" and ANOTHER '501' field has a 'b' subfield equal to "bar".
- A single rule can run several different actions:
---
condition : $f501a eq "foo"
create :
  f600a : new field value
delete : f501c
---
- The order in which actions are written does not matter. Actions will always be executed in the following order:
create
duplicatefield
forceupdate
forceupdatefirst
update
updatefirst
execute
delete
- Each rule can be divided into sub-rules (separated by - ), which behave like 'if/elsif' or 'switch/case' constructs. Sub-rules are tested in order, and once one sub-rule's condition is true, the following sub-rules are not read.
---
-
  condition : $f501a eq "foo"
  create :
    f502a : value if foo
-
  condition : $f501a eq "bar"
  create :
    f502a : value elsif bar
-
  create :
    f502a : value else
---
# A sub-rule without a condition is treated as an 'else'
# (the following sub-rules will not be read)
- The same action cannot be defined more than once in a single (sub-)rule, because a YAML mapping cannot contain the same key twice. A single rule can still execute the same action several times, though. Refer to the syntax of each action to see how:
. this is not allowed:
---
delete : f501b
delete : f501c
. this works:
---
delete :
  - f501b
  - f501c
- It is strongly recommended to test each rule on a test record before using it on a large batch of records.
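The rule mechanics described above can be modelled in a few lines of plain Perl: field-by-field conditions, sub-rules that behave like if/elsif, and the fixed action order. The sketch below is hypothetical and only meant to build intuition. The data layout and the names has_value, matching_fields, a501_is, pick_subrule and ordered_actions are invented here and are not MARC::Transform internals. Conditions are written as Perl code references rather than YAML strings.

```perl
use strict;
use warnings;

# Hypothetical in-memory model (not MARC::Transform's internals):
# a field is { tag => '501', subfields => [ [ 'a', 'foo' ], [ 'b', '1' ] ] }.

# Number of subfields $code in $field whose value is $value (0 = none).
sub has_value {
    my ( $field, $code, $value ) = @_;
    return scalar grep { $_->[0] eq $code && $_->[1] eq $value }
      @{ $field->{subfields} };
}

# Conditions are tested field by field: keep only the $tag fields that
# satisfy $test on their own.
sub matching_fields {
    my ( $fields, $tag, $test ) = @_;
    return grep { $_->{tag} eq $tag && $test->($_) } @$fields;
}

# Builds the condition "some 501 field has an 'a' subfield equal to $value".
sub a501_is {
    my ($value) = @_;
    return sub {
        my ($fields) = @_;
        return scalar matching_fields( $fields, '501',
            sub { has_value( $_[0], 'a', $value ) } );
    };
}

# Sub-rules work like if/elsif/else: the first sub-rule whose condition is
# true, or which has no condition, is selected; the others are skipped.
sub pick_subrule {
    my ( $subrules, $fields ) = @_;
    for my $subrule (@$subrules) {
        return $subrule
          if !$subrule->{condition} || $subrule->{condition}->($fields);
    }
    return;
}

# Actions always run in this order, whatever their order in the YAML.
my @ACTION_ORDER = qw(create duplicatefield forceupdate forceupdatefirst
  update updatefirst execute delete);

sub ordered_actions {
    my ($rule) = @_;
    return grep { exists $rule->{$_} } @ACTION_ORDER;
}

# Two 501 fields: a = "foo" in the first one, b = "bar" in the second one.
my @record = (
    { tag => '501', subfields => [ [ 'a', 'foo' ] ] },
    { tag => '501', subfields => [ [ 'b', 'bar' ] ] },
);

# $f501a eq "foo" and $f501b eq "bar": no single 501 field satisfies both.
my @hits = matching_fields( \@record, '501',
    sub { has_value( $_[0], 'a', 'foo' ) && has_value( $_[0], 'b', 'bar' ) } );

my $chosen = pick_subrule(
    [
        { name => 'if bar',    condition => a501_is('bar') },
        { name => 'elsif foo', condition => a501_is('foo') },
        { name => 'else' },
    ],
    \@record,
);

print scalar(@hits), " field(s) match\n";
print "selected sub-rule: $chosen->{name}\n";
print join( ' ', ordered_actions( { delete => 'f501c', create => {} } ) ), "\n";
```

With the two '501' fields above, the combined condition matches no field and the 'elsif foo' sub-rule is selected. create is listed before delete in the output, even though delete appears first in the rule.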
Field and subfield naming convention
In actions
- Field and subfield names are very important:
They must begin with the letter f followed by the 3-digit field tag (e.g. f099). For subfields, the tag is followed by the subfield code, a letter or a digit (e.g. f501b).
Control field names begin with the letter f, followed by a 3-digit tag lower than 010, followed by an underscore (e.g. f005_).
Indicator names begin with the letter i, followed by the 3-digit field tag, followed by the indicator's position, 1 or 2 (e.g. i0991).
In actions, you can also name a subfield directly (or an indicator, with i1 or i2). Depending on context, it refers either to the condition's field (if only one field is tested in the condition) or to the field currently being processed by the action:
---
condition : $f501a eq "foo"
create :
  b : new 'b' subfield's value in unique condition's field (501)
  f600 :
    i1 : 1
    a : new subfield (a) in this new 600 field
---
In conditions
In conditions, field and subfield names follow the same rules as in actions, but they must be preceded by a dollar sign $ (e.g. $f110c for a subfield or $i0991 for an indicator). The record leader is referred to as $ldr.
You can test a single character of a subfield or of the leader. To do this, append the character's position, counting from 0:
# to test the 3rd character of the leader and the 2nd character of '501$a':
condition : $ldr2 eq "t" and $f501a1 eq "z"
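The naming conventions above are regular enough to be checked with a few regular expressions before a name goes into a rule. This is a hypothetical helper: parse_name and char_at are not part of MARC::Transform. It covers only the documented forms:

- fields
- subfields, optionally followed by a character position
- control fields (tags 000 to 009 followed by _)
- indicators
- the leader
- the leading $ used in conditions

Since positions count from 0, $ldr2 designates the same character as Perl's substr($leader, 2, 1).

```perl
use strict;
use warnings;

# Hypothetical checker for the naming conventions described above.
# Returns a hashref describing the name, or nothing if the name is invalid.
sub parse_name {
    my ($name) = @_;
    my $cond = ( $name =~ s/^\$// ) ? 1 : 0;    # leading $ = condition syntax

    return { kind => 'leader', pos => $1, cond => $cond }
      if $name =~ /^ldr(\d+)?$/;
    return { kind => 'indicator', tag => $1, ind => $2, cond => $cond }
      if $name =~ /^i(\d{3})([12])$/;
    return { kind => 'controlfield', tag => $1, cond => $cond }
      if $name =~ /^f(00\d)_$/;                 # tags 000 to 009 only
    return { kind => 'field', tag => $1, cond => $cond }
      if $name =~ /^f(\d{3})$/;
    return { kind => 'subfield', tag => $1, code => $2, pos => $3, cond => $cond }
      if $name =~ /^f(\d{3})([a-z0-9])(\d+)?$/;
    return;
}

# Positions count from 0, so $ldr2 is the 3rd character of the leader.
sub char_at {
    my ( $string, $pos ) = @_;
    return substr( $string, $pos, 1 );
}

for my $name (qw($f501a1 f005_ i0991 $ldr2 f099 f010_)) {
    my $parsed = parse_name($name);
    printf "%-7s %s\n", $name, $parsed ? $parsed->{kind} : 'invalid';
}
print char_at( 'optionnal leader', 2 ), "\n";
```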
Run actions only on the condition's fields
We have already seen that, to refer to the condition's field in actions, you can name subfields directly. This works only if a single field is tested in the condition. If the condition tests more than one field, names must also begin with $ to refer to them. The $ form also works when the condition tests a single field.
For example, if the condition tests the value of $f501a:
- this will delete 'c' subfields only in the '501' field that satisfies the condition:
condition : $f501a eq "foo" and defined $f501b
delete : $f501c
- this will delete 'c' subfields in all '501' fields:
condition : $f501a eq "foo" and defined $f501b
delete : f501c
- this will create a new '701' field with an 'a' subfield containing the value of the '501$a' subfield tested in the condition:
create :
  f701a : $f501a
WARNING: to use the value of a subfield of the condition's field, that subfield must appear in the condition:
- this does not work:
condition : $f501a eq "foo"
create :
  f701a : $f501c
- this works. It creates a new '701' field with an 'a' subfield containing the value of the condition's '501$c' subfield:
condition : $f501a eq "foo" and defined $f501c
create :
  f701a : $f501c
- This restriction applies only to subfield values, not to naming the fields affected by an action. The example below creates a new 'c' subfield in a field tested in the condition:
condition : $f501a eq "foo" and $f110a == 2
create :
  $f501c : new subfield value
# If there are multiple '501' fields, only the one with a subfield 'a'='foo' will have a new 'c' subfield created
Actions
create
As the name suggests, this action allows you to create new fields and subfields.
Syntax:
# basic:
create :
  <subfield name> : <value>
# to create two subfields (in one field) with the same name:
create :
  <subfield name> :
    - <value>
    - <value>
# advanced:
create :
  <field name> :
    <subfield name> :
      - <value>
      - <value>
    <subfield name> : <value>
Example:
---
condition : $f501a eq "foo"
create :
  b : new subfield's value on the condition's field
  f502a : this is the subfield's value of a new 502 field
  f502b :
    - this is the first 'b' value of another new 502
    - this is the 2nd 'b' value of another new 502
  f600 :
    a :
      - first 'a' subfield of this new 600 field
      - second 'a' subfield of this new 600 field
    b : the 600b value
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _b1
       _cbar
--transformed record--
LDR
501    _afoo
       _b1
       _cbar
       _bnew subfield's value on the condition's field
502    _bthis is the first 'b' value of another new 502
       _bthis is the 2nd 'b' value of another new 502
502    _athis is the subfield's value of a new 502 field
600    _afirst 'a' subfield of this new 600 field
       _asecond 'a' subfield of this new 600 field
       _bthe 600b value
Be careful: you need to use lists to create several subfields with the same name in one field:
# does not work:
create :
  f502b : value
  f502b : value
update
This action updates existing subfields. It updates all the specified subfields of all the specified fields. If the specified field is the condition's field, only that field is updated.
Syntax:
# basic:
update :
  <subfield name> : <value>
# advanced:
update :
  <subfield name> : <value>
  <subfield name> : <value>
  <field name> :
    <subfield name> : <value>
    <subfield name> : <value>
Example:
---
condition : $f502a eq "second a"
update :
  b : updated value of all 'b' subfields in the condition field
  f502c : updated value of all 'c' subfields into all '502' fields
  f501 :
    a : updated value of all 'a' subfields into all '501' fields
    b : $f502a is the 502a condition's field's value
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _b1
       _cbar
502    _afirst a
       _asecond a
       _bbbb
       _cccc1
       _cccc2
502    _apoto
502    _btruc
       _cbidule
--transformed record--
LDR
501    _aupdated value of all 'a' subfields into all '501' fields
       _bsecond a is the 502a condition's field's value
       _cbar
502    _afirst a
       _asecond a
       _bupdated value of all 'b' subfields in the condition field
       _cupdated value of all 'c' subfields into all '502' fields
       _cupdated value of all 'c' subfields into all '502' fields
502    _apoto
502    _btruc
       _cupdated value of all 'c' subfields into all '502' fields
updatefirst
This action is identical to update, except that it updates only the first occurrence of each specified subfield in the specified fields.
Syntax: except for the action's name, it is the same as the update syntax.
Example:
---
condition : $f502a eq "second a"
updatefirst :
  b : updated value of first 'b' subfields in the condition's field
  f502c : updated value of first 'c' subfields into all '502' fields
  f501 :
    a : updated value of first 'a' subfields into all '501' fields
    b : $f502a is the value of 502a conditional field
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _b1
       _cbar
502    _afirst a
       _asecond a
       _bbbb
       _cccc1
       _cccc2
502    _apoto
502    _btruc
       _cbidule
--transformed record--
LDR
501    _aupdated value of first 'a' subfields into all '501' fields
       _bsecond a is the value of 502a conditional field
       _cbar
502    _afirst a
       _asecond a
       _bupdated value of first 'b' subfields in the condition's field
       _cupdated value of first 'c' subfields into all '502' fields
       _cccc2
502    _apoto
502    _btruc
       _cupdated value of first 'c' subfields into all '502' fields
forceupdate and forceupdatefirst
If the specified subfields exist, these actions behave exactly like update and updatefirst.
If the specified subfields do not exist, these actions behave like create.
Syntax: except for the action's name, it is the same as the update syntax.
Example:
---
condition : $f502a eq "second a"
forceupdate :
  b : 'b' subfield's value in the condition's field
  f502c : '502c' value's
  f503 :
    a : '503a' value's
    b : $f502a is the 502a condition's value
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _b1
       _cbar
502    _btruc
       _cbidule
502    _apoto
502    _afirst a
       _asecond a
       _bbbb
       _ccc1
       _ccc2
--transformed record--
LDR
501    _afoo
       _b1
       _cbar
502    _btruc
       _c'502c' value's
502    _apoto
       _c'502c' value's
502    _afirst a
       _asecond a
       _b'b' subfield's value in the condition's field
       _c'502c' value's
       _c'502c' value's
503    _a'503a' value's
       _bsecond a is the 502a condition's value
--transformed record if we had used forceupdatefirst--
LDR
501    _afoo
       _b1
       _cbar
502    _btruc
       _c'502c' value's
502    _apoto
       _c'502c' value's
502    _afirst a
       _asecond a
       _b'b' subfield's value in the condition's field
       _c'502c' value's
       _ccc2
503    _a'503a' value's
       _bsecond a is the 502a condition's value
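The four update actions differ in only two respects: whether every occurrence or only the first one is changed, and whether a missing subfield is created. The sketch below models a single field as a list of [code, value] pairs. apply_update is a hypothetical function written for illustration, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical model: a field is an arrayref of [ code, value ] pairs.
# $mode is one of: update, updatefirst, forceupdate, forceupdatefirst.
sub apply_update {
    my ( $field, $mode, $code, $value ) = @_;
    my $first_only = $mode =~ /first$/;
    my $force      = $mode =~ /^force/;
    my $found      = 0;

    for my $subfield (@$field) {
        next unless $subfield->[0] eq $code;
        $subfield->[1] = $value;
        $found++;
        last if $first_only;    # *first: stop after the first occurrence
    }

    # force*: behave like create when the subfield does not exist yet
    push @$field, [ $code, $value ] if $force && !$found;
    return $field;
}

# Renders the subfields the way as_formatted shows them: _<code><value>
sub as_text {
    my ($field) = @_;
    return join ' ', map { sprintf '_%s%s', @$_ } @$field;
}

my %result;
for my $mode (qw(update updatefirst forceupdate forceupdatefirst)) {
    my @field = ( [ 'a', 'first a' ], [ 'c', 'ccc1' ], [ 'c', 'ccc2' ] );
    apply_update( \@field, $mode, 'c', 'new c' );
    apply_update( \@field, $mode, 'b', 'new b' );
    $result{$mode} = as_text( \@field );
    printf "%-16s %s\n", $mode, $result{$mode};
}
```

Only the force variants add the missing 'b' subfield. Only the first variants leave the second 'c' subfield (_cccc2) untouched.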
delete
As the name suggests, this action allows you to delete fields and subfields.
Syntax:
# basic:
delete : <field or subfield name>
# advanced:
delete :
  - <field or subfield name>
  - <field or subfield name>
Example:
---
condition : $f501a eq "foo"
delete : $f501
---
condition : $f501a eq "bar"
delete : b
---
delete : f502
---
delete :
  - f503
  - f504a
result (with $record->as_formatted):
--init record--
LDR
501    _abar
       _bbb1
       _bbb2
501    _afoo
502    _apata
502    _apoto
503    _apata
504    _aata1
       _aata2
       _btbbt
--transformed record--
LDR
501    _abar
504    _btbbt
duplicatefield
As the name suggests, this action allows you to duplicate entire fields.
Syntax:
# basic:
duplicatefield : <field name> > <field name>
# advanced:
duplicatefield :
  - <field name> > <field name>
  - <field name> > <field name>
Example:
---
condition : $f501a eq "bar"
duplicatefield : $f501 > f400
---
condition : $f501a eq "foo"
duplicatefield :
  - $f501 > f401
  - f005 > f006
result (with $record->as_formatted):
--init record--
LDR
005     controlfield_content2
005     controlfield_content1
501    _afoo
501 12 _abar
       _bbb1
       _bbb2
--transformed record--
LDR
005     controlfield_content2
005     controlfield_content1
006     controlfield_content1
006     controlfield_content2
400 12 _abar
       _bbb1
       _bbb2
401    _afoo
501    _afoo
501 12 _abar
       _bbb1
       _bbb2
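As the 400 and 501 fields in the result above show, duplicatefield copies a whole field, indicators and subfields included, under a new tag and leaves the original in place. Below is a minimal sketch of such a deep copy on a plain Perl data structure. duplicate_field and the hash layout are hypothetical; they say nothing about how MARC::Transform does it internally.

```perl
use strict;
use warnings;

# Hypothetical model of a data field:
# { tag => '501', ind1 => '1', ind2 => '2', subfields => [ [ 'a', 'bar' ] ] }
# duplicate_field() returns a deep copy under $new_tag; the source is untouched.
sub duplicate_field {
    my ( $field, $new_tag ) = @_;
    return {
        %$field,                                                    # indicators
        tag       => $new_tag,
        subfields => [ map { [@$_] } @{ $field->{subfields} } ],    # copy pairs
    };
}

my $f501 = {
    tag       => '501',
    ind1      => '1',
    ind2      => '2',
    subfields => [ [ 'a', 'bar' ], [ 'b', 'bb1' ], [ 'b', 'bb2' ] ],
};
my $f400 = duplicate_field( $f501, '400' );
$f400->{subfields}[0][1] = 'changed';    # editing the copy does not touch 501
print "$f400->{tag} $f400->{ind1}$f400->{ind2} _a$f400->{subfields}[0][1]\n";
print "$f501->{tag} $f501->{ind1}$f501->{ind2} _a$f501->{subfields}[0][1]\n";
```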
execute
This action lets you define Perl code that will be evaluated (with eval).
You can call functions written directly in the YAML. For details on writing Perl subs in the YAML, see the next section: Use Perl functions and LookUp Tables.
Syntax:
# basic:
execute : <perl code>
# advanced:
execute :
  - <perl code>
  - <perl code>
Example:
---
condition : $f501a eq "bar"
execute :
  - warn("f501a eq $f501a")
  - warn("barbar")
---
- condition : $f501a eq "foo"
  execute : \&warnfoo("f501a eq $f501a")
- subs : >
    sub warnfoo { my $string = shift;warn $string; }
result (on STDERR):
f501a eq bar at (eval 30) line 6, <$yamls> line 1.
barbar at (eval 30) line 7, <$yamls> line 1.
f501a eq foo at (eval 33) line 2, <$yamls> line 1.
Use Perl functions and LookUp Tables
You can use Perl functions (subs) and lookup tables (LUT) to define with greater flexibility values that will be created or updated by the actions: create, forceupdate, forceupdatefirst, update and updatefirst.
These functions and tables can be defined inside a rule, in which case only that rule can use them. They can also be defined after the last rule (after the last ---), where they are available to all rules: these are global_subs and global_LUT.
Variables
Three types of variables can be used:
$this and the condition's elements
Variables pointing to the values of the condition's subfields are those already described in the section 'Run actions only on the condition's fields' (e.g. $f110c).
$this: this variable holds the value of the current subfield. $this can also be used outside a sub or a LUT.
Example (N.B.: sub 'fromo2e' converts 'o' to 'e'):
---
- condition : $f501a eq "foo"
  create :
    c : \&fromo2e("$f501a")
  update :
    d : this 501d value's is $this
    b : \&fromo2e("$this")
- subs: >
    sub fromo2e { my $string=shift; $string =~ s/o/e/g; $string; }
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _bboo
       _ddoo
--transformed record--
LDR
501    _afoo
       _bbee
       _dthis 501d value's is doo
       _cfee
$record
$record is the current MARC::Record object.
subs
Internal rules
Syntax:
#full rule:
---
- <method invocation syntax in the actions' values, in sub-rule(s)>
- subs: >
    <one or more Perl subs>
---
# method invocation syntax:
\&<sub name>("<arguments>")
Example:
---
- condition : $f501a eq "foo" and defined $f501d
  update :
    b : \&convertbaddate("$this")
    c : \&trim("$f501d")
- subs: >
    sub convertbaddate {
        #this function converts a date like "21/2/98" to "1998-02-21"
        my $in = shift;
        if ($in =~/^(\d{1,2})\/(\d{1,2})\/(\d{2}).*/)
        {
            my $day=$1;
            my $month=$2;
            my $year=$3;
            if ($day=~m/^\d$/) {$day="0".$day;}
            if ($month=~m/^\d$/) {$month="0".$month;}
            if (int($year)>12)
            {$year="19".$year;}
            else {$year="20".$year;}
            return "$year-$month-$day";
        }
        else
        {
            return $in;
        }
    }
    sub trim {
        # This function removes ",00" at the end of a string
        my $in = shift;
        $in=~s/,00$//;
        return $in;
    }
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _b8/12/10
       _cboo
       _d40,00
--transformed record--
LDR
501    _afoo
       _b2010-12-08
       _c40
       _d40,00
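Because subs are plain Perl, it is worth testing them on their own before pasting them into a rule. Below, the convertbaddate and trim subs from the example above are run standalone on a few arbitrary sample inputs. The code is reformatted but behaves the same way. Note that two-digit years greater than 12 are read as 19xx and the others as 20xx.

```perl
use strict;
use warnings;

# Converts a date like "21/2/98" (day/month/2-digit year) to "1998-02-21".
# Two-digit years above 12 become 19xx, the others 20xx.
# Any other string is returned unchanged.
sub convertbaddate {
    my $in = shift;
    if ( $in =~ /^(\d{1,2})\/(\d{1,2})\/(\d{2}).*/ ) {
        my ( $day, $month, $year ) = ( $1, $2, $3 );
        $day   = "0$day"   if $day   =~ /^\d$/;
        $month = "0$month" if $month =~ /^\d$/;
        $year  = int($year) > 12 ? "19$year" : "20$year";
        return "$year-$month-$day";
    }
    return $in;
}

# Removes a trailing ",00" (e.g. "40,00" becomes "40").
sub trim {
    my $in = shift;
    $in =~ s/,00$//;
    return $in;
}

for my $date ( '8/12/10', '21/2/98', 'unknown' ) {
    printf "%-8s -> %s\n", $date, convertbaddate($date);
}
print trim('40,00'), "\n";
```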
global_subs
Syntax:
---
global_subs: >
  <one or more Perl subs>
# method invocation syntax:
\&<sub name>("<arguments>")
Example:
---
condition : $f501a eq "foo"
update :
  b : \&return_record_encoding()
  c : \&trim("$this")
---
global_subs: >
  sub return_record_encoding {
      $record->encoding();
  }
  sub trim {
      # This function removes ",00" at the end of a string
      my $in = shift;
      $in=~s/,00$//;
      return $in;
  }
result (with $record->as_formatted):
--init record--
LDR
501    _afoo
       _bbar
       _c40,00
--transformed record--
LDR
501    _afoo
       _bMARC-8
       _c40
LUT
If a value has no match in a LookUp Table, it isn't modified.
To use more than one lookup table in a rule, you must use global_LUT, because global tables are distinguished by their titles.
Internal rules
Syntax:
#full rule:
---
- <LUT invocation syntax in the actions' values, inside sub-rule(s)>
- LUT :
    <starting value> : <final value>
    <starting value> : <final value>
---
# LUT invocation syntax:
\&LUT("<starting value>")
Example:
---
- condition : $f501b eq "bar"
  create :
    f604a : \&LUT("$f501b")
  update :
    c : \&LUT("$this")
- LUT :
    1 : first
    2 : second
    bar : openbar
result (with $record->as_formatted):
--init record--
LDR
501    _bbar
       _c1
--transformed record--
LDR
501    _bbar
       _cfirst
604    _aopenbar
global_LUT
Syntax:
---
global_LUT:
  <LUT title> :
    <starting value> : <final value>
    <starting value> : <final value>
  <LUT title> :
    <starting value> : <final value>
    <starting value> : <final value>
# global_LUT invocation syntax:
\&LUT("<starting value>","<LUT title>")
Example:
---
update :
  f501a : \&LUT("$this","numbers")
  f501b : \&LUT("$this","cities")
  f501c : \&LUT("$this","cities")
---
global_LUT:
  cities:
    NY : New York
    SF : San Francisco
    TK : Tokyo
  numbers:
    1 : one
    2 : two
result (with $record->as_formatted):
--init record--
LDR
501    _a1
       _bfoo
       _cSF
--transformed record--
LDR
501    _aone
       _bfoo
       _cSan Francisco
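The lookup behaviour described above can be sketched with ordinary Perl hashes: a value with no match is left unchanged, and global tables are selected by their title. The tables mirror the YAML examples. The lookup function is hypothetical and is not the module's \&LUT implementation.

```perl
use strict;
use warnings;

# Hypothetical equivalents of an internal LUT and a titled global_LUT.
my %internal_lut = ( 1 => 'first', 2 => 'second', bar => 'openbar' );
my %global_lut   = (
    cities  => { NY => 'New York', SF => 'San Francisco', TK => 'Tokyo' },
    numbers => { 1  => 'one',      2  => 'two' },
);

# lookup($value)         uses the rule's internal table;
# lookup($value, $title) uses the global table with that title.
# A value with no match is returned unchanged.
sub lookup {
    my ( $value, $title ) = @_;
    my $table = defined $title ? $global_lut{$title} : \%internal_lut;
    return $value unless $table && exists $table->{$value};
    return $table->{$value};
}

print join( ', ',
    lookup('1'), lookup('zz'), lookup( 'SF', 'cities' ), lookup( 'foo', 'cities' ) ),
  "\n";
```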
Final tips and a large YAML example
Restriction: the special case of double quotes (") and dollar signs ($):
In YAML rules, these characters are interpreted differently. To use them literally in a string, replace them in the YAML with #_dbquote_# (for ") and #_dollars_# (for $). Example:
---
condition : $f501a eq "I want #_dbquote_##_dollars_##_dbquote_#"
create :
  f604a : "#_dbquote_#$f501a#_dbquote_# contain a #_dollars_# sign"
result (with $record->as_formatted):
--init record--
LDR
501    _aI want "$"
--transformed record--
LDR
501    _aI want "$"
604    _a"I want "$"" contain a $ sign
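Each placeholder stands for one character. The sketch below assumes a plain global substitution in both directions. unescape and escape are hypothetical helpers, and this documentation does not say at which stage MARC::Transform performs the replacement. The escape direction can be handy when YAML rules are generated by a script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the documented placeholders back into " and $.
sub unescape {
    my ($text) = @_;
    $text =~ s/#_dbquote_#/"/g;
    $text =~ s/#_dollars_#/\$/g;
    return $text;
}

# Inverse: prepare a literal string for use inside a YAML rule.
sub escape {
    my ($text) = @_;
    $text =~ s/"/#_dbquote_#/g;
    $text =~ s/\$/#_dollars_#/g;
    return $text;
}

print unescape('I want #_dbquote_##_dollars_##_dbquote_#'), "\n";
print escape('I want "$"'), "\n";
```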
Example: feel free to copy the examples in this documentation. Be aware that I have added four space characters at the beginning of each line so that the POD interpreter displays them properly. If you copy/paste them into your YAML configuration file, be sure to remove the first four characters of each line (e.g. with vim, :%s/^\s\s\s\s//g).
---
condition : $f501a eq "foo"
create :
  f502a : this is the value of a subfield of a new 502 field
---
condition : $f401a=~/foo/
create :
  b : new value of the 401 condition's field
  f600 :
    a :
      - first a subfield of this new 600 field
      - second a subfield of this new 600 field
    b : the 600b value
execute : \&reencodeRecordtoUtf8()
---
- condition : $f501a =~/foo/ and $f503a =~/bar/
  forceupdate :
    $f503b : mandatory b in condition's field
    f005_ : mandatory 005
    f006_ : \&return_record_encoding()
    f700 :
      a : the a subfield of this mandatory 700 field
      b : \&sub1("$f503a")
  forceupdatefirst :
    $f501b : update only the first b in condition's field 501
- condition : $f501a =~/foo/
  execute : \&warnfoo("f501a contain foo")
- subs : >
    sub return_record_encoding { $record->encoding(); }
    sub sub1 {my $string=shift;$string =~ s/a/e/g;return $string;}
    sub warnfoo { my $string = shift;warn $string; }
---
- condition : $f501b2 eq "o"
  update :
    c : updated value of all c in condition's field
    f504a : updated value of all 504a if exists
    f604 :
      b : \&LUT("$this")
      c : \&LUT("NY","cities")
  updatefirst :
    f604a : update only the first a in 604
- condition : $f501c eq "1"
  delete : $f501
- LUT :
    1 : first
    2 : second
    bar : openbar
---
delete :
  - f401a
  - f005
---
condition : $ldr2 eq "t"
execute : \&SetRecordToLowerCase($record)
---
condition : $f008_ eq "controlfield_content8b"
duplicatefield :
  - $f008 > f007
  - f402 > f602
delete : f402
---
global_subs: >
  sub reencodeRecordtoUtf8 { $record->encoding( 'UTF-8' ); }
  sub warnfee { my $string = shift;warn $string; }
global_LUT:
  cities:
    NY : New York
    SF : San Francisco
  numbers:
    1 : one
    2 : two
result (with $record->as_formatted):
--init record--
LDR optionnal leader
005     controlfield_content
008     controlfield_content8a
008     controlfield_content8b
106    _aVaLuE
401    _aafooa
402 2 _aa402a2
402 1 _aa402a1
501    _c1
501    _afoo
       _afoao
       _b1
       _bbaoar
       _cbig
503    _afee
       _ababar
504    _azut
       _asisi
604    _afoo
       _afoo
       _bbar
       _ctruc
--transformed record--
LDR optionnalaleader
006     UTF-8
007     controlfield_content8b
008     controlfield_content8a
008     controlfield_content8b
106    _aVaLuE
401    _bnew value of the 401 condition's field
501    _c1
501    _afoo
       _afoao
       _bupdate only the first b in condition's field 501
       _bbaoar
       _cupdated value of all c in condition's field
502    _athis is the value of a subfield of a new 502 field
503    _afee
       _ababar
       _bmandatory b in condition's field
504    _aupdated value of all 504a if exists
       _aupdated value of all 504a if exists
600    _afirst a subfield of this new 600 field
       _asecond a subfield of this new 600 field
       _bthe 600b value
602 1 _aa402a1
602 2 _aa402a2
604    _aupdate only the first a in 604
       _afoo
       _bopenbar
       _cNew York
700    _athe a subfield of this mandatory 700 field
       _bbeber
TODO
Subs are redefined at each execution. This does not block processing, but starting from the second record it prints messages like "Subroutine foo redefined at (eval 2) line 1" on STDERR.
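Until this is fixed, one possible workaround on the calling side is to filter these messages with a $SIG{__WARN__} handler. This sketch assumes the messages are Perl's standard 'redefine' warnings, as the quoted text suggests. install_redefine_filter is a hypothetical name, and nothing in MARC::Transform is changed. The second eval below simulates a sub being defined again for a second record.

```perl
use strict;
use warnings;

# Hypothetical workaround: drop Perl's "Subroutine ... redefined" warnings
# and pass every other warning on to $sink.
sub install_redefine_filter {
    my ($sink) = @_;
    $SIG{__WARN__} = sub {
        my ($msg) = @_;
        return if $msg =~ /^Subroutine \S+ redefined at /;
        $sink->($msg);
    };
    return;
}

# In a real script the sink would simply be: sub { print STDERR $_[0] }
my @kept;
install_redefine_filter( sub { push @kept, $_[0] } );

# Simulate two records: the same sub is eval'ed once per record,
# so the second eval triggers a "redefined" warning.
for my $record_number ( 1 .. 2 ) {
    eval q{ sub warnfoo { my $string = shift; warn $string; } 1 } or die $@;
}
warn "a real warning\n";
print scalar(@kept), " warning(s) kept: $kept[0]";
```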
SEE ALSO
MARC::Record (http://search.cpan.org/perldoc?MARC::Record)
MARC::Field (http://search.cpan.org/perldoc?MARC::Field)
Library Of Congress MARC pages (http://www.loc.gov/marc/)
The definitive source for all things MARC.
AUTHOR
Stephane Delaune, (delaune.stephane at gmail.com)
COPYRIGHT
Copyright 2011 Stephane Delaune for Biblibre.com, all rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.