NAME
MARC::Charset - convert MARC-8 encoded strings to UTF-8
SYNOPSIS
    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for utf8
    binmode(STDOUT, ':utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);
DESCRIPTION
MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single-byte character encoding that predates Unicode and allows non-Roman scripts to be stored in MARC bibliographic records.
EXPORTS
ignore_errors()
Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.
    my $ignore = MARC::Charset->ignore_errors();

    MARC::Charset->ignore_errors(1); # ignore errors
    MARC::Charset->ignore_errors(0); # DO NOT ignore errors
assume_unicode()
Tells MARC::Charset whether or not to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.
    my $setting = MARC::Charset->assume_unicode();

    MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
    MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode
assume_encoding()
Tells MARC::Charset which specific encoding, if any, to assume when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 characters and characters in another encoding.
    my $setting = MARC::Charset->assume_encoding();

    MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
    MARC::Charset->assume_encoding('');      # DO NOT assume any encoding
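Because the fallback encoding only takes effect in ignore_errors mode, the two settings are normally used together. The following is a minimal sketch of that combination, not a definitive recipe. The input string is a placeholder, and saving and restoring the previous values is a precaution that assumes the settings stay in effect for later conversions, since they are set through class methods.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Placeholder input (an assumption): plain ASCII is valid MARC-8, so this
# string converts without errors. Substitute data from your own records.
my $marc8 = 'A plain ASCII title';

# Called without arguments, each method returns the current setting.
my $old_ignore   = MARC::Charset->ignore_errors();
my $old_encoding = MARC::Charset->assume_encoding();

MARC::Charset->ignore_errors(1);         # the fallback only applies in this mode
MARC::Charset->assume_encoding('cp850'); # encoding to assume when an error occurs

my $utf8 = marc8_to_utf8($marc8);

# Put the previous settings back for any later conversions.
MARC::Charset->ignore_errors($old_ignore);
MARC::Charset->assume_encoding($old_encoding);
```

If the settings should apply to every conversion in the program, the save and restore steps can be dropped.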
marc8_to_utf8()
Converts a MARC-8 encoded string to UTF-8.
    my $utf8 = marc8_to_utf8($marc8);
If you'd like to ignore errors, pass in a true value as the second parameter, or call MARC::Charset->ignore_errors() with a true value:
    my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');
or
    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);
utf8_to_marc8()
Attempts to convert a UTF-8 string to MARC-8.
    my $marc8 = utf8_to_marc8($utf8);
If you'd like to ignore errors, including characters that can't be converted to MARC-8, pass in a true value as the second parameter:
    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
or
    MARC::Charset->ignore_errors(1);
    my $marc8 = utf8_to_marc8($utf8);
DEFAULT CHARACTER SETS
If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:
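    # the character set codes are exported by MARC::Charset::Constants
    use MARC::Charset::Constants qw(:all);

    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;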
SEE ALSO
AUTHOR
Ed Summers (ehs@pobox.com)