Changes for version 0.98 - 2007-08-07

  • Added two code elements to etc/codetables.xml that enable conversion of Arabic records containing the bytes 0x8D and 0x8E, which should map to U+200D and U+200C in Unicode. These mappings are present in the Basic and Extended Latin code tables but missing from the Arabic ones. Some real records appear to confirm the need for these rules (e.g. LCCN 2006552991). Thanks to François Charette <fcharette@ankabut.net> for finding the problem and proposing the fix. The rules have been forwarded to LC for inclusion in the canonical character set mapping.
  • Added t/farsi.t and t/farsi.marc to test the new code rules. These tests should fail if the patched codetables.xml is inadvertently removed before LC has added the new rules.
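
The two rules described above amount to a small byte-to-code-point lookup. The following is a minimal sketch in Python, for illustration only: MARC::Charset itself is written in Perl and reads its rules from etc/codetables.xml, and the names `JOINER_RULES` and `joiner_code_points` are hypothetical, not part of the distribution.

```python
# Illustrative sketch only; not how MARC::Charset implements the rules.
# The two mappings are the ones named in the changelog entry above.
JOINER_RULES = {
    0x8D: "\u200D",  # ZERO WIDTH JOINER
    0x8E: "\u200C",  # ZERO WIDTH NON-JOINER
}


def joiner_code_points(data: bytes) -> list:
    """Return the Unicode characters for any 0x8D/0x8E bytes in data, in order.

    All other bytes are ignored; a real MARC-8 converter would also have to
    handle escape sequences and the active character sets.
    """
    return [JOINER_RULES[b] for b in data if b in JOINER_RULES]


if __name__ == "__main__":
    sample = b"\x41\x8d\x42\x8e"  # hypothetical byte string, not a real record
    for ch in joiner_code_points(sample):
        print(f"U+{ord(ch):04X}")
```

Before this release the Arabic code tables had no entry for either byte, which is why the affected records could not be converted.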

Documentation

compile the LoC mapping table
print the marc8 conversion table as HTML

Modules

convert MARC-8 encoded strings to UTF-8
represents a MARC-8/UTF-8 mapping
compile XML mapping rules from LoC
constants for MARC::Charset
character mapping db