NAME

MHonArc::CharEnt - HTML character routines for MHonArc.

SYNOPSIS

use MHonArc::CharEnt;

MHonArc resource file:

  <CharsetConverters>
  ...
  iso-8859-15;    MHonArc::CharEnt::str2sgml;     MHonArc/CharEnt.pm
  ...
  </CharsetConverters>

DESCRIPTION

MHonArc::CharEnt provides the main character conversion routine MHonArc uses to convert non-ASCII encoded message header data and text/plain character data into HTML. The module was initially written to support only 8-bit charsets, but it has since been extended to support multibyte charsets.

HTML special characters and non-ASCII characters are mapped to HTML 4.0 named character entity references (e.g. &lt;, &gt;) or to hexadecimal numeric character references for Unicode code points (e.g. &#x203E;). Most modern browsers render these numeric references directly.
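The mapping described above can be illustrated with a minimal Perl sketch. It is not MHonArc's str2sgml: it assumes the input charset is known to Perl's core Encode module and decodes with Encode instead of MHonArc's own mapping tables. The function name to_html_ascii and the short %NAMED entity table are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical subset of HTML 4.0 named entities, keyed by code point.
# Any other special or non-ASCII character falls back to &#xHHHH;.
my %NAMED = (
    0x26 => 'amp',
    0x3C => 'lt',
    0x3E => 'gt',
    0x22 => 'quot',
    0xA9 => 'copy',
    0xE9 => 'eacute',
);

# Hypothetical helper: convert raw bytes in $charset to pure-ASCII HTML.
sub to_html_ascii {
    my ($bytes, $charset) = @_;
    my $text = decode($charset, $bytes);    # bytes -> Perl character string
    # Replace HTML special characters and every non-ASCII character.
    $text =~ s{([&<>"]|[^\x00-\x7F])}{
        my $cp = ord($1);
        exists $NAMED{$cp} ? "&$NAMED{$cp};" : sprintf('&#x%X;', $cp)
    }ge;
    return $text;
}

# In ISO-8859-15, byte 0xE9 is U+00E9 and byte 0xA4 is the euro sign U+20AC.
my $html = to_html_ascii("caf\xE9 <\xA4>", 'iso-8859-15');
print "$html\n";    # prints: caf&eacute; &lt;&#x20AC;&gt;
```

Characters with a named entity in the table get the readable name; everything else gets a numeric reference, so the output is always ASCII regardless of the source charset.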

NOTES

  • This module relies on MHonArc's CHARSETALIASES resource for defining alternate names for supported charsets.

  • Most character conversion is done through mapping tables that are dynamically loaded on an as-needed basis. There may be room for optimization by replacing the tables for some charsets with algorithmic conversions.

    UTF-8 conversion is done algorithmically.

  • A main goal of this module is to convert raw non-ASCII data in various character sets to ASCII, using entity references for the non-ASCII characters. As a result, archive files are pure ASCII, and modern standards-compliant HTML browsers render the non-ASCII characters from the standard named and numeric character references.

    The drawback is that the raw HTML source is difficult to read for non-English languages, though this is likely a non-issue for most users.
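The two conversion strategies in these notes, lazily loaded mapping tables and algorithmic UTF-8 decoding, can be contrasted in a short Perl sketch. All names (table_for, utf8_codepoints, cp_to_html) are hypothetical. MHonArc ships its own table files; this sketch assumes, for illustration only, that an 8-bit table may be built from Perl's Encode module on first request. The UTF-8 decoder is simplified: it substitutes U+FFFD for malformed or truncated sequences but does not reject overlong forms or surrogates.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Cache of charset -> { byte => code point }, filled on first use only.
my %TABLE_CACHE;

# Hypothetical loader for 8-bit charsets. Assumption: the table is
# derived from Encode here rather than read from a MHonArc table file.
sub table_for {
    my ($charset) = @_;
    return $TABLE_CACHE{$charset} //= do {
        my %map;
        for my $byte (0x80 .. 0xFF) {
            my $oct = chr($byte);
            my $ch  = decode($charset, $oct, Encode::FB_QUIET());
            $map{$byte} = ord($ch) if length $ch;    # skip unmapped bytes
        }
        \%map;
    };
}

# Algorithmic UTF-8 decoding: the code point is assembled from the
# payload bits of the lead and continuation bytes; no table is needed.
sub utf8_codepoints {
    my ($bytes) = @_;
    my @b = map { ord } split //, $bytes;
    my @cps;
    my $i = 0;
    while ($i < @b) {
        my $c = $b[$i];
        my ($len, $cp) =
              $c < 0x80             ? (1, $c)
            : ($c & 0xE0) == 0xC0   ? (2, $c & 0x1F)
            : ($c & 0xF0) == 0xE0   ? (3, $c & 0x0F)
            : ($c & 0xF8) == 0xF0   ? (4, $c & 0x07)
            :                         (0, 0);
        # Invalid lead byte or truncated sequence: emit U+FFFD, skip one byte.
        if ($len == 0 || $i + $len > @b) { push @cps, 0xFFFD; $i++; next }
        my $ok = 1;
        for my $k (1 .. $len - 1) {
            my $cc = $b[$i + $k];
            if (($cc & 0xC0) != 0x80) { $ok = 0; last }
            $cp = ($cp << 6) | ($cc & 0x3F);
        }
        if ($ok) { push @cps, $cp;    $i += $len }
        else     { push @cps, 0xFFFD; $i++ }
    }
    return @cps;
}

# Map one code point to ASCII HTML (numeric reference for non-ASCII).
sub cp_to_html {
    my ($cp) = @_;
    return '&amp;' if $cp == 0x26;
    return '&lt;'  if $cp == 0x3C;
    return '&gt;'  if $cp == 0x3E;
    return $cp < 0x80 ? chr($cp) : sprintf('&#x%X;', $cp);
}

my $utf8 = "\xE2\x82\xAC 5 < 6";    # UTF-8 bytes for "<euro sign> 5 < 6"
print join('', map { cp_to_html($_) } utf8_codepoints($utf8)), "\n";
# prints: &#x20AC; 5 &lt; 6
printf "0xA4 in ISO-8859-15 -> U+%04X\n", table_for('iso-8859-15')->{0xA4};
# prints: 0xA4 in ISO-8859-15 -> U+20AC
```

The table path pays a one-time load per charset but can express any 8-bit mapping; the UTF-8 path needs no table because each code point is carried directly in the byte bits.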

VERSION

$Id: CharEnt.pm,v 1.17 2010/12/31 18:23:02 ehood Exp $

AUTHOR

Earl Hood, earl@earlhood.com

MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only under the terms of the GNU General Public License, which may be found in the MHonArc distribution.