NAME
MHonArc::CharEnt - HTML Character routines for MHonArc.
SYNOPSIS
use MHonArc::CharEnt;
MHonArc resource file:
<CharsetConverters>
...
iso-8859-15; MHonArc::CharEnt::str2sgml; MHonArc/CharEnt.pm
...
</CharsetConverters>
DESCRIPTION
MHonArc::CharEnt provides the main character conversion routine used by MHonArc for converting non-ASCII encoded message header data and text/plain character data into HTML. This module was initially written to support only 8-bit charsets, but it has since been extended to support multibyte charsets.
All characters are mapped to HTML 4.0 character entity references (e.g. &lt; &gt;) or to Unicode numeric character entity references (e.g. &#x203E;). Most modern browsers will support the Unicode references directly.
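The mapping described above can be illustrated with a minimal sketch. It is not the module's actual code: the function name to_entities and the tiny %NAMED table are assumptions made for this example, and the input is assumed to be a string of already-decoded Perl characters. Characters with a known name become named references; every other non-ASCII character becomes a hexadecimal numeric reference.

```perl
use strict;
use warnings;

# Hypothetical excerpt of a named-entity table (code point => entity name).
# A real table covering HTML 4.0 would be much larger.
my %NAMED = (
    0x22 => 'quot',
    0x26 => 'amp',
    0x3C => 'lt',
    0x3E => 'gt',
    0xA0 => 'nbsp',
    0xE9 => 'eacute',
);

# to_entities() is an illustrative name, not part of MHonArc::CharEnt.
# Input:  a string of decoded characters.
# Output: the same text as pure ASCII, using entity references.
sub to_entities {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F]|[<>&"])}{
        my $cp = ord $1;
        exists $NAMED{$cp} ? "&$NAMED{$cp};" : sprintf('&#x%X;', $cp);
    }ge;
    return $text;
}

print to_entities("caf\x{E9} <b> \x{203E}"), "\n";
```

The output contains only ASCII, so it can be written to an archive file without declaring any particular document encoding.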
NOTES
This module relies on MHonArc's CHARSETALIASES resource for defining alternate names for the charsets supported.
Most character conversion is done through mapping tables that are dynamically loaded on an as-needed basis. There is probably room for optimization by replacing the tables for some charsets with algorithmic conversion.
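A minimal sketch of table-driven conversion with on-demand loading follows. Everything in it is an assumption made for illustration: the names %LOADER, table_for, and bytes_to_entities are hypothetical, and the tables are built inline rather than loaded from separate files as a real implementation might do. The point is only that a table is constructed the first time its charset is requested and reused afterwards.

```perl
use strict;
use warnings;

# Hypothetical loaders, one per charset. Each returns a hash reference
# mapping an 8-bit value to a Unicode code point.
my %LOADER = (
    'iso-8859-1' => sub {
        my %t = map { $_ => $_ } 0xA0 .. 0xFF;
        return \%t;
    },
    'iso-8859-15' => sub {
        my %t = map { $_ => $_ } 0xA0 .. 0xFF;
        # The eight positions where ISO-8859-15 differs from ISO-8859-1.
        @t{ 0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE } =
          ( 0x20AC, 0x0160, 0x0161, 0x017D, 0x017E, 0x0152, 0x0153, 0x0178 );
        return \%t;
    },
);

my %TABLE;        # cache of tables that have already been loaded
my $LOADS = 0;    # counts loader calls, only to make the caching visible

# Return the table for a charset, loading it on first use.
sub table_for {
    my ($charset) = @_;
    $charset = lc $charset;
    return $TABLE{$charset} if exists $TABLE{$charset};
    my $loader = $LOADER{$charset} or return;    # unknown charset
    $LOADS++;
    return $TABLE{$charset} = $loader->();
}

# Replace each high-bit octet with a numeric character reference.
# Data in an unknown charset is returned unchanged.
sub bytes_to_entities {
    my ( $bytes, $charset ) = @_;
    my $table = table_for($charset) or return $bytes;
    $bytes =~ s{([\x80-\xFF])}{
        my $cp = $table->{ ord $1 };
        defined $cp ? sprintf('&#x%X;', $cp) : '?';
    }ge;
    return $bytes;
}

print bytes_to_entities( "100 \xA4", 'ISO-8859-15' ), "\n";
print bytes_to_entities( "100 \xA4", 'iso-8859-15' ), "\n";
print "tables loaded: $LOADS\n";
```

Both calls use the same cached table, so the loader runs once. Charsets that never appear in the input never cost any memory.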
UTF-8 conversion is done algorithmically.
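As an illustration of what an algorithmic conversion can look like, the sketch below decodes UTF-8 multibyte sequences arithmetically and emits numeric character references, with no lookup table. It is an assumption-laden example, not MHonArc's routine: the name utf8_to_entities is hypothetical, the input is assumed to be raw octets, markup characters such as < and & are not escaped here, and the simplified pattern does not reject overlong or surrogate encodings. Octets that do not form a sequence are left untouched.

```perl
use strict;
use warnings;

# utf8_to_entities() is an illustrative name, not part of MHonArc::CharEnt.
# Input:  a string of raw UTF-8 octets.
# Output: ASCII octets pass through; multibyte sequences become references.
sub utf8_to_entities {
    my ($octets) = @_;
    $octets =~ s{
        ( [\xC2-\xDF] [\x80-\xBF]        # 2-byte sequence
        | [\xE0-\xEF] [\x80-\xBF]{2}     # 3-byte sequence
        | [\xF0-\xF4] [\x80-\xBF]{3}     # 4-byte sequence
        )
    }{
        my @b = unpack 'C*', $1;
        # Keep only the payload bits of the lead byte: 5, 4, or 3 bits
        # for sequences of 2, 3, or 4 bytes respectively.
        my $cp = $b[0] & ( 0xFF >> ( @b + 1 ) );
        # Each continuation byte contributes its low 6 bits.
        $cp = ( $cp << 6 ) | ( $_ & 0x3F ) for @b[ 1 .. $#b ];
        sprintf '&#x%X;', $cp;
    }gex;
    return $octets;
}

print utf8_to_entities("caf\xC3\xA9 \xE2\x80\xBE"), "\n";
```

Because the code point is computed directly from the bit layout of the sequence, no per-charset table has to be loaded for UTF-8 input.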
A main goal of this module is to convert raw non-ASCII data of various character sets to ASCII data using entity references for non-ASCII characters. This way, archive files will all be in ASCII, with modern compliant HTML browsers being able to handle the rendering of non-ASCII characters from the standard named and numeric character entity references.
This does make reading the raw HTML source for non-English languages difficult, but this may be a non-issue with most users.
VERSION
$Id: CharEnt.pm,v 1.17 2010/12/31 18:23:02 ehood Exp $
AUTHOR
Earl Hood, earl@earlhood.com
MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only under the terms of the GNU General Public License, which may be found in the MHonArc distribution.