<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<!-- Generated by pod2coolhtml 1.101
  -- Using Pod::CoolHTML 1.104 , (C) 1997 by Eryq (eryq@enteract.com).
  --
  -- DO NOT EDIT THIS HTML FILE! All your changes will be lost.
  -- Edit the POD or Perl file that was used to create it.
  -->
<HTML>

<HEAD>
<TITLE>MIME::Latin1</TITLE>
</HEAD>
<BODY LINK=#C00000 ALINK=#FF2020 VLINK=#900000>
<A NAME="__top"> </A><CENTER><TABLE BORDER=2 CELLPADDING=2 WIDTH=100%>

<TR>
<TD WIDTH=25% ALIGN=CENTER><B><FONT SIZE=+1>
<A HREF="Tools.pm.html">MIME::Tools</A></FONT></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Body.pm.html">MIME::Body</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Decoder.pm.html">MIME::Decoder</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Entity.pm.html">MIME::Entity</A></SMALL></B></TD>

<TR>
<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Head.pm.html">MIME::Head</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="IO.pm.html">MIME::IO</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
MIME::Latin1</SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Parser.pm.html">MIME::Parser</A></SMALL></B></TD>

<TR>
<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="ParserBase.pm.html">MIME::ParserBase</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="ToolUtils.pm.html">MIME::ToolUtils</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Tools.pm.html">MIME::Tools</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Words.pm.html">MIME::Words</A></SMALL></B></TD>
</TABLE></CENTER>

<P><TABLE WIDTH="100%">

<TR VALIGN="TOP"><TD ALIGN="LEFT"><CENTER>
<H1><FONT SIZE=7 COLOR=#600020><B>MIME::<BR>Latin1</B></FONT></H1><IMG SRC="mime-sm.gif" ALT="MIME!"></CENTER>
<TD>
<UL>
<LI><A HREF="#name">NAME</A>
</LI><LI><A HREF="#synopsis">SYNOPSIS</A>
</LI><LI><A HREF="#description">DESCRIPTION</A>
</LI><LI><A HREF="#public_interface">PUBLIC INTERFACE</A>
</LI><LI><A HREF="#notes">NOTES</A>
</LI><LI><A HREF="#author">AUTHOR</A>
</LI><LI><A HREF="#version">VERSION</A>
</LI></UL>

</TABLE>

<P><HR>
<A NAME="name">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
NAME</FONT></H1>
</A>


<P>
MIME::Latin1 - DEPRECATED package to translate ISO-8859-1 
               into 7-bit approximations


<P><HR>
<A NAME="synopsis">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
SYNOPSIS</FONT></H1>
</A>


<P>
<PRE>    use MIME::Latin1 qw(latin1_to_ascii);</PRE>



<P>
<PRE>    my $dirty = &quot;Fran\347ois&quot;;
    print latin1_to_ascii($dirty);      # prints out &quot;Fran\c,ois&quot;</PRE>



<P><HR>
<A NAME="description">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
DESCRIPTION</FONT></H1>
</A>


<P>
<I>This module is so deprecated, it's not funny.</I>  
File this under &quot;seemed like a good idea at the time&quot;... I'm still
including it with the distribution so that existing code won't
break too badly, but it will be detached from the main MIME code
base, and ultimately may vanish (at least from MIME::).


<P>
This is a small package used by the <CODE>&quot;7bit&quot;</CODE> encoder/decoder for
handling the case where a user wants to 7bit-encode a document
that contains 8-bit (presumably Latin-1) characters.  It provides
a mapping whereby every 8-bit character is mapped to a unique
sequence of two 7-bit characters that, where possible, approximates the appearance
or pronunciation of the Latin-1 character (characters with no mnemonic fall back
to a two-digit hex code; see <A HREF="#hex">Hex encoding</A> under NOTES).  For example:


<P>
<PRE>    This...                   maps to...
    --------------------------------------------------
    A c with a cedilla        c,
    A C with a cedilla        C,
    An &quot;AE&quot; ligature          AE
    An &quot;ae&quot; ligature          ae
    Yen sign                  Y-</PRE>



<P>
I call each of these 7-bit 2-character encodings <I>mnemonic encodings</I>, 
since they (hopefully) are visually reminiscent of the 8-bit
characters they are meant to represent.
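
<P>
To make the idea concrete, here is a minimal sketch of such a table in plain Perl.
It is <I>not</I> the module's own code: the hash <CODE>%MNEMONIC</CODE> and the
helper <CODE>mnemonic_for</CODE> are hypothetical, and the table holds only the
few characters named in this document, keyed by their Latin-1 code points.

<P>
<PRE>
    use strict;
    use warnings;

    # Hypothetical subset of a mnemonic table, keyed by Latin-1 code point.
    # Only characters that appear in this document are listed.
    my %MNEMONIC = (
        0xA5 => 'Y-',    # Yen sign
        0xC6 => 'AE',    # AE ligature
        0xC7 => 'C,',    # C with cedilla
        0xE6 => 'ae',    # ae ligature
        0xE7 => 'c,',    # c with cedilla
        0xF1 => 'n~',    # n with tilde
        0xF8 => 'o/',    # small o with slash
        0xFC => 'u"',    # u with diaeresis
    );

    # Return the two-character mnemonic for one 8-bit character,
    # or undef if this sketch's table has no entry for it.
    sub mnemonic_for {
        my ($char) = @_;
        return $MNEMONIC{ ord $char };
    }

    print mnemonic_for("\347"), "\n";    # prints "c,"
</PRE>

<P>
Keying the table by code point rather than by the raw byte keeps the source file pure 7-bit ASCII.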


<P><HR>
<A NAME="public_interface">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
PUBLIC INTERFACE</FONT></H1>
</A>

<DL>
<P><DT><B><A NAME="latin1_to_ascii">latin1_to_ascii STRING,[OPTS]</A></B><DD>

<I>Function.</I>
Map the Latin-1 characters in the string to sequences of the form:


<P>
<PRE>     \xy</PRE>



<P>
Where <CODE>xy</CODE> is a two-character sequence that visually approximates
the Latin-1 character.  For example:


<P>
<PRE>     c cedilla      =&gt; \c,
     n tilde        =&gt; \n~
     AE ligature    =&gt; \AE
     small o slash  =&gt; \o/</PRE>



<P>
The sequences are taken almost exactly from the Sun character composition
sequences for generating these characters.  The translation may be further
tweaked by the (optional) OPTS string:

<DL>
<P><DT><B><A NAME="readable">READABLE</A></B><DD>

<I>Currently the default.</I>  
Only 8-bit characters are affected, and their output is of the form <CODE>\xy</CODE>:


<P>
<PRE>      \&lt;&lt;Fran\c,ois M\u&quot;ller\&gt;&gt;   c:\usr\games</PRE>



<P><DT><B><A NAME="noslash">NOSLASH</A></B><DD>

Exactly like READABLE, except the leading <CODE>&quot;\&quot;</CODE> is not inserted,
making the output more compact:


<P>
<PRE>      &lt;&lt;Franc,ois Mu&quot;ller&gt;&gt;       c:\usr\games</PRE>



<P><DT><B><A NAME="encode">ENCODE</A></B><DD>

Not only is the leading <CODE>&quot;\&quot;</CODE> output, but any other occurrences of <CODE>&quot;\&quot;</CODE> 
are escaped as well by turning them into <CODE>&quot;\\&quot;</CODE>.  Unlike the other options,
this produces output that can easily be parsed and turned back into the 
original 8-bit characters, so in a way it is a full-fledged encoding in its own right.
Since <CODE>&quot;\&quot;</CODE> is a fairly rare character, the result is not much uglier than the 
READABLE output:


<P>
<PRE>      \&lt;&lt;Fran\c,ois M\u&quot;ller\&gt;&gt;   c:\\usr\\games</PRE>



<P>
You may use <CODE>ascii_to_latin1</CODE> to decode this.

</DL>


<P>
<B>Note:</B> as of 3.12, the options string must, if defined,
be exactly one of the above options.  Composite options like &quot;ENCODE|NOSLASH&quot;
are no longer supported (most would be self-contradictory anyway).


<P><DT><B><A NAME="ascii_to_latin1">ascii_to_latin1 STRING</A></B><DD>

<I>Function.</I>
Map the Latin-1 escapes in the string (sequences of the form <CODE>\xy</CODE>)
back into actual 8-bit characters.


<P>
<PRE>   # Assume $enc holds text produced with the ENCODE option, e.g.:
   #     \&lt;&lt;Fran\c,ois \\ M\u&quot;ller\&gt;&gt;
   print ascii_to_latin1($enc);</PRE>



<P>
Unrecognized sequences are turned into '?' characters.


<P>
<B>Note:</B> <I>you must have specified the &quot;ENCODE&quot; option when encoding 
in order to decode!</I>

</DL>
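
<P>
The three option strings and the ENCODE round trip can be illustrated with a
self-contained sketch.  It only mimics the behavior described above and is not the
module's implementation: the subroutines <CODE>to_ascii_sketch</CODE>,
<CODE>from_ascii_sketch</CODE> and <CODE>_decode_one</CODE> and the six-entry table
are hypothetical, and 8-bit characters missing from the table are simply passed
through rather than being given a mnemonic or hex code as described in NOTES.

<P>
<PRE>
    use strict;
    use warnings;

    # Hypothetical six-entry subset of the mnemonic table
    # (Latin-1 code point => two-character mnemonic).
    my %MNEMONIC = (
        0xC6 => 'AE', 0xC7 => 'C,', 0xE7 => 'c,',
        0xF1 => 'n~', 0xF8 => 'o/', 0xFC => 'u"',
    );

    # Reverse table for decoding: mnemonic => 8-bit character.
    my %CHAR_FOR = map { ($MNEMONIC{$_} => chr($_)) } keys %MNEMONIC;

    # Sketch of the READABLE, NOSLASH and ENCODE modes.  8-bit characters
    # missing from the table are passed through unchanged here.
    sub to_ascii_sketch {
        my ($str, $opt) = @_;
        $opt = 'READABLE' unless defined $opt;
        my $lead = ($opt eq 'NOSLASH') ? '' : '\\';
        my $out  = '';
        for my $ch (split //, $str) {
            my $m = $MNEMONIC{ ord $ch };
            if (defined $m) {
                $out .= $lead . $m;      # 8-bit char => \xy (or xy)
            }
            elsif ($ch eq '\\' and $opt eq 'ENCODE') {
                $out .= '\\\\';          # ENCODE escapes "\" as "\\"
            }
            else {
                $out .= $ch;             # everything else is copied
            }
        }
        return $out;
    }

    # Sketch of the reverse mapping: "\\" => "\", known "\xy" => its
    # character, unknown "\xy" => "?".  Only reliable on ENCODE output.
    sub from_ascii_sketch {
        my ($str) = @_;
        $str =~ s/\\(\\|..)/_decode_one($1)/gse;
        return $str;
    }

    sub _decode_one {
        my ($seq) = @_;
        return '\\' if $seq eq '\\';    # escaped backslash
        return exists $CHAR_FOR{$seq} ? $CHAR_FOR{$seq} : '?';
    }

    my $text = "Fran\347ois M\374ller  c:\\usr";
    print to_ascii_sketch($text), "\n";              # Fran\c,ois M\u"ller  c:\usr
    print to_ascii_sketch($text, 'NOSLASH'), "\n";   # Franc,ois Mu"ller  c:\usr
    my $enc = to_ascii_sketch($text, 'ENCODE');
    print $enc, "\n";                                # Fran\c,ois M\u"ller  c:\\usr
    print from_ascii_sketch($enc) eq $text ? "round trip ok\n" : "mismatch\n";
</PRE>

<P>
Decoding the READABLE output instead would turn the <CODE>\us</CODE> in
<CODE>c:\usr</CODE> into <CODE>?</CODE>, which is why decoding requires ENCODE.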


<P><HR>
<A NAME="notes">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
NOTES</FONT></H1>
</A>

<DL>
<P><DT><B><A NAME="hex">Hex encoding</A></B><DD>

Characters in the octal range \200-\237 (hexadecimal \x80-\x9F) 
currently do not have mnemonic Latin-1 equivalents, and therefore 
are represented by the hex sequences &quot;80&quot; through &quot;9F&quot;, where 
the second hex digit is <B>upcased.</B>  That is:


<P>
<PRE>   80  81  82  83  84  85  86  87  88  89  8A  8B  8C  8D  8E  8F
   90  91  92  93  94  95  96  97  98  99  9A  9B  9C  9D  9E  9F</PRE>



<P>
To allow this scheme to work properly for <I>all</I> characters with the high bit set, 
the general rule is: 
<I>the first hex digit is DOWNcased, and the second hex digit is UPcased.</I>
Hence, these are all decodable sequences:


<P>
<PRE>   a0  a1  a2  a3  a4  a5  a6  a7  a8  a9  aA  aB  aC  aD  aE  aF   </PRE>



<P>
This &quot;downcase-upcase&quot; style ensures we don't conflict with mnemonically encoded 
ligatures like &quot;ae&quot; and &quot;AE&quot; (the latter could reasonably 
have been represented as &quot;Ae&quot;): a hex sequence never begins with an
uppercase letter or ends with a lowercase one.


<P>
Note that we must never have a mnemonic encoding that could be mistaken for
a hex sequence from &quot;80&quot; to &quot;fF&quot;, since the ambiguity would make it impossible
to decode.  (However, &quot;12&quot;, &quot;34&quot;, &quot;Ff&quot;, etc. are perfectly fine.)


<P>
<I>Thanks to Rolf Nelson for reporting the &quot;gap&quot; in the encoding.</I>


<P><DT><B><A NAME="other">Other restrictions</A></B><DD>

<B>The first character of a 2-character encoding cannot be a &quot;\&quot;</B>.
This is because &quot;\\&quot; represents an encoded &quot;\&quot;: to allow &quot;\\x&quot;
would introduce an ambiguity for the decoder.


<P><DT><B><A NAME="going">Going backwards</A></B><DD>

Since the mappings may fluctuate over time as I get more input, 
anyone writing a translator would be well-advised to use ascii_to_latin1()
to perform the reverse mapping.  I will strive for backwards-compatibility
in that code.


<P><DT><B><A NAME="got">Got a problem?</A></B><DD>

If you have better suggestions for some of the character representations,
please contact me.

</DL>
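
<P>
The hex fallback and the restrictions above can be checked mechanically.  The
following is a minimal sketch under the rules stated in these notes;
<CODE>hex_mnemonic</CODE>, <CODE>is_hex_sequence</CODE> and <CODE>mnemonic_ok</CODE>
are hypothetical helper names, not part of the MIME::Latin1 interface.

<P>
<PRE>
    use strict;
    use warnings;

    # Hex fallback: first hex digit downcased, second upcased,
    # e.g. 0x8A => "8A", 0xAE => "aE".
    sub hex_mnemonic {
        my ($code) = @_;
        return sprintf('%x%X', int($code / 16), $code % 16);
    }

    # True if a two-character sequence is one of the reserved
    # hex sequences "80" through "fF".
    sub is_hex_sequence {
        my ($seq) = @_;
        return $seq =~ /^[89a-f][0-9A-F]$/ ? 1 : 0;
    }

    # Check a candidate mnemonic against the rules in these notes:
    # exactly two characters, no leading "\", no clash with a hex sequence.
    sub mnemonic_ok {
        my ($seq) = @_;
        return 0 unless length($seq) == 2;
        return 0 if substr($seq, 0, 1) eq '\\';
        return 0 if is_hex_sequence($seq);
        return 1;
    }

    print hex_mnemonic(0x8A), ' ', hex_mnemonic(0xAE), "\n";    # 8A aE
    print join(' ', map { mnemonic_ok($_) } 'ae', 'AE', 'Ff', 'aE', '\\x'), "\n";
    # prints "1 1 1 0 0"
</PRE>

<P>
Perl's built-in <CODE>hex</CODE> accepts hex digits in either case, so
<CODE>chr hex $seq</CODE> turns a sequence such as <CODE>aE</CODE> back into its character.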


<P><HR>
<A NAME="author">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
AUTHOR</FONT></H1>
</A>


<P>
Copyright (c) 1996, 1997 by Eryq / eryq@zeegee.com


<P>
All rights reserved.  This program is free software; you can redistribute 
it and/or modify it under the same terms as Perl itself.


<P><HR>
<A NAME="version">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
VERSION</FONT></H1>
</A>


<P>
$Revision: 4.102 $ $Date: 1997/12/14 08:51:50 $


<P><HR>
<SMALL>
		Last updated: Sat Jan 17 23:01:53 1998 <BR>
		Generated by pod2coolhtml 1.101.  Want a copy?  Just email
		<A HREF="mailto:eryq@enteract.com">eryq@enteract.com</A>.
		(Yes, it's free.)
		</SMALL></BODY>
</HTML>