
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<!-- Generated by pod2coolhtml 1.101
  -- Using Pod::CoolHTML 1.104 , (C) 1997 by Eryq (eryq@enteract.com).
  --
  -- DO NOT EDIT THIS HTML FILE! All your changes will be lost.
  -- Edit the POD or Perl file that was used to create it.
  -->
<HTML>

<HEAD>
<TITLE>MIME::ParserBase</TITLE>
</HEAD>
<BODY LINK=#C00000 ALINK=#FF2020 VLINK=#900000>
<A NAME="__top"> </A><CENTER><TABLE BORDER=2 CELLPADDING=2 WIDTH=100%>

<TR>
<TD WIDTH=25% ALIGN=CENTER><B><FONT SIZE=+1>
<A HREF="Tools.pm.html">MIME::Tools</A></FONT></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Body.pm.html">MIME::Body</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Decoder.pm.html">MIME::Decoder</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Entity.pm.html">MIME::Entity</A></SMALL></B></TD>

<TR>
<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Head.pm.html">MIME::Head</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="IO.pm.html">MIME::IO</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Latin1.pm.html">MIME::Latin1</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Parser.pm.html">MIME::Parser</A></SMALL></B></TD>

<TR>
<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
MIME::ParserBase</SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="ToolUtils.pm.html">MIME::ToolUtils</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Tools.pm.html">MIME::Tools</A></SMALL></B></TD>

<TD WIDTH=25% ALIGN=CENTER><B><SMALL>
<A HREF="Words.pm.html">MIME::Words</A></SMALL></B></TD>
</TABLE></CENTER>

<P><TABLE WIDTH="100%">

<TR VALIGN="TOP"><TD ALIGN="LEFT"><CENTER>
<H1><FONT SIZE=7 COLOR=#600020><B>MIME::<BR>ParserBase</B></FONT></H1><IMG SRC="mime-sm.gif" ALT="MIME!"></CENTER>
<TD>
<UL>
<LI><A HREF="#name">NAME</A>
</LI><LI><A HREF="#synopsis">SYNOPSIS</A>
</LI><LI><A HREF="#description">DESCRIPTION</A>
</LI><LI><A HREF="#public_interface">PUBLIC INTERFACE</A>
</LI><UL>
<LI><A HREF="#construction_and_setting_options">Construction, and setting options</A>
</LI><LI><A HREF="#parsing_messages">Parsing messages</A>
</LI></UL>
<LI><A HREF="#writing_subclasses">WRITING SUBCLASSES</A>
</LI><LI><A HREF="#notes">NOTES</A>
</LI><LI><A HREF="#warnings">WARNINGS</A>
</LI><LI><A HREF="#under_the_hood">UNDER THE HOOD</A>
</LI><LI><A HREF="#author">AUTHOR</A>
</LI><LI><A HREF="#version">VERSION</A>
</LI></UL>

</TABLE>

<P><HR>
<A NAME="name">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
NAME</FONT></H1>
</A>


<P>
MIME::ParserBase - abstract class for parsing MIME streams


<P><HR>
<A NAME="synopsis">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
SYNOPSIS</FONT></H1>
</A>


<P>
This is an <I>abstract</I> class; however, here's how one of its 
<I>concrete subclasses</I> is used:


<P>
<PRE>    # Create a new parser object:
    my $parser = new MIME::Parser;</PRE>



<P>
<PRE>    # Parse an input stream:
    $entity = $parser-&gt;read(\*STDIN) or die &quot;couldn't parse MIME stream&quot;;</PRE>



<P>
<PRE>    # Congratulations: you now have a (possibly multipart) MIME entity!
    $entity-&gt;dump_skeleton;          # for debugging </PRE>



<P>
There are also some convenience methods:


<P>
<PRE>    # Parse an in-core MIME message:
    $entity = $parser-&gt;parse_data($message)                 or die &quot;parse&quot;;</PRE>



<P>
<PRE>    # Parse a MIME message in a file:
    $entity = $parser-&gt;parse_in(&quot;/some/file.msg&quot;)           or die &quot;parse&quot;;</PRE>



<P>
<PRE>    # Parse a MIME message out of a pipeline:
    $entity = $parser-&gt;parse_in(&quot;gunzip - &lt; file.msg.gz |&quot;) or die &quot;parse&quot;;</PRE>



<P>
<PRE>    # Parse already-split input (as &quot;deliver&quot; would give it to you):
    $entity = $parser-&gt;parse_two(&quot;msg.head&quot;, &quot;msg.body&quot;)    or die &quot;parse&quot;;</PRE>



<P>
In case a parse fails, it's nice to know who sent it to us.  So...


<P>
<PRE>    # Parse an input stream:
    if (!($entity = $parser-&gt;read(\*STDIN))) {   # oops!
	$decapitated = $parser-&gt;last_head;          # get last top-level head
    }</PRE>



<P>
You can also alter the behavior of the parser:


<P>
<PRE>    # Parse contained &quot;message/rfc822&quot; objects as nested MIME streams:
    $parser-&gt;parse_nested_messages('REPLACE');</PRE>



<P>
<PRE>    # Automatically attempt to RFC-1522-decode the MIME headers:
    $parser-&gt;decode_headers(1);</PRE>



<P>
Cute stuff...


<P>
<PRE>    # Convert a Mail::Internet object to a MIME::Entity:
    @lines = (@{$mail-&gt;header}, &quot;\n&quot;, @{$mail-&gt;body});
    $entity = $parser-&gt;parse_data(\@lines);</PRE>



<P><HR>
<A NAME="description">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
DESCRIPTION</FONT></H1>
</A>


<P>
Where it all begins.


<P>
This is the class that contains all the knowledge for <I>parsing</I> MIME
streams.  It's an abstract class, containing no methods governing
the <I>output</I> of the parsed entities: such methods belong in the
concrete subclasses.


<P>
You can inherit from this class to create your own subclasses 
that parse MIME streams into MIME::Entity objects.  One such subclass, 
<B>MIME::Parser</B>, is already provided in this kit.  I strongly suggest
you base your application classes on MIME::Parser rather than on this class.


<P><HR>
<A NAME="public_interface">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
PUBLIC INTERFACE</FONT></H1>
</A>


<P><HR>
<A NAME="construction_and_setting_options">
<H2><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h2bullet.gif" ALT="" BORDER="0"></A>
Construction, and setting options</FONT></H2>
</A>

<DL>
<P><DT><B><A NAME="new">new ARGS...</A></B><DD>

<I>Class method.</I>
Create a new parser object.  Passes any subsequent arguments
on to the <CODE>init()</CODE> method.


<P>
Once you create a parser object, you can then set up various parameters
before doing the actual parsing.  Here's an example using one of our
concrete subclasses:


<P>
<PRE>    my $parser = new MIME::Parser;
    $parser-&gt;output_dir(&quot;/tmp&quot;);
    $parser-&gt;output_prefix(&quot;msg1&quot;);
    my $entity = $parser-&gt;read(\*STDIN);</PRE>



<P><DT><B><A NAME="decode_headers">decode_headers ONOFF</A></B><DD>

<I>Instance method.</I>
If set true, then the parser will attempt to decode the MIME headers
as per RFC-1522 the moment it sees them.  This will probably be of
most use to those of you who expect some international mail,
especially mail from individuals with 8-bit characters in their names.


<P>
If set false, no attempt at decoding will be done.


<P>
With no argument, just returns the current setting.


<P>
<B>Warning:</B> some folks already have code which assumes that no decoding
is done, and since this is pretty new and radical stuff, I have
initially made &quot;off&quot; the default setting for backwards compatibility in 2.05.
However, I will possibly change this in future releases, so <I>please:</I>
if you want a particular setting, declare it when you create your parser
object.
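

<P>
For a feel of what RFC-1522 decoding (RFC-1522 has since been superseded by
RFC 2047) does to a header, here is a minimal sketch that does <I>not</I> use
this class at all.  It assumes the core <CODE>Encode</CODE> module and its
<CODE>MIME-Header</CODE> encoding, and the sample address is made up; the
parser's own decoding may differ in detail.

```perl
use strict;
use warnings;
use Encode qw(decode);

# An RFC-1522 "encoded word": charset ISO-8859-1, "Q" encoding.
# The name and address are invented for this example.
my $raw = 'From: =?ISO-8859-1?Q?Andr=E9?= <andre@example.com>';

# Encode's "MIME-Header" encoding turns encoded words into Perl characters
# and leaves the rest of the line alone.
my $decoded = decode('MIME-Header', $raw);

binmode(STDOUT, ':encoding(UTF-8)');
print "$decoded\n";
```

<P>
The <CODE>=E9</CODE> in the encoded word comes out as the single character
e-acute, which is the kind of 8-bit name this option is meant to help with.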


<P><DT><B><A NAME="interface">interface ROLE,[VALUE]</A></B><DD>

<I>Instance method.</I>
During parsing, the parser normally creates instances of certain classes, 
like MIME::Entity.  However, you may want to create a parser subclass
that uses your own experimental head, entity, etc. classes (for example,
your &quot;head&quot; class may provide some additional MIME-field-oriented methods).


<P>
If so, then this is the method that your subclass should invoke during 
init.  Use it like this:


<P>
<PRE>    package MyParser;
    @ISA = qw(MIME::Parser);
    ...
    sub init {
	my $self = shift;
	$self-&gt;SUPER::init(@_);        # do my parent's init
        $self-&gt;interface(ENTITY_CLASS =&gt; 'MIME::MyEntity');
	$self-&gt;interface(HEAD_CLASS   =&gt; 'MIME::MyHead');
	$self;                         # return
    }</PRE>



<P>
With no VALUE, returns the VALUE currently associated with that ROLE.
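

<P>
The get/set behavior described above can be sketched with a small table of
roles.  This is only an illustration: <CODE>My::RoleTable</CODE> is a
hypothetical class, the default class names are assumptions, and what the real
method returns when a VALUE <I>is</I> given is not stated here.

```perl
use strict;
use warnings;

package My::RoleTable;    # hypothetical class, for illustration only

sub new {
    my $class = shift;
    # Assumed defaults, named after classes mentioned in this document.
    my $self = { ENTITY_CLASS => 'MIME::Entity', HEAD_CLASS => 'MIME::Head' };
    return bless $self, $class;
}

# interface ROLE,[VALUE]: store VALUE for ROLE if one is given, then
# return whatever is currently associated with ROLE.
sub interface {
    my ($self, $role, $value) = @_;
    $self->{$role} = $value if defined $value;
    return $self->{$role};
}

package main;

my $table = My::RoleTable->new;
$table->interface(HEAD_CLASS => 'MIME::MyHead');   # set
print $table->interface('HEAD_CLASS'), "\n";       # get: the new value
print $table->interface('ENTITY_CLASS'), "\n";     # get: still the default
```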


<P><DT><B><A NAME="last_head">last_head</A></B><DD>

<I>Instance method.</I>
Return the top-level MIME header of the last stream we attempted to parse.
This is useful for replying to people who sent us bad MIME messages.


<P>
<PRE>    # Parse an input stream:
    $entity = $parser-&gt;read(\*STDIN);
    if (!$entity) {           # oops!
	my $decapitated = $parser-&gt;last_head;    # last top-level head
    }</PRE>



<P><DT><B><A NAME="parse_nested_messages">parse_nested_messages OPTION</A></B><DD>

<I>Instance method.</I>
Some MIME messages will contain a part of type <CODE>message/rfc822</CODE>:
literally, the text of an embedded mail/news/whatever message.  
The normal behavior is to save such a message just as if it were a 
<CODE>text/plain</CODE> document, without attempting to decode it.  However, you can 
change this: before parsing, invoke this method with the OPTION you want:


<P>
<B>If OPTION is false,</B> the normal behavior will be used.


<P>
<B>If OPTION is true,</B> the body of the <CODE>message/rfc822</CODE> part
is decoded (after all, it might be encoded!) into a temporary filehandle, 
which is then rewound and parsed by this parser, creating an 
entity object.  What happens then is determined by the OPTION:

<DL>
<P><DT><B><A NAME="nest">NEST or 1</A></B><DD>

The contained message becomes a &quot;part&quot; of the <CODE>message/rfc822</CODE> entity,
as though the <CODE>message/rfc822</CODE> were a special kind of <CODE>multipart</CODE> entity.
However, the <CODE>message/rfc822</CODE> header (and the content-type) <I>is retained.</I>


<P>
<B>Warning:</B> since it is not legal MIME for anything but <CODE>multipart</CODE>
to have a &quot;part&quot;, the <CODE>message/rfc822</CODE> message <I>will appear to 
have no content</I> if you simply <CODE>print()</CODE> it out.  You will have to 
get at the reparsed body manually, by the <CODE>MIME::Entity::parts()</CODE> method.


<P>
IMHO, this option is probably only useful if you're <I>processing</I> messages,
but <I>not</I> saving or re-sending them.  If you do need to save or re-send
them, it is best to <I>not</I> use &quot;parse nested&quot; at all.


<P><DT><B><A NAME="replace">REPLACE</A></B><DD>

The contained message replaces the <CODE>message/rfc822</CODE> entity, as though
the <CODE>message/rfc822</CODE> &quot;envelope&quot; never existed.


<P>
<B>Warning:</B> notice that, with this option, all the header information 
in the <CODE>message/rfc822</CODE> header is lost.  This might seriously bother
you if you're dealing with a top-level message, and you've just lost
the sender's address and the subject line.  <CODE>:-/</CODE>.

</DL>


<P>
<I>Thanks to Andreas Koenig for suggesting this method.</I>

</DL>


<P><HR>
<A NAME="parsing_messages">
<H2><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h2bullet.gif" ALT="" BORDER="0"></A>
Parsing messages</FONT></H2>
</A>

<DL>
<P><DT><B><A NAME="parse_data">parse_data DATA</A></B><DD>

<I>Instance method.</I>
Parse a MIME message that's already in-core.  You may supply the DATA 
in any of a number of ways...

<UL>

<P><LI><B>A scalar</B> which holds the message.


<P><LI><B>A ref to a scalar</B> which holds the message.  This is an efficiency hack.


<P><LI><B>A ref to an array of scalars.</B>  They are treated as a stream
which (conceptually) consists of simply concatenating the scalars.

</UL>


<P>
Returns a MIME::Entity, which may be a single entity, or an 
arbitrarily-nested multipart entity.  Returns undef on failure.
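

<P>
To make the three DATA forms concrete, here is a minimal sketch of how a
routine might reduce any of them to a single string.  The helper name
<CODE>data_to_string</CODE> is hypothetical; this is not the code used by
<CODE>parse_data()</CODE> itself.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a scalar, a ref to a scalar, or a ref to an
# array of scalars, and return the message as one string.
sub data_to_string {
    my ($data) = @_;
    return $data            unless ref $data;           # plain scalar
    return $$data           if ref $data eq 'SCALAR';   # ref to a scalar
    return join('', @$data) if ref $data eq 'ARRAY';    # ref to an array
    die "unsupported DATA type: " . ref($data) . "\n";
}

my $msg   = "Content-type: text/plain\n\nHello\n";
my @lines = ("Content-type: text/plain\n", "\n", "Hello\n");

# All three forms describe the same stream.
my $same = data_to_string($msg)  eq data_to_string(\$msg)
        && data_to_string(\$msg) eq data_to_string(\@lines);
print $same ? "same stream\n" : "different streams\n";
```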


<P>
<B>Note:</B> where the parsed body parts are stored (e.g., in-core vs. on-disk)
is not determined by this class, but by the subclass you use to do the 
actual parsing (e.g., MIME::Parser).  For efficiency, if you know you'll 
be parsing a small amount of data, it is probably best to tell the parser 
to store the parsed parts in core.  For example, here's a short test 
program, using MIME::Parser:


<P>
<PRE>        use MIME::Parser;</PRE>



<P>
<PRE>        my $msg = &lt;&lt;EOF;
    Content-type: text/html
    Content-transfer-encoding: 7bit</PRE>



<P>
<PRE>    &lt;H1&gt;Hello, world!&lt;/H1&gt;;</PRE>



<P>
<PRE>    EOF
        $parser = new MIME::Parser;
        $parser-&gt;output_to_core('ALL');
        $entity = $parser-&gt;parse_data($msg);
        $entity-&gt;print(\*STDOUT);</PRE>



<P><DT><B><A NAME="parse_in">parse_in EXPR</A></B><DD>

<I>Instance method.</I>
Convenience front-end onto <CODE>read()</CODE>.
Simply give this method any expression that may be sent as the second
argument to open() to open a filehandle for reading.


<P>
Returns the parsed entity, or undef on error.
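

<P>
The phrase &quot;any expression that may be sent as the second argument to
open()&quot; refers to Perl's 2-argument <CODE>open()</CODE>, where the
expression can be a plain path or a command ending in <CODE>|</CODE>.  A
minimal sketch, using a hypothetical helper <CODE>open_expr</CODE> and a
throwaway file (this is not the method's actual code):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open EXPR for reading the way 2-argument open() does.
# EXPR may be a plain path or a command ending in "|".
# Caution: 2-argument open() interprets special characters in EXPR, so
# never hand it untrusted input.
sub open_expr {
    my ($expr) = @_;
    open(my $fh, $expr) or return;
    return $fh;
}

# Demonstrate with a throwaway file holding a tiny message.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "Content-type: text/plain\n\nHello\n";
close $out;

my $in = open_expr($path) or die "can't open $path: $!";
my @lines = <$in>;
close $in;
print scalar(@lines), " lines read\n";
```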


<P><DT><B><A NAME="parse_two">parse_two HEADFILE, BODYFILE</A></B><DD>

<I>Instance method.</I>
Convenience front-end onto <CODE>parse_in()</CODE>, intended for programs 
running under mail-handlers like <B>deliver</B>, which splits the incoming
mail message into a header file and a body file.
Simply give this method the paths to the respective files.


<P>
<B>Warning:</B> it is assumed that, once the files are cat'ed together,
there will be a blank line separating the head part and the body part.


<P>
<B>Warning:</B> the new implementation slurps both files into an array
of lines for portability, instead of using 'cat'.  This may be an issue if 
your messages are large.


<P>
Returns the parsed entity, or undef on error.
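

<P>
The slurping described in the warning above can be pictured roughly as
follows.  This is a sketch only: <CODE>slurp_two</CODE> is a hypothetical
helper and the two files are throwaway stand-ins for what a mail-handler
would provide.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read the head file, then the body file, into one
# array of lines (returned by reference); returns nothing on failure.
sub slurp_two {
    my ($headfile, $bodyfile) = @_;
    my @lines;
    for my $path ($headfile, $bodyfile) {
        open(my $fh, '<', $path) or return;
        push @lines, <$fh>;
        close $fh;
    }
    return \@lines;
}

# Throwaway files standing in for the split message.
# Note the blank line that ends the head file.
my ($hfh, $head) = tempfile(UNLINK => 1);
print {$hfh} "Subject: test\nContent-type: text/plain\n\n";
close $hfh;
my ($bfh, $body) = tempfile(UNLINK => 1);
print {$bfh} "Hello\n";
close $bfh;

my $lines = slurp_two($head, $body) or die "can't read files: $!";
print @$lines;
```

<P>
The resulting array reference has the same shape as the &quot;ref to an array
of scalars&quot; form that <CODE>parse_data()</CODE> accepts.  Because every
line is held in memory at once, the cost grows with message size.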


<P><DT><B><A NAME="read">read INSTREAM</A></B><DD>

<I>Instance method.</I>
Takes a MIME-stream and splits it into its component entities,
each of which is decoded and stored as the concrete subclass directs
(for example, MIME::Parser can place each one in a separate file in its
output_dir()).


<P>
The INSTREAM can be given as a readable FileHandle, 
a globref'd filehandle (like <CODE>\*STDIN</CODE>),
or as <I>any</I> blessed object conforming to the IO:: interface.


<P>
Returns a MIME::Entity, which may be a single entity, or an 
arbitrarily-nested multipart entity.  Returns undef on failure.

</DL>


<P><HR>
<A NAME="writing_subclasses">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
WRITING SUBCLASSES</FONT></H1>
</A>


<P>
All you have to do to write a subclass is to provide or override
the following methods:

<DL>
<P><DT><B><A NAME="init">init ARGS...</A></B><DD>

<I>Instance method, private.</I>
Initialize the new parser object, with any args passed to <CODE>new()</CODE>.


<P>
You don't <I>need</I> to override this in your subclass.
If you override it, however, make sure you call the inherited
method to init your parents!


<P>
<PRE>    package MyParser;
    @ISA = qw(MIME::ParserBase);
    ...
    sub init {
	my $self = shift;
	$self-&gt;SUPER::init(@_);        # do my parent's init</PRE>



<P>
<PRE>	# ...my init stuff goes here...	</PRE>



<P>
<PRE>	$self;                         # return
    }</PRE>



<P>
Should return the self object on success, and undef on failure.


<P><DT><B><A NAME="new_body_for">new_body_for HEAD</A></B><DD>

<I>Abstract instance method.</I>
Based on the HEAD of a part we are parsing, return a new
body object (any desirable subclass of MIME::Body) for
receiving that part's data (both will be put into the
&quot;entity&quot; object for that part).


<P>
If you want the parser to do something other than write 
its parts out to files, you should override this method 
in a subclass.  For an example, see <B>MIME::Parser</B>.


<P>
<B>Note:</B> the reason that we don't use the &quot;interface&quot; mechanism
for this is that your choice of (1) which body class to use, and (2) how 
its <CODE>new()</CODE> method is invoked, may be very much based on the 
information in the header.
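

<P>
A minimal sketch of the abstract-method pattern described above.  All names
here are hypothetical: <CODE>My::ParserBase</CODE> and
<CODE>My::CoreParser</CODE> stand in for the real classes, the HEAD is a
plain hash ref rather than a head object, and the &quot;body&quot; is a plain
hash ref rather than a MIME::Body subclass.

```perl
use strict;
use warnings;

package My::ParserBase;      # hypothetical stand-in for the abstract class

sub new { my $class = shift; return bless {}, $class }

# Abstract: a concrete subclass must supply this.
sub new_body_for {
    my ($self, $head) = @_;
    die ref($self) . " must override new_body_for()\n";
}

package My::CoreParser;      # hypothetical concrete subclass
our @ISA = ('My::ParserBase');

# Choose a body based on the header.  Keeping text in core and everything
# else on disk is an arbitrary policy picked for this example.
sub new_body_for {
    my ($self, $head) = @_;
    my $type = lc($head->{'content-type'} || 'text/plain');
    my $kind = $type =~ m{^text/} ? 'in-core' : 'on-disk';
    return { kind => $kind, type => $type };
}

package main;

my $parser = My::CoreParser->new;
my $body   = $parser->new_body_for({ 'content-type' => 'image/gif' });
print "$body->{kind}\n";
```

<P>
The choice depends on the header passed in, which is why a fixed
role-to-class table would not be enough for this decision.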

</DL>


<P>
You are of course free to override any other methods as you see
fit, like <CODE>new</CODE>.


<P><HR>
<A NAME="notes">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
NOTES</FONT></H1>
</A>


<P>
<B>This is an abstract class.</B>
If you actually want to parse a MIME stream, use one of the children
of this class, like the backwards-compatible MIME::Parser.


<P><HR>
<A NAME="warnings">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
WARNINGS</FONT></H1>
</A>

<DL>
<P><DT><B><A NAME="multipart">Multipart messages are always read line-by-line</A></B><DD>

Multipart document parts are read line-by-line, so that the
encapsulation boundaries may easily be detected.  However, bad MIME
composition agents (for example, naive CGI scripts) might return
multipart documents where the parts are, say, unencoded bitmap
files... and, consequently, where such &quot;lines&quot; might be 
veeeeeeeeery long indeed.


<P>
A better solution for this case would be to set up some form of 
state machine for input processing.  This will be left for future versions.


<P><DT><B><A NAME="multipart">Multipart parts read into temp files before decoding</A></B><DD>

In my original implementation, the MIME::Decoder classes had to be aware
of encapsulation boundaries in multipart MIME documents.
While this decode-while-parsing approach obviated the need for 
temporary files, it resulted in inflexible and complex decoder
implementations.


<P>
The revised implementation uses a temporary file (a la <CODE>tmpfile()</CODE>)
during parsing to hold the <I>encoded</I> portion of the current MIME 
document or part.  This file is deleted automatically after the
current part is decoded and the data is written to the &quot;body stream&quot;
object; you'll never see it, and should never need to worry about it.
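

<P>
The spool-then-decode idea can be sketched with core modules only.  This is
an illustration under assumptions, not the parser's code: the helper
<CODE>spool_and_decode</CODE> is hypothetical, base64 is picked as the example
encoding, and the decoded data is simply collected in a string rather than
written to a body stream object.

```perl
use strict;
use warnings;
use IO::File;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: spool encoded lines to an anonymous temp file,
# rewind it, then decode from the file line by line.
sub spool_and_decode {
    my (@encoded_lines) = @_;
    my $tmp = IO::File->new_tmpfile or die "can't create temp file: $!";
    print {$tmp} @encoded_lines;
    seek($tmp, 0, 0) or die "can't rewind temp file: $!";
    my $decoded = '';
    while (defined(my $line = <$tmp>)) {
        # Each line made by encode_base64() decodes on its own.
        $decoded .= decode_base64($line);
    }
    close $tmp;    # the anonymous temp file goes away with its handle
    return $decoded;
}

my $original = "Hello, world!\n" x 10;
my $encoded  = encode_base64($original);
my $decoded  = spool_and_decode(split /^/, $encoded);
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```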


<P>
Some folks have asked for the ability to bypass this temp-file
mechanism, I suppose because they assume it would slow down their application.
I considered accommodating this wish, but the temp-file
approach solves a lot of thorny problems in parsing, and it also
protects against hidden bugs in user applications (what if you've
directed the encoded part into a scalar, and someone unexpectedly
sends you a 6 MB tar file?).  Finally, I'm just not convinced that 
the temp-file use adds significant overhead.


<P><DT><B><A NAME="fuzzing">Fuzzing of CRLF and newline on input</A></B><DD>

RFC-1521 dictates that MIME streams have lines terminated by CRLF
(<CODE>&quot;\r\n&quot;</CODE>).  However, it is extremely likely that folks will want to 
parse MIME streams where each line ends in the local newline 
character <CODE>&quot;\n&quot;</CODE> instead.


<P>
An attempt has been made to allow the parser to handle both CRLF 
and newline-terminated input.
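

<P>
One simple way to accept both line endings, shown here as a sketch with a
hypothetical helper <CODE>chomp_any</CODE> (the parser may handle this
differently), is to strip an optional carriage return together with the
newline:

```perl
use strict;
use warnings;

# Hypothetical helper: remove one line terminator, whether it is CRLF
# or a bare newline, and return the rest of the line.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

for my $line ("Content-type: text/plain\r\n", "Content-type: text/plain\n") {
    print chomp_any($line), "\n";
}
```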


<P><DT><B><A NAME="fuzzing">Fuzzing of CRLF and newline on output</A></B><DD>

The <CODE>&quot;7bit&quot;</CODE> and <CODE>&quot;8bit&quot;</CODE> decoders will decode both
a <CODE>&quot;\n&quot;</CODE> and a <CODE>&quot;\r\n&quot;</CODE> end-of-line sequence into a <CODE>&quot;\n&quot;</CODE>.


<P>
The <CODE>&quot;binary&quot;</CODE> decoder (default if no encoding specified) 
still outputs stuff verbatim... so a MIME message with CRLFs 
and no explicit encoding will be output as a text file 
that, on many systems, will have an annoying ^M at the end of
each line... <I>but this is as it should be</I>.


<P><DT><B><A NAME="inability">Inability to handle multipart boundaries that contain newlines</A></B><DD>

First, let's get something straight: <I>this is an evil, EVIL practice,</I>
and is incompatible with RFC-1521... hence, it's not valid MIME.


<P>
If your mailer creates multipart boundary strings that contain
newlines <I>when they appear in the message body,</I> give it two weeks' notice 
and find another one.  If your mail robot receives MIME mail like this, 
regard it as syntactically incorrect MIME, which it is.


<P>
Why do I say that?  Well, in RFC-1521, the syntax of a boundary is 
given quite clearly:


<P>
<PRE>      boundary := 0*69&lt;bchars&gt; bcharsnospace</PRE>



<P>
<PRE>      bchars := bcharsnospace / &quot; &quot;</PRE>



<P>
<PRE>      bcharsnospace :=    DIGIT / ALPHA / &quot;'&quot; / &quot;(&quot; / &quot;)&quot; / &quot;+&quot; /&quot;_&quot;
                   / &quot;,&quot; / &quot;-&quot; / &quot;.&quot; / &quot;/&quot; / &quot;:&quot; / &quot;=&quot; / &quot;?&quot;</PRE>



<P>
All of which means that a valid boundary string <I>cannot</I> have 
newlines in it, and any newlines in such a string in the message header
are expected to be solely the result of <I>folding</I> the string (i.e.,
inserting to-be-removed newlines for readability and line-shortening 
<I>only</I>).
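

<P>
The grammar above translates fairly directly into a regular expression.  The
following is a sketch; <CODE>is_valid_boundary</CODE> is a hypothetical helper
and not part of this class.  It expects the boundary string as it stands after
any header folding has been undone.

```perl
use strict;
use warnings;

# bcharsnospace, as a character class.
my $bcharsnospace = qr{[0-9A-Za-z'()+_,\-./:=?]};
# bchars := bcharsnospace / " "
my $bchars = qr{(?:$bcharsnospace|[ ])};
# boundary := 0*69<bchars> bcharsnospace
my $boundary_re = qr/\A(?:$bchars){0,69}$bcharsnospace\z/;

# Hypothetical helper: 1 if the string fits the grammar, 0 otherwise.
sub is_valid_boundary {
    my ($boundary) = @_;
    return $boundary =~ $boundary_re ? 1 : 0;
}

print is_valid_boundary("----ABC-123----"), "\n";       # fits the grammar
print is_valid_boundary("----ABC-\n 123----"), "\n";    # contains a newline
print is_valid_boundary("ends with a space "), "\n";    # last char is a space
```

<P>
Only the first of the three strings is accepted: a newline is not among the
permitted characters, and the final character may not be a space.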


<P>
Yet, there is at least one brain-damaged user agent out there 
that composes mail like this:


<P>
<PRE>      MIME-Version: 1.0
      Content-type: multipart/mixed; boundary=&quot;----ABC-
       123----&quot;
      Subject: Hi... I'm a dork!</PRE>



<P>
<PRE>      This is a multipart MIME message (yeah, right...)</PRE>



<P>
<PRE>      ----ABC-
       123----</PRE>



<P>
<PRE>      Hi there! </PRE>



<P>
We have <I>got</I> to discourage practices like this (and the recent file
upload idiocy where binary files that are part of a multipart MIME
message aren't base64-encoded) if we want MIME to stay relatively 
simple, and MIME parsers to be relatively robust.


<P>
<I>Thanks to Andreas Koenig for bringing a baaaaaaaaad user agent to
my attention.</I>


<P><DT><B><A NAME="untested">Untested &quot;binmode&quot; calls</A></B><DD>

New, untested binmode() calls were added in module version 1.11... 
if binmode() is <I>not</I> a NOOP on your system, please pay careful attention 
to your output, and report <I>any</I> anomalies.  
<I>It is possible that &quot;make test&quot; will fail on such systems,</I> 
since some of the tests involve checking the sizes of the output files.
That doesn't necessarily indicate a problem.


<P>
<B>If anyone</B> wants to test out this package's handling of both binary
and textual email on a system where binmode() is not a NOOP, I would be 
most grateful.  If stuff breaks, send me the pieces (including the 
original email that broke it, and at the very least a description
of how the output was screwed up).

</DL>


<P><HR>
<A NAME="under_the_hood">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
UNDER THE HOOD</FONT></H1>
</A>


<P>
RFC-1521 gives us the following BNF grammar for the body of a
multipart MIME message:


<P>
<PRE>      multipart-body  := preamble 1*encapsulation close-delimiter epilogue</PRE>



<P>
<PRE>      encapsulation   := delimiter body-part CRLF</PRE>



<P>
<PRE>      delimiter       := &quot;--&quot; boundary CRLF 
                                   ; taken from Content-Type field.
                                   ; There must be no space between &quot;--&quot; 
                                   ; and boundary.</PRE>



<P>
<PRE>      close-delimiter := &quot;--&quot; boundary &quot;--&quot; CRLF 
                                   ; Again, no space by &quot;--&quot;</PRE>



<P>
<PRE>      preamble        := discard-text   
                                   ; to be ignored upon receipt.</PRE>



<P>
<PRE>      epilogue        := discard-text   
                                   ; to be ignored upon receipt.</PRE>



<P>
<PRE>      discard-text    := *(*text CRLF)</PRE>



<P>
<PRE>      body-part       := &lt;&quot;message&quot; as defined in RFC 822, with all 
                          header fields optional, and with the specified 
                          delimiter not occurring anywhere in the message 
                          body, either on a line by itself or as a substring 
                          anywhere.  Note that the semantics of a part 
                          differ from the semantics of a message, as 
                          described in the text.&gt;</PRE>



<P>
From this we glean the following algorithm for parsing a MIME stream:


<P>
<PRE>    PROCEDURE parse
    INPUT
        A FILEHANDLE for the stream.
        An optional end-of-stream OUTER_BOUND (for a nested multipart message).</PRE>



<P>
<PRE>    RETURNS
        The (possibly-multipart) ENTITY that was parsed.
        A STATE indicating how we left things: &quot;EOF&quot;, &quot;DELIM&quot;, &quot;CLOSE&quot;, or &quot;ERROR&quot;.</PRE>



<P>
<PRE>    BEGIN   
        LET OUTER_DELIM = &quot;--OUTER_BOUND&quot;.
        LET OUTER_CLOSE = &quot;--OUTER_BOUND--&quot;.</PRE>



<P>
<PRE>        LET ENTITY = a new MIME entity object.
        LET STATE  = &quot;OK&quot;.</PRE>



<P>
<PRE>        Parse the (possibly empty) header, up to and including the
        blank line that terminates it.   Store it in the ENTITY.</PRE>



<P>
<PRE>        IF the MIME type is &quot;multipart&quot;:
            LET INNER_BOUND = get multipart &quot;boundary&quot; from header.
            LET INNER_DELIM = &quot;--INNER_BOUND&quot;.
            LET INNER_CLOSE = &quot;--INNER_BOUND--&quot;.</PRE>



<P>
<PRE>            Parse preamble:
                REPEAT:
                    Read (and discard) next line
                UNTIL (line is INNER_DELIM) OR we hit EOF (error).</PRE>



<P>
<PRE>            Parse parts:
                REPEAT:
                    LET (PART, STATE) = parse(FILEHANDLE, INNER_BOUND).
                    Add PART to ENTITY.
                UNTIL (STATE != &quot;DELIM&quot;).</PRE>



<P>
<PRE>            Parse epilogue:
                REPEAT (to parse epilogue): 
                    Read (and discard) next line
                UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF
                LET STATE = &quot;EOF&quot;, &quot;DELIM&quot;, or &quot;CLOSE&quot; accordingly.</PRE>



<P>
<PRE>        ELSE (if the MIME type is not &quot;multipart&quot;):
            Open output destination (e.g., a file)</PRE>



<P>
<PRE>            DO:
                Read, decode, and output data from FILEHANDLE
            UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF.
            LET STATE = &quot;EOF&quot;, &quot;DELIM&quot;, or &quot;CLOSE&quot; accordingly.</PRE>



<P>
<PRE>        ENDIF</PRE>



<P>
<PRE>        RETURN (ENTITY, STATE).
    END</PRE>



<P>
For reasons discussed in MIME::Entity, we can't just discard the 
&quot;discard text&quot;: some mailers actually put data in the preamble.
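

<P>
The procedure above can be turned into a short, heavily simplified Perl
sketch.  Assumptions to keep in mind: entities are plain hash refs rather than
MIME::Entity objects, nothing is decoded, folded headers are not handled,
bodies are kept in core, and the preamble and epilogue are simply discarded
(which, as just noted, the real parser cannot afford to do).

```perl
use strict;
use warnings;

# Classify a line against a boundary: EOF, DELIM, CLOSE, or plain DATA.
sub classify {
    my ($line, $bound) = @_;
    return 'EOF'  unless defined $line;
    return 'DATA' unless defined $bound;
    (my $text = $line) =~ s/\r?\n\z//;
    return 'DELIM' if $text eq "--$bound";
    return 'CLOSE' if $text eq "--$bound--";
    return 'DATA';
}

# parse(FILEHANDLE, [OUTER_BOUND]) returns (ENTITY, STATE).
sub parse {
    my ($fh, $outer_bound) = @_;
    my $entity = { head => {}, body => '', parts => [] };
    my $state;

    # Parse the (possibly empty) header, through its terminating blank line.
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        $entity->{head}{lc $1} = $2 if $line =~ /^([^:\s]+):\s*(.*)$/;
    }

    my $type = $entity->{head}{'content-type'} || 'text/plain';
    if ($type =~ m{^multipart/}i) {
        my ($inner) = $type =~ /boundary="?([^";]+)"?/i
            or return ($entity, 'ERROR');

        # Preamble: discard lines up to the first inner delimiter.
        while (1) {
            my $s = classify(scalar(<$fh>), $inner);
            last if $s eq 'DELIM';
            return ($entity, 'ERROR') if $s eq 'EOF';
        }

        # Parts: recurse until a part ends in something other than DELIM.
        do {
            (my $part, $state) = parse($fh, $inner);
            push @{ $entity->{parts} }, $part;
        } until $state ne 'DELIM';
        return ($entity, 'ERROR') if $state eq 'ERROR';

        # Epilogue: discard lines up to the outer boundary or EOF.
        do { $state = classify(scalar(<$fh>), $outer_bound) }
            while $state eq 'DATA';
    }
    else {
        # Single part: collect lines up to the outer boundary or EOF.
        while (1) {
            my $line = <$fh>;
            $state = classify($line, $outer_bound);
            last if $state ne 'DATA';
            $entity->{body} .= $line;
        }
    }
    return ($entity, $state);
}

# A made-up two-part message, parsed from an in-core filehandle.
my $msg = join '', map { "$_\n" } (
    'Content-type: multipart/mixed; boundary="XYZ"', '',
    'This is the preamble.',
    '--XYZ', 'Content-type: text/plain', '', 'Hello',
    '--XYZ', 'Content-type: text/x-note', '', 'Goodbye',
    '--XYZ--',
    'This is the epilogue.',
);
open(my $fh, '<', \$msg) or die "can't open in-core message: $!";
my ($entity, $state) = parse($fh);
printf "%s: %d parts\n", $state, scalar @{ $entity->{parts} };
```

<P>
The recursion mirrors the pseudocode: each part is parsed with the enclosing
multipart's boundary as its OUTER_BOUND, and the STATE handed back tells the
caller whether another part follows (&quot;DELIM&quot;) or the multipart has
ended.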


<P><HR>
<A NAME="author">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
AUTHOR</FONT></H1>
</A>


<P>
Copyright (c) 1996, 1997 by Eryq / eryq@zeegee.com


<P>
All rights reserved.  This program is free software; you can redistribute 
it and/or modify it under the same terms as Perl itself.


<P><HR>
<A NAME="version">
<H1><FONT COLOR=#600020>
<A HREF="#__top"><IMG SRC="h1bullet.gif" ALT="" BORDER="0"></A>
VERSION</FONT></H1>
</A>


<P>
$Revision: 4.107 $ $Date: 1998/01/17 06:31:12 $


<P><HR>
<SMALL>
		Last updated: Sat Jan 17 23:01:59 1998 <BR>
		Generated by pod2coolhtml 1.101.  Want a copy?  Just email
		<A HREF="mailto:eryq@enteract.com">eryq@enteract.com</A>.
		(Yes, it's free.)
		</SMALL></BODY>
</HTML>