NAME

MIME::Parser - split MIME mail into decoded components

ALPHA-RELEASE WARNING

This code is in an evaluation phase until 1 August 1996. Depending on any comments/complaints received before this cutoff date, the interface may change in a non-backwards-compatible manner.

DESCRIPTION

This is where it all begins: MIME::Parser is how you parse MIME streams to obtain MIME::Entity objects.

SYNOPSIS

use MIME::Parser;

# Create a new parser object:
my $parser = MIME::Parser->new;

# Optional: set up parameters that will affect how it extracts 
#   documents from the input stream:
$parser->output_dir("$ENV{HOME}/mimemail");

# Parse an input stream:
my $entity = $parser->read(\*STDIN) or die "couldn't parse MIME stream";

# Congratulations: you now have a (possibly multipart) MIME entity!
$entity->dump_skeleton;          # for debugging

THE NITTY GRITTY

RFC-1521 gives us the following BNF grammar for the body of a multipart MIME message:

multipart-body  := preamble 1*encapsulation close-delimiter epilogue

encapsulation   := delimiter body-part CRLF

delimiter       := "--" boundary CRLF 
                             ; taken from Content-Type field.
                             ; There must be no space between "--" 
                             ; and boundary.

close-delimiter := "--" boundary "--" CRLF 
                             ; Again, no space by "--"

preamble        := discard-text   
                             ; to be ignored upon receipt.

epilogue        := discard-text   
                             ; to be ignored upon receipt.

discard-text    := *(*text CRLF)

body-part       := <"message" as defined in RFC 822, with all 
                    header fields optional, and with the specified 
                    delimiter not occurring anywhere in the message 
                    body, either on a line by itself or as a substring 
                    anywhere.  Note that the semantics of a part 
                    differ from the semantics of a message, as 
                    described in the text.>

From this we glean the following algorithm for parsing a MIME stream:

PROCEDURE parse
INPUT
    A FILEHANDLE for the stream.
    An optional end-of-stream OUTER_BOUND (for a nested multipart message).

RETURNS
    The (possibly-multipart) ENTITY that was parsed.
    A STATE indicating how we left things: "EOF", "DELIM", "CLOSE", or "ERROR".

BEGIN   
    LET OUTER_DELIM = "--OUTER_BOUND".
    LET OUTER_CLOSE = "--OUTER_BOUND--".

    LET ENTITY = a new MIME entity object.
    LET STATE  = "OK".

    Parse the (possibly empty) header, up to and including the
    blank line that terminates it.   Store it in the ENTITY.

    IF the MIME type is "multipart":
        LET INNER_BOUND = get multipart "boundary" from header.
        LET INNER_DELIM = "--INNER_BOUND".
        LET INNER_CLOSE = "--INNER_BOUND--".

        Parse preamble:
            REPEAT:
                Read next line, and save it as preamble text
            UNTIL (line is INNER_DELIM) OR we hit EOF (error).

        Parse parts:
            REPEAT:
                LET (PART, STATE) = parse(FILEHANDLE, INNER_BOUND).
                Add PART to ENTITY.
            UNTIL (STATE != "DELIM").

        Parse epilogue:
            REPEAT:
                Read next line, and save it as epilogue text
            UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF
            LET STATE = "EOF", "DELIM", or "CLOSE" accordingly.
 
    ELSE (if the MIME type is not "multipart"):
        Open output destination (e.g., a file)

        DO:
            Read, decode, and output data from FILEHANDLE
        UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF.
        LET STATE = "EOF", "DELIM", or "CLOSE" accordingly.

    ENDIF

    RETURN (ENTITY, STATE).
END

For reasons discussed in MIME::Entity, we can't just discard the "discard text": some mailers actually put data in the preamble.
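As a rough illustration, the procedure above can be sketched in Perl as one recursive function. This is not MIME::Parser's own code: parse_entity() is a hypothetical name, entities are plain hashes standing in for MIME::Entity objects, the header parser is minimal, bodies are kept as raw (undecoded) text, and the preamble and epilogue are kept rather than thrown away.

```perl
use strict;
use warnings;

# Sketch of PROCEDURE parse.  Returns ($entity, $state) where $state is
# "DELIM", "CLOSE", "EOF", or "ERROR".  Entities are plain hashes standing
# in for MIME::Entity; bodies are kept raw (no decoding), and the CRLF
# before a delimiter is left in the body for simplicity.
sub parse_entity {
    my ($fh, $outer_bound) = @_;
    my %entity = (head => {}, parts => [], preamble => [],
                  epilogue => [], body => '');

    # Header: up to and including the blank line; folded lines are joined.
    my $field;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        if ($line =~ /^\s/ && defined $field) {
            $entity{head}{$field} .= $line;
        } elsif ($line =~ /^([^:\s]+):\s*(.*)$/) {
            $field = lc $1;
            $entity{head}{$field} = $2;
        }
    }

    # Is this line OUTER_DELIM or OUTER_CLOSE?  (CRLF or LF accepted.)
    my $outer_state = sub {
        return unless defined $outer_bound;
        (my $bare = shift) =~ s/\r?\n\z//;
        return 'DELIM' if $bare eq "--$outer_bound";
        return 'CLOSE' if $bare eq "--$outer_bound--";
        return;
    };

    my $type = $entity{head}{'content-type'} || 'text/plain';
    if ($type =~ m{^\s*multipart/}i) {
        my ($inner) = $type =~ /boundary="?([^";\s]+)"?/i
            or return (\%entity, 'ERROR');

        # Preamble: read up to INNER_DELIM, keeping the lines.
        my $found = 0;
        while (defined(my $line = <$fh>)) {
            (my $bare = $line) =~ s/\r?\n\z//;
            if ($bare eq "--$inner") { $found = 1; last }
            push @{ $entity{preamble} }, $line;
        }
        return (\%entity, 'ERROR') unless $found;

        # Parts: recurse until a part ends on something other than DELIM.
        my $state;
        do {
            my $part;
            ($part, $state) = parse_entity($fh, $inner);
            push @{ $entity{parts} }, $part;
        } while ($state eq 'DELIM');
        return (\%entity, 'ERROR') if $state eq 'ERROR';

        # Epilogue: read up to OUTER_DELIM/OUTER_CLOSE or EOF, keeping lines.
        while (defined(my $line = <$fh>)) {
            my $s = $outer_state->($line);
            return (\%entity, $s) if $s;
            push @{ $entity{epilogue} }, $line;
        }
        return (\%entity, 'EOF');
    }

    # Single part: the body runs up to OUTER_DELIM/OUTER_CLOSE or EOF.
    while (defined(my $line = <$fh>)) {
        my $s = $outer_state->($line);
        return (\%entity, $s) if $s;
        $entity{body} .= $line;
    }
    return (\%entity, 'EOF');
}
```

Called on a well-formed top-level message as parse_entity(\*STDIN, undef), the sketch should return the top-level entity with state "EOF", since there is no outer boundary to stop at.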

PUBLIC INTERFACE

new

Create a new parser object. You can then set up various parameters before doing the actual parsing:

my $parser = MIME::Parser->new;
$parser->output_dir("/tmp");
$parser->output_prefix("msg1");
my $entity = $parser->read(\*STDIN);

output_dir OPTVALUE

Get/set the output directory for the parsing operation. This is the directory where the extracted and decoded body parts will go. The default is ".".

If OPTVALUE is not given, the current output directory is returned. If OPTVALUE is given, the output directory is set to the new value, and the previous value is returned.
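Note that setting a value returns the previous value, not the new one. The stand-in class below (SketchParser is hypothetical, not MIME::Parser's source) sketches that get/set convention; output_prefix() is documented to follow the same rule.

```perl
use strict;
use warnings;

package SketchParser;    # hypothetical stand-in, not MIME::Parser itself

sub new {
    my ($class) = @_;
    return bless { output_dir => '.' }, $class;
}

# With no argument, return the current value; with an argument,
# store it and return the previous value.
sub output_dir {
    my $self = shift;
    my $old  = $self->{output_dir};
    $self->{output_dir} = shift if @_;
    return $old;
}

package main;

my $p   = SketchParser->new;
my $old = $p->output_dir('/tmp');    # $old is ".", not "/tmp"
```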

output_path HEAD

Given the MIME head of a body part that is about to be extracted, return a suitable output pathname for the extracted file.

Normally, the "directory" portion will be the output_dir(), and the "filename" portion will be the recommended filename extracted from the MIME header (or some simple temporary file name, starting with the output_prefix(), if the header does not specify a filename).

If there is a recommended filename, but it is judged to be evil (if it is empty, or if it contains "/"s or ".."s or non-ASCII characters), then a warning is issued and the temporary file name is used in its place. This may be overly restrictive, so...

NOTE: If you don't like the behavior of this function, you can change it by installing your own routine. See output_path_hook() for details.

Thanks to Laurent Amon for pointing out problems with the original implementation, and for making some good suggestions.
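The filename rules above can be sketched as follows. Both helpers are hypothetical illustrations, not the module's own code, and the "<prefix>-<n>.tmp" temporary-name format is an assumption; the source only says the temporary name starts with output_prefix().

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper mirroring the rules above: a recommended filename
# is "evil" if it is empty, contains "/" or "..", or has non-ASCII chars.
sub is_evil_filename {
    my ($name) = @_;
    return 1 if !defined($name) || $name eq '';
    return 1 if index($name, '/') >= 0 || index($name, '..') >= 0;
    return 1 if $name =~ /[^\x00-\x7F]/;
    return 0;
}

# Hypothetical path chooser: use the recommended name when it is safe,
# otherwise warn and fall back to "<prefix>-<n>.tmp" (naming scheme assumed).
my $temp_count = 0;
sub sketch_output_path {
    my ($dir, $prefix, $recommended) = @_;
    if (defined($recommended) && is_evil_filename($recommended)) {
        warn "ignoring unsafe recommended filename '$recommended'\n";
        undef $recommended;
    }
    my $name = defined($recommended)
        ? $recommended
        : sprintf('%s-%d.tmp', $prefix, ++$temp_count);
    return File::Spec->catfile($dir, $name);
}
```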

output_path_hook SUBREF

Install a different function to generate the output filename for extracted message data. Declare it like this:

    sub my_output_path_hook {
        my $parser = shift;   # this MIME::Parser
        my $head = shift;     # the MIME::Head for the current message

        # Your code here: it must return a path that can be 
        # open()ed for writing.  Remember that you can ask the
        # $parser about the output_dir, and you can ask the
        # $head about the recommended_filename!
    }

And install it immediately before parsing the input stream, like this:

# Create a new parser object, and install my own output_path hook:
my $parser = MIME::Parser->new;
$parser->output_path_hook(\&my_output_path_hook);

# NOW we can parse an input stream:
my $entity = $parser->read(\*STDIN);
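For example, a hook that never trusts the sender's filename might simply number the extracted parts. This is a minimal sketch that relies only on the output_dir() and output_prefix() accessors documented here; the "prefix-N.dat" naming scheme and the $part_counter variable are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical hook: ignore any recommended filename and number the
# extracted parts instead (e.g. "msg-1.dat", "msg-2.dat", ...).
my $part_counter = 0;

sub my_output_path_hook {
    my ($parser, $head) = @_;    # $head is unused by this policy
    $part_counter++;
    my $name = sprintf('%s-%d.dat', $parser->output_prefix, $part_counter);
    return File::Spec->catfile($parser->output_dir, $name);
}
```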

output_prefix OPTVALUE

Get/set the output prefix for the parsing operation. This is a short string that all filenames for extracted and decoded body parts will begin with. The default is "msg".

If OPTVALUE is not given, the current output prefix is returned. If OPTVALUE is given, the output prefix is set to the new value, and the previous value is returned.

parse_two HEADFILE BODYFILE

Convenience front-end onto read(), intended for programs running under mail-handlers like deliver, which splits the incoming mail message into a header file and a body file.

Simply give this method the paths to the respective files. These must be pathnames: Perl "open-able" expressions won't work, since the pathnames are shell-quoted for safety.

WARNING: it is assumed that, once the files are cat'ed together, there will be a blank line separating the head part and the body part.
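To see why an open()-style expression such as "gzip -dc mail.gz |" cannot be passed in place of a path, here is a sketch of conventional single-quote shell quoting. It assumes the two files are joined by running cat through /bin/sh; shell_quote() is a hypothetical helper, not MIME::Parser's own routine. Once quoted, the whole expression is treated as one literal (and almost certainly nonexistent) filename.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a path in single quotes for /bin/sh,
# turning each embedded ' into '\'' so the shell sees one literal word.
sub shell_quote {
    my ($path) = @_;
    (my $q = $path) =~ s/'/'\\''/g;
    return "'$q'";
}

# e.g. a command that joins the head and body files (assumption: via cat)
my $cmd = join ' ', 'cat', shell_quote('head file'), shell_quote("body's file");
```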

read FILEHANDLE

Takes a MIME stream and splits it into its component entities, each of which is decoded and placed in a separate file in the parser's output_dir().

The stream should be given as a glob ref to a readable FILEHANDLE; e.g., \*STDIN.

Returns a MIME::Entity, which may be a single entity, or an arbitrarily-nested multipart entity. Returns undef on failure.

QUESTIONABLE PRACTICES

Multipart messages are always read line-by-line

Multipart document parts are read line-by-line, so that the encapsulation boundaries may easily be detected. However, bad MIME composition agents (for example, naive CGI scripts) might return multipart documents where the parts are, say, unencoded bitmap files... and, consequently, where such "lines" might be veeeeeeeeery long indeed.

A better solution for this case would be to set up some form of state machine for input processing. This will be left for future versions.

Multipart parts read into temp files before decoding

In my original implementation, the MIME::Decoder classes had to be aware of encapsulation boundaries in multipart MIME documents. While this decode-while-parsing approach obviated the need for temporary files, it resulted in inflexible and complex decoder implementations.

The revised implementation uses temporary files (a la tmpfile()) to hold the encoded portions of MIME documents. Such files are deleted automatically after decoding is done, and no more than one such file is opened at a time, so you should never need to worry about them.
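The spooling step can be sketched with the core IO::File module, whose new_tmpfile() method opens a read/write temporary file that, on systems that support it, is unlinked immediately and disappears once closed. This is an illustration under that assumption, not the module's actual code; spool_part() is a hypothetical name.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: copy the still-encoded lines of one body part into
# an anonymous temporary file and rewind it, ready for a decoder to read.
sub spool_part {
    my ($lines) = @_;                         # array ref of encoded lines
    my $tmp = IO::File->new_tmpfile
        or die "couldn't create temporary file: $!";
    binmode($tmp);                            # keep bytes (and CRs) as-is
    print {$tmp} @$lines or die "write failed: $!";
    $tmp->seek(0, 0) or die "couldn't rewind temporary file: $!";
    return $tmp;    # where supported, already unlinked; gone once closed
}
```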

Fuzzing of CRLF and newline on input

RFC-1521 dictates that MIME streams have lines terminated by CRLF ("\r\n"). However, it is extremely likely that folks will want to parse MIME streams where each line ends in the local newline character "\n" instead.

The parser attempts to handle both CRLF-terminated and newline-terminated input.
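One simple way to tolerate both conventions when spotting encapsulation boundaries is to strip an optional CR before the LF and then compare. A sketch, where classify_boundary_line() is a hypothetical helper rather than part of MIME::Parser:

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether an input line is the delimiter
# ("--BOUNDARY") or close-delimiter ("--BOUNDARY--") for $boundary,
# accepting either a CRLF or a bare LF line ending.
# Returns "DELIM", "CLOSE", or undef (in scalar context) for a body line.
sub classify_boundary_line {
    my ($line, $boundary) = @_;
    (my $bare = $line) =~ s/\r?\n\z//;    # drop "\r\n" or "\n"
    return 'DELIM' if $bare eq "--$boundary";
    return 'CLOSE' if $bare eq "--$boundary--";
    return;
}
```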

Fuzzing of CRLF and newline on output

The "7bit" and "8bit" decoders will decode both a "\n" and a "\r\n" end-of-line sequence into a "\n".

The "binary" decoder (default if no encoding specified) still outputs stuff verbatim... so a MIME message with CRLFs and no explicit encoding will be output as a text file that, on many systems, will have an annoying ^M at the end of each line... but this is as it should be.
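The difference between the two behaviours can be sketched per line as follows. Both subs are hypothetical illustrations, not MIME::Decoder's actual code.

```perl
use strict;
use warnings;

# "7bit" / "8bit": a trailing "\r\n" or "\n" both come out as "\n".
sub decode_7bit_line {
    my ($line) = @_;
    $line =~ s/\r\n\z/\n/;
    return $line;
}

# "binary": output verbatim, so a trailing "\r\n" keeps its CR
# (the "^M" you may see in the extracted file).
sub decode_binary_line {
    my ($line) = @_;
    return $line;
}
```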

AUTHOR

VERSION

$Revision: 1.8 $ $Date: 1996/06/06 23:42:39 $

