NAME

MIME::Parser - split MIME mail into decoded components

SYNOPSIS

use MIME::Parser;

# Create a new parser object:
my $parser = MIME::Parser->new;
    
# Set up output directory for files:
$parser->output_dir("$ENV{HOME}/mimemail");

# Set up the prefix for files with auto-generated names:
$parser->output_prefix("part");

# If content length is this or below, write to in-core scalar;
# Else, write to a disk file (the default action):
$parser->output_to_core(20000);
     
# Parse an input stream:
my $entity = $parser->read(\*STDIN) or die "couldn't parse MIME stream";

# Congratulations: you now have a (possibly multipart) MIME entity!
$entity->dump_skeleton;          # for debugging 

DESCRIPTION

A subclass of MIME::ParserBase, providing one useful way to parse MIME streams and obtain MIME::Entity objects. This particular parser class outputs the different parts as files on disk, in the directory of your choice.

If you don't like the way files are named... it's object-oriented and subclassable. If you want to do something really different, perhaps you want to subclass MIME::ParserBase instead.

WARNINGS

The organization of the output_path() code changed in version 1.11 of this module. If you are upgrading from a previous version, and you use inheritance to override the output_path() method, please take a moment to familiarize yourself with the new code. Everything should still work, but you never know...

PUBLIC INTERFACE

new_body_for HEAD

Instance method. Based on the HEAD of a part we are parsing, return a new body object (any desirable subclass of MIME::Body) for receiving that part's data.

The default behavior is to examine the HEAD for a recommended filename (generating one from the output_prefix() if none is available), and create a new MIME::Body::File on that filename in the parser's current output_dir().

If you use the output_to_core method (q.v.) before parsing, you can force this method to output some or all of a message's parts to in-core data structures, based on their size.

If you want the parser to do something else entirely, you should override this method in a subclass.

output_to_core [CUTOFF]

Instance method. Normally, instances of this class output all their decoded body data to disk files (via MIME::Body::File). However, you can change this behaviour by invoking this method before parsing:

If CUTOFF is an integer, then we examine the Content-length of each entity being parsed. If the Content-length is known to be CUTOFF or below, the body data will go to an in-core data structure; if the Content-length is unknown or if it exceeds CUTOFF, then the body data will go to a disk file.

If the CUTOFF is the string "NONE", then all body data goes to disk files regardless of the content-length. This is the default behaviour.

If the CUTOFF is the string "ALL", then all body data goes to in-core data structures regardless of the content-length. This is very risky (what if someone emails you an MPEG or a tar file, hmmm?) but people seem to want this bit of noose-shaped rope, so I'm providing it.

Without argument, returns the current cutoff: "ALL", "NONE" (the default), or a number.

See the new_body_for() method for more details.
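The cutoff rules above can be summarized as a small decision function. This is only a sketch of the documented behaviour, not the module's actual code; the name body_goes_to_core and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decide where a part's body data should go.
# $cutoff is "ALL", "NONE", or an integer; $length is the part's
# Content-length, or undef if it is unknown.
# Returns 1 for an in-core data structure, 0 for a disk file.
sub body_goes_to_core {
    my ($cutoff, $length) = @_;
    return 1 if $cutoff eq 'ALL';       # everything in core
    return 0 if $cutoff eq 'NONE';      # everything on disk (the default)
    return 0 unless defined $length;    # unknown length: play safe, use disk
    return $length <= $cutoff ? 1 : 0;  # "CUTOFF or below" goes in core
}
```

Note that an unknown Content-length always falls through to disk when CUTOFF is a number, which is the conservative choice for large attachments.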

output_dir [DIRECTORY]

Instance method. Get/set the output directory for the parsing operation. This is the directory where the extracted and decoded body parts will go. The default is ".".

If DIRECTORY is not given, the current output directory is returned. If DIRECTORY is given, the output directory is set to the new value, and the previous value is returned.

Note: this is used by the output_path() method in this class. It should also be used by subclasses, but if a subclass decides to output parts in some completely different manner, this method may of course be completely ignored.
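The get/set convention described above (return the current value when called without an argument, return the previous value when setting) can be sketched as follows. The class My::DirHolder is hypothetical and exists only to illustrate the accessor pattern; it is not part of this module.

```perl
use strict;
use warnings;

package My::DirHolder;    # hypothetical class, for illustration only

sub new {
    my $class = shift;
    return bless { dir => '.' }, $class;    # "." mirrors the documented default
}

# Combined getter/setter: always returns the value held before the call.
sub output_dir {
    my ($self, $dir) = @_;
    my $previous = $self->{dir};
    $self->{dir} = $dir if @_ > 1;    # only set when an argument was passed
    return $previous;
}

package main;

my $holder = My::DirHolder->new;
my $old = $holder->output_dir('/tmp/mimemail');    # sets; returns previous value
my $now = $holder->output_dir;                     # gets; returns current value
```

Returning the previous value makes it easy to change the directory temporarily and restore it afterwards.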

evil_filename FILENAME

Instance method. Is this an evil filename? It is if it contains path info or non-ASCII characters. Returns true or false.

Note: Override this method in a subclass if you just want to change which externally-provided filenames are allowed, and which are not. Like this:

package MIME::MyParser;

require 5.002;                # for SUPER
use MIME::Parser;

@MIME::MyParser::ISA = ('MIME::Parser');

sub evil_filename {
    my ($self, $name) = @_;
    return 1 if (!defined($name) || ($name eq ''));
    return 1 if ($name =~ m|/|);                      # Unix pathname
    return 1 if (($name eq '.') || ($name eq '..'));  # Unix directories
    return 1 if ($name =~ /[\s\x00-\x1f\x7f]/);       # non-printables
    0;     # it's good!
}
1;

Note: My apologies to various individuals across the Atlantic who have been inconvenienced by this function's rejection of non-ASCII characters. Changing the default behavior now would likely cause howls of protest from folks who depend on it. If you don't like the behavior of this function, you can define your own subclass of MIME::Parser and override it as shown above.

Thanks to Andrew Pimlott for finding a real dumb bug in the original version. Thanks to Nickolay Saukh for noting that (a) evil is in the eye of the beholder, and (b) 0x7F is whitespace, too.

output_path HEAD

Instance method. Given a MIME head for a file to be extracted, come up with a good output pathname for the extracted file.

The "directory" portion of the returned path will be the output_dir(), and the "filename" portion will be determined as follows:

  • If the MIME header contains a recommended filename, and it is not judged to be "evil" (evil filenames are ones which contain things like "/" or ".." or non-ASCII characters), then that filename will be used.

  • If the MIME header contains a recommended filename, but it is judged to be "evil", then a warning is issued and we pretend that there was no recommended filename. In which case...

  • If the MIME header does not specify a recommended filename, then a simple temporary file name, starting with the output_prefix(), will be used.
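The three cases above can be sketched as a single function. This is an illustration under stated assumptions, not the module's implementation: looks_evil and choose_filename are hypothetical names, and the fallback name format ("PREFIX-COUNTER.dat") is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for evil_filename(): rejects empty names, path
# separators, "." and "..", and anything outside printable ASCII.
sub looks_evil {
    my ($name) = @_;
    return 1 if !defined($name) || $name eq '';
    return 1 if $name =~ m{/};
    return 1 if $name eq '.' || $name eq '..';
    return 1 if $name =~ /[^\x21-\x7e]/;
    return 0;
}

# Hypothetical helper: pick the filename portion of the output path.
# $recommended is the name suggested by the header (or undef);
# $prefix and $counter are used to build the fallback name.
sub choose_filename {
    my ($recommended, $prefix, $counter) = @_;
    if (defined $recommended) {
        return $recommended unless looks_evil($recommended);
        # Evil name: warn (only under -w) and pretend none was given.
        warn "ignoring evil filename '$recommended'\n" if $^W;
    }
    return "$prefix-$counter.dat";    # assumed format, for illustration
}
```

A caller would then join the result with the output_dir() to obtain the full path.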

Note: If you don't like the behavior of this function, you can define your own subclass of MIME::Parser and override it there:

     package MIME::MyParser;
     
     require 5.002;                # for SUPER
     use MIME::Parser;
     
     @MIME::MyParser::ISA = ('MIME::Parser');
     
     sub output_path {
         my ($self, $head) = @_;
         
         # Your code here; FOR EXAMPLE (i_have_a_preference() and
         # my_custom_path() are placeholders for your own subs):
         if (i_have_a_preference($head)) {
             return my_custom_path($head);
         }
         else {                      # return the default path:
             return $self->SUPER::output_path($head);
         }
     }
     1;

Note: Nickolay Saukh pointed out that, given the subjective nature of what is "evil", this function really shouldn't warn about an evil filename, but maybe just issue a debug message. I considered that, but then I thought: if debugging were off, people wouldn't know why (or even if) a given filename had been ignored. In mail robots that depend on externally-provided filenames, this could cause hard-to-diagnose problems. So, the message is still a warning, but now it's only output if $^W is true.

Thanks to Laurent Amon for pointing out problems with the original implementation, and for making some good suggestions. Thanks also to Achim Bohnet for pointing out that there should be a hookless, OO way of overriding the output_path.

output_path_hook SUBREF

Instance method: DEPRECATED. Install a different function to generate the output filename for extracted message data. Declare it like this:

    sub my_output_path_hook {
        my $parser = shift;   # this MIME::Parser
        my $head = shift;     # the MIME::Head for the current message

        # Your code here: it must return a path that can be
        # open()ed for writing.  Remember that you can ask the
        # $parser about the output_dir, and you can ask the
        # $head about the recommended_filename!
    }

And install it immediately before parsing the input stream, like this:

# Create a new parser object, and install my own output_path hook:
my $parser = MIME::Parser->new;
$parser->output_path_hook(\&my_output_path_hook);

# NOW we can parse an input stream:
my $entity = $parser->read(\*STDIN);

This method is intended for people who are squeamish about creating subclasses. See the output_path() documentation for a cleaner, OOish way to do this.

output_prefix [PREFIX]

Instance method. Get/set the output prefix for the parsing operation. This is a short string that all filenames for extracted and decoded body parts will begin with. The default is "msg".

If PREFIX is not given, the current output prefix is returned. If PREFIX is given, the output prefix is set to the new value, and the previous value is returned.

WRITING SUBCLASSES

Authors of subclasses can consider overriding the following methods. They are listed in approximate order of most-to-least impact.

new_body_for

Override this if you want to change the entire mechanism for choosing the output destination. You may want to use information in the MIME header to determine how files are named, and whether or not their data goes to a disk file or to an in-core scalar. (You have the MIME header object at your disposal.)

output_path

Override this if you want to completely change how the output path (containing both the directory and filename) is determined for those parts being output to disk files. (You have the MIME header object at your disposal.)

evil_filename

Override this if you want to change the test that determines whether or not a filename obtained from the header is permissible.

output_prefix

Override this if you want to change the mechanism for getting/setting the desired output prefix (used in naming files when no other names are suggested).

output_dir

Override this if you want to change the mechanism for getting/setting the desired output directory (where extracted and decoded files are placed).

AUTHOR

Copyright (c) 1996 by Eryq / eryq@rhine.gsfc.nasa.gov

All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

VERSION

$Revision: 3.203 $ $Date: 1997/01/22 08:39:25 $