NAME

MIME::Head - MIME message header

WARNING: This code is in an evaluation phase until 1 August 1996. Depending on any comments/complaints received before this cutoff date, the interface may change in a non-backwards-compatible manner.

DESCRIPTION

A class for parsing in and manipulating RFC-822 message headers, with some methods geared towards standard (and not so standard) MIME fields as specified in RFC-1521, Multipurpose Internet Mail Extensions.

SYNOPSIS

Start off by requiring or using this package:

require MIME::Head;

You can create a MIME::Head object in a number of ways:

# Create a new, empty header, and populate it manually:    
$head = MIME::Head->new;
$head->set('content-type', 'text/plain; charset=US-ASCII');
$head->set('content-length', $len);

# Create a new header by parsing in the STDIN stream:
$head = MIME::Head->read(\*STDIN);

# Create a new header by parsing in a file:
$head = MIME::Head->from_file("/tmp/test.hdr");

# Create a new header by running a program:
$head = MIME::Head->from_file("cat a.hdr b.hdr |");

To get rid of all internal newlines in all fields:

# Get rid of all internal newlines:
$head->unfold();

To test whether a given field exists:

# Was a "Subject:" given?
if ($head->exists('subject')) {
    # yes, it does!
}

To get the contents of that field as a string:

# Is this a reply?
$reply = 1 if ($head->get('Subject') =~ /^Re: /);

To set the contents of a field to a given string:

# Set the MIME type:
$head->set('Content-type', 'text/html');

To extract parameters from certain structured fields, as a hash reference:

# What's the MIME type?
$params = $head->params('content-type');
$mime_type = $$params{_};
$char_set  = $$params{'charset'};
$file_name = $$params{'name'};

To get certain commonly-used MIME information:

# The content type (e.g., "text/html"):
$mime_type     = $head->mime_type;

# The content transfer encoding (e.g., "quoted-printable"):
$mime_encoding = $head->mime_encoding;

# The recommended filename (e.g., "choosy-moms-choose.gif"):
$file_name     = $head->recommended_filename;

# The boundary text, for multipart messages:
$boundary      = $head->multipart_boundary;

PUBLIC INTERFACE

Creation, input, and output

new

Class method. Creates a new header object, with no fields.

from_file EXPR

Class or instance method. For convenience, you can use this to parse a header object in from EXPR, which may actually be any expression that can be sent to open() so as to return a readable filehandle. The "file" will be opened, read, and then closed:

# Create a new header by parsing in a file:
my $head = MIME::Head->from_file("/tmp/test.hdr");

Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:

# Create a new header by parsing in a file:
my $head = MIME::Head->new->from_file("/tmp/test.hdr");

On success, the object will be returned; on failure, the undefined value.

This is really just a convenience front-end onto read().

print [FILEHANDLE]

Output to the given FILEHANDLE, or to the currently-selected filehandle if none was given:

# Output to STDOUT:
$head->print(\*STDOUT);

WARNING: this method does not output the blank line that terminates the header in a legal message (since you may not always want it).

read FILEHANDLE

Class or instance method. This constructs a header object by reading it in from a FILEHANDLE, until either a blank line or an end-of-stream is encountered. A syntax error will also halt processing.

Supply this routine with a reference to a filehandle glob; e.g., \*STDIN:

# Create a new header by parsing in STDIN:
my $head = MIME::Head->read(\*STDIN);

Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:

# Create a new header by parsing in STDIN:
my $head = MIME::Head->new->read(\*STDIN);

Except that you should probably use the first form. On success, the object will be returned; on failure, the undefined value.
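To make the stopping rules concrete, here is a rough sketch of a reader that behaves as described above: it stops at a blank line or at end-of-stream, and gives up on a line it cannot parse. This is not the module's actual code; read_header_sketch and its list-of-pairs return value are hypothetical, invented only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): read raw header lines from
# a filehandle, stopping at the first blank line or at end-of-stream.
# Returns a list of [name, text] pairs, with names coerced to lowercase.
sub read_header_sketch {
    my ($fh) = @_;
    my @fields;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;                  # strip the line terminator
        last if $line eq '';                   # a blank line ends the header
        if ($line =~ /^[ \t]/ and @fields) {
            # Continuation line: append to the previous field, keeping the fold
            $fields[-1][1] .= "\n$line";
        }
        elsif ($line =~ /^([^\s:]+):[ \t]*(.*)\z/s) {
            push @fields, [lc $1, $2];         # a new "Name: text" field
        }
        else {
            last;                              # this sketch stops at a line it cannot parse
        }
    }
    return @fields;
}

# Usage: parse a header held in a string, via an in-memory filehandle
my $raw = "Subject: Hello\nContent-type: text/html;\n    name=\"nurse.html\"\n\nbody\n";
open(my $fh, '<', \$raw) or die "cannot open in-memory handle: $!";
my @fields = read_header_sketch($fh);
close $fh;
print "$_->[0]: $_->[1]\n" for @fields;
```

Note that the body line after the blank line is never consumed as a field, so the same filehandle could then be handed to whatever reads the message body.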

Getting/setting fields

NOTE: this interface is not as extensive as that of Mail::Internet; however, I have provided a set of methods that I can guarantee are supportable across any changes to the internal implementation of this class.

add FIELD,TEXT,[WHERE]

Add a new occurrence of the FIELD, given by TEXT:

# Add the trace information:    
$head->add('Received', 'from eryq.pr.mcs.net by gonzo.net with smtp');

The FIELD is automatically coerced to lowercase. Returns the TEXT.

Normally, the new occurrence will be appended to the existing occurrences. However, if the optional WHERE argument is the string "BEFORE", then the new occurrence will be prepended. NOTE: if you want to be explicit about appending, use the string "AFTER" for this argument.

WARNING: this method always adds new occurrences; it doesn't overwrite any existing occurrences... so if you just want to change the value of a field (creating it if necessary), then you probably don't want to use this method: consider using set() instead.

add_text FIELD,TEXT

Add some more text to the [last occurrence of the] field:

# Force an explicit character set:
if ($head->get('Content-type') !~ /\bcharset=/) {
    $head->add_text('Content-type', '; charset="us-ascii"');
}

The FIELD is automatically coerced to lowercase.

WARNING: be careful if adding text that contains a newline! A newline in a field value must be followed by a single space or tab to be a valid continuation line!

I had considered building this routine so that it "fixed" bare newlines for you, but then I decided against it, since the behind-the-scenes trickery would probably create more problems through confusion. So, instead, you've just been warned... proceed with caution.
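If you would rather check than hope, a test along these lines can be run on the text before handing it to add_text(). This is only a sketch; has_bare_newline is a hypothetical helper of my own naming, not a method of this class.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if TEXT contains a newline that is not
# followed by a space or tab (i.e., not a valid continuation line),
# and 0 otherwise. A trailing newline counts as bare.
sub has_bare_newline {
    my ($text) = @_;
    return $text =~ /\n(?![ \t])/ ? 1 : 0;
}

# A properly folded value: the newline is followed by spaces
print has_bare_newline("; charset=\"us-ascii\";\n    name=\"a.txt\""), "\n";

# A bare newline: the next line would be read as a new field
print has_bare_newline("text/plain\nX-Oops: yes"), "\n";
```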

delete FIELD

Delete all occurrences of the given field.

# Remove all the MIME information:
$head->delete('MIME-Version');
$head->delete('Content-type');
$head->delete('Content-transfer-encoding');
$head->delete('Content-disposition');

Currently returns 1 always.

exists FIELD

Returns whether a given field exists:

# Was a "Subject:" given?
if ($head->exists('subject')) {
    # yes, it does!
}

The FIELD is automatically coerced to lowercase. This method returns the undefined value if the field doesn't exist, and some true value if it does.

fields

Return a list of all fields (in no particular order):

foreach $field (sort $head->fields) {
    print "$field: ", $head->get($field), "\n";
}

get FIELD,[OCCUR]

Returns the text of the [first occurrence of the] field, or the empty string if the field is not present (nice for avoiding those "undefined value" warnings):

# Is this a reply?
$is_reply = 1 if ($head->get('Subject') =~ /^Re: /);

NOTE: this returns the first occurrence of the field, so as to be consistent with Mail::Internet::get(). However, if the optional OCCUR argument is defined, it specifies the index of the occurrence you want: zero for the first, and -1 for the last.

# Print the first 'Received:' entry:
print "Most recent: ", $head->get('received'), "\n";

# Print the first 'Received:' entry, explicitly:
print "Most recent: ", $head->get('received', 0), "\n";

# Print the last 'Received:' entry:
print "Least recent: ", $head->get('received', -1), "\n";

get_all FIELD

Returns the list of all occurrences of the field, or the empty list if the field is not present:

# How did it get here?
@history = $head->get_all('Received');

NOTE: I had originally experimented with having get() return all occurrences when invoked in an array context... but that causes a lot of accidents when you get careless and do stuff like this:

print "\u$field: ", $head->get($field), "\n";

It also made the intuitive behaviour unclear if the OCCUR argument was given in an array context. So I opted for an explicit approach to asking for all occurrences.

original_text

Recover the original text that was read() in to create this object:

print "PARSED FROM:\n", $head->original_text;

set FIELD,TEXT

Set the field to [the single occurence given by] the TEXT:

# Set the MIME type:
$head->set('content-type', 'text/html');

The FIELD is automatically coerced to lowercase. This method returns the text.

unfold [FIELD]

Unfold the text of all occurrences of the given FIELD. If the FIELD is omitted, all fields are unfolded.

"Unfolding" is the act of removing all newlines.

$head->unfold;

Currently, returns 1 always.
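For a single string, the effect of unfolding can be sketched as below. This assumes that "removing all newlines" means just that: the line terminators go, and the whitespace that began each continuation line stays. unfold_text is a hypothetical stand-alone helper, not the method itself.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every line terminator from a field's text.
# The whitespace that began each continuation line is left in place.
sub unfold_text {
    my ($text) = @_;
    $text =~ s/\r?\n//g;
    return $text;
}

my $folded = "text/html; charset=US-ASCII;\n    name=\"nurse.html\"";
print unfold_text($folded), "\n";
```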

MIME-specific methods

All of the following methods extract information from the following structured fields:

Content-type
Content-transfer-encoding
Content-disposition

Be aware that they do not just return the raw contents of those fields, and in some cases they will fill in sensible (I hope) default values. Use get() if you need to grab and process the raw field text.

params FIELD

Extract parameter info from a structured field, and return it as a hash reference. For example, here is a field with parameters:

Content-Type: Message/Partial;
    number=2; total=3;
    id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

Here is how you'd extract them:

$params = $head->params('content-type');
if ($$params{_} eq 'message/partial') {
    $number = $$params{'number'};
    $total  = $$params{'total'};
    $id     = $$params{'id'};
}

Like field names, parameter names are coerced to lowercase. The special '_' parameter means the default parameter for the field.

WARNING: the syntax is a little different for each field (content-type, content-disposition, etc.). I've attempted to come up with a nice, simple catch-all solution: it simply stops when it can't match anything else.
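To give a feel for the "stops when it can't match anything else" approach, here is a rough sketch of such a catch-all parser. It is not the module's actual code: parse_params_sketch and its regular expressions are assumptions made for illustration, and they handle only the simple name=value and name="value" forms.

```perl
use strict;
use warnings;

# Hypothetical parser sketch: split a structured field value into its
# default value (stored under '_') and its named parameters.
sub parse_params_sketch {
    my ($raw) = @_;
    my %params;

    # The leading token, up to the first ';' or whitespace, is the default
    if ($raw =~ /\G\s*([^;\s]+)/gc) {
        $params{'_'} = lc $1;
    }

    # Then match  ; name=value  or  ; name="quoted value"  until nothing fits
    while ($raw =~ /\G\s*;\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))/gc) {
        $params{lc $1} = defined $2 ? $2 : $3;
    }
    return \%params;
}

my $params = parse_params_sketch(
    "Message/Partial;\n    number=2; total=3;\n"
  . "    id=\"oc=jpbe0M2Yt4s\@thumper.bellcore.com\""
);
print "$_ => $$params{$_}\n" for sort keys %$params;
```

Anything after the last parameter that matches is simply ignored, which is what makes the approach tolerant of fields with slightly different grammars.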

mime_encoding

Try real hard to determine the content transfer encoding. When one is found, it is returned as a non-empty string in all-lowercase.

If no encoding could be found, the empty string is returned.

mime_type

Try real hard to determine the content type (e.g., "text/plain", "image/gif", "x-weird-type"), which is returned in all-lowercase.

A happy thing: the following code will work just as you would want, even if there's no subtype (as in "x-weird-type")... in such a case, the $subtype would simply be undefined, which Perl treats as the empty string in string context:

($type, $subtype) = split('/', $head->mime_type);
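If the undefined $subtype that split leaves behind for a type with no "/" is a nuisance (it draws a warning under -w when used as a string), the default can be made explicit. A minimal sketch; split_mime_type is a hypothetical helper, not a method of this class:

```perl
use strict;
use warnings;

# Hypothetical helper: split a MIME type into (type, subtype), with the
# subtype defaulting to '' when the type has no '/'.
sub split_mime_type {
    my ($mime_type) = @_;
    my ($type, $subtype) = split m{/}, $mime_type, 2;
    $type    = '' unless defined $type;      # empty input yields no pieces at all
    $subtype = '' unless defined $subtype;   # no '/' yields only one piece
    return ($type, $subtype);
}

my ($type, $subtype) = split_mime_type('x-weird-type');
print "type=[$type] subtype=[$subtype]\n";
```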

If the content-type information is missing, it defaults to "text/plain", as per RFC-1521:

Default RFC-822 messages are typed by this protocol as plain text in
the US-ASCII character set, which can be explicitly specified as
"Content-type: text/plain; charset=us-ascii".  If no Content-Type is
specified, this default is assumed.  

If just the subtype is missing (a syntax error unless the type begins with "x-", but we'll tolerate it, since some brain-dead mailers actually do this), then it simply is not reported; e.g., "Content-type: TEXT" is returned simply as "text".

WARNING: prior to version 1.17, a missing subtype was reported as "x-subtype-unknown". I said at the time that this might be a really horrible idea, and that I might change it in the future. Well, it was, so I did.

If the content type is present but can't be parsed at all (yow!), the empty string is returned.

multipart_boundary

If this is a header for a multipart message, return the "encapsulation boundary" used to separate the parts. The boundary is returned exactly as given in the Content-type: field; that is, the leading double-hyphen (--) is not prepended.

(Well, almost exactly... from RFC-1521:

(If a boundary appears to end with white space, the white space 
must be presumed to have been added by a gateway, and must be deleted.)  

so we oblige and remove any trailing spaces.)

Returns undef (not the empty string) if the message is not multipart, if there is no specified boundary, or if the boundary is illegal (e.g., if it is empty after all trailing whitespace has been removed).
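The trimming rule, and the reason the double-hyphen matters, can be sketched as follows. clean_boundary is a hypothetical helper that mimics the behaviour described above on a raw string; the two delimiter forms printed at the end are the ones RFC-1521 defines for multipart bodies.

```perl
use strict;
use warnings;

# Hypothetical helper: clean up a raw boundary parameter as described
# above; returns undef if nothing is left after trimming.
sub clean_boundary {
    my ($boundary) = @_;
    return undef unless defined $boundary;
    $boundary =~ s/\s+\z//;                  # strip trailing whitespace
    return length($boundary) ? $boundary : undef;
}

my $boundary = clean_boundary("simple boundary  ");
if (defined $boundary) {
    print "--$boundary\n";      # delimiter line before each part
    print "--$boundary--\n";    # closing delimiter after the last part
}
```

Since the leading "--" is not part of the returned value, code that scans a message body for part delimiters has to prepend it, as the print statements above do.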

recommended_filename

Return the recommended external filename. This is used when extracting the data from the MIME stream.

Returns undef if no filename could be suggested.

Compatibility tweaks

tweak_FROM_parsing CHOICE

Class method. The parser may be tweaked so that any line in the header stream that begins with "From " will be either ignored, flagged as an error, or coerced into the special field "Mail-from:" (the default; this approach was inspired by Emacs's "Babyl" format). Though not valid for a MIME header, this will provide compatibility with some Unix mail messages. Just do this:

MIME::Head->tweak_FROM_parsing($choice)

Where $choice is one of 'IGNORE', 'ERROR', or 'COERCE'.
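The three policies amount to something like the sketch below. This is a guess at the shape of the logic, not the parser's real code: handle_from_line, its return conventions, and the exact rewrite used for 'COERCE' are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch of the three policies for a line starting "From ".
# Returns the (possibly rewritten) line, undef to skip it, or dies.
sub handle_from_line {
    my ($line, $choice) = @_;
    return $line unless $line =~ /^From /;   # ordinary lines pass through
    if ($choice eq 'IGNORE') {
        return undef;                        # drop the line silently
    }
    elsif ($choice eq 'ERROR') {
        die "illegal 'From ' line in header\n";
    }
    elsif ($choice eq 'COERCE') {
        $line =~ s/^From /Mail-from: /;      # assumed form of the coercion
        return $line;
    }
    die "unknown choice: $choice\n";
}

print handle_from_line("From eryq Thu Dec 21 17:27:54 1995", 'COERCE'), "\n";
```

A genuine "From:" field is left alone by all three choices, since the test is for "From" followed by a space rather than a colon.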

DESIGN ISSUES

Why have separate objects for the head and the entity?

See the documentation under MIME::Entity for the rationale behind this decision.

Why assume that MIME headers are email headers?

I quote from Achim Bohnet, who gave feedback on v.1.9 (I think he's using the word header where I would use field; e.g., to refer to "Subject:", "Content-type:", etc.):

There is also IMHO no requirement [for] MIME::Heads to look 
like [email] headers; so to speak, the MIME::Head [simply stores] 
the attributes of a complex object, e.g.:

    new MIME::Head type => "text/plain",
                   charset => ...,
                   disposition => ..., ... ;

See the next question for an answer to this one.

Why is MIME::Head so complex, and yet lacking in composition methods?

Sigh.

I have often wished that the original RFC-822 designers had taken a different approach, and not given every other field its own special grammar: read RFC-822 to see what I mean. As I understand it, in Heaven, all mail message headers have a very simple syntax that encodes arbitrarily-nested objects; a consistent, generic representation for exchanging OO data structures.

But we live in an imperfect world, where there's nonsense like this to put up with:

From: Yakko Warner <yakko@tower.wb.com>
Subject: Hello, nurse!
Received: from gsfc.nasa.gov by eryq.pr.mcs.net  with smtp
    (Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
    id AA13596; Thu, 21 Dec 95 17:20:38 -0500
Content-type: text/html; charset=US-ASCII; 
    name="nurse.html"

I quote again from Achim Bohnet's feedback on v.1.9:

MIME::Head is too big. A better approach IMHO would be to 
have a general header class that knows about allowed characters, 
line length, and some (formatting) output routines.  There 
should be other classes that handle special headers and that 
are aware of the semantics/syntax of [those] headers...

    From, to, reply-to, message-id, in-reply-to, x-face ...

MIME::Head should only handle MIME specific headers.  

As he describes, each kind of field really merits its own small class (e.g., Mail::Field::Subject, Mail::Field::MessageId, Mail::Field::XFace, etc.), each of which provides a from_field() method for parsing field data into a class object, and a to_field() method for generating that field from a class object.

I kind of like the elegance of this approach. We could then have a generic Mail::Head class, instances of which would consist simply of one or more instances of subclasses of a generic Mail::Field class. Unrecognized fields would be represented as instances of Mail::Field by default.

There would be a MIME::Field class, with subclasses like MIME::Field::ContentType that would allow us to get fields like this:

$type    = $head->field('content-type')->type;
$subtype = $head->field('content-type')->subtype;
$charset = $head->field('content-type')->charset;

And set fields like this:

$head->field('content-type')->type('text');
$head->field('content-type')->subtype('html');
$head->field('content-type')->charset('us-ascii');

And, with that same MIME::Head object, get at other fields, like:

$subject     = $head->field('subject')->text;  # just the flat text
$sender_name = $head->field('from')->name;     # e.g., Yakko Warner
$sender_addr = $head->field('from')->addr;     # e.g., yakko@tower.wb.com

So why a special MIME::Head subclass of Mail::Head? Why, to enable us to add MIME-specific wrappers, like this:

package MIME::Head;
@ISA = qw(Mail::Head);

sub recommended_filename {
    my $self = shift;
    my $try;
    
    # First, try to get it from the content-disposition:
    ($try = $self->field('content-disposition')->filename) and return $try;
    
    # Next, try to get it from the content-type:
    ($try = $self->field('content-type')->name) and return $try;
    
    # Give up:
    undef;
}

Why all this "occurrence" jazz? Isn't every field unique?

Aaaaaaaaaahh....no.

Looking at a typical mail message header, it is sooooooo tempting to just store the fields as a hash of strings, one string per hash entry. Unfortunately, there's the little matter of the Received: field, which (unlike From:, To:, etc.) will often have multiple occurrences; e.g.:

Received: from gsfc.nasa.gov by eryq.pr.mcs.net  with smtp
    (Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
    id AA13596; Thu, 21 Dec 95 17:20:38 -0500
Received: (from eryq@localhost) by rhine.gsfc.nasa.gov (8.6.12/8.6.12) 
    id RAA28069; Thu, 21 Dec 1995 17:27:54 -0500
Date: Thu, 21 Dec 1995 17:27:54 -0500
From: Eryq <eryq@rhine.gsfc.nasa.gov>
Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
To: eryq@eryq.pr.mcs.net
Subject: Stuff and things

The Received: field is used for tracing message routes, and although it's not generally used for anything other than human debugging, I didn't want to inconvenience anyone who actually wanted to get at that information.

I also didn't want to make this a special case; after all, who knows what other fields could have multiple occurrences in the future? So, clearly, multiple entries had to somehow be stored multiple times... and the different occurrences had to be retrievable.
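One simple way to get that behaviour is a hash of arrays, keyed by the lowercased field name. The sketch below is not a description of this class's internals (which are deliberately unspecified); %fields, add_occurrence, and get_occurrence are hypothetical names that merely mirror the documented semantics of add() and get().

```perl
use strict;
use warnings;

# Hypothetical storage sketch: lowercased field name => array of occurrences
my %fields;

# Mirror add(): append by default, or prepend if WHERE is "BEFORE"
sub add_occurrence {
    my ($field, $text, $where) = @_;
    $where = 'AFTER' unless defined $where;
    my $list = $fields{lc $field} ||= [];
    if ($where eq 'BEFORE') { unshift @$list, $text }
    else                    { push    @$list, $text }
    return $text;
}

# Mirror get(): index 0 is the first occurrence, -1 the last;
# a missing field yields the empty string
sub get_occurrence {
    my ($field, $occur) = @_;
    $occur = 0 unless defined $occur;
    my $list = $fields{lc $field} or return '';
    my $text = $list->[$occur];
    return defined $text ? $text : '';
}

add_occurrence('Received', 'from rhine.gsfc.nasa.gov by gsfc.nasa.gov');
add_occurrence('Received', 'from gsfc.nasa.gov by eryq.pr.mcs.net', 'BEFORE');
print "Most recent:  ", get_occurrence('received'),     "\n";
print "Least recent: ", get_occurrence('received', -1), "\n";
```

With this layout a single-occurrence field such as Subject: is just an array of length one, so no field needs special treatment.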

SEE ALSO

MIME::Decoder, MIME::Entity, MIME::Head, MIME::Parser.

AUTHOR

Copyright (c) 1996 by Eryq / eryq@rhine.gsfc.nasa.gov

All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The more-comprehensive filename extraction is courtesy of Lee E. Brotzman, Advanced Data Solutions.

VERSION

$Revision: 1.20 $ $Date: 1996/07/23 19:02:43 $