NAME
MIME::Head - MIME message header
ALPHA-RELEASE WARNING
This code is in an evaluation phase until 1 August 1996. Depending on any comments/complaints received before this cutoff date, the interface may change in a non-backwards-compatible manner.
DESCRIPTION
A class for parsing in and manipulating RFC-822 message headers, with some methods geared towards standard (and not so standard) MIME fields as specified in RFC-1521, Multipurpose Internet Mail Extensions.
SYNOPSIS
Start off by requiring or using this package:
require MIME::Head;
You can create a MIME::Head object in a number of ways:
# Create a new, empty header, and populate it manually:
$head = MIME::Head->new;
$head->set('content-type', 'text/plain; charset=US-ASCII');
$head->set('content-length', $len);
# Create a new header by parsing in the STDIN stream:
$head = MIME::Head->read(\*STDIN);
# Create a new header by parsing in a file:
$head = MIME::Head->from_file("/tmp/test.hdr");
# Create a new header by running a program:
$head = MIME::Head->from_file("cat a.hdr b.hdr |");
To get rid of all internal newlines in all fields:
# Get rid of all internal newlines:
$head->unfold();
To test whether a given field exists:
# Was a "Subject:" given?
if ($head->exists('subject')) {
# yes, it does!
}
To get the contents of that field as a string:
# Is this a reply?
$reply = 1 if ($head->get('Subject') =~ /^Re: /);
To set the contents of a field to a given string:
# Set the MIME type:
$head->set('Content-type', 'text/html');
To extract parameters from certain structured fields, as a hash reference:
# What's the MIME type?
$params = $head->params('content-type');
$mime_type = $$params{_};
$char_set = $$params{'charset'};
$file_name = $$params{'name'};
To get certain commonly-used MIME information:
$mime_type = $head->mime_type;
$mime_encoding = $head->mime_encoding;
$file_name = $head->recommended_filename;
$boundary = $head->multipart_boundary;
COMPATIBILITY TWEAKS
The parser may be tweaked so that any line in the header stream that begins with "From " will be either ignored, flagged as an error, or coerced into the special field "Mail-from:" (the default; this approach was inspired by Emacs's "Babyl" format). Though not valid for a MIME header, this will provide compatibility with some Unix mail messages. Just do this:
MIME::Head->tweak_FROM_parsing($choice)
Where $choice is one of IGNORE, ERROR, or COERCE.
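As a rough illustration of what the three choices mean, here is a standalone sketch that classifies a single raw header line. It does not use MIME::Head's internals: the helper name handle_from_line and its return conventions are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): decide what to do with one
# raw header line under a $choice of 'IGNORE', 'ERROR', or 'COERCE'.
# Returns the line to parse, or undef if the line should be skipped.
sub handle_from_line {
    my ($line, $choice) = @_;
    return $line unless $line =~ /^From /;     # ordinary lines pass through
    return if $choice eq 'IGNORE';             # silently drop the line
    die "unexpected 'From ' line in header\n"
        if $choice eq 'ERROR';                 # flag it as an error
    $line =~ s/^From /Mail-from: /;            # COERCE: turn it into a field
    return $line;
}

print handle_from_line("From eryq Thu Dec 21 17:27:54 1995\n", 'COERCE');
```

Coercing keeps the envelope information around as an ordinary field, which is why it makes a reasonable default.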
PUBLIC INTERFACE
Creation, input, and output
- new
-
Class method. Creates a new header object, with no fields.
- from_file EXPR
-
Class or instance method. For convenience, you can use this to parse a header object in from EXPR, which may actually be any expression that can be sent to open() so as to return a readable filehandle. The "file" will be opened, read, and then closed:
# Create a new header by parsing in a file:
my $head = MIME::Head->from_file("/tmp/test.hdr");
Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:
# Create a new header by parsing in a file:
my $head = MIME::Head->new->from_file("/tmp/test.hdr");
On success, the object will be returned; on failure, the undefined value.
This is really just a convenience front-end onto read().
- read FILEHANDLE
-
Class or instance method. This constructs a header object by reading it in from a FILEHANDLE, until either a blank line or an end-of-stream is encountered. A syntax error will also halt processing.
Supply this routine with a reference to a filehandle glob; e.g., \*STDIN:
# Create a new header by parsing in STDIN:
my $head = MIME::Head->read(\*STDIN);
Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:
# Create a new header by parsing in STDIN:
my $head = MIME::Head->new->read(\*STDIN);
That said, you should probably use the first form. On success, the object will be returned; on failure, the undefined value.
- print FILEHANDLE
-
Output to the given FILEHANDLE, or to the currently-selected filehandle if none was given:
# Output to STDOUT:
$head->print(\*STDOUT);
WARNING: this method does not output the blank line that terminates the header in a legal message (since you may not always want it).
Getting/setting fields
NOTE: this interface is not as extensive as that of Mail::Internet; however, I have provided a set of methods that I can guarantee are supportable across any changes to the internal implementation of this class.
Anything that you can't do here, you'll have to do
- add FIELD,TEXT,[WHERE]
-
Add a new occurrence of the FIELD, given by TEXT:
# Add the trace information:
$head->add('Received', 'from eryq.pr.mcs.net by gonzo.net with smtp');
The FIELD is automatically coerced to lowercase. Returns the TEXT.
Normally, the new occurrence will be appended to the existing occurrences. However, if the optional WHERE argument is the string "BEFORE", then the new occurrence will be prepended. NOTE: if you want to be explicit about appending, use the string "AFTER" for this argument.
WARNING: this method always adds new occurrences; it doesn't overwrite any existing occurrences... so if you just want to change the value of a field (creating it if necessary), then you probably don't want to use this method: consider using set() instead.
- add_text FIELD,TEXT
-
Add some more text to the [last occurrence of the] field:
# Force an explicit character set:
if ($head->get('Content-type') !~ /\bcharset=/) {
    $head->add_text('Content-type', '; charset="us-ascii"');
}
The FIELD is automatically coerced to lowercase.
WARNING: be careful if adding text that contains a newline! A newline in a field value must be followed by a single space or tab to be a valid continuation line!
I had considered building this routine so that it "fixed" bare newlines for you, but then I decided against it, since the behind-the-scenes trickery would probably create more problems through confusion. So, instead, you've just been warned... proceed with caution.
- delete FIELD
-
Delete all occurrences of the given field.
# Remove all the MIME information:
$head->delete('MIME-Version');
$head->delete('Content-type');
$head->delete('Content-transfer-encoding');
$head->delete('Content-disposition');
Currently returns 1 always.
- exists FIELD
-
Returns whether a given field exists:
# Was a "Subject:" given?
if ($head->exists('subject')) {
    # yes, it does!
}
The FIELD is automatically coerced to lowercase. This method returns the undefined value if the field doesn't exist, and some true value if it does.
- fields
-
Return a list of all fields (in no particular order):
foreach $field (sort $head->fields) {
    print "$field: ", $head->get($field), "\n";
}
- get FIELD,[OCCUR]
-
Returns the text of the [first occurrence of the] field, or the empty string if the field is not present (nice for avoiding those "undefined value" warnings):
# Is this a reply?
$is_reply = 1 if ($head->get('Subject') =~ /^Re: /);
NOTE: this returns the first occurrence of the field, so as to be consistent with Mail::Internet::get(). However, if the optional OCCUR argument is defined, it specifies the index of the occurrence you want: zero for the first, and -1 for the last.
# Print the first 'Received:' entry:
print "Most recent: ", $head->get('received'), "\n";
# Print the first 'Received:' entry, explicitly:
print "Most recent: ", $head->get('received', 0), "\n";
# Print the last 'Received:' entry:
print "Least recent: ", $head->get('received', -1), "\n";
- get_all FIELD
-
Returns the list of all occurrences of the field, or the empty list if the field is not present:
# How did it get here?
@history = $head->get_all('Received');
NOTE: I had originally experimented with having get() return all occurrences when invoked in an array context... but that causes a lot of accidents when you get careless and do stuff like this:
print "\u$field: ", $head->get($field), "\n";
It also made the intuitive behaviour unclear if the OCCUR argument was given in an array context. So I opted for an explicit approach to asking for all occurrences.
- original_text
-
Recover the original text that was read() in to create this object:
print "PARSED FROM:\n", $head->original_text;
- set FIELD,TEXT
-
Set the field to [the single occurence given by] the TEXT:
# Set the MIME type:
$head->set('content-type', 'text/html');
The FIELD is automatically coerced to lowercase. This method returns the text.
- unfold [FIELD]
-
Unfold the text of all occurrences of the given FIELD. If the FIELD is omitted, all fields are unfolded.
"Unfolding" is the act of removing all newlines.
$head->unfold;
Currently, returns 1 always.
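The newline warning under add_text() and the definition of "unfolding" above are two sides of the same coin, so here is a small standalone sketch of both. The helper names fix_bare_newlines and unfold_text are hypothetical; MIME::Head itself deliberately does not fix bare newlines for you, and this is not its actual implementation of unfold().

```perl
use strict;
use warnings;

# Hypothetical helper: make every internal newline a valid continuation by
# ensuring it is followed by a space or tab. A newline that already has one,
# or that ends the string, is left alone.
sub fix_bare_newlines {
    my ($text) = @_;
    $text =~ s/\n(?![ \t]|\z)/\n /g;
    return $text;
}

# Hypothetical helper: "unfold" a field value by removing all newlines,
# following the definition of unfolding given above.
sub unfold_text {
    my ($text) = @_;
    $text =~ s/\r?\n//g;
    return $text;
}

my $value  = "text/plain;\ncharset=\"us-ascii\"";   # bare newline: not valid
my $folded = fix_bare_newlines($value);             # now a legal continuation
print unfold_text($folded), "\n";                   # back on a single line
```

If you do add multi-line text with add_text(), running it through something like the first helper beforehand is one way to stay on the safe side.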
MIME-specific stuff
All of the following methods extract information from the following structured fields:
Content-type
Content-transfer-encoding
Content-disposition
Be aware that they do not just return the raw contents of those fields, and in some cases they will fill in sensible (I hope) default values. Use get() if you need to grab and process the raw field text.
- params FIELD
-
Extract parameter info from a structured field, and return it as a hash reference. For example, here is a field with parameters:
Content-Type: Message/Partial; number=2; total=3; id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
Here is how you'd extract them:
$params = $head->params('content-type');
if ($$params{_} eq 'message/partial') {
    $number = $$params{'number'};
    $total = $$params{'total'};
    $id = $$params{'id'};
}
Like field names, parameter names are coerced to lowercase. The special '_' parameter means the default parameter for the field.
WARNING: the syntax is a little different for each field (content-type, content-disposition, etc.). I've attempted to come up with a nice, simple catch-all solution: it simply stops when it can't match anything else.
- mime_encoding
-
Try real hard to determine the content transfer encoding. When one is found, it is returned as a non-empty string in all-lowercase.
If no encoding could be found, the empty string is returned.
- mime_type
-
Try real hard to determine the content type, which is returned as "$type/$subtype" in all-lowercase.
($type, $subtype) = split('/', $head->mime_type);
If both the type and the subtype are missing, the content-type defaults to "text/plain", as per RFC-1521:
Default RFC-822 messages are typed by this protocol as plain text in the US-ASCII character set, which can be explicitly specified as "Content-type: text/plain; charset=us-ascii". If no Content-Type is specified, this default is assumed.
If just the subtype is missing (really a syntax error, but we'll tolerate it, since some mailers actually do this), then the subtype defaults to "x-subtype-unknown". This may change in the future, since I don't know if this was a really horrible idea: unfortunately, there is no standard default subtype, and even when a good default can be decided upon, I felt queasy about returning the erroneous "text" as either the legal "text/plain" or the still-illegal "text/".
If the content type is present but can't be parsed at all (yow!), the empty string is returned.
- multipart_boundary
-
If this is a header for a multipart message, return the "encapsulation boundary" used to separate the parts. The boundary is returned exactly as given in the Content-type: field; that is, the leading double-hyphen (--) is not prepended.
(Well, almost exactly... from RFC-1521:
(If a boundary appears to end with white space, the white space must be presumed to have been added by a gateway, and must be deleted.)
so we oblige and remove any trailing spaces.)
Returns undef (not the empty string) if the message is not multipart, if there is no specified boundary, or if the boundary is illegal (e.g., if it is empty after all trailing whitespace has been removed).
- recommended_filename
-
Return the recommended external filename. This is used when extracting the data from the MIME stream.
Returns undef if no filename could be suggested.
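The behaviour described above for params(), mime_type(), and multipart_boundary() can be sketched in plain Perl. What follows is a simplified illustration and not the module's real parser: the function names parse_params, guess_mime_type, and clean_boundary are hypothetical, the regular expression handles only simple "name=value" and quoted parameters, and lowercasing the default value is an assumption drawn from the params() example.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse a structured field value into a hash reference.
# The default (unnamed) value goes under '_' and parameter names are
# lowercased. Parsing simply stops when nothing more can be matched.
sub parse_params {
    my ($raw) = @_;
    my %params;
    $raw =~ m{\A\s*([^\s;]+)}gc or return \%params;
    $params{_} = lc $1;
    while ($raw =~ m{\G\s*;\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^\s;]*))}gc) {
        $params{lc $1} = defined($2) ? $2 : $3;
    }
    return \%params;
}

# Hypothetical sketch of the defaults described for mime_type.
sub guess_mime_type {
    my ($content_type) = @_;                    # raw field text, or undef
    return 'text/plain'
        unless defined($content_type) && $content_type =~ /\S/;
    my $type = parse_params($content_type)->{_};
    return '' unless defined $type;             # present but unparseable
    return $type if $type =~ m{\A[^/]+/[^/]+\z};
    return "$type/x-subtype-unknown" if $type !~ m{/};
    return '';
}

# Hypothetical sketch of the rules described for multipart_boundary.
sub clean_boundary {
    my ($content_type) = @_;
    my $params = parse_params($content_type);
    return unless defined($params->{_}) && $params->{_} =~ m{\Amultipart/};
    my $boundary = $params->{boundary};
    return unless defined $boundary;
    $boundary =~ s/\s+\z//;                     # drop trailing white space
    return length($boundary) ? $boundary : undef;
}

my $field  = 'Message/Partial; number=2; total=3; id="oc=jpbe0M2Yt4s@thumper.bellcore.com"';
my $params = parse_params($field);
print "$params->{_}: part $params->{number} of $params->{total}\n";
```

Keeping the default value under a single '_' key means one catch-all parser can serve content-type, content-disposition, and friends, at the cost of being more forgiving than the RFC grammar.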
UNDER THE HOOD
See the documentation under MIME::Entity for the rationale behind my additions to the MIME family.
Implementation
Looking at a typical mail message header, it is sooooooo tempting to just store the fields as a hash of strings, one string per hash entry. Unfortunately, there's the little matter of the Received: field, which (unlike From:, To:, etc.) will often have multiple occurrences; e.g.:
Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
    (Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
    id AA13596; Thu, 21 Dec 95 17:20:38 -0500
Received: (from eryq@localhost) by rhine.gsfc.nasa.gov (8.6.12/8.6.12)
    id RAA28069; Thu, 21 Dec 1995 17:27:54 -0500
Date: Thu, 21 Dec 1995 17:27:54 -0500
From: Eryq <eryq@rhine.gsfc.nasa.gov>
Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
To: eryq@eryq.pr.mcs.net
Subject: Stuff and things
The Received: field is used for tracing message routes, and although it's not generally used for anything other than human debugging, I didn't want to inconvenience anyone who actually wanted to get at that information. I also didn't want to make this a special case; after all, who knows what other fields could have multiple occurrences in the future? So, clearly, multiple entries had to somehow be stored multiple times.
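One way to picture "stored multiple times" is a hash of arrays, keyed by lowercased field name, with one array element per occurrence. The sketch below is only an illustration of that idea under my own assumptions: parse_header and get_field are hypothetical names, and nothing here is claimed to match the class's actual internal layout.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse raw header text into a hash of arrays, so that
# a field such as Received: keeps every occurrence in its original order.
# Returns undef on a syntax error.
sub parse_header {
    my ($text) = @_;
    my (%fields, $last);
    for my $line (split /^/, $text) {
        last if $line =~ /\A\r?\n\z/;                # blank line ends the header
        if ($line =~ /\A[ \t]/ && defined $last) {   # continuation line
            $fields{$last}[-1] .= $line;
        }
        elsif ($line =~ /\A([^\s:]+):[ \t]*(.*)\z/s) {
            $last = lc $1;                           # names are lowercased
            push @{ $fields{$last} }, $2;
        }
        else {
            return;                                  # syntax error halts parsing
        }
    }
    for my $values (values %fields) {
        s/\r?\n\z// for @$values;                    # drop each final newline
    }
    return \%fields;
}

# Fetch one occurrence: index 0 is the first, -1 the last; '' if absent.
sub get_field {
    my ($fields, $name, $occur) = @_;
    my $values = $fields->{lc $name} or return '';
    my $value  = $values->[ defined($occur) ? $occur : 0 ];
    return defined($value) ? $value : '';
}

my $fields = parse_header(<<'EOT');
Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
    id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov
Subject: Stuff and things

This body line is never reached.
EOT
print "Most recent:  ", get_field($fields, 'Received'), "\n";
print "Least recent: ", get_field($fields, 'Received', -1), "\n";
```

With this layout no field needs special treatment: a field that happens to occur once is just an array of length one.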
SEE ALSO
MIME::Decoder, MIME::Entity, MIME::Head, MIME::Parser.
AUTHOR
Copyright (c) 1996 by Eryq / eryq@rhine.gsfc.nasa.gov
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
More-comprehensive filename extraction by Lee E. Brotzman, Advanced Data Solutions.
VERSION
$Revision: 1.9 $ $Date: 1996/06/06 23:27:02 $