NAME
MIME::Structure - determine structure of MIME messages
SYNOPSIS
use MIME::Structure;
$parser = MIME::Structure->new;
$parser->keep_header(1);             # needed for the 'header' elements below
$message = $parser->parse($filehandle);
print $message->{'header'};
$parts = $message->{'parts'} || [];  # 'parts' is present only for multipart entities
foreach (@$parts) {
    $offset  = $_->{'offset'};
    $type    = $_->{'type'};
    $subtype = $_->{'subtype'};
    $line    = $_->{'line'};
    $header  = $_->{'header'};
}
print $parser->concise_structure($message), "\n";
METHODS
- new
-
$parser = MIME::Structure->new;
- parse
-
$message = $parser->parse($filehandle);
($message, @other_entities) = $parser->parse($filehandle);
Parses the message found in the given filehandle.
A MIME message takes the form of a non-empty tree, each of whose nodes is termed an entity (see RFCs 2045-2049). The root entity is the message itself; the children of a multipart message are the parts it contains. (A non-multipart message has no children.)
When called in list context, the parse method returns a list of references to hashes; each hash contains information about a single entity in the message.
The first hash represents the message itself; if it is a multipart message, the hashes that follow represent its parts and subparts in the order in which they occur in the message (that is, in pre-order). If called in scalar context, parse returns only a reference to the hash describing the message itself.
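The pre-order listing can be illustrated without the module itself. The sketch below builds a small entity tree by hand (the data is hypothetical and uses only the type, subtype, and parts elements described below) and flattens it with a hypothetical helper, preorder, so that each entity comes before its parts and the parts keep their order in the message. This is the order in which parse returns entity hashes in list context.

```perl
use strict;
use warnings;

# Hypothetical entity tree shaped like the hashes parse returns:
# a multipart/mixed message whose first part is multipart/alternative.
my $message = {
    type => 'multipart', subtype => 'mixed',
    parts => [
        { type => 'multipart', subtype => 'alternative',
          parts => [
              { type => 'text', subtype => 'plain' },
              { type => 'text', subtype => 'html' },
          ] },
        { type => 'image', subtype => 'jpeg' },
    ],
};

# Hypothetical helper: return an entity followed by all of its
# descendants in pre-order (parent first, then parts left to right).
sub preorder {
    my ($entity) = @_;
    return ($entity, map { preorder($_) } @{ $entity->{parts} || [] });
}

my @entities = preorder($message);
print join(' ', map { "$_->{type}/$_->{subtype}" } @entities), "\n";
# multipart/mixed multipart/alternative text/plain text/html image/jpeg
```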
The following elements may appear in these hashes:
- body_offset
-
The offset, in bytes, of the entity's body.
- content_length
-
The length, in bytes, of the entity's body. Currently only set for the message itself.
- encoding
-
The value of the entity's Content-Transfer-Encoding field.
- fields
-
If the keep_fields option is set, this will be a reference to a hash whose keys are the names (converted to lower case) of all fields present in the entity's header and whose values xxx.
- header
-
The entity's full header as it appeared in the message, not including the final blank line. This will be present only if the keep_header option is set.
- kind
-
"message" if the entity is the message itself, or "part" if it is a part within a message (or within another part).
- length
-
The length, in bytes, of the entire entity, including its header and body. Currently only set for the message itself.
- level
-
The level at which the entity is found. The message itself is at level 0, its parts (if any) are at level 1, their parts are at level 2, and so on.
- line
-
The line number (1-based) of the first line of the entity's header. The message itself always, by definition, is at line 1.
- number
-
A dotted-decimal notation that indicates the entity's place within the message. The root entity (the message itself) has number 1; its parts (if it has any) are numbered 1.1, 1.2, 1.3, etc., and the numbers of their parts in turn (if they have any) are constructed in like manner.
- offset
-
The offset in bytes of the first line of the entity's header, measured from the first line of the message's header. The message itself always, by definition, is at offset 0.
- parent
-
A reference to the hash representing the entity's parent. If the entity is the message itself, this is undefined.
- parts
-
A reference to an array of the entity's parts. This will be present only if the entity is of type multipart.
- parts_boundary
-
The string used as a boundary to delimit the entity's parts. Present only in multipart entities.
- subtype
-
The MIME media subtype of the entity's content, e.g., plain or jpeg.
- type
-
The MIME media type of the entity's content, e.g., text or image.
- type_params
-
A reference to a hash containing the parameters (if any) found in the Content-Type header field. For example, given the following Content-Type header:
Content-Type: text/html; charset=UTF-8
The entity's type_params element will be this:
$entity->{'type_params'} = { 'charset' => 'UTF-8' };
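The level, number, and parent elements are all determined by an entity's position in the tree. As an illustration (not the module's own code), the sketch below assigns them to a hand-built tree using a hypothetical helper, number_entities, following the rules stated above: the message is at level 0 with number 1, and the nth part of an entity numbered N is numbered N.n, one level deeper. It then prints each entity with its type_params charset, if any.

```perl
use strict;
use warnings;

# Hypothetical tree; only type, subtype, type_params, and parts are filled in.
my $message = {
    type => 'multipart', subtype => 'mixed',
    parts => [
        { type => 'text', subtype => 'plain',
          type_params => { charset => 'UTF-8' } },
        { type => 'multipart', subtype => 'alternative',
          parts => [
              { type => 'text', subtype => 'plain' },
              { type => 'text', subtype => 'html' },
          ] },
    ],
};

# Hypothetical helper: set level, number, and parent on an entity and,
# recursively, on each of its parts.
sub number_entities {
    my ($entity, $parent, $level, $number) = @_;
    $entity->{level}  = $level;
    $entity->{number} = $number;
    $entity->{parent} = $parent;    # undef for the message itself
    my $i = 0;
    for my $part (@{ $entity->{parts} || [] }) {
        $i++;
        number_entities($part, $entity, $level + 1, "$number.$i");
    }
}

number_entities($message, undef, 0, '1');

# Print one line per entity in pre-order, indented by level.
sub show {
    my ($e) = @_;
    my $charset = $e->{type_params} ? " charset=$e->{type_params}{charset}" : '';
    printf "%s%s %s/%s%s\n", '  ' x $e->{level}, $e->{number},
        $e->{type}, $e->{subtype}, $charset;
    show($_) for @{ $e->{parts} || [] };
}

show($message);
```

For this tree the output is "1 multipart/mixed", then "1.1 text/plain charset=UTF-8", "1.2 multipart/alternative", "1.2.1 text/plain", and "1.2.2 text/html", each indented two spaces per level.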
Besides parsing the message, this method may also be used to print the message, or portions thereof, as it parses; the print method (q.v.) may be used to specify what to print.
- keep_header
-
$keep_header = $parser->keep_header;
$parser->keep_header(1);
Set (or get) whether headers should be remembered during parsing.
- keep_fields
-
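$keep_fields = $parser->keep_fields;
$parser->keep_fields(1);
Set (or get) whether fields (normalized headers) should be remembered during parsing. If set, each entity's hash will include a fields element.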
- print
-
$print = $parser->print;
$parser->print($MIME::Structure::PRINT_HEADER | $MIME::Structure::PRINT_BODY);
$parser->print('header,body');
Set (or get) what should be printed. This may be specified either as any of the following symbolic constants, ORed together:
- PRINT_NONE
- PRINT_HEADER
- PRINT_BODY
- PRINT_PREAMBLE
- PRINT_EPILOGUE
Or as a string containing any of the following words, separated by any delimiter:
- none
- header
- body
- preamble
- epilogue
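The two forms name the same set of flags. The sketch below shows how a delimited string can be mapped onto an ORed bitmask. The numeric values given to the constants here are assumptions for illustration only (this documentation does not state the module's actual values), and parse_print_spec is a hypothetical helper, not part of MIME::Structure. It also assumes that "any delimiter" means any run of non-word characters.

```perl
use strict;
use warnings;

# Assumed values, for illustration only: one bit per printable portion.
use constant {
    PRINT_NONE     => 0,
    PRINT_HEADER   => 1,
    PRINT_BODY     => 2,
    PRINT_PREAMBLE => 4,
    PRINT_EPILOGUE => 8,
};

my %flag_for = (
    none     => PRINT_NONE,
    header   => PRINT_HEADER,
    body     => PRINT_BODY,
    preamble => PRINT_PREAMBLE,
    epilogue => PRINT_EPILOGUE,
);

# Hypothetical helper: turn a string such as 'header,body' into the
# equivalent bitmask; words may be separated by any non-word characters.
sub parse_print_spec {
    my ($spec) = @_;
    my $mask = PRINT_NONE;
    for my $word (grep { length } split /\W+/, lc $spec) {
        die "unknown print option '$word'\n" unless exists $flag_for{$word};
        $mask |= $flag_for{$word};
    }
    return $mask;
}

my $mask = parse_print_spec('header,body');
print "header and body\n" if $mask == (PRINT_HEADER | PRINT_BODY);
print "no preamble\n" unless $mask & PRINT_PREAMBLE;
```

With this encoding, 'header,body' and PRINT_HEADER | PRINT_BODY produce the same value, which is what makes the two ways of calling print interchangeable.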
- print_header
-
$print_header = $parser->print_header;
$parser->print_header(1);
Set (or get) whether headers should be printed.
- print_body
-
$print_body = $parser->print_body;
$parser->print_body(1);
Set (or get) whether bodies should be printed.
- print_preamble
-
$print_preamble = $parser->print_preamble;
$parser->print_preamble(1);
Set (or get) whether preambles should be printed.
- print_epilogue
-
$print_epilogue = $parser->print_epilogue;
$parser->print_epilogue(1);
Set (or get) whether epilogues should be printed.
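The print_* methods, like keep_header and keep_fields, are used as combined getters and setters: called with no argument they return the current setting, and called with one argument they change it. A minimal sketch of that common Perl accessor idiom, on a hypothetical class Example::Options (not MIME::Structure itself, whose internals are not described here), is:

```perl
use strict;
use warnings;

package Example::Options;    # hypothetical class, for illustration only

sub new { return bless { print_header => 0 }, shift }

# Get-or-set accessor: if an argument is given, store it;
# in either case, return the current value.
sub print_header {
    my $self = shift;
    $self->{print_header} = shift if @_;
    return $self->{print_header};
}

package main;

my $opts = Example::Options->new;
print "before: ", $opts->print_header, "\n";   # before: 0
$opts->print_header(1);
print "after: ", $opts->print_header, "\n";    # after: 1
```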
- entities
-
$parser->parse($filehandle);
print "$_->{type}/$_->{subtype} $_->{offset}\n" for @{ $parser->entities };
Returns a reference to an array of all the entities in a message, in the order in which they occur in the message. Thus the first entity is always the root entity (i.e., the message itself).
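Because entities returns an ordinary array reference in pre-order, the usual Perl list operations apply. The sketch below uses a hand-built array standing in for the return value of $parser->entities (the entities and offsets are hypothetical, borrowed from the concise_structure example) and selects the text/plain entities, then separates the root entity from its parts.

```perl
use strict;
use warnings;

# Stand-in for $parser->entities: hypothetical entities in pre-order.
my $entities = [
    { kind => 'message', type => 'multipart', subtype => 'alternative', offset => 0 },
    { kind => 'part',    type => 'text',      subtype => 'html',        offset => 291 },
    { kind => 'part',    type => 'text',      subtype => 'plain',       offset => 9044 },
];

# The same loop as in the usage line above.
print "$_->{type}/$_->{subtype} $_->{offset}\n" for @$entities;

# Select only the text/plain entities.
my @plain = grep { $_->{type} eq 'text' && $_->{subtype} eq 'plain' } @$entities;

# The first element is always the message itself; the rest are its parts.
my ($root, @parts) = @$entities;
printf "%d part(s); first text/plain at offset %d\n",
    scalar @parts, $plain[0]{offset};
```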
- concise_structure
-
$parser->parse($filehandle);
print $parser->concise_structure;  # e.g., '(multipart/alternative:0 (text/html:291) (text/plain:9044))'
Returns a string showing the structure of a message, including the content type and offset of each entity (that is, the message and, if it is multipart, all of its parts, recursively). Each entity is printed in the form:
"(" content-type ":" byte-offset [ " " parts... ] ")"
Offsets are byte offsets of the entity's header from the beginning of the message. (If parse() was called with an offset parameter, this is added to the offset of the entity's header.)
N.B.: The first offset is always 0.
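The format above can be produced by a short recursive function. The sketch below is a hypothetical helper, concise, not the module's implementation; it builds the string from a hand-built tree whose types and offsets are taken from the example above, emitting each entity's type and offset and then each of its parts, space-separated, inside one pair of parentheses.

```perl
use strict;
use warnings;

# Hand-built tree matching the example string shown above.
my $message = {
    type => 'multipart', subtype => 'alternative', offset => 0,
    parts => [
        { type => 'text', subtype => 'html',  offset => 291 },
        { type => 'text', subtype => 'plain', offset => 9044 },
    ],
};

# Hypothetical helper: "(" type/subtype ":" offset, then each part
# (recursively) preceded by a space, then ")".
sub concise {
    my ($e) = @_;
    return '(' . join(' ', "$e->{type}/$e->{subtype}:$e->{offset}",
                      map { concise($_) } @{ $e->{parts} || [] }) . ')';
}

print concise($message), "\n";
# (multipart/alternative:0 (text/html:291) (text/plain:9044))
```

A non-multipart entity has no parts, so it prints as a single parenthesized pair such as (text/plain:0).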
BUGS
Documentation is sketchy.
AUTHOR
Paul Hoffman <nkuitse (at) cpan (dot) org>
COPYRIGHT
Copyright 2008 Paul M. Hoffman. All rights reserved.
This program is free software; you can redistribute it and modify it under the same terms as Perl itself.