NAME
Mail::Box::MH - Handle folders with a file per message.
SYNOPSIS
use Mail::Box::MH;
my $folder = new Mail::Box::MH folder => $ENV{MAIL}, ...;
DESCRIPTION
Mail::Box::MH extends Mail::Box and Mail::Box::Index to implement MH-type folders. This manual page describes Mail::Box::MH and the Mail::Box::MH::* packages. First read Mail::Box::Manager for the general overview, Mail::Box to understand mailboxes, and Mail::Box::Message to learn how messages are used.
The explanation is complicated, but for normal use you should not bother yourself with all the details. Skip ahead in this manual page to PUBLIC INTERFACE.
How MH-folders work
MH-type folders use a directory to store the messages of one folder. Each message is stored in a separate file. This seems useful, because a change to the folder touches only a few of these small files, in contrast with file-based folders, where a change causes the huge folder-file to be rewritten.
However, MH-based folders perform very badly when you need header information from all messages. For instance, full knowledge of all message-threads (see Mail::Box::Threads) in the folder requires reading the header-lines of every message-file. And usually, reading in threads is desired.
So, each message is written in a separate file. The file-names are numbers, counting from 1. Next to these message-files, a directory may contain a file named .mh_sequences, which stores labels that relate to the messages. Furthermore, a folder-directory may contain sub-directories, which are seen as sub-folders.
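The directory layout described above can be illustrated with a small, self-contained sketch. It does not use Mail::Box at all: the helper mh_message_numbers is a hypothetical name introduced here, and the rule "a message is a plain file whose name consists only of digits" is an assumption based on the description, not a statement about how this module scans folders.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the message numbers found in an MH-style
# directory, sorted numerically.  Only plain files with all-digit names
# are counted; label files and sub-directories are skipped.
sub mh_message_numbers {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot read directory $dir: $!";
    my @numbers = sort { $a <=> $b }
                  grep { /^\d+$/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @numbers;
}

# Build a throw-away folder to demonstrate the layout.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(1 2 10 .mh_sequences)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}
mkdir "$dir/subfolder" or die "Cannot create sub-folder: $!";

my @numbers = mh_message_numbers($dir);
print "messages: @numbers\n";
print "highest:  $numbers[-1]\n";
```

Note that only file-names are inspected here; no message-file is opened.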
Implementation
This implementation supports the .mh_sequences file and sub-folders. Beyond this, considerable effort is made to avoid reading each message-file. This should boost the performance of the Mail::Box module over other Perl modules which are able to read folders.
Folder-types which store each message in its own file, all together in one directory, are bad for performance. Consider that you want to know the subjects of all messages while browsing through a folder with your mail-reading client. This would cause all message-files to be read.
Mail::Box::MH has two ways to try to improve performance: you can use an index-file, and you can rely on delay-loading. The combination performs even better. Both are explained in the next sections.
An index-file
If you specify keep_index as option to the folder creation method new(), then all header-lines of all messages from the folder which have been read once will also be written into one dedicated index-file (one file per folder). The default filename is .index.
However, index-files are not supported by any other reader which supports MH (as far as I know). If you read the folders with such a client, it will not cause unrecoverable conflicts with this index-file, but at most be bad for performance.
If you do not (want to) use an index-file, then delay-loading may save your day.
Delayed loading
The delay-loading mechanism of messages tries to be as lazy as possible. When the folder is opened, none of the message-files will be read. If there is an index-file, the headers will be taken from there. The labels will be read from the .mh_sequences file. But of the messages themselves, only the filenames are scanned.
A message is not read until one of its header-lines is used (or any other action is performed on the message). This is done using Perl's AUTOLOADing, and is transparent to users. If the first thing you ask for is a header-line, then lazy_extract and take_headers determine how far this message is parsed: into a Mail::Box::MH::NotParsed or a Mail::Box::MH::Message.
The index-file performs best by far, but in the second case performance can be acceptable too. When a mail-client opens a huge folder, only a few of the messages are displayed on the screen as the folder-list. Header-lines like `Subject' are needed only for the visible messages, so the AUTOLOAD mechanism reads just those message-files. Other messages will only be read from file when they appear in the viewport.
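The general idea of AUTOLOAD-driven delayed loading can be sketched as follows. This is a simplified illustration, not the code of this module: the classes Sketch::NotParsed and Sketch::Message are hypothetical, the "file" is a string held in memory, and the choice to rebless the object once it has been loaded is an assumption made for the example.

```perl
use strict;
use warnings;

package Sketch::Message;          # hypothetical: a fully parsed message

sub get  { my ($self, $name) = @_; return $self->{fields}{lc $name} }
sub body { return $_[0]{body} }

package Sketch::NotParsed;        # hypothetical: a message not read yet

our $AUTOLOAD;
our $loads = 0;                   # counts how often a message was parsed

sub new {
    my ($class, %args) = @_;
    return bless { source => $args{source} }, $class;
}

# Parse the stored text and turn the object into a Sketch::Message.
sub load {
    my $self = shift;
    $loads++;
    my ($head, $body) = split /\n\n/, $self->{source}, 2;
    my %fields;
    for my $line (split /\n/, $head) {
        $fields{lc $1} = $2 if $line =~ /^([^:]+):\s*(.*)$/;
    }
    @{$self}{qw(fields body)} = (\%fields, $body);
    return bless $self, 'Sketch::Message';
}

sub DESTROY { }                   # keep object destruction away from AUTOLOAD

# Any method this class does not define triggers loading, after which
# the call is dispatched again on the reblessed object.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    $self->load;
    my $code = $self->can($method) or die "No such method: $method";
    return $self->$code(@_);
}

package main;

my $msg = Sketch::NotParsed->new(
    source => "Subject: hello\nFrom: someone\@example.com\n\nbody text\n",
);
print "before: ", ref $msg, "\n";
print "subject: ", $msg->get('Subject'), "\n";
print "after:  ", ref $msg, "\n";
```

The caller never asks for the message to be loaded; the first call to get() does it, and later calls go straight to the parsed object.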
Message State Transition
The user of a folder gets their hands on a message-object, and is not bothered with the actual data which is stored in the object at that moment. As the implementor of a mail-package, you might be.
For trained eyes only:

   read()     !lazy && !DELAY
   -------> +----------------------------------> Mail::Box::
            |                                    MH::Message
            | lazy && !DELAY && !index                ^
            +--------------.                          |
            |           \   \    NotParsed   load     |
            |            \   `-> NotReadHead ------>-'|
            |        REAL \                           |
            |              \                          |
            | index         v    NotParsed   load     |
            +------------------> MIME::Head ------->-'|
            |                        ^                |
            |                        |                |
            |                        |load_head       |
            |                        |                |
            | DELAY && !index    NotParsed   load     |
            +------------------> <no head> -------->--'

          ,-------------------------+---.
          |                    ALL  |   | regexps && taken
          v                         |   |
    NotParsed   head()     get()   /   /
   NotReadHead -------->  ------->+---'
              \          \         \
               \ other()  \ other() \regexps && !taken
                \          \         \
                 \          \         \   load     Mail::Box::
                  `----->----+---------+---------> MH::Message

         ,---------------.
         |               |
         v               |
    NotParsed  head()    |
   MIME::Head -------->--'
             \                           Mail::Box::
              `------------------------> MH::Message

                           load_head   NotParsed
                          ,----------> MIME::Head
                         /
   NotParsed  head()    / lazy
   <no head> --------->+
                        \ !lazy
                         \
                          `-----------> Mail::Box::
                              load      MH::Message

Terms: lazy refers to the evaluation of the lazy_extract() option. The load and load_head are triggers to the AUTOLOAD methods. All terms like head() refer to method-calls. The index is true if there is an index-file kept, and the message-header found in there seems still valid (see the keep_index option of new()).
Finally, ALL, REAL, DELAY (default), and regexps refer to values of the take_headers option of new(). Notice that take_headers set to DELAY takes precedence over lazy_extract.
Hm... not that easy... Happily, the implementation takes fewer lines than the documentation.
PUBLIC INTERFACE
- new ARGS
-
Create a new folder. There are many options, most of which are taken from other objects. For some of them, different defaults are set. The MH-specific options are described below, but first the full list.
  OPTION             DEFINED BY          DEFAULT
  access             Mail::Box           'r'
  dummy_type         Mail::Box::Threads  'Mail::Box::Message::Dummy'
  folder             Mail::Box           $ENV{MAIL}
  folderdir          Mail::Box           <no default>
  index_filename     Mail::Box::Index    foldername.'/.index'
  keep_index         Mail::Box::Index    0
  labels_filename    Mail::Box::MH       foldername.'/.mh_sequence'
  lazy_extract       Mail::Box           10000 (10kB)
  lockfile           Mail::Box::Locker   foldername.'/.lock'
  lock_method        Mail::Box::Locker   'dotlock'
  lock_timeout       Mail::Box::Locker   3600 (1 hour)
  lock_wait          Mail::Box::Locker   10 (seconds)
  manager            Mail::Box           undef
  message_type       Mail::Box           'Mail::Box::MH::Message'
  notreadhead_type   Mail::Box           'Mail::Box::Message::NotReadHead'
  notread_type       Mail::Box           'Mail::Box::MH::Message::NotParsed'
  realhead_type      Mail::Box           'MIME::Head'
  remove_when_empty  Mail::Box           1
  save_on_exit       Mail::Box           1
  take_headers       Mail::Box           'DELAY'
  thread_body        Mail::Box::Threads  0
  thread_timespan    Mail::Box::Threads  '3 days'
  thread_window      Mail::Box::Threads  10
  <none>             Mail::Box::Tie
MH specific options:
labels_filename => FILENAME
In MH-folders, messages can be labeled, for instance based on the sender or on whether they have been read or not. This status is kept in a file which is usually called .mh_sequences, but that name can be overridden with this option.
- readMessages
-
Read all messages from the folder. This method is called at instantiation of the folder, so do not call it yourself unless you have a very good reason.
- readMessage MESSAGE-NR [, BOOL]
-
Read one message from its file. This method is automatically triggered by the AUTOLOAD mechanism, so will usually not be called explicitly.
Although the name of the method seems to imply that the message body is read as well, this might not be true. If BOOL is true (default false), the body is certainly read. Otherwise, it depends on the folder's take_headers and lazy_extract flags.
- addMessage MESSAGE
-
Add a message to the MH-folder.
- write
-
Write all messages to the folder-file. Returns whether this was successful. If you want to write to a different file, you first create a new folder, then move the messages, and then write that file.
- readAllHeaders
-
Force all messages to be read at least until their header information is known. The exact status reached depends on the take_headers option of new(), as described above.
- appendMessages LIST-OF-OPTIONS
-
(Class method) Append one or more messages to this folder. See the manual page of Mail::Box for an explanation of the options. The folder will not be opened. Returns the list of written messages on success.
Example:
    my $message = Mail::Internet->new(...);
    Mail::Box::MH->appendMessages
      ( folder    => '=xyz'
      , message   => $message
      , folderdir => $ENV{FOLDERS}
      );
- dirname
-
Returns the dirname related to this folder.
Example: print $folder->dirname;
- folderToDirname FOLDERNAME, FOLDERDIR
-
(Class method) Translate a foldername into a filename, using the FOLDERDIR to replace a leading =.
- highestMessageNumber
-
Returns the highest number which is used in the folder to store a file. This method may be called when the folder is read (then this number can be derived without file-system access), but also when the folder is not read (yet).
- messageID MESSAGE-ID [,MESSAGE]
-
Returns the message with the specified MESSAGE-ID. If also a MESSAGE is specified, the relationship between ID and MESSAGE will be stored first.
Be warned that if the messages have not been read at all (take_headers set to DELAY), each message of the folder will be parsed, at least to get its header. The headers are read from back to front in the folder.
- allMessageIDs
-
Returns a list of all message-ids in the folder, including those which are to be deleted.
Be warned that this will cause all message-headers to be read from their files, if that was not done before. This penalty can be avoided by keeping an index-file. See the keep_index option of new().
Manage message labels
MH-folders use one dedicated file per folder-directory to list special tags attached to messages in the folder. Typically, this file is called .mh_sequences. The messages are numbered from 1.
Example content of .mh_sequences:
    cur: 93
    unseen: 32 35-56 67-80
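To make the format of the example above concrete, here is a minimal sketch which expands such content into a hash from label to message-numbers. The function parse_sequences is a hypothetical name used only for this illustration; it is not the readLabels method of this module, and it handles only the single numbers and N-M ranges shown in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the text of a sequences file into a hash
# which maps each label to a reference to a list of message-numbers.
sub parse_sequences {
    my ($text) = @_;
    my %labels;
    for my $line (split /\n/, $text) {
        # Lines which do not look like "label: numbers" are skipped.
        my ($label, $spec) = $line =~ /^\s*([^:\s]+)\s*:\s*(.*?)\s*$/
            or next;
        my @numbers;
        for my $part (split ' ', $spec) {
            if    ($part =~ /^(\d+)-(\d+)$/) { push @numbers, $1 .. $2 }
            elsif ($part =~ /^\d+$/)         { push @numbers, $part }
        }
        $labels{$label} = \@numbers;
    }
    return \%labels;
}

my $labels = parse_sequences("cur: 93\nunseen: 32 35-56 67-80\n");
for my $label (sort keys %$labels) {
    my $count = @{ $labels->{$label} };
    print "$label: $count message(s)\n";
}
```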
To generalize labels on messages, two are treated specially:
cur
The cur label specifies the number of the message where the user stopped reading mail from this folder at last access. Internally in these modules it is referred to as label current.
unseen
The unseen label lists which messages were never read. This must be a mistake in the design of MH: it must be a source of confusion. People should never use labels with a negation in the name:
    if($seen)           if(!$unseen)      #yuk!
    if(!$seen)          if($unseen)
    unless($seen)       unless($unseen)   #yuk!
So: label unseen is translated into seen for internal use.
readLabels
In MH-folders, messages can be labeled to easily select sets of messages, for instance all messages posted by a certain sender. The file is usually called .mh_sequences, but that name can be overridden using the labels_filename option of new().
writeLabels HASH
Write the file which contains the relation between messages (actually the messages' sequence-numbers) and the labels those messages have. The parameter is a reference to a hash which contains, for each label, a reference to a list of message-numbers which have to be written.
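The kind of data structure handed to writeLabels can be turned back into the compact notation of the labels file roughly as shown below. This is only a sketch under assumptions: format_sequences is a hypothetical function, not the module's own writer, and collapsing consecutive numbers into N-M ranges is inferred from the example content shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: format a hash of label => [message-numbers] as
# text, collapsing runs of consecutive numbers into "first-last".
sub format_sequences {
    my ($labels) = @_;
    my $text = '';
    for my $label (sort keys %$labels) {
        my @numbers = sort { $a <=> $b } @{ $labels->{$label} };
        next unless @numbers;               # labels without messages are dropped
        my @parts;
        my ($start, $end) = ($numbers[0], $numbers[0]);
        for my $n (@numbers[1 .. $#numbers]) {
            next if $n == $end;             # ignore duplicates
            if ($n == $end + 1) { $end = $n; next }
            push @parts, $start == $end ? $start : "$start-$end";
            ($start, $end) = ($n, $n);
        }
        push @parts, $start == $end ? $start : "$start-$end";
        $text .= "$label: @parts\n";
    }
    return $text;
}

print format_sequences({
    cur    => [ 93 ],
    unseen => [ 32, 35 .. 56, 67 .. 80 ],
});
```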
folder management methods
Read the Mail::Box manual for more details and more options on each method.
- foundIn FOLDERNAME [,OPTIONS]
-
Autodetect if there is a Mail::Box::MH folder specified here. The FOLDERNAME specifies the name of the folder, as is specified by the application. The OPTIONS is a list of extra parameters to the request.
For this class, we use (if defined):
folderdir => DIRECTORY
Example:
    Mail::Box::MH->foundIn
      ( '=markov'
      , folderdir => "$ENV{HOME}/.mh"
      );
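How a name like '=markov' relates to the folderdir can be sketched as below. The function folder_to_dirname is a hypothetical stand-in written for this illustration; in particular, joining the directory and the remainder of the name with a slash is an assumption, so consult folderToDirname for the real behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: replace a leading '=' in a folder name by the
# folder directory.  Names without a leading '=' are returned unchanged.
sub folder_to_dirname {
    my ($name, $folderdir) = @_;
    (my $dirname = $name) =~ s{^=}{$folderdir/};
    return $dirname;
}

print folder_to_dirname('=markov', '/home/me/.mh'), "\n";
print folder_to_dirname('/var/mail/archive', '/home/me/.mh'), "\n";
```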
- listFolders [OPTIONS]
-
List the folders in a certain directory.
folderdir => DIRECTORY
check => BOOL
skip_empty => BOOL
- subFolders [OPTIONS]
-
Returns the subfolders of a folder. Although file-type folders do not have a natural form of sub-folders, we can simulate them. The subfolder_extention option of the constructor (new()) defines how sub-folders can be recognized.
check => BOOL
skip_empty => BOOL
- openSubFolder NAME [,OPTIONS]
-
Open (or create, if it does not exist yet) a new subfolder to an existing folder.
Example:
    my $folder = Mail::Box::MH->new(folder => '=Inbox');
    my $sub    = $folder->openSubFolder('read');
Mail::Box::MH::Message::Runtime
This object contains methods which are shared by both delay-loaded (not parsed) and fully loaded messages, but which are not general to all folder types.
PUBLIC INTERFACE
- new ARGS
-
Messages in directory-based folders use the following extra options for creation:
filename => FILENAME
The file in which the message is stored.
- print TO
-
Write one message to a file-handle. Unmodified messages are taken from the folder-file in which they were stored. Modified messages are written as they are in memory. Specify a file-handle to write TO (defaults to STDOUT).
- printIndex [FILEHANDLE]
-
Print the information of this message which is required to maintain an index-file. By default, this prints to STDOUT.
- readIndex CLASS [,FILEHANDLE]
-
Read the headers of one message from the index into a CLASS structure. CLASS is (a sub-class of) a MIME::Head. If no FILEHANDLE is specified, the data is read from STDIN.
- filename
-
Returns the name of the file in which this message is actually stored. This will return undef when the message is not read from a file.
- headIsRead
-
Checks if the head of the message is read. This is true for fully parsed messages and messages where the header was accessed once.
Mail::Box::MH::Message
This object extends a Mail::Box::Message with extra tools and facts on what is special to messages in file-based folders, with respect to messages in other types of folders.
PUBLIC INTERFACE
- coerce FOLDER, MESSAGE [,OPTIONS]
-
(Class method) Coerce a MESSAGE into a Mail::Box::MH::Message, ready to be stored in FOLDER. When any message is offered to be stored in the mailbox, it first should have all fields which are specific for MH-folders.
The coerced message is returned on success, else undef.
Example:
    my $mh = Mail::Box::MH->new(...);
    my $message = Mail::Box::Mbox::Message->new(...);
    Mail::Box::MH::Message->coerce($mh, $message);
    # Now $message is ready to be stored in $mh.
However, it is better to use
    $mh->coerce($message);
which is sure to call coerce on the right message type.
Mail::Box::MH::Message::NotParsed
Not parsed messages stay in the file until the message is used. Because this folder structure uses many messages in the same file, the byte-locations are remembered.
PUBLIC INTERFACE
- load CLASS [, ARRAY-OF-LINES]
-
This method is called by the autoloader when the data of the message is required. If you specified REAL for the take_headers option of new(), you did have a MIME::Head in your hands; however, this will be destroyed when the whole message is loaded.
If an array of lines is provided, that is parsed as the message. Otherwise, the file of the message is opened and parsed.
- head
-
Get the head of the message. This may return immediately, because the head is already read. However, when we do not have a header yet, we read the message. At this moment, the lazy_extract option of new() comes into action: will we read the whole message now, or only the header?
- messageID
-
Retrieve the message's id. Every message has a unique message-id. This id is used mainly for recognizing discussion threads.
AUTHOR
Mark Overmeer (Mark@Overmeer.net). All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
VERSION
This code is alpha, version 0.6