NAME
Mail::Box::Threads - maintain threads within a set of folders
SYNOPSIS
my $mgr = Mail::Box::Manager->new;
my $folder = $mgr->open(folder => '/tmp/inbox');
my $threads = $mgr->threads(folder => $folder);
foreach my $thread ($threads->all)
{ $thread->print;
}
$threads->includeFolder($folder);
$threads->removeFolder($folder);
DESCRIPTION
Read Mail::Box::Manager and Mail::Box first. This man-page also describes Mail::Box::Thread.
A (message-)thread is a message together with links to the messages which were sent in reply to it, then the messages which replied to those replies, and so on. Some threads are only one message long (never replied to), some threads are very long.
The Mail::Box::Threads module is very powerful. Not only is it able to do a decent job on MH-like folders (it makes a trade-off between perfection and speed), it can also maintain threads from messages residing in different opened folders. Both facilities are rare for mail-agents.
More details can be found in the IMPLEMENTATION section at the bottom of this man-page. BE WARNED: not all possibilities are tested in great detail.
METHODS
- new ARGS
-
A Mail::Box::Threads-object is created by a Mail::Box::Manager. One manager can produce more than one of these objects. One Mail::Box::Threads-object can combine messages from a set of folders, which may be partially overlapping with other objects of the same type.
The construction of thread administration accepts the following options:
dummy_type => CLASS
The type of dummy messages. Dummy messages are used to fill holes in detected threads: messages which are referred to by messages found in the folder, but which are not in the folder themselves. Defaults to Mail::Box::Message::Dummy.
folder => FOLDER | REF-ARRAY-FOLDERS
folders => FOLDER | REF-ARRAY-FOLDERS
Specifies which folders are to be covered by the threads. You can specify one or more open folders. When you close a folder, the manager will automatically remove the messages of that folder from your threads.
thread_type => CLASS
Type of the threads, by default Mail::Box::Thread (described lower down in this manpage).
threader_type => CLASS | OBJECT
You can specify a module name (CLASS) or a prepared OBJECT, which can handle the basic actions required to detect threads. In both cases, the class must be derived from Mail::Box::Threads.
window => INTEGER|'ALL'
The thread-window describes the maximum number of messages which are checked to fill `holes' in threads for folders which use delayed loading of message headers. The default value is 10.
The constant 'ALL' will cause thread-detection not to stop trying to fill holes, but to continue looking until the first message of the folder is reached. This gives the best quality results, but may perform badly.
timespan => TIME | 'EVER'
Specify how fast threads usually work: the amount of time between a message and its reply. This is used in combination with the window option to determine when to give up filling the holes in threads. See Mail::Box::timespan2seconds for the possibilities for TIME. The default is '3 days'. With 'EVER', the search for messages in a thread will only be limited by the window-size.
thread_body => BOOL
May thread-detection be based on the content of a message? This has a serious performance implication when there are many messages without In-Reply-To and References headers in the folder, because it will cause many messages to be parsed. NOT USED YET. Defaults to FALSE.
Example:
use Mail::Box::Manager;
my $mgr = Mail::Box::Manager->new;
my $inbox = $mgr->open(folder => $ENV{MAIL});
my $read = $mgr->open(folder => 'Mail/read');
my $threads = $mgr->threads(folders => [$inbox, $read]);
# longer alternative for last line:
my $threads = $mgr->threads;
$threads->includeFolder($inbox);
$threads->includeFolder($read);
- folders
-
Returns the folders as managed by this threader.
- includeFolder FOLDERS
- removeFolder FOLDERS
-
Add or remove folders to or from the list of folders whose messages are organized in the threads maintained by this object. Duplicated inclusions will not cause any problems.
From the folders, the messages which have their header-lines parsed (see Mail::Box about lazy_extract) will be scanned immediately. Messages whose header becomes known only later will have to report this (see Mail::Box with toBeThreaded).
Example:
$threads->includeFolder($inbox, $draft);
$threads->removeFolder($draft);
- thread MESSAGE
-
Returns the thread of which this MESSAGE is the start. However, there is a possibility that this message is itself a reply.
Usually, all messages which are in reply to this message are dated later than the specified one. The headers of all messages later than this one are parsed first, for each folder in this threads-object.
Example:
my $threads = $mgr->threads(folder => $inbox);
my $thread = $threads->thread($inbox->message(3));
print $thread->toString;
- threadStart MESSAGE
-
Based on a message, and facts from previously detected threads, try to build solid knowledge about the thread which contains this message.
- all
- sortedAll [PREPARE [,COMPARE]]
-
Returns all messages which start a thread. The list may contain dummy messages and messages which are scheduled for deletion.
To be able to return all threads, thread construction on each message is performed first, which may be slow for some folder-types because it will enforce parsing of message-bodies.
The sortedAll method returns the threads, by default sorted on timestamp.
- known
- sortedKnown [PREPARE [,COMPARE]]
-
Returns the list of all messages which are known to be the start of a thread. Threads containing messages which were not read from their folder (as often happens with MH-folder messages) are not yet known, and hence will not be returned.
The list may contain dummy messages, and messages which are scheduled for deletion. Threads are detected based on explicitly calling inThread() and thread() with a message from the folder. Be warned that, each time a message's header is read from the folder, the return of the method can change.
The sortedKnown method returns the threads, by default sorted on timestamp.
CLASS Mail::Box::Thread
The Mail::Box::Thread maintains one node in the linked list of threads. Each node contains one message, and a list of its follow-ups. In addition, the node records how certain it is that a message is indeed a follow-up.
METHODS of Mail::Box::Thread
- new OPTIONS
-
You will not call this method yourself, because it is the task of the Mail::Box::Threads object to construct it.
As OPTIONS, you can specify
message => OBJECT
The message which is stored in this node. The message must be a Mail::Box::Message.
messageID => MESSAGE-ID
The messageID which is stored in this node. Only specify it when you don't have the message yet.
dummy_type => CLASS
The type to use when a dummy message has to be created.
- message
-
Get the message which is stored in this thread-node. However: the same message may be located in many folders at the same time, which in turn may be controlled by the same thread-manager.
In SCALAR context, you will get the first undeleted instance of the message. If all instances are flagged for deletion, then you get the first. When the open folders only contain references to the message, but no instance, you get a dummy message (Mail::Box::Message::Dummy).
In LIST context, you get all instances of the message found so far.
Examples:
my $threads = $mgr->threads(folders => [$draft, $sent]);
my $node = $draft->message(1)->thread;
foreach my $instance ($node->message)
{ print "Found in ", $instance->folder, ".\n";
}
print "Best is ", scalar $node->message, ".\n";
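The context-dependent behaviour described for message can be illustrated without any folders. The following is only a sketch under assumptions: the node layout (a hash with a messages list) and the name node_message are invented for this example and are not the internals of Mail::Box::Thread.

```perl
use strict;
use warnings;

# Hypothetical node layout: { messages => [ { folder => ..., deleted => 0|1 }, ... ] }
sub node_message {
    my ($node) = @_;
    my @all = @{ $node->{messages} };

    # LIST context: all instances found so far.
    return @all if wantarray;

    # SCALAR context: the first undeleted instance, else simply the first.
    my ($best) = grep { !$_->{deleted} } @all;
    return $best // $all[0];
}

my $node = { messages => [
    { folder => 'draft', deleted => 1 },
    { folder => 'sent',  deleted => 0 },
] };

my @instances = node_message($node);
my $best      = node_message($node);
print scalar(@instances), " instances, best is in $best->{folder}\n";
```

The sketch leaves out the dummy case: a node without any instance would have to return a dummy object instead.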
- addMessage MESSAGE
-
Add one message to the thread-node. If the node is filled with a dummy, then that one is replaced. In other cases, the message is added to the end of the list.
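sub addMessage($)
{ my ($self, $message) = @_;
  # A dummy node holds no real message yet, so the new message replaces it.
  return $self->{MBT_messages} = [ $message ] if $self->isDummy;
  # Otherwise append the message to the instances already known.
  push @{$self->{MBT_messages}}, $message;
  $message;
}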
- isDummy
-
Returns whether this node has no messages (yet): is a hole in a thread.
- messageID
-
Return the message-id related to this thread-node. Each of the messages listed in this node will have the same ID.
- repliedTo
-
Returns the message to which this one is a reply. In SCALAR context, this will return the MESSAGE which was replied to by this one. This message object may be a dummy message. In case the message seems to be the first message of a thread, the value undef is returned.
In LIST context, this method also returns how sure it is that these messages are related. When extended thread discovery is enabled, some magic is applied to relate messages. In LIST context, the first returned argument is a MESSAGE, and the second a STRING constant. Values for the STRING may be:
'REPLY'
This relation was directly derived from an `in-reply-to' message header field. The relation is very sure.
'REFERENCE'
This relation is based on information found in a `References' message header field. One message may reference a list of messages which precede it in the thread. Let's hope they are stored in the right order.
'GUESS'
The relation is a big guess, of undetermined type.
More constants may be added later.
Examples:
my $question = $answer->repliedTo;
my ($question, $quality) = $answer->repliedTo;
if($question && $quality eq 'REPLY') { ... };
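To make the quality constants more concrete, here is a minimal sketch of how a parent and its quality could be derived from header fields. It is not the module's own code: the plain hash of header fields and the name guess_parent are assumptions for this example, and taking the last id in References as the direct parent is the usual convention rather than something this man-page states.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash of header fields and returns
# (parent-id, quality), or an empty list for the start of a thread.
sub guess_parent {
    my ($head) = @_;

    # An In-Reply-To field names the parent directly: very sure.
    if (defined $head->{'In-Reply-To'}
        && $head->{'In-Reply-To'} =~ /<([^>]+)>/) {
        return ($1, 'REPLY');
    }

    # Otherwise fall back on the last id listed in References,
    # assuming the ids are stored in the right order.
    if (defined $head->{'References'}) {
        my @ids = $head->{'References'} =~ /<([^>]+)>/g;
        return ($ids[-1], 'REFERENCE') if @ids;
    }

    return;
}

my ($parent, $quality) = guess_parent({
    'References' => '<a@example.org> <b@example.org>',
});
print "$parent ($quality)\n";
```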
- follows THREAD, QUALITY
-
Register that the current thread is a reply to the specified THREAD. The QUALITY of the relation is specified by the second argument.
The relation may be specified more than once, but only one relation is kept. Once a reply (QUALITY equals REPLY) is detected, that value will be kept.
- followedBy THREADS
-
Register that the THREADS are follow-ups to this message. There may be more than one of these follow-ups which are not related to each other in any other way than sharing the same parent.
If the same relation is defined more than once, this will not cause duplication of information.
- followUps
- sortedFollowUps [PREPARE [,COMPARE]]
-
Returns the list of follow-ups to this thread-node. This list contains parsed, not-parsed, and dummy messages.
The sortedFollowUps() method returns the same list, but sorted (by default based on an estimated time of the reply; see startTimeEstimate() and Mail::Box::sort).
Actions on whole threads
Some convenience methods are added to threads, to simplify retrieving information from them.
- recurseThread CODE-REF
-
Execute a function for all sub-threads. If the subroutine returns true, sub-threads are visited, too. Otherwise, this branch is aborted. The routine is called with the thread-node as only argument.
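The pruning behaviour of recurseThread can be sketched on a plain tree. The node layout (a hash with a followups list) and the name recurse_thread are assumptions for this illustration; the real method works on Mail::Box::Thread objects.

```perl
use strict;
use warnings;

# Hypothetical node layout: { subject => ..., followups => [ nodes ] }
sub recurse_thread {
    my ($node, $code) = @_;

    # Stop descending into this branch when the callback returns false.
    $code->($node) or return;

    recurse_thread($_, $code) for @{ $node->{followups} || [] };
    return;
}

my $thread = {
    subject   => 'question',
    followups => [
        { subject => 'answer 1', followups => [ { subject => 'thanks' } ] },
        { subject => 'answer 2' },
    ],
};

# Count every node by always returning true from the callback.
my $count = 0;
recurse_thread($thread, sub { $count++; 1 });
print "$count messages\n";
```

A callback which returns false for a node is still called for that node, but none of its follow-ups are visited.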
- totalSize
-
Sum the size of all the messages in the thread.
- nrMessages
-
Number of messages in this thread.
- ids
-
Collect all the ids in this thread.
Examples:
$newfolder->addMessages($folder->ids($thread->ids));
$folder->delete($thread->ids);
- folded [BOOL]
-
Returns whether this (part of the) thread has to be shown folded or not. This is simply done by a label, which means that most folder-types can store this.
- threadToString
-
Translate a thread into a string. The string will contain at least one line for each message which was found, but tries to fold dummies. This is useful for debugging, but most message-readers will prefer to implement their own thread printer.
Example:
print $message->threadToString;
may result in
Subject of this message
|- Re: Subject of this message
|-*- Re: Re: Subject of this message
|  |- Re(2) Subject of this message
|  |- [3] Re(2) Subject of this message
|  `- Re: Subject of this message (reply)
`- Re: Subject of this message
The `*' represents a lacking message. The `[3]' represents a folded thread with three messages.
- startTimeEstimate
-
Guess when this thread was started. Each message contains various date specifications (each with various uncertainties, because of timezones and out-of-sync clocks), one of which is taken as timestamp for the message. This method returns the timestamp of this message (the message contained in this node of the thread), but when this node is a dummy, it returns the lowest timestamp of the replies.
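The rule for dummies can be sketched as a small recursive function. This is an illustration under assumptions, not the module's implementation: nodes are plain hashes, a dummy is a node without a timestamp key, and the name start_time_estimate is invented here.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical node layout: { timestamp => epoch-seconds, followups => [ nodes ] }
# A dummy node has no timestamp of its own.
sub start_time_estimate {
    my ($node) = @_;
    return $node->{timestamp} if defined $node->{timestamp};

    # Dummy: take the earliest estimate among the follow-ups.
    my @times = grep { defined }
                map  { start_time_estimate($_) }
                @{ $node->{followups} || [] };
    return @times ? min(@times) : undef;
}

my $dummy = { followups => [ { timestamp => 2000 }, { timestamp => 1500 } ] };
print start_time_estimate($dummy), "\n";
```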
IMPLEMENTATION
This module implements thread-detection on a folder. Messages created by the better mailers will include In-Reply-To and References lines, which are used to figure out how messages are related. Better thread detection can be implemented, but there may be a serious performance hit (depending on the type of folder used).
Maintaining threads
A Mail::Box::Threads-object is created by the Mail::Box::Manager, using its threads() method. Each object can monitor the thread-relations between messages in one or more folders. When more than one folder is specified, the messages are merged while reading the threads, although nothing changes in the folder-structure. Adding and removing folders which have to be maintained is permitted at any moment, although it may be quite costly in performance.
An example of the maintained structure is shown below. The Mail::Box::Manager has two open folders, and a thread-builder which monitors them both. The combined folders have two threads, the second of which is two messages long (msg3 is a reply to msg2). Msg2 is in two folders at once.
manager
| \
| `----------- threads
| | |
| thread thread---thread
| | /| /
| | // /
+---- folder1 | // /
| | / // /
| +-----msg1 // /
| +-----msg2-'/ /
| / /
`-----folder2 / /
| / /
+-----msg2 /
+-----msg3------'
Delayed thread detection
With all() you get the start-messages of each thread of this folder. When that message was not found in the folder (not saved or already removed), you get a message of the dummy-type. These thread descriptions are in perfect state: all messages of the folder are included somewhere, and each missing message of the threads (a `hole') is filled by a dummy.
However, to be able to detect all threads it is required to have the headers of all messages, which is very slow for some types of folders, especially MH and IMAP folders.
For interactive mail-readers, it is preferred to detect threads only on messages which are in the viewport of the user. This may be sloppy in some situations, but anything is preferable to reading an MH mailbox with 10k e-mails only to see the most recent messages.
In this object, we take special care not to cause unnecessary parsing (loading) of messages. Threads will only be detected on command, and by default only the message headers are used.
The following reports the Mail::Box::Thread object which is related to a message:
my $thread = $message->thread;
When the message was not put in a thread yet, it is done now. But more work is done to return the best thread. Based on various parameters, which were specified when the folder was created, the method walks through the folder to fill the holes which are in this thread.
Walking from back to front (recently arrived messages are usually in the back of the folder), message after message is triggered to be included in its thread. The search ends when one of three things happens: the whole thread of the requested message is found, a certain maximum number of messages was tried without success (search window bound reached), or the messages within the folder are getting too old. Then the search to complete the thread will end, although more messages of the thread might have been in the folder: we don't scan the whole folder for performance reasons.
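The bounded backward walk can be sketched as follows. This is a simplified model under stated assumptions, not the module's code: the folder is a plain list of hashes (oldest first) with hypothetical id, time and parent keys, the name find_ancestors is invented, and the age of a message is measured against the requested message, which this man-page does not specify. The window and timespan options mirror the constructor options, with timespan already converted to seconds.

```perl
use strict;
use warnings;

# Hypothetical folder model: [ { id => ..., time => ..., parent => ... }, ... ]
# Returns the ids of the ancestors found for the message at position $index.
sub find_ancestors {
    my ($messages, $index, %opt) = @_;
    my $window   = $opt{window}   // 10;               # max messages to try
    my $timespan = $opt{timespan} // 3 * 24 * 3600;    # seconds ('3 days')

    my $start   = $messages->[$index];
    my $missing = $start->{parent};    # the hole we are trying to fill
    my $tried   = 0;
    my @found;

    # Walk from the requested message towards the front of the folder.
    for (my $i = $index - 1; $i >= 0 && defined $missing; $i--) {
        my $msg = $messages->[$i];
        last if $window   ne 'ALL'  && ++$tried > $window;
        last if $timespan ne 'EVER' && $start->{time} - $msg->{time} > $timespan;

        next unless $msg->{id} eq $missing;
        push @found, $msg->{id};
        $missing = $msg->{parent};     # undef once the thread start is reached
    }
    return @found;
}

my @folder = (
    { id => 'a', time => 0 },
    { id => 'b', time => 100, parent => 'a' },
    { id => 'x', time => 150 },
    { id => 'c', time => 200, parent => 'b' },
);
print join(' ', find_ancestors(\@folder, 3)), "\n";
```

With a small window or timespan the walk gives up early and the remaining holes would be filled by dummies, which is the trade-off between perfection and speed.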
Finally, for each message whose head is known, for instance for all messages in mbox-folders, the correct thread is determined immediately. Also, all messages whose head gets loaded later are automatically included.
AUTHOR
Mark Overmeer (Mark@Overmeer.net). All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
VERSION
This code is beta, version 1.313