NAME
SWISH::Prog::Mail - index email with Swish-e
SYNOPSIS
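use SWISH::Prog::Mail;

# Point the indexer at a maildir directory.
my $prog = SWISH::Prog::Mail->new(
    maildir => 'path/to/my/maildir',
);

# Build the index; create() returns the number of emails indexed.
$prog->create;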
DESCRIPTION
SWISH::Prog::Mail is a SWISH::Prog subclass designed for providing full-text search for your email with Swish-e.
SWISH::Prog::Mail uses Mail::Box, available from CPAN.
Since SWISH::Prog::Mail inherits from SWISH::Prog, read the SWISH::Prog docs first. Any overridden methods are documented here.
METHODS
new( maildir => path )
Creates a new indexer object.
NOTE: new() is inherited from SWISH::Prog, so any params valid for the SWISH::Prog new() method are allowed here.
init
Initializes the object. This overrides the SWISH::Prog init() base method.
init_indexer
Adds the special mail MetaName to the Config object before opening the indexer.
create( opts )
Create index.
Returns number of emails indexed.
process_folder( Mail::Box object )
Recurses through the Mail::Box object, indexing all messages. The Mail::Box object should be a folder as returned from the open() method of a Mail::Box::Manager object.
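To illustrate what recursing through a folder involves, here is a minimal sketch that uses only Mail::Box calls (messages(), listSubFolders(), openSubFolder(), close()). It is not the code process_folder() actually runs; the count_messages() helper and the command-line usage are hypothetical, and the point where an indexer would handle each message is only marked with a comment.

```perl
use strict;
use warnings;
use Mail::Box::Manager;

# Hypothetical helper: visit every message in $folder and in all of
# its subfolders, and return how many messages were seen.
sub count_messages {
    my ($folder) = @_;
    my $count = 0;

    for my $msg ( $folder->messages ) {
        $count++;    # an indexer would process $msg here
    }

    # Descend into each subfolder in turn.
    for my $name ( $folder->listSubFolders ) {
        my $sub = $folder->openSubFolder($name) or next;
        $count += count_messages($sub);
        $sub->close( write => 'NEVER' );    # read-only walk, never write back
    }
    return $count;
}

# Usage: perl count_mail.pl path/to/my/maildir
if ( my $path = shift @ARGV ) {
    my $mgr    = Mail::Box::Manager->new;
    my $folder = $mgr->open( folder => $path, access => 'r' )
        or die "cannot open folder $path\n";
    print count_messages($folder), " messages\n";
    $folder->close( write => 'NEVER' );
}
```

Opening the folder read-only and closing with write => 'NEVER' keeps the walk from modifying the mailbox.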
filter_attachment( msg_url, Mail::Message::Part )
Run the document represented by Mail::Message::Part object through SWISH::Filter so attachments are indexed too.
Returns XML content ready for indexing.
index_mail( folder, Mail::Message )
Extract data and content from Mail::Message in folder and call index().
mail2xml( title, meta_hash_ref )
Converts meta_hash_ref to an XML string. Returns the XML.
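As a rough illustration of turning a title and a hash ref into XML, here is a self-contained sketch in plain Perl. It is not the module's implementation: the meta_to_xml() and xml_escape() helpers, the doc and title element names, and the sample keys are all assumptions, and the XML that mail2xml() really emits may differ.

```perl
use strict;
use warnings;

# Escape the five XML special characters; undef becomes ''.
sub xml_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must run first so later entities stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

# Hypothetical converter: one element per hash key, keys sorted so the
# output is stable. Keys are assumed to be valid XML element names.
sub meta_to_xml {
    my ( $title, $meta ) = @_;
    my $xml = "<doc>\n<title>" . xml_escape($title) . "</title>\n";
    for my $key ( sort keys %$meta ) {
        $xml .= "<$key>" . xml_escape( $meta->{$key} ) . "</$key>\n";
    }
    $xml .= "</doc>\n";
    return $xml;
}

print meta_to_xml(
    'Hello & welcome',
    { from => 'a@example.com', subject => 'Hello & welcome' },
);
```

Escaping every value matters because mail headers and bodies routinely contain characters such as & and < that would otherwise produce malformed XML.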
title_filter( meta_hashref )
By default the Subject of each mail is used as the title. Override this method to alter that behaviour.
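A minimal sketch of such an override follows. It assumes that the hash ref stores the subject under a subject key and that the method's return value is used as the title; neither detail is confirmed here, so check the module source before relying on it. The My::TitledMail package name is hypothetical.

```perl
package My::TitledMail;
use strict;
use warnings;
use base 'SWISH::Prog::Mail';

# Strip leading "Re:" / "Fwd:" prefixes so replies share a title with
# the original message, and supply a placeholder for empty subjects.
sub title_filter {
    my ( $self, $meta ) = @_;

    # ASSUMPTION: the subject lives under the 'subject' key.
    my $title = defined $meta->{subject} ? $meta->{subject} : '';
    $title =~ s/^\s*(?:(?:re|fwd?)\s*:\s*)+//i;
    $title = '[no subject]' unless length $title;
    return $title;
}

1;
```

The subclass is then used in place of SWISH::Prog::Mail, for example My::TitledMail->new( maildir => 'path/to/my/maildir' )->create.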
mail_filter( mail )
Override this method if you need to alter the mail prior to it being converted to XML for indexing.
This method is called prior to title_filter() so all data is affected.
See FILTERS section.
FILTERS
There are several filtering methods in this module. Here's a summary of what they do and when they are called, so you have a better idea of how best to use them. Pay special attention to which filters are called before the message is converted to XML and which are called after.
mail_filter
Called by index_mail() for each message in a folder. This is the first filter called in the chain. Called before the message is converted to XML.
title_filter
Called by index_mail() after mail_filter(). Called before the message is converted to XML.
SWISH::Prog::Mail::Doc *_filter() methods
Each of the normal SWISH::Prog::Doc attributes has a *_filter() method. These are called after the message is converted to XML. See SWISH::Prog::Doc.
NOTE: There is no SWISH::Prog::Mail::Doc mail_filter() method.
filter
The normal SWISH::Prog filter() method is called as usual just before passing to ok() inside index(). Called after the message is converted to XML.
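The ordering summarized above can be pictured with a toy pipeline. This sketch does not use the module at all: every sub below is a stand-in that only records its name, and the sequence simply follows the order in which the filters are listed in this section.

```perl
use strict;
use warnings;

my @calls;    # records the order in which the stand-in hooks run

# Stand-ins for the real hooks; each one only logs its own name.
sub mail_filter  { push @calls, 'mail_filter';  return $_[0] }
sub title_filter { push @calls, 'title_filter'; return $_[0]{subject} }
sub to_xml       { push @calls, 'mail2xml';     return "<doc><title>$_[0]</title></doc>" }
sub doc_filters  { push @calls, 'Doc *_filter'; return $_[0] }
sub filter       { push @calls, 'filter';       return $_[0] }

my $meta = { subject => 'hello' };

mail_filter($meta);                 # first: sees the raw data, before XML
my $title = title_filter($meta);    # second: still before XML
my $xml   = to_xml($title);         # the conversion point
doc_filters($xml);                  # after conversion: per-attribute filters
filter($xml);                       # last: just before the doc is accepted

print join( ' -> ', @calls ), "\n";
```

In practice this means changes to the raw message data belong in mail_filter() or title_filter(), while anything that needs the finished XML belongs in the later filters.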
ENCODINGS
Since Swish-e version 2 does not support UTF-8 encodings, you may need to convert or transliterate your text prior to indexing. Swish-e offers the TranslateCharacters config option, but that does not work well with multi-byte characters.
Here's one way to handle the issue. Use Search::Tools::Transliterate and the mail_filter() method to convert your UTF-8 text to single-byte characters. You can do this by subclassing SWISH::Prog::Mail and overriding the mail_filter() method.
See SWISH::Prog::DBI for a similar example.
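A minimal sketch of that approach might look like the following. It assumes the argument passed to mail_filter() is a hash ref whose plain string values hold valid UTF-8 text; that shape is an assumption, as is the My::TransliteratedMail package name. Search::Tools::Transliterate and its convert() method are the real CPAN interface.

```perl
package My::TransliteratedMail;
use strict;
use warnings;
use base 'SWISH::Prog::Mail';
use Search::Tools::Transliterate;

# One shared transliterator for all messages.
my $trans = Search::Tools::Transliterate->new;

sub mail_filter {
    my ( $self, $mail ) = @_;

    # ASSUMPTION: $mail is a hash ref of plain string values.
    for my $key ( keys %$mail ) {
        my $value = $mail->{$key};
        next if !defined $value or ref $value;    # skip undef and nested data
        $mail->{$key} = $trans->convert($value);
    }

    # The hash is changed in place; it is also returned in case the
    # caller uses the return value.
    return $mail;
}

1;
```

Because mail_filter() runs before title_filter(), the transliterated subject is also what ends up in the title.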
SEE ALSO
SWISH::Prog, SWISH::Prog::Mail::Doc, Search::Tools
AUTHOR
Peter Karman, <perl@peknet.com>
Thanks to rjbs and confounded on #email at irc.perl.org for suggestions on this module.
COPYRIGHT AND LICENSE
Copyright 2007 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.