NAME

SWISH::Prog::Mail - index email with Swish-e

SYNOPSIS

use SWISH::Prog::Mail;

my $prog = SWISH::Prog::Mail->new(
    maildir         => 'path/to/my/maildir',
);

$prog->create;

DESCRIPTION

SWISH::Prog::Mail is a SWISH::Prog subclass that provides full-text search of your email with Swish-e.

SWISH::Prog::Mail uses Mail::Box, available from CPAN.

Since SWISH::Prog::Mail inherits from SWISH::Prog, read the SWISH::Prog docs first. Any overridden methods are documented here.

METHODS

new( maildir => path )

Create new indexer object.

NOTE: The new() method is inherited from SWISH::Prog, so any params valid for that method are allowed here.

init

Initialize object. This overrides the SWISH::Prog init() base method.

init_indexer

Adds the special mail MetaName to the Config object before opening indexer.

create( opts )

Create index.

Returns number of emails indexed.

process_folder( Mail::Box object )

Recurse through a Mail::Box object, indexing all messages. The Mail::Box object should be a folder as returned from the open() method of a Mail::Box::Manager object.
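
The recursion described above can be sketched roughly as follows. This is not the module's actual implementation: walk_folder() and its callback are hypothetical names, and the sketch assumes the folder object offers the Mail::Box methods messages(), listSubFolders() and openSubFolder().

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SWISH::Prog::Mail.
# Visits every message in $folder and its subfolders, calling
# $on_message->( $folder, $message ) for each one.
# Returns the number of messages visited.
sub walk_folder {
    my ( $folder, $on_message ) = @_;
    my $count = 0;

    for my $message ( $folder->messages ) {
        $on_message->( $folder, $message );
        $count++;
    }

    # Descend into each subfolder by name.
    for my $name ( $folder->listSubFolders ) {
        my $sub = $folder->openSubFolder($name) or next;
        $count += walk_folder( $sub, $on_message );
    }

    return $count;
}
```

With Mail::Box installed, a folder could be obtained with Mail::Box::Manager->new->open( folder => 'path/to/my/maildir' ) and passed to walk_folder() together with a callback that does the indexing.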

filter_attachment( msg_url, Mail::Message::Part )

Run the document represented by Mail::Message::Part object through SWISH::Filter so attachments are indexed too.

Returns XML content ready for indexing.

index_mail( folder, Mail::Message )

Extract data and content from Mail::Message in folder and call index().
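
To illustrate what "extract data and content" might look like, here is a minimal sketch that builds a metadata hash from a message object. The sub name extract_meta() and the hash keys are assumptions made for illustration, not the module's documented data layout. The message methods used (messageId, subject, from, to, timestamp, decoded) are standard Mail::Message accessors, and multipart handling is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SWISH::Prog::Mail.
# Collects a few headers and the decoded body into a hash reference.
# The key names here are illustrative assumptions.
sub extract_meta {
    my ($message) = @_;

    my $subject = $message->subject;
    $subject = '' unless defined $subject;

    return {
        id      => $message->messageId,
        subject => $subject,
        # from() and to() return address objects; format() gives
        # the familiar "Name <addr>" string for each one.
        from    => join( ', ', map { $_->format } $message->from ),
        to      => join( ', ', map { $_->format } $message->to ),
        date    => $message->timestamp,
        body    => $message->decoded->string,
    };
}
```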

mail2xml( title, meta_hash_ref )

Converts meta_hash_ref to an XML string. Returns the XML.
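
As a rough illustration of this kind of conversion, the sketch below turns a title and a hash reference into a small XML string. It is not the module's implementation: the names sketch_mail2xml() and escape_xml(), the doc root element and the title element are all assumptions, and the real output format may differ.

```perl
use strict;
use warnings;

my %XML_ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# Escape the characters that are special in XML text.
sub escape_xml {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/([&<>"])/$XML_ENTITY{$1}/g;
    return $text;
}

# Hypothetical stand-in for mail2xml(): one element per hash key.
# Keys are assumed to be valid XML element names; nested values
# (array or hash references) are skipped.
sub sketch_mail2xml {
    my ( $title, $meta ) = @_;
    my @lines = ( '<doc>', '<title>' . escape_xml($title) . '</title>' );
    for my $key ( sort keys %$meta ) {
        next if ref $meta->{$key};
        push @lines, "<$key>" . escape_xml( $meta->{$key} ) . "</$key>";
    }
    push @lines, '</doc>';
    return join( "\n", @lines ) . "\n";
}
```

Sorting the keys keeps the output stable between runs, which makes it easier to compare the XML for the same message across indexing passes.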

title_filter( meta_hashref )

By default the Subject of each mail is used as the title. Override this method to alter that behaviour.
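
One possible override is to strip reply and forward prefixes so that threads share a title. The helper below is a sketch of that logic only. The name custom_title() and the assumption that the hash reference carries a subject key are illustrative, so check what title_filter() actually receives before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SWISH::Prog::Mail.
# Builds a title from the (assumed) 'subject' key of $meta,
# removing leading "Re:", "Fw:" and "Fwd:" prefixes.
sub custom_title {
    my ($meta) = @_;
    my $subject = defined $meta->{subject} ? $meta->{subject} : '';
    $subject =~ s/^\s*(?:(?:re|fwd?)\s*:\s*)+//i;
    $subject =~ s/\s+$//;
    return length $subject ? $subject : '[no subject]';
}

# In a subclass this could be wired in as:
#   sub title_filter { my ( $self, $meta ) = @_; return custom_title($meta); }
```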

mail_filter( mail )

Override this method if you need to alter the mail prior to it being converted to XML for indexing.

This method is called prior to title_filter() so all data is affected.

See FILTERS section.

FILTERS

There are several filtering methods in this module. Here is a summary of what they do and when they are called, so you have a better idea of how best to use them. Pay special attention to which are called before the mail is converted to XML and which are called after.

mail_filter

Called by index_mail() for each message. This is the first filter called in the chain. Called before the mail is converted to XML.

title_filter

Called by index_mail() after mail_filter(). Called before the mail is converted to XML.

SWISH::Prog::Mail::Doc *_filter() methods

Each of the normal SWISH::Prog::Doc attributes has a *_filter() method. These are called after the mail is converted to XML. See SWISH::Prog::Doc.

NOTE: There is no SWISH::Prog::Mail::Doc mail_filter() method.

filter

The normal SWISH::Prog filter() method is called as usual just before passing to ok() inside index(). Called after the mail is converted to XML.

ENCODINGS

Since Swish-e version 2 does not support UTF-8 encodings, you may need to convert or transliterate your text prior to indexing. Swish-e offers the TranslateCharacters config option, but that does not work well with multi-byte characters.

Here's one way to handle the issue. Use Search::Tools::Transliterate and the mail_filter() method to convert your UTF-8 text to single-byte characters. You can do this by subclassing SWISH::Prog::Mail and overriding the mail_filter() method.

See SWISH::Prog::DBI for a similar example.
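
The approach above might be sketched like this. The helper transliterate_fields() is a hypothetical name, and the sketch assumes that mail_filter() receives a hash reference of mail fields. So that it runs with core Perl alone, it uses a crude stand-in converter built on Unicode::Normalize rather than Search::Tools::Transliterate. In real use you would pass a code reference that calls the convert() method of a Search::Tools::Transliterate object.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFKD );

# Stand-in converter using only core modules: decompose characters,
# drop combining marks, and replace anything still outside ASCII
# with '?'. This is NOT Search::Tools::Transliterate, only a cruder
# substitute for demonstration.
sub crude_ascii {
    my ($text) = @_;
    return '' unless defined $text;
    my $decomposed = NFKD($text);
    $decomposed =~ s/\p{Mn}+//g;
    $decomposed =~ s/[^\x00-\x7F]/?/g;
    return $decomposed;
}

# Hypothetical helper: run $convert over every plain scalar value
# in $mail, leaving references (nested data) untouched.
sub transliterate_fields {
    my ( $mail, $convert ) = @_;
    for my $key ( keys %$mail ) {
        next if ref $mail->{$key};
        $mail->{$key} = $convert->( $mail->{$key} );
    }
    return $mail;
}

# In a SWISH::Prog::Mail subclass this could be wired in as:
#   my $tr = Search::Tools::Transliterate->new;
#   sub mail_filter {
#       my ( $self, $mail ) = @_;
#       transliterate_fields( $mail, sub { $tr->convert( $_[0] ) } );
#   }
```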

SEE ALSO

http://swish-e.org/docs/

SWISH::Prog, SWISH::Prog::Mail::Doc, Search::Tools

AUTHOR

Peter Karman, <perl@peknet.com>

Thanks to rjbs and confounded on #email at irc.perl.org for suggestions on this module.

COPYRIGHT AND LICENSE

Copyright 2007 by Peter Karman

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.