NAME

Email::Fingerprint - Calculate a digest for recognizing duplicate emails

VERSION

Version 0.49

SYNOPSIS

Email::Fingerprint calculates a checksum that identifies an email, for use in spotting duplicate messages. The checksum is based on the Message-ID: header; if that doesn't exist, on the Date:, From:, To: and Cc: headers together; and if those don't exist either, on the body of the message.

use Email::Fingerprint;

my $foo = Email::Fingerprint->new();
...
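Because the checksum is meant for spotting duplicates, a typical caller keeps a hash of fingerprints it has already seen. The following is a minimal, self-contained sketch of that pattern. fingerprint_of() and unique_messages() are hypothetical helpers, and fingerprint_of() is only a crude stand-in (it hashes the Message-ID: header with Digest::MD5, or the whole message if that header is missing) for the checksum Email::Fingerprint would compute.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical, crude stand-in for the module's checksum: hash the
# Message-ID: header, or the whole message when that header is missing.
sub fingerprint_of {
    my ($message) = @_;
    my ($id) = $message =~ /^Message-ID:[ \t]*(\S+)/mi;
    return md5_hex( defined $id ? $id : $message );
}

# Keep each message only the first time its fingerprint is seen.
sub unique_messages {
    my %seen;
    return grep { !$seen{ fingerprint_of($_) }++ } @_;
}
```

The order of the input is preserved, so the first copy of each message is the one that survives.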

ATTRIBUTES

FUNCTIONS

new

$fp = Email::Fingerprint->new({
    input           => \*INPUT,         # Or $string, \@lines, etc.
    checksum        => "Digest::SHA",   # Or "Digest::MD5", etc.
    strict_checking => 1,               # If true, use message bodies
    %mail_header_opts,
});

Create a new fingerprinting object. If the input option is used, Email::Fingerprint attempts to intelligently read the email message given by that option, whether it's a string, an array of lines or a filehandle.

If $opts{checksum} is not supplied, then Email::Fingerprint will use the first checksum module that it finds. If it finds no modules, it will use unpack in a ghastly manner you don't want to think about.

Any options in %opts are also passed along to Mail::Header->new; see the Mail::Header documentation for the options it accepts.
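Using "the first checksum module that it finds" can be pictured as probing a list of candidate modules with require. This is a hypothetical sketch, not Email::Fingerprint's own code: the helper name pick_checksum(), the candidate order (Digest::SHA, then Digest::MD5) and the unpack('%32C*', ...) fallback are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not Email::Fingerprint's code: return the name of
# the first candidate module that loads, plus a sub that digests a string
# with it. With no loadable module, fall back to unpack's 32-bit checksum,
# which is far weaker than a real digest.
sub pick_checksum {
    my @candidates = @_ ? @_ : qw(Digest::SHA Digest::MD5);
    for my $module (@candidates) {
        ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Digest::SHA -> Digest/SHA.pm
        next unless eval { require $file; 1 };
        return ( $module, sub { $module->new->add( $_[0] )->hexdigest } );
    }
    return ( 'unpack', sub { unpack '%32C*', $_[0] } );
}

my ( $name, $digest ) = pick_checksum();
print "$name: ", $digest->('Message-ID: <1234@example.com>'), "\n";
```

Both Digest::SHA and Digest::MD5 share the new/add/hexdigest interface, which is what lets a single closure serve either module.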

checksum

# Uses original/default settings to take checksum
$checksum = $fp->checksum;

# Can use any options accepted by constructor
$options  = {
    input           => \*INPUT,         # Or $string, \@lines, etc.
    checksum        => "Digest::SHA",   # Or "Digest::MD5", etc.
    strict_checking => 1,               # If true, use message bodies
    %mail_header_opts,
};

# Overrides one or more original/default settings
$checksum = $fp->checksum($options);

Calculates the actual email fingerprint. The optional hashref argument will permanently override the object's previous settings.

read

$fingerprint->read( $email );
$fingerprint->read( $email, \%mh_args );

Accepts the email message $email and attempts to read it intelligently, distinguishing strings, array references and file handles. If supplied, the optional hash reference is passed on to Mail::Header.
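One way to distinguish the three input forms is to test for an open filehandle first, then for an array reference, then for a plain string. The sketch below is hypothetical (lines_from_input() is not part of Email::Fingerprint) and uses only Scalar::Util::openhandle and ref to show that dispatch.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical dispatcher in the spirit of read(): it returns the message
# as a list of lines whether it was given a filehandle, an array
# reference of lines, or a plain string. Not the module's own code.
sub lines_from_input {
    my ($input) = @_;

    if ( defined openhandle($input) ) {    # glob ref, IO::Handle, in-memory fh
        my @lines = <$input>;
        return @lines;
    }
    return @$input if ref $input eq 'ARRAY';    # already split into lines
    return split /^/m, $input if !ref $input;   # keep each line's "\n"

    die 'Unsupported input type: ', ref($input), "\n";
}
```

Checking for a filehandle first matters because a glob reference is also a reference, so a ref-based test alone would not classify it correctly.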

read_string

$fingerprint->read_string( $email_string );
$fingerprint->read_string( $email_string, \%mh_args );

Accepts the email message $email_string and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.

read_filehandle

$fingerprint->read_filehandle( $email_fh );
$fingerprint->read_filehandle( $email_fh, \%mh_args );

Reads the email message from the filehandle $email_fh and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.

read_arrayref

$fingerprint->read_arrayref( \@email_lines );
$fingerprint->read_arrayref( \@email_lines, \%mh_args );

Accepts the email message \@email_lines and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.

message_loaded

Returns true if an email message has been loaded and is ready for checksum, or false if no message has been loaded or an error has occurred.

set_checksum

Specifies the checksum method to be used.

INTERNAL METHODS

BUILD

A constructor helper method, called by the Class::Std framework. Don't call BUILD directly; it runs as part of new().

_extract_headers

Extract the Message-ID: header. If that does not exist, extract the Date:, From:, To: and Cc: headers. If those do not exist, then force strict checking so that the message body will be fingerprinted.
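The fallback order above can be sketched in a few lines. This is a hypothetical illustration, not Email::Fingerprint's implementation: identifying_string() and sketch_fingerprint() are invented names, headers are assumed to arrive as a hash of lower-cased names mapping to array refs of chomped values, the "Name:value" concatenation format is an assumption, and Digest::MD5 stands in for whichever checksum module is configured.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sketch of the fallback order; not the module's code.
# $headers maps lower-cased header names to array refs of chomped values.
sub identifying_string {
    my ($headers) = @_;

    # 1. Prefer the Message-ID: header on its own.
    if ( my $ids = $headers->{'message-id'} ) {
        return join '', map { "Message-ID:$_" } @$ids;
    }

    # 2. Otherwise combine whichever of Date:, From:, To: and Cc: exist.
    my $string = '';
    for my $name (qw(Date From To Cc)) {
        my $values = $headers->{ lc $name } or next;
        $string .= join '', map { "$name:$_" } @$values;
    }

    # 3. Nothing usable: undef tells the caller to fall back to the body.
    return length $string ? $string : undef;
}

sub sketch_fingerprint {
    my ( $headers, $body, $strict ) = @_;
    my $key = identifying_string($headers);

    # Force strict checking when no identifying headers were found.
    $strict = 1 unless defined $key;
    $key    = '' unless defined $key;
    $key .= $body if $strict;    # strict checking also hashes the body

    return md5_hex($key);
}
```

With this design two copies of a message that share a Message-ID: collide even if their bodies differ, unless strict checking is switched on.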

_extract_body

$body = $fp->_extract_body;

Gets the body of the message, as a string. Line endings are preserved, so the body can, for example, be printed as-is.

This method must only be called after a message has been read. No validation is done in the method itself, so this is the user's responsibility.
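In an email message the body starts after the first empty line that ends the header block. The sketch below is a hypothetical, standalone illustration of that split in plain Perl; split_message() is not part of Email::Fingerprint, which relies on Mail::Header for header parsing.

```perl
use strict;
use warnings;

# Hypothetical helper, independent of Email::Fingerprint: split raw
# message lines into header lines and a body string. The header block
# ends at the first empty line; the body keeps its line endings.
sub split_message {
    my @lines = @_;
    my @header;
    while (@lines) {
        my $line = shift @lines;
        last if $line =~ /^\r?\n?\z/;    # blank separator line
        push @header, $line;
    }
    return ( \@header, join '', @lines );
}
```

Joining the remaining lines with an empty string, rather than chomping them, is what keeps the body printable exactly as received.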

_concat

@headers = qw( foo@example.com bar@example.com );
$delim   = 'To:';
$string  = $fp->_concat( \@headers, $delim );

# $string is now 'To:foo@example.comTo:bar@example.com'

Returns the concatenation of \@headers, with $delim prepended to each element of \@headers. If $delim is omitted, the empty string is used. \@headers elements are all chomped before concatenation.
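The documented behaviour of _concat can be reproduced with chomp, map and join. This is a standalone sketch, not the module's source; concat_sketch() is a hypothetical name, and whether the real method chomps the caller's array in place is not specified here, so the sketch works on a copy.

```perl
use strict;
use warnings;

# Standalone sketch of the behaviour documented for _concat (not the
# module's source): chomp each element, prefix it with $delim, join.
sub concat_sketch {
    my ( $headers, $delim ) = @_;
    $delim = '' unless defined $delim;

    my @values = @$headers;    # copy, so the caller's array is not chomped
    chomp @values;

    return join '', map { "$delim$_" } @values;
}

print concat_sketch( [ "foo\@example.com\n", 'bar@example.com' ], 'To:' ), "\n";
# prints To:foo@example.comTo:bar@example.com
```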

AUTHOR

Len Budney, <lbudney at pobox.com>

BUGS

Please report any bugs or feature requests to bug-email-fingerprint at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Email-Fingerprint. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Email::Fingerprint

You can also look for information at:

SEE ALSO

See Mail::Header for options governing the parsing of email headers.

ACKNOWLEDGEMENTS

Email::Fingerprint is based on the eliminate_dups script, written by Peter Samuel and available at http://www.qmail.org/.

COPYRIGHT & LICENSE

Copyright 2006-2011 Len Budney, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.