NAME
Email::Find - Find RFC 822 email addresses in plain text
SYNOPSIS
use Email::Find;
$num_found = find_emails($text, \&callback);
DESCRIPTION
This module finds a subset of RFC 822 email addresses in arbitrary text (see CAVEATS). The addresses it finds are not guaranteed to exist, or even to be email addresses at all (see CAVEATS), but they will be syntactically valid under RFC 822.
Email::Find applies some heuristics to avoid the more obvious red herrings and false addresses, but there is only so much that can be done without a human.
Functions
Email::Find exports one function, find_emails(). It works very similarly to URI::Find's find_uris().
$num_emails_found = find_emails($text, \&callback);
The first argument is a block of text for find_emails() to search through and manipulate. The second is a callback routine which defines what to do with each email address as it is found. find_emails() returns the total number of addresses found.
The callback is given two arguments. The first is a Mail::Address object representing the address found. The second is the actual original email as found in the text. Whatever the callback returns will replace the original text.
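The replace-by-callback contract described above can be sketched in plain Perl. This is only an illustration: toy_find_emails() is a hypothetical stand-in, not Email::Find's implementation. It uses a deliberately naive pattern rather than RFC 822 parsing, takes an explicit reference to the text, and passes the matched string twice where the real callback receives a Mail::Address object first.

```perl
use strict;
use warnings;

# Hypothetical stand-in for find_emails(); NOT Email::Find's implementation.
# Takes a reference to the text and a callback, and uses a deliberately
# naive pattern instead of real RFC 822 parsing.
sub toy_find_emails {
    my ($text_ref, $callback) = @_;
    my $count = 0;

    # Each match is replaced, in place, by whatever the callback returns.
    $$text_ref =~ s{([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)}{
        $count++;
        $callback->($1, $1);
    }ge;

    return $count;
}

my $text  = 'Write to alice@example.com or bob@example.org.';
my $found = toy_find_emails(\$text, sub {
    my ($address, $orig) = @_;
    return '[' . $orig . ']';    # the return value replaces the match
});

print "$found found: $text\n";
```

Returning the second argument unchanged leaves the text as it was, which is why the examples that only inspect addresses end with "return $orig_email".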
EXAMPLES
# Simply print out all the addresses found leaving the text undisturbed.
find_emails($text, sub {
    my($email, $orig_email) = @_;
    print "Found ".$email->format."\n";
    return $orig_email;
});
# For each email found, ping its host to see if it's alive.
require Net::Ping;
my $ping = Net::Ping->new;
my %Pinged = ();
find_emails($text, sub {
    my($email, $orig_email) = @_;
    my $host = $email->host;
    # Ping each host only once.
    $Pinged{$host} = $ping->ping($host) unless exists $Pinged{$host};
    # Return the original text so the input is left undisturbed.
    return $orig_email;
});
while( my($host, $up) = each %Pinged ) {
    print "$host is ". ($up ? 'up' : 'down') ."\n";
}
# Count how many addresses are found.
print "Found ", find_emails($text, sub { return $_[1] }), " addresses\n";
# Wrap each address in an HTML mailto link.
find_emails($text, sub {
    my($email, $orig_email) = @_;
    my($address) = $email->format;
    return qq|<a href="mailto:$address">$orig_email</a>|;
});
CAVEATS
- Why a subset of RFC 822?
I say that this module finds a subset of RFC 822 because if I attempted to look for all possible valid RFC 822 addresses I'd wind up matching practically the entire block of text! The complete specification is so wide open that it's difficult to construct something that's not an RFC 822 address.
To keep myself sane, I look for the 'address spec' or 'global address' part of an RFC 822 address. This is the part which most people consider to be an email address (the 'foo@bar.com' part) and it is also the part which contains the information necessary for delivery.
- Why are some of the matches not email addresses?
Alas, many things which aren't email addresses look like email addresses and parse just fine as them. The biggest headache is email and Usenet message IDs. I do my best to avoid them, but there's only so much cleverness you can pack into one library.
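If message IDs still slip through, a callback can apply its own filter. The following is a rough sketch of one possible heuristic. It is an assumption made for illustration, not the check Email::Find itself performs, and looks_like_message_id() is a hypothetical helper name. It flags addresses whose local part contains a long run of digits or is unusually long.

```perl
use strict;
use warnings;

# Hypothetical heuristic, not taken from Email::Find: treat an address as a
# probable message ID when the part before the "@" contains eight or more
# consecutive digits or is longer than 30 characters.
sub looks_like_message_id {
    my ($address) = @_;

    # Split off the local part; give up on anything without exactly one "@".
    my ($local) = $address =~ /^([^\@]+)\@[^\@]+$/;
    return 0 unless defined $local;

    return 1 if $local =~ /\d{8,}/;
    return 1 if length($local) > 30;
    return 0;
}

for my $candidate ('20010315123456.A1234@mail.example.com', 'schwern@pobox.com') {
    my $verdict = looks_like_message_id($candidate) ? 'probably a message ID'
                                                    : 'probably an address';
    print "$candidate: $verdict\n";
}
```

Inside a find_emails() callback, this could be used as "return $orig_email if looks_like_message_id($email->address);" to leave such matches untouched. They would still be included in the count that find_emails() returns.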
AUTHOR
Copyright 2000, 2001 Michael G Schwern <schwern@pobox.com>. All rights reserved.
THANKS
Thanks to Jeremy Howard for his patch to make it work under 5.005.
LICENSE
This module may not be used for the purposes of sending unsolicited email (i.e. spamming) in any way, shape or form or for the purposes of generating lists for commercial sale without explicit permission from the author.
For everyone else this module is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
If you're not sure, contact the author.
SEE ALSO
Email::Valid, RFC 822, URI::Find, Apache::AntiSpam