NAME
App::SpamcupNG::HTMLParse - functions to extract information from Spamcop.net web pages
SYNOPSIS
use App::SpamcupNG::HTMLParse qw(find_next_id find_errors find_warnings find_spam_header find_message_age find_header_info);
DESCRIPTION
This package exports functions that use XPath to extract specific information from the spamcop.net HTML pages.
EXPORTS
The following are all the functions exported by this package.
find_header_info
Finds information from the e-mail header of the received SPAM and returns it.
Returns a hash reference with the following keys:
The function attempts to normalize the Content-Type header by removing extra spaces, keeping only the first two entries, and converting everything to lower case.
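The normalization described above could look roughly like the sketch below. It is not the module's actual code: the helper name normalize_content_type is hypothetical, and splitting the entries on ";" and joining them back with a single ";" are assumptions made only for illustration. It requires Perl 5.14 or later for the /r substitution flag.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of App::SpamcupNG::HTMLParse.
# Assumes the entries of a Content-Type value are separated by ";".
sub normalize_content_type {
    my ($raw) = @_;
    return unless defined $raw;

    # Split into entries and trim leading/trailing whitespace from each one,
    # discarding entries that end up empty.
    my @entries = grep { length } map { s/^\s+|\s+$//gr } split /;/, $raw;
    return unless @entries;

    # Keep only the first two entries.
    splice @entries, 2 if @entries > 2;

    # Collapse inner runs of whitespace, join and convert to lower case.
    return lc join ';', map { s/\s+/ /gr } @entries;
}

# prints: text/html;charset="utf-8"
print normalize_content_type('Text/HTML ;   Charset="UTF-8"; format=flowed'), "\n";
```

Returning a single normalized string keeps comparisons between messages simple, at the cost of discarding any entries beyond the second.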
find_message_age
Finds and returns the age of the SPAM message.
Returns an array reference with the age as an integer at index 0 and the age unit (possibly "hour") at index 1.
If nothing is found, returns undef.
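Because the return value is either an array reference or undef, callers need to check it before dereferencing. The sketch below shows one way to do that. The helper describe_age is hypothetical and not part of this package, and the sample values are stand-ins in the documented shape rather than real output from spamcop.net.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of App::SpamcupNG::HTMLParse: turns the
# documented return value (array reference or undef) into a short string.
sub describe_age {
    my ($age_ref) = @_;

    # undef means no age information was found in the page.
    return 'age unknown' unless defined $age_ref;

    # Index 0 is the age as an integer, index 1 is the age unit.
    my ( $age, $unit ) = @{$age_ref};
    return "$age $unit(s)";
}

# Stand-in values; in real use the argument would be the value returned
# by find_message_age.
print describe_age( [ 2, 'hour' ] ), "\n";    # prints: 2 hour(s)
print describe_age(undef), "\n";              # prints: age unknown
```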
find_next_id
Expects as parameter a scalar reference of the HTML page.
Tries to find the SPAM ID used to identify SPAM reports on the spamcop.net web page.
Returns the ID if found, otherwise undef.
find_warnings
Expects as parameter a scalar reference of the HTML page.
Tries to find all warnings in the HTML, based on CSS classes.
Returns an array reference with all warnings found.
find_errors
Expects as parameter a scalar reference of the HTML page.
Tries to find all errors in the HTML, based on CSS classes.
Returns an array reference with all errors found.
find_best_contacts
Expects as parameter a scalar reference of the HTML page.
Tries to find all best contacts in the HTML, based on CSS classes.
The best contacts are the e-mail addresses that Spamcop considers appropriate to use for SPAM reporting.
Returns an array reference with all best contacts found.
find_spam_header
Expects as parameter a scalar reference of the HTML page.
You can optionally pass a second parameter that defines whether each line should be prefixed with a tab character. The default value is false.
Tries to find the e-mail header of the SPAM reported.
Returns an array reference with all the lines of the e-mail header found.
find_receivers
Expects as parameter a scalar reference of the HTML page.
Tries to find all the receivers of the SPAM report, even if those are not real e-mail addresses but only internal identifiers that Spamcop uses to store statistics.
Returns an array reference, where each item is a string.
SEE ALSO
AUTHOR
Alceu Rodrigues de Freitas Junior, <arfreitas@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2018 of Alceu Rodrigues de Freitas Junior, <arfreitas@cpan.org>
This file is part of App-SpamcupNG distribution.
App-SpamcupNG is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
App-SpamcupNG is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with App-SpamcupNG. If not, see <http://www.gnu.org/licenses/>.