NAME
News::GnusFilter - package for scoring usenet posts
Version: 0.55 ($Revision: 1.6 $)
SYNOPSIS
# ~/.gnusfilter - scoring script
require 5.006;
use strict;
use News::GnusFilter qw/:tests groan references NSLOOKUP VERBOSE/;
NSLOOKUP = ""; # disables nslookups for bogus_address test
VERBOSE = 1; # noisier output for debugging
my $goof = News::GnusFilter->set_score( {
    rethreaded => 80,
    no_context => 60,
} );
# standard tests - see MESSAGE TESTS for details
missing_headers;
bogus_address;
annoying_subject;
cross_post;
mimes;
lines_too_long;
control_characters;
miswrapped;
misattribution;
jeopardy_quoted;
check_quotes; # runs multiple tests on quoted paragraphs
bad_signature;
# custom tests - see WRITING HEADERS and SCORING
if (check_quotes and not references) {
    $goof->{rethreaded} = groan "Callously rethreaded";
}
if (references and not check_quotes) {
    $goof->{no_context} = groan "Missing context";
}
__END__
Your GnusFilter script should be installed as a mime-decoder hook for gnus.
DESCRIPTION
News::GnusFilter is a pure-Perl package for scripting an inline message filter. It adds "Gnus-Warning:" headers when presented with evidence of atypical content or otherwise nonstandard formatting for usenet messages.
News::GnusFilter should be drop-in compatible with other newsreaders that are capable of filtering a usenet posting through an external application prior to display. See the CONFIGURATION section below for descriptions of tunable parameters, and the MESSAGE TESTS section for descriptions of the exported subroutines.
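To make the filtering idea concrete, here is a minimal, self-contained sketch of what an inline filter does: it takes a raw message and adds a warning header to the header block before passing the message on. The helper name add_warning_header and the sample message are hypothetical. This is not the module's own implementation, which emits its headers through groan().

```perl
use strict;
use warnings;

# Hypothetical helper, not part of News::GnusFilter: append one
# "Header: text" line to the end of the message's header block.
sub add_warning_header {
    my ($message, $header, $text) = @_;

    # Headers end at the first blank line; everything after is the body.
    my ($head, $body) = split /\n\n/, $message, 2;
    $body = '' unless defined $body;

    return "$head\n$header: $text\n\n$body";
}

# Sample message (made up for illustration).
my $message = join "\n",
    'From: someone@example.com',
    'Subject: HELP!!!',
    '',
    'Body text goes here.',
    '';

my $filtered = add_warning_header($message, 'Gnus-Warning', 'Annoying subject');
print $filtered;
```

A real filter would read the message from STDIN rather than from a literal string, since the newsreader pipes the article through the script and displays whatever comes back on STDOUT.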
The strange yet powerful correlation between usenet cluelessness and bunk-peddling is best summarised in the following quote:
"Opinions may of course differ on this topic, but wouldn't it be better to persuade the hon. Usenaut, as a first priority, to post accurate information, before persuading them to abandon this remarkably accurate indicator of usenet bogosity?"
-- Alan Flavell in comp.lang.perl.misc
CONFIGURATION
Lisp for .gnus File
(add-hook 'gnus-article-decode-hook '(lambda ()
  (gnus-article-decode-charset)
  (let ((coding-system-for-read last-coding-system-used)
        (coding-system-for-write last-coding-system-used))
    (call-process-region (point-min) (point-max)
                         "/path/to/gnusfilter" t (current-buffer)))))
The recommended installation path for your script is ~/.gnusfilter.
General Parameters and Exported Symbols
These are the export lists for News::GnusFilter. See the Exporter manpage for more details.
my %parameters =
  (
    HEADER         => "Gnus-Warning", # header added
    NSLOOKUP       => "nslookup",     # '' avoids DNS lookups
    PASSTHRU_BYTES => 8192,           # filter disabled
    LINE_LEN       => 80,             # columns
    EGO            => 10,             # self-ref's in new text
    TOLERANCE      => 50,             # % quoted text
    MAX_CONTROL    => 5,              # control chars
    MIN_LINES      => 20,             # short posts are OK
    SIG_LINES      => 4,              # acceptable sig lines
    NEWSGROUPS     => 2,              # spam cutoff
    FBI            => 100,            # tolerable bogosity level
    VERBOSE        => 0,              # toggles debugging output
  );

@EXPORT_OK = keys %parameters;

%EXPORT_TAGS = (
    params => \@EXPORT_OK,
    tests  => [
        qw/
          missing_headers    bogus_address
          annoying_subject   cross_post
          lines_too_long     control_characters
          miswrapped         check_quotes
          jeopardy_quoted    misattribution
          bad_signature      mimes
        /
    ],
);

@EXPORT = (
    @{$EXPORT_TAGS{tests}},
    qw/
      groan groanf
      lines references newsgroups head body paragraphs sig
    /
);
Import Options
By default, GnusFilter exports all the standard :tests. It also provides access to the message itself via the head(), body(), lines(), paragraphs(), and sig() functions. See WRITING HEADERS and SCORING for details on groan() and groanf().
The parameters are not exported by default. To tune them, import them either by name or all at once with the :params tag:
use News::GnusFilter qw/ :tests :params /;
FBI = 200; # raise tolerable bogosity level to 200
VERBOSE = 1; # enable debugging output
HEADER = "X-Filter";
...
The parameters are exported as lvalued subs. This is the only place where this module uses special features of perl 5.6+.
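For readers unfamiliar with lvalue subs, the following self-contained sketch shows the general mechanism that makes an assignment such as FBI = 200 possible. The %parameters hash here is a hypothetical stand-in, and the module's actual internals may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's private parameter table.
my %parameters = ( FBI => 100, VERBOSE => 0 );

# An lvalue sub returns something assignable, so "FBI = 200" stores
# into $parameters{FBI}. The empty prototype lets the name be used
# without parentheses.
sub FBI ()     : lvalue { $parameters{FBI} }
sub VERBOSE () : lvalue { $parameters{VERBOSE} }

FBI     = 200;    # write through the sub
VERBOSE = 1;

print "FBI is now ", FBI, "\n";    # read through the same sub
```

In a real script the subs arrive at compile time via the use statement, which is why the bare names parse as sub calls.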
WRITING HEADERS and SCORING
groan, groanf
groan() and groanf() are the analogs of print and printf, and are exported by default. The name of the warning header may be changed globally via HEADER:
HEADER="X-Format-Warning"; # overrides default "Gnus-Warning"
groan "mycheck failed" unless mycheck(body);
Default Score Settings
These settings are modifiable through the set_score sub. See the description in Scoring API below for details.
# scoring parameters
my %goof;       # counts occurrence of each error type
my %weight =    # error type => default score
  (                             # typical range of %goof value:
    totalquote       => 100,    #
    jeopardy_quoted  => 80,     # boolean (0-1)
    misattribution   => 60,     #
    lines_too_long   => 50,     #
    missing_headers  => 50,     # 0-2
    mime_crap        => 40,     # 0-3? :
    annoying_subject => 40,     # ~0-4
    cross_post       => 30,     # 0,~2-4
    bogus_address    => 30,     # 0-3 : 822, dns
    miswrapped       => 30,     # ~0-5 : lines (up to 5)
    control_chars    => 20,     # 0-5 : up to 5 chars
    ego              => 5,      # 0,~10-20 : I me my count
    overquoted       => 2,      # 0-50 : percentage over TOLERANCE
    bad_signature    => 2,      # 0,5-20 : lines
    code             => -5,     # 0,~10-30
  );

# set_score - scripter's interface to %goof and %weight
sub set_score {
    my $href = pop @_;
    # override weight table
    @weight{ keys %$href } = values %$href if ref $href;
    return bless \%goof;
}

# score - returns Flavell Bogosity Index
sub score {
    my $score = 0;
    $score += $goof{$_} * $weight{$_}
        for grep {exists $weight{$_}} keys %goof;
    return $score;
}
Scoring API - set_score, score
set_score() provides access to the %goof and %weight hashes, which form the basis of the Flavell Bogosity Index calculator score(). The SYNOPSIS contains a sample usage.
score() calculates the current bogosity index based on the rules applied so far. Neither set_score nor score is importable, so script writers should use OO-like syntax or their package-qualified names.
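As a worked example of the arithmetic, the sketch below repeats the weighted sum from score() on a small subset of the default weights. The %goof values are hypothetical test results chosen for illustration, not output from a real message.

```perl
use strict;
use warnings;

# Subset of the default weights from the table above.
my %weight = ( jeopardy_quoted => 80, miswrapped => 30, ego => 5 );

# Hypothetical test results for one message (not real output).
my %goof = ( jeopardy_quoted => 1, miswrapped => 3, ego => 12 );

# Same weighted sum as score(): count times weight, per error type.
my $score = 0;
$score += $goof{$_} * $weight{$_}
    for grep { exists $weight{$_} } keys %goof;

# 1*80 + 3*30 + 12*5 = 230, which exceeds the default FBI of 100.
print "Flavell Bogosity Index: $score\n";
```

Because each count is multiplied by its weight, raising a weight through set_score() makes that error type dominate the index, while a negative weight (such as the default for code) lowers it.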
Note: GnusFilter is not an OO package. Although set_score() returns a blessed reference to %goof, the final automatic score() calculation is not OO. If necessary, that calculation can be disabled by setting FBI = 0 in your script:
use News::GnusFilter qw/:tests FBI/;
FBI = 0;
MESSAGE TESTS
These are the exported functions that form the basis of a GnusFilter script. These functions are memoized to avoid repeat warnings and overscoring.
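The effect of memoization can be sketched as follows: the wrapped test runs once, and later calls return the cached result without issuing another warning. The names memoize_once and expensive_test are hypothetical, and the module's own caching mechanism may be implemented differently.

```perl
use strict;
use warnings;

my $warnings_issued = 0;

# Hypothetical test: pretends it found 3 problems and warns about them.
sub expensive_test {
    $warnings_issued++;
    return 3;
}

# Wrap a code ref so that its first result is cached and reused.
sub memoize_once {
    my ($code) = @_;
    my ($done, $result);
    return sub {
        unless ($done) {
            $result = $code->();
            $done   = 1;
        }
        return $result;
    };
}

my $test    = memoize_once(\&expensive_test);
my @results = map { $test->() } 1 .. 3;    # called three times

print "results: @results, warnings issued: $warnings_issued\n";
```

This is why the SYNOPSIS can call check_quotes once on its own and again inside the custom tests without scoring the same problem more than once.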
- misattribution
  Checks for proper attribution in quoted text.
- cross_post
  Warns of newsgroup spamming (level determined by NEWSGROUPS). On an original post, it returns the total number of posted groups; on followups it just returns 1.
- bogus_address
  Validates the Reply-To: (or From:, if not present) header using RFC 822 and a DNS lookup on the domain. Setting NSLOOKUP to a false value will disable the DNS lookup; otherwise NSLOOKUP should point to the location of your nslookup(8) binary.
- control_characters
  Looks for control characters in the message body. Returns their number (up to MAX_CONTROL).
- lines_too_long
  Checks for oversized lines as set by LINE_LEN. The return value is boolean.
- missing_headers
  Verifies existence of the Subject: and References: headers as necessary.
- miswrapped
  Tests for miswrapped lines in quoted and regular text. Returns the number of occurrences, which may be excessive for things like posted logfiles.
- jeopardy_quoted
  Tests for upside-down posting style (newsgroup replies should follow quoted text, not vice-versa). The return value is boolean.
- check_quotes
  Overtaxed sub that checks for overquoted messages. Also looks for over-opinionated text (too many I's) and lots of code (oft considered a good thing :). In scalar context, it returns the total number of quoted lines. Resulting warnings are subject to the VERBOSE, MIN_LINES, EGO, and TOLERANCE settings.
- bad_signature
  Checks for a standard signature block. If the lines exceed SIG_LINES, it returns the number of lines in the signature (up to 20). Otherwise it returns 0. An extra 10 is added to the return value for nonstandard signature separators.
- attribution
  Looks for the attribution text preceding the quoted text and returns it.
- annoying_subject
  Complains if the subject contains useless words. Returns the number of faux pas if this is an original post, otherwise returns a false value for followups.
  my @patterns = (
      qr/ ( [?!]{3,} ) /x,
      qr/ ( HELP ) /x,
      qr/ ( PLEASE ) /x,
      qr/ (NEWB[IE]{2})/xi,
      qr/ ( GURU ) /xi,
  );
- mimes
  Warns if the message is MIME-encoded.
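To illustrate how the annoying_subject patterns listed above behave, this sketch counts how many of them match a subject line. The helper count_faux_pas and the sample subjects are hypothetical, and the module may count matches differently (for example, it also distinguishes original posts from followups).

```perl
use strict;
use warnings;

# Patterns as listed under annoying_subject.
my @patterns = (
    qr/ ( [?!]{3,} )  /x,
    qr/ ( HELP )      /x,
    qr/ ( PLEASE )    /x,
    qr/ (NEWB[IE]{2}) /xi,
    qr/ ( GURU )      /xi,
);

# Hypothetical helper: number of patterns that match the subject.
sub count_faux_pas {
    my ($subject) = @_;
    return scalar grep { $subject =~ $_ } @patterns;
}

my $count = count_faux_pas('HELP!!! newbie question');
print "faux pas: $count\n";
```

Note that HELP and PLEASE are matched case-sensitively, so only the shouted forms count, while the NEWBIE and GURU patterns carry the /i flag and match in any case.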
BUGS
Terribly slow on large messages.
Etiquette rules may need adjusting for normal e-mail.
Does not (currently) look for quoted sigs.
Manually wrapped logfiles are heavily penalized.
Some context-sensitive stuff (original, request, newsgroup, mail) is wrong.
Uses the my $x if 0; trick.
NOTES
Return values, default settings, and especially regexps are subject to change. Please send bug reports and patches to the author.
AUTHOR
Joe Schaefer <joe+cpan@sunstarsys.com>. This package borrows heavily from Tom Christiansen's msgchk script.
COPYRIGHT
Copyright 2001 Joe Schaefer. This code is free software; it is freely modifiable and redistributable under the same terms as Perl itself.