From spamassassin-talk-admin@lists.sourceforge.net Wed Aug 21 13:08:21 2002
Return-Path: <spamassassin-talk-admin@example.sourceforge.net>
Delivered-To: yyyy@localhost.netnoteinc.com
Received: from localhost (localhost [127.0.0.1])
by phobos.labs.netnoteinc.com (Postfix) with ESMTP id 97A1C43C36
for <jm@localhost>; Wed, 21 Aug 2002 08:08:14 -0400 (EDT)
Received: from phobos [127.0.0.1]
by localhost with IMAP (fetchmail-5.9.0)
for jm@localhost (single-drop); Wed, 21 Aug 2002 13:08:14 +0100 (IST)
Received: from usw-sf-list2.sourceforge.net (usw-sf-fw2.sourceforge.net
[216.136.171.252]) by dogma.slashnull.org (8.11.6/8.11.6) with ESMTP id
g7LC87Z23405 for <jm-sa@jmason.org>; Wed, 21 Aug 2002 13:08:07 +0100
Received: from usw-sf-list1-b.sourceforge.net ([10.3.1.13]
helo=usw-sf-list1.sourceforge.net) by usw-sf-list2.sourceforge.net with
esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 17hUGM-0006px-00; Wed,
21 Aug 2002 05:07:02 -0700
Received: from line-zh-102-185.adsl.econophone.ch ([212.53.102.185]
helo=dragon.roe.ch) by usw-sf-list1.sourceforge.net with esmtp (Cipher
TLSv1:DES-CBC3-SHA:168) (Exim 3.31-VA-mm2 #1 (Debian)) id 17hUF5-0002Iz-00
for <spamassassin-talk@lists.sourceforge.net>; Wed, 21 Aug 2002 05:05:43
-0700
Received: from roe by dragon.roe.ch with LOCAL id 17hUEr-0007Qt-00 for
spamassassin-talk@lists.sourceforge.net; Wed, 21 Aug 2002 14:05:29 +0200
From: Daniel Roethlisberger <daniel@roe.ch>
To: SAtalk <spamassassin-talk@example.sourceforge.net>
Message-Id: <20020821120529.GA27260@dragon.roe.ch>
Mail-Followup-To: SAtalk <spamassassin-talk@example.sourceforge.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4i
Subject: [SAtalk] German spam corpus / foreign language spam
Sender: spamassassin-talk-admin@example.sourceforge.net
Errors-To: spamassassin-talk-admin@example.sourceforge.net
X-Beenthere: spamassassin-talk@example.sourceforge.net
X-Mailman-Version: 2.0.9-sf.net
Precedence: bulk
List-Help: <mailto:spamassassin-talk-request@example.sourceforge.net?subject=help>
List-Post: <mailto:spamassassin-talk@example.sourceforge.net>
List-Subscribe: <https://example.sourceforge.net/lists/listinfo/spamassassin-talk>,
<mailto:spamassassin-talk-request@lists.sourceforge.net?subject=subscribe>
List-Id: Talk about SpamAssassin <spamassassin-talk.example.sourceforge.net>
List-Unsubscribe: <https://example.sourceforge.net/lists/listinfo/spamassassin-talk>,
<mailto:spamassassin-talk-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://www.geocrawler.com/redir-sf.php3?list=spamassassin-talk>
X-Original-Date: Wed, 21 Aug 2002 14:05:29 +0200
Date: Wed, 21 Aug 2002 14:05:29 +0200
I've been lurking on the SA lists since I installed SA on a production
machine a while back. While SA did a surprisingly accurate job of
detecting English-language spam, it did not do very well on German-language
spam, which I have been getting in increasing amounts lately. I've
got lousy results with the out-of-the-box scores; very little spam is
actually caught.
What is the strategy with respect to foreign-language spam recognition
in SA? I've seen extremely few non-English rules. Is there foreign-language
rule development going on? Has anybody done work on German
spam?
In any case, I've started spam/nonspam corpora consisting of only German
(and Swiss-German) messages, to be able to help with
German rules. Anybody willing to contribute to the corpus, feel free to
resend/bounce German spam in a sane way to spam@roe.ch . I cannot be
bothered to subscribe to SAsightings just for the odd German spam every
hundred++ messages. How about a list for foreign-language spam
sightings?
Has anybody done this before, or am I on the verge of duplicating effort
here?
I've been thinking about this a bit. I think it would be best if there
were general provisions for foreign-language rules, in the spirit of
the ok_languages option: let users easily enable or disable rules in
certain languages. For example, a foreign_rules option could be used to
control which foreign rulesets are active. Usually people would want to
use checks in all languages that are in the ok_languages list.
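
A rough sketch of how such an option might behave, in Python for brevity
rather than SA's own Perl. Everything here is hypothetical: foreign_rules
is only the option proposed above, not an existing SA setting, and the
Rule type, the rule names and the active_rules helper are made up for
illustration. The only assumptions are that each rule carries a language
tag, and that the value "all" means "every language", as it does for
ok_languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: a name, a language tag and a score."""
    name: str
    language: str  # e.g. "en", "de", "fr"
    score: float


def active_rules(rules, ok_languages, foreign_rules=None):
    """Return the rules whose language is enabled.

    foreign_rules is the proposed (not existing) option. When it is left
    unset, the selection falls back to ok_languages, which is the default
    most people would want according to the proposal.
    """
    enabled = set(ok_languages if foreign_rules is None else foreign_rules)
    # Assumption: "all" enables every language, mirroring ok_languages.
    if "all" in enabled:
        return list(rules)
    return [rule for rule in rules if rule.language in enabled]


# Made-up example rules, one per language.
RULES = [
    Rule("EN_EXAMPLE", "en", 1.0),
    Rule("DE_EXAMPLE", "de", 1.0),
    Rule("FR_EXAMPLE", "fr", 1.0),
]

if __name__ == "__main__":
    # No foreign_rules given: follow ok_languages.
    print([r.name for r in active_rules(RULES, ["en", "de"])])
    # Explicit foreign_rules overrides the ok_languages default.
    print([r.name for r in active_rules(RULES, ["en", "de"], ["de"])])
```

With ok_languages set to "en de" and no foreign_rules, the English and
German rules are selected; setting foreign_rules explicitly narrows or
widens that set without touching ok_languages itself.
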
Is there any development or are there plans along those lines? Are there
other people willing to contribute effective spam filtering rules for
German?
Any kind of feedback is welcome, even flames ;)
Cheers,
Dan
--
Daniel Roethlisberger <daniel@roe.ch>
OpenPGP key id 0x804A06B1 (1024/4096 DSA/ElGamal)
144D 6A5E 0C88 E5D7 0775 FCFD 3974 0E98 804A 06B1
>> privacy through technology, not legislation <<
_______________________________________________
Spamassassin-talk mailing list
Spamassassin-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk