# What spam filter do you recommend?

## fritte

I've just migrated from Debian, where I used spamoracle and KMail to read my mail and filter out spam. Spamoracle does not seem to be available in Gentoo; are there any equivalents? Does anyone have a recommendation for a good spam filter package that works with KMail?

Thanks!

----------

## neenee

i recommend spamassassin; there are some very recent posts about spamassassin. try searching for them.

good luck.

----------

## fritte

Ok, SpamAssassin looks good. What is the package name? Is it 'mail-spamassasin'?

----------

## neenee

Mail-SpamAssassin

----------

## fritte

Oh my GOD! Spamassassin takes 30 SECONDS to classify ONE email! This is incredible. Spamoracle did that in a *fraction* of a second.

Have I done something wrong? Shouldn't it be faster than this?

----------

## delta407

SpamAssassin checks a bunch of blacklists, does DNS resolution, queries Vipul's Razor and DCC, and so on. Most of that is network activity; the pattern matching rules and Bayesian filtering should take less than 2 seconds of CPU time.

If you're filtering a lot of mail, you can lose quite a bit of efficiency by loading and unloading a Perl interpreter with each mail. That's what spamd/spamc was designed to avoid -- check the man pages for how to keep SpamAssassin resident in RAM (which speeds things up).
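One way to see whether interpreter startup is the bottleneck is to time both invocation styles on the same saved message. The following is only a minimal sketch: it assumes `spamassassin` and `spamc` are on the PATH and that spamd is already running, and the helper name `time_filter` is made up for illustration.

```python
import subprocess
import time


def time_filter(command, message_bytes):
    """Feed one message to `command` on stdin and return the elapsed seconds."""
    start = time.perf_counter()
    # The filtered message is written to stdout; discard it, we only want timing.
    subprocess.run(command, input=message_bytes,
                   stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start
```

With a message saved to a file, calling `time_filter(["spamassassin"], data)` and then `time_filter(["spamc"], data)` on the same bytes should show how much of the delay comes from starting Perl each time rather than from the checks themselves.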

----------

## fritte

Score: +5, informative!

Thanks. I'll go ahead and run it via spamd instead and see what happens.

----------

## Htwo

if your connection is slow the network test may be doing it. 

add "local_tests_only 1" to /etc/mail/spamassassin/local.cf

----------

## fritte

Ok, done that, still takes 30 seconds to classify one single email.

I guess spamassassin is awesome for keeping things 100% clean, but I don't really need that. I can live with a spam email slipping through now and then, but I *can't* accept KMail freezing for 30 seconds every time there's an incoming email.

Any information on how to reconfigure spamassassin to achieve this?

----------

## fritte

 *Htwo wrote:*   

> if your connection is slow the network test may be doing it. 
> 
> add "local_tests_only 1" to /etc/mail/spamassassin/local.cf

 

Sorry! Didn't see this until now. I've just done that and restarted the daemon (spamd) and it *still* takes 30 seconds to classify one single email. And I'm not on a very slow connection either, unless 10Mbit pure ethernet (no DSL) is considered slow.

----------

## dsegel

I've never noticed SpamAssassin taking any time at all to filter messages, so clearly there's something strange going on on your system. 

Unfortunately I can't help you figure out what it is. :(

One thing to check - they recently downgraded SA from version 2.60 back to 2.55. If you're running the 2.6 version maybe there's something wrong with it. If you emerge sync and re-emerge SA it will bring you back to 2.55.

----------

## Unne

SpamAssassin also takes about 30 seconds per email on my machine.  I don't really care, since all my mail gets fetched as a cron job every 5 minutes, but what's more, it's missed plenty of spam already in the 5 days I've had it.  Not to mention that stupid osirusoft thing that caused plenty of legit emails to be filtered as spam.  I guess I had higher expectations for SpamAssassin.  Or else I haven't configured it properly.  I run 2.55-r1 by the way.  I don't run it as a daemon though, so maybe that's it.  Not important to me anyways, like I said.

----------

## splooge

Be a geek.  Run your own mail server.  It's the cool way to do things.  Then you get the smtp server to run it through spamassassin instead of your client.  Yeah!

----------

## Regor

SpamAssassin also missed many spams for me while I was running it. I have since switched to spambayes, which I've been really happy with.

After a few days of training my false negative rate is <= 1/day with no false positives.

Unfortunately, it's not in portage, but installing it manually isn't much of a chore. Check it out at http://spambayes.sourceforge.net/

----------

## fritte

Ok, the fact of the matter seems to be that for some people it takes 30 seconds. For some it doesn't, it takes 2-3 seconds instead. Right? I don't mind if classifying an email takes 30 seconds as long as it doesn't lock up KMail during that time, which it does.

I'll have to walk down that road and put it in verbose logging mode (how to do that?) and see what the f**k is going on here.

----------

## Janne Pikkarainen

Maybe Perl has UTF8 support compiled in for those who see SpamAssassin performing poorly? At least with Red Hat 9.0 that's the case... in that case you could try

LANG="en_US"

or whatever language you prefer, just don't put ".utf8" extension after it.
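To try this without changing the whole login environment, the variable can be overridden for a single run. This is a rough sketch rather than a fix known to work: the helper name `run_with_lang` is hypothetical, and `en_US` is just the example value from above.

```python
import os
import subprocess


def run_with_lang(command, message_bytes, lang="en_US"):
    """Run `command` with LANG overridden for this one process only."""
    env = dict(os.environ)
    env["LANG"] = lang
    # LC_ALL, if set, takes precedence over LANG, so drop it for this run.
    env.pop("LC_ALL", None)
    return subprocess.run(command, input=message_bytes, env=env,
                          capture_output=True, check=True)
```

Something like `run_with_lang(["spamassassin"], data)` could then be timed against a plain run to see whether the locale makes any difference on a given machine.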

----------

## pilla

SpamAssassin with the --local flag definitely does not take 30 seconds on my machine. It was quite fast, but right now I am using bogofilter with Evolution.

Search google and you'll find a good tutorial on integrating both.

----------

## feardapenguin

Ironic that this discussion is going on today.  I was just looking at Spam-Assassin and decided to see if there were any comments on the forums.

I've been quite satisfied with my procmail recipe but I need to look into a Bayesian filter.  Seems that Spam-Assassin is pretty popular.

Today's filter stats so far:

Total:                60

Non-spam:          7

Missed-spam:      8  (mostly base-64's)

Filtered Spam:   45

Not too bad, but I'm sure a good Bayesian could do much better.
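For reference, the stats above work out as follows. This small calculation assumes that every message counted under "Filtered Spam" really was spam, since the stats list no false positives.

```python
# Counts taken from the stats posted above.
non_spam = 7
missed_spam = 8
filtered_spam = 45

total = non_spam + missed_spam + filtered_spam  # 60, matching the posted total
total_spam = missed_spam + filtered_spam        # all spam, caught or not
catch_rate = filtered_spam / total_spam         # fraction of spam caught
miss_rate = missed_spam / total_spam            # fraction of spam that slipped through

print(f"caught {catch_rate:.1%}, missed {miss_rate:.1%} of {total_spam} spam messages")
```

That is roughly 85% of the day's 53 spam messages caught by the procmail recipe, with about 15% getting through.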

Procmail is very flexible (I actually enjoy getting creative with regexp) but it is unable to handle some of the more tricky techniques used today.  The other downside is that it requires constant updating and maintenance.

----------

## ronmon

I've been using mailfilter for a couple years. In fact I used to maintain the slackware packages for it. Like feardapenguin's setup, it is a pain to keep current and it only checks headers, not content. The really nice thing about it is that it deletes the spam from the POP server before it is downloaded to my machine.

Since I have bogofilter running with sylpheed-claws now I have to let some spam in to train it. Then I think I'll go back to using mailfilter with bogofilter hopefully catching whatever sneaks in. There's nothing like a good layered defense when the hordes attack.

----------

## Klavs

I use SpamAssassin, and it performs very well in my setup.

I run my own mail server (Postfix) and use amavisd-new. I'm working on an ebuild for it, but there are no make or configure scripts, so I've been putting off the tedious work :) SpamAssassin doesn't even take a second to classify my email (on a VIA C3 800 MHz):

Aug 29 09:09:26 [amavis] (02542-03) Passed, <some-newsletter-emailaddr> -> <my-addr>, Message-ID: <20030829064528.537C613E73@thesending.mailserver.tld>, Hits: -3.6

Aug 29 09:09:26 [postfix/smtp]

I don't use RBL and razor - only bayes.

If you get POP mail, for example, you can use fetchmail to retrieve the email and deliver it to your local mail server, and get it working that way without delaying your KMail client.

P.S. A good tip: don't use mbox unless you have a very small mailbox. I have 10k+ emails, and mbox would have killed me (and my mail client).

----------

## pilla

 *fritte wrote:*   

> Ok, done that, still takes 30 seconds to classify one single email.
> 
> I guess spamassassin is awesome for keeping things 100% clean, but I don't really need that. I can live with a spam email slipping through now and then, but I *can't* accept KMail freezing for 30 seconds every time there's an incoming email.
> 
> Any information on how to reconfigure spamassassin to achieve this?

 

It cannot catch 100% of all spam. We have SpamAssassin on the server, but I still catch some spam with bogofilter in my Evolution client.

----------

## fritte

Ok, I'm going to need more assistance on this. Here goes:

My SpamAssassin version is 2.55-r1

The following commands return this:

```
$> echo $LANG

$>
```

(nothing)

```
$> env | grep LANG
$>
```

(nothing)

```
$> locale
LANG=POSIX   <--------- !!!
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
$>
```

So, LANG seems to have some kind of setting, and that setting is POSIX. Have I interpreted things correctly? If so, how do I turn this utf8 thing off in this case, since it doesn't even seem to be on??

Thanks for all help you've been providing!

----------

## eyevee99

I've just decided to install spambayes...  How do I set it up to run when I log in?  Does it have to run as root etc?

Wouldn't each user have to run their own version?

Cheers,

ryan

----------

## isomer

Give POPFile a shot. Been working very well for me. I don't think it's in portage, but it's simple to install and run, even without root privileges.

----------

## churcher

Good tip on bogofilter and Evolution, just set that up.  Seems to be working ok, though I guess it will take a while to 'train'.  Plus, it's in portage!

----------

## bisho

For a multiuser system the best one is DSPAM. It uses an improved Bayesian filter (it analyzes not only word statistics but sentences as well), each user has their own data, unused words are purged to keep the databases clean, it offers training by replying to an address, etc.

For one person I use bogofilter, because it's really fast and I have a bunch of mail and spam samples to teach the filter and get it working well quickly.

SpamAssassin could be a good option to start with, to catch emails with the rules until the Bayesian filter is trained. At first it would be slow, but in the end you could disable all the rules except the Bayesian one once it is trained enough.

----------

## Regor

 *eyevee99 wrote:*   

> I've just decided to install spambayes...  How do I set it up to run when I log in?  Does it have to run as root etc?
> 
> Wouldn't each user have to run their own version?
> 
> Cheers,
> ...

 

How you run it depends on what your needs are. Personally, I invoke it via procmail, since I already use procmail to sort mail into different inboxes.

There are examples for different situations on the Spambayes webpage; use them as a guide.

And although I imagine you could set it up to run system-wide with one spam database, it would be difficult to train: what one user considers legitimate email might well look like spam to another, so it's unlikely that one ruleset would fit all.

----------

## neenee

i run spamassassin in combination with razor, which seems to catch 100% over a period of a week so far (30+ mails per day, about 10% of which is spam).

my only problem with it is that it takes 10+ seconds to verify one email.

----------

## gOA-pSY

simply add "skip_rbl_checks 1" to your /etc/mail/spamassassin/local.cf and it only needs ~1sec for each mail!
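Since both `local_tests_only` and `skip_rbl_checks` have been suggested for the same file in this thread, here is a small sketch of adding such a line without ending up with duplicates. The helper `set_option` is hypothetical and works on the file's text only; it assumes the simple "name value" line format used in the suggestions above, and writing the result back to `local.cf` (and restarting spamd) is left to the reader.

```python
def set_option(config_text, name, value):
    """Return config_text with the line `name value` present exactly once."""
    wanted = f"{name} {value}"
    lines = []
    found = False
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == name:
            if not found:
                lines.append(wanted)  # replace the first occurrence in place
                found = True
            continue  # drop any later duplicates of the same option
        lines.append(line)
    if not found:
        lines.append(wanted)  # option was absent, append it at the end
    return "\n".join(lines) + "\n"
```

Commented-out lines are left untouched, because their first token is `#` rather than the option name.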

----------

## neenee

thank you for the suggestion :)

(i will first check what that setting does though).

*update*

ok. that skips the real-time blacklist.

----------

## pfft

/var/qmail/supervise/qmail-smtpd/run:

    #!/bin/sh
    # Look up the uid/gid that qmail-smtpd should run as.
    QMAILDUID=`id -u qmaild`
    NOFILESGID=`id -g qmaild`
    # tcpserver hands each SMTP connection to rblsmtpd, which checks the
    # connecting host against every -r list before qmail-smtpd sees it.
    exec /usr/bin/softlimit -m 8000000 \
        /usr/bin/tcpserver -v -p -R -x /etc/tcp.smtp.cdb \
        -u $QMAILDUID -g $NOFILESGID 0 smtp \
        /usr/bin/rblsmtpd -b -r relays.ordb.org \
        -r dev.null.dk \
        -r dnsbl.njabl.org \
        -r qmail.bondedsender.org \
        -r cbl.abuseat.org \
        -r dnsbl.delink.net \
        -r blackholes.easynet.nl \
        -r dnsbl.sorbs.net \
        -r bl.technovision.dk \
        -r vox.schpider.com \
        /var/qmail/bin/qmail-smtpd \
        /bin/checkpassword /bin/true 2>&1

;=)

----------

## Mirrorball

I used to use spamassassin but the program wasn't able to detect some spam I was receiving a lot of lately, about viagra and other drugs, because the spammers were using bad HTML and wrong spelling to avoid detection. I now use POPFile and it's doing a very good job. It's the program I recommend at the moment.

----------

## nsahoo

what's wrong with using thunderbird or mozilla-mail instead? it has a bayesian filter.

----------

## Chris W

Just for those wondering why SpamAssassin (SA) misses quite a few.  After installation, SA uses only the rules to determine spam-ness.  The Bayesian filter in SA doesn't kick in until it has seen enough spam.  It trains itself based on the spammiest stuff it sees (based only on the rules) and deliberately avoids anything that could be a false positive (i.e. low scoring spam).   You can accelerate the process by using the sa-learn utility on stuff you deliberately categorise as spam or ham.  Once the Bayesian stuff kicks in the false negatives largely dry up.  I get one or two false negatives, and no false positives, every week (approx 200 spams).
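The self-training behaviour described above can be pictured as a simple gate on the rule-based score. This is a toy illustration only, not SpamAssassin's actual code: the function name, both threshold values, and the sample scores are invented for the example.

```python
def auto_learn_decision(score, spam_threshold=12.0, ham_threshold=0.1):
    """Return 'spam', 'ham', or None (do not learn) for a rule-based score.

    Both thresholds are illustrative assumptions, not documented defaults.
    """
    if score >= spam_threshold:
        return "spam"  # only very spammy mail is trusted as a spam example
    if score <= ham_threshold:
        return "ham"   # only very clean mail is trusted as a ham example
    return None        # anything in between is skipped to avoid mistraining


scores = [25.3, 6.1, 13.0, -2.4, 4.9]  # made-up rule scores
decisions = [auto_learn_decision(s) for s in scores]
print(decisions)
```

The middle band is where most borderline spam falls, which is why feeding those messages to sa-learn by hand speeds things up.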

Looking at my logs for today: the longest spam processing took 11 seconds, with all the others (50 odd) under 4 seconds.  This runs on a Pentium II 300MHz with all the RBL and other checks.

KMail hanging during the filtering process is not SA's fault.  If the program were suitably multithreaded the UI responsiveness would remain.

----------

## neenee

that's good info ;)

i would like to add to that by giving the tip of creating a cronjob for sa-learn, such as this for evolution:

0 0 * * * sa-learn --mbox --spam /home/neenee/evolution/local/Spam

which will learn spam from your evolution Spam folder in mbox format every day at midnight.

i used to do this manually from my window manager's popup window, but this works just fine as well.

perhaps a more space-efficient way would be to have sa-learn learn spam once every week and then clean out the spam folder, since it has already learnt that spam.
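A rough sketch of that learn-then-clean idea follows. It reuses the `sa-learn --mbox --spam` invocation from the cron line above; the function name `learn_and_clear` is hypothetical, and the sketch assumes the mail client is not running while the folder is emptied (the client may also keep its own index files next to the mbox, which this does not touch).

```python
import subprocess


def learn_and_clear(mbox_path, run=subprocess.run):
    """Train on an mbox spam folder, then empty it if training succeeded."""
    command = ["sa-learn", "--mbox", "--spam", mbox_path]
    result = run(command)
    if result.returncode != 0:
        return False  # keep the folder so nothing is lost on failure
    # Truncate the mbox file to zero bytes; its contents have been learnt.
    with open(mbox_path, "wb"):
        pass
    return True
```

Run once a week from cron (for example with the schedule `0 0 * * 0`, midnight on Sundays), this keeps the spam folder from growing without bound.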

good luck ;)

----------

## asimon

I too like SpamAssassin very much. It was easy to set up (much easier than, for example, dspam or crm114) and works very well here. I don't use the online tests, only the local rules and the Bayesian filter. With the Bayesian filter SpamAssassin is very effective (most comparisons I found on the web are based on rather old SA versions that didn't have the Bayesian filter, so it's no wonder that bogofilter, dspam, etc. do much better than SA in those tests). Looking at the report header, I see that every spam mail I get receives enough points from the BAYES_99 rule alone to be classified as spam.

But as with every Bayesian filter, it's very important to constantly train SA with your spam and ham mails. From what I see, autolearning alone (with the default parameters) is not enough, because I at least don't get much spam with very high scores, so for most spam the autolearner doesn't actually learn anything. But with a cron job running sa-learn this is no problem. Sometimes a few spam mails get through (like lately some with many misspelled words, or a fake mail from MS with a supposed security update for Windows), but after feeding them to sa-learn the next ones will not get through.

What I especially appreciate is that after around 6000 spam mails (I have used it since January), SA has never classified a ham mail as spam (at least as far as I can tell ;) ). This is very important.

The future of SA also looks very bright. Version 3 will include an API for plugins, so we will have plugins for other filters like dspam (already exists), crm114, bogofilter, etc.

On the SA mailing list there is a post where someone did some small benchmark to compare the speed of SA's bayesian filter and dspam (it's usually said that SA is very slow because it's in Perl and dspam very fast). In this test SA was only 10% slower than dspam.

----------

## mallchin

Well, I'm in a quandary.

I use amavisd-new, which supports spamassassin, dspam and razor, and I don't know which to use, or whether it'll let me use any or all of them together. I guess just SA would be ideal; I've been using that for a while okay.

My other wonder is how I can use neenee's sa-learn cronjob to teach SA on my mail server, but my evo mail is on my workstation. How can I use SA to learn on my workstation and port the rules it's learnt to my mail server?

/me throws stones at nasty spammers, shoo shoo

----------

## BlinkEye

as you have a mail server, your mail gets to it first, so there should be no problem filtering mail there. the bad thing is if you fetch via pop, which you might be doing, and spam gets through. hmm

----------

## xiangzi

Have been using bogofilter with evolution for a week or 2 now as described here:

http://www.ime.usp.br/~rsilva/bogo-and-evo/

Trained on about 1000 spam and a bit more non-spam, but the results so far are pretty disappointing -- it's only getting 1/2 to 2/3 of the spam, and doesn't seem to be improving over time (in fact today has been its worst day ever, more like 1/4).  But other people sound happy with it.  Did you have to tweak it to get good results, and if so, what was your recipe?

----------

## BlinkEye

the really important thing is that you don't put bad mails in the .Ham folder and train it with these. i made this mistake the first time and it didn't work out. i'd say: put fewer mails in your .Ham folder, but only mails you're really sure of, and you're set. btw.: what about the configs? i'm not sure at all how to set them (i've only been using bogofilter for a few days) but maybe you don't have them set correctly?!? i've looked for examples and explanations but haven't found anything.

----------

## xiangzi

Yeah, I was careful to expunge all the spam from the ham folder first.

----------

## BlinkEye

well, then i guess your config file's the bug. i'm really satisfied with bogofilter so far - i had only about 20 spam mails left to train it with, but it works flawlessly

----------

## Kick

 *Chris W wrote:*   

> Looking at my logs for today: the longest spam processing took 11 seconds, with all the others (50 odd) under 4 seconds.  This runs on a Pentium II 300MHz with all the RBL and other checks.

 

Really? How much RAM do you have?

My PII-300 takes between 30 s and 1 min to check any single message, with network tests *disabled*!!

Well, after further analysis, it seems that SA 3.0 is a real memory hog. Perhaps you're talking about SA 2.60, which is indeed much faster on my own box.

----------

