# SpamAssassin - Bayes help needed [SOLVED]

## wetkitty

First, the numbers:

SA version 2.63

Mail server installed using Sabrex's how-to:

https://forums.gentoo.org/viewtopic-t-171499-highlight-spamassassin+qmail.html

local.cf (relevant parts, anyway):

```
# Text to prepend to subject if rewrite_subject is used
subject_tag *****SPAM*****
report_header 1

# Encapsulate spam in an attachment
report_safe 1
add_header all Status _YESNO_, hits=_HITS_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

# Use terse version of the spam report
use_terse_report 0

# Enable the Bayes system
use_bayes               1
bayes_min_ham_num 200
bayes_min_spam_num 200
bayes_use_hapaxes 1

# Enable Bayes auto-learning
auto_learn              1
auto_learn_threshold_nonspam    1.0
auto_learn_threshold_spam       7.0
bayes_path      /root/.spamassassin/bayes
```
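The two auto-learning thresholds split messages into three bands. Here is a simplified Python sketch of that decision, assuming scores below `auto_learn_threshold_nonspam` are learned as ham and scores at or above `auto_learn_threshold_spam` are learned as spam. The function name is hypothetical, and SpamAssassin itself ignores some rules when computing the score it compares against these thresholds, so treat this as an approximation:

```python
from typing import Optional

# Thresholds taken from the local.cf above.
AUTO_LEARN_THRESHOLD_NONSPAM = 1.0
AUTO_LEARN_THRESHOLD_SPAM = 7.0


def autolearn_decision(score: float,
                       nonspam: float = AUTO_LEARN_THRESHOLD_NONSPAM,
                       spam: float = AUTO_LEARN_THRESHOLD_SPAM) -> Optional[str]:
    """Hypothetical helper: return "ham", "spam", or None (not auto-learned).

    Simplified model only; the real check skips some rules when scoring.
    """
    if score < nonspam:
        return "ham"
    if score >= spam:
        return "spam"
    return None  # between the thresholds: left for manual training


if __name__ == "__main__":
    # Scores that appear later in this thread.
    for s in (3.46, 33.9):
        print(s, autolearn_decision(s))
```

Under this model, the 3.46 lint message falls between the bands and would not be auto-learned, while a message scoring 33.9 would be a candidate for learning as spam.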

`spamassassin -D --lint` outputs the following regarding Bayes:

```
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin" for user state dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 23462 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 23462 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 3 chosen.
debug: Initialising learner
debug: Loading languages file...
debug: Language possibly: en,sco
debug: is Net::DNS::Resolver available? yes
debug: trying (3) amazon.de...
debug: looking up MX for 'amazon.de'
debug: MX for 'amazon.de' exists? 1
debug: MX lookup of amazon.de succeeded => Dns available (set dns_available to hardcode)
debug: is DNS available? 1
debug: all '*From' addrs: ignore@compiling.spamassassin.taint.org
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=2.077
debug: bayes corpus size: nspam = 2195, nham = 2557
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *F = "U*ignore D*compiling.spamassassin.taint.org D*spamassassin.taint.org D*taint.org D*org"
debug: tokenize: header tokens for *m = " 1112896121 lint_rules "
debug: bayes token 'somewhat' => 0.0919180934020199
debug: bayes: score = 0.0919180934020198
debug: bayes: 23462 untie-ing
debug: bayes: 23462 untie-ing db_toks
debug: bayes: 23462 untie-ing db_seen
```
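The `bayes corpus size` line matters because the Bayes rules stay silent until enough ham and spam have been learned. A minimal Python sketch of that gate, assuming Bayes activates once both counts reach `bayes_min_spam_num` and `bayes_min_ham_num` (the helper name is hypothetical):

```python
# Minimums taken from local.cf (bayes_min_spam_num / bayes_min_ham_num).
BAYES_MIN_SPAM_NUM = 200
BAYES_MIN_HAM_NUM = 200


def bayes_active(nspam: int, nham: int,
                 min_spam: int = BAYES_MIN_SPAM_NUM,
                 min_ham: int = BAYES_MIN_HAM_NUM) -> bool:
    """Hypothetical helper: True when both learned counts reach their minimums."""
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    # Counts reported by the lint run above.
    print(bayes_active(nspam=2195, nham=2557))
```

Both counts are far above 200, so the database itself is large enough; the remaining question is whether spamd can open it at all.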

and

```
debug: running meta tests; score so far=4.984
debug: is spam? score=3.46 required=5.5 tests=BAYES_01,DATE_MISSING,DCC_CHECK,NO_REAL_NAME
```

Notice that Bayes is used and the BAYES_01 rule fired, shifting the score. This is how I would like it to work for real, but look at the headers from mail processed by spamd using the same config:

```
X-Spam-Status: Yes, hits=33.9 required=5.5
X-Spam-Level: +++++++++++++++++++++++++++++++++
X-Spam-Report: SA TESTS
     1.1 SARE_HEAD_HDR_XSPAM Message headers used which identify spam
     2.5 MANGLED_SOMA BODY: mangled Soma
     0.6 J_CHICKENPOX_32 BODY: 3alpha-pock-2alpha
     2.3 MANGLED_PHRMCY BODY: mangled pharmacy
     2.3 MANGLED_AFFORD BODY: mangled affordable
     0.1 SAVE_UP_TO BODY: Save Up To
     2.5 MANGLED_CIALIS BODY: mangled Cialis
     2.3 MANGLED_ONLINE BODY: mangled online
     2.5 MANGLED_AMBIEN BODY: mangled ambien
     2.5 MANGLED_XANAX BODY: mangled xanax
     0.6 J_CHICKENPOX_36 BODY: 3alpha-pock-6alpha
     0.6 J_CHICKENPOX_12 BODY: 1alpha-pock-2alpha
     1.8 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence between 51 and 100
     [cf: 100]
     1.1 MIME_BASE64_TEXT RAW: Message text disguised using base64 encoding
     0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
     1.8 DCC_CHECK Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
     0.1 RCVD_IN_NJABL RBL: Received via a relay in dnsbl.njabl.org
     [80.134.75.72 listed in dnsbl.njabl.org]
     1.5 DRUGS_ERECTILE_OBFU Obfuscated reference to an erectile drug
     1.0 DRUGS_ERECTILE Refers to an erectile drug
     1.0 DRUGS_ANXIETY_OBFU Obfuscated reference to an anxiety control drug
     0.0 DRUGS_SLEEP Refers to a sleep aid drug
     0.0 DRUGS_ANXIETY Refers to an anxiety control drug
     0.0 DRUGS_MUSCLE Refers to a muscle relaxant
     2.2 SARE_MULT_RATW_02 Spammer sign in headers
     1.0 DRUGS_ANXIETY_EREC Refers to both an erectile and an anxiety drug
     0.5 DRUGS_SLEEP_EREC Refers to both an erectile and a sleep aid drug
     1.0 DRUGS_MANYKINDS Refers to at least four kinds of drugs
```

No reference to Bayes, ever.

So my question is:

Is Bayes being used by spamd? If so, where and how? If not, what needs to be done to get it working like the test run?
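When checking many messages, it helps to search the report for rule names beginning with `BAYES_` (such as `BAYES_01` in the lint run). A minimal Python sketch, assuming the header text is available as a string and that Bayes rule names end in digits, as they do in this thread; the helper name is hypothetical:

```python
import re
from typing import List

# Bayes rule names look like BAYES_00, BAYES_01, BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_rules_hit(header_text: str) -> List[str]:
    """Hypothetical helper: return every BAYES_xx rule name found in the text."""
    return BAYES_RULE.findall(header_text)


if __name__ == "__main__":
    lint_line = "tests=BAYES_01,DATE_MISSING,DCC_CHECK,NO_REAL_NAME"
    spamd_report = "1.8 DCC_CHECK Listed in DCC\n0.9 RAZOR2_CHECK Listed in Razor2"
    print(bayes_rules_hit(lint_line))
    print(bayes_rules_hit(spamd_report))
```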

----------

## wetkitty

Just a bump to see if there are any SpamAssassin gurus out there today.

----------

## giant

Hmm, how long has your server been running?

What kind of mail traffic are we talking about?

Are you using some sort of autolearn for missed spam?

I don't see anything wrong in your config...

If Bayes is working, you should see something like this:

```
   0.0 HTML_MESSAGE           BODY: HTML included in message
   3.0 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
   0.2 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
  -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                              [score: 0.0000]
   0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
  -0.2 AWL                    AWL: From: address is in the auto white-list
```

Are you sure you start spamd with the right /etc/conf.d/spamd settings?

In my case I have a special user set up where I store the Bayes DBs; on a new setup I am testing MySQL storage.

Are the owner and permissions on your Bayes path correct? If spamd cannot write to those files, it won't work.

Just a couple of thoughts...

----------

## wetkitty

> Hmm how long is your server running?

Uptime? I can't seem to get more than six months; it always ends up getting shut down to move to a different rack or some such thing.

This particular mail config is at least a year old. After I originally set it up, SpamAssassin appeared to be working great; it wasn't until I started looking to improve its performance that I noticed Bayes was missing.

> What kind of mail traffic are we talking about?

20,000 messages a month, expected to double in a few months.

> Are you using some sort of autolearn for missed spam mails?

A trusted friend who is (was) receiving lots of spam uses Thunderbird with my IMAP server. Thunderbird drops spam into its junk folder (plus any messages he manually tags). A cron job runs sa-learn against that junk folder and restarts spamd.

Auto-whitelist and autolearn are also enabled in the configs (I can see the auto-whitelist working properly).

> If bayes is working you should see something like that:
>
> ```
>    0.0 HTML_MESSAGE           BODY: HTML included in message
>    3.0 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
>    0.2 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
>   -2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
>                               [score: 0.0000]
>    0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
>   -0.2 AWL                    AWL: From: address is in the auto white-list
> ```

That is exactly what I'm missing.

> You are sure that you start spamd with the right /etc/conf.d/spamd settings?

Just one option there:

```
# Config file for /etc/init.d/spamd
# Some options:
#
# -a for auto-white-list
# -c to create a per user configuration file
# -L if you want to suppress DNS lookup
# -u USER to run as a user other than root
#
# for more help look in man spamd
SPAMD_OPTS="-a"
```

> In my case I have a special user set up where I store the bayes dbs - with a new setup I am testing the mysql storage.
>
> The owner / rights on your bayes path are correct - if spamd cannot write to those files it won't work ...

Well, when I run `spamassassin -D --lint` it both opens the DBs and scores using Bayes, so there is something different between running `spamassassin -D --lint` and `/etc/init.d/spamd start`. After reading your post I'm going to start looking for a user or permission difference between the two.

Any other thoughts will be appreciated. I'll post back with any success (or failure).

Thanks

----------

## Ateo

Are the parameters [for local.cf] bayes_auto_learn_threshold_nonspam and/or bayes_auto_learn_threshold_spam an option in SA 2.63? If so, you probably want to set those.

----------

## wetkitty

> Are the parameters [for local.cf] bayes_auto_learn_threshold_nonspam and/or bayes_auto_learn_threshold_spam an option in SA 2.63? If so, you probably want to set those.

Yes, those are set to 1 and 7 respectively.

I did find a solution, though, and it is related to permissions. I added debugging (`-D`) to /etc/conf.d/spamd, and after reviewing the logs I found that spamd was unable to read /root/.spamassassin.

It seems that `spamassassin -D --lint` works because it runs as root the whole time, whereas spamd drops to a lower-privileged user after starting. So changing the directory permissions to

```
drwxrwx---   2 root qscand   176 Apr 14 14:55 .spamassassin
```

solved everything.
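Why the lint run worked while spamd did not comes down to the classic Unix owner/group/other check. A simplified Python sketch of that check, assuming hypothetical uid/gid 1001 for `qscand` and assuming the old directory mode was `drwx------ root root` (the thread does not say what it was); root is treated as always allowed, which simplifies the kernel's real rules:

```python
from typing import Set

# Permission bits as used in each rwx triplet.
READ, WRITE, SEARCH = 4, 2, 1


def can_access(mode: int, owner_uid: int, owner_gid: int,
               uid: int, gids: Set[int], want: int) -> bool:
    """Hypothetical helper: does (uid, gids) get all `want` bits on this inode?"""
    if uid == 0:
        return True  # simplification: root bypasses the mode bits
    if uid == owner_uid:
        bits = (mode >> 6) & 0o7  # owner triplet
    elif owner_gid in gids:
        bits = (mode >> 3) & 0o7  # group triplet
    else:
        bits = mode & 0o7  # "other" triplet
    return (bits & want) == want


if __name__ == "__main__":
    QSCAND_UID = QSCAND_GID = 1001  # hypothetical ids
    before = 0o700  # assumed old mode: drwx------ root root
    after = 0o770   # drwxrwx--- root qscand, as posted above
    need = READ | SEARCH  # list the directory and open files inside it
    print(can_access(before, 0, 0, QSCAND_UID, {QSCAND_GID}, need))  # qscand, old mode
    print(can_access(after, 0, QSCAND_GID, QSCAND_UID, {QSCAND_GID}, need))  # qscand, new mode
    print(can_access(before, 0, 0, 0, {0}, need))  # root
```

The same check applies to every directory on the path, so `/root` itself must also grant search permission to the daemon's user for the files inside `.spamassassin` to be reachable.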

I'd be curious to know whether there are any security issues with having those permissions, though.

----------

## FastTurtle

I'm thinking you may want to look into giving SA ownership of the files instead of changing the permissions, as that should be safer.

----------

## Ateo

*wetkitty wrote:*

> > Are the parameters [for local.cf] bayes_auto_learn_threshold_nonspam and/or bayes_auto_learn_threshold_spam an option in SA 2.63? If so, you probably want to set those.
>
> ...

What were the permissions beforehand? Giving write permission to the group is something I, personally, try to avoid.

----------

## giant

Glad it worked :)

Just to wrap this up, this is my conf:

```
cat /etc/conf.d/spamd
# Config file for /etc/init.d/spamd
SPAMD_OPTS="-x -u spamd -H /home/spamd"
```

This disables per-user config, runs as user spamd, and stores everything under /home/spamd, which then looks like this:

```
spamd # ls -l /home/spamd/
total 9288
-rw-------  1 spamd spamd   38304 Apr 15 09:57 bayes_journal
-rw-------  1 spamd spamd 5210112 Apr 15 09:57 bayes_seen
-rw-------  1 spamd spamd 5210112 Apr 15 09:57 bayes_toks
```
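With this layout the Bayes files are private to the `spamd` user (mode 0600), which avoids the group access Ateo mentions. A small Python sketch to audit a directory for `bayes_*` files that break that rule, assuming the `bayes_` filename prefix seen in the listing; the helper name is hypothetical:

```python
import os
import stat
from typing import List


def loose_bayes_files(directory: str, expected_uid: int) -> List[str]:
    """Hypothetical helper: list bayes_* files that are owned by another user
    or that grant any permission to group or others."""
    problems = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith("bayes_"):
            continue  # only audit the Bayes database files
        st = os.stat(os.path.join(directory, name))
        wrong_owner = st.st_uid != expected_uid
        shared = st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)
        if wrong_owner or shared:
            problems.append(name)
    return problems


# Example (run as root or as spamd):
#   import pwd
#   print(loose_bayes_files("/home/spamd", pwd.getpwnam("spamd").pw_uid))
```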

This is the production server. On my test server I am testing a setup using MySQL, or rather a combination of amavisd with Maia Mailguard.

----------

