# DSPAM: retraining errors via command line fails [Solved]

## overkll

I've been using dspam for over a year now and I really like it.  It's fast and accurate.

I use squirrelmail as an alternative for users, and I've added the squirrelmail plugin "spam_buttons".  This adds buttons in the message list view and links in the message read view that reverse the classification of a message, from spam to innocent or innocent to spam, for false positives/negatives.

I also offer up spam/notspam aliases for users to forward fp/fn's to, but that's not so popular.  I use the "ParseToHeaders on" and "ChangeModeOnParse on" settings in dspam.conf, which use the dspam signatures to switch to the correct user and correct class.  No problems there.

Here's the weird thing,  if I try to cat a message to dspam for retraining, it barfs:

```
# /usr/bin/dspam --user joeblow --class=spam --source=spam < email.msg

...

...

3116: [03/12/2008 15:32:43]     Content-Type                      text/plain; charset=ISO-8859-1; format=flowed

3116: [03/12/2008 15:32:43]     Content-Transfer-Encoding         7bit

3116: [03/12/2008 15:32:43]     X-DSPAM-Result                    Innocent

3116: [03/12/2008 15:32:43]     X-DSPAM-Processed                 Wed Mar 12 14:09:27 2008

3116: [03/12/2008 15:32:43]     X-DSPAM-Confidence                0.9899

3116: [03/12/2008 15:32:43]     X-DSPAM-Probability               0.0000

3116: [03/12/2008 15:32:43]     X-DSPAM-Signature                 4,47d82a6725459544614015

                                 

                          ]     Testing

                                 

3116: [03/12/2008 15:32:43] mysql_fetch_row() failed in _ds_get_signature

3116: [03/12/2008 15:32:43] DSPAM Instance Shutdown.  Exit Code: 0
```

If I send the dspam signature, it works fine:

```
# /usr/bin/dspam --user joeblow --class=spam --source=error --signature=4,47d82a6725459544614015

...

...

3252: [03/12/2008 15:42:49] libdspam returned probability of 1.000000

3252: [03/12/2008 15:42:49] message result: SPAM

3252: [03/12/2008 15:42:49] appending header X-DSPAM-Reclassified: Spam

3252: [03/12/2008 15:42:49] appending header Subject: **SPAM**

3252: [03/12/2008 15:42:49] assembling component 0

3252: [03/12/2008 15:42:49] DSPAM Instance Shutdown.  Exit Code: 0

```

W T F ! ! !

The first example shows that the signature is found, but for some reason there seems to be a mysql error.  This doesn't happen with the email aliases - that works perfectly.  I also tried adding "--mode=unlearn" to no avail.

Maybe I'm missing something?  Can anyone give me a clue before I go bald pulling my hair out?

Steveb? magic919?

*Last edited by overkll on Fri Mar 14, 2008 2:45 pm; edited 1 time in total*

----------

## steveb

 *overkll wrote:*   

> Here's the weird thing,  if I try to cat a message to dspam for retraining, it barfs:
> 
> ```
> # /usr/bin/dspam --user joeblow --class=spam --source=spam < email.msg
> ```
> ...

 Try:

```
/usr/bin/dspam --user joeblow --class=spam --source=error < email.msg
```

You must use "error", "corpus", or "inoculation" for source. A source "spam" does not exist :)
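A minimal sketch of guarding against that mistake in a wrapper script, assuming the three values named above are the only ones accepted. `valid_source` is a hypothetical helper, not part of DSPAM:

```shell
# Hypothetical helper: succeed only for the source values named in this post.
valid_source() {
    case "$1" in
        error|corpus|inoculation) return 0 ;;
        *) return 1 ;;
    esac
}

# Check one valid and one invalid value.
for src in error spam; do
    if valid_source "$src"; then
        echo "$src: ok"
    else
        echo "$src: not a valid --source value"
    fi
done
```

A wrapper could call `valid_source` before handing the message to dspam and refuse to run otherwise.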

 *overkll wrote:*   

> If I send the dspam signature, it works fine:
> 
> ```
> # /usr/bin/dspam --user joeblow --class=spam --source=error --signature=4,47d82a6725459544614015
> ```
> ...

 Here you do it right. You specify "error" as source. If you used "spam" as source, you would probably run into the same issue as above.

// SteveB

----------

## overkll

Thanks for responding.

That was a typo.  I did use "--source=error".  Tried it again just to make sure - same error.

```
/usr/bin/dspam --user joeblow --class=innocent --source=corpus < email.msg
```

works as well.  No surprise since it's not having to find the signature in the message header.

Very puzzling.  Any more ideas?

----------

## magic919

Are you using MySQLUIDInSignature?  If so you don't have to specify the actual user.  Try with user root, say.

----------

## overkll

Yes, I am using MySQLUIDInSignature.  That's what is puzzling.  If I retrain via the aliases, it works - dspam finds the signature, switches users and retrains ("ParseToHeaders on" and "ChangeModeOnParse on") for the user in the signature.

From the error message, it looks like dspam is failing to find the signature because of a mysql error????

```
8148: [03/13/2008 12:31:47] mysql_fetch_row() failed in _ds_get_signature
```

The dspam.debug log shows it's having problems with the message body:

```
...

...

8148: [03/13/2008 12:31:47] attribute MySQLDb = dspam

8148: [03/13/2008 12:31:47] attribute MySQLConnectionCache = 10

8148: [03/13/2008 12:31:47] attribute MySQLUIDInSignature = on

8148: [03/13/2008 12:31:47] attribute LocalMX = 127.0.0.1

8148: [03/13/2008 12:31:47] attribute ProcessorURLContext = on

8148: [03/13/2008 12:31:47] attribute ProcessorBias = on

' doesn't contains `:' characterde.c:365: unexpected data: header string '

8148: [03/13/2008 12:31:47] decode.c:365: unexpected data: header string 'Testin' doesn't contains `:' character

' doesn't contains `:' characterde.c:365: unexpected data: header string '

8148: [03/13/2008 12:31:47] decode.c:365: unexpected data: header string 'Let's ' doesn't contains `:' character

' doesn't contains `:' characterde.c:365: unexpected data: header string '

' doesn't contains `:' characterde.c:365: unexpected data: header string '

' doesn't contains `:' characterde.c:365: unexpected data: header string '

' doesn't contains `:' characterde.c:365: unexpected data: header string '

' doesn't contains `:' characterde.c:365: unexpected data: header string '

' doesn't contains `:' characterde.c:365: unexpected data: header string '

8148: [03/13/2008 12:31:47] scanning component 0 for a DSPAM signature

8148: [03/13/2008 12:31:47]   : Encoding     : 1

8148: [03/13/2008 12:31:47]   : Media Type   : 0

8148: [03/13/2008 12:31:47]   : Media Subtype: 0

8148: [03/13/2008 12:31:47]   : Headers:

8148: [03/13/2008 12:31:47]     Return-Path                       <>

...

...
```

Of course the message body doesn't contain ":", it's the body, not a header field, and that shouldn't interfere with dspam finding the signature value in the header.  I added "showFactors=on" just to make sure there wasn't some stray character at the end of the dspam signature - no help there either.

BTW, this is a message from the cyrus-imap user directory if that helps at all.

----------

## overkll

Perhaps the message format is borking the dspam message parsing function/routine?  I know that with shell scripting, "for" and "while" loops barf on values that have a space in them or anything else that $IFS (the Internal Field Separator) misinterprets.  I know that dspam isn't using a shell script, but maybe something similar is happening.  Look at this output from the debug log:

```
...

...

8768: [03/13/2008 13:06:24]     X-DSPAM-Confidence                0.9899

8768: [03/13/2008 13:06:24]     X-DSPAM-Probability               0.0000

8768: [03/13/2008 13:06:24]     X-DSPAM-Signature                 4,47d95a6563251445574349

                   3:06:24]     Testing again.

                                 

  68: [03/13/2008 13:06:24]     Let's see if it barfs on the body.

8768: [03/13/2008 13:06:24] mysql_fetch_row() failed in _ds_get_signature

8768: [03/13/2008 13:06:24] DSPAM Instance Shutdown.  Exit Code: 0
```

Notice that the process id and timestamp are incomplete.  Looks as if dspam tries to parse the header, but the message format isn't cooperating.  Maybe there's a line-feed or some other character screwing up the parsed value of the signature????

----------

## overkll

My previous posts showed the dspam.debug log output via "tail -f /var/log/dspam/dspam.debug".  Using "less" to view the file reveals something interesting.  The same two sections fail, but less shows a "^M" character:

```
...

...

8982: [03/13/2008 13:48:40] attribute LocalMX = 127.0.0.1

8982: [03/13/2008 13:48:40] attribute ProcessorURLContext = on

8982: [03/13/2008 13:48:40] attribute ProcessorBias = on

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string '^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string 'Here we go again.^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string 'Multiple lines of text^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string 'Interspersed with new paragraphs.^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string '^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string 'Will it puke?^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string '^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string 'Probably.^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string 'The Bitch!!!!^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] decode.c:365: unexpected data: header string '^M' doesn't contains `:' character

8982: [03/13/2008 13:48:40] scanning component 0 for a DSPAM signature

8982: [03/13/2008 13:48:40]   : Encoding     : 0

8982: [03/13/2008 13:48:40]   : Media Type   : 0

8982: [03/13/2008 13:48:40]   : Media Subtype: 0

8982: [03/13/2008 13:48:40]   : Headers:

...

...

8982: [03/13/2008 13:48:40]     Subject                           Testing dspam again, and again, and again....

8982: [03/13/2008 13:48:40]     Content-Type                      text/plain; charset=ISO-8859-1; format=flowed

8982: [03/13/2008 13:48:40]     Content-Transfer-Encoding         7bit

8982: [03/13/2008 13:48:40]     X-DSPAM-Result                    Innocent

8982: [03/13/2008 13:48:40]     X-DSPAM-Processed                 Thu Mar 13 13:46:33 2008

8982: [03/13/2008 13:48:40]     X-DSPAM-Confidence                0.9899

8982: [03/13/2008 13:48:40]     X-DSPAM-Probability               0.0000

8982: [03/13/2008 13:48:40]     X-DSPAM-Signature                 4,47d9768989682029830544

8982: [03/13/2008 13:48:40]     ^M                                 

8982: [03/13/2008 13:48:40]     Here we go again.^M                

8982: [03/13/2008 13:48:40]     Multiple lines of text^M           

8982: [03/13/2008 13:48:40]     Interspersed with new paragraphs.^M  

8982: [03/13/2008 13:48:40]     ^M                                 

8982: [03/13/2008 13:48:40]     Will it puke?^M                    

8982: [03/13/2008 13:48:40]     ^M                                 

8982: [03/13/2008 13:48:40]     Probably.^M                        

8982: [03/13/2008 13:48:40]     The Bitch!!!!^M                    

8982: [03/13/2008 13:48:40]     ^M                                 

8982: [03/13/2008 13:48:40] mysql_fetch_row() failed in _ds_get_signature

8982: [03/13/2008 13:48:40] DSPAM Instance Shutdown.  Exit Code: 0
```

What the heck is that?  And better yet, how do I get rid of it?

----------

## magic919

^M = Carriage returns I think.

----------

## overkll

I'm not sure.  Could be newline or return, newline?  In any case, if dspam interprets the signature with the "^M" on the end, of course it won't find the signature.  Do you use cyrus-imapd?
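One way to check this hypothesis is to count carriage-return bytes directly instead of relying on how a pager renders them. A minimal sketch, assuming a POSIX shell; `count_cr` is a hypothetical helper, and the sample line is built from the signature shown earlier rather than read from a real message:

```shell
# Hypothetical helper: print the number of CR (0x0D) bytes on stdin.
count_cr() {
    # tr -cd keeps only CR bytes, wc -c counts them, and the final tr
    # strips the padding some wc implementations add.
    tr -cd '\r' | wc -c | tr -d ' '
}

# A header line terminated with CRLF, as it would look if the stored
# message used DOS-style line endings.
printf 'X-DSPAM-Signature: 4,47d82a6725459544614015\r\n' | count_cr
```

A non-zero count for a stored message would suggest its lines end in CRLF rather than a bare LF.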

----------

## magic919

I use Dovecot for IMAP.

http://www.google.co.uk/search?num=100&hl=en&client=firefox-a&rls=org.mozilla:en-GB:official&hs=73X&q=%5Em+character&revid=1290326884&sa=X&oi=revisions_inline&resnum=0&ct=revision&cd=1

----------

## overkll

Looks like dspam is trying to parse the body as headers.  It also appears that dspam is inserting the "^M", as they don't appear in the raw message, only the debug output.

If you cat/pipe a message directly from one of your dovecot maildirs to dspam for retraining, does it work?

----------

## magic919

I guess it is scanning the bodies for URLs as you have it configured to do so.

I retrain using a script via a cron job.  It takes the messages from my Spam folder and retrains them.  It works.

small snippet...

```

dspam $spam_opts --user $opt_u --client < $spam

```
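The snippet does not show what `$spam_opts`, `$opt_u`, and `$spam` hold, so the following is only a sketch of the same idea and not magic919's script. It assumes one message per file in a directory and reuses the `--class=spam --source=error` flags from earlier in the thread; `DSPAM_BIN` and `retrain_spam_dir` are hypothetical names:

```shell
# Hypothetical setting: path to the dspam binary, overridable for testing.
DSPAM_BIN="${DSPAM_BIN:-/usr/bin/dspam}"

# Hypothetical helper: feed every regular file in a directory to dspam
# as a spam retraining request for one user.
retrain_spam_dir() {
    user=$1   # dspam user to retrain
    dir=$2    # directory with one message per file
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue
        # Drop carriage returns before dspam sees the message.
        tr -d '\r' < "$msg" \
            | "$DSPAM_BIN" --user "$user" --class=spam --source=error \
            || echo "retraining failed for $msg" >&2
    done
}
```

A cron job could then call something like `retrain_spam_dir joeblow /path/to/spam/folder`, where the folder path is a placeholder for the local mail store layout.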

----------

## overkll

Tried with ProcessorURLContext off as well.  No difference.  This is messed up!

I tried writing the message in Mozilla-Thunderbird and with Squirrelmail.  No change with Thunderbird, but the Squirrelmail message got one step closer, with two additional lines before dspam barfed:

```
9501: [03/13/2008 14:59:54]     X-DSPAM-Result                    Innocent

9501: [03/13/2008 14:59:54]     X-DSPAM-Processed                 Thu Mar 13 14:53:50 2008

9501: [03/13/2008 14:59:54]     X-DSPAM-Confidence                0.9899

9501: [03/13/2008 14:59:54]     X-DSPAM-Probability               0.0000

9501: [03/13/2008 14:59:54]     X-DSPAM-Signature                 8,47d9864e93632105115549

9501: [03/13/2008 14:59:54]     ^M                                 

9501: [03/13/2008 14:59:54]     Here we go again.^M                

9501: [03/13/2008 14:59:54]     ^M                                 

9501: [03/13/2008 14:59:54]     Dammit!^M                          

9501: [03/13/2008 14:59:54]     ^M                                 

9501: [03/13/2008 14:59:54]     ^M                                 

9501: [03/13/2008 14:59:54] decoding message block from encoding type 2

9501: [03/13/2008 14:59:54] message is signed.  retaining original text for reassembly

9501: [03/13/2008 14:59:54] mysql_fetch_row() failed in _ds_get_signature

9501: [03/13/2008 14:59:54] DSPAM Instance Shutdown.  Exit Code: 0
```

So, the original message is scanned and classified successfully.

Retraining via alias works fine.

Retraining via command line with "--signature=" option works fine.

Training corpus via command line works fine.

Command line retraining with message source fails.

Maybe I found a bug?

----------

## magic919

Maybe.  I'm sure SteveB will surface at some point and take a look :)

----------

## overkll

Found a workaround:

Per the cat man page:

```
       -v, --show-nonprinting

              use ^ and M- notation, except for LFD and TAB
```

1) cat -v reveals a `^M' at the end of EACH AND EVERY line!

```
cat -v /var/spool/imap/j/user/joeblow/72.
```

2) Piped the output to sed and replaced the `^M' s with nothing:

```
# sed 's/\^M//g'
```

3) Piped final output to dspam

"/usr/bin/dspam --user joeblow --source=error --class=spam"

All together now:

```
cat -v /var/spool/imap/j/user/joeblow/72. | sed 's/\^M//g' | /usr/bin/dspam --user joeblow --source=error --class=spam
```

RESULT:

```
...

...

9586: [03/13/2008 15:19:42] reclassifying iteration 1 result: 0

9586: [03/13/2008 15:19:42] libdspam returned probability of 1.000000

9586: [03/13/2008 15:19:42] message result: SPAM

9586: [03/13/2008 15:19:42] appending header X-DSPAM-Reclassified: Spam

9586: [03/13/2008 15:19:42] assembling component 0

9586: [03/13/2008 15:19:42] DSPAM Instance Shutdown.  Exit Code: 0
```

T A D A ! ! ! !

What a freakin hassle!!!!

Thanks for your help!
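Note that `cat -v` rewrites every non-printing byte, not only carriage returns, so the message dspam receives may differ from the stored one in other ways too. A sketch of a narrower alternative that deletes only CR bytes with `tr`; `strip_cr` is a hypothetical helper name and the sample input is made up for illustration:

```shell
# Hypothetical helper: delete every CR (0x0D) byte from stdin.
strip_cr() {
    tr -d '\r'
}

# Demonstration on a small CRLF-terminated sample; od -c would show any
# remaining \r bytes.
printf 'Subject: test\r\n\r\nbody\r\n' | strip_cr | od -c

# Retraining would then use the same dspam flags as above, for example:
#   strip_cr < /var/spool/imap/j/user/joeblow/72. \
#     | /usr/bin/dspam --user joeblow --source=error --class=spam
```

This removes carriage returns anywhere in the input, including any in the middle of a line, which is usually acceptable for mail text but is worth keeping in mind.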

----------

## magic919

I'm glad you got it sorted.  Not sure I did much more than 'listen'.  Enjoy DSPAM!

:)

----------

## overkll

It helps when someone listens :)

Now I'm still curious as to who the culprit is, that is, which app is tagging the lines with "^M".

Postfix delivers to dspam.  Tried both dspam as a postfix transport and dspam in daemon mode.  Still the ^M.

Dspam delivers to Cyrus' lmtp socket.  That's the socket listed in /etc/cyrus.conf, not the postfix supplied cyrus transport which uses the cyrus-imap "deliver" command.  The default cyrus lmtp socket listens on /var/imap/socket/lmtp.  I had to loosen up the perms in order for dspam to deliver successfully to the socket (chmod o+rx on /var/imap and /var/imap/socket).

I'm suspecting cyrus is the culprit, but if it is, I think steveb also uses cyrus and would have run into the same issue.

----------

## overkll

Looks like dspam is the culprit.

I run the command I posted above, then add "--deliver=innocent,spam --stdout | cat -v" and the ^M's are back!

```
MIME-Version: 1.0^M

Content-Type: text/plain;charset=iso-8859-1^M

X-Priority: 3 (Normal)^M

Importance: Normal^M

Content-Transfer-Encoding: quoted-printable^M

X-DSPAM-Result: Innocent^M

X-DSPAM-Processed: Thu Mar 13 14:53:50 2008^M

X-DSPAM-Confidence: 0.9899^M

X-DSPAM-Probability: 0.0000^M

X-DSPAM-Signature: 8,47d9864e93632105115549^M

X-DSPAM-Reclassified: Spam^M

^M

Here we go again.

Dammit!

```

Funny, but the ^M's are gone on the message body now.

So the whole thing is:

```
cat -v /var/spool/imap/j/user/joeblow/72. \
| sed 's/\^M//g' | dspam --user joeblow \
--class=innocent --source=error --deliver=innocent,spam --stdout \
| cat -v
```

I alternate the --class= to keep testing.

I'm curious if you run this on one of your dovecot messages, if you'll get the ^M's as well.
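To compare results between setups without reading `cat -v` output by eye, the CR-terminated lines can be counted separately for headers and body. A rough sketch using awk; `cr_report` is a hypothetical helper and the sample message is invented to mimic the output above:

```shell
# Hypothetical helper: report how many header lines and how many body
# lines end in CR. The blank separator line is counted with the headers.
cr_report() {
    awk '
        {
            has_cr = ($0 ~ /\r$/) ? 1 : 0
            if (in_body) body += has_cr; else hdr += has_cr
            line = $0
            sub(/\r$/, "", line)
            if (line == "") in_body = 1
        }
        END { printf "header_cr=%d body_cr=%d\n", hdr, body }
    '
}

# Made-up sample: CRLF in the headers, bare LF in the body.
printf 'X-DSPAM-Result: Innocent\r\nX-DSPAM-Reclassified: Spam\r\n\r\nHere we go again.\n' | cr_report
```

For the sample this prints `header_cr=3 body_cr=0`. Piping the `--stdout` output of dspam through `cr_report` in place of the final `cat -v` would give the same kind of summary for a real message.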

----------

## steveb

Can you try to activate "Broken lineStripping"?

```
#
# Broken MTA Options: Some MTAs don't support the proper functionality
# necessary. In these cases you can activate certain features in DSPAM to
# compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if
# the message is spam, 0 if not, or a negative code if an error has occured.
# Specifying 'case' causes DSPAM to force the input usernames to lowercase.
# Spceifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
# in.
#
Broken lineStripping
```

// SteveB

----------

## overkll

Activating "Broken lineStripping" does the trick! :)

I was looking in the dspam readme, man dspam, man imapd.conf, and man pipe (postfix) for solutions.  Guess I was looking in the wrong places.  Thank you for pointing that out.
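For anyone scripting a check of this setting, here is a small sketch that tests whether a config file contains an uncommented `Broken lineStripping` line. `has_line_stripping` is a hypothetical helper, and the demo writes a throwaway file because the location of dspam.conf differs between installations:

```shell
# Hypothetical helper: succeed if the given config file contains an
# uncommented "Broken lineStripping" line.
has_line_stripping() {
    grep -Eq '^[[:space:]]*Broken[[:space:]]+lineStripping[[:space:]]*$' "$1"
}

# Demonstration against a temporary file, not the real dspam.conf.
conf=$(mktemp)
printf '# Broken returnCodes\nBroken lineStripping\n' > "$conf"
if has_line_stripping "$conf"; then
    echo "lineStripping is enabled"
else
    echo "lineStripping is not enabled"
fi
rm -f "$conf"
```

The pattern is case-sensitive and only matches the directive on a line of its own, which is how it appears in the excerpt above.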

Do you have this issue with cyrus/dspam combination as well?

Do you have any more tips for a Postfix/Amavis/Clamav/Dspam/Cyrus setup?

BTW, do you use the cyrus deliver command, or the cyrus lmtp socket for message delivery to cyrus?  I'm struggling with folder delivery, i.e. user+folder@domain.tld.

----------

## magic919

Here's a funny thing.  I've been catching up and trying a few of these on one of my systems and it segfaults.  This is only when run on the command line, as the filtering is working perfectly.

Seems someone broke Gentoo-flavour DSPAM from dspam-3.8.0-r8 onwards, for me anyway.  Looking at strace I see it chokes when it reads my group file.  A roll back to r7 fixes it.  I'll have to dig deeper.

----------

## overkll

I thought dspam WAS working perfectly.  The command line retraining pointed out it wasn't.  Even though retraining false positives by forwarding as an attachment to the alias "spam@mydomain.tld" worked, the forwarded message bounced, THEN the signature was found by dspam in the BOUNCED message and retraining was performed on the bounced message, not the original forwarded message.

I don't think this was an issue with 3.8.0-r8 or lower, but from r1 through r8, I heavily modified the ebuilds for my own systems as I felt many of the patches were unnecessary and feared the continuing "patch creep" would eventually introduce extra bugs.  This is the first time I've used the gentoo default ebuild.

When I tried the r9 version (I think it was r9), I also had segfaults.  Commenting out the "append-flags $(bindnow-flags)" from the src_compile section of the ebuild fixed that.  I'm not experiencing any segfaults with the r11 ebuild now, nor have I commented out "append-flags" in the current ebuild.  You may want to try without "append-flags" just to test.

----------

## magic919

Sounds like something to try.  For the moment I've built Gentoo DSPAM r11 but left out patch 26_all_group-member-matching.patch and that works on the command line.

----------

