# DSPAM + classification group

## petrjanda

I've got a user called global that has quite a good spam/ham corpus. I would like to test the corpus, but I don't want to apply the group globally to all users (2000+). A number of users would like to test the corpus, but I don't know how to configure DSPAM to use the global user for only selected users, i.e. user1, user2, user3. Is there a way?

----------

## steveb

Hello petrjanda

You could create a merged group in your DSPAM home directory containing this:

```
global:merged:user1,user2,user3
```

On a normal DSPAM installation this should be /var/spool/dspam/data/group. I don't know whether it is really there when you install with the Gentoo ebuild, since the ebuild is a mess (this is MY viewpoint).

To see where you need to create this group file, you could run the following command:

```
dspam --version|sed -n "s:^.*\-\-with\-dspam\-home=\([^ ]*\).*:\1/data/group:gIp"
```

The merged group allows the "global" data to be merged in real time for user1, user2 and user3. If you also want user1, user2 and user3 to share a single quarantine and other data, then consider using managed groups.
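As a minimal sketch of the group file described above: the snippet below writes the merged-group line to a file and reads the member list back. The temporary directory is only a stand-in so nothing on a live system is touched; the real location is whatever the command above prints, and the helper name `merged_members` is made up for this example.

```shell
#!/bin/sh
# Sketch only: GROUP_DIR is a throwaway stand-in for the real DSPAM data
# directory (use the path reported for your installation instead).
GROUP_DIR="$(mktemp -d)"
GROUP_FILE="${GROUP_DIR}/group"

# One line per group, as in the example above:
#   <group name>:<group type>:<comma-separated members>
printf '%s\n' 'global:merged:user1,user2,user3' > "${GROUP_FILE}"

# Hypothetical helper: print the members of every merged group, one per line,
# to double-check that only the intended test users are listed.
merged_members() {
    awk -F: '$2 == "merged" {
        n = split($3, m, ",")
        for (i = 1; i <= n; i++) print m[i]
    }' "$1"
}

merged_members "${GROUP_FILE}"
```

Listing the members this way is a quick check that the other 2000+ users are not affected before the file goes into place.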

cheers

SteveB

BTW: Brno? Nice! The parents of my ex girlfriend are from Brno.

----------

## petrjanda

 *steveb wrote:*   

> Hello petrjanda
>
> You could create a merged group in your DSPAM home directory containing this: ...

 

Thanks, I'll give that a whirl. What if the testing users receive misclassified innocent/spam mail? My setup uses username@spam.xxxx and username@ham.xxx to train on error. If user1 replies to user1@ham.xxxx to report a misclassified innocent message, will that also train the global user's corpus? I'm using DSPAM 3.4.9 as I found that 3.6.x has rather terrible accuracy. If not, what is the best way to update the global user's tokens in the future to reflect changes/innovations in spam?

Another question concerns the global classification group. I find that if mail is incorrectly classified by the global user, and user1 tries to train his own tokens by replying to user1@ham.xxx or user1@spam.xxx respectively, the global classification group is used even when user1 is actually trying to train that the email was incorrectly classified. Looking at the DSPAM stats, there's a +1 on innocent misclassified, but the next time the same email is received DSPAM still misclassifies it based on the global group, even though user1 apparently trained his tokens for this kind of message. Is this correct behaviour?

What nationality are you?  :Smile: 

----------

## steveb

 *petrjanda wrote:*   

> Thanks, I'll give that a whirl. What if the testing users receive misclassified innocent/spam mail? My setup uses username@spam.xxxx and username@ham.xxx to train on error. If user1 replies to user1@ham.xxxx to report a misclassified innocent message, will that also train the global user's corpus?

No! It will not train the data for your global user at all. It will only train the data for user1.
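To illustrate why only user1 is affected, here is a rough sketch of how a retraining alias could be turned into a DSPAM call. The domains `ham.example` and `spam.example` and the helper `retrain_command` are placeholders invented for this example, not petrjanda's actual setup, and the command is only printed, never executed.

```shell
#!/bin/sh
# Hypothetical helper: map a retraining alias such as user1@ham.example to a
# DSPAM user and class, then print (not run) the matching retrain command.
retrain_command() {
    addr="$1"
    user="${addr%%@*}"      # part before the @
    domain="${addr#*@}"     # part after the @
    case "${domain}" in
        ham.*)  class="innocent" ;;
        spam.*) class="spam" ;;
        *) echo "unknown retraining domain: ${domain}" >&2; return 1 ;;
    esac
    # --user names whose tokens get corrected; the global user never appears.
    echo "dspam --user ${user} --class=${class} --source=error"
}

retrain_command "user1@ham.example"
retrain_command "user1@spam.example"
```

Because the correction is addressed to `--user user1`, the global user's tokens stay as they were.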

 *petrjanda wrote:*   

> I'm using DSPAM 3.4.9 as I found that 3.6.x has rather terrible accuracy.

Funny! I use DSPAM 3.6.5. Anyway... If you find DSPAM 3.6.x to have terrible accuracy, then you don't understand what DSPAM is. DSPAM is not a normal anti-spam filter. DSPAM is a lot more than that. It includes several algorithms for classifying ham/spam mail, whitelisting, a web interface, etc. All that is DSPAM. So telling me that DSPAM 3.4.9 has better accuracy than DSPAM 3.6.5 is strange to me. It would be more useful to tell me that using algorithm "burton" with pvalue "graham" gave better results in DSPAM 3.4.9 than in DSPAM 3.6.5.

Anyway... you are in luck. Today I am about to publish new data on the DSPAM mailing list about my training progress. My old stats are here.

My current training, with the same data as at the above-mentioned URL, has the following accuracy:

```
nautilus / # dspam_stats -H globaluser
globaluser:
                TP True Positives:         1769983
                TN True Negatives:         2951556
                FP False Positives:          8185
                FN False Negatives:           403
                SC Spam Corpusfed:            447
                NC Nonspam Corpusfed:        3605
                TL Training Left:               0
                SHR Spam Hit Rate          99.98%
                HSR Ham Strike Rate:        0.28%
                OCA Overall Accuracy:      99.82%
nautilus / #
```
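For readers who want to check the three percentages, they can be re-derived from the four counts. The formulas below are an assumption on my part (they are not quoted from the DSPAM sources), but they do reproduce the figures printed above; the function name `rates` is made up for this sketch.

```shell
#!/bin/sh
# Counts copied from the dspam_stats output above.
TP=1769983; TN=2951556; FP=8185; FN=403

# Assumed definitions (they reproduce the percentages shown above):
#   SHR = TP / (TP + FN)                   spam hit rate
#   HSR = FP / (TN + FP)                   ham strike rate
#   OCA = (TP + TN) / (TP + TN + FP + FN)  overall accuracy
rates() {
    awk -v tp="$1" -v tn="$2" -v fp="$3" -v fn="$4" 'BEGIN {
        printf "SHR %.2f%%\n", 100 * tp / (tp + fn)
        printf "HSR %.2f%%\n", 100 * fp / (tn + fp)
        printf "OCA %.2f%%\n", 100 * (tp + tn) / (tp + tn + fp + fn)
    }'
}

rates "$TP" "$TN" "$FP" "$FN"
```

With these counts the script prints SHR 99.98%, HSR 0.28% and OCA 99.82%.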

Don't get confused about the high TP and TN. I don't have that much data! I only have:

```
nautilus / # for foo in /var/spam/vunet.training.maildir.00?/set??/nonspam/*;do echo 1;done|wc -l
285176
nautilus / # for foo in /var/spam/vunet.training.maildir.00?/set??/spam/*;do echo 1;done|wc -l
185529
nautilus / #
```

This is 285'176 ham and 185'529 spam messages.

But this time I trained differently than before (it's too much to explain what I did, but the term for that training method is "Training to Exhaustion") and I am still not finished. However, running all the spam data from spamarchive.org through my current DSPAM installation produces this:

```
nautilus submit # _total_spam_count=0 ;_total_false_negative_count=0 ; for foo in *.r2.gz ; do _total_spam="$(mboxgrep --basic-regexp . --mailbox-format=zmbox --count ${foo})" ; let _total_spam_count=$((_total_spam_count+_total_spam)) ; _total_false_negative=$(mboxgrep --basic-regexp . --mailbox-format=zmbox --pipe="dspam --user globaluser --classify --deliver=summary 2>/dev/null" ${foo} | grep -v "result=\"Spam" | wc -l) ; let _total_false_negative_count=$((_total_false_negative_count+_total_false_negative)) ; echo "${foo} : Total SPAM: ${_total_spam} - Total FN: ${_total_false_negative}" ;  done ; echo ; echo "Total SPAM: ${_total_spam_count} - Total FN: ${_total_false_negative_count}" ; echo "Accurancy: "$(echo "scale=20 ; 100 - ( $_total_false_negative_count * 100 / $_total_spam_count )" | bc)"%"

800.r2.gz : Total SPAM: 1815 - Total FN: 0

801.r2.gz : Total SPAM: 228 - Total FN: 0

802.r2.gz : Total SPAM: 144 - Total FN: 0

803.r2.gz : Total SPAM: 2391 - Total FN: 0

804.r2.gz : Total SPAM: 74 - Total FN: 0

805.r2.gz : Total SPAM: 1537 - Total FN: 0

806.r2.gz : Total SPAM: 553 - Total FN: 1

807.r2.gz : Total SPAM: 137 - Total FN: 0

808.r2.gz : Total SPAM: 234 - Total FN: 0

809.r2.gz : Total SPAM: 2098 - Total FN: 0

810.r2.gz : Total SPAM: 218 - Total FN: 0

811.r2.gz : Total SPAM: 124 - Total FN: 0

812.r2.gz : Total SPAM: 1701 - Total FN: 0

813.r2.gz : Total SPAM: 94 - Total FN: 0

814.r2.gz : Total SPAM: 1715 - Total FN: 0

815.r2.gz : Total SPAM: 101 - Total FN: 0

816.r2.gz : Total SPAM: 378 - Total FN: 0

817.r2.gz : Total SPAM: 150 - Total FN: 0

818.r2.gz : Total SPAM: 122 - Total FN: 0

819.r2.gz : Total SPAM: 183 - Total FN: 0

820.r2.gz : Total SPAM: 145 - Total FN: 0

821.r2.gz : Total SPAM: 3157 - Total FN: 0

822.r2.gz : Total SPAM: 94 - Total FN: 0

823.r2.gz : Total SPAM: 77 - Total FN: 0

824.r2.gz : Total SPAM: 738 - Total FN: 0

825.r2.gz : Total SPAM: 182 - Total FN: 0

826.r2.gz : Total SPAM: 249 - Total FN: 0

827.r2.gz : Total SPAM: 98 - Total FN: 0

828.r2.gz : Total SPAM: 169 - Total FN: 0

829.r2.gz : Total SPAM: 169 - Total FN: 0

830.r2.gz : Total SPAM: 115 - Total FN: 0

831.r2.gz : Total SPAM: 127 - Total FN: 0

832.r2.gz : Total SPAM: 188 - Total FN: 0

833.r2.gz : Total SPAM: 928 - Total FN: 0

834.r2.gz : Total SPAM: 134 - Total FN: 0

835.r2.gz : Total SPAM: 137 - Total FN: 0

836.r2.gz : Total SPAM: 113 - Total FN: 0

837.r2.gz : Total SPAM: 99 - Total FN: 0

838.r2.gz : Total SPAM: 114 - Total FN: 0

839.r2.gz : Total SPAM: 511 - Total FN: 0

840.r2.gz : Total SPAM: 302 - Total FN: 2

841.r2.gz : Total SPAM: 13623 - Total FN: 0

842.r2.gz : Total SPAM: 9209 - Total FN: 0

843.r2.gz : Total SPAM: 305 - Total FN: 0

844.r2.gz : Total SPAM: 155 - Total FN: 0

845.r2.gz : Total SPAM: 20924 - Total FN: 0

846.r2.gz : Total SPAM: 2316 - Total FN: 0

847.r2.gz : Total SPAM: 213 - Total FN: 0

848.r2.gz : Total SPAM: 1250 - Total FN: 0

849.r2.gz : Total SPAM: 146 - Total FN: 0

850.r2.gz : Total SPAM: 103 - Total FN: 0

851.r2.gz : Total SPAM: 87 - Total FN: 0

852.r2.gz : Total SPAM: 7199 - Total FN: 0

853.r2.gz : Total SPAM: 1593 - Total FN: 0

854.r2.gz : Total SPAM: 116 - Total FN: 0

855.r2.gz : Total SPAM: 2074 - Total FN: 0

856.r2.gz : Total SPAM: 1884 - Total FN: 0

857.r2.gz : Total SPAM: 1734 - Total FN: 0

858.r2.gz : Total SPAM: 1372 - Total FN: 0

859.r2.gz : Total SPAM: 1728 - Total FN: 0

860.r2.gz : Total SPAM: 854 - Total FN: 0

861.r2.gz : Total SPAM: 2023 - Total FN: 0

862.r2.gz : Total SPAM: 1130 - Total FN: 0

863.r2.gz : Total SPAM: 125 - Total FN: 0

864.r2.gz : Total SPAM: 1943 - Total FN: 0

865.r2.gz : Total SPAM: 118 - Total FN: 0

866.r2.gz : Total SPAM: 3481 - Total FN: 0

867.r2.gz : Total SPAM: 483 - Total FN: 0

868.r2.gz : Total SPAM: 94 - Total FN: 0

869.r2.gz : Total SPAM: 709 - Total FN: 0

870.r2.gz : Total SPAM: 48 - Total FN: 0

871.r2.gz : Total SPAM: 32 - Total FN: 0

872.r2.gz : Total SPAM: 52 - Total FN: 0

873.r2.gz : Total SPAM: 332 - Total FN: 0

874.r2.gz : Total SPAM: 215 - Total FN: 0

875.r2.gz : Total SPAM: 176 - Total FN: 0

876.r2.gz : Total SPAM: 460 - Total FN: 0

877.r2.gz : Total SPAM: 196 - Total FN: 0

878.r2.gz : Total SPAM: 476 - Total FN: 0

879.r2.gz : Total SPAM: 102 - Total FN: 0

880.r2.gz : Total SPAM: 702 - Total FN: 0

881.r2.gz : Total SPAM: 3650 - Total FN: 0

882.r2.gz : Total SPAM: 6933 - Total FN: 0

883.r2.gz : Total SPAM: 118 - Total FN: 0

884.r2.gz : Total SPAM: 76 - Total FN: 0

886.r2.gz : Total SPAM: 3408 - Total FN: 0

887.r2.gz : Total SPAM: 75 - Total FN: 0

888.r2.gz : Total SPAM: 966 - Total FN: 0

889.r2.gz : Total SPAM: 115 - Total FN: 0

890.r2.gz : Total SPAM: 102 - Total FN: 0

891.r2.gz : Total SPAM: 180 - Total FN: 0

892.r2.gz : Total SPAM: 85 - Total FN: 0

893.r2.gz : Total SPAM: 202 - Total FN: 0

894.r2.gz : Total SPAM: 9 - Total FN: 0

895.r2.gz : Total SPAM: 161 - Total FN: 0

896.r2.gz : Total SPAM: 61 - Total FN: 0

897.r2.gz : Total SPAM: 302 - Total FN: 0

898.r2.gz : Total SPAM: 196 - Total FN: 0

899.r2.gz : Total SPAM: 213 - Total FN: 0

900.r2.gz : Total SPAM: 358 - Total FN: 0

901.r2.gz : Total SPAM: 97 - Total FN: 0

902.r2.gz : Total SPAM: 70 - Total FN: 0

903.r2.gz : Total SPAM: 89 - Total FN: 0

904.r2.gz : Total SPAM: 403 - Total FN: 0

905.r2.gz : Total SPAM: 195 - Total FN: 0

906.r2.gz : Total SPAM: 261 - Total FN: 0

907.r2.gz : Total SPAM: 55 - Total FN: 0

908.r2.gz : Total SPAM: 78 - Total FN: 0

909.r2.gz : Total SPAM: 244 - Total FN: 0

910.r2.gz : Total SPAM: 170 - Total FN: 0

911.r2.gz : Total SPAM: 98 - Total FN: 0

912.r2.gz : Total SPAM: 233 - Total FN: 0

913.r2.gz : Total SPAM: 138 - Total FN: 0

914.r2.gz : Total SPAM: 63 - Total FN: 0

915.r2.gz : Total SPAM: 245 - Total FN: 0

916.r2.gz : Total SPAM: 30 - Total FN: 0

917.r2.gz : Total SPAM: 188 - Total FN: 0

918.r2.gz : Total SPAM: 52 - Total FN: 0

919.r2.gz : Total SPAM: 163 - Total FN: 0

920.r2.gz : Total SPAM: 359 - Total FN: 0

921.r2.gz : Total SPAM: 25 - Total FN: 0

922.r2.gz : Total SPAM: 4 - Total FN: 0

923.r2.gz : Total SPAM: 164 - Total FN: 0

924.r2.gz : Total SPAM: 37 - Total FN: 0

925.r2.gz : Total SPAM: 14 - Total FN: 0

926.r2.gz : Total SPAM: 166 - Total FN: 0

927.r2.gz : Total SPAM: 121 - Total FN: 0

928.r2.gz : Total SPAM: 330 - Total FN: 0

929.r2.gz : Total SPAM: 323 - Total FN: 0

930.r2.gz : Total SPAM: 308 - Total FN: 0

931.r2.gz : Total SPAM: 285 - Total FN: 0

932.r2.gz : Total SPAM: 41 - Total FN: 0

933.r2.gz : Total SPAM: 43 - Total FN: 0

934.r2.gz : Total SPAM: 252 - Total FN: 0

935.r2.gz : Total SPAM: 116 - Total FN: 0

936.r2.gz : Total SPAM: 42 - Total FN: 0

937.r2.gz : Total SPAM: 269 - Total FN: 0

938.r2.gz : Total SPAM: 292 - Total FN: 0

939.r2.gz : Total SPAM: 235 - Total FN: 0

940.r2.gz : Total SPAM: 141 - Total FN: 0

941.r2.gz : Total SPAM: 257 - Total FN: 0

942.r2.gz : Total SPAM: 262 - Total FN: 0

943.r2.gz : Total SPAM: 106 - Total FN: 0

944.r2.gz : Total SPAM: 152 - Total FN: 0

945.r2.gz : Total SPAM: 204 - Total FN: 0

946.r2.gz : Total SPAM: 109 - Total FN: 0

947.r2.gz : Total SPAM: 79 - Total FN: 0

948.r2.gz : Total SPAM: 1064 - Total FN: 0

Total SPAM: 127507 - Total FN: 3

Accurancy: 99.99764718799752170470%

nautilus submit # cd ../submitautomated/

nautilus submitautomated # _total_spam_count=0 ;_total_false_negative_count=0 ; for foo in *.r2.gz ; do _total_spam="$(mboxgrep --basic-regexp . --mailbox-format=zmbox --count ${foo})" ; let _total_spam_count=$((_total_spam_count+_total_spam)) ; _total_false_negative=$(mboxgrep --basic-regexp . --mailbox-format=zmbox --pipe="dspam --user globaluser --classify --deliver=summary 2>/dev/null" ${foo} | grep -v "result=\"Spam" | wc -l) ; let _total_false_negative_count=$((_total_false_negative_count+_total_false_negative)) ; echo "${foo} : Total SPAM: ${_total_spam} - Total FN: ${_total_false_negative}" ;  done ; echo ; echo "Total SPAM: ${_total_spam_count} - Total FN: ${_total_false_negative_count}" ; echo "Accurancy: "$(echo "scale=20 ; 100 - ( $_total_false_negative_count * 100 / $_total_spam_count )" | bc)"%"

800.r2.gz : Total SPAM: 118 - Total FN: 0

801.r2.gz : Total SPAM: 145 - Total FN: 0

802.r2.gz : Total SPAM: 135 - Total FN: 0

803.r2.gz : Total SPAM: 156 - Total FN: 0

804.r2.gz : Total SPAM: 186 - Total FN: 0

805.r2.gz : Total SPAM: 133 - Total FN: 0

806.r2.gz : Total SPAM: 89 - Total FN: 0

807.r2.gz : Total SPAM: 88 - Total FN: 0

808.r2.gz : Total SPAM: 215 - Total FN: 0

809.r2.gz : Total SPAM: 176 - Total FN: 0

810.r2.gz : Total SPAM: 175 - Total FN: 0

811.r2.gz : Total SPAM: 122 - Total FN: 0

812.r2.gz : Total SPAM: 104 - Total FN: 0

813.r2.gz : Total SPAM: 128 - Total FN: 0

814.r2.gz : Total SPAM: 139 - Total FN: 0

815.r2.gz : Total SPAM: 80 - Total FN: 0

816.r2.gz : Total SPAM: 60 - Total FN: 0

817.r2.gz : Total SPAM: 80 - Total FN: 0

818.r2.gz : Total SPAM: 92 - Total FN: 0

819.r2.gz : Total SPAM: 112 - Total FN: 0

820.r2.gz : Total SPAM: 175 - Total FN: 0

821.r2.gz : Total SPAM: 214 - Total FN: 0

822.r2.gz : Total SPAM: 197 - Total FN: 0

823.r2.gz : Total SPAM: 117 - Total FN: 0

824.r2.gz : Total SPAM: 89 - Total FN: 0

825.r2.gz : Total SPAM: 125 - Total FN: 0

826.r2.gz : Total SPAM: 57 - Total FN: 0

827.r2.gz : Total SPAM: 60 - Total FN: 0

828.r2.gz : Total SPAM: 128 - Total FN: 0

829.r2.gz : Total SPAM: 165 - Total FN: 0

830.r2.gz : Total SPAM: 132 - Total FN: 0

831.r2.gz : Total SPAM: 171 - Total FN: 0

832.r2.gz : Total SPAM: 132 - Total FN: 0

833.r2.gz : Total SPAM: 131 - Total FN: 0

834.r2.gz : Total SPAM: 113 - Total FN: 0

835.r2.gz : Total SPAM: 158 - Total FN: 0

836.r2.gz : Total SPAM: 92 - Total FN: 0

837.r2.gz : Total SPAM: 81 - Total FN: 0

838.r2.gz : Total SPAM: 79 - Total FN: 0

839.r2.gz : Total SPAM: 99 - Total FN: 0

840.r2.gz : Total SPAM: 64 - Total FN: 0

841.r2.gz : Total SPAM: 113 - Total FN: 0

842.r2.gz : Total SPAM: 92 - Total FN: 0

843.r2.gz : Total SPAM: 42 - Total FN: 0

844.r2.gz : Total SPAM: 55 - Total FN: 0

845.r2.gz : Total SPAM: 65 - Total FN: 0

846.r2.gz : Total SPAM: 49 - Total FN: 0

847.r2.gz : Total SPAM: 42 - Total FN: 0

849.r2.gz : Total SPAM: 34 - Total FN: 0

850.r2.gz : Total SPAM: 69 - Total FN: 0

851.r2.gz : Total SPAM: 33 - Total FN: 0

852.r2.gz : Total SPAM: 27 - Total FN: 0

853.r2.gz : Total SPAM: 20 - Total FN: 0

854.r2.gz : Total SPAM: 45 - Total FN: 0

855.r2.gz : Total SPAM: 64 - Total FN: 0

856.r2.gz : Total SPAM: 40 - Total FN: 0

857.r2.gz : Total SPAM: 50 - Total FN: 0

858.r2.gz : Total SPAM: 63 - Total FN: 0

859.r2.gz : Total SPAM: 57 - Total FN: 0

860.r2.gz : Total SPAM: 58 - Total FN: 0

861.r2.gz : Total SPAM: 56 - Total FN: 0

862.r2.gz : Total SPAM: 77 - Total FN: 0

863.r2.gz : Total SPAM: 62 - Total FN: 0

864.r2.gz : Total SPAM: 67 - Total FN: 0

865.r2.gz : Total SPAM: 42 - Total FN: 0

866.r2.gz : Total SPAM: 77 - Total FN: 0

867.r2.gz : Total SPAM: 53 - Total FN: 0

868.r2.gz : Total SPAM: 81 - Total FN: 0

869.r2.gz : Total SPAM: 69 - Total FN: 0

870.r2.gz : Total SPAM: 85 - Total FN: 0

871.r2.gz : Total SPAM: 76 - Total FN: 0

872.r2.gz : Total SPAM: 173 - Total FN: 0

873.r2.gz : Total SPAM: 84 - Total FN: 0

874.r2.gz : Total SPAM: 82 - Total FN: 0

875.r2.gz : Total SPAM: 84 - Total FN: 0

876.r2.gz : Total SPAM: 66 - Total FN: 0

877.r2.gz : Total SPAM: 76 - Total FN: 0

878.r2.gz : Total SPAM: 52 - Total FN: 0

879.r2.gz : Total SPAM: 107 - Total FN: 0

880.r2.gz : Total SPAM: 120 - Total FN: 0

881.r2.gz : Total SPAM: 70 - Total FN: 0

882.r2.gz : Total SPAM: 30 - Total FN: 0

883.r2.gz : Total SPAM: 44 - Total FN: 0

884.r2.gz : Total SPAM: 52 - Total FN: 0

885.r2.gz : Total SPAM: 55 - Total FN: 0

886.r2.gz : Total SPAM: 64 - Total FN: 0

887.r2.gz : Total SPAM: 448 - Total FN: 0

888.r2.gz : Total SPAM: 23 - Total FN: 0

889.r2.gz : Total SPAM: 42 - Total FN: 0

890.r2.gz : Total SPAM: 5 - Total FN: 0

891.r2.gz : Total SPAM: 52 - Total FN: 0

892.r2.gz : Total SPAM: 21 - Total FN: 0

893.r2.gz : Total SPAM: 5 - Total FN: 0

894.r2.gz : Total SPAM: 13 - Total FN: 0

895.r2.gz : Total SPAM: 3 - Total FN: 0

896.r2.gz : Total SPAM: 3 - Total FN: 0

897.r2.gz : Total SPAM: 5 - Total FN: 0

898.r2.gz : Total SPAM: 17 - Total FN: 0

899.r2.gz : Total SPAM: 75 - Total FN: 0

900.r2.gz : Total SPAM: 25 - Total FN: 0

901.r2.gz : Total SPAM: 22 - Total FN: 0

902.r2.gz : Total SPAM: 21 - Total FN: 0

903.r2.gz : Total SPAM: 50 - Total FN: 0

904.r2.gz : Total SPAM: 69 - Total FN: 0

905.r2.gz : Total SPAM: 26 - Total FN: 0

906.r2.gz : Total SPAM: 175 - Total FN: 0

907.r2.gz : Total SPAM: 75 - Total FN: 0

908.r2.gz : Total SPAM: 48 - Total FN: 0

909.r2.gz : Total SPAM: 85 - Total FN: 0

910.r2.gz : Total SPAM: 58 - Total FN: 0

911.r2.gz : Total SPAM: 36 - Total FN: 0

912.r2.gz : Total SPAM: 52 - Total FN: 0

913.r2.gz : Total SPAM: 43 - Total FN: 0

914.r2.gz : Total SPAM: 80 - Total FN: 0

915.r2.gz : Total SPAM: 306 - Total FN: 0

916.r2.gz : Total SPAM: 23 - Total FN: 0

917.r2.gz : Total SPAM: 33 - Total FN: 0

918.r2.gz : Total SPAM: 64 - Total FN: 0

919.r2.gz : Total SPAM: 71 - Total FN: 0

920.r2.gz : Total SPAM: 19 - Total FN: 0

921.r2.gz : Total SPAM: 25 - Total FN: 0

922.r2.gz : Total SPAM: 23 - Total FN: 0

923.r2.gz : Total SPAM: 3 - Total FN: 0

924.r2.gz : Total SPAM: 4 - Total FN: 0

925.r2.gz : Total SPAM: 46 - Total FN: 0

926.r2.gz : Total SPAM: 44 - Total FN: 0

927.r2.gz : Total SPAM: 3 - Total FN: 0

928.r2.gz : Total SPAM: 23 - Total FN: 0

929.r2.gz : Total SPAM: 42 - Total FN: 0

930.r2.gz : Total SPAM: 6 - Total FN: 0

931.r2.gz : Total SPAM: 7 - Total FN: 0

932.r2.gz : Total SPAM: 140 - Total FN: 0

933.r2.gz : Total SPAM: 22 - Total FN: 0

934.r2.gz : Total SPAM: 33 - Total FN: 0

935.r2.gz : Total SPAM: 29 - Total FN: 0

936.r2.gz : Total SPAM: 20 - Total FN: 0

937.r2.gz : Total SPAM: 7 - Total FN: 0

938.r2.gz : Total SPAM: 5 - Total FN: 0

939.r2.gz : Total SPAM: 16 - Total FN: 0

940.r2.gz : Total SPAM: 54 - Total FN: 0

941.r2.gz : Total SPAM: 37 - Total FN: 0

942.r2.gz : Total SPAM: 24 - Total FN: 0

943.r2.gz : Total SPAM: 53 - Total FN: 0

944.r2.gz : Total SPAM: 18 - Total FN: 0

945.r2.gz : Total SPAM: 7 - Total FN: 0

946.r2.gz : Total SPAM: 82 - Total FN: 0

947.r2.gz : Total SPAM: 27 - Total FN: 0

948.r2.gz : Total SPAM: 13 - Total FN: 0

949.r2.gz : Total SPAM: 62 - Total FN: 0

Total SPAM: 11002 - Total FN: 0

Accurancy: 100.00000000000000000000%

nautilus submitautomated #
```

As you can see, I get 99.99764718799752170470% accuracy on the submit set and 100% on the submitautomated set. And I don't train with the data from spamarchive.org. I have my own data. I only use spamarchive.org as a reference to see how good my filter is.
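The totals can be cross-checked from the per-file lines alone. The sketch below sums the "Total SPAM" and "Total FN" columns of lines in the format printed by the loop; the function name `summarise` is made up for this example, and it rounds to four decimals instead of using `bc` with `scale=20`. Since only spam archives are tested, the figure is really a spam catch rate, not an overall accuracy.

```shell
#!/bin/sh
# Hypothetical helper: read per-file summary lines such as
#   806.r2.gz : Total SPAM: 553 - Total FN: 1
# on stdin, sum both columns and print the resulting catch rate.
summarise() {
    awk '$3 == "Total" && $4 == "SPAM:" {
             spam += $5      # messages in this archive
             fn   += $9      # spam that was not flagged as spam
         }
         END {
             if (spam == 0) { print "no messages"; exit 1 }
             printf "Total SPAM: %d - Total FN: %d\n", spam, fn
             printf "Accuracy: %.4f%%\n", 100 - fn * 100 / spam
         }'
}

# Three lines taken from the submit run above.
summarise <<'EOF'
806.r2.gz : Total SPAM: 553 - Total FN: 1
840.r2.gz : Total SPAM: 302 - Total FN: 2
841.r2.gz : Total SPAM: 13623 - Total FN: 0
EOF
```

Feeding it the complete listing instead of the three sample lines should give the same totals as the loop's final lines.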

I am sure that my DSPAM setup is special. For example, I have 169 IgnoreHeader entries in my dspam.conf and other things you probably don't have. The preferences used for my globaluser are:

```
nautilus / # dspam_admin list preference globaluser
enableBNR=on
enableWhitelist=off
fallbackDomain=
ignoreGroups=off
localStore=globaluser
makeCorpus=off
optIn=on
optOut=off
optOutClamAV=on
processorBias=off
showFactors=off
signatureLocation=header
spamAction=deliver
spamSubject=
statisticalSedation=0
storeFragments=off
trainingMode=TEFT
trainPristine=off
whitelistThreshold=9999999
nautilus / #
```

 *petrjanda wrote:*   

> If not, what is the best way to update the global user's tokens in the future to reflect changes/innovations in spam?

You can merge the user tokens with those of your global user. Or you could use spam traps or blocking lists to inoculate data into DSPAM. There are many ways to keep the data for your global user up to date. If you need more info on that, then let me know.

 *petrjanda wrote:*   

> Another question concerns the global classification group.

Do you use a classification group or a merged group?

 *petrjanda wrote:*   

> I find that if mail is incorrectly classified by the global user, and user1 tries to train his own tokens by replying to user1@ham.xxx or user1@spam.xxx respectively, the global classification group is used even when user1 is actually trying to train that the email was incorrectly classified. Looking at the DSPAM stats, there's a +1 on innocent misclassified, but the next time the same email is received DSPAM still misclassifies it based on the global group, even though user1 apparently trained his tokens for this kind of message. Is this correct behaviour?

Yes, but only for the users other than user1, because user1 has trained his data to include the correct tokens. If you don't like that, you could use a managed group (on top of the merged group), but then everyone will have the same corpus and the same quarantine. I personally find that a very bad idea, since spam for user1 could be ham for user2.

 *petrjanda wrote:*   

> What nationality are you? 

 Swiss

cheers

SteveB

Last edited by steveb on Sun May 21, 2006 4:56 pm; edited 1 time in total

----------

## petrjanda

 *steveb wrote:*   

> *petrjanda wrote:* I'm using DSPAM 3.4.9 as I found that 3.6.x has rather terrible accuracy.
>
> Funny! I use DSPAM 3.6.5. Anyway... If you find DSPAM 3.6.x to have terrible accuracy, then you don't understand what DSPAM is. DSPAM is not a normal anti-spam filter. DSPAM is a lot more than that. It includes several algorithms for classifying ham/spam mail, whitelisting, a web interface, etc. All that is DSPAM. So telling me that DSPAM 3.4.9 has better accuracy than DSPAM 3.6.5 is strange to me. It would be more useful to tell me that using algorithm "burton" with pvalue "graham" gave better results in DSPAM 3.4.9 than in DSPAM 3.6.5.

 

Could you send me your dspam.conf at janda.petr@gmail.com so I can have a look at your configuration and use it as a reference for mine? I would like to use 3.6, but because it gave me such bad results (in a semi-default DSPAM configuration), I had to downgrade.

 *Quote:*   

> You can merge the user tokens with those of your global user. Or you could use spam traps or blocking lists to inoculate data into DSPAM. There are many ways to keep the data for your global user up to date. If you need more info on that, then let me know.

 

More info would be nice!  :Smile: 

 *Quote:*   

> Do you use a classification group or a merged group?

 

Up until now I was using a classification group. I'm going to use a merged group now.

 *Quote:*   

> Yes, but only for the users other than user1, because user1 has trained his data to include the correct tokens. If you don't like that, you could use a managed group (on top of the merged group), but then everyone will have the same corpus and the same quarantine. I personally find that a very bad idea, since spam for user1 could be ham for user2.

 

So do you have any idea why user1 still couldn't filter it? I remember looking at dspam.debug: it basically started off OK, as if it was going to train for user1, but then the global classification group was added, it checked global's tokens and incorrectly classified the message, and it never returned to train user1.

Thanks!

----------

## steveb

 *petrjanda wrote:*   

> Could you send me your dspam.conf at janda.petr@gmail.com so I can have a look at your configuration and use it as a reference for mine? I would like to use 3.6, but because it gave me such bad results (in a semi-default DSPAM configuration), I had to downgrade.

This is one of my dspam.conf files (slightly modified for you). Keep in mind that this is 3.6.5 installed with my own ebuild:

```
## $Id: dspam.conf.in,v 1.70 2006/02/15 18:19:40 jonz Exp $

## dspam.conf -- DSPAM configuration file

##

#

# DSPAM Home: Specifies the base directory to be used for DSPAM storage

#

Home /var/spool/dspam

#

# StorageDriver: Specifies the storage driver backend (library) to use.

# You'll only need to set this if you are using dynamic storage driver plugins.

# The default when one storage driver is specified is to statically link. Be 

# sure to include the path to the library if necessary, and some systems may 

# use an extension other than .so.

#

# Options include:

#

#   libmysql_drv.so     libpgsql_drv.so   libsqlite_drv.so

#   libsqlite3_drv.so   libora_drv.so     libdb4_drv.so

#   libdb3_drv.so       libhash_drv.so

#

# IMPORTANT: Switching storage drivers requires more than merely changing

# this option. If you do not wish to lose all of your data, you will need to

# migrate it to the new backend before making this change.

#

StorageDriver /usr/lib/libmysql_drv.so

#

# Trusted Delivery Agent: Specifies the local delivery agent DSPAM should call 

# when delivering mail as a trusted user. Use %u to specify the user DSPAM is 

# processing mail for. It is generally a good idea to allow the MTA to specify 

# the pass-through arguments at run-time, but they may also be specified here.

#

# Most operating system defaults:

#TrustedDeliveryAgent "/usr/bin/procmail"       # Linux

#TrustedDeliveryAgent "/usr/bin/mail"           # Solaris

#TrustedDeliveryAgent "/usr/libexec/mail.local" # FreeBSD

#TrustedDeliveryAgent "/usr/bin/procmail"       # Cygwin

#

# Other popular configurations:

#TrustedDeliveryAgent "/usr/cyrus/bin/deliver"   # Cyrus

#TrustedDeliveryAgent "/bin/maildrop"      # Maildrop

#TrustedDeliveryAgent "/usr/local/sbin/exim -oMr spam-scanned" # Exim

#

#UntrustedDeliveryAgent "/usr/bin/procmail -d %u"

TrustedDeliveryAgent "/usr/sbin/sendmail"

#

# Untrusted Delivery Agent: Specifies the local delivery agent and arguments

# DSPAM should use when delivering mail and running in untrusted user mode.

# Because DSPAM will not allow pass-through arguments to be specified to 

# untrusted users, all arguments should be specified here. Use %u to specify

# the user DSPAM is processing mail for. This configuration parameter is only 

# necessary if you plan on allowing untrusted processing.

#

#UntrustedDeliveryAgent "/usr/bin/procmail -d %u"

UntrustedDeliveryAgent "/usr/sbin/sendmail"

#

# SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP 

# delivery to deliver your message to the mail server. You will need to 

# configure with --enable-daemon to use host delivery, however you do not need 

# to operate in daemon mode. Specify an IP address or UNIX path to a domain 

# socket below as a host.

#

# If you would like to set up DeliveryHost's on a per-domain basis, use

# the syntax: DeliveryHost.domain.com 1.2.3.4

#

#DeliveryHost        127.0.0.1

#DeliveryPort        24

#DeliveryIdent       localhost

#DeliveryProto       LMTP

#

# Quarantine Agent: DSPAM's default behavior is to quarantine all mail it 

# thinks is spam. If you wish to override this behavior, you may specify

# a quarantine agent which will be called with all messages DSPAM thinks is

# spam. Use %u to specify the user DSPAM is processing mail for.

#

#QuarantineAgent   "/usr/bin/procmail -d spam"

#

# DSPAM can optionally process "plused users" (addresses in the user+detail

# form) by truncating the username just before the "+", so all internal

# processing occurs for "user", but delivery will be performed for

# "user+detail". This is only useful if the LDA can handle "plused users"

# (for example Cyrus IMAP) and when configured for LMTP delivery above

#

# NOTE: Plused detail presently only works when usernames are provided and

#       not fully qualified email address (@domain).

#

#EnablePlusedDetail   on

#

# Quarantine Mailbox: DSPAM's LMTP code can send spam mail using LMTP to a 

# "plused" mailbox (such as user+quarantine) leaving quarantine processing

# for retraining or deletion to be performed by the LDA and the mail client.

# "plused" mailboxes are supported by Cyrus IMAP and possibly other LDAs.

# The mailbox name must have the +

#

#QuarantineMailbox   +quarantine

#

# OnFail: What to do if local delivery or quarantine should fail. If set

# to "unlearn", DSPAM will unlearn the message prior to exiting with an

# un successful return code. The default option, "error" will not unlearn

# the message but return the appropriate error code. The unlearn option

# is use-ful on some systems where local delivery failures will cause the

# message to be requeued for delivery, and could result in the message

# being processed multiple times. During a very large failure, however, 

# this could cause a significant load increase.

#

OnFail error

# Trusted Users: Only the users specified below will be allowed to perform

# administrative functions in DSPAM such as setting the active user and

# accessing tools. All other users attempting to run DSPAM will be restricted;

# their uids will be forced to match the active username and they will not be

# able to specify delivery agent privileges or use tools.

#

Trust root

Trust mail

Trust mailnull 

Trust smmsp

Trust daemon

Trust nobody

Trust majordomo

Trust apache

Trust mailman

Trust postfix

Trust dspam

#

# Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must

# be compiled with debug support in order to use this option. DSPAM should

# never be running in production with debug active unless you are 

# troubleshooting problems.

#

# DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus

#   process     standard message processing

#   classify    message classification using --classify

#   spam        error correction of missed spam

#   fp          error correction of false positives

#   inoculation message inoculations (source=inoculation)

#   corpus      corpusfed messages (source=corpus)

#

#Debug *

#Debug bob bill

#

#DebugOpt process spam fp

#DebugOpt process classify spam fp inoculation corpus

#

# ClassAlias: Alias a particular class to spam/nonspam. This is useful if

# classifying things other than spam.

#ClassAliasSpam badstuff

#ClassAliasNonspam goodstuff

#

# Training Mode: The default training mode to use for all operations, when

# one has not been specified on the commandline or in the user's preferences.

# Acceptable values are: toe, tum, teft, notrain

#

TrainingMode toe

#

# TestConditionalTraining: By default, dspam will retrain certain errors

# until the condition is no longer met. This usually accelerates learning.

# Some people argue that this can increase the risk of errors, however.

#

TestConditionalTraining on

#

# Features: Specify features to activate by default; can also be specified

# on the commandline. See the documentation for a list of available features.

# If _any_ features are specified on the commandline, these are ignored.

#

# NOTE: For standard "CRM114" Markovian weighting, use sbph

#

#Feature sbph

Feature noise

Feature chained

Feature whitelist

# Training Buffer: The training buffer waters down statistics during training.

# It is designed to prevent false positives, but can also dramatically reduce

# dspam's catch rate during initial training. This can be a number from 0

# (no buffering) to 10 (maximum buffering). If you are paranoid about false

# positives, you should probably enable this option.

Feature tb=5

#

# Algorithms: Specify the statistical algorithms to use, overriding any

# defaults configured in the build. The options are:

#    naive       Naive-Bayesian (All Tokens)

#    graham      Graham-Bayesian ("A Plan for Spam")

#    burton      Burton-Bayesian (SpamProbe)

#    robinson    Robinson's Geometric Mean Test (Obsolete)

#    chi-square  Fisher-Robinson's Chi-Square Algorithm

#

# You may have multiple algorithms active simultaneously, but it is strongly

# recommended that you group Bayesian algorithms with other Bayesian

# algorithms, and any use of Chi-Square remain exclusive.

#

# NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider

#       using 'burton' for slightly better accuracy

#

# Don't mess with this unless you know what you're doing

#

#Algorithm chi-square

#Algorithm naive

Algorithm burton graham naive

#Algorithm burton

#

# PValue: Specify the technique used for calculating PValues, overriding any

# defaults configured in the build. These options are:

#    graham      Graham's Technique ("A Plan for Spam")

#    robinson    Robinson's Technique 

#    markov      Markovian Weighted Technique

#

# Unlike algorithms, you may only have one of these defined. Use of the

# chi-square algorithm automatically changes this to robinson.

#

# Don't mess with this unless you know what you're doing.

#

#PValue robinson

#PValue markov

PValue graham

#

# SupressWebStats: Enable this if you are not using the CGI, and don't want

# .stats files written.

#SupressWebStats on

#

# ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to

# X-DSPAM-Improbability headers

ImprobabilityDrive on

#

# Preferences: Specify any preferences to set by default, unless otherwise

# overridden by the user (see next section) or a default.prefs file.

# If user or default.prefs are found, the user's preferences will override any

# defaults.

#

Preference "trainingMode=TOE"      # TEFT, TUM, TOE

Preference "spamAction=tag"      # tag, quarantine, deliver

Preference "signatureLocation=message"   # 'message' or 'headers'

Preference "spamSubject=[SPAM]"

Preference "statisticalSedation=5"   # 0 to 9

Preference "enableBNR=on"      # on, off

Preference "showFactors=off"      # on, off

Preference "enableWhitelist=on"      # on, off

Preference "whitelistThreshold=10"

#

# Overrides: Specifies the user preferences which may override configuration

# and commandline defaults. Any other preferences supplied by an untrusted user

# will be ignored.

#

AllowOverride enableBNR

AllowOverride enableWhitelist

AllowOverride fallbackDomain

AllowOverride ignoreGroups

AllowOverride localStore

AllowOverride makeCorpus

AllowOverride optIn

AllowOverride optOut

AllowOverride optOutClamAV

AllowOverride processorBias

AllowOverride showFactors

AllowOverride signatureLocation

AllowOverride spamAction

AllowOverride spamSubject

AllowOverride statisticalSedation

AllowOverride storeFragments

AllowOverride trainPristine

AllowOverride trainingMode

AllowOverride whitelistThreshold

# --- MySQL ---

#

# Storage driver settings: Specific to a particular storage driver. Uncomment

# the configuration specific to your installation, if applicable.

#

MySQLServer          /var/run/mysqld/mysqld.sock

MySQLPort

MySQLUser            dspam

MySQLPass          XXXXXXXXXXXXXXXXXXX

MySQLDb              dspam

MySQLCompress      true

# If you are using replication for clustering, you can also specify a separate

# server to perform all writes to.

#

#MySQLWriteServer   /var/run/mysqld/mysqld.sock

#MySQLWritePort      

#MySQLWriteUser      dspam

#MySQLWritePass      changeme

#MySQLWriteDb      dspam_write

#MySQLCompress      true

# If your replication isn't close to real-time, your retraining might fail if 

# the  signature isn't found. One workaround for this is to use the write

# database for all signature reads:

#

#MySQLReadSignaturesFromWriteDb   on

# Use this if you have the 4.1 quote bug (see doc/mysql.txt)

#MySQLSupressQuote   on

# If you're running DSPAM in client/server (daemon) mode, uncomment the

# setting below to override the default connection cache size (the number

# of connections the server pools between all clients). The connection cache

# represents the maximum number of database connections *available* and should

# be set based on the maximum number of concurrent connections you're likely

# to have. Each connection may be used by only one thread at a time, so all

# other threads _will block_ until another connection becomes available.

#

#MySQLConnectionCache   10

# If you're using vpopmail or some other type of virtual setup and wish to

# change the table dspam uses to perform username/uid lookups, you can over-

# ride it below

#MySQLVirtualTable          dspam_virtual_uids

#MySQLVirtualUIDField       uid

#MySQLVirtualUsernameField  username

# UIDInSignature: MySQL supports the insertion of the user id into the DSPAM 

# signature. This allows you to create one single spam or fp alias 

# (pointing to some arbitrary user), and the uid in the signature will

# switch to the correct user. Result: you need only one spam alias 

MySQLUIDInSignature   on

# --- PostgreSQL ---

#PgSQLServer       127.0.0.1

#PgSQLPort         5432

#PgSQLUser         dspam

#PgSQLPass         changeme

#PgSQLDb           dspam

# If you're running DSPAM in client/server (daemon) mode, uncomment the

# setting below to override the default connection cache size (the number

# of connections the server pools between all clients).

#

#PgSQLConnectionCache   3

# UIDInSignature: PgSQL supports the insertion of the user id into the DSPAM 

# signature. This allows you to create one single spam or fp alias 

# (pointing to some arbitrary user), and the uid in the signature will

# switch to the correct user. Result: you need only one spam alias

#PgSQLUIDInSignature   on 

# If you're using vpopmail or some other type of virtual setup and wish to

# change the table dspam uses to perform username/uid lookups, you can over-

# ride it below

#PgSQLVirtualTable          dspam_virtual_uids

#PgSQLVirtualUIDField       uid

#PgSQLVirtualUsernameField  username

# --- Oracle ---

#OraServer       "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521))(CONNECT_DATA=(SID=PROD)))"

#OraUser         dspam

#OraPass         changeme

#OraSchema       dspam

# --- SQLite ---

#SQLitePragma   "synchronous = OFF"

# --- Hash ---

# HashRecMax: Default number of records to create in the initial segment when

# building hash files. 100,000 yields files 1.6MB in size, but can fill up

# fast, so be sure to increase this (to a million or more) if you're not using

# autoextend.

#

# Primes List:

#  53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613,

#  393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 

#  100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, 

#  4294967291

#

HashRecMax      98317

# HashAutoExtend: Autoextend hash databases when they fill up. This allows

# them to continue to train by adding extents (extensions) to the file. There 

# will be a small delay during the growth process, as everything needs to be 

# closed and remapped. 

#

HashAutoExtend      on  

# HashMaxExtents: The maximum number of extents that may be created in a single

# hash file. Set this to zero for unlimited

#

HashMaxExtents      0

# HashExtentSize: The record size for newly created extents. Creating this too

# small could result in many extents being created. Creating this too large

# could result in excessive disk space usage.

#

HashExtentSize      49157

# HashMaxSeek: The maximum number of records to seek to insert a new record

# before failing or adding a new extent. Setting this too high will exhaustively

# scan each segment and kill performance. Typically, a low value is acceptable

# as even older extents will continue to fill over time.

#

HashMaxSeek      100

# HashConcurrentUser: If you are using a single, stateful hash database in

# daemon mode, specifying a concurrent user will cause the user to be 

# permanently mapped into memory and shared via rwlocks.

#

#HashConcurrentUser   user

# HashConnectionCache: If running in daemon mode, this is the max # of

# concurrent connections that will be supported. NOTE: If you are using

# HashConcurrentUser, this option is ignored, as all connections are read-

# write locked instead of mutex locked.

HashConnectionCache   10

# LDAP: Perform various LDAP functions depending on LDAPMode variable.

# Presently, the only mode supported is 'verify', which will verify the existence

# of an unknown user in LDAP prior to creating them as a new user in the system.

# This is useful on some systems acting as gateway machines.

#

#LDAPMode   verify

#LDAPHost   ldaphost.mydomain.com

#LDAPFilter   "(mail=%u)"

#LDAPBase   ou=people,dc=domain,dc=com

# Optionally, you can specify storage profiles, and specify the server to

# use on the commandline with --profile. For example:

#

Profile Nautilus

MySQLServer.Nautilus      /var/run/mysqld/mysqld.sock

MySQLPort.Nautilus      3306

MySQLUser.Nautilus      dspam

MySQLPass.Nautilus      XXXXXXXXXXXXXXXXXXX

MySQLDb.Nautilus      dspam

MySQLCompress.Nautilus      true

MySQLUIDInSignature.Nautilus   on

#

#Profile DECAlpha

#MySQLServer.DECAlpha   10.0.0.1

#MySQLPort.DECAlpha     3306

#MySQLUser.DECAlpha     dspam

#MySQLPass.DECAlpha     changeme

#MySQLDb.DECAlpha       dspam

#MySQLCompress.DECAlpha true

#

#Profile Sun420R

#MySQLServer.Sun420R    10.0.0.2

#MySQLPort.Sun420R      3306

#MySQLUser.Sun420R      dspam

#MySQLPass.Sun420R      changeme

#MySQLDb.Sun420R        dspam

#MySQLCompress.Sun420R  false

#

DefaultProfile Nautilus

#

# If you're using storage profiles, you can set failovers for each profile.

# Of course, if you'll be failing over to another database, that database

# must have the same information as the first. If you're using a global

# database with no training, this should be relatively simple. If you're

# configuring per-user data, however, you'll need to set up some type of

# replication between databases.

#

#Failover.DECAlpha      SUN420R

#Failover.Sun420R       DECAlpha

# If the storage fails, the agent will follow each profile's failover up to

# a maximum number of failover attempts. This should be set to a maximum of

# the number of profiles you have, otherwise the agent could loop and try

# the same profile multiple times (unless this is your desired behavior).

#

#FailoverAttempts       1

#

# Ignored headers: If DSPAM is behind other tools which may add a header to

# incoming emails, it may be beneficial to ignore these headers - especially

# if they are coming from another spam filter. If you are _not_ using one of

# these tools, however, leaving the appropriate headers commented out will

# allow DSPAM to use them as telltale signs of forged email.

#

IgnoreHeader X--MailScanner-SpamCheck

IgnoreHeader X-Admission-MailScanner-SpamCheck

IgnoreHeader X-Admission-MailScanner-SpamScore

IgnoreHeader X-Amavis-Alert

IgnoreHeader X-Antispam

IgnoreHeader X-AntiVirus

IgnoreHeader X-Antivirus-Scanner

IgnoreHeader X-Antivirus-Status

IgnoreHeader X-Assp-Spam-Prob

IgnoreHeader X-AV-Scanned

IgnoreHeader X-AVAS-Spam-Level

IgnoreHeader X-AVAS-Spam-Score

IgnoreHeader X-AVAS-Spam-Status

IgnoreHeader X-AVAS-Spam-Symbols

IgnoreHeader X-AVAS-Virus-Status

IgnoreHeader X-AVK-Virus-Check

IgnoreHeader X-Barracuda-Spam-Flag

IgnoreHeader X-Barracuda-Spam-Report

IgnoreHeader X-Barracuda-Spam-Score

IgnoreHeader X-Barracuda-Spam-Status

IgnoreHeader X-BTI-AntiSpam

IgnoreHeader X-Bogosity

IgnoreHeader X-ClamAntiVirus-Scanner

IgnoreHeader X-CRM114-Status

IgnoreHeader X-Despammed-Tracer

IgnoreHeader X-ELTE-SpamCheck

IgnoreHeader X-ELTE-SpamCheck-Details

IgnoreHeader X-ELTE-SpamScore

IgnoreHeader X-ELTE-SpamVersion

IgnoreHeader X-ELTE-VirusStatus

IgnoreHeader X-GMX-Antispam

IgnoreHeader X-GMX-Antivirus

IgnoreHeader X-Greylist

IgnoreHeader X-GWSPAM

IgnoreHeader X-HTMLM

IgnoreHeader X-HTMLM-Info

IgnoreHeader X-HTMLM-Score

IgnoreHeader X-iHateSpam-Checked

IgnoreHeader X-iHateSpam-Quarantined

IgnoreHeader X-IMAIL-SPAM-STATISTICS

IgnoreHeader X-IMAIL-SPAM-URL-DBL

IgnoreHeader X-IMAIL-SPAM-VALFROM

IgnoreHeader X-IMAIL-SPAM-VALHELO

IgnoreHeader X-IMAIL-SPAM-VALREVDNS

IgnoreHeader X-IronPort-Anti-Spam-Filtered

IgnoreHeader X-IronPort-Anti-Spam-Result

IgnoreHeader X-Kaspersky-Antivirus

IgnoreHeader X-KSV-Antispam

IgnoreHeader X-Mailer

IgnoreHeader X-MailScanner

IgnoreHeader X-MailScanner-Information

IgnoreHeader X-MailScanner-SpamCheck

IgnoreHeader X-MDaemon-Deliver-To

IgnoreHeader X-MDAV-Processed

IgnoreHeader X-MDRemoteIP

IgnoreHeader X-MIE-MailScanner-SpamCheck

IgnoreHeader X-MIMEOLE

IgnoreHeader X-Mlf-Spam-Status

IgnoreHeader X-MSMail-Priority

IgnoreHeader X-NAI-Spam-Checker-Version

IgnoreHeader X-NAI-Spam-Flag

IgnoreHeader X-NAI-Spam-Level

IgnoreHeader X-NAI-Spam-Route

IgnoreHeader X-NAI-Spam-Rules

IgnoreHeader X-NAI-Spam-Score

IgnoreHeader X-NAI-Spam-Threshold

IgnoreHeader X-NetcoreISpam1-ECMScanner

IgnoreHeader X-NetcoreISpam1-ECMScanner-From

IgnoreHeader X-NetcoreISpam1-ECMScanner-Information

IgnoreHeader X-NetcoreISpam1-ECMScanner-SpamCheck

IgnoreHeader X-NetcoreISpam1-ECMScanner-SpamScore

IgnoreHeader X-NEWT-spamscore

IgnoreHeader X-No-Spam

IgnoreHeader X-Olypen-Virus

IgnoreHeader X-OWM-SpamCheck

IgnoreHeader X-OWM-VirusCheck

IgnoreHeader X-PAA-AntiVirus

IgnoreHeader X-PAA-AntiVirus-Message

IgnoreHeader X-PIRONET-NDH-MailScanner-SpamCheck

IgnoreHeader X-PIRONET-NDH-MailScanner-SpamScore

IgnoreHeader X-PN-SPAMFiltered

IgnoreHeader X-Priority

IgnoreHeader X-Proofpoint-Spam-Details

IgnoreHeader X-purgate

IgnoreHeader X-purgate-Ad

IgnoreHeader X-purgate-ID

IgnoreHeader X-RAV-AntiVirus

IgnoreHeader X-Rc-Spam

IgnoreHeader X-Rc-Virus

IgnoreHeader X-RedHat-Spam-Score

IgnoreHeader X-RedHat-Spam-Warning

IgnoreHeader X-RegEx

IgnoreHeader X-RegEx-Score

IgnoreHeader X-RITmySpam

IgnoreHeader X-RITmySpam-IP

IgnoreHeader X-RITmySpam-Spam

IgnoreHeader X-Rocket-Spam

IgnoreHeader X-SA-GROUP

IgnoreHeader X-SA-RECEIPTSTATUS

IgnoreHeader X-Sohu-Antivirus

IgnoreHeader X-Spam

IgnoreHeader X-Spam-Check

IgnoreHeader X-Spam-Checked-By

IgnoreHeader X-Spam-Checker

IgnoreHeader X-Spam-Checker-Version

IgnoreHeader X-Spam-DCC

IgnoreHeader X-Spam-Details

IgnoreHeader X-Spam-detection-level

IgnoreHeader X-Spam-Filter

IgnoreHeader X-Spam-Filtered

IgnoreHeader X-Spam-Flag

IgnoreHeader X-Spam-Level

IgnoreHeader X-Spam-OrigSender

IgnoreHeader X-Spam-Pct

IgnoreHeader X-Spam-Prev-Subject

IgnoreHeader X-Spam-Processed

IgnoreHeader X-Spam-Pyzor

IgnoreHeader X-Spam-Rating

IgnoreHeader X-Spam-Report

IgnoreHeader X-Spam-Scanned

IgnoreHeader X-Spam-Score

IgnoreHeader X-Spam-Status

IgnoreHeader X-Spam-Tagged

IgnoreHeader X-Spam-Tests

IgnoreHeader X-Spam-Tests-Failed

IgnoreHeader X-Spam-Virus

IgnoreHeader X-Spamadvice

IgnoreHeader X-Spamarrest-noauth

IgnoreHeader X-Spamarrest-speedcode

IgnoreHeader X-SpamBouncer

IgnoreHeader X-Spambayes-Classification

IgnoreHeader X-SpamCatcher-Score

IgnoreHeader X-SpamCop-Checked

IgnoreHeader X-SpamCop-Disposition

IgnoreHeader X-SpamCop-Whitelisted

IgnoreHeader X-Spamcount

IgnoreHeader X-SpamDetected

IgnoreHeader X-SpamInfo

IgnoreHeader X-SpamPal

IgnoreHeader X-SpamPal-Timeout

IgnoreHeader X-SpamReason

IgnoreHeader X-SpamScore

IgnoreHeader X-Spamsensitivity

IgnoreHeader X-SpamTest-Categories

IgnoreHeader X-SpamTest-Info

IgnoreHeader X-SpamTest-Method

IgnoreHeader X-SpamTest-Status

IgnoreHeader X-SpamTest-Version

IgnoreHeader X-STA-NotSpam

IgnoreHeader X-STA-Spam

IgnoreHeader X-TERRACE-SPAMMARK

IgnoreHeader X-TERRACE-SPAMRATE

IgnoreHeader X-to-viruscore

IgnoreHeader X-Text-Classification

IgnoreHeader X-Text-Classification-Data

IgnoreHeader X-UCD-Spam-Score

IgnoreHeader x-uscspam

IgnoreHeader X-Virus-Check

IgnoreHeader X-Virus-Checked

IgnoreHeader X-Virus-Checker-Version

IgnoreHeader X-Virus-Scan

IgnoreHeader X-Virus-Scanned

IgnoreHeader X-Virus-Scanner

IgnoreHeader X-Virus-Scanner-Result

IgnoreHeader X-Virus-Status

IgnoreHeader X-VirusChecked

IgnoreHeader X-Virusscan

IgnoreHeader X-WinProxy-AntiVirus

IgnoreHeader X-WinProxy-AntiVirus-Message

#

# Lookup: Perform lookups on streamlined blackhole list servers (see

# http://www.nuclearelephant.com/projects/sbl/). The streamlined blacklist

# server is machine-automated, unsupervised blacklisting system designed to

# provide real-time and highly accurate blacklisting based on network spread.

# When performing a lookup, DSPAM will automatically learn the inbound message 

# as spam if the source IP is listed. Until an official public RABL server is 

# available, this feature is only useful if you are running your own 

# streamlined blackhole list server for internal reporting among multiple mail 

# servers. Provide the name of the lookup zone below to use.

#

# This function performs standard reverse-octet.domain lookups, and while it

# will function with many RBLs, it's strongly discouraged to use those

# maintained by humans as they're often inaccurate and could hurt filter

# learning and accuracy.

#

Lookup   "sbl-xbl.spamhaus.org"

#

# RBLInoculate: If you want to inoculate the user from RBL'd messages it would

# have otherwise missed, set this to on.

#

RBLInoculate on

#

# Notifications: Enable the sending of notification emails to users (first

# message, quarantine full, etc.)

#

Notifications   on

#

# Purge configuration: Set dspam_clean purge default options, if not otherwise

# specified on the commandline

#

#PurgeSignatures 14          # Stale signatures

#PurgeNeutral    90          # Tokens with neutralish probabilities

#PurgeUnused     90          # Unused tokens

#PurgeHapaxes    30          # Tokens with less than 5 hits (hapaxes)

#PurgeHits1S   15          # Tokens with only 1 spam hit

#PurgeHits1I   15          # Tokens with only 1 innocent hit

#

# Purge configuration for SQL-based installations using purge.sql

#

PurgeSignature   off # Specified in purge.sql

PurgeNeutral   90

PurgeUnused   off # Specified in purge.sql

PurgeHapaxes   off # Specified in purge.sql

PurgeHits1S   off # Specified in purge.sql

PurgeHits1I   off # Specified in purge.sql

#

# Local Mail Exchangers: Used for source address tracking, tells DSPAM which

# mail exchangers are local and therefore should be ignored in the Received:

# header when tracking the source of an email. Note: you should use the address

# of the host as appears between brackets [ ] in the Received header.

#

LocalMX 127.0.0.1

#

# Logging: Disabling logging for users will make usage graphs unavailable to

# them. Disabling system logging will make admin graphs unavailable.

#

SystemLog on

UserLog   on

#

# TrainPristine: for systems where the original message remains server side 

# and can therefore be presented in pristine format for retraining. This option

# will cause DSPAM to cease all writing of signatures and DSPAM headers to the 

# message, and deliver the message in as pristine format as possible. This mode

# REQUIRES that the original message in its pristine format (as of delivery) 

# be presented for retraining, as in the case of webmail, imap, or other 

# applications where the message is actually kept server-side during reading, 

# and is preserved. DO NOT use this switch unless the original message can be 

# presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS.

#

#TrainPristine on

#

# Opt: in or out; determines DSPAM's default filtering behavior. If this value

# is set to in, users must opt-in to filtering by dropping a .dspam file in

# /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam

# folder in their home directory).  The default is opt-out, which means all 

# users will be filtered unless a .nodspam file is dropped in 

# /var/dspam/opt-out/user.nodspam

#

Opt in

#

# TrackSources: specify which (if any) source addresses to track and report

# them to syslog (mail.info). This is useful if you're running a firewall or

# blacklist and would like to use this information. Spam reporting also drops

# RABL blacklist files (see http://www.nuclearelephant.com/projects/rabl/). 

#

TrackSources spam nonspam

#

# ParseToHeaders: In lieu of setting up individual aliases for each user,

# DSPAM can be configured to automatically parse the To: address for spam and

# false positive forwards. From there, it can be configured to either set the

# DSPAM user based on the username specified in the header and/or change the

# training class and source accordingly. The options below can be used to 

# customize most common types of header parsing behavior to avoid the need for

# multiple aliases, or if using LMTP, aliases entirely..

#

# ParseToHeader: Parse the To: headers of an incoming message. This must be

#                set to 'on' to use either of the following features.

# 

# ChangeModeOnParse: Automatically change the class (to spam or innocent)

#   depending on whether spam- or notspam- was specified, and change the source

#   to 'error'. This is convenient if you're not using aliases at all, but

#   are delivering via LMTP.

#

# ChangeUserOnParse: Automatically change the username to match that specified

#   in the To: header. For example, spam-bob@domain.tld will set the username

#   to bob, ignoring any --user passed in. This may not always be desirable if

#   you are using virtual email addresses as usernames. Options:

#     on or user   take the portion before the @ sign only

#     full      take everything after the initial {spam,notspam}-.

#

ParseToHeaders on

ChangeModeOnParse on

ChangeUserOnParse off

#

# Broken MTA Options: Some MTAs don't support the proper functionality

# necessary. In these cases you can activate certain features in DSPAM to

# compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if

# the message is spam, 0 if not, or a negative code if an error has occurred.

# Specifying 'case' causes DSPAM to force the input usernames to lowercase.

# Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed

# in.

#

#Broken returnCodes

Broken case

Broken lineStripping

#

# MaxMessageSize: You may specify a maximum message size for DSPAM to process.

# If the message is larger than the maximum size, it will be delivered 

# without processing. Value is in bytes.

#

MaxMessageSize 20971520

#

# Virus Checking: If you are running clamd, DSPAM can perform stream-based

# virus checking using TCP. Uncomment the values below to enable virus

# checking. 

#

# ClamAVResponse: reject (reject or drop the message with a permanent failure)

#                 accept (accept the message and quietly drop the message)

#                 spam   (treat as spam and quarantine/tag/whatever)

#

#ClamAVPort   3310

#ClamAVHost   127.0.0.1

#ClamAVResponse   accept

#

# Daemonized Server: If you are running DSPAM as a daemonized server using

# --daemon, the following parameters will override the default. Use the

# ServerPass option to set up accounts for each client machine. The DSPAM

# server will process and deliver the message based on the parameters 

# specified. If you want the client machine to perform delivery, use

# the --stdout option in conjunction with a local setup. 

#

#ServerPort      24

ServerQueueSize      32

ServerPID      /var/run/dspam/dspam.pid

#

# ServerMode specifies the type of LMTP server to start. This can be one of:

#     dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc

#  standard: Standard LMTP server, for communicating with Postfix or other MTA

#      auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT

#

ServerMode auto

# If supporting DLMTP (dspam) mode, dspam clients will require authentication 

# as they will be passing in parameters. The idents below will be used to

# determine which clients will be speaking DLMTP, so if you will be using

# both LMTP and DLMTP from the same host, be sure to use something other

# than the server's hostname below (which will be sent by the MTA during a 

# standard LMTP LHLO).

# 

#ServerPass.Relay1   "secret"

#ServerPass.Relay2   "password"

#

ServerPass.Nautilus   "XXXXXXXXXXXXXXXXXXX"

# If supporting standard LMTP mode, server parameters will need to be specified

# here, as they will not be passed in by the mail server. The ServerIdent

# specifies the 250 response code ident sent back to connecting clients and

# should be set to the hostname of your server, or an alias.

#

# NOTE: If you specify --user in ServerParameters, the RCPT TO will be

#       used only for delivery, and not set as the active user for processing.

#

ServerParameters   "--deliver=innocent,spam -d %u"

ServerIdent      "XXXXXXXXXXXXXXXXXX"

# If you wish to use a local domain socket instead of a TCP socket, uncomment

# the following. It is strongly recommended you use local domain sockets if

# you are running the client and server on the same machine, as it eliminates

# much of the bandwidth overhead.

#

ServerDomainSocketPath  "/var/run/dspam/dspam.sock"

#

# Client Mode: If you are running DSPAM in client/server mode, uncomment and

# set these variables. A ClientHost beginning with a / will be treated as

# a domain socket.

#

#ClientHost   /tmp/dspam.sock

#ClientIdent   "secret@Relay1"

#

#ClientHost   127.0.0.1

#ClientPort   24

#ClientIdent   "secret@Relay1"

# RABLQueue: Touch files in the RABL queue

# If you are a reporting streamlined blackhole list participant, you can

# touch ip addresses within the directory the rabl_client process is watching.

#

#RABLQueue   /var/spool/rabl

ClientHost   /var/run/dspam/dspam.sock

ClientIdent   "XXXXXXXXXXXXXXXXXXXXXXXXXXX@Nautilus"

# DataSource: If you are using any type of data source that does not include

# email-like headers (such as documents), uncomment the line below. This

# will cause the entire input to be treated like a message "body"

#

#DataSource      document

# ProcessorWordFrequency: By default, words are only counted once per message.

# If you are classifying large documents, however, you may wish to count once

# per occurrence instead.

#

#ProcessorWordFrequency  occurrence

# ProcessorBias: Bias causes the filter to lean more toward 'innocent', and

# usually greatly reduces false positives. It is the default behavior of

# most Bayesian filters (including dspam). 

#

# NOTE: You probably DONT want this if you're using Markovian Weighting, unless

# you are paranoid about false positives.

#

ProcessorBias on

## EOF
```

 *petrjanda wrote:*   

> More info would be nice! 

 

Okay... In my dspam.conf you can see that I enabled "RBLInoculate" and "Lookup". This is one way to keep the global user up to date with new spam data. But you need to have your spam traps delivered to your "global" user.

Another way of doing it is to set up some spam traps and have your MTA (I use Postfix) deliver any mail caught in them to be automatically learned by your global user. I would not blindly push everything into the global user. That is bad unless you feed ham mails to the global user as well; otherwise you will end up with too many spam tokens, which hurts accuracy. Keep in mind that it is best to have 50% ham and 50% spam in your token data. It is okay to have 2/3 ham and 1/3 spam. A good document describing how to implement that in Postfix can be found here.
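
To make the ham/spam ratio concrete, here is a minimal sketch of a balance check. The function name `corpus_balance` is made up for illustration, and the cut-offs (33% to 50% spam) are only my reading of the rule of thumb above, not anything DSPAM enforces. You would feed it the learned spam and ham counts of your global user.

```shell
# Hypothetical helper: report the spam share of a training corpus.
# Usage: corpus_balance SPAM_COUNT HAM_COUNT
corpus_balance() {
        cb_spam="$1"
        cb_ham="$2"
        cb_total=$((cb_spam + cb_ham))
        if [ "${cb_total}" -eq 0 ]
        then
                echo "empty"
                return 1
        fi
        # Integer percentage of spam in the whole corpus
        cb_pct=$((cb_spam * 100 / cb_total))
        # Assumed acceptable range: about 1/3 up to 1/2 spam
        if [ "${cb_pct}" -ge 33 ] && [ "${cb_pct}" -le 50 ]
        then
                echo "ok ${cb_pct}"
        else
                echo "unbalanced ${cb_pct}"
        fi
}

corpus_balance 500 1000   # prints: ok 33
```

If it reports "unbalanced", feed more ham (or less trap spam) to the global user before relying on it.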

Another way is to use dspam_merge. From time to time, and only if you trust the source data/users, you could run:

```
dspam_merge user1 user2 ... userN -o global
```
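
If you keep a list of trusted users, the command line can be assembled instead of typed by hand. This is just a sketch: `build_merge_cmd` is a made-up helper, and it only prints the dspam_merge call in the form shown above so you can review it before running it yourself.

```shell
# Hypothetical helper: print (do not run) a dspam_merge command line.
# Usage: build_merge_cmd TARGET_USER SOURCE_USER...
build_merge_cmd() {
        bm_target="$1"
        shift
        if [ "$#" -eq 0 ]
        then
                echo "no source users given" >&2
                return 1
        fi
        # "$*" joins the remaining arguments with spaces
        echo "dspam_merge $* -o ${bm_target}"
}

build_merge_cmd global user1 user2   # prints: dspam_merge user1 user2 -o global
```

Printing first is deliberate: the merge changes the global user's data, so check the user list before you execute it.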

Another way is to use inoculation groups. Read the README to see exactly what they are and how they work.

 *petrjanda wrote:*   

> Up until now I was using classification group. Im gonna use merged group now.

 Switch to merged groups now! A classification group is not what you are looking for. The problem with classification groups is that a mail could be tagged as "spam" while the header still says it is "ham", and vice versa. Merged groups are far better.

 *petrjanda wrote:*   

> So do you have an idea why user1 still couldnt filter it? I remember looking at dspam.debug, it basically started off ok as if it was going to train for User1, but then global classification group was added, it checked global's tokens, and incorrectly classified the message, never returned to train User1.

 This is a problem with classification groups. Use merged groups. They give much better results and do exactly what you want.
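
As a reminder of what the merged group entry looks like, here is a small sketch that builds the line from a list of test users. `merged_group_line` is a made-up helper name; the output format is the `global:merged:user1,user2,user3` line from my first reply, and where the group file lives depends on your DSPAM home.

```shell
# Hypothetical helper: build one merged-group line for the DSPAM group file.
# Usage: merged_group_line GROUP_USER MEMBER...
merged_group_line() {
        mg_group="$1"
        shift
        mg_members=""
        for mg_member in "$@"
        do
                # Add a comma only between members, not before the first one
                mg_members="${mg_members:+${mg_members},}${mg_member}"
        done
        echo "${mg_group}:merged:${mg_members}"
}

merged_group_line global user1 user2 user3   # prints: global:merged:user1,user2,user3
```

Put the printed line into your group file by hand, and make sure the old classification group line is no longer in it.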

I would set your "global" user to TEFT mode while it is learning or building up the corpus data, but after that you need to set it to TOE. If you don't, the purge script will delete tokens, and you don't want that to happen to the "global" user.
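
One possible way to flip the mode is through the user's preferences. The sketch below only prints a command and does not run it. It assumes your build ships dspam_admin with the preferences extension and that its `change preference` function takes user, attribute and value; please check that against the README of your 3.4.9 before using it. `training_mode_cmd` is a made-up helper name.

```shell
# Hypothetical helper: print the command that would set a user's trainingMode.
# Assumption: "dspam_admin change preference USER ATTRIBUTE VALUE" is available.
# Usage: training_mode_cmd USER MODE
training_mode_cmd() {
        tm_user="$1"
        tm_mode="$2"
        case "${tm_mode}" in
                TEFT|TOE|TUM)
                        echo "dspam_admin change preference ${tm_user} trainingMode ${tm_mode}"
                        ;;
                *)
                        echo "unknown training mode: ${tm_mode}" >&2
                        return 1
                        ;;
        esac
}

training_mode_cmd global TOE   # prints: dspam_admin change preference global trainingMode TOE
```

Remember that trainingMode must be listed under AllowOverride (it is in the dspam.conf above), otherwise the per-user value is ignored.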

If you want to keep the data small, then switch everyone to TOE. It is slightly less accurate than TEFT, but it helps keep the data in your storage to a minimum.

If you want a faster and more flexible way of purging your old tokens from DSPAM, then use this dspam.cron script (you should have one already in /etc/cron.daily/dspam.cron):

```
#!/bin/bash

# Copyright 1999-2005 Gentoo Foundation

# Distributed under the terms of the GNU General Public License v2

#

# Remove old signatures and unimportant tokens from the DSPAM database

#

#

# Function to run dspam_clean

#

run_dspam_clean() {

        if [[ ! -f "/usr/bin/dspam_clean" ]]

        then

                echo "/usr/bin/dspam_clean not found!"

                return 1

        else

                /usr/bin/dspam_clean -s -p -u >/dev/null 2>&1

                return 0

        fi

}

#

# Function to check if we have all needed tools

#

check_for_tools() {

        local myrc=0

        local foo

        for foo in awk head tail cut sed

        do

                # command -v reports whether the tool is on PATH without running it

                if ! command -v "${foo}" >/dev/null 2>&1

                then

                        echo "Command ${foo} not found!"

                        myrc=1

                fi

        done

        return ${myrc}

}

#

# Check for needed tools

#

check_for_tools

if [[ "$?" -ne "0" ]]

then

        # We have not all needed tools installed. Run just the dspam_clean part.

        run_dspam_clean

        exit $?

fi

#

# Try to get DSPAM home directory

#

DSPAM_HOMEDIR="$(grep ^dspam /etc/passwd|awk -F : '{print $6}')"

if ! ls "${DSPAM_HOMEDIR}"/*.data >/dev/null 2>&1

then

        # Something is wrong in passwd! Check if /etc/mail/dspam exists instead.

        if ls /etc/mail/dspam/*.data >/dev/null 2>&1

        then

                DSPAM_HOMEDIR="/etc/mail/dspam"

        fi

fi

if [[ -f "${DSPAM_HOMEDIR}/mysql.data" ]]

then

        if [[ ! -f "/usr/bin/mysql_config" ]]

        then

                echo "Can not run MySQL purge script:"

                echo "  /usr/bin/mysql_config does not exist"

                run_dspam_clean

                exit 1

        fi

        DSPAM_MySQL_PURGE_SQL=""

        DSPAM_MySQL_VER="$(/usr/bin/mysql_config --version | sed "s:\([^0-9\.]*\)::g")"

        DSPAM_MySQL_MAJOR="$(echo "${DSPAM_MySQL_VER}" | cut -d. -f1)"

        DSPAM_MySQL_MINOR="$(echo "${DSPAM_MySQL_VER}" | cut -d. -f2)"

        DSPAM_MySQL_MICRO="$(echo "${DSPAM_MySQL_VER}" | cut -d. -f3)"

        DSPAM_MySQL_INT="$((DSPAM_MySQL_MAJOR * 65536 + DSPAM_MySQL_MINOR * 256 + DSPAM_MySQL_MICRO))"

        # For MySQL >= 4.1 use the new purge script

        if [[ "${DSPAM_MySQL_INT}" -ge "262400" ]]

        then

                if [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge-4.1-optimized.sql" || -f "${DSPAM_HOMEDIR}/mysql_purge-4.1-optimized.sql" ]]

                then

                        [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge-4.1-optimized.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/mysql_purge-4.1-optimized.sql"

                        [[ -f "${DSPAM_HOMEDIR}/mysql_purge-4.1-optimized.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/mysql_purge-4.1-optimized.sql"

                else

                        [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge-4.1.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/mysql_purge-4.1.sql"

                        [[ -f "${DSPAM_HOMEDIR}/mysql_purge-4.1.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/mysql_purge-4.1.sql"

                fi

        else

                [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/mysql_purge.sql"

                [[ -f "${DSPAM_HOMEDIR}/mysql_purge.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/mysql_purge.sql"

        fi

        if [[ "${DSPAM_MySQL_PURGE_SQL}" == "" ]]

        then

                echo "Can not run MySQL purge script:"

                echo "  No mysql_purge SQL script found"

                run_dspam_clean

                exit 1

        fi

        if [[ ! -f "/usr/bin/mysql" ]]

        then

                echo "Can not run MySQL purge script:"

                echo "  /usr/bin/mysql does not exist"

                run_dspam_clean

                exit 1

        fi

        # Get DSPAM MySQL username and password

        DSPAM_MySQL_HOST="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 1|tail -n 1)"

        DSPAM_MySQL_PORT="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 2|tail -n 1)"

        DSPAM_MySQL_USER="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 3|tail -n 1)"

        DSPAM_MySQL_PWD="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 4|tail -n 1)"

        DSPAM_MySQL_DB="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 5|tail -n 1)"

        # Run the MySQL purge script

        (/usr/bin/mysql --user="${DSPAM_MySQL_USER}" --password="${DSPAM_MySQL_PWD}" ${DSPAM_MySQL_DB} < ${DSPAM_MySQL_PURGE_SQL}) 1>/dev/null 2>&1

        # Run the dspam_clean command

        run_dspam_clean

        # Optimize the MySQL tables for DSPAM

        for foo in $(/usr/bin/mysql --user="${DSPAM_MySQL_USER}" --password="${DSPAM_MySQL_PWD}" --silent --skip-column-names --batch ${DSPAM_MySQL_DB} -e 'SHOW TABLES;' 2>&1)

        do

                (/usr/bin/mysql --user="${DSPAM_MySQL_USER}" --password="${DSPAM_MySQL_PWD}" ${DSPAM_MySQL_DB} -e "OPTIMIZE TABLE ${foo};") 1>/dev/null 2>&1

        done

        exit 0

elif [[ -f "${DSPAM_HOMEDIR}/pgsql.data" ]]

then

        DSPAM_PgSQL_PURGE_SQL=""

        [[ -f "${DSPAM_HOMEDIR}/config/pgsql_purge.sql" ]] && DSPAM_PgSQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/pgsql_purge.sql"

        [[ -f "${DSPAM_HOMEDIR}/pgsql_purge.sql" ]] && DSPAM_PgSQL_PURGE_SQL="${DSPAM_HOMEDIR}/pgsql_purge.sql"

        if [[ "${DSPAM_PgSQL_PURGE_SQL}" == "" ]]

        then

                echo "Can not run PostgreSQL purge script:"

                echo "  No pgsql_purge SQL script found"

                run_dspam_clean

                exit 1

        fi

        if [[ ! -f "/usr/bin/psql" ]]

        then

                echo "Can not run PostgreSQL purge script:"

                echo "  /usr/bin/psql does not exist"

                run_dspam_clean

                exit 1

        fi

        # Get DSPAM PostgreSQL username and password

        DSPAM_PgSQL_HOST="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 1|tail -n 1)"

        DSPAM_PgSQL_PORT="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 2|tail -n 1)"

        DSPAM_PgSQL_USER="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 3|tail -n 1)"

        DSPAM_PgSQL_PWD="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 4|tail -n 1)"

        DSPAM_PgSQL_DB="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 5|tail -n 1)"

        # Run the PostgreSQL purge script

        (PGUSER=${DSPAM_PgSQL_USER} PGPASSWORD=${DSPAM_PgSQL_PWD} /usr/bin/psql -U ${DSPAM_PgSQL_USER} -d ${DSPAM_PgSQL_DB} -p ${DSPAM_PgSQL_PORT} -h ${DSPAM_PgSQL_HOST} -f ${DSPAM_PgSQL_PURGE_SQL}) 1>/dev/null 2>&1

        # Run the dspam_clean command

        run_dspam_clean

        exit 0

elif [[ -f "${DSPAM_HOMEDIR}/oracle.data" ]]

then

        DSPAM_Oracle_PURGE_SQL=""

        [[ -f "${DSPAM_HOMEDIR}/config/ora_purge.sql" ]] && DSPAM_Oracle_PURGE_SQL="${DSPAM_HOMEDIR}/config/ora_purge.sql"

        [[ -f "${DSPAM_HOMEDIR}/ora_purge.sql" ]] && DSPAM_Oracle_PURGE_SQL="${DSPAM_HOMEDIR}/ora_purge.sql"

        if [[ "${DSPAM_Oracle_PURGE_SQL}" == "" ]]

        then

                echo "Can not run Oracle purge script:"

                echo "  No ora_purge SQL script found"

                run_dspam_clean

                exit 1

        fi

        if [[ ! -f "/usr/bin/sqlplus" ]]

        then

                echo "Can not run Oracle purge script:"

                echo "  /usr/bin/sqlplus does not exist"

                run_dspam_clean

                exit 1

        fi

        # Get DSPAM Oracle connection data (db link, username, password, schema)

        DSPAM_Oracle_DBLINK="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 1|tail -n 1)"

        DSPAM_Oracle_USER="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 2|tail -n 1)"

        DSPAM_Oracle_PWD="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 3|tail -n 1)"

        DSPAM_Oracle_SCHEMA="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 4|tail -n 1)"

        # Run the Oracle purge script

        (/usr/bin/sqlplus -s ${DSPAM_Oracle_USER}/${DSPAM_Oracle_PWD} @${DSPAM_Oracle_PURGE_SQL}) 1>/dev/null 2>&1

        # Run the dspam_clean command

        run_dspam_clean

        exit 0

else

        run_dspam_clean

        exit $?

fi
```

As you see, I use an optimized purge script (mysql_purge-4.1-optimized.sql). This is the content of my optimized script:

```
# $Id: purge-4.1.sql,v 1.5 2005/07/14 13:50:10 jonz Exp $

# => http://www.solidcore.dk/blog/2006/02/optimizing_dspam.html

set @a=to_days(current_date());

START TRANSACTION;

delete from dspam_token_data

  where (innocent_hits*2) + spam_hits < 5

  and from_days(@a-60) > last_hit;

COMMIT;

START TRANSACTION;

delete from dspam_token_data

  where innocent_hits = 1 and spam_hits = 0

  and @a-to_days(last_hit) > 15;

COMMIT;

START TRANSACTION;

delete from dspam_token_data

  where innocent_hits = 0 and spam_hits = 1

  and @a-to_days(last_hit) > 15;

COMMIT;

START TRANSACTION;

delete from dspam_token_data

USING

  dspam_token_data LEFT JOIN dspam_preferences

  ON dspam_token_data.uid = dspam_preferences.uid

  AND dspam_preferences.preference = 'trainingMode'

  AND dspam_preferences.value in('TOE','TUM','NOTRAIN')

WHERE from_days(@a-90) > dspam_token_data.last_hit

AND dspam_preferences.uid IS NULL;

COMMIT;

START TRANSACTION;

delete from dspam_token_data

USING

  dspam_token_data LEFT JOIN dspam_preferences

  ON dspam_token_data.uid = dspam_preferences.uid

  AND dspam_preferences.preference = 'trainingMode'

  AND dspam_preferences.value = 'TUM'

WHERE from_days(@a-90) > dspam_token_data.last_hit

AND innocent_hits + spam_hits < 50

AND dspam_preferences.uid IS NOT NULL;

COMMIT;

START TRANSACTION;

delete from dspam_signature_data

  where from_days(@a-14) > created_on;

COMMIT;
```

You can read more about it here.
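
A side note on the wrapper script above: it packs the MySQL version into one integer (major * 65536 + minor * 256 + micro) and compares it with 262400, which is what 4.1.0 works out to. Below is a minimal sketch of that arithmetic as a stand-alone function. The name mysql_version_int is made up for illustration only, and awk is used instead of shell arithmetic so that a component such as "08" is not misread as octal.

```shell
# Turn a dotted version string into the integer the purge wrapper compares.
# Everything that is not a digit or a dot is dropped first, as in the script.
mysql_version_int() {
        printf '%s\n' "$1" | sed 's/[^0-9.]//g' |
                awk -F. '{ print $1 * 65536 + $2 * 256 + $3 }'
}

mysql_version_int "4.1.0"       # the threshold used by the script
mysql_version_int "4.0.27"      # below the threshold: old purge script
mysql_version_int "5.0.22-log"  # above the threshold: 4.1 purge script
```

Missing components simply count as zero, so "4.1" and "4.1.0" give the same number.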

If you need a script that deletes a user and all of his data from MySQL, you could use this one:

```
#!/bin/bash

[[ "$1" == "" ]] && echo "Missing username" && exit 1

_dspam_sysconfdir=$(dspam --version|sed -n "s:^.*\-\-sysconfdir\=\([^ ]*\).*:\1:gIp");

if [[ "${_dspam_sysconfdir}" == "" ]];

then

        echo "Error: Could not get DSPAM system config directory";

        exit 1

elif [[ ! -d "${_dspam_sysconfdir}" ]];

then

        echo "Error: DSPAM system config directory does not exist";

        exit 1

elif [[ ! -f "${_dspam_sysconfdir}/mysql.data" ]];

then

        echo "Error: DSPAM mysql.data file does not exist";

        exit 1

fi

_dspam_mysql_host="$(cat ${_dspam_sysconfdir}/mysql.data|head -n 1|tail -n 1)"

_dspam_mysql_port="$(cat ${_dspam_sysconfdir}/mysql.data|head -n 2|tail -n 1)"

_dspam_mysql_user="$(cat ${_dspam_sysconfdir}/mysql.data|head -n 3|tail -n 1)"

_dspam_mysql_password="$(cat ${_dspam_sysconfdir}/mysql.data|head -n 4|tail -n 1)"

_dspam_mysql_db="$(cat ${_dspam_sysconfdir}/mysql.data|head -n 5|tail -n 1)"

_dspam_user_uid=$(mysql --user="${_dspam_mysql_user}" --password="${_dspam_mysql_password}" --host="localhost" --batch --skip-column-names -e "USE ${_dspam_mysql_db};SELECT uid FROM dspam_virtual_uids WHERE 1 AND username='${1}';")

if [[ "${_dspam_user_uid}" == "" ]];

then

        echo "Error: Can not get UID for user ${1}";

        exit 1

fi

_dspam_delete_uid="USE ${_dspam_mysql_db};$(mysql --user="${_dspam_mysql_user}" --password="${_dspam_mysql_password}" --host="localhost" --batch --skip-column-names -e "USE ${_dspam_mysql_db};SHOW TABLES;" | sed "s:^\(.*\)$:DELETE FROM \1 WHERE uid='${_dspam_user_uid}';:g")";

echo "Executing:"

echo ${_dspam_delete_uid}

mysql --user="${_dspam_mysql_user}" --password="${_dspam_mysql_password}" --host="localhost" --batch --skip-column-names -e "${_dspam_delete_uid};"
```
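
Two small hardening ideas for the script above, as a sketch only. data_field and valid_dspam_username are hypothetical helper names, not part of DSPAM, and the set of characters allowed in a username is an assumption that you should adapt to your own naming scheme. The first helper reads line N of mysql.data in one step; the second refuses usernames that could break out of the quoted SQL string.

```shell
# Print line N of a file. Gives the same result as
# "cat file | head -n N | tail -n 1" as long as the file has at least N lines.
data_field() {
        sed -n "${2}p" "$1"
}

# Accept only names made of letters, digits and @ . _ + -
# (assumed character set; widen it if your usernames need more).
valid_dspam_username() {
        case "$1" in
                ''|*[!A-Za-z0-9@._+-]*) return 1 ;;
                *) return 0 ;;
        esac
}
```

In the delete script this would mean something like checking valid_dspam_username "$1" right after the "Missing username" test and exiting if it fails.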

I have written other scripts (for example for setting trainingMode for all users and other stuff like that). If you need more scripts, then let me know.

If you want, I could upload the current data from my training system (this is the system where I do my current training and where all the config files I posted come from). The dspam_token_data table is not that big (around 1'200'000 tokens), and if I dump the data with mysqldump and compress it with bzip2, the result is only 13 MB.

But I would not suggest that you use that data. It is far better to use your own data, since mine has a lot of stuff which is probably not needed in your environment (I have a lot of German mails in this data). And one last piece of advice: DON'T TRUST ANYONE when it comes to anti-spam. Use your own data and your own training. That's the best thing to do! DSPAM uses statistical data, and if you use data from someone else (for example from me), you are spoiling your accuracy.

If you need spam corpus data, then look here:

- Spam Archive
- Foxmail Set A (homepage)
- Foxmail Set B (homepage)
- Foxmail Set C (homepage)
- Foxmail Set D (homepage)
- Foxmail Set E (homepage)
- Foxmail Set F (homepage)

If you need spam and ham corpus data, then look here:

- Ling-spam
- SpamAssassin public mail corpus
- Synthetic (Annexia/Xpert) Corpus
- TREC 2005 Public Spam Corpus (this is mostly the same as the Enron Email Dataset, but machine sorted into spam/ham and therefore not that accurate)
- Anti-Spam-SMTP-Proxy (ASSP) corpus data

If you need unsorted mails, then look here (this data is unsorted! You need to classify it yourself into spam/ham):

- Enron Email Dataset
- The 20 Newsgroups data set

BIG warning about the public corpora: they contain a lot of malware (trojans, viruses, etc.). Clean them with antivirus software!

 *petrjanda wrote:*   

> Thanks!

 No problem  :Smile: 

cheers

SteveB

Last edited by steveb on Sat May 13, 2006 10:55 am; edited 1 time in total

----------

## petrjanda

 *Quote:*   

> 
> 
> Okay... On my dspam.conf you see, that I enabled "RBLInoculate" and I enabled "Lookup". This is one way to keep up the global user with new spam data. But you need to point your spam traps to be delivered to your "global" user.
> 
> 

 

And how do i go about pointing it to the global user? Where is this done?

Thanks so much for all this!!!

----------

## steveb

 *petrjanda wrote:*   

> And how do i go about pointing it to the global user? Where is this done?

 What MTA do you use?

 *petrjanda wrote:*   

> Thanks so much for all this!!!

 No problem

cheers

SteveB

----------

## magic919

Great post Stevee.  I'm sure some of this will help me with DSPAM too.

----------

## steveb

 *magic919 wrote:*   

> Great post Stevee.  I'm sure some of this will help me with DSPAM too.

 Thanks! I am so happy with DSPAM that I cannot understand why so many people still use SpamAssassin. A heuristic approach (like the one in SA) is in no way as good as the statistical approach.

I have read so much about classifying mails into spam/ham that I know such a cocktail of heuristics and handwritten rules as the one in SA will never be flexible enough or accurate enough to beat something like DSPAM, CRM114 or other statistical filters.

In my experience, the Markovian algorithm (used in CRM114, DSPAM and some other filters) is the absolute fastest at learning and produces very accurate results. The Markovian algorithm in DSPAM (you need to use the hash driver) is good, but it does not offer as much flexibility as the other drivers in DSPAM. I am very sure that this will change in future releases of DSPAM. When that happens, I will probably switch everything in DSPAM to the hash driver. But for now, I use the other algorithms.

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   And how do i go about pointing it to the global user? Where is this done? What MTA do you use?
> 
>  *petrjanda wrote:*   Thanks so much for all this!!! No problem
> 
> cheers
> ...

 

Postfix.

----------

## steveb

 *petrjanda wrote:*   

> Postfix.

 Do you have Procmail on that server? If so, then you could just pass the message to procmail. Something like this:

```
# relearn misclassified spam messages

:0

* ^X-DSPAM-Result: Innocent

{

   :0

   | /usr/bin/dspam --user $USER --class=spam --source=error

   

   :0

   /dev/null

}

# delete correctly classified spam

:0

* ^X-DSPAM-Result: Spam

{

   :0

   /dev/null

}

# delete everything else

:0

/dev/null
```

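Or you could pipe every mail that reaches your spam trap directly to DSPAM:

In /etc/mail/aliases add something like this (an alias entry has the form "name: target", so use the local part of the spam trap address, and run newaliases afterwards):

```
spam_user: "| /usr/bin/dspam --user global --class=spam --source=corpus"
```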

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Postfix. Do you have Procmail on that server? If so, then you could just pass the message to procmail. Something like this:
> 
> ```
> # relearn missclassified spam messages
> 
> ...

 

Once again, Thanks!

----------

## petrjanda

Steve,

I've actually got another question. What if I wanted some users not to be merged?

For example:

global:merged:* would merge global's tokens into all 2000 users. But what if I don't want user1 and user2 to be merged, and want them to start with an empty set of tokens?

----------

## steveb

 *petrjanda wrote:*   

> Steve,
> 
> Ive actually got another question. What if I wanted some users not to be merged? 
> 
> For example:
> ...

 Then you have two possibilities:

Add all the users (1998 of them) except user1 and user2 (you can use wildcards to make the list shorter):

```
global:merged:user3,user4,user5,user6,user7,.....,a*,b*,c*,d*,e*,userx
```

Use

```
global:merged:*
```

but for user1 and user2 you add the preference to ignore any group membership:

```
dspam_admin add preference user1 ignoreGroups on
```

and

```
dspam_admin add preference user2 ignoreGroups on
```

BTW: Putting them in a merged group does not mean that their tokens will be merged with your "global" user in the database. The merge happens only at run time; in the database they still have no tokens from your "global" user. Do you understand that?

Best would be to use the merged group with TOE as the training mode. That way only errors are trained, which keeps the database very small while still providing good accuracy.
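
Both variants can be generated rather than typed by hand. The following sketch only prints the group line and the dspam_admin commands already shown in this post; merged_group_line and optout_commands are made-up helper names, and nothing is written to the group file or executed.

```shell
# Print a merged-group line: the first argument is the shared user,
# all further arguments are the members.
merged_group_line() {
        local group="$1"
        shift
        local members
        # "$*" joins the remaining arguments with the first character of IFS.
        members="$(IFS=,; printf '%s' "$*")"
        printf '%s:merged:%s\n' "${group}" "${members}"
}

# Print one opt-out command per user that should ignore group membership.
optout_commands() {
        local user
        for user in "$@"; do
                printf 'dspam_admin add preference %s ignoreGroups on\n' "${user}"
        done
}

merged_group_line global user1 user2 user3
optout_commands user1 user2
```

Redirect the first output into your group file only after checking it, and run the printed dspam_admin lines yourself.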

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Steve,
> 
> Ive actually got another question. What if I wanted some users not to be merged? 
> 
> For example:
> ...

 

Yep, I understand that. Thanks!

----------

## steveb

 *petrjanda wrote:*   

> Yep, I understand that. Thanks!

 Any other question about DSPAM? I feel in the mood to answer more stuff  :Wink: 

----------

## steveb

FYI: DSPAM 3.6.6 is out

```
3.6.6 is a maintenance release

MAINT: Phased out deprecated Berkeley DB drivers

MAINT: Phased out legacy tools (dspam_corpus, dspam_genaliases)

BUGFIX: When using logfile, write errors result in segfault

BUGFIX: Compiler warnings with sqlite_drv and sqlite3_drv

BUGFIX: MySQLUIDInSignature causes segfault on retrain

BUGFIX: trainPristine preference "off" does not override default
```

Works so far without any problem over here (even with multiple storage drivers (currently I have mysql_drv and hash_drv)):

```
nautilus / # dspam --version

DSPAM Anti-Spam Suite 3.6.6 (agent/library)

Copyright (c) 2002-2006 Jonathan A. Zdziarski

http://dspam.nuclearelephant.com

DSPAM may be copied only under the terms of the GNU General Public License,

a copy of which can be found with the DSPAM distribution kit.

Configuration parameters: --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --enable-long-username --with-delivery-agent=/usr/bin/procmail --enable-large-scale --with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql --enable-preferences-extension --enable-daemon --enable-virtual-users --with-storage-driver=mysql_drv,hash_drv --build=i686-pc-linux-gnu

nautilus / #
```

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Yep, I understand that. Thanks! Any other question about DSPAM? I feel in the mood to answer more stuff 

 

Ok, then, Why do some people find that DSPAM 3.6 is less accurate than 3.4 with the same amount of corpused data?

----------

## steveb

 *petrjanda wrote:*   

> Ok, then, Why do some people find that DSPAM 3.6 is less accurate than 3.4 with the same amount of corpused data?

 Probably because they don't read the README and keep the defaults they had with 3.4.x. Some algorithms are deprecated in 3.6.x.

Generally I would say that 3.6.x with the hash driver (Markovian algorithm) is the most accurate (but also the most inflexible) algorithm in DSPAM. 3.4.x does not have such an accurate algorithm.

Besides that, 3.6.x has some nice features to raise accuracy (like naive, noise, etc.).

And another thing: the 3.6 series discourages you from using corpus feeding. It is better to use dspam_train than to feed spam/ham with dspam_corpus.

cheers

SteveB

----------

## petrjanda

When I first installed DSPAM 3.6, I used a DSPAM 3.6 conf file and it was still very inaccurate. I then installed DSPAM 3.4 and used its default conf file, and it was clearly much more accurate. That's what I can't explain.

----------

## steveb

 *petrjanda wrote:*   

> Ok, then, Why do some people find that DSPAM 3.6 is less accurate than 3.4 with the same amount of corpused data?

 

You can see in my posts above that I have a very high accuracy for my user "globaluser". For this user I deliberately turned off whitelisting, processor bias, the training buffer and all that stuff, and I still manage to get 99.997% accuracy (3 errors in 127'507) on the spamarchive.org submit spam and 100% accuracy (0 errors in 11'002) on the spamarchive.org submitautomated spam.

I would not call those numbers bad accuracy. What do you think?
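
For reference, the percentages follow directly from the error counts. A minimal sketch of the arithmetic (accuracy_pct is just an illustrative name, and the counts are passed without the thousands separators):

```shell
# accuracy in percent = (1 - errors / total) * 100, printed with four decimals
accuracy_pct() {
        awk -v errors="$1" -v total="$2" \
                'BEGIN { printf "%.4f\n", (1 - errors / total) * 100 }'
}

accuracy_pct 3 127507   # prints 99.9976
accuracy_pct 0 11002    # prints 100.0000
```

The first result agrees with the 99.997% quoted here if that figure was truncated rather than rounded.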

cheers

SteveB

----------

## steveb

 *petrjanda wrote:*   

> When I first installed DSPAM 3.6, i used a DSPAM 3.6 conf file and it was still very innacurate. I then installed DSPAM 3.4 and used its default conffile and it was clearly much more accurate. Thats what i cant explain.

 Can you post your dspam.conf file and say how you trained? Did you use dspam_train (which was not available in 3.4.x, although it was available as a Perl script from the DSPAM mailing list, or you could check it out from CVS and still use it for 3.6.x)?

cheers

SteveB

----------

## steveb

BTW: Can you run the command I mentioned above on the spamarchive.org files? What accuracy do you get with your 3.4.x installation?

----------

## steveb

 *petrjanda wrote:*   

> Ok, then, Why do some people find that DSPAM 3.6 is less accurate than 3.4 with the same amount of corpused data?

 When you write about "some people", do you mean Gentoo users or other users? I ask because I don't use the Gentoo ebuild for DSPAM. I have my own ebuild for DSPAM (don't ask! It's a long story... I am just not that happy with the ebuild, and after complaining on Bugzilla for a while, I started to maintain my own). Maybe this has an influence on the accuracy as well?

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Ok, then, Why do some people find that DSPAM 3.6 is less accurate than 3.4 with the same amount of corpused data? 
> 
> You see in my above posts, that I have a very high accurancy for my user "globaluser". And for this user I extra turned off whitelisting, processor bias, training buffer and all that stuff. And I still manage to get 99.997% (3 errors in 127'507) accurancy on the spamarchive.org submit Spam and 100% (0 errors in 11'002) accurancy on the submitautomated spamarchive.org Spam.
> 
> I would not call those numbers a bad accurancy. What do you think?
> ...

 

They are not bad at all; however, with the amount of training that user has had, it's no surprise. I'm talking about 3000 ham and 3000 spam kinda stuff. I used the SA corpus. Simply doing dspam_train on 3.6.4 yielded very bad results; downgrading to 3.4.9 and using dspam_corpus yielded much better results (using the same corpus).

What people am I talking about? A bunch on the dspam mailing list.

I don't have any of those old configuration files anymore, however.

----------

## steveb

 *petrjanda wrote:*   

> They are not bad at all, however, with the amount of training that user has had its no surprise. Im talking about 3000 ham and 3000 spam kinda stuff. I used the SA corpus. Simply doing dspam_train on 3.6.4 yielded very bad results, downgrading to 3.4.9 and using dspam_corpus yielded much better results (using the same corpus).

 You see? That's the point. You used dspam_corpus on 3.4.x and dspam_train on 3.6.x.

Those are completely different tools. dspam_corpus FORCES all tokens into the database, while dspam_train is more intelligent (and better in the long run) than dspam_corpus.

That's the reason Jonathan has removed dspam_corpus from 3.6.6.

Anyway... After doing your training with the SA corpus, you should not look at your "global" user and judge the accuracy by its TP, TN, FP, FN numbers. They are irrelevant. What counts is the accuracy after you use "global" as a merged group for all the other users.

One other thing: How did you train your "global" user? With TEFT or TOE or TUM?

I would strongly suggest training "global" with TEFT and setting all other users to TOE. You could set TOE in your dspam.conf and then set trainingMode to TEFT in the preferences for "global".

BTW: If you want, then I can send you my database in a compressed format. Or I could upload it on one of my servers and send you the link. This will surely improve your filtering accuracy. You could merge my data from my "globaluser" user with your "global" user.

BTW2: The SA corpus is okay, but it's not the best corpus available. If you only train with the SA corpus, your users will surely not get high accuracy. It would be better to train with more data or to switch to the hash driver. This driver is extremely accurate and produces a gazillion tokens. There you will probably only need 500 ham and spam messages to get very high accuracy. But with graham or burton or any of the other algorithms, you will need more training data to get better accuracy.

BTW3: Did you enable chained tokens ("chain") and Bayesian Noise Reduction in your 3.6 series DSPAM? This improves accuracy very much.

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   They are not bad at all, however, with the amount of training that user has had its no surprise. Im talking about 3000 ham and 3000 spam kinda stuff. I used the SA corpus. Simply doing dspam_train on 3.6.4 yielded very bad results, downgrading to 3.4.9 and using dspam_corpus yielded much better results (using the same corpus). You see? That's the point. You used dspam_corpus on 3.4.x and dspam_train on 3.6.x.
> 
> Those are completely different tools. dspam_corpus FORCES all tokens into the database, while dspam_train is more intelligent (and better in the long run) than dspam_corpus.

 

I see.

 *Quote:*   

> 
> 
> One other thing: How did you train your "global" user? With TEFT or TOE or TUM?
> 
> 

 

Im pretty sure it was TEFT.

 *Quote:*   

> 
> 
> BTW: If you want, then I can send you my database in a compressed format. Or I could upload it on one of my servers and send you the link. This will sure enhance your filtering accurancy. You could merge my data from my "globaluser" user with your "global" user.
> 
> 

 

If you could put your spam/ham corpuses gzipped on an ftp server so i could fetch them it would be really great!

 *Quote:*   

> 
> 
> BTW3: Did you enable the chained tokens ("chain") and did you enable Bayesian Noise Reduction in your 3.6 series of DSPAM? This enhances accurancy very very much.
> 
> 

 

I enabled chains back then, not sure about BNR. However now i do have both enabled for sure.

Thanks

----------

## steveb

 *petrjanda wrote:*   

>  *steveb wrote:*    *petrjanda wrote:*   They are not bad at all, however, with the amount of training that user has had its no surprise. Im talking about 3000 ham and 3000 spam kinda stuff. I used the SA corpus. Simply doing dspam_train on 3.6.4 yielded very bad results, downgrading to 3.4.9 and using dspam_corpus yielded much better results (using the same corpus). You see? That's the point. You used dspam_corpus on 3.4.x and dspam_train on 3.6.x.
> 
> That are complete differend tools. dspam_corpus does FORCE all tokens to be inside the database, while dspam_train is more intelligent (and better in the long run) than dspam_corpus.
> 
>   *petrjanda wrote:*   I see. 

 

 *Quote:*   

> One other thing: How did you train your "global" user? With TEFT or TOE or TUM?
> 
> 

  *petrjanda wrote:*   

> Im pretty sure it was TEFT.

 Okay.

 *Quote:*   

> BTW: If you want, then I can send you my database in a compressed format. Or I could upload it on one of my servers and send you the link. This will sure enhance your filtering accurancy. You could merge my data from my "globaluser" user with your "global" user.
> 
> 

  *petrjanda wrote:*   

> If you could put your spam/ham corpuses gzipped on an ftp server so i could fetch them it would be really great!

 Well... I can't give you my ham corpus, because I asked all my users for permission to use their data for training only. I did not ask them whether I may share it with anyone else, and I will not share the corpus as long as I have not asked them for that permission. The spam corpus is another story; I have no problem sharing that. But I already posted a bunch of links to spam/ham and pure spam corpora, so I don't see a big benefit in using my spam data.

What I was suggesting was to provide you with a dump of my MySQL data. The data would be tokenized; you could just load it into your MySQL table and you would then have exactly the data I have here. It would save you months and months of training.

What do you think? Would that be a benefit for you?

But be warned! My data contains a lot of different languages. Currently I have German, French, Italian, English and Russian ham data, a lot of English spam and some German, French, Italian, Russian and Asian spam.

 *Quote:*   

> BTW3: Did you enable the chained tokens ("chain") and did you enable Bayesian Noise Reduction in your 3.6 series of DSPAM? This enhances accurancy very very much.

  *petrjanda wrote:*   

> I enabled chains back then, not sure about BNR. However now i do have both enabled for sure.

 Great. But now you would need to retrain, since the new features only develop their full potential once your DSPAM storage holds tokens which were added with those features enabled.

 *petrjanda wrote:*   

> Thanks

 No problem

----------

## petrjanda

Ok, let's say you send me the MySQL data, how do I put the tokens into my MySQL db?

Another question,

If people forward me their spam into a trap, how does DSPAM treat it? I mean, if you just forward all your spam to spam@xxx, it will have the "fwd" in the subject line, and it will have more headers because it was forwarded. Won't that confuse DSPAM? So far I've been trying to get raw spam that hasn't been modified.

----------

## steveb

 *petrjanda wrote:*   

> Ok, lets say, you send me the mysql data, how do i put the tokens into my mysql db?

 You could create a new database on your MySQL server (let's call that db "steve") and then import my database dump into that database:

```
mysql --user=root --host=localhost -p steve < dspam.sql
```

After that, look up the uid of your "global" user in your dspam database, then change the uid in the steve.dspam_token_data table from 1 (my "globaluser" has uid 1) to the uid of your "global" user (let's say it has uid 4):

```
UPDATE `steve`.`dspam_token_data` SET `uid` = '4' WHERE `uid` = 1;
```

And then you copy the content of that table to your original DSPAM table:

```
INSERT INTO `dspam`.`dspam_token_data` SELECT * FROM `steve`.`dspam_token_data`;
```

Then you can drop the database "steve":

```
DROP DATABASE `steve`;
```

Another way of doing it would be to add another user (let's call him "dummy") to your DSPAM, change the uid value in my MySQL dump to the uid of your "dummy" user, and import that data into your DSPAM MySQL table. Then use "dspam_merge" to merge "dummy" and "global".
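
The first route (re-key, copy, drop) can be collected into one helper so that the uid and database names are not typed three times. This is only a sketch: remap_import_sql is a hypothetical name, the function prints the statements instead of running them, and it assumes the dump has already been imported into the temporary database. If your own "global" user already holds tokens, the plain INSERT may collide with existing rows, in which case the dspam_merge route is probably the safer one.

```shell
# Print the SQL for: re-key the imported tokens, copy them into the live
# database, drop the temporary database.
# Arguments: source db, target db, uid used in the dump, uid of your global user.
remap_import_sql() {
        local src_db="$1" dst_db="$2" src_uid="$3" dst_uid="$4"
        local uid

        # Only plain numbers are accepted as uids.
        for uid in "${src_uid}" "${dst_uid}"; do
                case "${uid}" in
                        ''|*[!0-9]*) echo "uids must be numeric" >&2; return 1 ;;
                esac
        done

        printf 'UPDATE `%s`.`dspam_token_data` SET `uid` = %s WHERE `uid` = %s;\n' \
                "${src_db}" "${dst_uid}" "${src_uid}"
        printf 'INSERT INTO `%s`.`dspam_token_data` SELECT * FROM `%s`.`dspam_token_data`;\n' \
                "${dst_db}" "${src_db}"
        printf 'DROP DATABASE `%s`;\n' "${src_db}"
}

# Review the statements first; pipe them into mysql only when they look right:
#   remap_import_sql steve dspam 1 4 | mysql --user=root --host=localhost -p
remap_import_sql steve dspam 1 4
```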

 *petrjanda wrote:*   

> Another question,
> 
> If people forward me their spam into a trap, how does DSPAM treat it? I mean, if you just forward all your spam to spam@xxx, it will have the "fwd" in the subject line, it will have more headers because it will be forwarded. wont that confuse dspam? So far ive been trying to get raw spam that hasnt been modified.

 The additional headers will not confuse DSPAM much. You could still exclude them with the "IgnoreHeader" directive in dspam.conf.

About the links I posted for the various ham/spam corpora: most (if not all) of them are almost clean and in raw format. However, if you need to clean up mails which have DSPAM headers, then maybe my clean-up script will be of help to you:

```
for foo in *;do sed -i "/^X\-DSPAM\-Factors\:/,/^X\-DSPAM\-/d;s:[\!]\{0,1\}DSPAM\:[0-9]\{0,6\}[,]\{0,1\}[0-9a-z]*[\!]\{0,1\}::gI;/^[ >]*X\-DSPAM\-[a-zA-Z]*:\ .*$/d" "${foo}";done
```
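
If the one-liner is hard to follow, here is a reduced sketch of its main idea as a filter. It is not a replacement: it only drops X-DSPAM-* headers (including folded continuation lines such as those of X-DSPAM-Factors) from the header section and leaves the body alone, so the DSPAM signature tags and quoted header lines that the sed version also removes stay in place. strip_dspam_headers is a name chosen for illustration.

```shell
# Read one message on stdin, write it to stdout without X-DSPAM-* headers.
strip_dspam_headers() {
        awk '
                in_body                { print; next }                # body: pass through untouched
                /^$/                   { in_body = 1; print; next }   # first empty line ends the headers
                /^X-DSPAM-[A-Za-z-]+:/ { skip = 1; next }             # drop the header line itself
                skip && /^[ \t]/       { next }                       # ...and its folded continuation lines
                                       { skip = 0; print }
        '
}

# Example: write cleaned copies instead of editing the corpus in place.
#   for foo in *; do strip_dspam_headers < "${foo}" > "../cleaned/${foo}"; done
```

The ../cleaned directory in the usage comment is an assumed location; any directory outside the corpus will do.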

cheers

SteveB

----------

## petrjanda

I think it'll be best if I create my own corpuses; however, send me the compressed MySQL data too, I might use it if everything else fails. Which of those websites would you say have the best ham/spam corpuses available? I work for an ISP, and a lot of the staff's emails are business mails which, as has already happened, confuse DSPAM into thinking they are spam (offering services etc., which unfortunately tends to be the content of a lot of spam).

----------

## steveb

 *petrjanda wrote:*   

> I think it'll be best if I create my own corpuses, however send me the compressed mysql data too, i might use it if everything else fails. However, which of those websites would you say have the best ham/spam corpuses available? I work for an ISP and lot of the staff's emails are business mails which as it already happened confuse dspam to think they are spam. (like offering services etc which unfortunately tends to be a content of lot of spam)

 Okay... the SA (SpamAssassin) corpus seems to be the most accurate (hand-selected, every message).

If you need a lot of ham business messages, then look in the Enron Mail Data corpus or in the TREC 05 corpus. They have a lot of business-related stuff there. TREC 05 is machine-sorted, which means it is not error-free. John Graham-Cumming has taken action to make TREC 05 more accurate and has launched the website SpamOrHam. As he writes in his newsletter #34, the TREC 05 corpus probably has around 10% misclassifications:

 *jgc's spam and anti-spam newsletter #34 wrote:*   

> SpamOrHam.org is going strong, but not strongly enough.  Please blog
> 
> and pass on the URL http://www.spamorham.org as much as you can.
> 
> What's especially interesting is that of the tens of thousands of
> ...

 

The ASSP ham/spam corpus also seems to be very good.

The others are not that bad either, but they are full of viruses and other malware. I would advise you to clean the corpus with an antivirus first. Or, if you run DSPAM with ClamAV enabled, leave the corpus as it is and process everything with DSPAM.

cheers

SteveB

----------

## steveb

 *petrjanda wrote:*   

> I work for an ISP and lot of the staff's emails are business mails which as it already happened confuse dspam to think they are spam. (like offering services etc which unfortunately tends to be a content of lot of spam)

 One way of collecting (mostly) good ham messages is to use the mail of all the users on the server (but ask them first whether you are allowed to!). The easy way is to assume that the users on the server don't send spam, so (mostly) everything in their sent folder (if they use IMAP) should be ham. I alone have 11341 messages in my sent folder, and I archive and clean my sent folder every one or two years.

Another good way of getting ham messages is to download messages from newsgroups (I normally don't advise doing that, but it is a perfectly legitimate way of getting a large quantity of messages). I once used about 10'000 messages from a bunch of Swiss and German newsgroups (ch.admin, ch.soc.law, ch.soc.politics, ch.talk and de.etc.sprache.deutsch). Some newsgroups are full of spam, but some are mostly clean. The ones dealing with languages (for example de.etc.sprache.deutsch) are very diverse and have a lot of messages. Many newsgroups and lists are too focused on one topic: the kernel mailing list, for example, is far too much about the kernel, and learning that as ham is not the best thing to do. de.etc.sprache.deutsch, on the other hand, deals with the German language, and you will find a lot of messages about everything there. It is focused on the language, but the language itself is so diverse that using those messages for ham learning is not a big problem.

In German we have an expression: Viele Wege führen nach Rom.

This means: Many roads lead to Rome.

I am writing this because I want to say that there are many ways to reach your goal. You can collect ham/spam messages, you can download ham/spam corpora, you can download mails from newsgroups, etc.

You can even process regular documents (yes! This is possible with DSPAM):

```
# DataSource: If you are using any type of data source that does not include
# email-like headers (such as documents), uncomment the line below. This
# will cause the entire input to be treated like a message "body"
#
#DataSource      document
```

DSPAM is just a toolset. You are the one who makes it work better; the tool itself is not responsible for that. You bring the input, and DSPAM acts on that input and produces a result. If you put too much garbage into DSPAM, don't expect perfect output. If you bring good input into DSPAM, you can expect good output as well. It's all in your hands.  :Smile: 

cheers

SteveB

----------

## petrjanda

Steve,

I'm trying to run dspam_train, but for some really weird reason I get "permission denied" when the training script tries to create directories in the DSPAM directory /var/dspam/data/local. Basically, /var/dspam has dspam:dspam ownership; the train script creates folders and files owned by root:www in the working directory and then throws an error when creating global's directory in /var/dspam/data/local. Any clues?

----------

## steveb

 *petrjanda wrote:*   

> Steve,
> 
> Im trying to run dspam_train, but for some really weird reason i get permission denied after the training script tries to create directories in the dspam directory in /var/dspam/data/local. Basically /var/dspam has dspam:dspam ownership, then the train script creates a root:www folders and files in the working directory, and throws an error on creating the global's directory in /var/dspam/data/local. Any clues?

 Can you post the output of:

```
dspam --version
```

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Steve,
> 
> Im trying to run dspam_train, but for some really weird reason i get permission denied after the training script tries to create directories in the dspam directory in /var/dspam/data/local. Basically /var/dspam has dspam:dspam ownership, then the train script creates a root:www folders and files in the working directory, and throws an error on creating the global's directory in /var/dspam/data/local. Any clues? Can you post the output of:
> 
> ```
> ...

 

Fixed it.

I needed to mark the user "nobody" as trusted.

Anyway, I've collected around 500 spam and 550 ham from our staff so far. I've run the first 100 spam/100 ham through dspam_train, and these are my results.

```
h5n1# dspam_stats -H global
global:
                TP True Positives:             89
                TN True Negatives:             94
                FP False Positives:             9
                FN False Negatives:             4
                SC Spam Corpusfed:              0
                NC Nonspam Corpusfed:           0
                TL Training Left:            2397
                SHR Spam Hit Rate          95.70%
                HSR Ham Strike Rate:        8.74%
                OCA Overall Accuracy:      93.37%
```
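As a side note for readers, the three percentages at the bottom of this output can be reproduced from the four counters. The formulas below are inferred from the numbers in this post, not taken from the DSPAM source, so treat them as an assumption that happens to match; `dspam_rates` is a made-up helper name.

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (SHR, HSR, OCA) in percent from the four dspam_stats counters."""
    shr = 100.0 * tp / (tp + fn)                   # spam caught / all spam
    hsr = 100.0 * fp / (tn + fp)                   # ham flagged as spam / all ham
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # correct / everything
    return shr, hsr, oca


shr, hsr, oca = dspam_rates(tp=89, tn=94, fp=9, fn=4)
print(f"SHR {shr:.2f}%  HSR {hsr:.2f}%  OCA {oca:.2f}%")
# prints: SHR 95.70%  HSR 8.74%  OCA 93.37%
```

The same formulas also reproduce the other dspam_stats outputs posted in this thread.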

The preferences for global are very similar to yours.

```
h5n1# dspam_admin list preference global
enableBNR=on
enableWhitelist=off
ignoreGroups=off
localStore=global
makeCorpus=off
processorBias=off
showFactors=off
signatureLocation=header
spamAction=deliver
spamSubject=
statisticalSedation=0
trainingMode=TEFT
trainPristine=off
whitelistThreshold=9999999
```

----------

## steveb

 *petrjanda wrote:*   

> Fixed it.
> 
> I needed to enable trusting nobody.

  :Smile: 

 *petrjanda wrote:*   

> Anyway, ive collected around 500 spam and 550 ham from our staff so far. Ive ran the first 100 spam/100 ham through dspam_train, and these are my results.
> 
> ```
> 
> h5n1# dspam_stats -H global
> ...

 Well.... I would say that this is a good initial result. What do you say?

 *petrjanda wrote:*   

> Preference for global are very similar to yours.
> 
> ```
> 
> h5n1# dspam_admin list preference global
> ...

 They are almost the same.  :Wink: 

Keep some things in mind:

- You turned off "statisticalSedation":
  - This changes the way DSPAM looks at spam/ham messages
  - In production you will probably not have that turned off
- You turned off "processorBias":
  - This influences DSPAM very much when classifying (as long as you don't have a TL of zero)
  - In production you will probably not have that turned off
- You have turned off "enableWhitelist":
  - This also has an influence on DSPAM
  - In production you will probably have whitelisting turned on
- You use a very high "whitelistThreshold":
  - This influences DSPAM a lot when classifying
  - In production you will probably have that number around 10

All the preferences for your "global" user are good for training and I strongly recommend them. They make your training depend heavily on the pure statistical algorithm and not on tweaks in DSPAM. The result will not be that perfect (in terms of the percentage) for your "global" user BUT it will give much higher accuracy for all the users who share those tokens (in a group).

So! Train your "global" user with those settings, but don't be disappointed if the accuracy is not 99.9% for the "global" user. Remember: you are not trying to get the "global" user to 99.9x%! You are trying to get a good base for all the others who use the tokens from your "global" user. That's a very, very important point.

cheers

SteveB

----------

## steveb

What version of DSPAM are you using? Should I send you my 3.6.6 ebuild?

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

> What version of DSPAM are you using? Should I send you my 3.6.6 ebuild?
> 
> cheers
> 
> SteveB

 

Dspam 3.6.5

I use pkgsrc  :Razz:  because I'm not actually doing this on a Gentoo box at the moment. However, I'm planning to set up DSPAM on a Gentoo box at home.

----------

## steveb

 *petrjanda wrote:*   

> Dspam 3.6.5

 Get 3.6.6 NOW!! 3.6.5 has some nasty bugs!

 *petrjanda wrote:*   

> I use pkgsrc  because Im not actually doing this on a gentoo box at the moment, however im planning to set up dspam on a gentoo box at home.

 Okay.... I am just now doing some experiments with DSPAM 3.8.0 (this is the current CVS version):

```
DSPAM Anti-Spam Suite CVS (agent/library)

Copyright (c) 2002-2006 Jonathan A. Zdziarski
http://dspam.nuclearelephant.com

DSPAM may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the DSPAM distribution kit.

Configuration parameters: --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --enable-long-username --with-delivery-agent=/usr/sbin/sendmail --enable-large-scale --with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql --enable-preferences-extension --enable-daemon --enable-virtual-users --with-storage-driver=mysql_drv,hash_drv --build=i686-pc-linux-gnu
```

This version has a modular tokenizer! Finally  :Wink: 

cheers

SteveB

----------

## steveb

 *petrjanda wrote:*   

> I use pkgsrc  because Im not actually doing this on a gentoo box at the moment

 Okay! No help any more from me! You are not using Gentoo  :Twisted Evil:   :Twisted Evil:   :Twisted Evil:   :Twisted Evil:   :Twisted Evil: 

Just joking....  :Smile: 

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Dspam 3.6.5 Get NOW 3.6.6!! 3.6.5 has some nasty bugs!
> 
>  *petrjanda wrote:*   I use pkgsrc  because Im not actually doing this on a gentoo box at the moment, however im planning to set up dspam on a gentoo box at home. Okay.... I am just now doing some experiments with DSPAM 3.8.0 (this is the current CVS version):
> 
> ```
> ...

 

Ok, I will get 3.6.6, but I'm briefly testing 3.6.5 now.

I've trained DSPAM with 600 spam and 650 ham mails. Do you think I should be getting some spam filtered now? Is it normal for DSPAM to report a confidence of 0.99 for a (ham) email I sent through just to test it?

----------

## steveb

 *petrjanda wrote:*   

> Ok, I will get 3.6.6 but im testing 3.6.5 now briefly.

 Okay...

 *petrjanda wrote:*   

> ive trained dspam with 600 spam and 650 ham mails. do you think i should be getting some spam filtered now? Is it normal for dspam to have 0.99 result confidence on an email(ham) i sent through just to test it?

 0.99 is 99% confidence about the classification. What was the classification for that ham email? Was it innocent? If so, then I would say, that this is okay. Go ahead with training. Try to get at least the TL (Training Left) value to zero.

cheers

Steve

BTW: I archived my data and started today with DSPAM 3.8.0 (CVS). I am trying the new osb tokenizer. I cannot tell much yet, since I just started. These are my current stats:

```
nautilus / # dspam_stats -H globaluser
globaluser:
                TP True Positives:            519
                TN True Negatives:            386
                FP False Positives:           111
                FN False Negatives:            12
                SC Spam Corpusfed:              0
                NC Nonspam Corpusfed:          49
                TL Training Left:            1954
                SHR Spam Hit Rate          97.74%
                HSR Ham Strike Rate:       22.33%
                OCA Overall Accuracy:      88.04%
nautilus / #
```

I have a lot of mails to train. I will keep running "dspam_clean -s0 -p0 globaluser" after processing each set (normally a set has about 500 ham/spam messages). In a few days I can post more info. Right now DSPAM needs time to process the 500'000 mails I have in my corpus.

This is what Jonathan gets on his test run against the public SpamAssassin corpus:

```
Various DSPAM Configurations Tested against SpamAssassin Corpus
All variables not mentioned are stock

Prefixes:               Flags:
  T - Tokenizer Mode      B - Use Bias (toward innocent)
  P - Probability Rule    N - No Bias
  C - Combination Rule

Tosb,Pmarkov,Cgraham+burton,teft,N   TP:  1730 TN:  4147 FP:     3 FN:   166
Tosb,Pbcr,Cgraham+burton,teft,N      TP:  1795 TN:  4143 FP:     7 FN:   101
Tchain,Pbcr,Cgraham+burton,teft,B    TP:  1693 TN:  4148 FP:     2 FN:   203
Tosb,Pmarkov,Cgraham+burton,tum,N    TP:  1730 TN:  4147 FP:     3 FN:   166
Tsbph,Pmarkov,Cgraham+burton,tum,N   TP:  1740 TN:  4133 FP:    17 FN:   156
Tword,Pbcr,Cgraham+burton,tum,B      TP:  1605 TN:  4147 FP:     3 FN:   291
Tchain,Pbcr,Cgraham+burton,tum,B     TP:  1735 TN:  4146 FP:     4 FN:   161
Tsbph,Pbcr,Cgraham+burton,tum,B      TP:  1757 TN:  4146 FP:     4 FN:   139
Tosb,Pbcr,Cgraham+burton,tum,B       TP:  1677 TN:  4147 FP:     3 FN:   219
Tosb,Pbcr,Cgraham+burton,tum,N       TP:  1808 TN:  4142 FP:     8 FN:    88
Tosb,Pbcr,Cburton,tum,B              TP:  1636 TN:  4148 FP:     2 FN:   260
Tchain,Pbcr,Cburton,tum,N            TP:  1804 TN:  4141 FP:     9 FN:    92
Tosb,Pbcr,Cburton,tum,N              TP:  1777 TN:  4143 FP:     7 FN:   119
Tword,Pbcr,Cgraham,tum,B             TP:  1592 TN:  4147 FP:     3 FN:   304
Tchain,Pbcr,Cgraham,tum,B            TP:  1669 TN:  4148 FP:     2 FN:   227
Tsbph,Pbcr,Cgraham,tum,B             TP:  1736 TN:  4147 FP:     3 FN:   160
Tosb,Pbcr,Cgraham,tum,B              TP:  1650 TN:  4148 FP:     2 FN:   246
Tword,Pbcr,Cgraham+burton,toe,B      TP:  1641 TN:  4345 FP:     8 FN:   262
Tchain,Pbcr,Cgraham+burton,toe,B     TP:  1720 TN:  4271 FP:     8 FN:   183
Tsbph,Pbcr,Cgraham+burton,toe,B      TP:  1824 TN:  4179 FP:    58 FN:   128
Tosb,Pbcr,Cgraham+burton,toe,B       TP:  1703 TN:  4292 FP:    15 FN:   208
Tsbph,Pmarkov,Cgraham+burton,toe,N   TP:  1919 TN:  4127 FP:    89 FN:    66
Tsbph,Pbcr,Cgraham+burton,toe,N      TP:  1899 TN:  4116 FP:    93 FN:    86
Tosb,Pmarkov,Cgraham+burton,toe,N    TP:  1898 TN:  4148 FP:    69 FN:    67
Tosb,Pbcr,Cgraham+burton,toe,N       TP:  1789 TN:  4220 FP:    36 FN:   143
```
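A table like this is easier to compare once it is sorted. The following is only a sketch of how one might parse such rows and rank the configurations by overall accuracy or by false-positive rate. The helper names are made up, only three rows from the table are used as sample input, and the two metrics are the usual definitions rather than anything taken from Jonathan's test scripts.

```python
import re

# One table row: a configuration name followed by the four counters.
ROW = re.compile(r"^(\S+)\s+TP:\s*(\d+)\s+TN:\s*(\d+)\s+FP:\s*(\d+)\s+FN:\s*(\d+)\s*$")


def parse_rows(text):
    """Return (name, tp, tn, fp, fn) tuples for every line that looks like a table row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name = match.group(1)
            tp, tn, fp, fn = (int(value) for value in match.groups()[1:])
            rows.append((name, tp, tn, fp, fn))
    return rows


def accuracy(row):
    _, tp, tn, fp, fn = row
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(row):
    _, tp, tn, fp, fn = row
    return 100.0 * fp / (tn + fp)


# Three rows copied from the table above.
sample = """
Tosb,Pbcr,Cgraham+burton,tum,N       TP:  1808 TN:  4142 FP:     8 FN:    88
Tosb,Pbcr,Cburton,tum,B              TP:  1636 TN:  4148 FP:     2 FN:   260
Tsbph,Pmarkov,Cgraham+burton,toe,N   TP:  1919 TN:  4127 FP:    89 FN:    66
"""

rows = parse_rows(sample)
by_accuracy = sorted(rows, key=accuracy, reverse=True)
by_false_positives = sorted(rows, key=false_positive_rate)

for row in by_accuracy:
    print(f"{row[0]:<36} accuracy {accuracy(row):.2f}%  FP rate {false_positive_rate(row):.2f}%")
```

Which ranking matters more depends on the setup: the configuration with the best overall accuracy in the sample is not the one with the fewest false positives.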

----------

## steveb

Okay... restarting again. The previous start was too bad: I started with my hardest corpus. Now I am starting with a more reasonable corpus (Nr. 008) and will continue with 009, 010, 004, 005, 006, 007, 001, 002, 003, 000. Currently I have these statistics:

```
nautilus / # dspam_stats -H globaluser
globaluser:
                TP True Positives:            167
                TN True Negatives:            164
                FP False Positives:             8
                FN False Negatives:             5
                SC Spam Corpusfed:              0
                NC Nonspam Corpusfed:           3
                TL Training Left:            2325
                SHR Spam Hit Rate          97.09%
                HSR Ham Strike Rate:        4.65%
                OCA Overall Accuracy:      96.22%
nautilus / #
```

----------

## petrjanda

Damn DSPAM. When I try to send an email to spam@xxxxx.xxx, this is what I get in dspam.debug.

```
72545: [05/18/2006 17:39:34] DSPAM Instance Startup
72545: [05/18/2006 17:39:34] input args: /usr/pkg/bin/dspam --deliver=innocent,spam --user spam -i -f sysadmin@xxxxx.xxx -- spam@xxxxx.xxx
72545: [05/18/2006 17:39:34] pass-thru args: /usr/sbin/sendmail -i -f sysadmin@xxxxx.xxx -- spam@xxxxx.xxx
72545: [05/18/2006 17:39:34] processing user spam
72545: [05/18/2006 17:39:34] uid = 1003, euid = 1003, gid = 1004, egid = 1004
72545: [05/18/2006 17:39:34] loading preferences for user spam
72545: [05/18/2006 17:39:34] default preferences empty. reverting to dspam.conf preferences.
72545: [05/18/2006 17:39:34] Loading preferences from dspam.conf
72545: [05/18/2006 17:39:34] using /var/dspam/opt-in/local/spam.dspam as path
72545: [05/18/2006 17:39:34] using /var/dspam/opt-out/local/spam.nodspam as path
72545: [05/18/2006 17:39:34] sedation level set to: 5
72545: [05/18/2006 17:39:34] _mysql_drv_get_spamtotals: unable to _mysql_drv_getpwnam(spam)
72545: [05/18/2006 17:39:34] unable to load totals.  using zero values.
72545: [05/18/2006 17:39:34] _mysql_drv_get_spamtotals: unable to _mysql_drv_getpwnam(spam)
72545: [05/18/2006 17:39:34] DSPAM Instance Shutdown.  Exit Code: 0
```

I don't understand what's going on! Sending emails to nospam@xxxxx.xxx and all other addresses works fine. It's ridiculous. Any ideas, Steve?

----------

## steveb

 *petrjanda wrote:*   

> Damn DSPAM. When I try to send an email to spam@xxxxx.xxx this is what i get from dspam.debug.
> 
> ```
> 
> 72545: [05/18/2006 17:39:34] DSPAM Instance Startup
> ...

 Looks like MySQL has gone away... and it looks like that stupid error which was fixed in 3.6.6. Read the release notes of 3.6.6 about the fix.

cheers

SteveB

----------

## steveb

Those new tokenizers in DSPAM CVS really do have a good impact on accuracy:

```
nautilus / # dspam_stats -H globaluser
globaluser:
                TP True Positives:          18407
                TN True Negatives:          25096
                FP False Positives:           496
                FN False Negatives:             6
                SC Spam Corpusfed:              4
                NC Nonspam Corpusfed:          44
                TL Training Left:               0
                SHR Spam Hit Rate          99.97%
                HSR Ham Strike Rate:        1.94%
                OCA Overall Accuracy:      98.86%
nautilus / #
```

Not bad. 99.97% SHR with no bias active is not bad at all...

----------

## steveb

 *petrjanda wrote:*   

> Damn DSPAM.

 Allow me to ask:

- Do you get that on every message?
- What MySQL version are you using?

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Damn DSPAM. When I try to send an email to spam@xxxxx.xxx this is what i get from dspam.debug.
> 
> ```
> 
> 72545: [05/18/2006 17:39:34] DSPAM Instance Startup
> ...

 

3.6.6 has the same error. I upgraded.

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Damn DSPAM. Allow me to ask:Do you get that on every message?What MySQL version are you using?
> 
> cheers
> 
> SteveB

 

I only get that error when sending to spam@xxxxx.xxx. All others work   :Evil or Very Mad:  Note that spam@xxxxx.xxx is just like any other account. I use it as a central place to send spam to, which will then go through dspam_train.

I'm using 

mysql  Ver 14.12 Distrib 5.0.20

The funny part is that running dspam_train on user "spam" works fine too.

Can you please make the compressed tokens available to me? I would like to use them.

It seems the error has just disappeared, but now it looks like this:

```
h5n1# less dspam.debug
79242: [05/19/2006 01:00:56] DSPAM Instance Startup
79242: [05/19/2006 01:00:56] input args: /usr/pkg/bin/dspam --deliver=innocent,spam --user spam -i -f elekktretterr@xxxxx -- spam@xxxxx
79242: [05/19/2006 01:00:56] pass-thru args: /usr/sbin/sendmail -i -f elekktretterr@xxxxxxx -- spam@xxxxx
79242: [05/19/2006 01:00:56] processing user spam
79242: [05/19/2006 01:00:56] uid = 1003, euid = 1003, gid = 1004, egid = 1004
79242: [05/19/2006 01:00:56] loading preferences for user spam
79242: [05/19/2006 01:00:56] default preferences empty. reverting to dspam.conf preferences.
79242: [05/19/2006 01:00:56] Loading preferences from dspam.conf
79242: [05/19/2006 01:00:56] using /var/dspam/opt-in/local/spam.dspam as path
79242: [05/19/2006 01:00:56] using /var/dspam/opt-out/local/spam.nodspam as path
79242: [05/19/2006 01:00:56] sedation level set to: 5
79242: [05/19/2006 01:00:56] DSPAM Instance Shutdown.  Exit Code: 0
```

Also, running /usr/pkg/bin/dspam --deliver=innocent,spam --user spam -i -f elekktretterr@xxxxx -- spam@xxxxx manually works too. What the heck?

----------

## steveb

 *petrjanda wrote:*   

> I only get that error when sending to spam@xxxxx.xxx. all others work   Note that spam@xxxxx.xxx is just like any other account. I use it to send spam to as a central place where to send spam that will go through dspam_train.

 Okay. Now I get it. You use spam@xxxxxx.xxx as a spam trap. Is that right?

 *petrjanda wrote:*   

> Im using 
> 
> mysql  Ver 14.12 Distrib 5.0.20

 5.0.20? Okay... it's a 5.x release, so you don't have the MySQL 4.x quoting bug. So everything is okay (from DSPAM's point of view).

 *petrjanda wrote:*   

> What the funny part is, doing dspam_train on user "spam" works fine too.

 

 *petrjanda wrote:*   

> Can you please make available to me the compressed tokens? I would like to use them.

 I wiped off all my data this morning. But I have everything archived:

```
nautilus / # ls -lah /mnt/gentoo.scripts/spam-stuff/dspam_data/20060518*
-rw-r--r-- 1 root root 794 May 18 03:27 /mnt/gentoo.scripts/spam-stuff/dspam_data/20060518_dspam.dspam_stats.sql.bz2
-rw-r--r-- 1 root root 15M May 18 03:27 /mnt/gentoo.scripts/spam-stuff/dspam_data/20060518_dspam.dspam_token_data.sql.bz2
-rw-r--r-- 1 root root 15M May 18 03:26 /mnt/gentoo.scripts/spam-stuff/dspam_data/20060518_dspam.sql.bz2
-rw-r--r-- 1 root root 664 May 18 04:17 /mnt/gentoo.scripts/spam-stuff/dspam_data/20060518_var_spool_dspam.tbz2
nautilus / #
```

20060518_dspam.sql.bz2 contains every table of my DSPAM installation, 20060518_dspam.dspam_token_data.sql.bz2 contains only the tokens, 20060518_dspam.dspam_stats.sql.bz2 contains only the stats, and 20060518_var_spool_dspam.tbz2 contains everything in /var/spool/dspam (except big log files).

You have my personal email address. Send me a mail telling me where I should upload the stuff, or whether I should send it to you by mail, or whether you want me to put it somewhere on my server and send you a link to the files.

 *petrjanda wrote:*   

> It seems the error has just disappeared, but now it is like this:
> 
> ```
> 
> h5n1# less dspam.debug
> ...

 Exit Code 0 is OKAY. Everything is working perfectly.

 *petrjanda wrote:*   

> And also, running /usr/pkg/bin/dspam --deliver=innocent,s
> 
> pam --user spam -i -f elekktretterr@xxxxx -- spam@xxxxx
> 
> manually works too. what the heck?

 It's working. Without errors. Exit code 0 == no error.

cheers

SteveB

----------

## petrjanda

 *Quote:*   

> 
> 
> Okay. Now I get it. You use spam@xxxxxx.xxx as a spam trap. Is that right?
> 
> 

 

Not quite. I just forward emails with spam attached to that email address. It's not a spam trap.

I'll tell you where to upload it tomorrow.

 *Quote:*   

> 
> 
> It's working. Without errors. Error Code 0 == no error.
> 
> 

 

Well, the obvious problems I see are:

1) It exits too early: it doesn't go through the tokenizing phase.

2) The mail does NOT get relayed to our backend server because /usr/sbin/sendmail is never called.

A debug log of an email sent to nospam@xxxx looks something like this:

 *Quote:*   

> 
> 
> 79717: [05/19/2006 01:06:21] pass-thru args: /usr/sbin/sendmail -f elekktretterr
> 
> @xxx -- nospam@sxxxx
> ...

 

Ignore the errors in that log. It was just a test, run from the command line. Can you see the problem?

----------

## steveb

Do you have:

```
TrustedDeliveryAgent "/usr/sbin/sendmail"
```

in your dspam.conf?

Where does that "/usr/sbin/sendmail -f" come from? Do you have it in your master.cf? Can you post the relevant DSPAM part of master.cf?

Or does that come from a transport map? Can you post that as well?

cheers

SteveB

----------

## steveb

 *petrjanda wrote:*   

> 79717: [05/19/2006 01:06:21] loading preferences for user i

 

Why user "i"?? You have an error somewhere. I think the following line:

```
/usr/pkg/bin/dspam --deliver=innocent,spam --user spam -i -f elekktretterr@xxxxx -- spam@xxxxx
```

is not being parsed as intended. The "-i" parameter actually ends up being used as the username. Btw: why the "--" on the command line? That is not needed. You only need it if you have it in master.cf, not on the command line.

 *petrjanda wrote:*   

> 79717: [05/19/2006 01:06:21] decode.c:365: unexpected data: header string 'test' doesn't contains `:' character

 

What the heck is that "test" header??

 *petrjanda wrote:*   

> 79717: [05/19/2006 01:06:21] no tokens found in message

 

The message seems to have no content at all. Why? Is it empty?

Please post your master.cf, or the part of your configuration which is responsible for calling dspam for the alias you have set up.

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

> Do you have:
> 
> ```
> TrustedDeliveryAgent "/usr/sbin/sendmail"
> ```
> ...

 

Im using this script:

```
if [ "$domain" = "$MYDOMAIN" ]; then
  $DSPAM --deliver=innocent,spam --user "$user" -i -f "$sender" -- "$recip"
elif [ "$domain" = "spam.$MYDOMAIN" ]; then
  $DSPAM --user "$user" --class=spam --source=error
elif [ "$domain" = "ham.$MYDOMAIN" ]; then
  $DSPAM --user "$user" --class=innocent --source=error
else
  $SENDMAIL -i -f "$sender" -- "$recip"
fi
```

It's not the whole script, but that's the main part. The script gets called from master.cf:

```
dspam       unix        -       n       n       -      10     pipe
 flags=Rhq    user=dspam    argv=/usr/local/bin/dspamit ${sender} ${recipient}
```
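For readers following along, the dispatch logic of the wrapper script above can be sketched in Python. This is only an illustration: the post does not show how `$user` and `$domain` are derived, so splitting the recipient at the `@` is an assumption, `build_command` is a made-up name, and the default binary paths are simply the ones that appear in the debug logs earlier in the thread.

```python
def build_command(sender, recipient, my_domain,
                  dspam="/usr/pkg/bin/dspam", sendmail="/usr/sbin/sendmail"):
    """Return the argv list the wrapper would run for one recipient."""
    # Assumption: user and domain are the two halves of the recipient address.
    user, _, domain = recipient.partition("@")
    if domain == my_domain:
        # Normal mail: classify and deliver.
        return [dspam, "--deliver=innocent,spam", "--user", user,
                "-i", "-f", sender, "--", recipient]
    if domain == "spam." + my_domain:
        # user@spam.<domain>: retrain a missed spam.
        return [dspam, "--user", user, "--class=spam", "--source=error"]
    if domain == "ham." + my_domain:
        # user@ham.<domain>: retrain a false positive.
        return [dspam, "--user", user, "--class=innocent", "--source=error"]
    # Anything else bypasses DSPAM entirely.
    return [sendmail, "-i", "-f", sender, "--", recipient]


print(build_command("someone@example.net", "user1@ham.example.org", "example.org"))
```

Printing the argv list for a given recipient is a cheap way to check which branch a message would take without sending any mail.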

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   79717: [05/19/2006 01:06:21] loading preferences for user i 
> 
> Why user "i"?? You have somewhere a error. I think the following line:
> 
> ```
> ...

 

I told you, don't worry about those errors. They are meaningless: what I did was basically copy/paste the command from dspam.debug, run it on the command line, type in some rubbish and EOF out of it. The main point was that when sending to user spam, and only spam, it never gets to the tokenizing stage or the delivery stage. All other users work as they should.

----------

## steveb

Why so complicated? Would it not be easier to do something like what is described here:

- Setup by Neale Pickett (look at this script for retraining)
- How to get Dspam, Postfix, and Procmail to play well together

In my setup I use something like this in master.cf for retraining:

```
dspam-retrain   unix   -      n       n       -       -       pipe
   flags=Rhq user=dspam:mail argv=/usr/bin/dspam
   --class=${nexthop}
   --source=error
   --deliver=spam,innocent
   --stdout
```

And since I have many domains on the server... I have set up a transport map for DSPAM:

```
# /etc/postfix/dspam_lerning_transport.pcre
#
##
## Training DSPAM with one master.cf entry. Signature
## needs to be present in message body. Else DSPAM
## will drop the message. dspam.conf needs to have
## the following entries:
##   Preference "signatureLocation=message"
## or
##   you can set in the preference extension the
##   signature to be included in the body of the
##   message
##
##  and:
##   PgSQLUIDInSignature on
##  or
##   MySQLUIDInSignature on
##
/^spam\@.*$/               dspam-retrain:spam
/^(notspam|ham)\@.*$/            dspam-retrain:innocent
```
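To make it clearer how the two patterns above pick the class, here is a small Python sketch that mimics the lookup: the first matching pattern wins, and the part after the colon becomes `${nexthop}`, which master.cf passes to `--class`. This is only an emulation for illustration, not Postfix code. `lookup_transport` is a made-up name, and the case-insensitive matching mirrors what I understand to be the default for Postfix pcre tables.

```python
import re

# (pattern, "transport:nexthop") pairs copied from the map above; first match wins.
TRANSPORT_RULES = [
    (re.compile(r"^spam@.*$", re.IGNORECASE), "dspam-retrain:spam"),
    (re.compile(r"^(notspam|ham)@.*$", re.IGNORECASE), "dspam-retrain:innocent"),
]


def lookup_transport(recipient):
    """Return "transport:nexthop" for a recipient, or None if no rule matches."""
    for pattern, result in TRANSPORT_RULES:
        if pattern.search(recipient):
            return result
    return None


for address in ("spam@example.org", "notspam@example.org", "user1@example.org"):
    print(address, "->", lookup_transport(address))
```

Addresses that match neither rule return `None` here, which corresponds to Postfix falling through to the next table in `transport_maps`.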

And in main.cf:

```
transport_maps =
        ....
        pcre:/etc/postfix/dspam_lerning_transport.pcre
        ....
```

If you don't have the uid in the signature, then you would probably change the dspam-retrain entry in master.cf to this:

```
dspam-retrain   unix   -      n       n       -       -       pipe
   flags=Rhq user=dspam:mail argv=/usr/bin/dspam
   --user ${sender}
   --class=${nexthop}
   --source=error
   --deliver=spam,innocent
   --stdout
```

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

> Why so complicated? Would it not be more easy to do some thing like mentioned here:Setup by Neale Pickett (look at this script for retraining)How to get Dspam, Postfix, and Procmail to play well together
> 
> In my setup I use some thing like this in master.cf for retraining:
> 
> ```
> ...

 

I found it much easier to do it the way I did it   :Razz:  BTW, how do I make DSPAM filter only incoming mail? It does that now, but I'm looking for a more flexible solution.

----------

## steveb

 *petrjanda wrote:*   

> I found it much easier to do it the way I did it   BTW, how to make dspam filter only incoming mail? It does now, but Im looking for a more flexible solution.

 

You mean something like this?

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   I found it much easier to do it the way I did it   BTW, how to make dspam filter only incoming mail? It does now, but Im looking for a more flexible solution. 
> 
> You mean something like this?

 

Cheers.

Just send the compressed files to my email: janda.petr@gmail.com

I don't have an FTP server available at the moment.

----------

## steveb

 *petrjanda wrote:*   

> Just send the compressed files to my email. janda.petr@gmail.com

 

```
May 19 03:35:40 mail postfix/smtp[3145]: 6310615B7D59: to=<janda.petr@gmail.com>, relay=gmail-smtp-in.l.google.com[64.233.183.114], delay=208, status=sent (250 2.0.0 OK 1148002539 o9si1560890nfa)
```

----------

## steveb

Write or post here when you have received the mail.

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

> Write/Post, when you have recieved the mail.
> 
> cheers
> 
> SteveB

 

Got it.

I just need one more piece of advice. How would you go about removing everyone's tokens and signatures except global's?

----------

## steveb

Assuming your "global" user has uid 4, then do this to delete every uid not matching 4:

```
USE dspam;
DELETE FROM `dspam_preferences` WHERE `uid` != 4;
DELETE FROM `dspam_signature_data` WHERE `uid` != 4;
DELETE FROM `dspam_stats` WHERE `uid` != 4;
DELETE FROM `dspam_token_data` WHERE `uid` != 4;
DELETE FROM `dspam_virtual_uids` WHERE `uid` != 4;
```
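If you would rather script the clean-up and see how many rows each statement removes, the same idea can be sketched in Python. Note the assumptions: the sketch uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere, the tables are reduced to a single `uid` column (the real DSPAM schema has more), and `purge_other_users` is a made-up helper name. Against a real installation you would use a MySQL driver and take a backup first.

```python
import sqlite3

# Table names taken from the SQL above.
TABLES = [
    "dspam_preferences",
    "dspam_signature_data",
    "dspam_stats",
    "dspam_token_data",
    "dspam_virtual_uids",
]


def purge_other_users(conn, keep_uid):
    """Delete every row whose uid differs from keep_uid; return rows removed per table."""
    removed = {}
    with conn:  # commit all deletes together, roll back if one fails
        for table in TABLES:
            cursor = conn.execute(f"DELETE FROM {table} WHERE uid != ?", (keep_uid,))
            removed[table] = cursor.rowcount
    return removed


# Demo data: three users (uid 1, 4 and 7) in every table.
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (uid INTEGER)")
    conn.executemany(f"INSERT INTO {table} (uid) VALUES (?)", [(1,), (4,), (7,)])

removed = purge_other_users(conn, keep_uid=4)
print(removed)
```

Running the deletes in one transaction means a failure halfway through leaves the tables untouched.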

Depending on your setup, you may have a directory called data in the DSPAM home directory (for me it is /var/spool/dspam/), and it also contains data you could delete (if you want). That data is mainly used by the DSPAM Web-UI.

cheers

SteveB

----------

## magic919

Steveb,

I'm still tracking this post with interest.  I'm quite a fan of DSPAM and it's good to see it getting some airing.  Given that the Gentoo ebuild is only at 3.6.4, what are the chances of you letting us 'see' your ebuild?  It would help me with the half a dozen machines I run DSPAM on.

Tony

----------

## steveb

 *magic919 wrote:*   

> Given the fact that Gentoo ebuild is only at 3.6.4 what are the chances of you letting us 'see' your ebuild?

 Hallo Tony

I am a long time DSPAM user. If you look up in bugs.gentoo.org, then you will realize, that I was the one to introduce the configure part of the ebuild and that I was the one introducing the deamon mode and that I introduced the Oracle stuff and many other stuff. And you will realize, that Lim Swee Tat (the maintainer of the DSPAM ebuild) is way to slow to react on anything. Some time requests stay for months without any change. That's the reason I give up in posting to bugs.gentoo.org about DSPAM. It's like a black hole: You can scream, but no one hears you.  :Smile: 

I have nothing personal against Lim Swee Tat. I don't know why he is reacting so slow? Maybe he has 10'000 other things to do and maintaining the DSPAM ebuild is not one of his top priorities? I realy don't know.

Anyway... I am not keeping my ebuild a secret. Everyone can have it. I have an ebuild for every release of DSPAM. Currently I run the CVS version, testing the various new tokenizers and submitting patches and other stuff to the DSPAM mailing list.

Here is my version of the 3.6.6 ebuild (you need to name it mail-filter/dspam/dspam-3.6.6.ebuild):

```
# Copyright 1999-2005 Gentoo Foundation

# Distributed under the terms of the GNU General Public License v2

# $Header: $

inherit eutils

DESCRIPTION="A statistical-algorithmic hybrid anti-spam filter"

SRC_URI="http://dspam.nuclearelephant.com/sources/${P}.tar.gz"

HOMEPAGE="http://dspam.nuclearelephant.com/"

LICENSE="GPL-2"

IUSE="clamav debug large-domain ldap logrotate mysql oci8 postgres sqlite sqlite3 virtual-users user-homedirs"

DEPEND="clamav? ( >=app-antivirus/clamav-0.86 )

                mysql? ( >=dev-db/mysql-3.23 )

                sqlite? ( <dev-db/sqlite-3 )

                sqlite3? ( =dev-db/sqlite-3* )

                postgres? ( >=dev-db/postgresql-7.4.3 )

                ldap? ( net-nds/openldap )

                "

RDEPEND="${DEPEND}

                sys-process/cronbase

                logrotate? ( app-admin/logrotate )

                "

KEYWORDS="~x86 ~ppc ~alpha ~amd64"

SLOT="0"

# some FHS-like structure

HOMEDIR="/var/spool/dspam"

CONFDIR="/etc/mail/dspam"

LOGDIR="/var/log/dspam"

pkg_setup() {

        has_version ">sys-kernel/linux-headers-2.6" || (

                einfo "To use the new DSPAM daemon mode, you need to emerge"

                einfo ">sys-kernel/linux-headers-2.6 and rebuild glibc to support NPTL"

        )

        if use virtual-users && use user-homedirs ; then

                ewarn "If the users are virtual, then they probably should not have home directories."

        fi

        if use user-homedirs ; then

                ewarn "WARNING: dspam-web will not work with user-homedirs. Disable this USE flag"

                ewarn "if you intend on using dspam-web."

        fi

        id dspam 2>/dev/null || enewgroup dspam 26

        id dspam 2>/dev/null || enewuser dspam 26 /bin/bash ${HOMEDIR} dspam

}

src_compile() {

        local myconf

        local driver_count=0

        local myconf_driver=""

        myconf="${myconf} --enable-long-username"

        # Override the default delivery agent. This sets only

        # the default, which may be changed in dspam.conf.

        if has_version "mail-mta/postfix"; then

                myconf="${myconf} --with-delivery-agent=/usr/sbin/sendmail"

        elif has_version "mail-mta/exim"; then

                myconf="${myconf} --with-delivery-agent=/usr/sbin/exim"

        elif has_version "mail-mta/sendmail"; then

                myconf="${myconf} --with-delivery-agent=/usr/sbin/sendmail"

        elif has_version "mail-filter/procmail"; then

                myconf="${myconf} --with-delivery-agent=/usr/bin/procmail"

        fi

        use large-domain && myconf="${myconf} --enable-large-scale" ||\

            myconf="${myconf} --enable-domain-scale"

        myconf="${myconf} --with-dspam-home=${HOMEDIR}"

        myconf="${myconf} --sysconfdir=${CONFDIR}"

        use user-homedirs && myconf="${myconf} --enable-homedir"

        use clamav && myconf="${myconf} --enable-clamav"

        # enables support for debugging (touch /etc/dspam/.debug to turn on)

        # optional: even MORE debugging output, use with extreme caution!

        use debug && myconf="${myconf} --enable-debug --enable-verbose-debug --enable-bnr-debug"

        use ldap && myconf="${myconf} --enable-ldap"

        # select storage driver

        if use sqlite ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=sqlite_drv"

                else

                        myconf_driver="${myconf_driver},sqlite_drv"

                fi

                myconf="${myconf} --enable-virtual-users"

                driver_count=$((${driver_count} + 1))

        fi

        if use sqlite3 ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=sqlite3_drv"

                else

                        myconf_driver="${myconf_driver},sqlite3_drv"

                fi

                myconf="${myconf} --enable-virtual-users"

                driver_count=$((${driver_count} + 1))

        fi

        if use mysql; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=mysql_drv"

                else

                        myconf_driver="${myconf_driver},mysql_drv"

                fi

                myconf="${myconf} --with-mysql-includes=/usr/include/mysql"

                myconf="${myconf} --with-mysql-libraries=/usr/lib/mysql"

                myconf="${myconf} --enable-preferences-extension"

                driver_count=$((${driver_count} + 1))

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        myconf="${myconf} --enable-daemon"

                fi

                use virtual-users && myconf="${myconf} --enable-virtual-users"

                # an experimental feature available with MySQL and PgSQL backend

        fi

        if use oci8 ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=ora_drv"

                else

                        myconf_driver="${myconf_driver},ora_drv"

                fi

                myconf="${myconf} --with-oracle-home=${ORACLE_HOME}"

                myconf="${myconf} --enable-virtual-users"

                driver_count=$((${driver_count} + 1))

                # I am in no way a Oracle specialist. If someone knows

                # how to query the version of Oracle, then let me know.

                if (expr ${ORACLE_HOME/*\/} : 10 1>/dev/null 2>&1); then

                        myconf="${myconf} --with-oracle-version=10"

                fi

        fi

        if use postgres ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=pgsql_drv"

                else

                        myconf_driver="${myconf_driver},pgsql_drv"

                fi

                myconf="${myconf} --with-pgsql-includes=/usr/include/postgresql"

                myconf="${myconf} --with-pgsql-libraries=/usr/lib/postgresql"

                myconf="${myconf} --enable-preferences-extension"

                driver_count=$((${driver_count} + 1))

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        myconf="${myconf} --enable-daemon"

                fi

                use virtual-users && myconf="${myconf} --enable-virtual-users"

                # an experimental feature available with MySQL and PgSQL backend

        fi

        if [[ ${driver_count} -eq 0 ]] ; then

                # no other driver selected: hash_drv is the only driver

                myconf_driver="--with-storage-driver=hash_drv"

        else

                # always build hash_drv alongside the selected drivers

                myconf_driver="${myconf_driver},hash_drv"

        fi

        myconf="${myconf} ${myconf_driver}"

        econf ${myconf} || die

        emake || die

}

src_install () {

        # Fix issues with older dspam configuration

        CONFIG_PROTECT="${CONFIG_PROTECT} ${HOMEDIR} ${CONFDIR} /var/run/dspam"

        CONFIG_PROTECT_MASK="${CONFIG_PROTECT_MASK/${HOMEDIR}/}"

        CONFIG_PROTECT_MASK="${CONFIG_PROTECT_MASK/${CONFDIR}/}"

        # open up perms on $HOMEDIR

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}

        keepdir ${HOMEDIR}

        # keeps dspam data in $CONFDIR

        diropts -m0775 -o dspam -g dspam

        dodir ${CONFDIR}

        keepdir ${CONFDIR}

        # make install

        make DESTDIR=${D} install || die

        chmod o+s ${D}/usr/bin/dspam

        chmod o+s ${D}/usr/bin/dspam_stats

        # documentation

        dodoc CHANGELOG LICENSE README* RELEASE.NOTES UPGRADING

        docinto doc

        dodoc doc/*.txt

        doman man/dspam*

        # build some initial configuration data

        [ ! -f ${CONFDIR}/dspam.conf ] \

                && cp src/dspam.conf ${T}/dspam.conf

        if use mysql || use postgres; then

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        # keeps dspam socket for daemon in /var/run/dspam

                        diropts -m0775 -o dspam -g dspam

                        dodir /var/run/dspam

                        keepdir /var/run/dspam

                        # We use sockets for the daemon instead of tcp port 24

                        sed -e 's:^#*\(ServerDomainSocketPath[\t ]\{1,\}\).*:\1\"/var/run/dspam/dspam.sock\":gI' \

                                -e 's:^#*\(ServerPID[\t ]\{1,\}\).*:\1/var/run/dspam/dspam.pid:gI' \

                                -i ${T}/dspam.conf

                        # dspam init script

                        exeinto /etc/init.d

                        exeopts -m0755 -o root -g root

                        newexe ${FILESDIR}/dspam.rc dspam

                fi

        fi

        # generate random password

        local PASSWORD="${RANDOM}${RANDOM}${RANDOM}${RANDOM}"

        # database related configuration and scripts

        if use sqlite; then

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                newins src/tools.sqlite_drv/purge-2.sql sqlite_purge.sql

        fi

        if use sqlite3; then

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                newins src/tools.sqlite_drv/purge-3.sql sqlite3_purge.sql

        fi

        if use mysql; then

                # Use existing configuration if possible

                if [[ -f ${ROOT}${CONFDIR}/mysql.data ]]; then

                        DSPAM_DB_DATA=( $(sed "s:^[\t ]*$:###:gI" "${ROOT}${CONFDIR}/mysql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                else

                        DSPAM_DB_DATA[0]="/var/run/mysqld/mysqld.sock"

                        DSPAM_DB_DATA[1]=""

                        DSPAM_DB_DATA[2]="dspam"

                        DSPAM_DB_DATA[3]="${PASSWORD}"

                        DSPAM_DB_DATA[4]="dspam"

                        DSPAM_DB_DATA[5]="true"

                fi

                # Modify configuration and create mysql.data file

                sed -e "s:^#*\(MySQLServer[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[0]}:gI" \

                        -e "s:^#*\(MySQLPort[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[1]}:gI" \

                        -e "s:^#*\(MySQLUser[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[2]}:gI" \

                        -e "s:^#*\(MySQLPass[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[3]}:gI" \

                        -e "s:^#*\(MySQLDb[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[4]}:gI" \

                        -e "s:^#*\(MySQLCompress[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[5]}:gI" \

                        -i ${T}/dspam.conf

                for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                        echo "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" >> ${T}/mysql.data

                done

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                doins ${T}/mysql.data

                newins src/tools.mysql_drv/mysql_objects-space.sql mysql_objects-space.sql

                newins src/tools.mysql_drv/mysql_objects-speed.sql mysql_objects-speed.sql

                newins src/tools.mysql_drv/mysql_objects-4.1.sql mysql_objects-4.1.sql

                newins src/tools.mysql_drv/virtual_users.sql mysql_virtual_users.sql

                newins src/tools.mysql_drv/purge.sql mysql_purge.sql

                newins src/tools.mysql_drv/purge-4.1.sql mysql_purge-4.1.sql

        fi

        if use oci8 ; then

                # Use existing configuration if possible

                if [ -f ${ROOT}${CONFDIR}/oracle.data ]; then

                        DSPAM_DB_DATA=( $(cat "${ROOT}${CONFDIR}/oracle.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                else

                        DSPAM_DB_DATA[0]="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521))(CONNECT_DATA=(SID=PROD)))"

                        DSPAM_DB_DATA[1]="dspam"

                        DSPAM_DB_DATA[2]="${PASSWORD}"

                        DSPAM_DB_DATA[3]="dspam"

                fi

                # Modify configuration and create oracle.data file

                sed -e "s:^#*\(OraServer[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[0]}:gI" \

                        -e "s:^\(OraUser[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[1]}:gI" \

                        -e "s:^\(OraPass[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[2]}:gI" \

                        -e "s:^\(OraSchema[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[3]}:gI"\

                        -i ${T}/dspam.conf

                for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                        echo "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" >> ${T}/oracle.data

                done

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                doins ${T}/oracle.data

                newins src/tools.ora_drv/oral_objects.sql ora_objects.sql

                newins src/tools.ora_drv/virtual_users.sql ora_virtual_users.sql

                newins src/tools.ora_drv/purge.sql ora_purge.sql

        fi

        if use postgres ; then

                # Use existing configuration if possible

                if [ -f ${ROOT}${CONFDIR}/pgsql.data ]; then

                        DSPAM_DB_DATA=( $(cat "${ROOT}${CONFDIR}/pgsql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                else

                        DSPAM_DB_DATA[0]="127.0.0.1"

                        DSPAM_DB_DATA[1]="5432"

                        DSPAM_DB_DATA[2]="dspam"

                        DSPAM_DB_DATA[3]="${PASSWORD}"

                        DSPAM_DB_DATA[4]="dspam"

                fi

                # Modify configuration and create pgsql.data file

                sed -e "s:^#*\(PgSQLServer[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[0]}:gI" \

                        -e "s:^#*\(PgSQLPort[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[1]}:gI" \

                        -e "s:^#*\(PgSQLUser[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[2]}:gI" \

                        -e "s:^#*\(PgSQLPass[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[3]}:gI" \

                        -e "s:^#*\(PgSQLDb[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[4]}:gI" \

                        -e "s:^#*\(PgSQLConnectionCache[\t ]*.\):\1:gI" \

                        -i ${T}/dspam.conf

                for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                        echo "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" >> ${T}/pgsql.data

                done

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                doins ${T}/pgsql.data

                newins src/tools.pgsql_drv/pgsql_objects.sql pgsql_objects.sql

                newins src/tools.pgsql_drv/virtual_users.sql pgsql_virtual_users.sql

                newins src/tools.pgsql_drv/purge.sql pgsql_purge.sql

        fi

        sed -e "s:^\(Purge.*\):###\1:g" \

                -e "s:^#\(Purge.*\):\1:g" \

                -e "s:^###\(Purge.*\):#\1:g" \

                -i ${T}/dspam.conf

        insinto ${CONFDIR}

        insopts -m644 -o dspam -g dspam

        doins ${T}/dspam.conf

        # installs the notification messages

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}/txt

        keepdir ${HOMEDIR}/txt

        insinto ${HOMEDIR}/txt

        insopts -m644 -o dspam -g dspam

        doins ${S}/txt/*.txt

        # Create the opt-in / opt-out directories

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}/opt-in

        keepdir ${HOMEDIR}/opt-in

        dodir ${HOMEDIR}/opt-out

        keepdir ${HOMEDIR}/opt-out

        # logrotation scripts

        diropts -m0755 -o dspam -g dspam

        dodir /etc/logrotate.d

        keepdir /etc/logrotate.d

        insinto /etc/logrotate.d

        insopts -m0644 -o dspam -g dspam

        newins ${FILESDIR}/logrotate.dspam dspam

        # dspam cron job

        diropts -m0755 -o dspam -g dspam

        dodir /etc/cron.daily

        keepdir /etc/cron.daily

        exeinto /etc/cron.daily

        exeopts -m0755 -o dspam -g dspam

        cp ${FILESDIR}/${PV}-dspam.cron ${T}/dspam.cron

        doexe ${T}/dspam.cron

        # dspam environment

        echo -ne "CONFIG_PROTECT=\"${HOMEDIR}/txt ${HOMEDIR}/opt-in ${HOMEDIR}/opt-out /var/run/dspam\"\n\n" > ${T}/40dspam

        doenvd ${T}/40dspam || die

}

pkg_postinst() {

        env-update

        if use mysql || use postgres || use oci8; then

                echo

                einfo "To setup DSPAM to run out-of-the-box on your system, run:"

                einfo " ebuild /var/db/pkg/${CATEGORY}/${PF}/${PF}.ebuild config"

                echo

        fi

        if use mysql || use postgres; then

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        echo

                        einfo "If you want to run DSPAM in the new daemon mode, remember"

                        einfo "to make the DSPAM daemon start during boot:"

                        einfo "  rc-update add dspam default"

                        echo

                fi

        fi

        echo

        einfo "Don't forget to add dspam_logrotate to your cron."

        einfo "For example (purging log entries older than 30 days):"

        einfo "0 0 * * *  root  dspam_logrotate -a 30 -d ${HOMEDIR}/data"

        echo

}

pkg_config () {

        local storage_drivers_count=-1

        local storage_drivers=( $(dspam --version | sed -n "s:^.*\-\-with\-storage\-driver=\([^ ]*\) \-.*:\1:gIp" | sed "s:,: :gI") )

        einfo "  Please select what driver you would like to configure"

        for foo in ${storage_drivers[@]};

        do

                let storage_drivers_count="(( ${storage_drivers_count} + 1 ))"

                einfo "    [${storage_drivers_count}] ${foo}"

        done

        einfo

        einfo "    [Q] Quit"

        while true

        do

                read -n 1 -s -p "" DSPAM_CONFIGURE_DRIVER

                [[ "${DSPAM_CONFIGURE_DRIVER}" == "Q" || "${DSPAM_CONFIGURE_DRIVER}" == "q" ]] && echo && exit 0

                [[ "${DSPAM_CONFIGURE_DRIVER}" -ge "0" && "${DSPAM_CONFIGURE_DRIVER}" -le "${storage_drivers_count}" ]] && echo && break

        done

        case "${storage_drivers[${DSPAM_CONFIGURE_DRIVER}]}" in

                "mysql_drv" )

                        DSPAM_MySQL_COMMAND=""

                        DSPAM_DB_DATA=( $(sed "s:^[\t ]*$:###:gI" "${ROOT}${CONFDIR}/mysql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                        DSPAM_MySQL_USER="${DSPAM_DB_DATA[2]}"

                        DSPAM_MySQL_PWD="${DSPAM_DB_DATA[3]}"

                        DSPAM_MySQL_DB="${DSPAM_DB_DATA[4]}"

                        DSPAM_MySQL_COMMAND="CREATE DATABASE ${DSPAM_MySQL_DB};USE ${DSPAM_MySQL_DB};"

                        einfo "DSPAM MySQL tables for data objects"

                        einfo "  Please select what kind of object database you like to use:"

                        einfo "    [1] Space optimized database"

                        einfo "    [2] Speed optimized database"

                        einfo

                        while true

                        do

                                read -n 1 -s -p "  Press 1 or 2 on the keyboard to select database" DSPAM_MySQL_DB_Type

                                [[ "${DSPAM_MySQL_DB_Type}" == "1" || "${DSPAM_MySQL_DB_Type}" == "2" ]] && echo && break

                        done

                        if [ "${DSPAM_MySQL_DB_Type}" == "1" ]

                        then

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};$(grep -v "^\-\-\|^[\t ]*$\|^#" ${CONFDIR}/mysql_objects-space.sql)"

                        else

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};$(grep -v "^\-\-\|^[\t ]*$\|^#" ${CONFDIR}/mysql_objects-speed.sql)"

                        fi

                        if use virtual-users ; then

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};$(grep -v "^\-\-\|^[\t ]*$\|^#" ${CONFDIR}/mysql_virtual_users.sql)"

                        fi

                        DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};GRANT SELECT,INSERT,UPDATE,DELETE ON ${DSPAM_MySQL_DB}.* TO ${DSPAM_MySQL_USER}@localhost IDENTIFIED BY '${DSPAM_MySQL_PWD}';FLUSH PRIVILEGES;"

                        ewarn "When prompted for a password, please enter your MySQL root password"

                        ewarn

                        /usr/bin/mysql --user=root --host=localhost -p -e "${DSPAM_MySQL_COMMAND}"

                        ;;

                "ora_drv" )

                        einfo "We do not have enough Oracle knowledge to configure Oracle"

                        einfo "automatically. If you know how, please post a message in"

                        einfo "Gentoo Bugzilla."

                        echo

                        einfo "You need to create the Oracle user for DSPAM and"

                        einfo "the necessary database manually."

                        einfo "However, the DSPAM configuration files dspam.conf and oracle.data"

                        einfo "were already configured with the necessary information to"

                        einfo "access the database."

                        einfo "Please read your dspam.conf, oracle.data and the README for"

                        einfo "more info on how to setup DSPAM with Oracle."

                        einfo "objects for each user upon first use of DSPAM by that user."

                        echo

                        ;;

                "pgsql_drv" )

                        DSPAM_DB_DATA=( $(sed "s:^[\t ]*$:###:gI" "${ROOT}${CONFDIR}/pgsql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                        DSPAM_PgSQL_USER="${DSPAM_DB_DATA[2]}"

                        DSPAM_PgSQL_PWD="${DSPAM_DB_DATA[3]}"

                        DSPAM_PgSQL_DB="${DSPAM_DB_DATA[4]}"

                        ewarn "When prompted for a password, please enter your PgSQL postgres password"

                        ewarn

                        einfo "Creating DSPAM PostgreSQL database \"${DSPAM_PgSQL_DB}\" and user \"${DSPAM_PgSQL_USER}\""

                        /usr/bin/psql -h localhost -d template1 -U postgres -c "CREATE USER ${DSPAM_PgSQL_USER} WITH PASSWORD '${DSPAM_PgSQL_PWD}' NOCREATEDB NOCREATEUSER; CREATE DATABASE ${DSPAM_PgSQL_DB}; GRANT ALL PRIVILEGES ON DATABASE ${DSPAM_PgSQL_DB} TO ${DSPAM_PgSQL_USER}; GRANT ALL PRIVILEGES ON SCHEMA public TO ${DSPAM_PgSQL_USER}; UPDATE pg_database SET datdba=(SELECT usesysid FROM pg_shadow WHERE usename='${DSPAM_PgSQL_USER}') WHERE datname='${DSPAM_PgSQL_DB}';"

                        einfo "Creating DSPAM PostgreSQL tables"

                        PGUSER=${DSPAM_PgSQL_USER} PGPASSWORD=${DSPAM_PgSQL_PWD} /usr/bin/psql -d ${DSPAM_PgSQL_DB} -U ${DSPAM_PgSQL_USER} -f ${CONFDIR}/pgsql_objects.sql 1>/dev/null 2>&1

                        if use virtual-users ; then

                                einfo "Creating DSPAM PostgreSQL tables for virtual users"

                                PGUSER=${DSPAM_PgSQL_USER} PGPASSWORD=${DSPAM_PgSQL_PWD} /usr/bin/psql -d ${DSPAM_PgSQL_DB} -U ${DSPAM_PgSQL_USER} -f ${CONFDIR}/pgsql_virtual_users.sql 1>/dev/null 2>&1

                        fi

                        ;;

                "sqlite_drv" | "sqlite3_drv" | "hash_drv" )

                        einfo "The selected driver does not need to be configured."

                        echo

                        ;;

                * )

                        einfo "Unknown driver ${storage_drivers[${DSPAM_CONFIGURE_DRIVER}]}."

                        echo

                        ;;

        esac

}
```
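The storage-driver part of src_compile boils down to one idea: join every enabled driver with commas into a single `--with-storage-driver=` flag and make sure hash_drv is part of the list. A minimal standalone sketch of that idea follows; `build_driver_flag` is a hypothetical helper and not part of the ebuild, and it takes the drivers as arguments instead of reading USE flags.

```shell
#!/bin/bash
# Hypothetical helper: join the requested drivers into one configure flag
# and append hash_drv once if the caller did not list it.
build_driver_flag() {
    local drivers=() d have_hash=0
    for d in "$@"; do
        drivers+=("${d}")
        if [[ "${d}" == "hash_drv" ]]; then
            have_hash=1
        fi
    done
    if [[ ${have_hash} -eq 0 ]]; then
        drivers+=("hash_drv")
    fi
    # With IFS set to a comma, "${drivers[*]}" expands to a comma-joined list.
    local IFS=,
    echo "--with-storage-driver=${drivers[*]}"
}

build_driver_flag mysql_drv pgsql_drv   # prints --with-storage-driver=mysql_drv,pgsql_drv,hash_drv
build_driver_flag                       # prints --with-storage-driver=hash_drv
```

Collecting the names in an array first avoids having to track a separate counter just to decide whether a comma is needed.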

mail-filter/dspam/files/3.6.6-dspam.cron:

```
#!/bin/bash

# Copyright 1999-2005 Gentoo Foundation

# Distributed under the terms of the GNU General Public License v2

#

# Remove old signatures and unimportant tokens from the DSPAM database

#

#

# Function to run dspam_clean

#

run_dspam_clean() {

        if [[ ! -f "/usr/bin/dspam_clean" ]]

        then

                echo "/usr/bin/dspam_clean not found!"

                return 1

        else

                /usr/bin/dspam_clean -s -p -u >/dev/null 2>&1

                return 0

        fi

}

#

# Function to check if we have all needed tools

#

check_for_tools() {

        local myrc=0

        for foo in awk head tail cut sed

        do

                DSPAM_Check_App="$(${foo} --version 2>&1)"

                if [[ "${DSPAM_Check_App/ *}" == "bash:" ]]

                then

                        echo "Command ${foo} not found!"

                        myrc=1

                fi

        done

        return ${myrc}

}

#

# Check for needed tools

#

check_for_tools

if [[ "$?" -ne "0" ]]

then

        # Not all needed tools are installed. Run just the dspam_clean part.

        run_dspam_clean

        exit $?

fi

#

# Try to get DSPAM home directory

#

DSPAM_HOMEDIR="$(grep ^dspam /etc/passwd|awk -F : '{print $6}')"

if ! ls "${DSPAM_HOMEDIR}"/*.data >/dev/null 2>&1

then

        # No *.data file in the dspam user's home directory. Check if /etc/mail/dspam has one instead.

        if ls /etc/mail/dspam/*.data >/dev/null 2>&1

        then

                DSPAM_HOMEDIR="/etc/mail/dspam"

        fi

fi

if [[ -f "${DSPAM_HOMEDIR}/mysql.data" ]]

then

        if [[ ! -f "/usr/bin/mysql_config" ]]

        then

                echo "Can not run MySQL purge script:"

                echo "  /usr/bin/mysql_config does not exist"

                run_dspam_clean

                exit 1

        fi

        DSPAM_MySQL_PURGE_SQL=""

        DSPAM_MySQL_VER="$(/usr/bin/mysql_config --version | sed "s:\([^0-9\.]*\)::g")"

        DSPAM_MySQL_MAJOR="$(echo "${DSPAM_MySQL_VER}" | cut -d. -f1)"

        DSPAM_MySQL_MINOR="$(echo "${DSPAM_MySQL_VER}" | cut -d. -f2)"

        DSPAM_MySQL_MICRO="$(echo "${DSPAM_MySQL_VER}" | cut -d. -f3)"

        DSPAM_MySQL_INT="$((DSPAM_MySQL_MAJOR * 65536 + DSPAM_MySQL_MINOR * 256 + DSPAM_MySQL_MICRO))"

        # For MySQL >= 4.1 use the new purge script

        if [[ "${DSPAM_MySQL_INT}" -ge "262400" ]]

        then

                if [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge-4.1-optimized.sql" || -f "${DSPAM_HOMEDIR}/mysql_purge-4.1-optimized.sql" ]]

                then

                        [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge-4.1-optimized.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/mysql_purge-4.1-optimized.sql"

                        [[ -f "${DSPAM_HOMEDIR}/mysql_purge-4.1-optimized.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/mysql_purge-4.1-optimized.sql"

                else

                        [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge-4.1.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/mysql_purge-4.1.sql"

                        [[ -f "${DSPAM_HOMEDIR}/mysql_purge-4.1.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/mysql_purge-4.1.sql"

                fi

        else

                [[ -f "${DSPAM_HOMEDIR}/config/mysql_purge.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/mysql_purge.sql"

                [[ -f "${DSPAM_HOMEDIR}/mysql_purge.sql" ]] && DSPAM_MySQL_PURGE_SQL="${DSPAM_HOMEDIR}/mysql_purge.sql"

        fi

        if [[ "${DSPAM_MySQL_PURGE_SQL}" == "" ]]

        then

                echo "Can not run MySQL purge script:"

                echo "  No mysql_purge SQL script found"

                run_dspam_clean

                exit 1

        fi

        if [[ ! -f "/usr/bin/mysql" ]]

        then

                echo "Can not run MySQL purge script:"

                echo "  /usr/bin/mysql does not exist"

                run_dspam_clean

                exit 1

        fi

        # Get DSPAM MySQL username and password

        DSPAM_MySQL_HOST="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 1|tail -n 1)"

        DSPAM_MySQL_PORT="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 2|tail -n 1)"

        DSPAM_MySQL_USER="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 3|tail -n 1)"

        DSPAM_MySQL_PWD="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 4|tail -n 1)"

        DSPAM_MySQL_DB="$(cat ${DSPAM_HOMEDIR}/mysql.data|head -n 5|tail -n 1)"

        # Run the MySQL purge script

        (/usr/bin/mysql --user="${DSPAM_MySQL_USER}" --password="${DSPAM_MySQL_PWD}" ${DSPAM_MySQL_DB} < ${DSPAM_MySQL_PURGE_SQL}) 1>/dev/null 2>&1

        # Run the dspam_clean command

        run_dspam_clean

        # Optimize the MySQL tables for DSPAM

        for foo in $(/usr/bin/mysql --user="${DSPAM_MySQL_USER}" --password="${DSPAM_MySQL_PWD}" --silent --skip-column-names --batch ${DSPAM_MySQL_DB} -e 'SHOW TABLES;' 2>&1)

        do

                (/usr/bin/mysql --user="${DSPAM_MySQL_USER}" --password="${DSPAM_MySQL_PWD}" ${DSPAM_MySQL_DB} -e "OPTIMIZE TABLE ${foo};") 1>/dev/null 2>&1

        done

        exit 0

elif [[ -f "${DSPAM_HOMEDIR}/pgsql.data" ]]

then

        DSPAM_PgSQL_PURGE_SQL=""

        [[ -f "${DSPAM_HOMEDIR}/config/pgsql_purge.sql" ]] && DSPAM_PgSQL_PURGE_SQL="${DSPAM_HOMEDIR}/config/pgsql_purge.sql"

        [[ -f "${DSPAM_HOMEDIR}/pgsql_purge.sql" ]] && DSPAM_PgSQL_PURGE_SQL="${DSPAM_HOMEDIR}/pgsql_purge.sql"

        if [[ "${DSPAM_PgSQL_PURGE_SQL}" == "" ]]

        then

                echo "Can not run PostgreSQL purge script:"

                echo "  No pgsql_purge SQL script found"

                run_dspam_clean

                exit 1

        fi

        if [[ ! -f "/usr/bin/psql" ]]

        then

                echo "Can not run PostgreSQL purge script:"

                echo "  /usr/bin/psql does not exist"

                run_dspam_clean

                exit 1

        fi

        # Get DSPAM PostgreSQL username and password

        DSPAM_PgSQL_HOST="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 1|tail -n 1)"

        DSPAM_PgSQL_PORT="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 2|tail -n 1)"

        DSPAM_PgSQL_USER="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 3|tail -n 1)"

        DSPAM_PgSQL_PWD="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 4|tail -n 1)"

        DSPAM_PgSQL_DB="$(cat ${DSPAM_HOMEDIR}/pgsql.data|head -n 5|tail -n 1)"

        # Run the PostgreSQL purge script

        (PGUSER=${DSPAM_PgSQL_USER} PGPASSWORD=${DSPAM_PgSQL_PWD} /usr/bin/psql -U ${DSPAM_PgSQL_USER} -d ${DSPAM_PgSQL_DB} -p ${DSPAM_PgSQL_PORT} -h ${DSPAM_PgSQL_HOST} -f ${DSPAM_PgSQL_PURGE_SQL}) 1>/dev/null 2>&1

        # Run the dspam_clean command

        run_dspam_clean

        exit 0

elif [[ -f "${DSPAM_HOMEDIR}/oracle.data" ]]

then

        DSPAM_Oracle_PURGE_SQL=""

        [[ -f "${DSPAM_HOMEDIR}/config/ora_purge.sql" ]] && DSPAM_Oracle_PURGE_SQL="${DSPAM_HOMEDIR}/config/ora_purge.sql"

        [[ -f "${DSPAM_HOMEDIR}/ora_purge.sql" ]] && DSPAM_Oracle_PURGE_SQL="${DSPAM_HOMEDIR}/ora_purge.sql"

        if [[ "${DSPAM_Oracle_PURGE_SQL}" == "" ]]

        then

                echo "Can not run Oracle purge script:"

                echo "  No ora_purge SQL script found"

                run_dspam_clean

                exit 1

        fi

        if [[ ! -f "/usr/bin/sqlplus" ]]

        then

                echo "Can not run Oracle purge script:"

                echo "  /usr/bin/sqlplus does not exist"

                run_dspam_clean

                exit 1

        fi

        # Get DSPAM Oracle connection data (dblink, username, password, schema)

        DSPAM_Oracle_DBLINK="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 1|tail -n 1)"

        DSPAM_Oracle_USER="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 2|tail -n 1)"

        DSPAM_Oracle_PWD="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 3|tail -n 1)"

        DSPAM_Oracle_SCHEMA="$(cat ${DSPAM_HOMEDIR}/oracle.data|head -n 4|tail -n 1)"

        # Run the Oracle purge script

        (/usr/bin/sqlplus -s ${DSPAM_Oracle_USER}/${DSPAM_Oracle_PWD} @${DSPAM_Oracle_PURGE_SQL}) 1>/dev/null 2>&1

        # Run the dspam_clean command

        run_dspam_clean

        exit 0

else

        run_dspam_clean

        exit $?

fi
```
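The cron script above reads its database settings from a plain file with one value per line (for `pgsql.data`: host, port, user, password, database). As a minimal sketch, the same parsing can be done in one pass with bash's `read` builtin instead of one `cat|head|tail` pipeline per value. The temporary file and its sample values are made up for illustration; only the line order is taken from the script.

```shell
#!/bin/bash
# Hypothetical sample file standing in for ${DSPAM_HOMEDIR}/pgsql.data.
# Line order as used by the cron script: host, port, user, password, database.
data_file="$(mktemp)"
printf '%s\n' "127.0.0.1" "5432" "dspam" "secret" "dspam" > "${data_file}"

# Read the five lines in one pass; each read consumes exactly one line.
{
        IFS= read -r DSPAM_PgSQL_HOST
        IFS= read -r DSPAM_PgSQL_PORT
        IFS= read -r DSPAM_PgSQL_USER
        IFS= read -r DSPAM_PgSQL_PWD
        IFS= read -r DSPAM_PgSQL_DB
} < "${data_file}"

rm -f "${data_file}"

echo "host=${DSPAM_PgSQL_HOST} port=${DSPAM_PgSQL_PORT} user=${DSPAM_PgSQL_USER} db=${DSPAM_PgSQL_DB}"
```

Empty lines in the file simply end up as empty variables, so no placeholder value is needed for them.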

mail-filter/dspam/files/dspam.rc:

```
#!/sbin/runscript
# Copyright 1999-2005 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: Exp $

depend() {
        use logger
        need net
        before mta
        after pg_autovacuum postgresql mysql
}

checkconfig() {
        if [ ! -f "/etc/mail/dspam/dspam.conf" ]
        then
                eerror "You need a DSPAM configuration in /etc/mail/dspam/dspam.conf"
                return 1
        fi
        if (! grep -q "^ServerPID" /etc/mail/dspam/dspam.conf); then
                eerror "ServerPID missing in DSPAM configuration /etc/mail/dspam/dspam.conf"
                return 1
        fi
}

start() {
        checkconfig || return 1
        ebegin "Starting DSPAM"
        start-stop-daemon --start --quiet --background \
                --exec /usr/bin/dspam -- --daemon
        eend ${?}
}

stop() {
        checkconfig || return 1
        local DSPAM_PID="$(grep "^ServerPID" /etc/mail/dspam/dspam.conf)"
        DSPAM_PID="${DSPAM_PID/ServerPID/}"
        ebegin "Stopping DSPAM"
        start-stop-daemon --stop --quiet --pidfile ${DSPAM_PID}
        eend ${?}
}
```
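The `stop()` function in the init script greps the `ServerPID` line and strips the keyword, which leaves leading whitespace that only disappears because the variable is later used unquoted. A small sketch of a stricter way to pull out the value with `awk`; the sample configuration file is hypothetical and only stands in for `/etc/mail/dspam/dspam.conf`.

```shell
#!/bin/bash
# Hypothetical sample configuration standing in for /etc/mail/dspam/dspam.conf.
conf_file="$(mktemp)"
cat > "${conf_file}" <<'EOF'
# ServerPID is commented out on the next line and must be ignored
#ServerPID /tmp/ignored.pid
ServerPID               /var/run/dspam/dspam.pid
ServerDomainSocketPath  "/var/run/dspam/dspam.sock"
EOF

# Print the second field of the first line whose first field is exactly "ServerPID".
DSPAM_PID="$(awk '$1 == "ServerPID" { print $2; exit }' "${conf_file}")"

rm -f "${conf_file}"

echo "pidfile: ${DSPAM_PID}"
```

Matching on the first field skips commented-out entries and yields the path without surrounding whitespace, so it can be passed quoted to `--pidfile`.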

mail-filter/dspam/files/logrotate.dspam:

```
/var/log/dspam/sql.errors /var/log/dspam/system.log /var/log/dspam/dspam.debug /var/log/dspam/dspam.messages {
        weekly
        compress
        create 0644 dspam dspam
}
```

Some short info about the ebuild and what is new:

- The neural network stuff is gone, since Jonathan does not support it any more
- Multiple dynamic storage drivers are supported
- Simplified the MySQL part in configure (you only need to type in the password once)
- Fixed several file and directory layout errors
- Removed the Berkeley DB stuff, since Jonathan does not support it any more (soon Oracle will go away as well)
- Removed the DSPAM SA trainer, since Jonathan does not support it any more (use dspam_train if you need/want to train DSPAM)
- etc.

You can verify the 3.6.6 binary by executing:

```
dspam --version
```

The result should look like this:

```
DSPAM Anti-Spam Suite 3.6.6 (agent/library)

Copyright (c) 2002-2006 Jonathan A. Zdziarski
http://dspam.nuclearelephant.com

DSPAM may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the DSPAM distribution kit.

Configuration parameters: --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --enable-long-username --with-delivery-agent=/usr/bin/procmail --enable-large-scale --with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --enable-virtual-users --with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql --enable-preferences-extension --enable-daemon --enable-virtual-users --with-storage-driver=sqlite_drv,mysql_drv,hash_drv --build=i686-pc-linux-gnu
```

See the multiple with-storage-driver entries?
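To check the driver list from a script rather than by eye, something like the following sketch could be used. It works on a shortened copy of the "Configuration parameters" line from the output above, so it runs without DSPAM installed; on a real system the line would come from `dspam --version`.

```shell
#!/bin/bash
# Shortened sample of the "Configuration parameters" line shown above.
version_line="Configuration parameters: --prefix=/usr --enable-daemon --enable-virtual-users --with-storage-driver=sqlite_drv,mysql_drv,hash_drv --build=i686-pc-linux-gnu"

# Capture everything between "--with-storage-driver=" and the next space,
# then split the comma-separated list into an array.
driver_list="$(printf '%s\n' "${version_line}" | sed -n 's/.*--with-storage-driver=\([^ ]*\).*/\1/p')"
IFS=',' read -r -a storage_drivers <<< "${driver_list}"

for driver in "${storage_drivers[@]}"; do
        echo "${driver}"
done
```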

Do you want me to post the CVS ebuild as well?

cheers

SteveB

----------

## magic919

SteveB,

That's plenty for me to digest I'd say.  Better than my mere updates of the ebuild.  I've just run through to 3.6.6 this morning but I'll do again with your ebuild.  I've also updated the webui and that really is a laugh when you use the ebuild and then spend time copying files around.

I'll get back on here once I've taken this all on board.  Then I'll stick it into my wiki article, if that's ok.

Cheers,

Tony

----------

## steveb

 *magic919 wrote:*   

> Then I'll stick it into my wiki article, if that's ok.

 No problem with me...

----------

## steveb

I also have some other stuff for DSPAM. For example, I have a script for mass retraining of forwarded mails. If someone is using Postfix and is interested in the solution, let me know.

cheers

SteveB

----------

## steveb

 *magic919 wrote:*   

> I'll get back on here once I've taken this all on board.

 Let me know if you have trouble or need more help (you can post success as well :) ).

cheers

SteveB

----------

## petrjanda

Steve,

I need a bit of Postfix advice.

Let's say MX1 has IP 1.1.1.1 and MX2 has IP 2.2.2.2.

MX2 is a backend SMTP server. MX1 runs DSPAM and only filters mail for the domain aaaa.com, which is the company domain. Besides aaaa.com we host around 200 other domains. None of the mail for those 200 domains goes through MX1; it goes directly to MX2 (this is on purpose). What I would like to do is configure Postfix on MX2 to reject mail for aaaa.com unless it was relayed by MX1, while still allowing MX2 to receive mail for the other 200 domains.

The reason for this: spammers still send a lot of spam (to @aaaa.com customers) directly to MX2, thus bypassing MX1 and the DSPAM running on it.

Any ideas?

----------

## steveb

 *petrjanda wrote:*   

> Steve,
> 
> I need a bit of Postfix advice.
> 
> Let's say MX1 has IP 1.1.1.1 and MX2 has IP 2.2.2.2.
> ...

 

You could do that with restriction classes in Postfix. Here is a quick solution using PCRE and CIDR tables, but you could use hash or any other table type your Postfix supports.

File /etc/postfix/check_domain_aaaa_com_recipient_access.pcre:

```
## /etc/postfix/check_domain_aaaa_com_recipient_access.pcre
#
# main.cf:
# smtpd_restriction_classes =
#   ...
#   check_if_comming_from_mx1
#   ...
#
# check_if_comming_from_mx1 =
#   check_client_access cidr:/etc/postfix/mx1.cidr
#   reject
#
# smtpd_recipient_restrictions =
#   ...
#   check_recipient_access pcre:/etc/postfix/check_domain_aaaa_com_recipient_access.pcre
#   ...
##

# For recipients in domain aaaa.com, check whether the mail is coming from MX1
/^.*\@aaaa\.com$/         check_if_comming_from_mx1
```

File /etc/postfix/mx1.cidr:

```
# /etc/postfix/mx1.cidr
#
# Netblock used by MX1
1.1.1.1/32            OK
```

If you don't want deliveries from the outside to MX2 for an address in domain aaaa.com to be rejected with just the generic message, but rather want a specific message to be returned to the connecting MTA, then you could use the version below instead of the one above.

File /etc/postfix/mx1.cidr:

```
# /etc/postfix/mx1.cidr
#
# Netblock used by MX1
1.1.1.1/32            OK

# Everyone else
0.0.0.0/0            REJECT Please use server MX1 for mail delivery
```
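To make the lookup order easier to follow, here is a pure-bash simulation of the decision these tables encode. It is only an illustration and not Postfix code: the function name `check_delivery` and the test addresses are made up, and unlike a Postfix PCRE table the match here is case-sensitive.

```shell
#!/bin/bash
# Pure-bash simulation of the lookup order; this is not Postfix code.
MX1_IP="1.1.1.1"                 # address of MX1, as in mx1.cidr
PROTECTED_RE='@aaaa\.com$'       # same idea as the PCRE table entry

check_delivery() {
        local client_ip="$1" recipient="$2"

        # Recipients outside aaaa.com never enter the restriction class,
        # so the remaining restrictions decide (DUNNO).
        if [[ ! "${recipient}" =~ ${PROTECTED_RE} ]]; then
                echo "DUNNO"
                return
        fi

        # Inside the class the client table is consulted first ...
        if [[ "${client_ip}" == "${MX1_IP}" ]]; then
                echo "OK"
        else
                # ... and every other client is rejected.
                echo "REJECT Please use server MX1 for mail delivery"
        fi
}

check_delivery "1.1.1.1" "user@aaaa.com"
check_delivery "9.9.9.9" "user@aaaa.com"
check_delivery "9.9.9.9" "user@other.example"
```

On a real system the two tables can be queried directly with `postmap -q`, for example `postmap -q "1.1.1.1" cidr:/etc/postfix/mx1.cidr`.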

The header of the file /etc/postfix/check_domain_aaaa_com_recipient_access.pcre shows how to glue all of that together in Postfix. Do you understand how to do it? Do you need more info?

cheers

SteveB

----------

## petrjanda

 *steveb wrote:*   

>  *petrjanda wrote:*   Steve,
> 
> I need a bit of Postfix advice.
> 
> Let's say MX1 has IP 1.1.1.1 and MX2 has IP 2.2.2.2.
> ...

 

Cheers,

I think I understand it. I'll let you know how it worked out.

----------

## steveb

 *petrjanda wrote:*   

> I think I understand it. I'll let you know how it worked out.

 Okay... I'm waiting here for your response :)

----------

## steveb

New version of mail-filter/dspam/dspam-3.6.6.ebuild. This ebuild forces latin1 as the character set and latin1_general_ci as the collation on the DSPAM tables. The reason is that newer versions of MySQL use utf8 by default, which needs much more space in MySQL and brings no benefit for DSPAM:

```
# Copyright 1999-2005 Gentoo Foundation

# Distributed under the terms of the GNU General Public License v2

# $Header: $

inherit eutils

DESCRIPTION="A statistical-algorithmic hybrid anti-spam filter"

SRC_URI="http://dspam.nuclearelephant.com/sources/${P}.tar.gz"

HOMEPAGE="http://dspam.nuclearelephant.com/"

LICENSE="GPL-2"

IUSE="clamav debug large-domain ldap logrotate mysql oci8 postgres sqlite sqlite3 virtual-users user-homedirs"

DEPEND="clamav? ( >=app-antivirus/clamav-0.86 )

                mysql? ( >=dev-db/mysql-3.23 )

                sqlite? ( <dev-db/sqlite-3 )

                sqlite3? ( =dev-db/sqlite-3* )

                postgres? ( >=dev-db/postgresql-7.4.3 )

                ldap? ( net-nds/openldap )

                "

RDEPEND="${DEPEND}

                sys-process/cronbase

                logrotate? ( app-admin/logrotate )

                "

KEYWORDS="~x86 ~ppc ~alpha ~amd64"

SLOT="0"

# some FHS-like structure

HOMEDIR="/var/spool/dspam"

CONFDIR="/etc/mail/dspam"

LOGDIR="/var/log/dspam"

pkg_setup() {

        has_version ">sys-kernel/linux-headers-2.6" || (

                einfo "To use the new DSPAM daemon mode, you need to emerge"

                einfo ">sys-kernel/linux-headers-2.6 and rebuild glibc to support NPTL"

        )

        if use virtual-users && use user-homedirs ; then

                ewarn "If the users are virtual, then they probably should not have home directories."

        fi

        if use user-homedirs ; then

                ewarn "WARNING: dspam-web will not work with user-homedirs. Disable this USE flag"

                ewarn "if you intend on using dspam-web."

        fi

        id dspam 2>/dev/null || enewgroup dspam 26

        id dspam 2>/dev/null || enewuser dspam 26 /bin/bash ${HOMEDIR} dspam

}

src_compile() {

        local myconf

        local driver_count=0

        local myconf_driver=""

        myconf="${myconf} --enable-long-username"

        # Override the default delivery agent. This sets only

        # the default, which may be changed in dspam.conf.

        if has_version "mail-mta/postfix"; then

                myconf="${myconf} --with-delivery-agent=/usr/sbin/sendmail"

        elif has_version "mail-mta/exim"; then

                myconf="${myconf} --with-delivery-agent=/usr/sbin/exim"

        elif has_version "mail-mta/sendmail"; then

                myconf="${myconf} --with-delivery-agent=/usr/sbin/sendmail"

        elif has_version "mail-filter/procmail"; then

                myconf="${myconf} --with-delivery-agent=/usr/bin/procmail"

        fi

        use large-domain && myconf="${myconf} --enable-large-scale" ||\
            myconf="${myconf} --enable-domain-scale"

        myconf="${myconf} --with-dspam-home=${HOMEDIR}"

        myconf="${myconf} --sysconfdir=${CONFDIR}"

        use user-homedirs && myconf="${myconf} --enable-homedir"

        use clamav && myconf="${myconf} --enable-clamav"

        # enables support for debugging (touch /etc/dspam/.debug to turn on)

        # optional: even MORE debugging output, use with extreme caution!

        use debug && myconf="${myconf} --enable-debug --enable-verbose-debug --enable-bnr-debug"

        use ldap && myconf="${myconf} --enable-ldap"

        # select storage driver

        if use sqlite ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=sqlite_drv"

                else

                        myconf_driver="${myconf_driver},sqlite_drv"

                fi

                myconf="${myconf} --enable-virtual-users"

                driver_count=$((${driver_count} + 1))

        fi

        if use sqlite3 ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=sqlite3_drv"

                else

                        myconf_driver="${myconf_driver},sqlite3_drv"

                fi

                myconf="${myconf} --enable-virtual-users"

                driver_count=$((${driver_count} + 1))

        fi

        if use mysql; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=mysql_drv"

                else

                        myconf_driver="${myconf_driver},mysql_drv"

                fi

                myconf="${myconf} --with-mysql-includes=/usr/include/mysql"

                myconf="${myconf} --with-mysql-libraries=/usr/lib/mysql"

                myconf="${myconf} --enable-preferences-extension"

                driver_count=$((${driver_count} + 1))

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        myconf="${myconf} --enable-daemon"

                fi

                use virtual-users && myconf="${myconf} --enable-virtual-users"

                # an experimental feature available with MySQL and PgSQL backend

        fi

        if use oci8 ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=ora_drv"

                else

                        myconf_driver="${myconf_driver},ora_drv"

                fi

                myconf="${myconf} --with-oracle-home=${ORACLE_HOME}"

                myconf="${myconf} --enable-virtual-users"

                # I am in no way a Oracle specialist. If someone knows

                # how to query the version of Oracle, then let me know.

                if (expr ${ORACLE_HOME/*\/} : 10 1>/dev/null 2>&1); then

                        myconf="${myconf} --with-oracle-version=10"

                fi

        fi

        if use postgres ; then

                if [[ ${driver_count} -eq 0 ]] ; then

                        myconf_driver="--with-storage-driver=pgsql_drv"

                else

                        myconf_driver="${myconf_driver},pgsql_drv"

                fi

                myconf="${myconf} --with-pgsql-includes=/usr/include/postgresql"

                myconf="${myconf} --with-pgsql-libraries=/usr/lib/postgresql"

                myconf="${myconf} --enable-preferences-extension"

                driver_count=$((${driver_count} + 1))

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        myconf="${myconf} --enable-daemon"

                fi

                use virtual-users && myconf="${myconf} --enable-virtual-users"

                # an experimental feature available with MySQL and PgSQL backend

        fi

        if [[ ${driver_count} -eq 0 ]] ; then

                myconf_driver="--with-storage-driver=hash_drv"

                driver_count=$((${driver_count} + 1))

        fi

        myconf_driver="${myconf_driver},hash_drv"

        myconf="${myconf} ${myconf_driver}"

        econf ${myconf} || die

        emake || die

}

src_install () {

        # Fix issues with older dspam configuration

        CONFIG_PROTECT="${CONFIG_PROTECT} ${HOMEDIR} ${CONFDIR} /var/run/dspam"

        CONFIG_PROTECT_MASK="${CONFIG_PROTECT_MASK/${HOMEDIR}/}"

        CONFIG_PROTECT_MASK="${CONFIG_PROTECT_MASK/${CONFDIR}/}"

        # open up perms on $HOMEDIR

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}

        keepdir ${HOMEDIR}

        # keeps dspam data in $CONFDIR

        diropts -m0775 -o dspam -g dspam

        dodir ${CONFDIR}

        keepdir ${CONFDIR}

        # make install

        make DESTDIR=${D} install || die

        chmod o+s ${D}/usr/bin/dspam

        chmod o+s ${D}/usr/bin/dspam_stats

        # documentation

        dodoc CHANGELOG LICENSE README* RELEASE.NOTES UPGRADING

        docinto doc

        dodoc doc/*.txt

        doman man/dspam*

        # build some initial configuration data

        [ ! -f ${CONFDIR}/dspam.conf ] \
                && cp src/dspam.conf ${T}/dspam.conf

        if use mysql || use postgres; then

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        # keeps dspam socket for daemon in /var/run/dspam

                        diropts -m0775 -o dspam -g dspam

                        dodir /var/run/dspam

                        keepdir /var/run/dspam

                        # We use sockets for the daemon instead of tcp port 24

                        sed -e 's:^#*\(ServerDomainSocketPath[\t ]\{1,\}\).*:\1\"/var/run/dspam/dspam.sock\":gI' \
                                -e 's:^#*\(ServerPID[\t ]\{1,\}\).*:\1/var/run/dspam/dspam.pid:gI' \
                                -i ${T}/dspam.conf

                        # dspam init script

                        exeinto /etc/init.d

                        exeopts -m0755 -o root -g root

                        newexe ${FILESDIR}/dspam.rc dspam

                fi

        fi

        # generate random password

        local PASSWORD="${RANDOM}${RANDOM}${RANDOM}${RANDOM}"

        # database related configuration and scripts

        if use sqlite; then

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                newins src/tools.sqlite_drv/purge-2.sql sqlite_purge.sql

        fi

        if use sqlite3; then

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                newins src/tools.sqlite_drv/purge-3.sql sqlite3_purge.sql

        fi

        if use mysql; then

                # Use existing configuration if possible

                if [[ -f ${ROOT}${CONFDIR}/mysql.data ]]; then

                        DSPAM_DB_DATA=( $(sed "s:^[\t ]*$:###:gI" "${ROOT}${CONFDIR}/mysql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                else

                        DSPAM_DB_DATA[0]="/var/run/mysqld/mysqld.sock"

                        DSPAM_DB_DATA[1]=""

                        DSPAM_DB_DATA[2]="dspam"

                        DSPAM_DB_DATA[3]="${PASSWORD}"

                        DSPAM_DB_DATA[4]="dspam"

                        DSPAM_DB_DATA[5]="true"

                fi

                # Modify configuration and create mysql.data file

                sed -e "s:^#*\(MySQLServer[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[0]}:gI" \
                        -e "s:^#*\(MySQLPort[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[1]}:gI" \
                        -e "s:^#*\(MySQLUser[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[2]}:gI" \
                        -e "s:^#*\(MySQLPass[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[3]}:gI" \
                        -e "s:^#*\(MySQLDb[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[4]}:gI" \
                        -e "s:^#*\(MySQLCompress[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[5]}:gI" \
                        -i ${T}/dspam.conf

                for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                        echo "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" >> ${T}/mysql.data

                done

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                doins ${T}/mysql.data

                newins src/tools.mysql_drv/mysql_objects-space.sql mysql_objects-space.sql

                newins src/tools.mysql_drv/mysql_objects-speed.sql mysql_objects-speed.sql

                newins src/tools.mysql_drv/mysql_objects-4.1.sql mysql_objects-4.1.sql

                newins src/tools.mysql_drv/virtual_users.sql mysql_virtual_users.sql

                newins src/tools.mysql_drv/purge.sql mysql_purge.sql

                newins src/tools.mysql_drv/purge-4.1.sql mysql_purge-4.1.sql

        fi

        if use oci8 ; then

                # Use existing configuration if possible

                if [ -f ${ROOT}${CONFDIR}/oracle.data ]; then

                        DSPAM_DB_DATA=( $(cat "${ROOT}${CONFDIR}/oracle.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                else

                        DSPAM_DB_DATA[0]="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521))(CONNECT_DATA=(SID=PROD)))"

                        DSPAM_DB_DATA[1]="dspam"

                        DSPAM_DB_DATA[2]="${PASSWORD}"

                        DSPAM_DB_DATA[3]="dspam"

                fi

                # Modify configuration and create oracle.data file

                sed -e "s:^#*\(OraServer[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[0]}:gI" \
                        -e "s:^\(OraUser[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[1]}:gI" \
                        -e "s:^\(OraPass[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[2]}:gI" \
                        -e "s:^\(OraSchema[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[3]}:gI" \
                        -i ${T}/dspam.conf

                for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                        echo "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" >> ${T}/oracle.data

                done

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                doins ${T}/oracle.data

                newins src/tools.ora_drv/oral_objects.sql ora_objects.sql

                newins src/tools.ora_drv/virtual_users.sql ora_virtual_users.sql

                newins src/tools.ora_drv/purge.sql ora_purge.sql

        fi

        if use postgres ; then

                # Use existing configuration if possible

                if [ -f ${ROOT}${CONFDIR}/pgsql.data ]; then

                        DSPAM_DB_DATA=( $(cat "${ROOT}${CONFDIR}/pgsql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                else

                        DSPAM_DB_DATA[0]="127.0.0.1"

                        DSPAM_DB_DATA[1]="5432"

                        DSPAM_DB_DATA[2]="dspam"

                        DSPAM_DB_DATA[3]="${PASSWORD}"

                        DSPAM_DB_DATA[4]="dspam"

                fi

                # Modify configuration and create pgsql.data file

                sed -e "s:^#*\(PgSQLServer[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[0]}:gI" \
                        -e "s:^#*\(PgSQLPort[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[1]}:gI" \
                        -e "s:^#*\(PgSQLUser[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[2]}:gI" \
                        -e "s:^#*\(PgSQLPass[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[3]}:gI" \
                        -e "s:^#*\(PgSQLDb[\t ]\{1,\}\).*:\1${DSPAM_DB_DATA[4]}:gI" \
                        -e "s:^#*\(PgSQLConnectionCache[\t ]*.\):\1:gI" \
                        -i ${T}/dspam.conf

                for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                        echo "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" >> ${T}/pgsql.data

                done

                insinto ${CONFDIR}

                insopts -m644 -o dspam -g dspam

                doins ${T}/pgsql.data

                newins src/tools.pgsql_drv/pgsql_objects.sql pgsql_objects.sql

                newins src/tools.pgsql_drv/virtual_users.sql pgsql_virtual_users.sql

                newins src/tools.pgsql_drv/purge.sql pgsql_purge.sql

        fi

        sed -e "s:^\(Purge.*\):###\1:g" \
                -e "s:^#\(Purge.*\):\1:g" \
                -e "s:^###\(Purge.*\):#\1:g" \
                -i ${T}/dspam.conf

        insinto ${CONFDIR}

        insopts -m644 -o dspam -g dspam

        doins ${T}/dspam.conf

        # installs the notification messages

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}/txt

        keepdir ${HOMEDIR}/txt

        insinto ${HOMEDIR}/txt

        insopts -m644 -o dspam -g dspam

        doins ${S}/txt/*.txt

        # Create the opt-in / opt-out directories

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}/opt-in

        keepdir ${HOMEDIR}/opt-in

        dodir ${HOMEDIR}/opt-out

        keepdir ${HOMEDIR}/opt-out

        # Create the log directory

        diropts -m0775 -o dspam -g dspam

        dodir ${HOMEDIR}/log

        keepdir ${HOMEDIR}/log

        # logrotation scripts

        diropts -m0755 -o dspam -g dspam

        dodir /etc/logrotate.d

        keepdir /etc/logrotate.d

        insinto /etc/logrotate.d

        insopts -m0644 -o dspam -g dspam

        newins ${FILESDIR}/logrotate.dspam dspam

        # dspam cron job

        diropts -m0755 -o dspam -g dspam

        dodir /etc/cron.daily

        keepdir /etc/cron.daily

        exeinto /etc/cron.daily

        exeopts -m0755 -o dspam -g dspam

        cp ${FILESDIR}/${PV}-dspam.cron ${T}/dspam.cron

        doexe ${T}/dspam.cron

        # dspam environment

        echo -ne "CONFIG_PROTECT=\"${HOMEDIR}/txt ${HOMEDIR}/opt-in ${HOMEDIR}/opt-out /var/run/dspam\"\n\n" > ${T}/40dspam

        doenvd ${T}/40dspam || die

}

pkg_postinst() {

        env-update

        if use mysql || use postgres || use oci8; then

                echo

                einfo "To setup DSPAM to run out-of-the-box on your system, run:"

                einfo " ebuild /var/db/pkg/${CATEGORY}/${PF}/${PF}.ebuild config"

                echo

        fi

        if use mysql || use postgres; then

                if has_version ">sys-kernel/linux-headers-2.6"; then

                        echo

                        einfo "If you want to run DSPAM in the new daemon mode, remember"

                        einfo "to make the DSPAM daemon start during boot:"

                        einfo "  rc-update add dspam default"

                        echo

                fi

        fi

        echo

        einfo "Don't forget to add dspam_logrotate to your cron."

        einfo "For example (purging log entries older than 30 days):"

        einfo "0 0 * * *  root  dspam_logrotate -a 30 -d ${HOMEDIR}/data"

        echo

}

pkg_config () {

        local storage_drivers_count=-1

        local storage_drivers=( $(dspam --version | sed -n "s:^.*\-\-with\-storage\-driver=\([^ ]*\) \-.*:\1:gIp" | sed "s:,: :gI") )

        einfo "  Please select what driver you would like to configure"

        for foo in ${storage_drivers[@]};

        do

                let storage_drivers_count="(( ${storage_drivers_count} + 1 ))"

                einfo "    [${storage_drivers_count}] ${foo}"

        done

        einfo

        einfo "    [Q] Quit"

        while true

        do

                read -n 1 -s -p "" DSPAM_CONFIGURE_DRIVER

                [[ "${DSPAM_CONFIGURE_DRIVER}" == "Q" || "${DSPAM_CONFIGURE_DRIVER}" == "q" ]] && echo && exit 0

                [[ "${DSPAM_CONFIGURE_DRIVER}" -ge "0" && "${DSPAM_CONFIGURE_DRIVER}" -le "${storage_drivers_count}" ]] && echo && break

        done

        case "${storage_drivers[${DSPAM_CONFIGURE_DRIVER}]}" in

                "mysql_drv" )

                        DSPAM_MySQL_COMMAND=""

                        DSPAM_DB_DATA=( $(sed "s:^[\t ]*$:###:gI" "${ROOT}${CONFDIR}/mysql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                        DSPAM_MySQL_USER="${DSPAM_DB_DATA[2]}"

                        DSPAM_MySQL_PWD="${DSPAM_DB_DATA[3]}"

                        DSPAM_MySQL_DB="${DSPAM_DB_DATA[4]}"

                        DSPAM_MySQL_COMMAND="CREATE DATABASE ${DSPAM_MySQL_DB};USE ${DSPAM_MySQL_DB};"

                        einfo "DSPAM MySQL tables for data objects"

                        einfo "  Please select what kind of object database you like to use:"

                        einfo "    [1] Space optimized database"

                        einfo "    [2] Speed optimized database"

                        einfo

                        while true

                        do

                                read -n 1 -s -p "  Press 1 or 2 on the keyboard to select database" DSPAM_MySQL_DB_Type

                                [[ "${DSPAM_MySQL_DB_Type}" == "1" || "${DSPAM_MySQL_DB_Type}" == "2" ]] && echo && break

                        done

                        if [ "${DSPAM_MySQL_DB_Type}" == "1" ]

                        then

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};$(grep -v "^\-\-\|^[\t ]*$\|^#" ${CONFDIR}/mysql_objects-space.sql)"

                        else

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};$(grep -v "^\-\-\|^[\t ]*$\|^#" ${CONFDIR}/mysql_objects-speed.sql)"

                        fi

                        if use virtual-users ; then

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};$(grep -v "^\-\-\|^[\t ]*$\|^#" ${CONFDIR}/mysql_virtual_users.sql)"

                        fi

                        DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND};GRANT SELECT,INSERT,UPDATE,DELETE ON ${DSPAM_MySQL_DB}.* TO ${DSPAM_MySQL_USER}@localhost IDENTIFIED BY '${DSPAM_MySQL_PWD}';FLUSH PRIVILEGES;"

                        ewarn "When prompted for a password, please enter your MySQL root password"

                        ewarn

                        /usr/bin/mysql --user=root --host=localhost -p -e "${DSPAM_MySQL_COMMAND}"

                        # Force latin1 as character set and collation. DSPAM does not need anything more

                        # and latin1 uses less space and is faster than utf8 or any other character set

                        # and/or collation.

                        DSPAM_MySQL_COMMAND="ALTER DATABASE ${DSPAM_MySQL_DB} DEFAULT CHARACTER SET latin1 COLLATE latin1_general_ci;"

                        # Append one ALTER TABLE statement per table. A for loop is used because

                        # "mysql ... | while read" runs the loop in a subshell and loses the variable.

                        for foo in $(/usr/bin/mysql --user=${DSPAM_MySQL_USER} --password=${DSPAM_MySQL_PWD} --host=localhost --silent --skip-column-names --batch -e "USE ${DSPAM_MySQL_DB};SHOW TABLES;")

                        do

                                DSPAM_MySQL_COMMAND="${DSPAM_MySQL_COMMAND}ALTER TABLE ${DSPAM_MySQL_DB}.${foo} CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;"

                        done

                        # NOTE: the ALTER statements need the ALTER privilege, which the GRANT above

                        # (SELECT,INSERT,UPDATE,DELETE) does not give to the DSPAM user.

                        /usr/bin/mysql --user=${DSPAM_MySQL_USER} --password=${DSPAM_MySQL_PWD} --host=localhost -e "${DSPAM_MySQL_COMMAND}"

                        ;;

                "ora_drv" )

                        einfo "We do not have enough Oracle knowledge to configure Oracle"

                        einfo "automatically. If you know how, please post a message in"

                        einfo "Gentoo Bugzilla."

                        echo

                        einfo "You need manually to create the Oracle user for DSPAM and"

                        einfo "the necessary database."

                        einfo "But the DSPAM configuration file dspam.conf and oracle.data"

                        einfo "was already configured with the necessary information to"

                        einfo "access the database."

                        einfo "Please read your dspam.conf, oracle.data and the README for"

                        einfo "more info on how to setup DSPAM with Oracle."

                        einfo "objects for each user upon first use of DSPAM by that user."

                        echo

                        ;;

                "pgsql_drv" )

                        DSPAM_DB_DATA=( $(sed "s:^[\t ]*$:###:gI" "${ROOT}${CONFDIR}/pgsql.data") )

                        for DB_DATA_INDEX in $(seq 0 $((${#DSPAM_DB_DATA[@]} - 1))); do

                                [[ "${DSPAM_DB_DATA[$DB_DATA_INDEX]}" = "###" ]] && DSPAM_DB_DATA[$DB_DATA_INDEX]=""

                        done

                        DSPAM_PgSQL_USER="${DSPAM_DB_DATA[2]}"

                        DSPAM_PgSQL_PWD="${DSPAM_DB_DATA[3]}"

                        DSPAM_PgSQL_DB="${DSPAM_DB_DATA[4]}"

                        ewarn "When prompted for a password, please enter your PgSQL postgres password"

                        ewarn

                        einfo "Creating DSPAM PostgreSQL database \"${DSPAM_PgSQL_DB}\" and user \"${DSPAM_PgSQL_USER}\""

                        /usr/bin/psql -h localhost -d template1 -U postgres -c "CREATE USER ${DSPAM_PgSQL_USER} WITH PASSWORD '${DSPAM_PgSQL_PWD}' NOCREATEDB NOCREATEUSER; CREATE DATABASE ${DSPAM_PgSQL_DB}; GRANT ALL PRIVILEGES ON DATABASE ${DSPAM_PgSQL_DB} TO ${DSPAM_PgSQL_USER}; GRANT ALL PRIVILEGES ON SCHEMA public TO ${DSPAM_PgSQL_USER}; UPDATE pg_database SET datdba=(SELECT usesysid FROM pg_shadow WHERE usename='${DSPAM_PgSQL_USER}') WHERE datname='${DSPAM_PgSQL_DB}';"

                        einfo "Creating DSPAM PostgreSQL tables"

                        PGUSER=${DSPAM_PgSQL_USER} PGPASSWORD=${DSPAM_PgSQL_PWD} /usr/bin/psql -d ${DSPAM_PgSQL_DB} -U ${DSPAM_PgSQL_USER} -f ${CONFDIR}/pgsql_objects.sql 1>/dev/null 2>&1

                        if use virtual-users ; then

                                einfo "Creating DSPAM PostgreSQL database for virtual-users users"

                                PGUSER=${DSPAM_PgSQL_USER} PGPASSWORD=${DSPAM_PgSQL_PWD} /usr/bin/psql -d ${DSPAM_PgSQL_DB} -U ${DSPAM_PgSQL_USER} -f ${CONFDIR}/pgsql_virtual-users.sql 1>/dev/null 2>&1

                        fi

                        ;;

                "sqlite_drv" | "sqlite3_drv" | "hash_drv" )

                        einfo "The selected driver does not need to be configured."

                        echo

                        ;;

                * )

                        einfo "Unknown driver ${storage_drivers[${DSPAM_CONFIGURE_DRIVER}]}."

                        echo

                        ;;

        esac

}
```

----------

## Asmod

Steveb,

Have you seen any problems with 3.6.8? Mine works fine, but every now and then it segfaults. Is there any way to enable more detailed logging from dspam?

----------

## petrjanda

 *Asmod wrote:*   

> Steveb,
> 
> Have you seen any problems with 3.6.8? Mine works fine, but every now and then it segfaults. Is there any way to enable more detailed logging from dspam?

 

Enable verbose debug with the --enable-verbose-debug configure flag.

----------

## Asmod

I have verbose debug enabled but I still can't see why it segfaults. It does it quite often too. Maybe 25% of the time. The problem is I can't replicate this fault from the command line. If I push the exact same message back through DSPAM running it in GDB or valgrind it doesn't segfault.

I've made sure my system is set up to produce core dumps, and it is: kill -SEGV dspam generates a core file. But when it segfaults in operation it doesn't dump one.
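A possible explanation for the missing core file, offered as a guess rather than a diagnosis: on Linux, a process started from a set-uid or set-gid binary (dspam is often installed that way) does not dump core unless `fs.suid_dumpable` is non-zero, and the core size limit that counts is the one inherited from the MTA, not the one in a login shell. The sketch below only reports those settings. The function names are made up for illustration, and the dspam path in the usage note is an assumption.

```shell
#!/bin/sh
# Report the settings that usually decide whether a crashing process
# leaves a core file behind.
core_dump_report() {
    # Core size limit of *this* shell; the MTA that spawns dspam may differ.
    echo "core_limit=$(ulimit -c)"
    # Linux only: 0 means set-uid/set-gid programs never dump core.
    if [ -r /proc/sys/fs/suid_dumpable ]; then
        echo "suid_dumpable=$(cat /proc/sys/fs/suid_dumpable)"
    fi
    # Linux only: where (and under which name) core files are written.
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core_pattern=$(cat /proc/sys/kernel/core_pattern)"
    fi
}

# Succeeds (exit status 0) if the given file has the set-uid or set-gid bit.
is_setid() {
    [ -u "$1" ] || [ -g "$1" ]
}
```

Running `core_dump_report` from the same environment that starts Exim, and `is_setid /usr/bin/dspam && echo "set-id binary"`, shows whether either condition applies on a given box.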

I'm using

 *Quote:*   

> mysql  Ver 14.7 Distrib 4.1.20, for pc-linux-gnu (i386) using readline 5.1

 

 *Quote:*   

> Linux daisy 2.6.15-gentoo-r1 #3 SMP PREEMPT Tue Jul 25 09:19:26 BST 2006 i686 Pentium II (Klamath) GNU/Linux

 

 *Quote:*   

> Exim version 4.60

 

and

 *Quote:*   

> DSPAM Anti-Spam Suite 3.6.8 (agent/library)
> 
> Copyright (c) 2002-2006 Jonathan A. Zdziarski
> 
> http://dspam.nuclearelephant.com
> ...

 

----------

## BlinkEye

Asmod, could you solve the problem? Maybe I have the same one. I used 3.6.6 successfully, but then libhash_drv choked: my main user had over 1.3 GB of signature data, and every time he got a mail dspam spent a couple of minutes parsing and often crashed in the end. So I thought I could now upgrade to 3.6.8 and use the mysql driver. But no matter what I do, I cannot train anymore.

I posted to the list but haven't got any solution yet:

```

On 12/07/2006 12:35 PM,  Belgarath wrote:
> On Thursday 07 December 2006 11:30, you wrote:
>> Hello guys,
>>
>> I can't get dspam-3.6.8 to work (3.6.6 worked flawlessly with the
>> libhash_drv but since my main users account gets so many SPAMs the
>> signature directory got over 1.3Gb and caused dspam-3.6.6 to choke). No
>> matter whether I compile 3.6.8 with mysql support or not I get the
>> following error (it doesn't matter if I set the storage driver to the
>> libhash_drv.so or not):
>>
>> # dspam dspam: error while loading shared libraries:
>> libmysqlclient.so.14: cannot open shared object file: No such file or
>> directory
>>
>> I'm using mysql-5.0.26-r1. When I downgrade to 3.6.6 the
>> libmysqlclient.so.14 will be created. With 3.6.8 I have only:
>>
>> # ls /usr/lib/mysql/libmysqlclient* /usr/lib/mysql/libmysqlclient.a
>> /usr/lib/mysql/libmysqlclient_r.a /usr/lib/mysql/libmysqlclient_r.so
>> /usr/lib/mysql/libmysqlclient_r.so.15.0.0
>> /usr/lib/mysql/libmysqlclient.so.15 /usr/lib/mysql/libmysqlclient.la
>> /usr/lib/mysql/libmysqlclient_r.la
>> /usr/lib/mysql/libmysqlclient_r.so.15 /usr/lib/mysql/libmysqlclient.so
>> /usr/lib/mysql/libmysqlclient.so.15.0.0
>>
>> I'm actually lost - could someone give me any pointer or information
>> what I'm missing or how your setup looks like? How did you ./configure?
>> Are you using the vanilla version?
>>
>> Thanks a lot, BlinkEye
>
> Hello,
>
> As it is complaining about specific version of the lib make a symlink
> from /usr/lib/mysql/libmysqlclient.so to
> /usr/lib/mysql/libmysqlclient.so.14 that should solve your problem
>
> Belgarath

Hello Belgarath,

thanks for your input!

I tried that already and removed it later. Now, doing it again it gives me:

# dspam
dspam: /usr/lib/mysql/libmysqlclient.so.14: version `libmysqlclient_14' not found
(required by /usr/local/lib/libdspam.so.7)

revealing old leftovers. I removed them and rebuild dspam-3.6.8. It now correctly
uses the libs in

# /usr/local/lib/
libdspam.a         libdspam.la        libdspam.so        libdspam.so.7
libdspam.so.7.0.0

But since I made that symlink I cannot train anything at all (so, I'm back where I
was yesterday). Neither with the libhash_drv nor the libmysql_drv:

> /usr/bin/dspam_train username /home/username/.maildir/.Spam/new/
> /home/username/.maildir/cur/ Taking Snapshot... username TP:     0 TN:
> 0 FP:     0 FN:     0 SC:     0 NC:     0 Training
> /home/username/.maildir/cur/ / /home/username/.maildir/.Spam/new/
> corpora... [test: nonspam] 1165262335.20937_1.username:2,S  result: =====
> WOAH THERE ===== I was unable to parse the result. Test Broken.

Even if I compile dspam without mysql support I cannot train. I'm quite sure it has
something to do with that libmysqlclient.so.14, but have no idea what to do from
here on.
```

I don't understand why the 3.6.8 ebuild from portage wants that libmysqlclient.so.14 so that I have to symlink it. On my desktop and my workstation I can emerge dspam-3.6.8 without needing libmysqlclient.so.14, and on both computers, which are (almost) as up to date as the server, I can use dspam_train (I haven't yet tried it there with mysql support).
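To see which library the loader fails on, and whether a stale copy such as the /usr/local/lib/libdspam.so.7 mentioned in the mail above is being picked up, the `ldd` output can be filtered. This is only a minimal sketch: `missing_libs` and `resolved_from` are hypothetical helper names, and the binary path in the usage note is an assumption.

```shell
#!/bin/sh
# Both helpers read `ldd` output on stdin, so they also work on saved output.

# Print the names of shared libraries the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Print "name path" for every resolved library whose name contains $1.
resolved_from() {
    awk -v pat="$1" 'index($1, pat) && $3 ~ /^\// { print $1, $3 }'
}
```

For example, `ldd /usr/bin/dspam | missing_libs` lists unresolved libraries, and `ldd /usr/bin/dspam | resolved_from libdspam` shows which libdspam the binary actually loads. On Gentoo, `revdep-rebuild` from gentoolkit is the usual tool for finding packages that are still linked against a library version that no longer exists.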

----------

## Asmod

I fixed it by not using the ebuild and compiling from source. If you haven't tried that, give it a go and let me know what happens.

Asmod

----------

## Asmod

Actually, thinking about it, I had a lot of problems with timeouts and all kinds of stuff. The main problem for me was that I had 1 account and ran a company of 600+ people through it, plus a few hundred mailing/distribution lists on top. The dspam machine was a P200 with 256 MB of RAM!!! It just couldn't cope.

I moved it onto a dual 1 GHz SCSI box with 512 MB of RAM and things were instantly 10x better, but I did compile from source, and these are the options I used:

```

./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib --with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql --enable-preferences-extension --enable-daemon --enable-virtual-users --with-storage-driver=mysql_drv,hash_drv
```

----------

## BlinkEye

Thanks a lot for your help. I downloaded the vanilla 3.6.8 version and successfully configured it with

```
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib --with-logdir=/var/spool/dspam/log/ --with-logfile=/tmp/dspam.log --with-dspam-home=/var/spool/dspam --sysconfdir=/etc/mail/dspam --with-mysql-includes=/usr/include/mysql --with-mysql-libraries=/usr/lib/mysql --enable-preferences-extension --with-storage-driver=mysql_drv,hash_drv --enable-syslog --enable-debug --enable-verbose-debug
```

I was able to insert about 13 MB of data into the (newly created) database and then stopped it, happy that it was working. The next time I tried, it failed on me. Since then, even though I reconfigured, recompiled and recreated the database, it ALWAYS fails on any message. It's like old times: even though it worked once, it just bails out from then on.

I created a strace file when calling dspam with the following command:

```
/usr/bin/dspam_train username /home/username/.maildir/.Spam/cur/ /home/username/.maildir/cur/
```

It can be found here.

I tried different folders of course, smaller ones, to no avail. I can set the storage driver to libhash_drv (and even shut down the MySQL database to make sure it does not try to use the MySQL connection) and it bails out too. If I use the mysql driver AND shut down the db, I get the expected message:

```
Taking Snapshot...
15247: [12/08/2006 01:35:46] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
15247: [12/08/2006 01:35:46] unable to attach dspam context
Training /home/username/.maildir/cur/ / /home/username/.maildir/.Spam/cur/ corpora...
[test: nonspam] 1165262335.20937_1.blinkeye:2,S  result: 15249: [12/08/2006 01:35:47] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
15249: [12/08/2006 01:35:47] unable to initialize tools context
15249: [12/08/2006 01:35:47] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
15249: [12/08/2006 01:35:47] unable to initialize tools context
15249: [12/08/2006 01:35:47] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
15249: [12/08/2006 01:35:47] unable to initialize tools context
===== WOAH THERE =====
I was unable to parse the result. Test Broken.
======================
```

I'm wondering why I don't get any debug output, although I built it with debug enabled and set the debug settings in dspam.conf.

I'm really lost ...

----------

## BlinkEye

I don't believe it!

It was the config option

```
Opt in
```

Every now and then I set that, and if the corresponding username.dspam file does not exist in /var/spool/dspam/opt-in/, dspam just bails out when you try to train for that user.
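For anyone hitting the same thing, the marker file can be created per user before training. A small sketch, assuming the layout described above (a `username.dspam` file in the `opt-in` directory under the DSPAM home). The function names are hypothetical, and the file must of course be readable by whatever user dspam runs as.

```shell
#!/bin/sh
# Create the opt-in marker for one user.
# $1 = DSPAM home (e.g. /var/spool/dspam), $2 = user name
dspam_opt_in() {
    mkdir -p "$1/opt-in" && touch "$1/opt-in/$2.dspam"
}

# Succeeds (exit status 0) if the user already has an opt-in marker.
dspam_is_opted_in() {
    [ -e "$1/opt-in/$2.dspam" ]
}
```

Usage would be along the lines of `dspam_is_opted_in /var/spool/dspam username || dspam_opt_in /var/spool/dspam username`, run before `dspam_train`.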

----------

## m4chine

I'm trying to get a grasp on the dspam group file... I would like to set up merged groups for multiple domains. Do I need multiple group files, and should they be placed in the domain.com directory inside the data dir? Or can I have a single group file for all domains? And do I append @domain.com to the global user and the usernames?

The script SteveB posted for getting the path the group file should reside in gave: /var/spool/dspam/data

Which has directories for each of my domains.

thanks,

----------

## magic919

Group file lives in DSPAM home as above.

Content on mine is

```

nameofglobalgroup:shared,managed:*@example.com,*@domain.com,*@website.com

```

So they all train as nameofglobalgroup and share the one lot of DSPAM training.
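Since the group file is just one `name:type:member,member,...` line per group, a line for several domains can be generated rather than typed. A minimal sketch: `group_line` is a made-up helper, and the group name and members in the usage note are placeholders.

```shell
#!/bin/sh
# group_line NAME TYPE MEMBER... -> prints NAME:TYPE:MEMBER1,MEMBER2,...
group_line() {
    gname=$1
    gtype=$2
    shift 2
    # Join the remaining arguments with commas (IFS change stays in the subshell).
    members=$(IFS=,; echo "$*")
    printf '%s:%s:%s\n' "$gname" "$gtype" "$members"
}
```

For example, `group_line global merged user1@example.com user2@domain.com` prints `global:merged:user1@example.com,user2@domain.com`, which can then be appended to the single group file in the DSPAM home.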

----------

