# high server load due to email [solved]

## meldon

Help!

I'm running a server that hosts web sites and delivers email and such. I've been experiencing very high load averages and I'm pretty sure it is email related, possibly spam filtering. I've done many Google and Gentoo forum searches but I can't figure out what is going on, so any help is much appreciated.

I'm using qmail and SpamAssassin. I've had load averages of over 30 for the last few days. When I stop qmail it settles right down to ~0.01, which assures me that this is not a hardware issue (correct me if I'm wrong).

Here are some top readouts from when it's really high:

```
top - 08:36:04 up 14 min,  3 users,  load average: 103.26, 66.66, 33.88
Tasks: 265 total,  70 running, 188 sleeping,   0 stopped,   7 zombie
Cpu(s): 95.1% us,  2.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.3% hi,  2.3% si
Mem:    507712k total,   450388k used,    57324k free,    16860k buffers
Swap:  1004052k total,     8752k used,   995300k free,    55424k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12590 qmaild    18   0  2856 1308 1064 R 52.3  0.3   0:01.81 qmail-smtpd
12589 qmaild    18   0  2852 1312 1064 R 39.9  0.3   0:01.74 qmail-smtpd
12795 root      19   0  1484   76    4 R  1.9  0.0   0:00.06 qmail-lspawn
10877 dhn       16   0  2388 1312  848 R  1.3  0.3   0:02.74 top
11044 root      18   0  1484  348  276 S  1.3  0.1   0:00.13 qmail-lspawn
  792 root      15   0     0    0    0 S  0.6  0.0   0:25.20 kjournald
12039 qscand    17   0  6984 4848 1696 D  0.3  1.0   0:00.40 qmail-scanner-q
    1 root      16   0  1484  464  440 S  0.0  0.1   0:00.67 init
    2 root      34  19     0    0    0 R  0.0  0.0   0:00.00 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:10.72 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   0:01.16 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   84 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
  133 root      15   0     0    0    0 S  0.0  0.0   0:01.29 pdflush
  134 root      15   0     0    0    0 S  0.0  0.0   0:01.72 pdflush
  136 root      14  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  135 root      15   0     0    0    0 S  0.0  0.0   0:00.85 kswapd0
  722 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  745 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  763 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  766 root      16   0     0    0    0 S  0.0  0.0   0:00.00 khpsbpkt
  988 root      14  -4  1712  324  304 S  0.0  0.1   0:00.38 udevd
 7213 root      15   0  1824  576  392 S  0.0  0.1   0:00.80 syslog-ng
 7708 mysql     17   0  115m  16m 1244 S  0.0  3.4   0:01.26 mysqld
 7780 postgres  16   0 16708 2296 1860 S  0.0  0.5   0:00.79 postmaster
 7833 root      16   0  3440  832  672 S  0.0  0.2   0:00.00 sshd
 8094 postgres  16   0 16708 1576 1092 S  0.0  0.3   0:00.13 postmaster
 8095 postgres  16   0  7624  864  436 S  0.0  0.2   0:00.00 postmaster

top - 08:36:29 up 14 min,  3 users,  load average: 83.76, 65.18, 34.27
Tasks: 140 total,  15 running, 118 sleeping,   0 stopped,   7 zombie
Cpu(s): 73.8% us,  9.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  7.9% hi,  8.9% si
Mem:    507712k total,   201564k used,   306148k free,    17580k buffers
Swap:  1004052k total,     8752k used,   995300k free,    54060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13095 qmaild    17   0  2852 1308 1064 R 49.9  0.3   0:01.51 qmail-smtpd
  792 root      16   0     0    0    0 S  6.9  0.0   0:27.22 kjournald
13141 root      18   0  1472  380  328 R  5.6  0.1   0:00.17 start-stop-daem
12939 qmaild    19   0  2852 1304 1064 R  5.0  0.3   0:02.53 qmail-smtpd
12923 qmaild    19   0  2856 1312 1064 R  3.6  0.3   0:02.65 qmail-smtpd
12924 qmaild    19   0  2852 1304 1064 R  1.7  0.3   0:02.62 qmail-smtpd
13118 root      18   0  2632  628  484 S  1.0  0.1   0:00.03 tcpserver
10877 dhn       16   0  2388 1312  848 R  0.7  0.3   0:02.99 top
 8702 dhn       16   0  6420 1348  948 R  0.3  0.3   0:00.14 sshd
 9637 root      17   0     0    0    0 Z  0.3  0.0   0:00.01 supervise <defunct>
11850 root      15   0  2668 1456  980 S  0.3  0.3   0:00.13 runscript.sh
13119 qscand    16   0  3468 1020  812 S  0.3  0.2   0:00.01 spamc
13120 qscand    16   0  3468 1024  812 S  0.3  0.2   0:00.01 spamc
    1 root      16   0  1484  464  440 S  0.0  0.1   0:00.68 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:10.72 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   0:01.16 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   84 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
  133 root      15   0     0    0    0 S  0.0  0.0   0:01.68 pdflush
  134 root      15   0     0    0    0 S  0.0  0.0   0:01.72 pdflush
  136 root      14  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  135 root      15   0     0    0    0 S  0.0  0.0   0:00.85 kswapd0
  722 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  745 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  763 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  766 root      16   0     0    0    0 S  0.0  0.0   0:00.00 khpsbpkt
  988 root      14  -4  1712  324  304 S  0.0  0.1   0:00.38 udevd
 7213 root      15   0  1824  576  392 S  0.0  0.1   0:01.06 syslog-ng
 7708 mysql     17   0  115m  16m 1244 S  0.0  3.4   0:01.26 mysqld
```

I have to run "killall -9 qmail-smtpd" and similar commands over and over to get the server under control. My big fear is that the server has been compromised and someone is using my qmail to send spam. I have not found evidence of this, though.

I'm also wondering if these are normal sizes for things in /var/spool/qmailscan:

```
/var/spool/qmailscan # du -sch * .??*
16K     archive
606M    mailstats.csv
562M    qmail-queue.log
570M    qmail-queue.log.1
4.0K    qmail-scanner-queue-version.txt
758M    quarantine
12K     quarantine-attachments.db
8.0K    quarantine-attachments.txt
3.6M    quarantine.log
448K    tmp
0       viruses.log
288K    working
0       .keep
8.0K    .pyzor
95M     .razor
208M    .spamassassin
2.8G    total
```

Where do I go from here? What should I be checking?

Last edited by meldon on Thu Jan 11, 2007 1:07 am; edited 1 time in total

----------

## indynet

 *meldon wrote:*   

> Where do i go from here? What should i be checking?

 

Hello, you should have a look at the mail queue and the log files. There you should find information about all the emails that your system is sending or has sent.
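As a rough sketch of what a first look at the queue could involve: stock qmail ships `qmail-qstat` (queue totals) and `qmail-qread` (one entry per queued message). The helper below only pulls the total out of `qmail-qstat`-style output so it can be logged or compared over time. The name `queue_size` is made up for this example, and the `/var/qmail/bin` path in the comment is the usual default, not something confirmed for this server.

```shell
#!/bin/sh
# queue_size: read qmail-qstat output on stdin and print only the number
# from the "messages in queue: N" line. The second line that qmail-qstat
# prints ("... but not yet preprocessed") is deliberately ignored.
# Hypothetical helper name, used for illustration only.
queue_size() {
    awk -F': ' '/^messages in queue:/ { print $2 }'
}

# Typical use on the mail server itself (path is an assumption):
#   /var/qmail/bin/qmail-qstat | queue_size
#   /var/qmail/bin/qmail-qread | less    # sender, size and recipients per message
```

A queue that keeps growing while the load is high points at delivery or scanning falling behind. A small, stable queue suggests the work is happening at SMTP time instead.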

----------

## meldon

I've been monitoring (tail -f) /var/log/qmail/qmail-smtpd and /var/log/qmail/qmail-send and it looks like the normal traffic that I usually see. I've also looked at some random stuff in /var/qmail/queue and it looks normal as well. There are a lot of bounces and such to addresses that don't exist. Spammers guessing at addresses, I think, but I've seen lots of that when the load was low too.

What would cause these huge load averages for so long?

----------

## meldon

 *meldon wrote:*   

> I've been monitoring (tail -f) /var/log/qmail/qmail-smtpd and /var/log/qmail/qmail-send

 

Here is a sample from /var/log/qmail/qmail-smtpd:

```
@400000004505812a02db9b04 tcpserver: pid 28095 from 199.239.138.72
@400000004505812a0c40b60c tcpserver: ok 28095 secure.gv.ca:66.219.59.41:25 content119b.lga2.nytimes.com:199.239.138.72::4508
@400000004505813a1f9ecf7c tcpserver: end 28095 status 0
@400000004505813a1f9eeebc tcpserver: status: 39/40
@400000004505813a1f9efe5c tcpserver: status: 40/40
@400000004505813a1f9f11e4 tcpserver: pid 28161 from 211.62.35.104
@400000004505813b15f4db74 tcpserver: ok 28161 secure.gv.ca:66.219.59.41:25 mailgate5.kt-idc.co.kr:211.62.35.104::57975
@400000004505815006625c7c tcpserver: end 28161 status 0
@4000000045058150066277d4 tcpserver: status: 39/40
@400000004505815006628774 tcpserver: status: 40/40
@40000000450581500662932c tcpserver: pid 28263 from 218.111.182.98
@400000004505815018e8750c tcpserver: ok 28263 secure.gv.ca:66.219.59.41:25 :218.111.182.98::1802
@400000004505815018e89834 tcpserver: end 28263 status 256
@400000004505815018e8a7d4 tcpserver: status: 39/40
@400000004505815018e8b38c tcpserver: status: 40/40
@400000004505815018e8c32c tcpserver: pid 28270 from 192.220.126.50
@4000000045058150195ff6cc tcpserver: ok 28270 secure.gv.ca:66.219.59.41:25 serbus.virtualfocus.com:192.220.126.50::4461
```

and from /var/log/qmail/qmail-send:

```
@400000004505822720a2aaec new msg 999564
@400000004505822720a2d9cc info msg 999564: bytes 4535 from <> qp 28632 uid 210
@400000004505822721082764 starting delivery 706: msg 999564 to local example.org-EllisnNaParsons@example.org
@400000004505822721084a8c status: local 1/10 remote 1/20
@400000004505822721085a2c delivery 706: failure: Sorry,_no_mailbox_here_by_that_name._(#5.1.1)/
@4000000045058227210869cc status: local 0/10 remote 1/20
@400000004505822725d8a6ec bounce msg 999564 qp 28636
@400000004505822725d8c62c end msg 999564
@400000004505814f304a443c new msg 999629
@400000004505814f304a731c info msg 999629: bytes 4079 from <#@[]> qp 28257 uid 206
@400000004505814f34ae7534 starting delivery 697: msg 999629 to local example.ca-postmaster@example.ca
@400000004505814f34ae908c status: local 1/10 remote 0/20
@400000004505814f38963a9c new msg 999564
@400000004505814f3896714c info msg 999564: bytes 4201 from <#@[]> qp 28260 uid 440
@400000004505815001acd734 delivery 697: success: did_0+1+0/qp_28260/
@400000004505815001acf28c status: local 0/10 remote 0/20
@400000004505815001ad022c starting delivery 698: msg 999564 to local example.ca-postmaster@example.ca
@400000004505815001ad11cc status: local 1/10 remote 0/20
@400000004505815001ad216c end msg 999629
@40000000450581500663a884 new msg 999629
@40000000450581500663bc0c info msg 999629: bytes 4309 from <#@[]> qp 28262 uid 440
@40000000450581500b256e84 starting delivery 699: msg 999629 to local example.com-postmaster@example.com
@40000000450581500b25a534 status: local 2/10 remote 0/20
@40000000450581500b25c08c delivery 698: success: did_0+1+0/qp_28262/
@40000000450581500b25dbe4 status: local 1/10 remote 0/20
@40000000450581500b25f73c end msg 999564
@40000000450581500eda44dc delivery 699: success: Pattern_option_'w'_is_valid_only_when_MAILDROP_OLD_REGEXP_is_set /postmaster/maildropfilter(5): _Syntax_error_in_/pattern/.//did_0+0+1/
@40000000450581500eda641c status: local 0/10 remote 0/20
@40000000450581500eda77a4 end msg 999629
```

(I've replaced the server names with example.)
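For anyone reading logs like these: the `status: 40/40` lines in the smtpd log are tcpserver reporting current/maximum simultaneous connections, so in the sample above it is sitting at its limit of 40. A minimal sketch for summarising such a log follows. The function names `peak_concurrency` and `connections_per_ip` are invented for this example, and it assumes the line layout shown in the sample.

```shell
#!/bin/sh
# Both helpers read tcpserver log lines from stdin or from the files given
# as arguments. Names are hypothetical; the line layout is assumed to match
# the sample log in this thread.

# Print the highest "status: cur/max" value seen, e.g. "40/40".
peak_concurrency() {
    awk '
        / tcpserver: status: / {
            split($NF, s, "/")
            if (s[1] + 0 > peak) peak = s[1] + 0
            limit = s[2]
        }
        END { if (limit != "") print peak "/" limit }
    ' "$@"
}

# Print "count ip" for every remote address, busiest first.
connections_per_ip() {
    awk '
        / tcpserver: pid [0-9]+ from / { n[$NF]++ }
        END { for (ip in n) print n[ip], ip }
    ' "$@" | sort -k1,1nr -k2,2
}

# Example use, assuming multilog writes to a file named "current":
#   peak_concurrency /var/log/qmail/qmail-smtpd/current
#   connections_per_ip /var/log/qmail/qmail-smtpd/current | head
```

The `@4000...` prefixes are TAI64N timestamps. Piping a log through daemontools' `tai64nlocal` turns them into readable local times.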

----------

## meldon

Still high load:

```
 load average: 33.87, 34.11, 34.68
```

I niced the qmail processes so that, hopefully, they will finish what they're doing and everything else can proceed normally. Any further help is appreciated. :)

----------

## gerdesj

 *meldon wrote:*   

> still high load
> 
> ```
>  load average: 33.87, 34.11, 34.68
> ```
> ...

 

It's been a while since I did this lot (I now use Exim to do those mail routing and manipulation tasks that qmail can't). It looks like the Life with qmail install packaged as an ebuild.

Use 

```
svc -d <path to service>
```

to stop processes when you are using daemontools, rather than killall -9, because supervise will just create another one!

Try running qmail on its own without qmail-scanner, which I hope you can change by stopping qmail-smtpd and then clearing out the environment variable that hooks the scanner in (normally QMAILQUEUE) in /service/..../env or in the run script for it. If that is OK, then you know it's the scanning side, most likely SpamAssassin, that is causing the problem.

If spamd is running under supervise, try running it "normally" and see if it is crashing and being respawned.

Hope this helps

Cheers

Jon
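One possible way to spot the "crashing and being respawned" case without watching by hand is to look at the uptimes that daemontools' `svstat` reports: a service that has been "up" for only a few seconds every time you look is probably being restarted by supervise. The sketch below assumes the usual `svstat` line format (`/service/name: up (pid N) S seconds`). The name `flapping_services` and the 10 second default are arbitrary choices for this example.

```shell
#!/bin/sh
# flapping_services [max_uptime_seconds]
# Read svstat output on stdin and print "service uptime" for every service
# that is up but has been up for less than the given number of seconds
# (default 10). Hypothetical helper; assumes lines that look like:
#   /service/spamd: up (pid 1234) 3 seconds
flapping_services() {
    awk -v limit="${1:-10}" '
        $2 == "up" && $5 + 0 < limit {
            sub(/:$/, "", $1)   # drop the trailing colon from the service name
            print $1, $5
        }
    '
}

# Example use (run as root so svstat can read the supervise directories):
#   svstat /service/* | flapping_services 10
```

Running it a few times in a row is more telling than a single run, since any service looks young right after a deliberate restart.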

----------

## meldon

 *gerdesj wrote:*   

> Use 
> 
> ```
> svc -d <path to service>
> ```
> ...

 

I do the svc -d /service/qmail-smtpd to stop smtpd, but there are still ~22 qmail-smtpd processes running. They keep running for a long time and I eventually killall -9 them so that the load goes down.

I'm currently poring over my configuration and various howtos such as http://gentoo-wiki.com/QmailRocksOnGentoo and checking everything. I don't understand how this started, though. I didn't change anything and it was fine last week.

----------

## meldon

I found this page to be informative and helpful: http://qmail.jms1.net/clamav-qms.shtml. I checked the permissions and users and it all seemed correct (using qscand).

I believe the issue is with ClamAV. I don't have clamd running at the moment and it seems okay for now. Thinking of trying qmail-scanner-2.01.

----------

## meldon

Not bad. Down below 1 anyway: load average: 0.55, 0.36, 0.41

I've been watching the log files and every now and then one of the domains gets totally flooded with spam to addresses that don't exist. The filters then seem to get bogged down, which causes the high load average. In the last day I've recompiled and configured most of the email system and filters, but I still saw load averages above 20.
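To put a number on floods like that, one option is to pair each "starting delivery" line in the qmail-send log with its matching "no mailbox here" failure and count the failures per recipient domain. This is only a sketch built around the log layout shown earlier in the thread. The function name `bounces_per_domain` is invented, and delivery numbers are assumed not to be reused between a start line and its result.

```shell
#!/bin/sh
# bounces_per_domain: read qmail-send log lines from stdin or from the files
# given as arguments and print "count domain" for deliveries that failed
# with "no mailbox here by that name", busiest domain first.
# Hypothetical helper; assumes the log layout shown in this thread.
bounces_per_domain() {
    awk '
        # Remember which address each delivery number was started for.
        / starting delivery [0-9]+: msg [0-9]+ to / {
            id = $4; sub(/:$/, "", id)
            rcpt[id] = $NF
        }
        # On a "no mailbox" failure, count the domain part of that address.
        / delivery [0-9]+: failure: .*no_mailbox_here/ {
            id = $3; sub(/:$/, "", id)
            if (id in rcpt) {
                d = rcpt[id]; sub(/^.*@/, "", d)
                n[d]++
            }
        }
        END { for (d in n) print n[d], d }
    ' "$@" | sort -k1,1nr -k2,2
}

# Example use, assuming multilog writes to a file named "current":
#   bounces_per_domain /var/log/qmail/qmail-send/current | head
```

A domain that dominates this list during a load spike is a reasonable first suspect for the dictionary-style spam runs described above.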

----------

## npalmer76

I have been having the same problem ever since I upgraded to the latest qmail. I also tried netqmail, with the same result.

I have spamc running against a spamd on another server.

I think the problem may be ClamAV taking a long time to scan each message. I have turned on DEBUG in qmail-scanner-queue.pl to see where the time goes, and upgraded ClamAV to the latest version. I will watch the log created by qmail-scanner-queue.pl to see if that helps. You might try the same.

----------

## meldon

Problem solved

 *Quote:*   

> load average: 0.00, 0.03, 0.00

 

I can say with confidence that my server load problem was due to spam filtering. I've installed ASSP and spam filtering is working famously. I still have SpamAssassin with ClamAV running, but I guess they have much less to do.

I've had ASSP running for almost a month and forgot about this thread. Will mark solved in the subject.

----------

