# Good hw diagnostic package? Mysql 5.0.44 killing server.

## SweepingOar

My server keeps grinding to a halt. First it gets dragged down to 50% idle while doing nothing (maybe one core is idle but not being used for some reason), then it goes to 0.0% idle and nothing happens. I need to see if there's trouble with the motherboard or processor or something. Thanks.

----------

## venquessa2

I'm assuming you actually 'expect' some load: you're asking the machine to do something and it's not doing it, while reporting idle.

Run "top" on the machine.  Pay particular attention to 

```
 0.0%wa,  0.2%hi,  0.1%si,  0.0%st
```

these values.

Are any of them high?  Is the load average high, as displayed by "top" or "uptime"?

Also, multi-core systems will from time to time run 100% on one core rather than spreading the load around.  It's a long discussion as to why, but basically one process may, and usually will, only use one core.  That core may change from time to time, but only properly multi-process apps will use multiple cores.  That's the short and half-wrong answer, but it'll do.
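If you want to check this on your box, top can show one row per core (press '1' while it's running), or you can read the per-core counters straight out of /proc/stat. A minimal sketch, assuming the standard Linux /proc/stat layout:

```shell
# Snapshot per-core CPU counters from /proc/stat (no extra packages needed).
# Field 5 of each cpuN line is that core's cumulative idle jiffy counter;
# a core that's pegged will show this number barely moving between snapshots.
grep '^cpu[0-9]' /proc/stat | awk '{print $1, "idle_jiffies=" $5}'
```

Take two snapshots a few seconds apart; the core whose idle counter barely changes is the busy one.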

----------

## SweepingOar

Yeah, it gets a bit of traffic, but nothing huge. I took the machine out of service and put in a backup machine for the time being, but I'm still trying to figure out what's wrong with the original machine. It had been working fine for a few years, with uptime of over a year and a load average below 20%. Now I've got it in my office with all services running (mail, web), and top shows around 100% idle with occasional drops to around 90% idle. It is completely isolated from the web and is just on our LAN, so no outside requests can reach it, only me.

It used to handle a decent amount of traffic to a small number of web sites. Now, when I make a web page request (with a few mysql queries providing content for the page), idle drops to 50%, wa goes up to around 50%, and it takes forever for the page to load. When the page is finally done being served up, idle goes back to around 100%. This is drastically worse performance than I've ever seen from any machine, and this one is fairly new (a C2D running at over 2 GHz).

The raid is showing UU and I'm getting some spamd errors on boot, but I don't see why apache or mysql is choking so badly. I don't see anything to indicate a problem in /var/log/mysqld, messages, or dmesg.

 *venquessa2 wrote:*   

> Assuming you actually 'expect' some load and you are asking it to do something and it's not, while reporting idle.
> 
> Run "top" on the machine.  Pay particular attention to 
> 
> ...

----------

## venquessa2

The two things I can think of are:

1.  A broken disk drive.

2.  DMA switched off.

Could be something completely different.

----------

## ksp7498

It could possibly be a memory leak that's causing the system to thrash the swap partition like mad (after having enough time to fill up the normal ram first).  How does your memory usage look?  What's the output of "free -m" ?

It's hard to tell at this point if it's a hardware problem (e.g. broken dma, failing disk drive, overheating, etc.) or a software problem (bug or misconfiguration in apache/mysql, memory leak, etc.).
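One cheap way to check for swap thrashing is to watch vmstat's si/so columns while the machine is slow (anything consistently non-zero there means pages are moving to and from swap); the raw totals are also in /proc/meminfo. A minimal sketch, assuming a standard Linux /proc:

```shell
# If SwapFree keeps shrinking (or si/so in "vmstat 1" stay non-zero) while
# the machine is bogged down, memory pressure is the likely culprit.
grep -E '^Swap(Total|Free):' /proc/meminfo
```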

----------

## SweepingOar

Does this look odd?

```
user@localhost ~ $ free -m
             total       used       free     shared    buffers     cached
Mem:           944        860         84          0        101        572
-/+ buffers/cache:        186        758
Swap:         1960          0       1960
```

DMA is turned on in the kernel, but hdparm has 

```
dev # hdparm -d /dev/hda

/dev/hda:
 using_dma     =  0 (off)
```

OTOH, it's a raid, so maybe it should be off?

```
# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 hdc1[1] hda1[0]
      40064 blocks [2/2] [UU]

md2 : active raid1 hdc2[1] hda2[0]
      2008000 blocks [2/2] [UU]

md3 : active raid1 hdc3[1] hda3[0]
      9775488 blocks [2/2] [UU]

md4 : active raid1 hdc4[1] hda4[0]
      233287808 blocks [2/2] [UU]
```

----------

## ksp7498

Your memory usage looks normal; there are no issues there.  That's good, I guess.  DMA does indeed appear to be disabled though, so that may (at least partly) explain what's happening.  What is the result of "hdparm -tT /dev/hda"? (Of course, substitute whichever drive you're most concerned about.)  If it's a moderately recent desktop drive you should be seeing at least 30-40 MB/s for buffered disk reads; if it's below 10 MB/s or so then DMA is almost certainly disabled.

----------

## SweepingOar

Now I'm getting similar cpu usage under mysql on the replacement machine (which is set up identically to the original except that it has older hardware: a P4 with an IDE software raid instead of a C2D with a SATA raid). I read in a few places (including here) that you shouldn't use DMA with a software raid.

```
hdparm -tT /dev/hda

/dev/hda:
 Timing cached reads:   878 MB in  2.02 seconds = 435.62 MB/sec
 Timing buffered disk reads:   10 MB in  3.90 seconds =   2.56 MB/sec
```

Here's top:

```
top - 09:45:40 up 3 days, 10:49,  2 users,  load average: 0.65, 0.59, 0.47
Tasks:  79 total,   2 running,  77 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.4%us,  4.7%sy,  0.0%ni, 61.5%id,  0.3%wa,  0.7%hi,  1.3%si,  0.0%st
Mem:    967624k total,   950516k used,    17108k free,   173340k buffers
Swap:  2007992k total,      188k used,  2007804k free,   585732k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
23048 mysql     15   0  149m  41m 4632 S 26.6  4.4 109:30.48 mysqld
  959 root      10  -5     0    0    0 S  3.3  0.0  58:31.74 md3_raid1
27423 apache    15   0 23264  10m 3332 S  1.7  1.1   0:01.08 apache2
    5 root      10  -5     0    0    0 S  0.7  0.0   0:40.79 events/0
  956 root      10  -5     0    0    0 S  0.7  0.0  29:55.62 md4_raid1
28171 kevins    15   0  2244 1104  848 R  0.3  0.1   0:00.05 top
    1 root      15   0  1592  492  432 S  0.0  0.1   0:09.40 init
```

...but there's nothing weird in the mysql processlist:

```
mysql> show processlist;
+-------+------+-----------+------+---------+------+-------+------------------+
| Id    | User | Host      | db   | Command | Time | State | Info             |
+-------+------+-----------+------+---------+------+-------+------------------+
| 39195 | root | localhost | NULL | Query   |    0 | NULL  | show processlist |
+-------+------+-----------+------+---------+------+-------+------------------+
```

----------

## ksp7498

Wow, thanks for posting that top printout.  That shows pretty clearly what's going on here: mysql and the raid are sucking up insane amounts of cpu time.  Now to be fair, I have not used software raid and I do not know about the dma vs. softraid issue, so maybe that is to be expected.  But those disk benchmarks are very, very low, and that is a LOT of cpu time spent on mysql and mdraid.

What is the performance of the raid array as a whole?  Run hdparm on the array (/dev/md0 or whatever) and see what kind of throughput it has.  My guess, as of now, is that the array is slow and mysql is spending loads of time just waiting for the array to do stuff.

----------

## bunder

 *Quote:*   

> DMA is turned on in the kernel, but hdparm has
> 
> dev # hdparm -d /dev/hda
> 
> /dev/hda:
> ...

 

```
hdparm -d1 /dev/hda
```

(along with the other devices in the array)
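If -d1 helps, note that it won't survive a reboot on its own. On Gentoo the usual place to persist it is /etc/conf.d/hdparm (the variable names below are from memory of that init script, so double-check against your version):

```
# /etc/conf.d/hdparm -- sourced by the hdparm init script at boot
hda_args="-d1"
hdc_args="-d1"
```

and then add the service to the boot runlevel with "rc-update add hdparm boot".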

does it still lag after that?

cheers

----------

## SweepingOar

I'm afraid to just turn on DMA on either raid device (or the logical drive, if that's even possible) because I've read about errors happening with raid and DMA.

```
root@localhost # hdparm /dev/md0

/dev/md0:
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 0/2/4, sectors = 0, start = 0

root@localhost # hdparm /dev/md1

/dev/md1:
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 10016/2/4, sectors = 80128, start = 0

root@localhost # hdparm /dev/md2

/dev/md2:
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 43248/2/4, sectors = 4016000, start = 0

root@localhost # hdparm /dev/md3

/dev/md3:
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 19040/2/4, sectors = 19550976, start = 0

root@localhost # hdparm /dev/md4

/dev/md4:
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 60448/2/4, sectors = 466575616, start = 0
```

The timing of the raid seems to be about the same as hda:

```
root@localhost # hdparm -tT /dev/md0

/dev/md0:
 read() hit EOF - device too small
 Timing buffered disk reads:  read() hit EOF - device too small
BLKFLSBUF failed: No such device
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: No such device

root@localhost # hdparm -tT /dev/md1

/dev/md1:
 Timing cached reads:   928 MB in  2.00 seconds = 463.67 MB/sec
 Timing buffered disk reads:    6 MB in  3.26 seconds =   1.84 MB/sec

root@localhost # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   934 MB in  2.00 seconds = 466.35 MB/sec
 Timing buffered disk reads:    6 MB in  3.38 seconds =   1.78 MB/sec

root@localhost # hdparm -tT /dev/md3

/dev/md3:
 Timing cached reads:   450 MB in  2.03 seconds = 221.40 MB/sec
 Timing buffered disk reads:   10 MB in  3.67 seconds =   2.73 MB/sec

root@localhost # hdparm -tT /dev/md4

/dev/md4:
 Timing cached reads:   964 MB in  2.00 seconds = 481.76 MB/sec
 Timing buffered disk reads:    8 MB in  3.08 seconds =   2.60 MB/sec
```

----------

## ksp7498

Unfortunately, I do not know enough about software raid to give any suggestions about this.  However, I can say almost certainly that this is the issue.  Your array is performing VERY slowly; 1-2 MB/s is exceptionally slow, even for drives running in PIO mode.

However, I can't think of any way in which dma would hurt the array.  Do you happen to have any links to the discussions that addressed the dma vs. softraid debate?  I'm not doubting you, I just want to fully understand what the potential downside of it is.  I've never known dma to be a bad thing.

My (potentially dangerous) advice is to backup everything important on the array, and then try enabling dma and see what happens.  However, if the kernel isn't enabling dma on its own to begin with, that may indicate that you're missing the right kernel drivers for your particular chipset.  But whatever you do, I'd imagine the array is probably unusably slow right now as it stands.

----------

## jcat

I agree that the disk tests indicate that the array is seriously underperforming, but wouldn't we see much more wait I/O in top if the disk was holding things up?

Cheers,

jcat

----------

## SweepingOar

Ok, I downloaded the latest gentoo-sources, enabled the VIA chipset driver, recompiled the kernel, and turned DMA on for both drives. The drives sped up and I'm not seeing raid in top anymore, but mysqld is still hogging an unbelievable amount of cpu:

```
top - 15:00:12 up  2:22,  1 user,  load average: 0.18, 0.21, 0.19
Tasks:  95 total,   1 running,  94 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.8%us,  2.0%sy,  0.0%ni, 64.9%id,  1.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    967840k total,   542432k used,   425408k free,   113380k buffers
Swap:  2007992k total,        0k used,  2007992k free,   263080k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5082 mysql     20   0  145m  39m 4488 S 26.6  4.2  16:40.51 mysqld
13785 apache    20   0 25696  11m 2948 S  2.0  1.2   0:00.12 apache2
13786 apache    20   0 23672 9372 2964 S  1.3  1.0   0:00.04 apache2
13787 apache    20   0 25200  10m 2988 S  1.3  1.1   0:00.14 apache2
13774 kevins    20   0  2244 1128  852 R  0.7  0.1   0:00.08 top
    1 root      20   0  1596  548  476 S  0.0  0.1   0:00.32 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    5 root      15  -5     0    0    0 S  0.0  0.0   0:00.30 events/0
    6 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
   82 root      15  -5     0    0    0 S  0.0  0.0   0:00.06 kblockd/0
   85 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   86 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kacpi_notify
  163 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  164 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ata_aux
  165 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 ksuspend_usbd
  171 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
  174 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
```

Here's the drive speed now if you're curious:

```
root@localhost # hdparm -tT /dev/md1

/dev/md1:
 Timing cached reads:   754 MB in  2.01 seconds = 374.69 MB/sec
 Timing buffered disk reads:   38 MB in  0.72 seconds =  52.73 MB/sec

root@localhost # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   850 MB in  2.00 seconds = 424.15 MB/sec
 Timing buffered disk reads:  168 MB in  3.09 seconds =  54.30 MB/sec

root@localhost # hdparm -tT /dev/md3

/dev/md3:
 Timing cached reads:   680 MB in  2.00 seconds = 339.61 MB/sec
 Timing buffered disk reads:  168 MB in  3.01 seconds =  55.77 MB/sec

root@localhost # hdparm -tT /dev/md4

/dev/md4:
 Timing cached reads:   528 MB in  2.01 seconds = 263.07 MB/sec
 Timing buffered disk reads:  168 MB in  3.00 seconds =  55.95 MB/sec
```

----------

## ksp7498

Ah, yes, that disk performance looks much better, and it's good that we're not seeing the raid array sucking up cpu time.  I have no idea about the mysql performance though; I guess it is something unrelated.  I have practically zero experience with mysql, unfortunately.  Does the system still suffer from the issues that you described in your original post?  Getting dma working should have made at least some noticeable difference in the machine's performance.

----------

## danomac

I can only think of a couple of things that could cause mysql to do that.

You say this is a newer machine; are the drives actually IDE, or are they SATA? If they are indeed SATA drives, you are using the wrong driver in the kernel. You need to use the proper SATA drivers (you'll have to change a few things, as your drives will go from /dev/hd? to /dev/sd?). There have been a lot of complaints that DMA does not work properly with the PATA drivers and SATA drives.
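One way to see what the kernel has actually bound to the disks is sysfs: libata-driven SATA disks show up as sd?, while the old IDE layer shows them as hd?. A minimal sketch (the paths are the generic sysfs layout, not specific to this box):

```shell
# List the block devices the kernel registered, then show which driver
# sysfs says is bound to each device (the symlink may be absent for md
# and other virtual devices, hence the existence check).
ls /sys/block
for drv in /sys/block/*/device/driver; do
    [ -e "$drv" ] || continue
    printf '%s -> %s\n' "${drv%/device/driver}" "$(basename "$(readlink -f "$drv")")"
done
```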

The second thing that comes to mind is that mysql is being artificially choked: maybe it can't keep enough tables in memory due to something like the restriction on the number of open files for the mysql process. This would create a hell of a lot of disk activity, as it would constantly be opening and closing things on the array, which would cause the raid drivers and mysql to use a lot of cpu time waiting for the disk.
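If you want to test that theory, compare the Opened_tables status counter against the table_cache variable (SHOW GLOBAL STATUS LIKE 'Opened_tables'; SHOW VARIABLES LIKE 'table_cache';): a fast-climbing Opened_tables means mysql is churning through its table cache. If it is, raising these in my.cnf might help. A hedged sketch only; the values are illustrative, not tuned for this box:

```
# my.cnf -- [mysqld] section; illustrative values, adjust to taste
# (in MySQL 5.0 the variable is table_cache; 5.1 renames it table_open_cache)
[mysqld]
table_cache      = 256
open_files_limit = 2048
```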

When things aren't idle, have you tried to get a status from mysql? Logging into the mysql console and doing a 'SHOW ENGINE INNODB STATUS' command may tell you something.

----------

## SweepingOar

Thanks. I think ksp and others who posted helped me solve the dma issue. The drives are IDE and I think the correct drivers are selected in the kernel (the drive performance seems ok now compared to other performance numbers I've seen posted here). Unfortunately, enabling dma hasn't affected the mysql performance noticeably.

Yes, I've also been thinking it might be a memory or other configuration issue in either mysql or php, but I really don't know anything about mysql and php configuration; I just emerge them, set up the passwords and permissions, and that's about it. I think it's more likely to be mysql, though, because I ran a perl script with a lot of update statements in it last night and that also crushed the machine (down to zero idle for a few seconds). I looked at a few mysql performance statistics, but I don't know how to analyze them. Here's the one you suggested, if this tells you anything:

```
=====================================
080327  9:50:55 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 7 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2, signal count 2
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 4, OS waits 2; RW-excl spins 0, OS waits 0
------------
TRANSACTIONS
------------
Trx id counter 0 1792
Purge done for trx's n:o < 0 0 undo n:o < 0 0
History list length 0
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 5082, OS thread id 2950278032
MySQL thread id 35330, query id 969462 localhost root
SHOW ENGINE INNODB STATUS
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
25 OS file reads, 3 OS file writes, 3 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2,
0 inserts, 0 merged recs, 0 merges
Hash table size 69257, used cells 0, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 0 43665
Log flushed up to   0 43665
Last checkpoint at  0 43665
0 pending log writes, 0 pending chkp writes
8 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 34467384; in additional pool allocated 1388544
Buffer pool size   1024
Free buffers       989
Database pages     35
Modified db pages  0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 35, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 5082, id 2989108112, state: waiting for server activity
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
```

----------

## danomac

Hmm. The only thing I can think of now is to emerge innotop and see if you can get real-time stats.

----------

## SweepingOar

It's masked, but I'm installing it on the out-of-service server to try to figure out how to use it. One thing I noticed with mysql 5.0.44 is that only one mysqld process ever seems to run. In 5.0.26 there were always multiple processes spawning. Is this the way it's supposed to work now, or is something wrong with my config?

----------

