# weird iowait problem

## morlix

Hello,

I have a really weird problem. Sometimes the response time of my server's file systems is very slow (from 1 up to 30 seconds).

At first I thought the problem was my setup of 4 hard disks combined into a RAID 5. This RAID is partitioned with LVM into 3 LVs, which are then encrypted.

I tested several things, but after looking at the iostat output below I now think my hard disk /dev/sda is failing.

What do you think? I would just like to know what you make of these huge await times!

```
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,00    0,00    2,33   91,67    0,00    0,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,33    0,00     1,33     0,00     8,00     4,41 6773,00 3000,00 100,00
sdb               0,00     0,00    0,33    0,00    13,33     0,00    80,00     0,00    2,00   2,00   0,07
sdc               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
sdd               0,00     0,00    0,33    0,00     8,00     0,00    48,00     0,00    2,00   2,00   0,07

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,68    0,00    1,00   94,31    0,00    0,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,33    0,33     1,34     1,34     8,00     2,32 8822,50 1500,00 100,33
sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
sdc               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
sdd               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15,33    0,00    4,67   80,00    0,00    0,00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               2,67     4,00    0,33    1,00    21,33    20,00    62,00     2,63 4720,75 750,00 100,00
sdb               0,00     4,00    0,33    2,00     1,33    24,00    21,71     0,09   39,29  14,14   3,30
sdc               2,67     0,00    0,67    1,33    13,33     5,33    18,67     0,70  349,83  88,83  17,77
sdd               0,00     4,00    0,00    3,00     0,00    28,00    18,67     0,07   25,00  12,67   3,80
```
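The await column (average time in ms a request spends waiting, queueing included) is the telling number: sda sits at several seconds while the other disks are at a few ms. As a rough sketch, output like this can be filtered mechanically; the device rows below are copied from the first sample above, with the locale's decimal commas converted to dots so awk can compare them numerically. On a live system you would pipe `iostat -dxk 3` (or similar) instead of the sample variable.

```shell
# Sample device rows from the first iostat interval above.
iostat_sample='sda 0,00 0,00 0,33 0,00 1,33 0,00 8,00 4,41 6773,00 3000,00 100,00
sdb 0,00 0,00 0,33 0,00 13,33 0,00 80,00 0,00 2,00 2,00 0,07
sdc 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdd 0,00 0,00 0,33 0,00 8,00 0,00 48,00 0,00 2,00 2,00 0,07'

# tr turns decimal commas into dots; awk flags any device whose await
# (column 10 here, with the device name in column 1) exceeds 1000 ms.
echo "$iostat_sample" | tr ',' '.' \
  | awk '$10 > 1000 { print $1 " looks suspect: await = " $10 " ms" }'
# prints: sda looks suspect: await = 6773.00 ms
```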

regards

morlix

----------

## eccerr0r

It is possible... you should install smartmontools and see if that drive's SMART data shows anything. Also check whether there are any indications in your dmesg. (Then again, a disk that bad should get kicked from the RAID soon.)
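The SMART check can be sketched like this. The attribute rows below are a hypothetical sample of `smartctl -A /dev/sda` output (real numbers come from your own disk), used so the filtering runs standalone:

```shell
# Hypothetical excerpt of `smartctl -A /dev/sda` (smartmontools) output.
smart_sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3'

# Pull out the two attributes most relevant here: reallocated sectors point
# at a dying disk, UDMA CRC errors point at a bad cable or connector.
echo "$smart_sample" \
  | awk '$2 ~ /Reallocated_Sector_Ct|UDMA_CRC_Error_Count/ { print $2 " raw=" $NF }'
# prints: Reallocated_Sector_Ct raw=0
#         UDMA_CRC_Error_Count raw=3
```

A quick overall verdict is also available via `smartctl -H /dev/sda`.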

Each of the 4 disks in my RAID5s is about the same in terms of load; some take more just due to randomness, but they are still 'similar'.

```
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.86    0.04    0.78    1.36    0.00   95.96

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda              18.94     1.35    1.26    1.17   170.11    23.45    79.63     0.08   31.05   4.22   1.03
hdc              18.94     1.34    1.23    1.12   169.99    23.01    82.05     0.17   72.38   6.13   1.44
hde              18.54     1.33    1.66    1.13   170.16    23.01    69.22     0.20   70.96   3.45   0.96
hdg              18.54     1.32    1.64    1.09   170.13    22.61    70.42     0.23   83.91   3.70   1.01
```

Not sure if it's possible for disks to spin down immediately after servicing a request... spinning back up would be a long wait...

----------

## morlix

Neither the SMART status nor the hdparm flags differ between sda and the other disks in the RAID.

Is it possible that such intermittent problems are caused by a faulty SATA cable?

These problems don't occur all the time. They happen very often, but sometimes I get good throughput of about 40 MB/s (encrypted disks).

----------

## eccerr0r

Funny you should mention it: last night one of the disks in one of my RAIDs started generating tons of errors, and I'm not sure whether it's due to disk issues or cable issues. No errors are showing up in the SMART log, but a large number of reallocates are there... not a good sign.

I replaced the cable, as it had been causing problems in the past. A bad cable can cause problems, but they should usually show up as CRC errors in your dmesg.
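Checking for those cable-induced CRC errors can be sketched like this. The kernel lines below are a hypothetical sample of the kind of libata messages a bad SATA cable typically produces; on a live system you would pipe `dmesg` instead:

```shell
# Hypothetical dmesg excerpt; a real check would be: dmesg | grep -i icrc
dmesg_sample='ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
ata1.00: error: { ICRC ABRT }
ata1.00: configured for UDMA/100'

# ICRC in an ata error line indicates a CRC failure on the link, which
# usually means a bad cable or connector rather than a bad disk.
echo "$dmesg_sample" | grep -i 'icrc'
# prints: ata1.00: error: { ICRC ABRT }
```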

----------

