# 80% I/O wait on hd access [SOLVED]

## knobbo

edit for solution: reiserfs is the culprit! Its performance apparently sucks when it's being used on a nearly-full partition that sees much data modification - like mldonkey's temp and incoming directories  :Rolling Eyes: 

Original post starts here...

As described in this thread, I have a small Linux server (600 MHz VIA C3) running Gentoo, which acts as a fileserver and internet gateway for a small LAN (4 Windows XP machines).

Whenever I copy a file from the server (using samba or proftpd) or even locally (à la cp /path/to/big_file /tmp), system wait goes up to about 80%:

```
top - 14:15:07 up  1:38,  2 users,  load average: 0.39, 0.17, 0.69
Tasks:  48 total,   2 running,  46 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3% us,  5.0% sy,  0.3% ni,  3.0% id, 88.0% wa,  1.0% hi,  2.3% si
Mem:    499696k total,   493944k used,     5752k free,    79996k buffers
Swap:   506036k total,       48k used,   505988k free,   318324k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4836 nobody    16   0  7748 2140 1484 R  6.0  0.4   0:04.80 smbd
 4792 p2p       25  10 32924  20m 4340 S  1.0  4.2   0:25.84 mlnet
 4835 root      16   0  2188 1064  824 R  0.3  0.2   0:00.17 top
    1 root      16   0  1608  544  472 S  0.0  0.1   0:00.46 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    4 root      13  -5     0    0    0 S  0.0  0.0   0:00.05 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   0:00.13 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   85 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
   88 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
  127 root      15   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  128 root      15   0     0    0    0 S  0.0  0.0   0:01.70 pdflush
  129 root      15   0     0    0    0 S  0.0  0.0   0:00.38 kswapd0
  130 root      18  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  727 root      17  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
```
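For anyone who wants to watch the wait figure without reading it off top by hand, something like the following might do. It is only a sketch: it assumes a Linux `/proc/stat` whose aggregate `cpu` line lists user, nice, system, idle, iowait, irq and softirq in that order, and the function name `iowait_percent` is made up for illustration.

```shell
# Rough sketch: share of CPU time spent in iowait over a short interval.
# Assumes the first line of /proc/stat is the aggregate "cpu" line with
# the fields: user nice system idle iowait irq softirq [more...].
iowait_percent() {
    interval="${1:-1}"   # seconds between the two samples (assumed default)

    read -r label u1 n1 s1 i1 w1 q1 sq1 rest < /proc/stat
    sleep "$interval"
    read -r label u2 n2 s2 i2 w2 q2 sq2 rest < /proc/stat

    # Total jiffies elapsed across the seven fields sampled above.
    total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2) - (u1 + n1 + s1 + i1 + w1 + q1 + sq1) ))
    [ "$total" -gt 0 ] || total=1   # avoid dividing by zero on an idle tick

    # Integer percentage of that time spent waiting on I/O.
    echo $(( 100 * (w2 - w1) / total ))
}
```

Running `iowait_percent 5` while a copy is in progress should land in the same region as the `wa` column that top reports, give or take rounding.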

Although hdparm shows good throughput (~30MB/s buffered disk reads), real-world performance is about 1.5-2.5 MB/s.

DMA is enabled; I have already posted this and more system info (incl. dmesg and kernel config) in the other thread. 

Might anyone have a clue as to what's going wrong here?

Last edited by knobbo on Tue Sep 26, 2006 8:37 pm; edited 1 time in total

----------

## desultory

Regarding the hardware, you might want to indicate the type or types of drives and the model of the interface to which each is attached. Depending on the hardware involved the performance you describe may be quite typical.

Regarding the new topic, why not ask to have the old topic relocated?

----------

## knobbo

There is just one drive: an IDE/PATA drive (UDMA 133, 7200 rpm, 8MB cache), a Maxtor 7Y250P0 attached to the onboard VIA disk controller. I'd think that this drive should be more than capable of pumping at least those 8 MB/s I'm seeing when copying from one Windows PC to the other.

Could this have something to do with the filesystem, since hdparm -tT shows normal values?

```
mrslave ~ # fdisk -l /dev/hda

Disk /dev/hda: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1               1           5       40131   83  Linux
/dev/hda2               6          68      506047+  82  Linux swap / Solaris
/dev/hda3              69        1936    15004710   83  Linux
/dev/hda4            1937       30515   229560817+  83  Linux

mrslave ~ # mount
/dev/hda3 on / type reiserfs (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/hda4 on /mnt/storage type reiserfs (rw,noatime)
none on /dev/shm type tmpfs (rw)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
```
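Since the solution note at the top of the thread blames a nearly-full partition, it may be worth checking the fill level alongside the mount output. The following is a rough sketch only: the helper names `fs_use_percent` and `warn_if_full` are hypothetical, and the 90% default threshold is an arbitrary assumption, not a figure from this thread.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# holding the given path. df -P gives one fixed-format line per filesystem,
# with the capacity (e.g. "42%") in column 5.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning when the filesystem is at or above a threshold.
# The default of 90 is an assumed value for illustration.
warn_if_full() {
    path="$1"
    limit="${2:-90}"
    used=$(fs_use_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "$path is ${used}% full (limit ${limit}%)"
    fi
}
```

On the server above this would be called as `warn_if_full /mnt/storage`.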

As to moving the thread: I didn't know this was customary, sorry. I thought that since it doesn't seem to be a network problem anymore, a new topic would be adequate.

----------

## desultory

 *knobbo wrote:*   

> There is just one drive: an IDE/PATA drive (UDMA 133, 7200 rpm, 8MB cache), a Maxtor 7Y250P0 attached to the onboard VIA disk controller. I'd think that this drive should be more than capable of pumping at least those 8 MB/s I'm seeing when copying from one Windows PC to the other.

 

According to Maxtor, the drive should support nearly twice the data rate that hdparm is reporting, so the choke point there would appear to be the interface, not the drive. More tweaking might yield significant dividends, or not.

 *knobbo wrote:*   

> Could this have something to do with the filesystem, since hdparm -tT shows normal values?

 

Yes, especially considering that the partitions under scrutiny here are formatted with reiserfs and have probably had significant data modification. Before the previous sentence starts a flame war, that opinion comes from direct experience.

```
/dev/hda3 on / type reiserfs (rw,noatime)
```

```
/dev/hda4 on /mnt/storage type reiserfs (rw,noatime)
```

 *knobbo wrote:*   

> As to moving the thread: I didn't know this was customary, sorry. I thought that since it doesn't seem to be a network problem anymore, a new topic would be adequate.

 

Nothing to worry about, you did more than many by indicating that there was another topic to refer to. Consider it to be for future reference. And yes a new topic is adequate, just bear in mind that it might get merged or one or both might get moved depending upon what moderators decide.

----------

## knobbo

 *desultory wrote:*   

> According to Maxtor, the drive should support nearly twice the data rate that hdparm is reporting, so the choke point there would appear to be the interface, not the drive. More tweaking might yield significant dividends, or not.

Since the 30MB/s that hdparm reports is already more than what would fit through the network interface, I don't really care about that value being too low. I would just love to see more than 1.5MB/s dripping through the pipe.

 *desultory wrote:*   

>  *knobbo wrote:*   Could this have something to do with the filesystem, since hdparm -tT shows normal values? 
> 
> Yes, especially considering that the partitions under scrutiny here are formatted with reiserfs and have probably had significant data modification. Before the previous sentence starts a flame war, that opinion comes from direct experience.

 What can I do about that? I imagine that, for a test, I could delete the swap partition (the machine never swaps anyway), create an ext2 partition there (linux filesystems don't get much simpler than ext2, do they?) and try copying something from there.

If that helps, is there any way to convert the FS or to optimize it? Or do I have to copy 200GB off that partition and recreate it?

 *desultory wrote:*   

>  *knobbo wrote:*   As to moving the thread: I didn't know this was customary, sorry. I thought that since it doesn't seem to be a network problem anymore, a new topic would be adequate. 
> 
> Nothing to worry about, you did more than many by indicating that there was another topic to refer to. Consider it to be for future reference. And yes a new topic is adequate, just bear in mind that it might get merged or one or both might get moved depending upon what moderators decide.

 Alright!  :Smile: 

----------

## desultory

 *knobbo wrote:*   

> Since the 30MB/s that hdparm reports is already more than what would fit through the network interface, I don't really care about that value being too low. I would just love to see more than 1.5MB/s dripping through the pipe.

 

Understood, that was more for the if all else fails case than for immediate attention.

 *knobbo wrote:*   

> I imagine that, for a test, I could delete the swap partition (the machine never swaps anyway), create an ext2 partition there (linux filesystems don't get much simpler than ext2, do they?) and try copying something from there.

 

Just change the partition type in place: swapoff /dev/hda2, then fdisk /dev/hda and change the partition type from 82 to 83, then mkfs -t ext2 /dev/hda2 (I recommend trying ext3 as well), then mount it and run the tests. When you want swap back, just set the type of /dev/hda2 back to 82 with fdisk, then mkswap /dev/hda2, then swapon /dev/hda2.
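Laid out as a list, the steps above might look like the sketch below. It deliberately only prints the commands rather than running them, since `mkfs` destroys whatever is on the partition. The function name `plan_swap_to_ext2` and the mount point `/mnt/test` are assumptions for illustration; the device names are the ones from this thread.

```shell
# Print (do NOT execute) the commands for temporarily turning a swap
# partition into an ext2 test filesystem. Review before running anything.
plan_swap_to_ext2() {
    disk="${1:-/dev/hda}"     # whole disk, as in the fdisk output above
    num="${2:-2}"             # partition number currently used as swap
    mnt="${3:-/mnt/test}"     # hypothetical mount point for the test
    part="${disk}${num}"

    cat <<EOF
swapoff $part
fdisk $disk    # interactive: t, partition $num, type 83 (Linux), then w
mkfs -t ext2 $part
mkdir -p $mnt
mount $part $mnt
EOF
}
```

Calling `plan_swap_to_ext2` with no arguments prints the sequence for /dev/hda2. Going back to swap is the reverse described above: unmount, set the type back to 82 in fdisk, then `mkswap` and `swapon`.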

 *knobbo wrote:*   

> If that helps, is there any way to convert the FS or to optimize it?

 

I doubt that the partition can be converted automatically, reliably or otherwise, with the data in place and intact. As for optimization, searching the forums should yield some tips.

 *knobbo wrote:*   

> Or do I have to copy 200GB off that partition and recreate it?

 

That is what I would recommend.

----------

## DieselPower

I am having the same problem here, but it is not because of fragmentation. I clone quite a few machines off this computer by mounting the new drive over USB and doing a "cp -a /bin /mnt/gentoo/" etc. It used to work great, but I upgraded my hardware and now I have to wait for up to 30 minutes with no IO activity just to copy a small (20MB) file to the USB drive. Sometimes it is very fast and other times I just wait, wait, wait..... This is a fresh install of Gentoo (2 weeks old) running reiserfs and a freshly formatted IDE disk over USB on reiserfs. It takes ages to mount and unmount some USB partitions. Copying locally on the machine's disk is very fast and the machine works perfectly otherwise. It has an nForce4 chipset and a dual-core Athlon 64 X2 CPU. This system was formatted from the 2005.1 livecd and the USB disk was formatted from my up-to-date system. I am wondering if there is a problem with the current version of mkreiserfs?

```

top - 19:34:33 up  4:01,  4 users,  load average: 7.84, 7.84, 7.28
Tasks: 107 total,   3 running, 104 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.6% us, 29.5% sy,  0.0% ni,  3.2% id, 46.4% wa,  0.2% hi,  0.2% si
Mem:   1034696k total,  1016480k used,    18216k free,    13816k buffers
Swap:  3028244k total,      208k used,  3028036k free,   814764k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27785 root      25   0  2132 1112  832 R   94  0.1  33:47.78 top
    4 root      RT   0     0    0    0 S    5  0.0   1:40.15 migration/1
22364 lawrence  15   0  124m  34m  23m S    2  3.4   1:06.80 amarokapp
28070 root      16   0  2136 1120  832 R    0  0.1   0:00.14 top
    1 root      16   0  1516  540  472 S    0  0.1   0:00.56 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/0
    3 root      34  19     0    0    0 R    0  0.0   0:00.40 ksoftirqd/0
    5 root      34  19     0    0    0 S    0  0.0   0:00.14 ksoftirqd/1
    6 root      10  -5     0    0    0 S    0  0.0   0:00.04 events/0
    7 root      10  -5     0    0    0 S    0  0.0   0:00.00 events/1
    8 root      10  -5     0    0    0 S    0  0.0   0:00.01 khelper
    9 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   12 root      10  -5     0    0    0 S    0  0.0   0:00.04 kblockd/0
   13 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/1
   14 root      10  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  149 root      10  -5     0    0    0 S    0  0.0   0:00.00 kseriod
  150 root      10  -5     0    0    0 S    0  0.0   0:00.02 kgameportd
  153 root      10  -5     0    0    0 S    0  0.0   0:00.00 khubd
  217 root      15   0     0    0    0 S    0  0.0   0:00.64 kswapd0
  218 root      11  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  219 root      11  -5     0    0    0 S    0  0.0   0:00.00 aio/1
  806 root       6 -10     0    0    0 S    0  0.0   0:00.14 vesafb
  837 root      11  -5     0    0    0 S    0  0.0   0:00.00 kpsmoused
  873 root      11  -5     0    0    0 S    0  0.0   0:00.00 ata/0
  874 root      11  -5     0    0    0 S    0  0.0   0:00.00 ata/1
  877 root      15   0     0    0    0 S    0  0.0   0:00.00 khpsbpkt
  881 root      16   0     0    0    0 S    0  0.0   0:00.00 knodemgrd_0
  906 root      15   0     0    0    0 S    0  0.0   0:00.00 kirqd
  910 root      10  -5     0    0    0 D    0  0.0   0:00.30 reiserfs/0
  911 root      10  -5     0    0    0 S    0  0.0   0:00.01 reiserfs/1
 1110 root      11  -4  1736  536  352 S    0  0.1   0:00.24 udevd
 5865 root      15   0  1840  628  404 S    0  0.1   0:00.05 syslog-ng
 5936 messageb  15   0  3300  888  680 S    0  0.1   0:00.22 dbus-daemon
 6002 root      15   0  1728  388  320 S    0  0.0   0:00.00 gpm
 6070 haldaemo  15   0  6788 5356 1568 S    0  0.5   0:00.84 hald
 6071 root      16   0  2628  960  820 S    0  0.1   0:00.02 hald-runner
 6077 haldaemo  16   0  1892  784  668 S    0  0.1   0:00.00 hald-addon-acpi
 6087 root      15   0  1788  616  532 S    0  0.1   0:01.89 hald-addon-stor
 6089 root      15   0  1784  612  532 S    0  0.1   0:00.08 hald-addon-stor
 6168 ivman     15   0  4436 1644 1240 S    0  0.2   0:00.14 ivman
 6697 root      15   0  1536  440  372 S    0  0.0   0:00.00 ifplugd
 7232 root      15   0  2856  656  524 S    0  0.1   0:00.00 kdm
 7235 root      15   0 54968  31m 4124 S    0  3.1   1:38.69 X
```

----------

## desultory

 *DieselPower wrote:*   

> I am having the same problem here, but it is not because of fragmentation. I clone quite a few machines off this computer by mounting the new drive over USB and doing a "cp -a /bin /mnt/gentoo/" etc. It used to work great, but I upgraded my hardware and now I have to wait for up to 30 minutes with no IO activity just to copy a small (20MB) file to the USB drive. Sometimes it is very fast and other times I just wait, wait, wait.....

 

Unless this also happens when copying files between devices which are not connected to USB ports, I would suspect USB configuration more than filesystem performance in this case.

----------

## knobbo

Since I won't be at home for a week or so, I'll have to find my flatmates online in order to open up the ssh server to the internet before I can try anything. I'll report back when I have tried some other filesystems.

----------

## tgh

While I'm not 100% awake, I do have a VIA C3 600MHz system as well.  I've hooked mine up to a pair of notebook hard drives (60GB? 5400RPM?).  The first few partitions on each disk are RAID1, with the latter part of each disk done as a separate standalone disk.  This allows me to RAID my O/S and swap partitions, but to set up 2 different backup partitions (/backup1 and /backup2).

Using ext2/ext3 for everything.  /boot is 128MB, root partition is 8GB, swap is 2GB, and the rest of the disk used for the 2 LVM sets is ~48GB.

```
nezumi backup1 # df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              7.6G  3.0G  4.2G  42% /
udev                  442M  2.5M  440M   1% /dev
/dev/vgbackup1/backup1
                       16G  397M   15G   3% /backup1
/dev/vgbackup2/backup2
                       16G  397M   15G   3% /backup2
shm                   442M     0  442M   0% /dev/shm

nezumi backup1 # pvscan
  PV /dev/hdc4   VG vgbackup2   lvm2 [46.21 GB / 30.21 GB free]
  PV /dev/hda4   VG vgbackup1   lvm2 [46.21 GB / 30.21 GB free]
  Total: 2 [92.41 GB] / in use: 2 [92.41 GB] / in no VG: 0 [0   ]

nezumi backup1 # lvscan
  ACTIVE            '/dev/vgbackup2/backup2' [16.00 GB] inherit
  ACTIVE            '/dev/vgbackup1/backup1' [16.00 GB] inherit

nezumi backup1 # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "vgbackup2" using metadata type lvm2
  Found volume group "vgbackup1" using metadata type lvm2

nezumi backup1 # cat /proc/partitions
major minor  #blocks  name

   3     0   58605120 hda
   3     1     136521 hda1
   3     2    2008125 hda2
   3     3    8008402 hda3
   3     4   48452040 hda4
  22     0   58605120 hdc
  22     1     136521 hdc1
  22     2    2008125 hdc2
  22     3    8008402 hdc3
  22     4   48452040 hdc4
   9     0     136448 md0
   9     2    8008320 md2
   9     1    2008000 md1
 253     0   16777216 dm-0
 253     1   16777216 dm-1
nezumi backup1 #
```

Now for some performance numbers:

```
nezumi backup1 # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   276 MB in  2.02 seconds = 136.88 MB/sec
 Timing buffered disk reads:  106 MB in  3.06 seconds =  34.61 MB/sec

nezumi backup1 # hdparm -tT /dev/hda2

/dev/hda2:
 Timing cached reads:   276 MB in  2.02 seconds = 136.78 MB/sec
 Timing buffered disk reads:   90 MB in  3.04 seconds =  29.58 MB/sec

nezumi backup1 # hdparm -tT /dev/hdc3

/dev/hdc3:
 Timing cached reads:   276 MB in  2.02 seconds = 136.87 MB/sec
 Timing buffered disk reads:   88 MB in  3.04 seconds =  28.91 MB/sec

nezumi backup1 # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   276 MB in  2.02 seconds = 136.49 MB/sec
 Timing buffered disk reads:  108 MB in  3.03 seconds =  35.65 MB/sec

nezumi backup1 # hdparm -tT /dev/md2

/dev/md2:
 Timing cached reads:   276 MB in  2.02 seconds = 136.72 MB/sec
 Timing buffered disk reads:  108 MB in  3.00 seconds =  35.99 MB/sec
```

And some bonnie...

```
nezumi / # bonnie -s 2047 -m nezumi-raid1
File './Bonnie.22025', size: 2146435072
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
nezumi-r 2047  2637 94.2 27421 57.8 12256 18.9  2665 93.7 29279 25.7 231.7  5.9
```

The C3 motherboard isn't all that speedy on memory (136MB/s bandwidth); more modern systems are in the 2GB/s range for memory bandwidth.  Still, you should be seeing data rates in the 12-30MB/s range.

If you need, I can dig into my configuration a bit more (such as the config.gz file).

----------

## knobbo

I'm back at home and I've tried ext2 and ext3. The results are: I get 7MB/s copying with samba from ext2 and 8MB/s with ext3. Raw speed (dd if=/some/file of=/dev/null) is about 20MB/s (2.4 MB/s on the reiser partition), so I might try some samba tweaking when those 200GB have been copied off the reiserfs partition and back to the to-be-created ext3 partition.
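For reference, the raw-read comparison above could be repeated with something like the sketch below. It is a rough illustration, not the exact procedure used here: the function name `read_kib_per_s` is made up, timing is only to the nearest second, and the result is only meaningful for a file that is not already in the page cache (for example, one larger than RAM).

```shell
# Time a sequential read of one file with dd and print the average rate
# in KiB/s. Whole-second resolution, so use a file that takes a while.
read_kib_per_s() {
    file="$1"
    bytes=$(wc -c < "$file")          # file size in bytes

    start=$(date +%s)
    dd if="$file" of=/dev/null bs=1M 2>/dev/null || return 1
    end=$(date +%s)

    elapsed=$(( end - start ))
    [ "$elapsed" -gt 0 ] || elapsed=1  # treat sub-second reads as one second

    echo $(( bytes / elapsed / 1024 ))
}
```

Running it once against a large file on the reiserfs partition and once against a similar file on the ext2 or ext3 test partition gives two numbers that can be compared directly.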

Thanks a lot everyone! How about putting some info in the Gentoo install docs that reiserfs is not at all suited for a nearly-full partition being used by mldonkey  :Smile: 

----------

