# Bad performance on copying/moving/unt(r)aring files

## zehnan1

I have a single SATA drive on my amd64 3200+ system (1GB RAM). Overall the system is fast and responsive, except under high disk/CPU activity such as moving or copying files, and especially untarring or unrarring them. If I start rox while copying a large file, it takes about 30 seconds to start; normally it takes about 0.5 seconds. If I start Opera or Firefox while unrarring, it's a good bet it won't come up until the rar is finished, which can mean up to two minutes. Both browsers start quite fast without disk activity. MP3s occasionally skip under high disk activity, but that's not so common.

hdparm output is fine:

```
/dev/sda:
 Timing cached reads:   3088 MB in  2.00 seconds = 1543.00 MB/sec
 Timing buffered disk reads:  172 MB in  3.00 seconds =  57.29 MB/sec
```

I've read somewhere about using dstat to track IO activity. This is the output when copying a large file. The copy started at the fifth data row, when wai jumped to 69. Is such a high wai normal?

```
----total-cpu-usage---- -disk/total -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write|_recv _send|__in_ _out_|_int_ _csw_
  3   2  91   3   0   0| 407k  409k|   0     0 | 0.9B  5.9B|1440  4134
  1   2  97   0   0   0|   0     0 | 244B  170B|   0     0 |1222  2808
  3   1  96   0   0   0|   0     0 | 665B  569B|   0     0 |1243  3106
  2   1  97   0   0   0|   0     0 | 206B  218B|   0     0 |1200  2794
  2  28   1  69   0   0|  19M  160k|  62B   54B|   0     0 |1943  5609
  2  33   0  65   0   0|  24M   92k| 288B  331B|   0     0 |1728  4700
  2  37   0  61   0   0|  26M   68k|  77B  105B|   0     0 |1725  4712
  1  34   0  65   0   0|  24M  128k|   0     0 |   0     0 |1760  4847
  2  21   0  77   0   0|  13M 4172k| 154B  210B|   0     0 |1586  4266
  2  27   0  71   0   0|  12M   12M|   0     0 |   0     0 |1688  5301
  2  25   0  73   0   0|  13M   11M|   0     0 |   0     0 |1686  4879
  2  25   0  73   0   0|  13M   10M| 712B  560B|   0     0 |1728  4826
  2  24   0  74   0   0|  13M   12M| 139B  159B|   0     0 |1711  4566
  3  19   0  78   0   0| 9.9M   11M| 131B  151B|   0     0 |1636  4292
  1  20   0  79   0   0|  11M 7512k| 124B  108B|   0     0 |1624  4230
  4  16   0  80   0   0|6484k   13M| 195B  174B|   0     0 |1609  4110
  1  22   0  77   0   0|  11M 9876k|4420B   54B|   0     0 |1652  8043
  2  23   0  75   0   0|  12M   12M| 343B  213B|   0     0 |1727  5096
  2  28   0  70   0   0|  14M   12M| 154B  210B|   0     0 |1771  4754
  2  25   0  73   0   0|  12M   14M|   0     0 |   0     0 |1726  4590
  2  27   0  71   0   0|  13M   14M| 139B  159B|   0     0 |1728  5045
  3  28   0  69   0   0|  15M   11M| 208B  256B|   0     0 |1757  4668
  1  24   0  75   0   0|  12M   14M| 338B  346B|   0     0 |1738  5402
```

This is the output for unraring.

```
----total-cpu-usage---- -disk/total -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write|_recv _send|__in_ _out_|_int_ _csw_
  3   2  91   3   0   0| 414k  416k|   0     0 | 0.9B  5.9B|1440  4129
  3   1  96   0   0   0|4096B    0 | 412B  310B|   0     0 |1249  3066
  3   4  50  44   0   0| 990k    0 | 501B  170B|   0     0 |1430  3853
  2   1  97   0   0   0|   0     0 | 830B  538B|   0     0 |1205  2717
 38  16  30  17   0   0|  11M   22k|  77B  105B|   0     0 |1972  5787
 67  21   0  12   0   0|  17M   64k|   0     0 |   0     0 |1517  3919
 66  20   0  14   0   0|  18M   40k| 738B  590B|   0     0 |1523  3961
 64  20   0  16   0   0|  17M   56k| 218B  170B|   0     0 |1534  4000
 66  20   0  14   0   0|  17M  136k| 387B  367B|   0     0 |1577  4102
 68  19   0  13   0   0|  17M  124k| 155B  159B|   0     0 |1504  3885
 46  27   0  27   0   0|  12M   11M|  71B   66B|   0     0 |1777  8107
 43  26   0  32   0   0|  12M   13M|  77B  105B|   0     0 |1833  4811
 47  24   0  29   0   0|  12M   12M|   0     0 |   0     0 |1682  4391
 47  26   0  27   0   0|  13M   11M| 154B  272B|   0     0 |1887  5076
 38  23   0  39   0   0| 9.9M   13M|  69B   97B|   0     0 |1666  5202
 42  22   0  36   0   0|  11M   12M| 518B  488B|   0     0 |1680  4366
 38  21   0  41   0   0|  10M   10M| 139B  159B|   0     0 |1712  4462
 35  18   0  47   0   0|9186k   10M| 248B  216B|   0     0 |1645  6081
 44  23   0  33   0   0|  11M   12M| 293B  369B|   0     0 |1809  5162
 45  22   0  33   0   0|  11M   12M| 501B  391B|   0     0 |1704  4604
 50  24   0  27   0   0|  13M   12M|5033B  498B|   0     0 |1826  5542
 47  26   0  26   0   0|  13M   12M| 120B  108B|   0     0 |1776  4750
 48  25   0  28   0   0|  12M   16M| 146B  202B|   0     0 |1773  5694
```
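To compare runs more easily, the wai column could be averaged with a small awk filter. This is only a rough sketch: it assumes the dstat output was saved to a file (the name `copy.log` is made up), that wai is the fourth column as in the header above, and the helper name `avg_wai` is hypothetical.

```shell
# avg_wai: print the mean of the 4th column (wai) over dstat data rows.
# Header and blank lines are skipped because their first field is not a
# plain number. Reads the named files, or stdin when no file is given.
avg_wai() {
    awk '$1 ~ /^[0-9]+$/ { sum += $4; n++ }
         END { if (n) printf "%.1f\n", sum / n }' "$@"
}

# Hypothetical usage (1-second samples, 30 of them):
#   dstat 1 30 > copy.log
#   avg_wai copy.log
```

Note that the first data row dstat prints is an average since boot, so it may be worth trimming it before averaging.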

I have ReiserFS on root, but I also tried copying from and to FAT. No difference.

Can anything be done regarding this?

----------

## NeddySeagoon

zehnan1,

What preempt options do you have in your kernel?

Post the output of 

```
grep PREEMPT /usr/src/linux/.config
```
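If `/usr/src/linux` might not point at the sources of the kernel that is actually running, the running kernel's own config can be read from `/proc/config.gz` instead, provided it was built with `CONFIG_IKCONFIG_PROC`. A minimal sketch, with a made-up helper name `show_preempt`:

```shell
# show_preempt: print the PREEMPT lines from a kernel config.
# With an argument, read that file. Without one, prefer the running
# kernel's /proc/config.gz and fall back to /usr/src/linux/.config.
show_preempt() {
    if [ -n "$1" ]; then
        grep PREEMPT "$1"
    elif [ -r /proc/config.gz ]; then
        zcat /proc/config.gz | grep PREEMPT
    else
        grep PREEMPT /usr/src/linux/.config
    fi
}
```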

----------

## zehnan1

 *NeddySeagoon wrote:*   

> zehnan1,
> 
> What preempt options do you have in your kernel?
> 
> Post the output of 
> ...

 

Right now I'm using a realtime kernel for my audio work. However, I had the same issues with my old kernel. I'll boot my old kernel, and post dstat again.

```
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT_DESKTOP is not set
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_SOFTIRQS=y
CONFIG_PREEMPT_HARDIRQS=y
CONFIG_PREEMPT_BKL=y
CONFIG_PREEMPT_RCU=y
# CONFIG_CRITICAL_PREEMPT_TIMING is not set
```

----------

## zehnan1

Oh, I just realized I posted in the Networking & Security forum. I apologize, it should be in Kernel & Hardware.

----------

## NeddySeagoon

zehnan1,

Well spotted. - Moved from Networking & Security to Kernel & Hardware.

I'm not familiar with the real time kernel. Can you repeat the tests with either gentoo-sources or vanilla sources and post your

```
grep PREEMPT /usr/src/linux/.config
```

from there.

----------

## zehnan1

I experience the same problems with gentoo-sources. Tested with 2.6.15-gentoo-r7 with voluntary preemption.

```
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
```

dstat output while copying:

```
----total-cpu-usage---- -disk/total -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write|_recv _send|__in_ _out_|_int_ _csw_
  3   4  77  16   0   0| 580k  129k|   0     0 |   0     0 |1180   235
  3   8  32  55   0   2|  19M    0 |   0     0 |   0     0 |1571   993
  0   9   0  87   0   4|  27M    0 |   0     0 |   0     0 |1673   822
  1  11   0  84   0   4|  28M   44k|   0     0 |   0     0 |1729   870
  0   6   0  92   0   2|  18M   32k|   0     0 |   0     0 |1601   571
  0   4   0  94   0   2|  12M 2432k|   0     0 |   0     0 |1503   448
  1   7   0  88   0   4|  13M 5936k| 128B    0 |   0     0 |1574   465
  0   7   0  89   0   4|  14M   13M|1636B    0 |   0     0 |1660   551
  0   7   0  89   0   4|  16M   12M|   0     0 |   0     0 |1703   618
  1   4   0  90   0   5|  16M   11M|   0     0 |   0     0 |1712   788
  1   7   0  87   0   5|  13M   14M| 568B    0 |   0     0 |1688   559
  0   7   0  88   0   5|  15M   12M| 490B    0 |   0     0 |1678   628
  0   6   0  89   0   5|  12M   12M|  78B    0 |   0     0 |1637   484
  1   7   0  88   0   4|  10M   14M|   0     0 |   0     0 |1809   809
  0   8   0  87   0   5|  14M   13M|   0     0 |   0     0 |1676   559
  0   8   0  86   0   6|  16M   12M|   0     0 |   0     0 |1694   618
  1   7   0  88   0   4|  16M   15M|  78B    0 |   0     0 |1732   706
  0   7   0  90   0   3|  14M   12M|   0     0 |   0     0 |1650  1520
  0   8   0  88   0   4|  14M   13M|4358B    0 |   0     0 |1689  1622
  0   5   0  91   0   4|  14M 9704k|   0     0 |   0     0 |1632  1585
  1   7   0  89   0   3|  13M   11M|   0     0 |   0     0 |1641  1578
  0   8   0  89   0   3|  15M   10M| 142B    0 |   0     0 |1689  1758
```

dstat output while unraring:

```
----total-cpu-usage---- -disk/total -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read write|_recv _send|__in_ _out_|_int_ _csw_
  3   4  65  27   0   1|2030k 1761k|   0     0 |   0   219B|1253   357
 81  10   0   8   0   1|  23M    0 |   0     0 |   0     0 |1569   451
 81   9   0   9   0   1|  22M    0 |   0     0 |   0     0 |1617   780
 82  11   0   7   0   0|  22M    0 |   0     0 |   0     0 |1558   627
 78  10   0  11   0   1|  22M    0 |4358B    0 |   0     0 |1564   473
 61  11   0  23   0   5|  19M   11M|   0     0 |   0     0 |1892   274
 53  10   0  32   0   5|  18M   13M|   0     0 |   0     0 |2038   209
 37   9   0  47   0   7|  12M   17M|   0     0 |   0     0 |1901   219
 42   7   0  46   0   4|  14M   13M| 142B    0 |   0     0 |1824   271
 37   6   0  52   0   5|  11M   15M|   0     0 |   0     0 |1794   186
 42   6   0  48   0   4|  13M   13M|   0     0 |   0     0 |1821   197
 45   7   0  44   0   4|  14M   12M|   0     0 |   0     0 |1854   199
 44   8   0  42   0   6|  14M   15M|  91k   90k|   0     0 |2269   575
 44   9   0  42   0   5|  14M   12M|  61k   59k|   0     0 |2136   450
 39   8   0  48   0   5|  12M   16M|  68k   68k|   0     0 |2253   840
 42   9   0  43   0   6|  14M   13M|  40k   39k|   0     0 |2183  1380
 40   8   0  47   0   6|  13M   13M|   0     0 |   0     0 |1868  1125
 33   9   0  54   0   4|  11M   15M| 258B  126B|   0     0 |1839   978
 46   9   0  41   0   5|  14M   12M|   0     0 |   0     0 |1910  1261
 34   7   0  54   0   5|  12M   13M|   0     0 |   0     0 |1817   907
 32   9   0  55   0   4|  11M 7872k|1059B  730B|   0     0 |1764   895
 32   7   0  57   0   4|  11M   13M|1706B 1351B|   0     0 |1789  1001
 30   8   0  57   0   5|  10M   14M| 284B    0 |   0     0 |1794   870
```

----------

## NeddySeagoon

zehnan1,

Turn on

```
Preemption Model -> Preemptible Kernel (Low-Latency Desktop)
Preempt The Big Kernel Lock
```

for better performance.

----------

## zehnan1

Thanks for the suggestions, Neddy, but they didn't help a bit.

Now I figure this could be a scheduler problem. For example, say I have a task that completes in 5 seconds (like Firefox reading its binaries and stuff) when no other task needs the hard drive. When another task (like a copy) is using the hard drive, and assuming disk IO is divided equally between the two tasks, Firefox ought to start in twice the time plus some overhead, so a bit more than 10 seconds. Unfortunately, it's not like that.

I'm using the cfq scheduler, which, I believe, distributes IO equally among tasks. Am I wrong?
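Which scheduler is really in effect for a disk can be read from sysfs, where the active one is shown in square brackets. A minimal sketch, assuming the disk is `sda` and the kernel exposes `/sys/block/sda/queue/scheduler`; the helper name `active_sched` is made up:

```shell
# active_sched: print the bracketed (active) entry from a scheduler list
# such as "noop anticipatory deadline [cfq]", read from stdin.
active_sched() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical usage, assuming the disk is sda:
#   active_sched < /sys/block/sda/queue/scheduler
# Switching at runtime (as root), assuming the scheduler is compiled in:
#   echo anticipatory > /sys/block/sda/queue/scheduler
```

Switching at runtime like this makes it possible to repeat the same copy test under each scheduler without rebuilding the kernel.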

I tried the same case on my friend's Gentoo machine. Firefox would start slower, but roughly 2x slower, not 10x or more as here. The only obvious difference is that I'm using a SATA drive.

Any help would be appreciated. Thanks.

----------

## NeddySeagoon

zehnan1,

It's silly question time: run uname -a

Is the date/time shown the date/time that you compiled your kernel?

I'm using SATA here too. How much slower it gets when HDD I/O is divided between two tasks depends on how far the disk head has to move between the two files. I would expect the HDD access time to at least double, plus head movement time, which is 'lost' for reading and writing, and that's where most of the time goes.

----------

## drvolk

Hi,

I have no SATA, only "normal" IDE drives, on my Asus A8R-MVP board with an AMD64 X2 4200+.

And I have the same problems :(

So the problem does not seem to depend on SATA configs etc.

Please let me know when you have solved your problem. I will do the same if I find a solution.

----------

## moesasji

Another me too post as I experience exactly the same problems.

It's driving me nuts as I can't find any solution. 

I'm working on an AMD64 X2 4400+, 1GB, 1 SATA drive

(I think my previous P4 system had similar problems, but not to this extent)

My dstat output looks very similar to that posted by zehnan1. 

Basically very high IO_wait on any program that uses IO-access.

Programs that are worst: rar, par2repair, K3B when it is checking a checksum.

Opening a simple terminal can take a minute under those conditions, which is normally <1s.

Even typing a piece of text often halts for a second. Definitely not normal on a multi-user system. 

PREEMPT kernel-options:

```
$ grep PREEMPT /usr/src/linux/.config
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
```

The IO-scheduler is set to CFQ... I still intend to try the anticipatory scheduler.

But seeing the responses to a similar AMD64 problem, I have little hope that it solves the problem.

So hopefully somebody has a brilliant idea.

ps) I'm running a 2.6.18-gentoo kernel. Uname -a shows the correct time.

----------

## Akkara

Could it be the I/O scheduler you're using?

I had found that using the CFQ I/O scheduler works well for me.  I also do realtime audio (tho on a gentoo-sources kernel - with 1000 Hz interrupt, preemptible kernel, and preempt big kernel lock).

```
Block layer  --->
    IO Schedulers  --->
        <*> CFQ I/O scheduler
        Default I/O scheduler (CFQ)
```

System uname: 2.6.18-gentoo-r6 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

Edit: Didn't see that moesasji already suggested CFQ and it seemed to not work for them.

However I notice a similarity:  They both are using SATA drives.

I too had trouble with SATA about a year ago. It turned out the problem was a loose SATA cable: the system kept working, but there were many errors and retries which slowed everything down. The problem cleared up after replacing the cable. However, even the new cable was still prone to causing problems if I jiggled it. So I ended up tying it all down with wire ties and I never had this problem afterwards.

----------

## moesasji

Thanks Akkara for this suggestion.

I've tried wiggling the SATA cable slightly during extraction of a rar file.

It seems to have no effect on the output of dstat or on the responsiveness of my system.

Anyway, if k3b is calculating the checksum for an iso-file I see reading of typically 60-70MB/s from my HD.

That is about as high as it can get. So I find it difficult to believe that the problem would be in the cable. 

Note that if I try to open a simple terminal it only shows up after K3B is finished. 

Even though the terminal should be cached in memory and not have to come from HD.

ps) I indeed have cfq set as IO-scheduler, so that is not the solution. 

----

~ $ cat /usr/src/linux/.config | grep IOSCHED

CONFIG_IOSCHED_NOOP=y

CONFIG_IOSCHED_AS=y

CONFIG_IOSCHED_DEADLINE=y

CONFIG_IOSCHED_CFQ=y

CONFIG_DEFAULT_IOSCHED="cfq"

----

----------

## Janne Pikkarainen

If everything else (I/O scheduler tuning etc.) fails, one might want to try the ionice command. If my memory serves me, it's available in the unstable/masked sys-apps/util-linux packages.

----------

## boniek

Ionice is available in sys-process/schedutils which is not hardmasked.
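As a rough sketch of how it could be used: `ionice -c 3` runs a command in the idle I/O class, so it only gets disk time when nothing else is asking for it. The I/O classes are only honoured by the CFQ scheduler. The wrapper name `run_idle`, the paths, and the PID below are made up for illustration.

```shell
# run_idle: run a command in the idle I/O class (ionice -c 3) when ionice
# is installed and usable, otherwise just run the command normally.
run_idle() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        ionice -c 3 "$@"
    else
        "$@"
    fi
}

# Hypothetical usage:
#   run_idle cp /path/to/big.iso /mnt/backup/
#   run_idle unrar x archive.rar
# An already running process can be moved to the idle class by PID:
#   ionice -c 3 -p 1234
```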

----------

## ashrack

I get the same problem.

Using PATA drives.

Ubuntu Edgy with vanilla kernel 2.6.19.2+reiser4+fuse261, using the Deadline scheduler. Could the scheduler be the problem in my case?

This is the preempt set:

```
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
```

----------

## moesasji

@ashrack: Given the symptoms, I don't believe it is related to the IO-scheduler.

In my case I could pinpoint this problem to a specific patch introduced into the 2.6.18 kernel.

See this post in the corresponding bug report.

Basically I could produce a condition where IO is still running but the whole system stalls for ~10s.

For me this behavior is not present in the 2.6.17 kernels, so running a 2.6.17 kernel seems the best workaround.

Unfortunately response on kernel-bugs seems extremely slow....

----------

