# Disk I/O locks up system

## chaoscommander

I often get lockups/hangs that last for a few seconds to several minutes, whenever a process has to do a lot of disk I/O. I took a look with iotop to see if it's one specific process hogging the disk. Sometimes it's a regular update process, like mlocate, makewhatis, baloo etc., but sometimes (especially when waking up from hibernation) it's a bunch of userspace processes (firefox, libreoffice, kde itself, whatever has been hibernating) doing whatever it is they are doing with the complete HDD bandwidth.

I first suspected it was due to my machine being a little older, but it's always the disk, not the CPU, that is at 100% capacity. It also happens with a fairly fast ThinkPad that has an SSD in it (albeit more rarely, so far only when large packages are being merged). Is there a way to reconfigure the I/O scheduler so it will allocate bandwidth more equally instead of letting single threads block the entire disk? I'm using the CFQ I/O scheduler.

Or am I looking for the problem in an entirely wrong direction?

----------

## eccerr0r

If your machine completely hangs, I'd say it's a hardware issue and perhaps it's time to replace the machine. My machines can take 100% disk I/O load just fine. Is it just Gentoo that exhibits this failure mode? When you run 'dmesg' after it comes back from hanging, does it report something?

If it's just "so slow that there's virtually no forward progress", that's a different case. There's ionice, or you can set up cgroups to throttle, but neither applies to swapping or to the kernel's own I/O (resume from disk), so if it's crashing during either of these, see above.
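As a minimal sketch of the ionice route: the wrapper below runs a command in the idle I/O class (`ionice -c 3`), so it only gets disk time when nothing else is asking for it. The function name `run_io_idle` is made up for illustration, and as far as I know I/O classes are only honoured by schedulers that implement priorities (CFQ, BFQ), not by deadline or noop.

```shell
# Run a command in the idle I/O class so it only gets disk time when
# no other process wants it. Falls back to a plain run when ionice
# is not installed. (run_io_idle is a hypothetical helper name.)
run_io_idle() {
    if command -v ionice >/dev/null 2>&1; then
        ionice -c 3 "$@"
    else
        "$@"
    fi
}
```

Typical use would be something like `run_io_idle updatedb` in a cron job or root shell.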

Make sure you're using the right kernel disk io driver too, or try a different one (use specific driver instead of generic, or vice versa).

----------

## chithanh

The system should not lock up fully, even during disk I/O.

Try to enable voluntary or full kernel preemption. Also in my experience, the deadline scheduler behaves better with heavy I/O load and interactive processes.
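For anyone wanting to try this, a rough sketch of how to see which scheduler each disk is using and how to switch at runtime. The helper name `active_scheduler` and the device name `sda` are assumptions for illustration; the sysfs file lists all available schedulers with the active one in brackets.

```shell
# Extract the active scheduler (the bracketed name) from a sysfs
# scheduler line read on stdin, e.g. "noop deadline [cfq]" -> "cfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Show the active scheduler of every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    printf '%s: %s\n' "${dev%%/*}" "$(active_scheduler < "$f")"
done

# To switch one disk at runtime (as root; sda is an assumed device name):
#   echo deadline > /sys/block/sda/queue/scheduler
```

The runtime switch does not survive a reboot, which makes it a cheap way to compare schedulers before changing the kernel default.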

----------

## Goverp

I suspect the problem is swapping. Using firefox and libreoffice on my (1 GB RAM) laptop shows typical symptoms: you can use one or the other continuously, but changing between them causes the disk light to come on hard, and it takes a minute or so before the system starts to respond again.

I have a suspicion that Javascript is implicated somewhere in firefox - my laptop frequently slows right down on some web pages that appear pretty innocuous, so I wonder if there are memory leaks or alternatively wasteful algorithms that run fast on machines with lots of RAM but slow where RAM is constrained.

----------

## Ant P.

 *chaoscommander wrote:*   

> Is there a way to reconfigure the IO scheduler so it will allocate bandwidth more equally instead of letting single threads block the entire disk? I'm using CFQ I/O scheduler.
> 
> Or am I looking for the problem in an entirely wrong direction?

 

No, you're looking in the right place. CFQ is an utter trainwreck for desktop use; switch to BFQ (available in any sufficiently advanced patchset - genpatches has it) and these problems will be far less frequent.

----------

## Cyker

Also, check the power management setup for your ThinkPad; some of them, esp. newer ones, have known issues with ASPM which can cause weird problems like this.

I had to disable ASPM completely because every time I plugged in an ExpressCard it would make the whole system 'pause' every couple of seconds.

(Mine is an X230 for reference)
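A small sketch for checking what ASPM is currently doing, assuming the kernel exposes the usual `pcie_aspm` module parameter in sysfs. The helper name `aspm_policy` is made up for illustration; the active policy is the one shown in brackets.

```shell
# Print the ASPM policy line; the active policy is in brackets,
# e.g. "[default] performance powersave". An alternative file can be
# passed as the first argument.
aspm_policy() {
    policy_file=${1:-/sys/module/pcie_aspm/parameters/policy}
    if [ -r "$policy_file" ]; then
        cat "$policy_file"
    else
        echo "no ASPM policy file at $policy_file"
    fi
}

aspm_policy

# To turn ASPM off completely, add this to the kernel command line and reboot:
#   pcie_aspm=off
```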

----------

## Mr. M

I'm also seeing lock-ups on a ThinkPad T420s. This started happening after upgrading my system a few days ago (kernel from 3.14.33 to 3.18.7; I hadn't updated my system for a while). The system seems to freeze completely when e.g. emerging packages in a KDE terminal. The emerge job continues (I see the HD LED blinking) but everything else is frozen (mouse pointer does not move, cannot switch windows using Alt-Tab, cannot switch VT using e.g. Alt-F1). The system does come back after a while (5-10 minutes).

I never had this problem before to this degree, so I wonder if it is just the IO scheduler (did it get worse for desktop use recently?). I will try using the BFQ scheduler. What is the best way for enabling this? Is it just USE=experimental for gentoo-sources and then enable it in the kernel config (where?)?

----------

## chaoscommander

 *Quote:*   

> If it's just "so slow that there's virtually no forward progress" that's a different case.

 

This is the case. I can still move the mouse, but it will take seconds to react and/or feel like moving through syrup. The desktop doesn't respond, switching to a TTY takes up to tens of seconds.

 *Quote:*   

> Try to enable voluntary or full kernel preemption. Also in my experience, the deadline scheduler behaves better with heavy I/O load and interactive processes.

 

Kernel is fully preemptible.

 *Quote:*   

> No, you're looking in the right place. CFQ is an utter trainwreck for desktop use; switch to BFQ (available in any sufficiently advanced patchset - genpatches has it) and these problems will be far less frequent.

 

Okay. Trying BFQ and/or deadline goes on the to-do list, I will report back.

Mr. M's problem sounds very similar to mine, it also just started a while ago.

 *Quote:*   

> I will try using the BFQ scheduler. What is the best way for enabling this? Is it just USE=experimental for gentoo-sources and then enable it in the kernel config (where?)?

 

Yes, that's it, found it at "Enable the Block Layer -> IO Schedulers"
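To double-check the result after configuring, something like the following sketch lists the I/O scheduler options in the kernel config. The helper name `iosched_options` is illustrative, and `/usr/src/linux/.config` is assumed to be where the config lives.

```shell
# Filter the I/O scheduler options out of a kernel config read from stdin.
iosched_options() {
    grep -E '^CONFIG_(IOSCHED_|DEFAULT_IOSCHED)' || true
}

# /usr/src/linux/.config is the usual Gentoo location; adjust if yours differs.
if [ -r /usr/src/linux/.config ]; then
    iosched_options < /usr/src/linux/.config
fi

# With BFQ built in, it can also be selected at boot on kernels of this era
# via the kernel command line:  elevator=bfq
```

If `CONFIG_IOSCHED_BFQ` does not show up at all, the patch was probably not applied (i.e. the `experimental` USE flag was not set when the sources were installed).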

----------

## Mr. M

I noticed that the "baloo_file_extr" process was constantly running and creating a lot of IO. After disabling it, my system seems to be more responsive again.

You can disable it by adding "Indexing-Enabled=false" to ~/.kde4/share/config/baloofilerc

----------

## Mr. M

Switching to the BFQ scheduler made a huge difference in terms of responsiveness  :Smile: .

----------

## chaoscommander

I second that. It still isn't perfect, but much, much better. Waking up from hibernation now is about 5 times faster. (1 minute instead of 5).

----------

## chaoscommander

Hm.. it still hangs massively on big emerges, though.

I just found this in dmesg, about 20sec after wakeup from hibernation, but I'm not sure if it's related.

```
[192649.197026] ------------[ cut here ]------------
[192649.197039] WARNING: CPU: 1 PID: 2917 at drivers/gpu/drm/i915/intel_display.c:901 intel_wait_for_vblank+0x1ed/0x200()
[192649.197042] vblank wait on pipe A timed out
[192649.197046] CPU: 1 PID: 2917 Comm: upowerd Tainted: G        W      3.18.9-gentoo_chaosbox64 #1
[192649.197048] Hardware name: Zepto Orion/Zepto, BIOS A15 27/08/2008
[192649.197051]  0000000000000009 ffff88013a60bb18 ffffffff818ed125 0000000000000001
[192649.197056]  ffff88013a60bb68 ffff88013a60bb58 ffffffff8104c637 0000000000000000
[192649.197061]  ffff8800bac10000 0000000000070040 000000000000000a 000000010b77040c
[192649.197066] Call Trace:
[192649.197073]  [<ffffffff818ed125>] dump_stack+0x4f/0x7c
[192649.197079]  [<ffffffff8104c637>] warn_slowpath_common+0x77/0xa0
[192649.197083]  [<ffffffff8104c6d1>] warn_slowpath_fmt+0x41/0x50
[192649.197087]  [<ffffffff8149248d>] intel_wait_for_vblank+0x1ed/0x200
[192649.197092]  [<ffffffff814cceef>] intel_tv_detect+0x21f/0x590
[192649.197098]  [<ffffffff8142703c>] status_show+0x3c/0x80
[192649.197103]  [<ffffffff814d66db>] dev_attr_show+0x1b/0x60
[192649.197108]  [<ffffffff81338897>] ? debug_smp_processor_id+0x17/0x20
[192649.197114]  [<ffffffff811c69df>] sysfs_kf_seq_show+0xaf/0x140
[192649.197118]  [<ffffffff811c561b>] kernfs_seq_show+0x1b/0x20
[192649.197123]  [<ffffffff811758aa>] seq_read+0xea/0x370
[192649.197127]  [<ffffffff811c5d55>] kernfs_fop_read+0xf5/0x160
[192649.197137]  [<ffffffff81153b87>] vfs_read+0x97/0x180
[192649.197140]  [<ffffffff81154131>] SyS_read+0x41/0xb0
[192649.197143]  [<ffffffff818f61d6>] system_call_fastpath+0x16/0x1b
[192649.197145] ---[ end trace 6303ad86b167098e ]---
```

----------

## chaoscommander

That trace dump has not recurred -> seems unrelated.

Still: my responsiveness issues persist, albeit to a lesser extent than before. Does anyone have any other ideas?

----------

## Yamakuzure

 *Mr. M wrote:*   

> Switching to the BFQ scheduler made a huge difference in terms of responsiveness .

 I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.

Has anybody had the same experience?

----------

## kernelOfTruth

 *Yamakuzure wrote:*   

> *Mr. M wrote:*
>
> > Switching to the BFQ scheduler made a huge difference in terms of responsiveness.
>
> I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.
> 
> Has anybody had the same experience?

 

Looks like you're experiencing quite the opposite.

Can you please report this to the BFQ (bfq-iosched) mailing list, ideally with a reproducer?

https://groups.google.com/forum/#!forum/bfq-iosched

Paolo Valente (who is probably the most widely known name in relation to BFQ) and the others working on it and using it would highly appreciate it (me too   :Wink:   ), I'm sure.

----------

## shazeal

 *Yamakuzure wrote:*   

> *Mr. M wrote:*
>
> > Switching to the BFQ scheduler made a huge difference in terms of responsiveness.
>
> I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.
> 
> Has anybody had the same experience?

 

I have the same issue, I just haven't really looked into it, as I've been coding most of the time recently and so haven't run into it much. I switched to ZFS a few weeks ago for all my disks, root included, and have noticed that when the computer is under very heavy load with VirtualBox running I get large delays before actions are completed (windows, bash, other I/O tasks etc.).

I never thought to attribute it to BFQ since it's always solved the problem  :Wink: 

----------

## shazeal

Found the solution to the ZFS issue! It's not BFQ's fault at all.

https://bbs.archlinux.org/viewtopic.php?id=196439

 *Quote:*   

> Thanks for the quick response and suggestion. Quick googling revealed it is ZFS that sets "noop" scheduler as long as its pools are occupying the whole disk device.

 

So if you have ZFS as a partition rather than full disk you need to force the noop scheduler.

I have root/swap/zfs partitions on my root drive, hence it was selecting BFQ as the scheduler.

----------

## kernelOfTruth

 *shazeal wrote:*   

> Found the solution to the ZFS issue! Its not BFQ's fault at all.
> 
> https://bbs.archlinux.org/viewtopic.php?id=196439
> 
> > *Quote:*
> >
> > Thanks for the quick response and suggestion. Quick googling revealed it is ZFS that sets "noop" scheduler as long as its pools are occupying the whole disk device.
>
> ...

 

Oh - I thought this was known or given   :Embarassed: 

I forgot that it also took me quite some time to stumble upon this   :Laughing: 

```
echo "bfq" > /sys/module/zfs/parameters/zfs_vdev_scheduler
```

----------

## shazeal

 *kernelOfTruth wrote:*   

> 
> 
> Oh - I thought this was known or given  
> 
> I forgot that this also took me quite some time to stumble upon this  
> ...

 

This confuses me more   :Laughing: 

So the system scheduler should be noop, but you can set the vdev scheduler to bfq??

What does the vdev scheduler normally use? I thought ZFS had its own scheduler? I don't see any documentation on this, apart from the man page, which only contains this...

 *Quote:*   

> zfs_vdev_scheduler (charp)
> 
>                    I/O scheduler
> 
>                    Default value: noop.

 

So verbose   :Rolling Eyes: 

----------

## energyman76b

 *Yamakuzure wrote:*   

> *Mr. M wrote:*
>
> > Switching to the BFQ scheduler made a huge difference in terms of responsiveness.
>
> I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.
> 
> Has anybody had the same experience?

 

with zfs please use NO io scheduler.

----------

## kernelOfTruth

 *energyman76b wrote:*   

> *Yamakuzure wrote:*
>
> > *Mr. M wrote:*
> >
> > > Switching to the BFQ scheduler made a huge difference in terms of responsiveness.
> >
> > I can not concur. My system is on ZFS and I have a VMware Workstation with Windows 7 running when I am at work, and using BFQ is a huge pain. Just switching between the vmware and any other window takes 20-30 seconds with BFQ, but <1 second with CFQ.
> >
> > Has anybody had the same experience?
>
> with zfs please use NO io scheduler.

 

meaning ?

blk-mq ?

setting it to noop ? (noop still does some minor ordering)

Thanks !

----------

## energyman76b

noop

----------

## Yamakuzure

Strange. I will try with noop, then. But why is CFQ working just fine?

Has anybody ever compared using noop versus injecting bfq into zfs_vdev_scheduler?

Edit: KernelOfTruth wrote: "when using BFQ, switching the vdev-scheduler to BFQ also should significantly reduce latency: echo bfq > /sys/module/zfs/parameters/zfs_vdev_scheduler"

So, better than noop? Riddles...

----------

## haarp

Also, if you're using an SSD and ext4 with the discard option, deleting lots of files can cause considerable delays as they get TRIMmed from the disk.

----------

## krinn

Or maybe you are blaming the wrong thing?

While the scheduler should switch between tasks and give each an amount of time to work, if the scheduler is unable to switch, you can use whatever scheduler you wish, it won't switch.

Why couldn't it switch? If your hardware is busy and blocking interrupts, the scheduler (whichever one you picked) must wait for the hardware to stop doing that.

What should you look at? Stuff like enabling MSI, using APIC instead of PIC, IRQ conflicts on older hardware (the kind using PIC only)...

If you have an IRQ conflict and two devices try to work, one device may hide the other device's request and, worse, may re-run the first one.

Say a mouse and an HDD: when your HDD is writing and the mouse is moved, the mouse movement may just get ignored, because every time the mouse moves, the conflict might hand control to the HDD again. Result: the HDD is busy writing, the mouse moves, the HDD is asked to continue while the mouse is ignored. The user may see the HDD doing its work while the mouse gets no answer at all; in the meantime, the poor scheduler is waiting for the HDD to finish before it can work.

Of course a mouse has no IRQ of its own, but a USB mouse uses the USB controller's IRQ, and lame motherboards tend to share the USB controller's IRQ with the HDD controller.

You should also look at the broken APIC/PIC option in the kernel (CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS), and maybe at using HPET, which would give a finer timer resolution and may help scheduler balancing.

If you lack memory and the task was swapped out, there's nothing you can do: the HDD must read the swap to restore the memory because the scheduler is switching to that task. In this case, I suppose that if the noop scheduler does less switching, it would be the best one to use.

For your information, I have never had any problem like this using the deadline scheduler; however, I don't think the scheduler type makes any difference there (as long as memory is not an issue).

----------

## energyman76b

I had severe latency problems with disk io in the past. Crippling problems.

I tried EVERYTHING. Besides, IRQs were ok.

The solution: noop io-scheduler+zfs for rotating rust platters & noop io-scheduler+ssd

----------

## Yamakuzure

First, thank you for your suggestions, krinn!

 *krinn wrote:*   

> Or maybe you are blaming the wrong thing?
> 
> While scheduler should switch to each task and give it an amount of time to work, if the scheduler is unable to switch, you can use whatever scheduler you wish, it won't switch.
> 
> Why it couldn't switch, if your hardware is busy and block interrupt, scheduler (the one you wish) must wait the hardware to stop doing that.

Fat chance if one scheduler is doing fine while the other hangs and hangs and hangs... *tehe* Well, at least in my case it certainly was caused by ZFS not using bfq, so overall bfq was mostly busy keeping up without ever doing any good. (Or something like that...)

 *krinn wrote:*   

> What you should look at? Stuff like enabling MSI, use APIC instead of PIC, irq conflict for older hw (one using PIC only)...

Dell Precision M4800, one year old, utilizing MSI(-X) and APIC, with irqbalance running.

 *krinn wrote:*   

> If you have IRQ conflict and two devices try to work, one device may hide the other device request and worst may re-run the first one.
> 
> Say a mouse and hdd, when your hdd is writing and the mouse is moved, the mouse movement may get just ignore because every time the mouse move, the conflict might gives hand to the hdd again. Result: hdd is busy writing, mouse move, hdd is ask to continue while mouse is ignore. And user may see hdd doing its work, while mouse get no answer at all, in the mean time, poor scheduler is waiting the hdd to end before it can work.
> 
> Of course a mouse have no IRQ, but usb mouse are using the usb controller irq, and lame motherboard are used to shared usb controller with hdd controller.
> ...

 Yes, of course, although then no scheduler would do just fine.

However:

```
# grep REROUTE .config
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
```

 *krinn wrote:*   

> If you lack memory and the task was swap, there's nothing you can do, the hdd must read the swap to restore the memory because the scheduler is switching to it, in this case, i suppose if noop scheduler is doing less switching, then it would be the best to use.

Well, 32 GiB RAM, 8x6 GiB ZRAM swap, 1x8 GiB swap partition. I can link qtwebkit:4, qtwebkit:5, firefox and libreoffice at the same time (build directory is on tmpfs) plus run a Windows 7 VM with 8 GiB RAM without even touching swap.

 *krinn wrote:*   

> For your information, i just never had any problem like this using deadline scheduler, however i don't think the scheduler type makes any difference there (as long as memory is not an issue).

The deadline scheduler is the one that I never tried. Maybe I'll give it a shot.

----------

## Yamakuzure

So. I tried "noop", but it did not perform well under heavy disk load. Under normal load everything was as snappy as it can get.

Then I tried BFQ again but set the ZFS scheduler to use bfq, too. And I guess I will keep this. My system has been responsive in every situation I have had so far (including starting a Windows 7 VM while a copy job syncs 3 GB of data and two large directories are being synced with rsync).

Wow!

I think this detail about echo'ing "bfq" to /sys/module/zfs/parameters/zfs_vdev_scheduler should be added to the ZFS Entry in the gentoo wiki, shouldn't it?
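Since the echo only lasts until the module is reloaded, here is a hedged sketch of making it persistent through a modprobe option. The helper name `zfs_scheduler_option` and the file name `zfs.conf` are assumptions for illustration, and later ZFS releases may no longer have this parameter, so check the module parameter man page for the installed version.

```shell
# Build the modprobe.d line that sets the ZFS vdev scheduler at module load.
zfs_scheduler_option() {
    printf 'options zfs zfs_vdev_scheduler=%s\n' "$1"
}

# Show the current value if the zfs module is loaded.
param=/sys/module/zfs/parameters/zfs_vdev_scheduler
if [ -r "$param" ]; then
    cat "$param"
fi

# Print the line for bfq. As root, it could be persisted with
# (zfs.conf is an assumed file name):
#   zfs_scheduler_option bfq >> /etc/modprobe.d/zfs.conf
zfs_scheduler_option bfq
```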

----------

## chaoscommander

 *haarp wrote:*   

> Also, if you're using an SSD and ext4 with the discard option, deleting lots of files can cause considerable delays as they get TRIMmed from the disk.

 

I just returned here because the problem is still annoying me a lot. How did I miss this? Switching to cronjob fstrim, let's see!

----------

## chaoscommander

 *chaoscommander wrote:*   

> *haarp wrote:*
>
> > Also, if you're using an SSD and ext4 with the discard option, deleting lots of files can cause considerable delays as they get TRIMmed from the disk.
> 
> I just returned here because the problem is still annoying me a lot. How did I miss this? Switching to cronjob fstrim, let's see!

 

That was the solution. Awesome!
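For anyone landing here with the same symptoms, a rough sketch of the change: find filesystems mounted with `discard`, drop that option from /etc/fstab, and trim periodically instead. The helper name `mounts_with_discard`, the weekly interval and the cron file name are assumptions for illustration.

```shell
# Print the mount points that are mounted with the "discard" option.
# Expects /proc/mounts-style lines on stdin (options are the 4th field).
mounts_with_discard() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "discard") { print $2; break }
    }'
}

if [ -r /proc/mounts ]; then
    mounts_with_discard < /proc/mounts
fi

# After removing "discard" from /etc/fstab, a periodic trim could be run
# from cron, e.g. an executable /etc/cron.weekly/fstrim containing:
#   #!/bin/sh
#   fstrim -v /
```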

----------

