# 2.6.32 Low Latency CFQ IO

## Cyker

Has anyone played with this?

My system's been a bit nippier since I upgraded the kernel to 2.6.32, but I noticed every now and then I'd get a massive spike (Well... more of a rectangle...) of IOwait and anything that touched the disk in this time would just freeze until the iowait storm finished.

I suspect it's some interaction with the software RAID and the low latency CFQ IO scheduler; I find if I disable the low-latency mode or disable CFQ altogether, my system goes back to its old behaviour where heavy disk IO will chug things a bit but nothing actually locks up for any appreciable time.

If this is the trade-off, I'm leaving low-latency mode off!!
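For anyone wanting to flip the same switches, these are runtime sysfs knobs (sda below is just an example device; the low_latency file only exists while CFQ is the active scheduler):

```shell
# Show the available IO schedulers; the active one is in [brackets]
cat /sys/block/sda/queue/scheduler

# Disable CFQ's low-latency mode (new in 2.6.32; needs root)
echo 0 > /sys/block/sda/queue/iosched/low_latency
```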

Edit: I spoke too soon! Even with low-latency mode off, I'm experiencing the giant IOwait rectangles!  :Laughing: 

Some change in the IO scheduling really seems to have trouble with huge sustained IO requests?

----------

## StringCheesian

What filesystem(s)?

I tried btrfs with compression and experienced something similar, everything would randomly (every 1 to 30 minutes) pause for about 5 seconds. I never saw anything like that with ext4.

----------

## Cyker

This is on plain ext4 (Backward-compatibility mode; No extents etc.) on an mdadm RAID5 array.

It only happens when the file system is under a lot of pressure, e.g. copying huge (100GBs) files. Maybe the fs needs a source quench notification?  :Razz: 

----------

## StringCheesian

If you're on 64 bit, do the complaints posted in this massive thread sound familiar?

"AMD64 system slow/unresponsive during disk access..."

Part 1: https://forums.gentoo.org/viewtopic-t-482731.html

Part 2: https://forums.gentoo.org/viewtopic-t-793263.html

----------

## Cyker

Well, it's not quite the same: Their problem is that every time the disk is accessed their systems chug up; Mine is only related to when the disk is being overloaded with IO requests.

In earlier kernels, the CFQ IO scheduler would still fit other processes' disk accesses in, but somewhere between 2.6.30 and 2.6.32, processes started being starved during heavy IO load.

All I did was cp a 100GB file and all other processes slowly locked up as they went to access the disk. I might try it with no IO scheduler (or maybe deadline?) next time to see how it compares.
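Switching the scheduler per-device at runtime is just an echo into sysfs (sda is a placeholder here; you'd repeat it for each member disk of the array):

```shell
# Switch the IO scheduler for one disk without rebooting (needs root)
echo deadline > /sys/block/sda/queue/scheduler   # or: noop, anticipatory, cfq

# Verify - the active scheduler is shown in [brackets]
cat /sys/block/sda/queue/scheduler
```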

----------

## DigitalCorpus

Change the kernel timer frequency to 1000 Hz and patch your kernel with BFS, or start using zen-sources. I'm 22 days stable on 2.6.32-zen4, 64-bit, C2Q @ 3.2GHz. Your jaw may drop at the system responsiveness.

Zen-kernel

Gentoo and Zen

BFS
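For the curious: the timer frequency is a build-time choice, so it goes in the kernel .config (fragment below; in menuconfig it's under Processor type and features -> Timer frequency), not a runtime knob:

```shell
# Kernel .config fragment for a 1000 Hz tick
CONFIG_HZ_1000=y
CONFIG_HZ=1000
```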

----------

## Cyker

Sorry, to clarify, I was talking about I/O scheduling, not process scheduling (TBH I don't know how much effect the process scheduler would have, as the bulk file copying conditions causing these problems shouldn't need that much CPU time?)

The problem basically boils down to the CFQ IO scheduler starving some of my tasks, when its raison d'etre is to aggressively avoid starvation!!

----------

## kernelOfTruth

 *Cyker wrote:*   

> Sorry, to clarify, I was talking about I/O scheduling, not process scheduling (TBH I don't know how much effect the process scheduler would have, as the bulk file copying conditions causing these problems shouldn't need that much CPU time?)
> 
> The problem basically boils down to the CFQ IO scheduler starving some of my tasks, when its raison d'etre is to aggressively avoid starvation!!

 

well, you noticed that it's failing at its task (avoiding complete starvation) - there hasn't been a definitive "fix" yet

you could try tweaking:

/proc/sys/vm/dirty_background_ratio

/proc/sys/vm/dirty_ratio

to very low values, which will hurt throughput but improve responsiveness and minimize those spikes
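A sketch of what that tweak looks like (the values 1 and 5 are just illustrative "very low" settings; the 2.6.32 defaults are 10 and 20 if I remember right):

```shell
# Start background writeback at 1% of RAM dirty, throttle writers at 5%
echo 1 > /proc/sys/vm/dirty_background_ratio
echo 5 > /proc/sys/vm/dirty_ratio

# Same thing via sysctl; add to /etc/sysctl.conf to make it persistent
sysctl -w vm.dirty_background_ratio=1 vm.dirty_ratio=5
```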

anyway: it's not only an IO scheduler problem (at least on 64-bit) and it can be improved a lot by running BFS, using CFQ with low latency, and tweaking some knobs

----------

## DigitalCorpus

 *Cyker wrote:*   

> Sorry, to clarify, I was talking about I/O scheduling, not process scheduling (TBH I don't know how much effect the process scheduler would have, as the bulk file copying conditions causing these problems shouldn't need that much CPU time?)
> 
> The problem basically boils down to the CFQ IO scheduler starving some of my tasks, when its raison d'etre is to aggressively avoid starvation!!

 

I completely understand, but copying files does involve the kernel spinning its wheels here and there, which costs CPU time. I go through about 100 GB a week. BFS significantly reduces the stalls on the system caused by heavy I/O. Increasing the system Hz allows more frequent timer interrupts at which user input can be handled, so you don't get that spike where you cannot do anything.

----------

## Cyker

The problem doesn't 'feel' the same as the one the 64-bit people have; During normal use everything's fine right up until I literally saturate the disks with I/O; As soon as that happens, cp (or whatever) hogs the disks until completion and after a few minutes, any app that wants to access the disk blocks/stalls until the op is over! They can't even redraw their windows!

If they don't need to access the disk, they run perfectly smoothly - No jerkiness or anything! I can still wave the cursor around, type stuff in, shoot bill; As long as the app doesn't try to touch the disks!

This is not how the disk subsystem used to behave; Previously I could thrash all the IDE and SATA disks on the box to within an inch of their lives and I still wouldn't get this application blocking that I get now, so I suspect it's something that changed/was introduced between 2.6.29 and 2.6.32-r2...

The behaviour is very 'dumb' - You start hammering the disks and the system is smooth right up to a point, then suddenly IOwait shoots up, maxing both cores (!!) until all 100GB or whatever have been copied; It's like it's storing up all the write requests until it runs out of buffer, then, instead of gracefully draining them or regulating the IO, it floods the disks...?

----------

## kernelOfTruth

how big is your quantum?

/sys/block/sd*/queue/iosched/quantum ?
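You can dump it for every disk in one go (assumes sd* device naming; the iosched/ directory only exists while CFQ is the active scheduler):

```shell
# Print the CFQ quantum (max requests dispatched per cycle) for each disk
for q in /sys/block/sd*/queue/iosched/quantum; do
    echo "$q = $(cat "$q")"
done
```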

----------

## Cyker

They are all 4

(I don't think I've ever played about with any of the sysfs stuff for block devices, except for RAID stuff)

----------

