# Tickless system/High Res Timers - care to explain?

## micmac

Hi there,

in the upcoming 2.6.21 kernel there will be changes that I don't really get yet.

There's a new option called NO_HZ for the tickless system. It's about timers (wake up calls) being generated only when needed instead of a static wake up time. Here are my questions so far:

1. Will CONFIG_HZ still be used when NO_HZ is enabled? The name NO_HZ kind of implies CONFIG_HZ won't be used, but on the other hand CONFIG_HZ isn't hidden when NO_HZ is enabled.

2. In case CONFIG_HZ is still used when NO_HZ is enabled, can a high value like 1000 HZ damage desktop hardware (i386/amd64/ia64)? What's the point of a high HZ value? Is it about response time only (like 1 ms vs. 10 ms)? I tried both 100 HZ and 1000 HZ and I wasn't able to tell the difference.

3. High Resolution Timers: Say your machine has one (on my nforce2 board it says 'Switched to high resolution mode on CPU 0' so I guess this one does). Does this have an impact on the tickless system? Or on anything else?

I guess that's it for now. I know these are pretty vague questions, but I read about tickless systems and high res timers and the articles were a little over my head. I guess I would like to know what's in it for the users.

Thanks!

----------

## rmh3093

High res timers are exactly what they sound like; I'm pretty sure they allow for nanosecond resolution for timeslices.

On a system without dynticks the kernel hands out fixed timeslices so that info can get processed. Many times the info that needs to get processed requires only a fraction of its timeslice to complete its task... On a system without dynticks the timeslices get wasted and are always consumed. On a system with dynticks, however, the kernel can give out smaller timeslices to tasks that only need small timeslices. This allows you to process more info without impacting system performance too much. It won't make your computer run cooler because the computer still has to do the same amount of work to keep your computer idle. It is designed to help out really bogged down systems (e.g. multiple virtual machines, heavy multitasking).

----------

## aysther

I'm interested in hearing specific answers to the above questions, also. I'm trying to configure this, but I'm not sure if it's as simple as turning on CONFIG_NO_HZ, and that's that, or if you need to configure or disable several other options to get the "full effect" as well. I know this probably sounds like a rather uninformed question, but that's exactly why I'm asking. Thanks!

----------

## Vlad.Sharp

Anyone care to elaborate further? I've read about the benefits of having dynamic ticks; however, are there any **special** options needed to achieve the full extent of the benefits (as asked above)? Or is enabling it enough?

----------

## micmac

Hello all,

after playing around a bit I think I'm fit to answer _some_ of my questions myself.

 *micmac wrote:*   

> 1. Will CONFIG_HZ still be used when NO_HZ is enabled? The name NO_HZ kind of implies CONFIG_HZ won't be used, but on the other hand CONFIG_HZ
> ...

 

CONFIG_HZ is still used when NO_HZ=y is selected. I enabled NO_HZ and set CONFIG_HZ=100. Afterwards I changed CONFIG_HZ to 300. I started tvtime and used its debug screen ('d') to check for timing problems. With 300 there were none, whereas with 100 tvtime had issues displaying frames at the right time.
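
For a feel of why the tick rate might matter here, the numbers can be worked out by hand. This is a minimal sketch, assuming a simplified model in which a timeout can only fire on a tick boundary. It does not describe what tvtime or the kernel does internally, and the function names and the 45 ms deadline are made up for illustration:

```python
from fractions import Fraction
import math


def tick_period_ms(hz):
    """Length of one timer tick in milliseconds for a given CONFIG_HZ."""
    return Fraction(1000, hz)


def wakeup_on_tick_ms(deadline_ms, hz):
    """First tick boundary at or after the requested deadline (simplified model)."""
    period = tick_period_ms(hz)
    ticks_needed = math.ceil(Fraction(deadline_ms) / period)
    return ticks_needed * period


if __name__ == "__main__":
    deadline_ms = 45  # hypothetical deadline, chosen only for illustration
    for hz in (100, 300, 1000):
        fired = wakeup_on_tick_ms(deadline_ms, hz)
        late = fired - deadline_ms
        print(f"HZ={hz}: tick {float(tick_period_ms(hz)):.2f} ms, "
              f"fires at {float(fired):.2f} ms ({float(late):.2f} ms late)")
```

In this model a 45 ms deadline fires at 50 ms with HZ=100, at roughly 46.67 ms with HZ=300, and on time with HZ=1000. The lateness is always less than one tick, which is 10 ms, about 3.3 ms, and 1 ms respectively.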

 *micmac wrote:*   

> 2. In case CONFIG_HZ is still used when NO_HZ is enabled can a high value like 1000 HZ damage desktop hardware (i386/amd64/ia64)? What's the point
> ...

 

I don't know if it can damage hardware. But as you can see from what I just wrote above, there are certain benefits to using higher values for CONFIG_HZ. For video and audio playback 300 seems perfect.

 *micmac wrote:*   

> 3. High Resolution Timers: Say your machine has one (on my nforce2 board it says 'Switched to high resolution mode on CPU 0' so I guess this one does).
> ...

 

I don't know. I guess it's wise to enable High Res Timers in case your box has one of them. Why not put it to use when it's there anyway?

mic

----------

## depontius

I've been playing with tickless systems for a bit now, and I'll throw in a bit of conjecture about CONFIG_HZ and NO_HZ. I might remember reading something to the effect that the kernel is only tickless when there's nothing to do. But when it's busy, it still ticks along at CONFIG_HZ, using that as a scheduling interval. Then when all of the work is done, and it's in wait-for-keypress or wait-for-packet mode, the ticks turn off and it waits for the next timer to expire, a keypress, or a packet.
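
That description can be turned into a toy count of timer interrupts during an idle stretch. This is a minimal sketch, assuming the model described above (a periodic tick at CONFIG_HZ versus waking only for the next pending timer). It ignores device interrupts and everything else that can wake a CPU, and the function names and sample values are hypothetical:

```python
def periodic_tick_wakeups(idle_seconds, hz):
    """With a fixed tick, an idle CPU is still interrupted HZ times per second."""
    return idle_seconds * hz


def tickless_wakeups(idle_seconds, timer_expiries):
    """Tickless idle model: wake only for timers expiring inside the idle window.

    Timers that expire at exactly the same instant share a single wakeup.
    """
    distinct = {t for t in timer_expiries if 0 <= t < idle_seconds}
    return len(distinct)


if __name__ == "__main__":
    idle_seconds = 10
    pending = [0.5, 2.0, 2.0, 7.25, 12.0]  # seconds from now; made-up values
    print("periodic:", periodic_tick_wakeups(idle_seconds, 250))
    print("tickless:", tickless_wakeups(idle_seconds, pending))
```

For ten idle seconds at HZ=250 the periodic model counts 2500 timer interrupts, while the tickless model counts 3 for the sample timers (two of them coincide, and one falls outside the window).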

Does anyone here understand the whole "deferred timers" thing? My impression is that there are various places with kernel timers used as timeouts, and by the time you've got a real-world quantity of them the kernel is waking too often. My impression is that since most of the timeouts really aren't critical, the deferred timers allow the value to be specified as non-critical, and they get fudged together. Then the system wakes once, services a bunch of timeouts, then quits ticking again.

Is special attention required to get a deferred timer instead of the old-style? I note on my laptop that at the moment the single largest cause of wake-ups is "afs_rxevent : schedule_timeout (process_timeout)", almost twice as often as the number 2 cause of wakeups, cpufreq-set. (No, I haven't gotten around to slowing the sampling time on this in /etc/conf.d/local.start.) In other words, after deferred timers get added, will we have to hammer on projects individually to get them used, or will they kind of slip in automagically?

----------

## Rob1n

 *depontius wrote:*   

> Does anyone here understand the whole "deferred timers" thing? My impression is that there are various places with kernel timers used as timeouts, and by the time you've got a real-world quantity of them the kernel is waking too often. My impression is that since most of the timeouts really aren't critical, the deferred timers allow the value to be specified as non-critical, and they get fudged together. Then the system wakes once, services a bunch of timeouts, then quits ticking again.

 

That's basically right, yes. Normally timers are scheduled to occur after a given time (e.g. in 250ms time). In most cases this accuracy isn't needed; if it's actually woken in 240ms or 260ms then there are no issues. So, everything asking for a timer within a given range can be woken at the same time (so the system only has to wake up once instead of 4 or 5 times).
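
The batching idea can be illustrated with a toy model. This is a minimal sketch, not the kernel's implementation: each timer is assumed to carry a target time plus an acceptable slack (the 250ms +/- 10ms example above), and a greedy pass picks as few wakeup times as possible so that every timer still fires inside its window. The function name and sample values are hypothetical:

```python
def coalesce(timers):
    """Pick wakeup times so each (target_ms, slack_ms) timer fires within target +/- slack.

    Greedy interval stabbing: process windows by their latest allowed time and
    only schedule a new wakeup when the previous one falls before the window.
    """
    windows = sorted(((t - s, t + s) for t, s in timers), key=lambda w: w[1])
    wakeups = []
    for earliest, latest in windows:
        if not wakeups or wakeups[-1] < earliest:
            wakeups.append(latest)
    return wakeups


if __name__ == "__main__":
    # Four timeouts clustered around 250ms and one at 400ms, each tolerating 10ms.
    timers = [(250, 10), (245, 10), (255, 10), (258, 10), (400, 10)]
    print(coalesce(timers))
```

For the sample input this yields two wakeups (at 255ms and 410ms) for five timers, which is the "wake once instead of 4 or 5 times" effect.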

 *Quote:*   

> Is special attention required to get a deferred timer instead of the old-style? I note on my laptop that at the moment the single largest cause of wake-ups is "afs_rxevent : schedule_timeout (process_timeout)", almost twice as often as the number 2 cause of wakeups, cpufreq-set. (No, I haven't gotten around to slowing the sampling time on this in /etc/conf.d/local.start.) In other words, after deferred timers get added, will we have to hammer on projects individually to get them used, or will they kind of slip in automagically?

 

I believe these are purely internal timers, so it's only the kernel which needs to be updated to use them where applicable.

----------

## depontius

 *Rob1n wrote:*   

>  *depontius wrote:*   Does anyone here understand the whole "deferred timers" thing? My impression is that there are various places with kernel timers used as timeouts, and by the time you've got a real-world quantity of them the kernel is waking too often. My impression is that since most of the timeouts really aren't critical, the deferred timers allow the value to be specified as non-critical, and they get fudged together. Then the system wakes once, services a bunch of timeouts, then quits ticking again. 
> 
> That's basically right, yes.  Normally timers are scheduled to occur after a given time (e.g. in 250ms time).  In most cases this accuracy isn't needed, if it's actually woken in 240ms or 260ms then there's no issues.  So, everything asking for a timer within a given range can be woken at the same time (so the system only has to wake up once instead of 4 or 5 times).
> 
>  *Quote:*   Is special attention required to get a deferred timer instead of the old-style? I note on my laptop that at the moment the single largest cause of wake-ups is "afs_rxevent : schedule_timeout (process_timeout)", almost twice as often as the number 2 cause of wakeups, cpufreq-set. (No, I haven't gotten around to slowing the sampling time on this in /etc/conf.d/local.start.) In other words, after deferred timers get added, will we have to hammer on projects individually to get them used, or will they kind of slip in automagically? 
> ...

 

In the case of AFS, it wouldn't surprise me if the kernel module needs updating too, and it's out-of-tree. But I guess that's better than having to update userspace.

----------

## Paapaa

You guys should read these also:

http://lwn.net/Articles/223185/

http://lwn.net/Articles/240080/

Covers the basics of tickless kernel.

----------

## depontius

 *Paapaa wrote:*   

> You guys should read these also:
> 
> http://lwn.net/Articles/223185/
> 
> http://lwn.net/Articles/240080/
> ...

 

Thanks for the pointers. I'd already read the more recent one, and may have read the earlier one, but that was quite a while back. But the more recent one contains a link to deferrable timers that might be of interest, here: http://lwn.net/Articles/228143/ Still looks as if it isn't picked up for free, though.

As a side note, my laptop and deskside have been running 2.6.20 ~x86, and today they're running 2.6.20-r4 x86. In the process I discovered that my deskside wasn't running tickless, and neither machine had hi-res timers enabled. Both of those situations are corrected, now.

But things are a little odd on my laptop. Sometimes all is well and it spends most of its time in C3, but sometimes (like today) it never gets to C3.

```
    PowerTOP version 1.2       (C) 2007 Intel Corporation

Cn          Avg residency (10s) Long term residency avg
C0 (cpu running)        ( 0.2%)
C1                0.0ms ( 0.0%)                   0.0ms
C2               24.5ms (99.8%)                   6.5ms
C3                0.0ms ( 0.0%)                   0.0ms

Wakeups-from-idle per second :  40.8

Top causes for wakeups:
  24.4% ( 2.0)       afs_rxevent : schedule_timeout (process_timeout)
  15.9% ( 1.3)       cpufreq-set : queue_delayed_work_on (delayed_work_timer_
  12.2% ( 1.0)           ifplugd : schedule_timeout (process_timeout)
  11.0% ( 0.9)       <interrupt> : ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4,
   9.8% ( 0.8)           xfsbufd : schedule_timeout (process_timeout)
   6.1% ( 0.5)     <kernel core> : queue_delayed_work_on (delayed_work_timer_
   6.1% ( 0.5)                   : e1000_intr (e1000_watchdog)
   2.4% ( 0.2)      runscript.sh : __netdev_watchdog_up (dev_watchdog)
   2.4% ( 0.2)     <kernel core> : page_writeback_init (wb_timer_fn)
   2.4% ( 0.2)         automount : do_setitimer (it_real_fn)
```

Incidentally, the "Top causes for wakeups" are pretty much the same as they were yesterday, when it was spending about 90% of the time in C3. Any ideas?

[EDIT] Next hunk of oddity... I've now got the laptop doing something. I find it mildly annoying that it's supposed to be doing something straightforward, and can only seem to apply 50% CPU to it, but that's another matter. What's odd and relevant to the current topic is what PowerTOP says while it's "working":

```
    PowerTOP version 1.2       (C) 2007 Intel Corporation

Cn          Avg residency (5s)  Long term residency avg
C0 (cpu running)        (41.7%)
C1                0.0ms ( 0.0%)                   0.0ms
C2                0.5ms (58.3%)                  11.7ms
C3                0.0ms ( 0.0%)                   0.0ms

Wakeups-from-idle per second :  1059.4

Top causes for wakeups:
  97.5% (1065.4)       <interrupt> : ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4,
   1.5% (16.6)       <interrupt> : libata
   0.2% ( 2.0)       afs_rxevent : schedule_timeout (process_timeout)
   0.1% ( 1.2)       cpufreq-set : queue_delayed_work_on (delayed_work_timer_
   0.1% ( 1.0)           ifplugd : schedule_timeout (process_timeout)
   0.1% ( 1.0)              tail : do_nanosleep (hrtimer_wakeup)
   0.1% ( 1.0)             clsbd : schedule_timeout (process_timeout)
   0.1% ( 0.8)           xfsbufd : schedule_timeout (process_timeout)
   0.1% ( 0.6)                   : e1000_intr (e1000_watchdog)
   0.0% ( 0.4)               top : schedule_timeout (process_timeout)
```

For some reason, the fact that the CPU is actually doing work, which of course puts it in C0, is being chalked up as USB wakeups! By the way, this laptop is in a port replicator, cover closed, and all access is over the network. The network card is a mini-PCI hardwired, not USB, and the wireless is currently not being used.
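
To compare dumps like the two above without eyeballing them, the "Top causes for wakeups" lines can be parsed. This is a minimal sketch, assuming the PowerTOP 1.2 line layout shown in this post. The regular expression and function names are my own, and other PowerTOP versions may format these lines differently:

```python
import re

# Matches lines such as:
#   "  24.4% ( 2.0)       afs_rxevent : schedule_timeout (process_timeout)"
# The source column may be empty, as in the e1000_intr line.
LINE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s*\(\s*(\d+(?:\.\d+)?)\)\s*(.*?)\s*:\s+(.*?)\s*$"
)


def parse_wakeup_line(line):
    """Return a dict for one 'top causes' line, or None if the line is something else."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    percent, rate, source, cause = match.groups()
    return {
        "percent": float(percent),
        "rate": float(rate),  # wakeups per second attributed to this cause
        "source": source,
        "cause": cause,
    }


def parse_wakeups(text):
    """Parse every matching line of a pasted PowerTOP dump."""
    parsed = (parse_wakeup_line(line) for line in text.splitlines())
    return [entry for entry in parsed if entry is not None]
```

Feeding both dumps through `parse_wakeups` gives lists that can be sorted or compared by `source`, for example to see which causes changed rate between the idle run and the busy one.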

----------

## Bones McCracker

 *rmh3093 wrote:*   

> It won't make your computer run cooler because the computer still has to do the same amount of work to keep your computer idle.

 

I don't understand the above.  I keep reading that Tickless/Highres operation dramatically reduces power consumption.  Very nearly all Power-In is converted to Heat-Out.  Therefore, it should make the machine run cooler.

I have a couple other questions to toss into the mix.  (Yes, I'm googling, but the answers to these might benefit others as well.)  :)

Also, what have people seen in terms of benchmarks?  Here's at least one test that shows virtually zero effect.

http://www.phoronix.com/scan.php?page=article&item=651&num=1

Without running PowerTOP, are there some common-sense rules that apply?  For example, are there certain daemons that are so demanding in timer access that they obviate any potential benefit (e.g., ntpd or whatever)?  (There's one list at http://www.linuxpowertop.org/known.php).

What ACPI versions and features are required to take advantage of it?

Thanks.

----------

