# Optimizing for system responsiveness

## helmers

To increase your system responsiveness, you can lower the latency of the kernel scheduler. (With lolo-sources, you can set a numerical value. The default is 100; I use 1500 right now, which might be a bit too much.)

The second thing is to set a shorter read-ahead value with hdparm. A value of 2 will be okay.

"hdparm -a2 /dev/hda", where /dev/hda is your hard drive.

You should be aware that this increases system overhead and reduces overall throughput. If you want it the other way around, set read-ahead much higher, up to 255, which is the maximum. 128 is a good value for servers.
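For a sense of scale, the read-ahead settings mentioned here can be converted to bytes. This is a minimal C sketch, assuming the `-a` value counts 512-byte sectors as the hdparm man page describes; `readahead_bytes` and `print_readahead_table` are hypothetical helper names, not part of hdparm.

```c
#include <stdio.h>

/* Assumption: hdparm's -a value is a count of 512-byte sectors. */
#define SECTOR_BYTES 512u

/* Hypothetical helper: bytes prefetched for a given read-ahead setting. */
unsigned readahead_bytes(unsigned sectors)
{
    return sectors * SECTOR_BYTES;
}

/* Print the three settings discussed in the post: 2, 128 and 255. */
void print_readahead_table(void)
{
    const unsigned settings[] = { 2, 128, 255 };
    for (size_t i = 0; i < sizeof settings / sizeof settings[0]; i++)
        printf("-a%u -> %u bytes\n", settings[i], readahead_bytes(settings[i]));
}
```

Under that assumption, `-a2` prefetches about 1 KiB per read and `-a128` about 64 KiB, which is why the small value favours latency and the large one favours sequential throughput.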

"hdparm -a128 /dev/hda",  where /dev/hda is your hard drive.

And finally, since the biggest slowdown in most systems is the hard drive, the "-Os" compiler flag is a very good one. It makes smaller executables, which means less memory use and less to read from the HD. As evidence that it is a good one, gentoo-sources also uses this flag.

Please let me know if you have any comments. I'm posting because I've been wondering about these things myself; I hope you find it useful.

--

Regards,

Helmers

----------

## pjp

Wouldn't compiling for speed, as opposed to Os, be more of an advantage with modern hardware?

----------

## rac

 *kanuslupus wrote:*   

> Wouldn't compiling for speed, as opposed to Os, be more of an advantage with modern hardware?

 

People can and probably have written some PhD theses on similar subjects.  Optimization is hard.  Two places I'm aware of where you typically see compilers trade space for speed are inlining functions and unrolling loops.  The idea is to avoid the overhead of subroutine calls, setting up local stack frames, jumping, paging, etc. in the inlining case, and to avoid the comparison and branch steps in the loop unrolling.

Loops are not as costly as they used to be on earlier processors, thanks to speculative execution and branch prediction.  Overaggressive inlining and unrolling can actually hurt performance, if things start spilling out of caches.
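As a rough illustration of the unrolling trade-off, here is a hand-written C sketch of roughly what a compiler does when it unrolls a loop by four. The function names are made up for this example, and a real compiler picks its own unroll factor; the point is only to show where the extra code size comes from.

```c
#include <stddef.h>

/* Straightforward loop: one compare-and-branch per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hand-unrolled by four: one compare-and-branch per four elements,
 * at the price of a larger loop body plus a clean-up loop. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)          /* remaining 0..3 elements */
        total += v[i];
    return total;
}
```

Both functions compute the same sum. The unrolled version executes fewer branches but occupies more instruction bytes, and that growth is what can push hot code out of the cache when it is applied everywhere.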

And every time I've tried to really optimize code, it came down to a few functions that needed special treatment.  Choices that might be appropriate for those few inner loops would not be so for the entire program - it might bloat it so badly that it would take forever to load from disk and would cause the system to thrash unnecessarily.

helmers, do you have information as to why the gentoo-kernel patches (18_gcc3-compile-opts in particular) prefer -Os to -O2?  I wonder if it might be because of GCC 3.x bugs wrt optimization?

----------

## borenson

It's set to -Os because that minimises the overhead due to cache invalidation from the kernel butting in 100 (or 1500!) times a second.

----------

## ghetto

This is just my humble opinion, but usually, when it comes to such things, the sensible path lies somewhere in the middle.

Options such as -funroll-loops and -O3 supposedly speed things up by optimizing the binary itself, which has the sorry side effect of causing a great amount of bloat. The -Os optimization makes smaller binaries, which can load much faster because of the reduced footprint, but the program itself will not be as responsive as something compiled with -O3. In either case, I personally can barely "notice" the difference on a modern piece of hardware. I know there is a difference, because I have experimented with different compiler options and compared the resulting binary sizes and how long the program takes to load. Judging the program's responsiveness once loaded is a bit harder, as it would require some actual benchmarking.

I'm not a compiler buff, so feel free to refute me if I'm wrong. However, please provide an explanation if you do so.

My cflags:

"-march=athlon -O2 -pipe -frerun-loop-opt -frerun-cse-after-loop -fexpensive-optimizations -fprefetch-loop-arrays -falign-functions=4 -Wno-deprecated"

With these options I believe I've managed to create binaries that are about 10% smaller but just as responsive once loaded (they even load faster) as something compiled with plain:

"-march=athlon -O3 -pipe"

The flag I'm most suspicious of is '-falign-functions=4'. I've read what the manual says about it, but if someone could put it in simple terms and give an example of how it works I would be much obliged. I do know it is doing something good, because my binaries are more responsive without a size increase.
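In simple terms, and assuming the usual reading of the GCC manual, -falign-functions=4 asks the compiler to pad the code so that each function starts at an address that is a multiple of 4. The C sketch below is not how GCC implements it; `align_up` and `padding_for` are hypothetical helpers that only show the address arithmetic involved.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumption: align is a power of two (4, 16, 32, ...). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Padding bytes needed before a function that would otherwise start at addr. */
uintptr_t padding_for(uintptr_t addr, uintptr_t align)
{
    return align_up(addr, align) - addr;
}
```

For example, a function that would have started at 0x1001 is moved to 0x1004 with three bytes of padding. A small alignment such as 4 wastes at most three bytes per function, which would be consistent with binaries that do not grow much; larger alignments cost more padding in exchange for function entries that line up better with fetch and cache-line boundaries.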

----------

## kerframil

1500 sounds a bit excessive. YMMV, but I wouldn't go above 1000 unless there's a strong indication that it is beneficial, and would recommend starting at around the 500 mark. Red Hat uses 512 Hz for i686 kernels and, personally, I trust Red Hat to pick something sensible.
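To put those numbers in perspective, the C sketch below converts a tick rate into the time between timer interrupts. `tick_period_us` and `tick_overhead_fraction` are hypothetical helper names, and the per-tick cost passed to the second one is a made-up input for illustration, not a measurement.

```c
/* Microseconds between timer interrupts for a tick rate in Hz
 * (integer division, so the result is rounded down). */
unsigned tick_period_us(unsigned hz)
{
    return 1000000u / hz;
}

/* Fraction of CPU time spent servicing ticks, given an assumed
 * fixed cost per tick in microseconds. */
double tick_overhead_fraction(unsigned hz, double cost_us)
{
    return (double)hz * cost_us / 1000000.0;
}
```

At 100 Hz the kernel interrupts every 10000 microseconds, at 512 Hz roughly every 1953, and at 1500 Hz roughly every 666. Whatever a single tick costs, that overhead scales linearly with the rate, so 1500 Hz pays fifteen times what 100 Hz does.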

----------

## idl

You can also try using a different filesystem. ReiserFS and XFS are fast, popular alternatives to the extX filesystems.

----------

## PhilCl

To illustrate the point: I used all the optimisations I could, -funroll-loops etc., including options like -foptimize-sibling-calls. It's great for some small routines, but when applied to XFree it had a memory footprint of 80 MB. This all comes down to the tradeoff between cache size, cache architecture (hence alignment) and the latency difference between registers, cache, memory and storage systems.

My feeling is that a small function or program which fits onto a few cache lines can be optimized to hell, but as soon as it becomes part of a larger program it causes more problems than it's worth (it's only really better if the routine is run many times, so that the advantage is worthwhile; otherwise the penalty for a cache miss is too large).

hope that makes a bit of sense - it's the computer architecture problem

----------

## keratos68

 *rac wrote:*   

>  *kanuslupus wrote:*   Wouldn't compiling for speed, as opposed to Os, be more of an advantage with modern hardware? 
> 
> People can and probably have written some PhD theses on similar subjects.  Optimization is hard.  Two places I'm aware of where you typically see compilers trade space for speed are inlining functions and unrolling loops.  The idea is to avoid the overhead of subroutine calls, setting up local stack frames, jumping, paging, etc. in the inlining case, and to avoid the comparison and branch steps in the loop unrolling.
> 
> Loops are not as costly as they used to be on earlier processors, thanks to speculative execution and branch prediction.  Overaggressive inlining and unrolling can actually hurt performance, if things start spilling out of caches.
> ...

 

Consider the following please:

o   That larger (optimised via inline/unroll) code requires significant symbol resolution and lookups by the dynamic linker.

o   Larger code occupies more disk space. Disk blocks may not be (and usually are not) contiguous; however, this can be mitigated to a point by sound partitioning principles, e.g. 'tmp' directories mounted on a different filesystem than 'data' directories. More blocks = longer load times = more disk/CPU/swap activity.

o   Optimisations can be overly-aggressive, leading to seg faults or unstable O/S. Not always the case, but "-O4" has demonstrated (to me) that a number of servers/workstations here 'can' become 'unfriendly'.

Of course, all these factors can be mitigated by additional or upgraded hardware. I find that 2GB RAM and dual-CPU mbos assist in reducing such overheads. For those on the other side of the bleeding edge, may I suggest, as our colleague above does, the "-Os" option.

I am currently documenting this topic as part of my PhD, a white paper to be published mid-year at King's College London (KCL), Department of Computer Science. Naturally this addresses not just Gentoo, not even just Linux, but current methods and practice employed by today's OSes.

----------

## red_over_blue

Dazzle68,

You obviously seem to know what you are talking about.  What cflags would you recommend for a 1.4GHz Athlon T-Bird with 512 megs of ram and 1024 megs of swap?  I would like stability over speed, but currently only use

CFLAGS="-mcpu=i686 -O2 -pipe -fomit-frame-pointer"

since I was reading another post about -O3 introducing some kind of software/hardware performance deficiency as compared to -O2.

I know I could read the entire man page for gcc... but it is very cryptic to someone who is not a full-time or hobby programmer.  I have programming knowledge, but not to that extent.

Thanks for any reply.

----------

## kerframil

 *Quote:*   

> o   That larger (optimised via inline/unroll) code requires significant symbol resolution and lookups by the dynamic linker.

 

Very interesting indeed.

 *Quote:*   

> o   Larger code occupies more disk space. Disk blocks may not be (and usually are not) contiguous; however, this can be mitigated to a point by sound partitioning principles, e.g. 'tmp' directories mounted on a different filesystem than 'data' directories. More blocks = longer load times = more disk/CPU/swap activity.

 

I agree with this. It's all too easy to just put the entire root filesystem on one partition, but given enough space there are some pretty legitimate reasons for doing so. I think some people are put off by a lack of basic grounding in terms of choosing appropriate partition sizes for the various elements of the filesystem layout. Furthermore, I am slightly bothered by Gentoo's habit of keeping certain things outside of /var when it probably shouldn't - although one can modify things easily enough.

 *Quote:*   

> Of course, all these factors can be mitigated by additional or upgraded hardware. I find that 2GB RAM and dual-CPU mbos assist in reducing such overheads. For those on the other side of the bleeding edge, may I suggest, as our colleague above does, the "-Os" option.

 

I was wondering, is it not the case that simply "-O" can be a reasonable compromise also?

----------

## keratos68

 *red_over_blue wrote:*   

> Dazzle68,
> 
> You obviously seem to know what you are talking about.  What cflags would you recommend for a 1.4GHz Athlon T-Bird with 512 megs of ram and 1024 megs of swap?

 

Gosh, I'm only one of many SW and Sys Engineers here. The beauty of Engineering is that it is artistic, innovative and a TEAM EFFORT. I'm new to Gentoo but do have experience in *nix, AS400, RISC, Windows, CPM and PRIME. I think it would be inappropriate for me to recommend a configuration for a system that I have had little experience with in terms of architectural analysis. I think what you have, red_over_blue, is perhaps in line with your goal of stability plus performance. There are many flags that can be employed to elicit various behaviours from the GCC compiler, and you are spot-on: there are so many that I think one could devote a "lifetime" to them. I've spent the best part of 26 months to date on it, but I'll be throwing the towel in soon :)

 *kerframil wrote:*   

> I agree with this. It's all too easy to just put the entire root filesystem on one partition, but given enough space there are some pretty legitimate reasons for doing so. I think some people are put off by a lack of basic grounding in terms of choosing appropriate partition sizes for the various elements of the filesystem layout. Furthermore, I am slightly bothered by Gentoo's habit of keeping certain things outside of /var when it probably shouldn't - although one can modify things easily enough.

 

Totally agree. Further, I don't know about you, Kerframil, but I believe that perhaps Gentoo relies too heavily on the use of */share and /etc directories for "end-user" configuration rather than leaving these for BASE Gentoo System stuff. For example, perhaps components such as DNS, NAT, NETFILTER, IPCFG, Sound etc. would be better located under /var? Maybe the Gentoo devs might consider this in a future release.

Oh, and "-O" flag alone, correct me if I am wrong, but on GNU GCC, this should instruct the compiler phases to reduce the cost of compilation (CPU/HDD/SWAP) and ensure that debugging is simplified by the inclusion of certain run-time enablers, such as contexts & frames. I'm not sure this would be the same as "-O[1-5]", personally I use "-Os" to reduce disk usage, load time and swap activity. To mitigate any loss in performance, I installed a dual CPU Mbo!! It seems reasonable.

Thanks for an interesting thread guys :)

----------

