# Optimize Kernel for size

## LonelyStar

Hi,

In the kernel config (under general), one can select "CONFIG_CC_OPTIMIZE_FOR_SIZE".

The help text says that this option will compile the kernel with -Os instead of -O2.

Does that make sense? I am on a desktop machine, so size is not that important, is it?

----------

## eccerr0r

Depends on your use, choose what you want.

Smaller could be better, I suppose the option was put there for embedded systems where there is limited space to store things...

----------

## Cyker

If you can get it small enough to fit in the cpu cache, Os can be faster than O2!  :Very Happy: 

----------

## Odysseus

 *LonelyStar wrote:*   

> Hi,
> 
> In the kernel config (under general), one can select "CONFIG_CC_OPTIMIZE_FOR_SIZE".
> 
> The help text says that this option will compile the kernel with -Os instead of -O2.
> ...

 

The setting has been there for several kernel incarnations. There is a size vs. speed trade-off depending upon your CPU architecture. If you were running, say, an original Pentium with 256K cache or a 486 with 128K, then the smaller size of the finished kernel and modules built with -Os would be an advantage, as more code would be able to reside entirely in cache. This would also hold true for some newer processors, like Celerons, which generally have less cache than their more expensive Intel brethren.

Most modern processors have over a meg of cache; many now have 2, 3, 4, or more megs. These processors can take advantage of the -O2 optimizations, since they aren't negatively affected by the larger binaries and can make use of the increased speed afforded by compiling with -O2.

So a general guide could be: on modern systems with 1 meg or more of cache per CPU, -O2 yields a performance boost; with less than 1 meg per CPU, -Os is more advantageous.
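If you want to apply that guide, one way to see what cache your CPU actually has is to ask the kernel and glibc (a sketch; the sysfs layout varies between kernels and arches, and not every system exposes every entry):

```shell
# Cache sizes the kernel reports for the first CPU, if sysfs exposes them
grep . /sys/devices/system/cpu/cpu0/cache/index*/size 2>/dev/null || true
# glibc's view of the cache geometry, where available
getconf -a | grep -i CACHE
```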

On a side note, there also was a time years ago when -O2 optimization just wasn't considered safe for kernel compilation. This is no longer an issue. I personally configure my kernel with -O2 optimization and have done so for years with only positive results.

I hope this helps.

Ciao

Odysseus

----------

## zyko

CONFIG_CC_OPTIMIZE_FOR_SIZE doesn't make any difference at all.

 *Quote:*   

> Most modern processors have over a meg of cache, many now have 2, 3, 4, or more megs. These processors can take advantage of the -O2 optimizations since they aren't negatively effected by the larger binaries and can make use of the increased speed afforded by compiling -O2.

 

Not really. In current versions of gcc, -O2 and -Os are almost identical. There are only 7 (out of more than 140) -f options that differ between -O2 and -Os, and those 7 options don't generally have a significant impact on performance. Basically, -O2 and -Os emit code that is almost identical and shouldn't be expected to differ much in performance or size. There used to be much more of a difference between optimization levels in gcc-3, but times have changed. Also, the kernel doesn't seem to contain that much computationally expensive code for gcc to optimize in the first place...

I ran some benchmarks with 2.6.31 to test if CONFIG_CC_OPTIMIZE_FOR_SIZE affects anything at all. I tested dm_crypt performance with aes-xts-plain and serpent-xts-plain, x264 encoding speed and some mysql stress tests. There were no measurable differences, as would have been expected.

----------

## Odysseus

 *zyko wrote:*   

> Not really. In current versions of gcc, -O2 and -Os are almost identical. There are only 7 (out of more than 140) -f options that differ between -O2 and -Os, and those 7 options don't generally have a significant impact on performance. Basically, -O2 and -Os emit code that is almost identical and shouldn't be expected to differ much in performance or size. There used to be much more of a difference between optimization levels in gcc-3, but times have changed. Also, the kernel doesn't seem to contain that much computationally expensive code for gcc to optimize in the first place...
> 
> I ran some benchmarks with 2.6.31 to test if CONFIG_CC_OPTIMIZE_FOR_SIZE affects anything at all. I tested dm_crypt performance with aes-xts-plain and serpent-xts-plain, x264 encoding speed and some mysql stress tests. There were no measurable differences, as would have been expected.

 

Before making a blanket statement like "-Os makes no difference vs. -O2", what kind of machine did you run the tests on? Was it a machine with a large amount of cache (1 meg or more)?

If you're testing on a system with a large amount of cache, then of course there wouldn't be much of a difference speed-wise, as compiling with either setting wouldn't change whether the finished binary could reside entirely in the cache (the point I made in my previous post). You also have to take into account what optimization level the rest of the system's binaries were compiled with. Like I said, if you've got an older CPU with little cache, then even slight changes to the size of the binary could make a difference in performance, as this could be the difference between a cache hit and a miss, and the performance penalty that results, especially on older machines with slower system memory and bus speeds.

A fairer test would be two machines of the same arch, let's say x86. Set up one machine with an old Pentium, Pentium Pro, Pentium II, or Pentium III with 256K cache, and the other with a single-core Pentium-M with 2 megs of cache. Set up both machines with identical software and running services. Compile both machines entirely with -Os, including the kernel, and benchmark. Then recompile both machines entirely with -O2, including the kernel, and benchmark again. Compare the results. Only after making such a direct comparison would it be reasonable to draw a conclusion about this kernel setting, IMHO.

Ciao,

Odysseus

----------

## aCOSwt

 *Cyker wrote:*   

> If you can get it small enough to fit in the cpu cache, Os can be faster than O2! 

 

I am *not* amused !   :Wink: 

My first AT&T System V would have happily fit into my current proc's L2 cache!

well... in those times... the only cache processors knew was the instruction prefetch...

Why do we systematically design OSes so that their size necessarily exceeds the cache size ?   :Laughing: 

----------

## eccerr0r

If all your computer is running is kernel code, that's not a very useful computer... 

but seriously you have to test to know whether -Os or -O2 is better for *your* machine.
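A minimal shape for such a test, using a stand-in workload rather than an actual kernel rebuild (gcc assumed present; real conclusions only come from timing the workloads you actually care about, built both ways):

```shell
# Toy A/B harness: build the same code with -O2 and -Os, time both.
# The loop below is a stand-in workload, not representative of a kernel.
cat > /tmp/work.c <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long s = 0;
    for (unsigned long i = 0; i < 20000000UL; i++)
        s += i ^ (i >> 3);
    printf("%lu\n", s);
    return 0;
}
EOF
gcc -O2 -o /tmp/work_o2 /tmp/work.c
gcc -Os -o /tmp/work_os /tmp/work.c
for bin in /tmp/work_o2 /tmp/work_os; do
    t0=$(date +%s%N)
    "$bin" > /dev/null
    t1=$(date +%s%N)
    echo "$bin: $(( (t1 - t0) / 1000000 )) ms"
done
```

Run it a few times and on your own hardware before believing either number; single runs on a loaded box are noise.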

----------

## energyman76b

 *Odysseus wrote:*   

>  *LonelyStar wrote:*   Hi,
> 
> In the kernel config (under general), one can select "CONFIG_CC_OPTIMIZE_FOR_SIZE".
> 
> The help says, that this option will compile the kernel with -Os instead of -O2.
> ...

 

wow. No.

First of all, for years O2 was considered the only safe setting. Os was not.

Second, it does not depend on cache size.

Third, nothing helps you in case of a cache miss or cold cache.

Fourth, the advantage depends less on the architecture and more on the type of load.

Which setting is the right one? Only testing can tell. It depends so much on the stuff you do, that there is no definite answer.

----------

## zyko

 *Quote:*   

> Before making a blanket statement like -Os makes no difference vs. -O2, what kind of machine are you running the tests on? Was it on a machine with a large amount of cache (1 meg or more)?
> 
> If you're testing on a system with a large amount of cache then of course there wouldn't be much of a difference speed-wise as compiling with either setting wouldn't make enough of a change as to affect whether the finished binary could reside entirely in the cache or not (the point I made in my previous post).

 

Point well taken. I used a host with 8M of L3 cache to run my tests  :Wink: 

----------

## Odysseus

 *energyman76b wrote:*   

> 
> 
> wow. No.
> 
> First of all, for years O2 was considered the only safe setting. Os was not.
> ...

 

-O2 was safer than -Os? I don't believe this is so (but I could be wrong); -Os has fewer optimizations and is less aggressive than -O2, and less aggressive = safer code. Back when I started using Linux, in the mid-to-late '90s, most distros defaulted to -O0 or -O1 optimization because these were considered the safest settings. As a matter of fact, one of the reasons many of us switched to Gentoo was precisely because other distros were being compiled with such modest optimizations. The GCC website specifically states that less aggressive optimization = safer and that the "default" optimization is -O0 (no optimization). Here on the Gentoo site, for years there have been posted admonitions against using too-aggressive optimizations, as they could cause compilation failures or unexpected outcomes.

Back when I was young and stupid and new to Gentoo, I was one of the idiots who tried compiling with -O3 and very aggressive settings, and when things wouldn't work for me, the very first thing asked upstream was "what compiler optimizations did you use?" And of course the next thing would be "have you tried with lower or no optimization?" Many times the aggressive optimizations we thought were making our systems faster were in reality slowing things down, because the generated binaries were so much larger than our hardware could comfortably handle. As a result, those of us who were/are gamers noticed that we were getting fewer frames with aggressive optimization than with more modest settings.

As far as the architecture comment is concerned, I wasn't referring to ARM vs. IA64 vs. x86 or AMD64. I was speaking strictly about cache size on comparable architectures, which was plainly evident in the context of my statement. "Architecture" is perhaps the wrong wording on my part. I'm not an engineer nor a professional (just a hobbyist), but I reckon I've probably been using these contraptions for as long as you've been around. (I got my first Tandy in 1977 and a Commodore when I was 18 in 1980.) So I'm allowing myself a bit of latitude in regards to nomenclature.   :Wink:   :Wink: 

But common sense would suggest that compiler optimization has a dramatic impact on resource-starved machines. Larger binaries take up more hard disk space, more memory, and, despite your assertions otherwise, more cache space. This is why I specifically mentioned small-cache vs. large-cache scenarios in both of my previous posts.

You are correct that nothing in the world can help if one suffers a cache miss or a cold-cache scenario; however, I stand by my statement: the smaller the compiled binary, the more likely there will be a cache hit on cache-starved machines, since more of it can reside directly in cache. It just stands to reason that this would be true.

Look at it this way: if you have a limited amount of cache (or main memory, for that matter), you can only store so much in that given space. If the CPU asks for something that isn't residing in cache, it must then look for it in main memory, and finally, if it's not there, it must check swap. The more that can be placed into cache, the greater the odds of a cache hit; conversely, the less that can fit, the greater the chance of a cache miss. This is just logical thinking as far as I'm concerned.

I hope this clarifies my statements.

Ciao,

Odysseus

----------

## energyman76b

no,

in the mid '90s everybody used O2.

O2 is the well-tested 'default'; Os was not. It does not matter that Os optimizes less - it was just less tested. You can go to marc and search for yourself, if you want to.

And about 'smaller is better' - Intel's C compiler produces much larger binaries that run faster, for miraculous reasons.

The reason is: saved cache space does not help you if your branch predictor is constantly wrong and you have to flush the pipeline.

----------

