# Kernel CFLAG tweaking experiences

## dabooty

i didn't find another thread on this so here goes:

Has anybody tweaked their kernel CFLAGS and what are your findings?

mine are (far from being scientifically tested) that my customized flags are faster, but O3 for some reason fails to boot (immediately after grub the box reboots)

these are my findings from compiling a sloppy kernel configuration and a leaner configuration

```
Kernel Changelog

bzImage.tvrg1: 2.6.0
fairly regular kernel, a bit sloppy
compiled with the default makefile

bzImage.tvrg2: 2.6.0
compiled with custom cflags, same config as the first 2.6 try
added: -O2 -pipe -mfpmath=sse -msse2 -mmmx -ftracer
it got a bit bigger, but lots faster

bzImage.tvrg3: 2.6.0
compiled with custom cflags, leaner config
more things excluded
more things as a module (only if i don't need them often)
still having problems with DMA if i unset PCI-IDE
added: -O2 -pipe -mfpmath=sse -msse2 -mmmx -ftracer
it got lots smaller and it's fast as hell

bzImage.tvrg4: 2.6.0
compiled with custom cflags, this time O3, leaner config
same config as number 3
added: -O3 -pipe -mfpmath=sse -msse2 -mmmx -ftracer
no matter what I try, O3 won't boot, just after grub it reboots.
```

and my cpuinfo (let me know if there is other relevant information so i can edit this post and others can post it too)

```
dabooty@dawikidnezz dabooty $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 0
model name      : Intel(R) Pentium(R) 4 CPU 1600MHz
stepping        : 10
cpu MHz         : 1595.582
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3145.72
```

i would be very interested to see how you guys are doing. 

PS: have a look at this FAQ to see how you can edit your kernel makefile to include your flags

EDIT: fixed bbcode crappyness

----------

## ledskof

Have you checked out CFLAGS Central?

https://forums.gentoo.org/viewtopic.php?t=5717&highlight=cflags+central

----------

## dabooty

sure i have, and i am pretty much convinced about my cflags, but i was wondering about tweaking the kernel makefile with cflags

----------

## dedeaux

Cool.  I will take a look at my kernel config.  I have just run through and compiled everything I need into the kernel.  I have seen no improvement in performance from gs-sources to 2.6-love2.

----------

## dedeaux

oops... I forgot to add this but... you mention "lots faster".  How are you measuring this?  Feel?

----------

## dabooty

yes  :Smile: 

it's far from scientific, but it feels quite a bit faster.

i would like to see someone make benchmarks though  :Wink: 

----------

## MG-Cloud

Great idea  :Very Happy:  This actually never occurred to me lol.  Did you stick a -march in there as well?  Does it compile with one?

----------

## wrc1944

For my Athlon-xp or duron 1300 systems, I've used these cflags in two places in my kernel Makefile since about 2.5.67, with no problems (also in 2.4 kernels).

```
HOSTCC  	= gcc
HOSTCXX  	= g++
HOSTCFLAGS	= -Wall -Wstrict-prototypes -O2 -march=athlon-xp -falign-functions=16 -falign-loops=16 -falign-jumps=16 -falign-labels=1 -ftracer  -fomit-frame-pointer
HOSTCXXFLAGS	= -O2 -march=athlon-xp -falign-functions=16 -falign-loops=16 -falign-jumps=16 -falign-labels=1 -ftracer
```

So that I can see the flags during compiling, I also edit two lines under the "Beautify" section: I backspace out the values so that quiet=quiet_ becomes just quiet=, and quiet=silent_ becomes just quiet=.  The relevant part of the original (unedited) Makefile beautify section is shown below.

```
ifeq ($(KBUILD_VERBOSE),1)
  quiet =
  Q =
else
  quiet=quiet_
  Q = @
endif

# If the user is running make -s (silent mode), suppress echoing of
# commands
ifneq ($(findstring s,$(MAKEFLAGS)),)
  quiet=silent_
endif
```

```
CFLAGS 		:= -Wall -Wstrict-prototypes -Wno-trigraphs -march=athlon-xp -falign-functions=16 -falign-loops=16 -falign-jumps=16 -falign-labels=1 -ftracer -fno-strict-aliasing -fno-common
AFLAGS
```

wrc1944

----------

## MG-Cloud

Hmm...

Actually, from what I saw of the gcc output (thanks for the tips re: quiet), this shouldn't have too much of an effect, especially with 2.6 kernels.  -O2 is already part of the makefile, and it uses -mcpu=athlon if you picked athlon in make menuconfig (although I'm athlon-xp  :Wink:  ).  -fomit-frame-pointer is already an option in menuconfig as well.  

I just compiled a kernel with my cflags though (didn't use -O3).  I don't notice much difference, but that could just be me.

----------

## dabooty

there's no need to stick a -march in there, if you selected the right processor type in menuconfig it will be added

try 

```
make clean && make V=1 && make V=1 modules_install
```

to see the gcc statements scroll by

it's cleaner than the quiet hack  :Laughing: 

----------

## dabooty

I'm still not sure if i'm seeing some placebo stuff or if my box is really more responsive, so i tried some things under 2 kernels, A and B, exactly the same config except the C flags which are:

```

A: added: -O2 -pipe -mfpmath=sse -msse2 -mmmx -ftracer (no frame pointer is in config)

B: default (no frame pointer is in config)

```

it's quite hard to measure responsiveness so i'm not really satisfied with the tests i did

test 1: copy the linux kernel sources from /usr to /home, which is from ext2 to reiser. The two filesystems are on different disks (same model) on the same ide controller. Run 3 times

```
dabooty@dawikidnezz kerneltests $ time cp /usr/src/linux-2.6.0-gentoo . -R

kernel A (with flags)

real    0m57.361s
user    0m0.248s
sys     0m4.959s

real    0m58.356s
user    0m0.234s
sys     0m5.258s

real    0m58.156s
user    0m0.238s
sys     0m5.258s

kernel B (without flags)

real    0m57.082s
user    0m0.209s
sys     0m4.701s

real    0m58.869s
user    0m0.246s
sys     0m5.973s

real    0m58.282s
user    0m0.221s
sys     0m5.765s
```

which to me is no noticeable difference, maybe i should've taken a larger directory
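To put a number on that, here is a minimal sketch (not something that was run as part of the test) that converts the `real` values from the cp runs to seconds and compares the gap between the two means with the run-to-run spread. The helper name `to_seconds` is made up for illustration; the figures are copied from the output above.

```python
import re
from statistics import mean


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '0m57.361s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if m is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# 'real' times of the three cp runs per kernel, copied from the post
real_a = [to_seconds(t) for t in ("0m57.361s", "0m58.356s", "0m58.156s")]
real_b = [to_seconds(t) for t in ("0m57.082s", "0m58.869s", "0m58.282s")]

mean_a, mean_b = mean(real_a), mean(real_b)
spread_a = max(real_a) - min(real_a)  # slowest minus fastest run, kernel A
spread_b = max(real_b) - min(real_b)  # slowest minus fastest run, kernel B

print(f"kernel A: mean {mean_a:.3f}s, spread {spread_a:.3f}s")
print(f"kernel B: mean {mean_b:.3f}s, spread {spread_b:.3f}s")
print(f"difference of means: {mean_b - mean_a:.3f}s")
```

The gap between the means comes out smaller than the spread within either kernel's own three runs, which fits the "no noticeable difference" reading.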

test 2: compile a 2.6 kernel, with the same .config file of course, run 2 times

```
root@dawikidnezz linux # time make

kernel A (with flags)

real    15m6.541s
user    13m18.764s
sys     0m59.251s

real    15m10.981s
user    13m26.314s
sys     1m1.521s

kernel B (without flags)

real    16m4.689s
user    14m12.906s
sys     1m4.560s

real    16m36.235s
user    14m52.747s
sys     1m7.220s
```

kernel A seems about a minute faster
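As a rough check of "about a minute", this sketch averages the two runs per kernel and splits the gap into real, user and sys time. It only redoes the arithmetic on the figures above; `secs` and `column_means` are hypothetical helper names, and two runs per kernel is a very small sample.

```python
from statistics import mean


def secs(minutes: int, seconds: float) -> float:
    """Turn a minutes/seconds pair from `time` output into seconds."""
    return minutes * 60 + seconds


# (real, user, sys) for each `time make` run, copied from the post
runs_a = [
    (secs(15, 6.541), secs(13, 18.764), secs(0, 59.251)),
    (secs(15, 10.981), secs(13, 26.314), secs(1, 1.521)),
]
runs_b = [
    (secs(16, 4.689), secs(14, 12.906), secs(1, 4.560)),
    (secs(16, 36.235), secs(14, 52.747), secs(1, 7.220)),
]


def column_means(runs):
    """Average each column (real, user, sys) over all runs."""
    return [mean(col) for col in zip(*runs)]


means_a = column_means(runs_a)
means_b = column_means(runs_b)

for label, a, b in zip(("real", "user", "sys"), means_a, means_b):
    saved = b - a
    print(f"{label}: A {a:.1f}s  B {b:.1f}s  "
          f"A faster by {saved:.1f}s ({100 * saved / b:.1f}%)")
```

On these numbers the gap in real time is roughly 72 seconds, a bit over 7%, and most of it shows up in user time rather than sys time.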

test 3: glxgears

```
dabooty@dawikidnezz kerneltests $ glxgears

kernel A (with flags)

6227 frames in 5.0 seconds = 1245.400 FPS
6447 frames in 5.0 seconds = 1289.400 FPS
6443 frames in 5.0 seconds = 1288.600 FPS
6441 frames in 5.0 seconds = 1288.200 FPS
6441 frames in 5.0 seconds = 1288.200 FPS
6443 frames in 5.0 seconds = 1288.600 FPS
6441 frames in 5.0 seconds = 1288.200 FPS

kernel B (without flags)

5473 frames in 5.0 seconds = 1094.600 FPS
6418 frames in 5.0 seconds = 1283.600 FPS
6420 frames in 5.0 seconds = 1284.000 FPS
6421 frames in 5.0 seconds = 1284.200 FPS
6420 frames in 5.0 seconds = 1284.000 FPS
6417 frames in 5.0 seconds = 1283.400 FPS
```

kernel A seems a wee bit faster again.
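A small sketch to quantify "a wee bit": it pulls the FPS figures out of the glxgears output and averages them, skipping the first sample of each run as warm-up. Dropping the first line is my own assumption (it is clearly lower than the rest in both runs), and `fps_values` and `steady_mean` are made-up names.

```python
import re
from statistics import mean

# glxgears output copied from the post
OUTPUT_A = """\
6227 frames in 5.0 seconds = 1245.400 FPS
6447 frames in 5.0 seconds = 1289.400 FPS
6443 frames in 5.0 seconds = 1288.600 FPS
6441 frames in 5.0 seconds = 1288.200 FPS
6441 frames in 5.0 seconds = 1288.200 FPS
6443 frames in 5.0 seconds = 1288.600 FPS
6441 frames in 5.0 seconds = 1288.200 FPS
"""

OUTPUT_B = """\
5473 frames in 5.0 seconds = 1094.600 FPS
6418 frames in 5.0 seconds = 1283.600 FPS
6420 frames in 5.0 seconds = 1284.000 FPS
6421 frames in 5.0 seconds = 1284.200 FPS
6420 frames in 5.0 seconds = 1284.000 FPS
6417 frames in 5.0 seconds = 1283.400 FPS
"""


def fps_values(text: str) -> list:
    """Extract the FPS number from every glxgears report line."""
    return [float(v) for v in re.findall(r"=\s*([\d.]+)\s*FPS", text)]


def steady_mean(text: str, warmup: int = 1) -> float:
    """Mean FPS after discarding the first `warmup` samples."""
    return mean(fps_values(text)[warmup:])


steady_a = steady_mean(OUTPUT_A)
steady_b = steady_mean(OUTPUT_B)
gain_pct = 100 * (steady_a - steady_b) / steady_b

print(f"kernel A: {steady_a:.2f} FPS, kernel B: {steady_b:.2f} FPS")
print(f"kernel A ahead by {gain_pct:.2f}%")
```

With the warm-up sample left out, the gap between the two kernels is under half a percent.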

All "tests"  :Wink:  were run after a reboot with only a gnome terminal and gedit open.

Am I to conclude some things from this? These tests are so crappy that they probably should just be ignored, but i would love to see someone  perform benchmarks.
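For anyone who wants to take that up, here is a rough sketch of a repeat-and-time harness: it runs a command several times and reports the mean and standard deviation of the wall-clock time. Nothing here comes from the tests above. `time_command` is a hypothetical helper, and the workload is a placeholder that you would swap for the real thing (for example `["make"]` inside a kernel tree).

```python
import subprocess
import sys
import time
from statistics import mean, stdev


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times; return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the benchmark if the command itself fails
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


# Placeholder workload (an assumption for illustration only):
# a tiny Python one-liner run through the current interpreter.
cmd = [sys.executable, "-c", "sum(range(1_000_000))"]
samples = time_command(cmd, runs=5)

print(f"runs: {len(samples)}")
print(f"mean: {mean(samples):.3f}s  stdev: {stdev(samples):.3f}s")
```

Running the same command under each kernel and comparing the means against the standard deviations gives at least a hint of whether a difference is bigger than the noise.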

----------

## wrc1944

Mg-Cloud & dabooty,

My reasoning for including the -march flag is that I was under the impression that flags in the Makefile would override the config file. Additionally, as I understand it, if the -mcpu flag is set, that will produce code that is not only for the specified cpu, but also can be run on any cpus below, whereas the -march flag produces code which is only optimized for the specific cpu, and will not work with lesser cpus. Wouldn't this be more desirable if you only were compiling for one specific machine and architecture?

Since I believe the config processor setting K7 implies or sets -mcpu, and there was no athlon-xp option, I added the -march=athlon-xp flag in the Makefile. My post above is a direct copy of my Makefile lines as I edited them. 

If I'm going wrong here with this reasoning, I'd sure like to know so I can correct it.

I'll give the make v=1 a try next compile- that will save having to do my other Makefile edit! Thanks for the tip!

dabooty,

About your tests: If I'm not mistaken, having the drives on the same ide controller will make a difference. In other words, the slave will not be as fast as the master, given they are identical disks. I've read that several times over the years, but don't really know with any certainty if that is correct. They also said that if the slave drive was a slower model, the master would be limited to whatever the slave's capabilities were. Maybe this is no longer true, or never was, but it is something to consider.

Perhaps testing a bunch of kernels and flags on the same master disk will give more meaningful results? Anyway, your results are interesting- guess I'll try a few myself when I have time.

wrc1944

----------

## dabooty

 *wrc1944 wrote:*   

> Mg-Cloud & dabooty,
> 
> My reasoning for including the -march flag is that I was under the impression that flags in the Makefile would override the config file. Additionally, as I understand it, if the -mcpu flag is set, that will produce code that is not only for the specified cpu, but also can be run on any cpus below, whereas the -march flag produces code which is only optimized for the specific cpu, and will not work with lesser cpus. Wouldn't this be more desirable if you only were compiling for one specific machine and architecture?
> 
> Since I believe the config processor setting K7 implies or sets -mcpu, and there was no athlon-xp option, I added the -march=athlon-xp flag in the Makefile. My post above is a direct copy of my Makefile lines as I edited them. 
> ...

 

you are right to assume that -march should be faster but break compatibility as opposed to -mcpu, and indeed -march is more desirable, but the kernel seems to set -march by itself:

my flags in the makefile:

```
 -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -pipe -mfpmath=sse -msse2 -mmmx -ftracer
```

what i see scrolling by: 

```

 gcc -Wp,-MD,arch/i386/kernel/.semaphore.o.d -nostdinc -iwithprefix include -D__KERNEL__ -Iinclude  -D__KERNEL__ -Iinclude  -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -mfpmath=sse -msse2 -mmmx -ftracer -pipe -mpreferred-stack-boundary=2 -march=pentium4 -Iinclude/asm-i386/mach-default -fomit-frame-pointer -Wdeclaration-after-statement    -DKBUILD_BASENAME=semaphore -DKBUILD_MODNAME=semaphore -c -o arch/i386/kernel/semaphore.o arch/i386/kernel/semaphore.c

```

EDIT: i wanted to mark -march bold in above code but apparently bold in code doesn't work

 *wrc1944 wrote:*   

> I'll give the make v=1 a try next compile- that will save having to do my other Makefile edit! Thanks for the tip!

it's a capital V  :Wink: 

 *wrc1944 wrote:*   

> dabooty,
> 
> About your tests: If I'm not mistaken, having the drives on the same ide controller will make a difference. In other words, the slave will not be as fast as the master, given they are identical disks. I've read that several times over the years, but don't really know with any certainty if that is correct. They also said that if the slave drive was a slower model, the master would be limited to whatever the slave's capabilities were. Maybe this is no longer true, or never was, but it is something to consider.
> ...

you are right that having them on the same controller hurts performance a bit, but i wanted my disks and my cdr to be on different ones. That's the way they are and i didn't change them for either of the tests. The hardware configuration (while maybe not optimal) was identical.

 *wrc1944 wrote:*   

> Perhaps testing a bunch of kernels and flags on the same master disk will give more meaningful results? Anyway, your results are interesting- guess I'll try a few myself when I have time.
> 
> wrc1944

By all means do so, or if you can think of some more relevant tests (i thought about hdparm, but i've rebooted enough for today  :Smile: ) i would be very much interested in seeing other peoples findings

----------

## dedeaux

Ok.  I have run through this and seen little to no improvement.  Will keep playing though.

I am watching the kids here so... no long periods of time to commit.  Basically, set a compile and come back later.

----------

## sindre

I think processor specific cflags should be placed in arch/i386/Makefile

----------

## Evil Dark Archon

i haven't had any problems putting them in the main Makefile

----------

## wrc1944

I've seen references to both locations, too. I guess we need some more info from some gcc experts, which I certainly am not- I just go on what I've read, and my limited experience in doing this. 

I recall I did try putting my flags in arch/i386/Makefile back when I was trying out 2.5.6x and higher kernels, and ran into compile problems. I'm not sure if it was the kernels themselves, or something I was doing wrong, but I switched to the main Makefile around 2.5.69, and started having successful compiles.

wrc1944

----------

## sindre

If you put -march only in the "root" Makefile, there will be more than one -march in the cflags. How do you know which one gcc will actually use?

----------

## NewBlackDak

If gcc sees ambiguous flags it will use the last one listed.
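A tiny sketch of that rule, assuming it holds for a repeated `-march`: it walks a flag string and keeps the value of the last occurrence. It only models "last one wins", it does not ask gcc anything. The flag string is hypothetical (one `-march` added by hand in the root Makefile, a second one appended later by the arch Makefile), and `effective_value` is a made-up helper.

```python
import shlex


def effective_value(command_line: str, option: str):
    """Return the value of the last `option=value` flag, or None if absent."""
    value = None
    for token in shlex.split(command_line):
        if token.startswith(option + "="):
            value = token[len(option) + 1:]  # later occurrences overwrite earlier ones
    return value


# Hypothetical command line with two competing -march flags
flags = ("-Wall -Wstrict-prototypes -O2 -march=athlon-xp -ftracer "
         "-pipe -march=athlon -fomit-frame-pointer")

print(effective_value(flags, "-march"))
```

Under that assumption, a `-march` added near the front of CFLAGS loses to one the build system appends after it, so the order seen with `make V=1` is what matters.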

----------

## GentooBox

Well... 

i just tried to tweak my kernel.

i edited the makefile and added this:

-funroll-loops -pipe

nothing more. -fomit-frame-pointer was there already.

well, i tried to benchmark my system with the 2 kernels. - and the tweaked one is faster.

without new flags:

 *Quote:*   

> Setup is 4735 bytes.
> 
> System is 1528 kB
> 
> Kernel: arch/i386/boot/bzImage is ready
> ...

 

with new flags:

 *Quote:*   

> Setup is 4735 bytes.
> 
> System is 1528 kB
> 
> Kernel: arch/i386/boot/bzImage is ready
> ...

 

BUT.. just to make sure it was faster i tested again.

and the second time the new kernel was only 2 seconds faster.

im running with the new kernel right now...

i think that the kernel is optimized by default.

----------

## dedeaux

ok...

Two days later here... I have tried the tweaking of the kernel config, and I am pleased with the performance of 2.6.  the smoke is finally clearing from the move to 2.6 and I am enjoying it so far.  Certain aspects of the desktop experience are feeling natural now where they were a bit sluggish before.  Granted, nothing has provided that lightning fast experience for me yet.  I have modest hardware(geforce 4 440go, p4M 1.8 512M ddr) so it is plenty responsive.

But... all that aside...  I wanted to report my findings:

Tweaking the kernel with several flags does not, in my case, make a HUGE difference in the experience.  Of course, running glxgears or timing certain compiles will show a difference.  I haven't seen any stability probs with them either.  So I say I'll keep it.

Now... I understand that the intel compiler is now able to build the kernel successfully.  I wanna see that with my kernel.

Incidentally.... Can anyone confirm that moving from 512MB to 1GB of RAM will give me a boost in performance?

Thanks.

----------

## dabooty

 *Quote:*   

> Incidentally.... Can anyone confirm that moving from 512MB to 1GB of RAM will give me a boost in performance?

 

more than tweaking your kernel CFLAGS  :Smile: 

----------

## _Nomad_

changed -O2 to -Os and I must say that made quite a difference on my system. No benchmark though, so it might just be my imagination  :Laughing:

----------

