# CFlags for Intel Atom?

## pyther

Hello.

I was wondering if anyone knows what CFlags I might want to use with the Intel Atom Processor. I just purchased the Intel D945GCLF which has an Atom cpu.

Thanks!

EDIT: Fixed typo in the model name.

Last edited by pyther on Fri Jun 13, 2008 12:05 am; edited 1 time in total

----------

## poly_poly-man

this?

no atom in there.... atom is a processor for handhelds - and AFAIK, it's not available anywhere yet.

What kind of processor do you really have (basically, on a livecd, cat /proc/cpuinfo)?

poly-p man

----------

## pyther

My bad, I made a huge typo: the motherboard is the Intel D945GCLF, and it does have an Atom processor.

----------

## poly_poly-man

can you give the contents of /proc/cpuinfo?

my guess is something like nocona, but I'm not sure what technologies this supports...

poly-p man

----------

## pyther

As I just ordered the hardware today, I probably won't have the system until Tuesday or Wednesday of next week, so I can't get the contents until then. Sorry.

----------

## ergobill

Pyther is correct: this is a new product from Intel going after VIA's market. It's a very low-power motherboard with the 945 chipset in Mini-ITX form. Ubuntu (Canonical) is working on a "netbook" version to take advantage of and optimize for the Atom processor.

----------

## Hun73r

It's basically a Celeron with HT, so the Celeron CFLAGS will probably work with it, and it outperforms the C7 from VIA.

----------

## crazycat

Atom has as much in common with a "Celeron with HT" as a giraffe has with a crocodile. The only thing they share is that both are x86. Atom is an in-order CPU, which means compiler optimizations have much more influence on execution speed. Atom's performance is also pretty weak, which makes CFLAGS matter all the more. Here is a very good article about the Atom architecture.

----------

## tab21

I have the same board and used the following in my make.conf to install 64 bit Gentoo:

```
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j3"
CHOST="x86_64-pc-linux-gnu"
ALSA_CARDS="intel_hda"
INPUT_DEVICES="keyboard mouse"
VIDEO_CARDS="i810"
```

I'm sure you can use -march=prescott and a 32bit stage3 tarball to get up and running as well.

----------

## pyther

tab21: thanks, I'll try those

My /proc/cpuinfo

```

processor   : 0

vendor_id   : GenuineIntel

cpu family   : 6

model      : 28

model name   :          Intel(R) Atom(TM) CPU  230   @ 1.60GHz

stepping   : 2

cpu MHz      : 1596.084

cache size   : 512 KB

physical id   : 0

siblings   : 2

core id      : 0

cpu cores   : 1

fpu      : yes

fpu_exception   : yes

cpuid level   : 10

wp      : yes

flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm

bogomips   : 3196.02

clflush size   : 64

cache_alignment   : 64

address sizes   : 32 bits physical, 48 bits virtual

power management:

processor   : 1

vendor_id   : GenuineIntel

cpu family   : 6

model      : 28

model name   :          Intel(R) Atom(TM) CPU  230   @ 1.60GHz

stepping   : 2

cpu MHz      : 1596.084

cache size   : 512 KB

physical id   : 0

siblings   : 2

core id      : 0

cpu cores   : 1

fpu      : yes

fpu_exception   : yes

cpuid level   : 10

wp      : yes

flags      : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm

bogomips   : 3193.10

clflush size   : 64

cache_alignment   : 64

address sizes   : 32 bits physical, 48 bits virtual

power management:

```

----------

## wswartzendruber

Intel states that the Atom is ISA compliant with the Merom:

```
CFLAGS="-march=core2 -Os -fomit-frame-pointer -pipe"
```

But that's only if you're using GCC 4.3.

----------

## poly_poly-man

 *wswartzendruber wrote:*   

> Intel states that the Atom is ISA compliant with the Merom:
> 
> ```
> CFLAGS="-march=core2 -Os -fomit-frame-pointer -pipe"
> ```
> ...

 

correct....

however, if:

gcc < 4.2.0, 32-bit: prescott

gcc < 4.2.0, 64-bit: nocona

gcc >= 4.2.0: native

no kidding, I just found out about march=native....I'm switching to that now  :Very Happy: 
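For anyone who wants to script this, the rule of thumb above could be sketched as a small POSIX shell helper. `suggest_march` is a made-up name for illustration (it is not part of Portage or GCC), and it only encodes the three cases listed in this post:

```shell
#!/bin/sh
# Hypothetical helper: suggest a -march value from a GCC version string
# and the word size of the install, following the rule of thumb above.
suggest_march() {
    gcc_version=$1              # e.g. "4.1.2"
    bits=$2                     # 32 or 64
    major=${gcc_version%%.*}    # text before the first dot
    rest=${gcc_version#*.}      # text after the first dot
    minor=${rest%%.*}           # second version component
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 2 ]; }; then
        echo native
    elif [ "$bits" = 64 ]; then
        echo nocona
    else
        echo prescott
    fi
}

suggest_march 4.1.2 32   # prints prescott
suggest_march 4.1.2 64   # prints nocona
suggest_march 4.3.1 64   # prints native
```

On a real box the first argument could come from `gcc -dumpversion`.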

poly-p man

----------

## kernelOfTruth

 *poly_poly-man wrote:*   

>  *wswartzendruber wrote:*   Intel states that the Atom is ISA compliant with the Merom:
> 
> ```
> CFLAGS="-march=core2 -Os -fomit-frame-pointer -pipe"
> ```
> ...

 

++

use -march=native, don't use -march=core2 since it breaks several apps like xorg-server, ... (at least for me + hardened)

----------

## wswartzendruber

 *kernelOfTruth wrote:*   

>  *poly_poly-man wrote:*    *wswartzendruber wrote:*   Intel states that the Atom is ISA compliant with the Merom:
> 
> ```
> CFLAGS="-march=core2 -Os -fomit-frame-pointer -pipe"
> ```
> ...

 

Is not march=native just going to determine that you're running a Core 2?  And I've never had any problems with march=core2, but I'm not running hardened.

----------

## Jinidog

Well, -march=core2 should work, but is there any evidence that it is the fastest option?

The Atom is an in-order CPU. Every mainstream Intel x86 CPU since the Pentium Pro has been an out-of-order design.

Somebody could do some benchmarks. I would try -march=i586 together with the -msse -msse2 -msse3 flags.

----------

## m0w1337

Hi!

I just bought the same mainboard with the Atom on it and wondered if you have already found the right settings to make this CPU run well.

I would be very happy about an answer, because this is going to be my first attempt to install Gentoo Linux, so I need every hint I can get  :Wink: 

Thanks

m0

----------

## Cyker

At the moment, there is no optimisation support for the Atom.

It's a bit tricky, but I'd guess that Pentium MMX with all the SSE flags would be the most efficient setup at the moment...

----------

## Martux

Pyther, how did this work for you? Everything up and running? Are you satisfied?

Which CFLAGS did you use? I'll get my Intel D945GCLF on Tuesday and am looking for some practical tips/experiences.

If I get it right, it would be possible to use amd64 with this machine?

Also, -march=native (GCC 4.3) works great here with a Core2Quad Q9450, and I would really like to use it with the Atom too. Which brings me to my main question:

I'd like to precompile the system with my Quad for obvious reasons. What exactly would happen with -march=native? It would be optimised for the Quad then, right?

Is precompiling for another machine even possible with that flag?

----------

## SDNick484

 *Martux wrote:*   

> Pyther, how did this work for you? Everything up and running? Are you satisfied?
> 
> Which cflags did you use? I' ll get my Intel D945GCLF on tuesday and am looking for some practical tipps/experiences.
> 
> If i get it right it would be possible to use amd64 with this machine?
> ...

 

Wow, oddly enough I'm in the exact same boat.  I just picked up an Intel C2Q 9300 from Fry's over the weekend ($180, sweet!), and my Acer One with an Atom N270 arrived yesterday.  I got rid of its base OS (Linpus, a Fedora 8 offshoot) and threw on Xubuntu, but I'm very tempted to go with Gentoo (which I plan to run on the quad and already run on my gracefully aging laptop).

Anyways, if I get a chance to try it, I'll post back.

EDIT:

Went to look at the online documentation for GCC and noticed:

<<<<<

native  -  This selects the CPU to tune for at compilation time by determining the processor type of the compiling machine. Using -mtune=native will produce code optimized for the local machine under the constraints of the selected instruction set. Using -march=native will enable all instruction subsets supported by the local machine (hence the result might not run on different machines). 

...

prescott  -  Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support. 

...

core2  -  Intel Core2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support. 

>>>>>

I will probably give native a try, but my hopes are lower.  It looks like I may need prescott for builds to work on both systems.

----------

## d2_racing

 *poly_poly-man wrote:*   

> 
> 
> however, if:
> 
> gcc < 4.2.0, 32-bit: prescott
> ...

 

Hi Poly, if I use native with my T7200 

```

gentootux ~ # cat /proc/cpuinfo

processor       : 0

vendor_id       : GenuineIntel

cpu family      : 6

model           : 15

model name      : Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz

stepping        : 6

cpu MHz         : 2000.000

cache size      : 4096 KB

physical id     : 0

siblings        : 2

core id         : 0

cpu cores       : 2

apicid          : 0

initial apicid  : 0

fdiv_bug        : no

hlt_bug         : no

f00f_bug        : no

coma_bug        : no

fpu             : yes

fpu_exception   : yes

cpuid level     : 10

wp              : yes

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm

bogomips        : 3993.26

clflush size    : 64

power management:

processor       : 1

vendor_id       : GenuineIntel

cpu family      : 6

model           : 15

model name      : Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz

stepping        : 6

cpu MHz         : 2000.000

cache size      : 4096 KB

physical id     : 0

siblings        : 2

core id         : 1

cpu cores       : 2

apicid          : 1

initial apicid  : 1

fdiv_bug        : no

hlt_bug         : no

f00f_bug        : no

coma_bug        : no

fpu             : yes

fpu_exception   : yes

cpuid level     : 10

wp              : yes

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm

bogomips        : 3988.61

clflush size    : 64

power management:

gentootux ~ #                          

```

Will I end up with a 32-bit or a 64-bit installation?

----------

## Etal

 *d2_racing wrote:*   

> I will endup with a 32 bits or a 64 bits installation ?

 

It depends on your chost, not your cflags.
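To make that concrete: the first field of CHOST names the target architecture, and that is what fixes the word size, whatever `-march` says. A minimal sketch, where `chost_bits` is a hypothetical helper that only knows the x86 tuples seen in this thread:

```shell
#!/bin/sh
# Hypothetical helper: report the word size implied by a CHOST tuple.
# Only the x86 cases that appear in this thread are handled.
chost_bits() {
    case ${1%%-*} in            # keep only the architecture field
        x86_64) echo 64 ;;
        i?86)   echo 32 ;;      # i386, i486, i586, i686
        *)      echo unknown ;;
    esac
}

chost_bits x86_64-pc-linux-gnu   # prints 64
chost_bits i686-pc-linux-gnu     # prints 32
```

So the same `-march=native` line gives a 32-bit system under an i686 CHOST and a 64-bit one under x86_64.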

----------

## d2_racing

 *AM088 wrote:*   

>  *d2_racing wrote:*   I will endup with a 32 bits or a 64 bits installation ? 
> 
> It depends on your chost, not your cflags.

 

Good, that's what I thought  :Razz: 

----------

## Martux

SDNick484, thanks for your answer.

In the meantime, I've done the following:

I just copied over my "-march=native" compiled system from the C2Q (64-bit), with an adjusted kernel of course (hyperthreading and the like), and it booted just fine.

Then I thought the Atom would be powerful enough to recompile itself using "native".

GCC alone took over 8h to compile...  :Rolling Eyes:  (that's about the time the C2Q takes to compile the whole system). So for now I just use it that way and cannot see any performance issues. It does what it should do: play music and videos and serve the net.

No need for more tuning. The only thing is i can't get GLX-extensions to work with the xf86-video-i810 driver whatsoever, but that will soon be another thread.

Thanks again for bringing light into the "mysterious" march=native thing.

----------

## d2_racing

 *Martux wrote:*   

> GCC alone took over 8h compile time...  (that's about the time the C2Q take compiling the whole system) So for now i just use it that way and cannot see any performance issues.
> ...

 

It's insane, 8h, maybe GCC is not optimised for this CPU.

----------

## int2str

 *Martux wrote:*   

> GCC alone took over 8h compile time... 

 

That's not right at all...

I just did it again to see how mine compares. Command was "time emerge gcc". Result:

 * Messages for package sys-devel/gcc-4.1.2:

...

real    82m24.540s

user    75m56.971s

sys     6m51.811s

About 1h 20min... that's a far cry from your 8 hours.

----------

## d2_racing

 *int2str wrote:*   

>  *Martux wrote:*   GCC alone took over 8h compile time...  
> 
> That's not right at all...
> 
> 

 

Hi, can you post your lspci plz.

Also, did you run hdparm on your HDD ?

----------

## SDNick484

Well I just finished getting my C2Q up and running (I'm using an Intel mobo with the new G45 chipset and it was a pain -- networking doesn't work in the 2008.0-r1 kernel), and am using the native flag.  I've done some minimal testing and no issues thus far on running apps compiled on the C2Q with the atom.

Below is how the flags from the Atom compare to the C2Q:

Atom: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 xtpr lahf_lm

C2Q: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm

diff: pse36 lm vmx smx cx16 sse4_1

As you can see, all the additional flags are on the C2Q, and they're all relatively new.  Since I don't believe GCC has any optimizations yet that use those flags, native will likely yield close results.  
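In case anyone wants to repeat the comparison on their own pair of machines, here is a rough POSIX shell sketch. `flags_missing_from` is just a name made up for this post; it takes two flag lists as strings and prints whatever is in the second but not in the first:

```shell
#!/bin/sh
# Hypothetical helper: print each flag from the second list ($2) that
# does not appear in the first list ($1), one per line.
flags_missing_from() {
    for f in $2; do
        case " $1 " in
            *" $f "*) ;;                 # present in both lists, skip
            *) printf '%s\n' "$f" ;;     # only in the second list
        esac
    done
}

# The two flag lists exactly as posted above.
atom="fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 xtpr lahf_lm"
c2q="fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm"

# Prints pse36, lm, vmx, smx, cx16 and sse4_1, one per line.
flags_missing_from "$atom" "$c2q"
```

Swapping the two arguments shows what the Atom has that the C2Q lacks (nothing, for these two lists).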

I'm heading camping tomorrow, but will likely make the swap from Xubuntu to Gentoo on my Acer One next week.  I don't really plan to compile anything there unless it's a last resort.

----------

## int2str

Check out this difference:

Same command, two different machines.

Command is "emerge -s mozilla-firefox".

Intel Core2 Duo E6850 @ 3Ghz; 4GB RAM; SATA hard drive

```
office ~ # time emerge -s mozilla-firefox

real    0m15.710s

user    0m0.566s

sys     0m0.397s
```

Intel Atom @ 1.6Ghz; 2GB RAM; Compact flash card

```
terra ~ # time emerge -s mozilla-firefox

real    0m2.488s

user    0m2.306s

sys     0m0.155s
```

Pretty sweet  :Smile: 

----------

## int2str

To explain my setup a bit better:

I have the MSI Wind barebones. The main "hard drive" (SDA) is a 16GB compact flash card that hosts the main operating system and is what I'm booting off of. The secondary hard drive (SDB) is a 160GB SATA hard drive used for data storage. I mount some of the temp directories etc. on the hard drive, not to wear out the CF card write cycles while compiling Gentoo.

Here is part of my /etc/fstab:

```
/dev/sda1               /               ext2            noatime         0 1

/dev/sdb1               /home/shared    ext3            noatime         0 0

/home/shared/Linux/Terra/log            /var/log        none    bind            0 0

/home/shared/Linux/Terra/portage        /var/tmp/portage        none    bind    0 0

/home/shared/Linux/Terra/distfiles      /usr/portage/distfiles  none    bind    0 0

tmpfs                   /tmp    tmpfs   noatime         0 0
```

Some information that was requested above:

My CFLAGS are simple:

```
CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"
```

```
terra ~ # hdparm -i /dev/sda /dev/sdb

/dev/sda:

 Model=                                        , FwRev=20071116, SerialNo=CF CARD     0001562C

 Config={ HardSect NotMFM Fixed DTR>10Mbs }

 RawCHS=16383/15/63, TrkSize=0, SectSize=576, ECCbytes=4

 BuffType=DualPort, BuffSize=1kB, MaxMultSect=1, MultSect=?0?

 CurCHS=16383/15/63, CurSects=15481935, LBA=yes, LBAsects=31719424

 IORDY=no, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}

 PIO modes:  pio0 pio1 pio2 pio3 pio4

 DMA modes:  *mdma0 mdma1 mdma2

 AdvancedPM=no

 * signifies the current active mode

/dev/sdb:

 Model=WDC WD1600AAJS-00B4A0                   , FwRev=01.03A01, SerialNo=     WD-WMAT20947254

 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }

 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50

 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=?16?

 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=312581808

 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}

 PIO modes:  pio0 pio3 pio4

 DMA modes:  mdma0 mdma1 mdma2

 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6

 AdvancedPM=no WriteCache=enabled

 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode
```

```
terra ~ # hdparm -tT /dev/sda /dev/sdb

/dev/sda:

 Timing cached reads:   1138 MB in  2.00 seconds = 568.71 MB/sec

 Timing buffered disk reads:   44 MB in  3.12 seconds =  14.10 MB/sec

/dev/sdb:

 Timing cached reads:   1206 MB in  2.00 seconds = 603.21 MB/sec

 Timing buffered disk reads:  332 MB in  3.00 seconds = 110.62 MB/sec
```

```
00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)

00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02)

00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)

00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)

00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 01)

00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)

00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)

00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)

00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)

00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)

00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)

00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)

00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01)

00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)

01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
```

----------

## kernelOfTruth

 *SDNick484 wrote:*   

> ...
> 
> diff: pse36 lm vmx smx cx16 sse4_1
> 
> As you can see, all the additional flags are on the C2Q, and they're all relative new.  Since I don't believe GCC has any optimizations yet that utilize though flags, native will likely yield close results.  
> ...

 

 *Quote:*   

> # Support for SSE4.1 built-in functions and code generation are available via -msse4.1.
> 
> # Support for SSE4.2 built-in functions and code generation are available via -msse4.2.
> 
> # Both SSE4.1 and SSE4.2 support can be enabled via -msse4.

 

http://www.gnu.org/software/gcc/gcc-4.3/changes.html  :Smile: 

int2str, you tried out reiser4 with cryptcompress/lzo or just standard reiser4 ?

I believe ext3 might wear out your flash-drive pretty fast 

for the record:

time emerge -s mozilla-firefox

 *Quote:*   

> real    0m6.758s
> 
> user    0m1.965s
> 
> sys     0m0.338s

 

----------

## int2str

 *kernelOfTruth wrote:*   

> I believe ext3 might wear out your flash-drive pretty fast 

 

I'm using ext2 on the flash card  :Smile: 

----------

## Etal

 *int2str wrote:*   

>  *kernelOfTruth wrote:*   I believe ext3 might wear out your flash-drive pretty fast  
> 
> I'm using ext2 on the flash card 

 

I think it would be a better idea to use the specialized JFFS2 or LogFS if you're worried about wearing out the drive.

----------

## kernelOfTruth

 *AM088 wrote:*   

>  *int2str wrote:*    *kernelOfTruth wrote:*   I believe ext3 might wear out your flash-drive pretty fast  
> 
> I'm using ext2 on the flash card  
> 
> I think it would be a better idea to use use the specialized JFFS2 or LogFS if you're worried about wearing out the drive.

 

don't forget nilfs2 but we're getting a little off-topic   :Wink: 

----------

## Martux

I found out why compiles take forever here: I've got a problem with acpi, it takes up to 50% of my cpu time   :Sad: 

----------

## joaopft

Atom switches off units needed for 64 bit operation when working in 32 bit mode, so if battery time is a concern, it may be better to stick to a build for the x86 arch. Also, memory access is limited by 533MHz FSB, so this low bandwidth does not benefit the amd64 arch. 

Edit: it seems the N270 Atom, the one used in laptops, cannot run x86_64 code, anyway...

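-O3 inlines more aggressively, and loop unrolling (-funroll-loops, which in GCC is not actually part of -O3) goes further in the same direction; both produce lengthy, non-looped instruction sequences. We must consider that an in-order core relies heavily on its L1 and L2 caches to avoid stalled pipelines, so larger code can hurt.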

----------

## joaopft

From Intel's site http://www.intel.com/cd/software/products/asmo-na/eng/386925.htm, their compiler optimizes code very aggressively by doing out-of-order scheduling of instructions by software (by the compiler), instead of hardware:

 *Quote:*   

> The new low-power IA does not provide an integrated out-of-order scheduler that schedules instruction dispatch into the execution pipeline to take optimum advantage of the architecture and minimize dependency stalls. The Intel® C++ Compiler models the low-power IA pipeline and execution flow, thus enabling it to produce code with the optimum instruction execution sequence for low-power IA.

 

As of yet, gcc can't optimize code in this fashion, and it is clear it will be very difficult to do this in an arch independent fashion. 

Replacing a just-in-time out-of-order hardware scheduler with an arch indep. out-of-order compiler is no simple task. For one thing, the compiler does not have the live data flow to assist it. 

Intel's compiler code only works in their particular hardware alone: "modeling the low-power IA pipeline and exec. flow, thus enabling it to produce code with the optimum instruction execution sequence". And they are talking here about modeling a CISC x86 exec flow!

----------

## ocbMaurice

Hi all,

Just wanted to mention this thread:

https://forums.gentoo.org/viewtopic-p-5274124.html#5274124

There seems to be a serious issue on those D945GCLF/D945GCLF2 boards.

You should probably not enable automatic fan throttling (kacpid will hog your CPU).

I'm currently using:

D945GCLF2

RTL 8139 PCI Card 

Gentoo Hardened 2008.0 amd64 release

Kernel linux-2.6.25-hardened-r9

GCC x86_64-pc-linux-gnu-3.4.6

```
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j3"
CHOST="x86_64-pc-linux-gnu"
ALSA_CARDS="intel_hda"
INPUT_DEVICES="keyboard mouse"
VIDEO_CARDS="i810"
```

I'm still trying out this baby, so I'm still interested if there are better cflags.

It may well replace my home server when I have the time at xmas  :Wink: 

Greetings, Maurice

----------

## nerdbert

Has someone made benchmarks in the meantime?

Right now I'm using the following for the N270

```
CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
CHOST="i686-pc-linux-gnu"
MAKEOPTS="-j3"  #to make use of HT!?
```

I'm thinking about switching to -march=i586 -mmmx -msse -msse2 -msse3 ...

I'd recompile gcc and check compile times for a specific package before and after. Or is there a better/quicker way to find out which is faster?
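On the `-j3` question: the usual rule of thumb is the number of logical CPUs plus one, and with HT the kernel lists two `processor` entries for a single-core Atom, which is where 3 comes from. A small sketch of that arithmetic, assuming cpuinfo-style text on stdin (`suggest_makeopts` is a hypothetical name, and the plus-one rule is a convention, not something measured here):

```shell
#!/bin/sh
# Hypothetical helper: read /proc/cpuinfo-style text on stdin and print
# a MAKEOPTS line using "logical CPUs + 1" as the job count.
suggest_makeopts() {
    n=$(grep -c '^processor' || true)   # one "processor" line per logical CPU
    echo "MAKEOPTS=\"-j$((n + 1))\""
}

# Two logical CPUs, as a single-core Atom with HT reports them.
suggest_makeopts <<'EOF'
processor   : 0
model name  : Intel(R) Atom(TM) CPU
processor   : 1
model name  : Intel(R) Atom(TM) CPU
EOF
# prints MAKEOPTS="-j3"
```

On the machine itself, `suggest_makeopts < /proc/cpuinfo` would do the same thing.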

----------

## brian33x51

The N330 ends up being a different processor...there's no reason to not go 64bit...

CFLAGS="-Os -march=native -fomit-frame-pointer -pipe"

ACCEPT_KEYWORDS="~amd64"

MAKEOPTS="-j5"

before that I was using -march=core2

Based on my benchmarks

Multi-threaded compiles (-j5) run at about 1/2 the speed of an Athlon 64 X2 4200+ (-j3).

single thread triangulation (10240 vertices):

- n330 10.88s

- athlon64 x2 4200+: 4.12s

- clovertown 1.6: 2.48s

----------

## nerdbert

Yes, they belong to the same family (Diamondville), but the 230 and 330 come with Intel 64. The N270 on the other hand supports enhanced speedstep.
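You can see the difference in the cpuinfo output posted earlier in this thread: Intel 64 shows up as the `lm` (long mode) flag and Enhanced SpeedStep as `est`. A quick way to check, sketched in POSIX shell with a hypothetical `cpu_has_flag` helper that reads cpuinfo-style text on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first "flags" line on stdin
# contains the flag named in $1.
cpu_has_flag() {
    wanted=$1
    flags_line=$(sed -n '/^flags/{p;q;}')   # first flags line only
    for f in ${flags_line#*:}; do           # words after the colon
        if [ "$f" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Flag lines as posted earlier in this thread.
atom_230='flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm'
atom_n270='flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 xtpr lahf_lm'

report() {
    # $1 = label, $2 = cpuinfo text
    if printf '%s\n' "$2" | cpu_has_flag lm; then
        echo "$1: lm present, 64-bit capable"
    else
        echo "$1: no lm, 32-bit only"
    fi
}

report "Atom 230" "$atom_230"     # prints: Atom 230: lm present, 64-bit capable
report "Atom N270" "$atom_n270"   # prints: Atom N270: no lm, 32-bit only
```

Note that `lahf_lm` is a different flag, which is why the helper compares whole words instead of grepping for a substring.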

Meanwhile I've done some very basic tests on the N270 with nbench, which seems to run faster with -march=prescott -O2 -fomit-frame-pointer than with -march=i586 -O2 -mmmx -msse -msse2 -msse3 -fomit-frame-pointer.

I've posted the results here, if anyone is interested.

----------

## dakster

I've got the 330 Atom processor as well, and am considering upgrading to gcc 4.3 and recompiling my whole system with -march=native (currently 32-bit with -march=prescott). However, I noticed I've somehow ended up with 486 in my CHOST, and not 686 (doh!). It looks like a pain in the arse to fix according to the CHOST changing guide. Is it even worth changing the system to 686, or will upgrading the compiler, flipping to -march=native and rebuilding the system be the biggest performance gain?

----------

## gcasillo

I recently bought an Asus 1000HE netbook. It comes with an Atom N280 processor. I'm trying to determine what the appropriate CHOST, CFLAGS, etc. are for this. Also, I am strongly considering using a couple machines on my network (C2D E6300 and C2Q Q6600) to assist with large compiles using distcc and crossdev. Has anyone done this in a similar manner? What target would I choose for building my crossdev tools? i586, i686, or something else?

----------

## rufnut

 *gcasillo wrote:*   

> Has anyone done this in a similar manner? What target would I choose for building my crossdev tools? i586, i686, or something else?

 

I do similar to what you are suggesting.

An i686 crossdev setup is fine here, as my Q6600 is running x86_64.

If your Q6600 and the other machine are just x86 (i686), you won't need crossdev; just set up distcc as you need.

One thing to be careful about, though: using "native" in the CFLAGS did conflict with distcc. Maybe they have fixed it by now?

Good Luck.

----------

## gringo

 *Quote:*   

> One thing to be careful though is that the use of "native" as a cpuflag did conflict when using distcc, maybe they could have fixed it now ?

 

if you have -march=native, distcc won´t work ( and it shouldn´t work, because using distcc and -march=native just doesn´t make any sense at all).
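The reason it makes no sense: `native` is resolved by whichever machine actually runs the compiler, so a helper box would target its own CPU instead of the Atom. If you want distcc anyway, one option is to swap `native` for an explicit value before the flags are used. A rough sketch, where `pin_march` is a hypothetical helper and `prescott` is simply the value several people in this thread use on 32-bit:

```shell
#!/bin/sh
# Hypothetical helper: replace -march=native in a flags string ($1)
# with an explicit -march value ($2), so every distcc helper gets the
# same target. $2 must not contain a slash (it is spliced into sed).
pin_march() {
    printf '%s\n' "$1" | sed "s/-march=native/-march=$2/"
}

pin_march "-march=native -O2 -pipe -fomit-frame-pointer" prescott
# prints -march=prescott -O2 -pipe -fomit-frame-pointer
```

Flags without `-march=native` pass through unchanged.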

Or just setup a chroot in your quad system and build all your stuff there and setup a binhost. Only if you are running a multilib system of course.

I used -march=core2 -mtune=generic on my eeepc and it went fine afaict. 

The gcc guys are going to provide a dedicated atom target, maybe we are lucky and it will be ready for gcc-4.5.

cheers

----------

## WuDDjA

I am using gcc-4.5 from toolchain overlay with -march=atom

and it works perfect on my system.

but it's a bit tricky to compile gcc 4.5

take a look at my bug report

https://bugs.gentoo.org/show_bug.cgi?id=270558

you also should enable the atom processor in the kernel

with a little patch from here:

http://patchwork.kernel.org/patch/23085/

I'm already testing it and will report later if it works.

emerge -e world

takes a long time on intel atom N270  :Very Happy: 

----------

## rufnut

 *gringo wrote:*   

> 
> 
> if you have -march=native, distcc won´t work ( and it shouldn´t work, because using distcc and -march=native just doesn´t make any sense at all).
> 
> 

 

I don't see why not,  it works for gcc so should for distcc ?

https://bugs.gentoo.org/223159

A little work may be needed on distcc to test a few things that's all.

 :Smile: 

----------

## gringo

 *Quote:*   

> I don't see why not, it works for gcc so should for distcc ?
> 
> https://bugs.gentoo.org/223159
> 
> A little work may be needed on distcc to test a few things that's all. 

 

i´m not sure i get what you mean, that bug explicitly states that distcc will be disabled if -march=native is used, and that´s how it should work IMO.

 *Quote:*   

> I am using gcc-4.5 from toolchain overlay with -march=atom
> 
> and it works perfect on my system. 

 

nice !

 *Quote:*   

> takes a long time on intel atom N270

 

sure it will  :Smile: 

thanks for the info WuDDjA, i wasn´t aware the atom stuff was already merged !

cheers

----------

## gcasillo

Just wanted to report my success with a Gentoo install on my new Asus Eee 1000HE netbook. Here's my emerge info for reference:

```
Portage 2.2_rc33 (!/usr/portage/profiles/default/linux/x86/2008.0/desktop, gcc-4.3.3, glibc-2.10.1-r0, 2.6.28-gentoo-r5 i686)

=================================================================                                                            

System uname: Linux-2.6.28-gentoo-r5-i686-Intel-R-_Atom-TM-_CPU_N280_@_1.66GHz-with-gentoo-2.0.1                             

Timestamp of tree: Tue, 26 May 2009 00:15:01 +0000                                                                           

distcc 3.1 i686-pc-linux-gnu [enabled]                                                                                       

ccache version 2.4 [enabled]                                                                                                 

app-shells/bash:     4.0_p24                                                                                                 

dev-lang/python:     2.5.4-r2, 2.6.2                                                                                         

dev-util/ccache:     2.4-r8                                                                                                  

dev-util/cmake:      2.6.4                                                                                                   

sys-apps/baselayout: 2.0.1                                                                                                   

sys-apps/openrc:     0.4.3-r2                                                                                                

sys-apps/sandbox:    1.9                                                                                                     

sys-devel/autoconf:  2.63-r1                                                                                                 

sys-devel/automake:  1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2                                                                    

sys-devel/binutils:  2.19.1-r1                                                                                               

sys-devel/gcc-config: 1.4.1                                                                                                  

sys-devel/libtool:   2.2.6a                                                                                                  

virtual/os-headers:  2.6.29                                                                                                  

ACCEPT_KEYWORDS="x86 ~x86"                                                                                                   

CBUILD="i686-pc-linux-gnu"                                                                                                   

CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"                                                                      

CHOST="i686-pc-linux-gnu"                                                                                                    

CONFIG_PROTECT="/etc /usr/kde/4.2/env /usr/kde/4.2/share/config /usr/kde/4.2/shutdown /usr/share/config"                     

CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/udev/rules.d"                                                

CXXFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"                                                                    

DISTDIR="/usr/portage/distfiles"                                                                                             

FEATURES="ccache distcc distlocks fixpackages parallel-fetch preserve-libs protect-owned sandbox sfperms strict unmerge-orphans userfetch"                                                                                                                

GENTOO_MIRRORS="ftp://ftp.ussg.iu.edu/pub/linux/gentoo ftp://ftp.gtlib.gatech.edu/pub/gentoo http://www.gtlib.gatech.edu/pub/gentoo http://gentoo.mirrors.pair.com/ ftp://gentoo.mirrors.pair.com/ http://gentoo.mirrors.tds.net/gentoo ftp://gentoo.mirrors.tds.net/gentoo http://mirror.datapipe.net/gentoo ftp://mirror.datapipe.net/gentoo"

LDFLAGS="-Wl,-O1"

MAKEOPTS="-j16"

PKGDIR="/mnt/nfs_portage/distfiles"

PORTAGE_COMPRESS="gzip"

PORTAGE_COMPRESS_FLAGS="-9"

PORTAGE_CONFIGROOT="/"

PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"

PORTAGE_TMPDIR="/var/tmp"

PORTDIR="/mnt/nfs_portage"

SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"

USE="X a52 aac acl acpi alsa amarok asf audiofile avi bash-completion berkdb bluetooth branding bzip2 cdr cli cracklib crypt ctype cups dbus dirac dlloader dri dts encode faac faad fam ffmpeg firefox flac foomaticdb frontendonly ftp gdbm gif hal imap imlib innodb ip4100 isdnlog joystick jpeg kde kdehiddenvisibility kdeprefix lcms libg++ libwww live lm_sensors logrotate mad maildir matroska midi mikmod mime mmx mmxext mng mp3 mpeg mudflap mysql ncurses nls nptl nptlonly nsplugin ogg oggvorbis openal opengl openmp pcre pdf perl plasma png postgres ppds python qt3support qt4 quicktime readline reflection samba sasl scanner sdl semantic-desktop session smp speex spell spl sse sse2 ssl ssse3 startup-notification subversion svg sysfs syslog taglib tcpd theora tiff transcode truetype unicode usb vhosts vorbis webkit win32codecs x264 x86 xcomposite xine xml xml2 xorg xulrunner xv xvid zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CAMERAS="ptp2" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="intel"

Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
```

As you can see, I'm using gcc-4.3.3 and glibc-2.10.1. My CFLAGS are per the recommendation in the safe cflags page for an Atom N270 (almost the same CPU). For my kernel build, I selected the Core2 processor family. Seems to be okay, though that was unintuitive and required some googling.

Believe it or not, I'm running KDE-4.2.3 on this little puppy, and so far, I'm happy. Very usable, though I have to acclimate to KDE4 since I've been a stubborn, heavy user of KDE-3.5 until now.

----------

## pigeon768

 *nerdbert wrote:*   

> than -march=i586 -O2 -mmmx -msse -msse2 -msse3 -fomit-frame-pointer

  The -msse and -msse2 are redundant. You only need to include the highest level of sse you have - gcc will fill in the rest. Run 'gcc -c -Q -O3 -msse3 --help=target' and compare to 'gcc -c -Q -O3 -msse -msse2 -msse3 --help=target'.

For those (including myself) using -march=prescott: you probably want to add "-mssse3" (three s's). The Atom supports SSSE3, while the Prescott did not.
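
To find the highest SSE level a given box reports, one option is to read the `flags` line of /proc/cpuinfo. The following is only a rough sketch: the helper names and the flag-to-option mapping are my own assumptions, not something from gcc or portage. Note that the kernel lists SSE3 as `pni`.

```python
# Sketch: pick the single highest SSE-level GCC option from /proc/cpuinfo flags.
# The mapping table and helper names are illustrative assumptions.

# Ordered from lowest to highest level; the kernel reports SSE3 as "pni".
SSE_LEVELS = [
    ("sse", "-msse"),
    ("sse2", "-msse2"),
    ("pni", "-msse3"),
    ("ssse3", "-mssse3"),
]


def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def highest_sse_flag(flags):
    """Return the GCC option for the highest SSE level present, or None."""
    best = None
    for cpu_flag, gcc_flag in SSE_LEVELS:
        if cpu_flag in flags:
            best = gcc_flag
    return best
```

On a Linux box, calling `highest_sse_flag(parse_flags(open("/proc/cpuinfo").read()))` should give the one option worth adding by hand if the chosen -march does not already imply it.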

----------

## gringo

um, does anyone here know if there is a backport of the atom specific bits for gcc-4.4 ?? 

edit : just found this - > https://bugs.gentoo.org/262603

thanks !

----------

## rufnut

 *gringo wrote:*   

> 
> 
> i´m not sure i get what you mean, that bug explicitly states that distcc will be disabled if -march=native is used, and that´s how it should work IMO.
> 
> 

 

My sentiments are pretty much the same as this guy here in his last paragraph:

https://bugs.launchpad.net/distcc/+bug/188813

When you look at the bug report:

https://bugs.gentoo.org/223159

Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc,  or maybe it was just modified to fail if "march=native" is detected.

----------

## gringo

 *Quote:*   

> My sentiments are pretty much the same as this guy here in his last paragraph: 

 

don´t know exactly what you mean, the guy in that bug explains the problem pretty well and a workaround is available.

If you want to disable distcc for a few packages "manually", there are a few bash hacks available.

 *Quote:*   

> Looks like they tried to implement the patch and due to problems it may have been pulled out in later versions of distcc, or maybe it was just modified to fail if "march=native" is detected.

 

don´t know if sth. has changed in the latest version of distcc, i use distcc quite a lot and last time i tried -march=native with distcc ( which was with the first distcc-3.x release) all jobs were processed locally, which is how it should work IMO. 

it´s quite easy to test if this is still the case, right ?  :Wink: 

cheers

----------

## rufnut

From :

https://bugs.launchpad.net/distcc/+bug/188813

 *Quote:*   

> or
> 
> (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
> 
> target architecture is the same on all build nodes

 

I reckon this should be the way it is done. 

It creates a bit of work for distcc: if the "arch" is unknown to, say, stable gcc and/or distcc (-march=atom), then maybe it could drop to prescott or whatever the consensus is until gcc 4.5.x is stable.

The reason for the above statement is that I am not really keen on upgrading all nodes to gcc 4.5.x.

There is nothing stopping me manually setting eg (-march=prescott) for a machine but I would have preferred some automation, which I guess is the reason (-march=native) was introduced.

 :Smile: 

----------

## gringo

 *Quote:*   

> or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
> 
> target architecture is the same on all build nodes

 

do you really want an app like distcc to rewrite your -march setting ? Why don´t you set the correct one in first place ?

And how is that supposed to work if you are crosscompiling f.ex. ?

That doesn´t make any sense to me and in any case i don´t think rewriting compiler parameters is distcc´s job.

That said, i would like to have a better workaround too and just set -march=native everywhere, but it really isn´t that easy.

cheers  :Smile: 

----------

## Rony

GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™

```
-Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move
```

I am testing with the above on Intel's D945GCLF2D (Atom 330).

Regards.

----------

## gringo

if those numbers are correct, that isn´t that bad i would say; i was expecting way more difference between icc and gcc. 

Would be great to see the same benchmark with the new atom target included. 

I found some time ago a discussion about what would be the best options for gcc when building for an atom and Arjan van de Ven ( intel kernel hacker)  suggested -march=core2 -mtune=generic. Note that this was before the atom target was even in development IIRC.

http://lkml.indiana.edu/hypermail/linux/kernel/0810.1/2015.html

cheers guys

----------

## rufnut

 *gringo wrote:*   

>  *Quote:*   or (preferably?) rewrite them to read -march=arch-of-the-build-machine so the
> 
> target architecture is the same on all build nodes 
> 
> do you really want an app like distcc to rewrite your -march setting ? Why don´t you set the correct one in first place ?
> ...

 

He does say "read -march=arch-of-the-build-machine" not rewrite ?

 :Confused: 

----------

## gringo

 *Quote:*   

> He does say "read -march=arch-of-the-build-machine" not rewrite ? 

 

no, he says "rewrite them to read". 

English is not my main language but i get that as "rewriting".

and this starts to be a bit pointless and complete OT.

cheers

----------

## rufnut

 *gringo wrote:*   

>  *Quote:*   He does say "read -march=arch-of-the-build-machine" not rewrite ?  
> 
> no, he says "rewrite them to read". 
> 
> English is not my main language but i get that as "rewriting".
> ...

 

He is talking about rewriting distcc.

 :Smile: 

----------

## Mr_Maniac

 *Rony wrote:*   

> GCC-optimized: adding the suggested GCC compiler flags for Intel® Atom™
> 
> ```
> -Wall -O1 -msse3 -march=core2 -mfpmath=sse -pedantic -pipe -fstrength-reduce -fexpensive-optimizations -finline-functions -funroll-loops -foptimize-register-move
> ```
> ...

 

I have an Intel D945GCLF2, too. System compiled with

```
CFLAGS="-march=nocona -O2 -pipe"
```

GCC: gcc (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) 4.3.3

GLIBC: glibc-2.10.1-r0

Kernel: 2.6.29-r5 - CONFIG_MCORE2=y

64bit-System

With the CFLAGS you mentioned I get the following results:

```

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          193.76  :       4.97  :       1.63

STRING SORT         :          44.542  :      19.90  :       3.08

BITFIELD            :      5.6065e+07  :       9.62  :       2.01

FP EMULATION        :          17.793  :       8.54  :       1.97

FOURIER             :            6628  :       7.54  :       4.23

ASSIGNMENT          :           2.757  :      10.49  :       2.72

IDEA                :           739.7  :      11.31  :       3.36

HUFFMAN             :          354.47  :       9.83  :       3.14

NEURAL NET          :          1.9554  :       3.14  :       1.32

LU DECOMPOSITION    :          62.884  :       3.26  :       2.35

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 9.923

FLOATING-POINT INDEX: 4.257

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.29-gentoo-r5

C compiler          : x86_64-pc-linux-gnu-gcc

libc                :

MEMORY INDEX        : 2.563

INTEGER INDEX       : 2.413

FLOATING-POINT INDEX: 2.361

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

With my standard-CFLAGS

CFLAGS="-march=nocona -O2 -pipe"

I have

```

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          497.52  :      12.76  :       4.19

STRING SORT         :          62.335  :      27.85  :       4.31

BITFIELD            :      2.0232e+08  :      34.71  :       7.25

FP EMULATION        :          52.817  :      25.34  :       5.85

FOURIER             :          6763.3  :       7.69  :       4.32

ASSIGNMENT          :          9.4219  :      35.85  :       9.30

IDEA                :          2106.5  :      32.22  :       9.57

HUFFMAN             :          913.79  :      25.34  :       8.09

NEURAL NET          :           8.498  :      13.65  :       5.74

LU DECOMPOSITION    :          311.92  :      16.16  :      11.67

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 26.488

FLOATING-POINT INDEX: 11.927

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.29-gentoo-r5

C compiler          : x86_64-pc-linux-gnu-gcc

libc                :

MEMORY INDEX        : 6.624

INTEGER INDEX       : 6.599

FLOATING-POINT INDEX: 6.615

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

CFLAGS="-march=core2 -O2 -pipe":

```

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          510.72  :      13.10  :       4.30

STRING SORT         :          61.406  :      27.44  :       4.25

BITFIELD            :      2.3122e+08  :      39.66  :       8.28

FP EMULATION        :            54.4  :      26.10  :       6.02

FOURIER             :          6757.9  :       7.69  :       4.32

ASSIGNMENT          :          8.7198  :      33.18  :       8.61

IDEA                :          2157.4  :      33.00  :       9.80

HUFFMAN             :          908.02  :      25.18  :       8.04

NEURAL NET          :          10.043  :      16.13  :       6.79

LU DECOMPOSITION    :          408.24  :      21.15  :      15.27

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 26.924

FLOATING-POINT INDEX: 13.790

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.29-gentoo-r5

C compiler          : x86_64-pc-linux-gnu-gcc

libc                :

MEMORY INDEX        : 6.715

INTEGER INDEX       : 6.721

FLOATING-POINT INDEX: 7.648

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

And the best results so far

CFLAGS="-march=native -O2 -pipe"

```

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          508.56  :      13.04  :       4.28

STRING SORT         :          60.862  :      27.20  :       4.21

BITFIELD            :       2.314e+08  :      39.69  :       8.29

FP EMULATION        :          54.498  :      26.15  :       6.03

FOURIER             :          6778.9  :       7.71  :       4.33

ASSIGNMENT          :          9.5709  :      36.42  :       9.45

IDEA                :          2164.4  :      33.10  :       9.83

HUFFMAN             :          911.25  :      25.27  :       8.07

NEURAL NET          :          10.083  :      16.20  :       6.81

LU DECOMPOSITION    :          412.64  :      21.38  :      15.44

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 27.270

FLOATING-POINT INDEX: 13.872

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.29-gentoo-r5

C compiler          : x86_64-pc-linux-gnu-gcc

libc                :

MEMORY INDEX        : 6.908

INTEGER INDEX       : 6.729

FLOATING-POINT INDEX: 7.694

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

Can someone post results with gcc-4.5 and "-march=atom"? My System is in use (Router/Server), so i don't want to make too big changes...

----------

## Bircoph

I use the following for my Atom N270 (on Asus Eee PC 1000H):

```

CFLAGS="-march=core2 -m32 --param l1-cache-line-size=64
--param l1-cache-size=32 --param l2-cache-size=512
-O2 -funswitch-loops -fpredictive-commoning
-fgcse-after-reload -ftree-vectorize -fomit-frame-pointer
-mfpmath=sse -pipe"

```

Some explanation why exactly these flags are used. (I use gcc-4.3.3-r2 ATM: the latest unmasked gcc for Gentoo.)

1) Why not "-march=native"?

That's obvious: a) because current gcc doesn't understand atom properly and will fail to detect it in the best way; b) this will make distcc unusable.

2) Why "-march=core2 -m32"?

Just look at this CPU's instruction set: it actually equals core2 except for the x86_64 instructions (-m32 is also required for distcc cross-compilation on amd64):

```

% x86info -f

x86info v1.24.  Dave Jones 2001-2009

Feedback to <davej@redhat.com>.

Found 2 CPUs

--------------------------------------------------------------------------

CPU #1

EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2

CPU Model: Unknown model.

Processor name string: Intel(R) Atom(TM) CPU N270   @ 1.60GHz

Type: 0 (Original OEM)  Brand: 0 (Unsupported)

Number of cores per physical package=1

Number of logical processors per socket=2

Number of logical processors per core=2

APIC ID: 0x0    Package: 0  Core: 0   SMT ID 0

Feature flags:

 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe

Extended feature flags:

 sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]

--------------------------------------------------------------------------

CPU #2

EFamily: 0 EModel: 1 Family: 6 Model: 28 Stepping: 2

CPU Model: Unknown model.

Processor name string: Intel(R) Atom(TM) CPU N270   @ 1.60GHz

Type: 0 (Original OEM)  Brand: 0 (Unsupported)

Number of cores per physical package=1

Number of logical processors per socket=2

Number of logical processors per core=2

APIC ID: 0x1    Package: 0  Core: 0   SMT ID 1

Feature flags:

 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflsh ds acpi mmx fxsr sse sse2 ss ht tm pbe

Extended feature flags:

 sse3 [2] monitor ds-cpl est tm2 ssse3 xTPR [15] [22]

--------------------------------------------------------------------------

```

3) Why "--param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512"?

Because the N270 isn't a core2: it has smaller L1/L2 caches, so code generated for core2 will be less efficient on Atom because of improper cache use: data/code blocks may be too long, etc.

Specifying the CPU cache is also always important for distcc: the compiler on the other host doesn't know what CPU you actually use.

4) Why "-O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize"?

This is actually -O3 -fno-inline-functions. The Atom CPU has relatively small L1/L2 caches, so extra inlining would decrease its efficiency dramatically; the CPU cache should be used for better purposes.

5) Why "-fomit-frame-pointer"?

Because it frees up an extra register, which is extremely important because on x86 you have only 4 free-to-use general registers. (JFYI: access to a register is 3 times faster than even access to L1 cache.) If you really want to debug something, you'll need to recompile it with -g/-g3 anyway.

Isn't it enabled by default? No, it isn't, because it interferes with debugging, read gcc manual.

6) Why -mfpmath=sse?

The SSE unit is significantly more efficient than the i387 unit used by default on x86, mostly due to its enhanced instructions. The only problem is that the i387 unit allows 80-bit wide floats, while SSE allows a maximum width of 64 bits. In theory this may be a problem for applications that rely on 80-bit wide floats without specifying this explicitly to gcc. In practice I have used tons of scientific software (such as root, maxima, R, octave,...) compiled with -mfpmath=sse (in make.conf CFLAGS) for years without any problems.

Ideally one would use -mfpmath=sse,387, as it effectively doubles the number of available registers (the i387 and SSE units are implemented separately by Intel), but the gcc register allocator can't model the use of both units at once, so from the performance POV it is quite risky to use -mfpmath=sse,387 everywhere; you would have to write the appropriate assembly by hand.

7) Why "-pipe"?

This speeds up compilation by using pipes instead of temporary files. It doesn't affect the generated code itself.
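
The claim in point 4 (that the extra -f flags amount to -O3 minus inlining) can be checked by comparing what `gcc -Q -O2 --help=optimizers` and `gcc -Q -O3 --help=optimizers` report. Below is a small sketch that diffs two such outputs. The function names are made up for illustration, and it only looks at lines ending in `[enabled]` or `[disabled]`, which is an assumption about the output layout based on the listings in this thread.

```python
import re

# Sketch: diff two "gcc -Q --help=..." outputs to see which options one
# of them turns on. Function names are illustrative, not part of any tool.

# Matches lines such as "  -ftree-vectorize        [enabled]".
_OPTION_RE = re.compile(r"^\s*(-\S+)\s+\[(enabled|disabled)\]\s*$")


def parse_q_help(text):
    """Map each option name to True (enabled) or False (disabled)."""
    states = {}
    for line in text.splitlines():
        match = _OPTION_RE.match(line)
        if match:
            states[match.group(1)] = match.group(2) == "enabled"
    return states


def newly_enabled(base_text, other_text):
    """Return the sorted options enabled in other_text but not in base_text."""
    base = parse_q_help(base_text)
    other = parse_q_help(other_text)
    return sorted(
        opt for opt, on in other.items() if on and not base.get(opt, False)
    )
```

Save both outputs to files and pass their contents to `newly_enabled()`; the result depends on the gcc version, so it is worth rerunning after a compiler upgrade.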

----------

## Mr_Maniac

```

~ # CFLAGS="-march=core2 --param l1-cache-line-size=64 --param l1-cache-size=32 --param l2-cache-size=512 -O2 -funswitch-loops -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fomit-frame-pointer -mfpmath=sse -pipe" emerge nbench

~ # nbench 

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          516.24  :      13.24  :       4.35

STRING SORT         :           62.19  :      27.79  :       4.30

BITFIELD            :      2.3393e+08  :      40.13  :       8.38

FP EMULATION        :          54.894  :      26.34  :       6.08

FOURIER             :          6778.9  :       7.71  :       4.33

ASSIGNMENT          :          9.7533  :      37.11  :       9.63

IDEA                :          2172.2  :      33.22  :       9.86

HUFFMAN             :          921.62  :      25.56  :       8.16

NEURAL NET          :          9.8775  :      15.87  :       6.67

LU DECOMPOSITION    :           418.4  :      21.68  :      15.65

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 27.617

FLOATING-POINT INDEX: 13.841

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.29-gentoo-r5

C compiler          : x86_64-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 7.027

INTEGER INDEX       : 6.791

FLOATING-POINT INDEX: 7.676

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

Okay... It really is a bit faster, but really only a bit  :Wink: 
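
To put a number on "a bit", here is a quick sketch of the relative change between the -march=native run and this one. The index values are copied from the two result blocks above; the helper name is just for illustration.

```python
# Sketch: relative change between two nbench index sets, in percent.
# Index values are copied from the two runs posted above
# (-march=native vs. the longer core2 flag set); names are illustrative.


def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


native = {"memory": 6.908, "integer": 6.729, "floating-point": 7.694}
tuned = {"memory": 7.027, "integer": 6.791, "floating-point": 7.676}

changes = {name: percent_change(native[name], tuned[name]) for name in native}

for name, pct in changes.items():
    print(f"{name:>14}: {pct:+.2f}%")
```

That works out to roughly +1.7% memory, +0.9% integer and -0.2% floating-point, from a single run of each, so it says nothing yet about run-to-run noise.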

----------

## s4e8

gcc 4.5.0 snapshot 20090702. -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, ATOM N270

nbench score: 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22, NEURAL NET crashed

Compared to Bircoph's CFLAGS, it wins by: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2%

----------

## Bircoph

 *s4e8 wrote:*   

> gcc 4.5.0 snapshot 20090702. -march=atom -O3 -mfpmath=sse -fomit-frame-pointer, ATOM N270
> 
> nbench score: 539.71 59.888 2.2706e8 87.08 7800.8 13.017 2276.4 979.22, NEURAL NET crashed
> 
> compare to Bircoph's CFLAGS, it win: 0.4% 1.4% 28.8% 65.4% 1.7% 8.4% 20% 4.2%

 

This result is very interesting. Could you please post

```

gcc -Q --help=target -march=atom

```

?

And be aware of two important aspects:

1) All measurement data should be provided with errors (either absolute errors with a confidence probability or errors in terms of standard deviation), otherwise your gains may be just a game of statistics, nothing more. Of course, you should run the tests several times to be able to calculate errors. By this standard I can't say that my options are better than Mr_Maniac's: the statistical error is larger than the test delta in my case.

2) nbench is a very, eh, specific benchmark: it covers only some aspects of real-world tasks, so you should be critical of its results. A small example.

I have two boxes:

a) Athlon-XP 3200+ (2205 MHZ), 64KB L1 512KB L2, 32bit.

b) Celeron D (2533 MHz), 16KB L1, 256KB L2, 64bit.

Here are nbench results (memory/integer/floating indices) with errors in standard deviations:

a) 12.187 ± 0.021; 14.068 ± 0.014; 23.135 ± 0.025

b) 10.36 ± 0.18; 8.84 ± 0.05; 13.75 ± 0.04

As you can see, according to nbench host (b) is significantly worse than host (a), beyond any errors.

But wait! Try generating a 16 Kbit RSA key on both hosts. Host (b) turns out to be ~8 times faster: thanks to 64bit mode and 3x more general-purpose registers it shines in long arithmetic tasks, particularly in anything related to asymmetric cryptography.

So be very careful when estimating performance via tests alone: you have to do really hard work to be able to say (a) is better than (b), since performance varies greatly depending on the task in question.
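
For point 1, a minimal sketch of how repeated runs could be reduced to "mean ± standard deviation" and compared. The threshold of 3 combined standard deviations is an arbitrary assumption chosen for illustration, and the function names are hypothetical; the two memory-index values are the ones quoted above for hosts (a) and (b).

```python
import math
import statistics

# Sketch: summarize repeated benchmark runs and test whether two results
# differ by more than their combined error. The default k=3.0 is an
# assumed threshold, not an established rule.


def summarize(runs):
    """Return (mean, sample standard deviation); needs at least two runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def clearly_different(mean_a, sd_a, mean_b, sd_b, k=3.0):
    """True if the means differ by more than k combined standard deviations."""
    combined = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return abs(mean_a - mean_b) > k * combined


# Memory indices quoted above for the Athlon-XP (a) and the Celeron D (b).
print(clearly_different(12.187, 0.021, 10.36, 0.18))  # prints True
```

Feed `summarize()` the index from each of several nbench runs per flag set, then pass the two (mean, deviation) pairs to `clearly_different()` before claiming one set of CFLAGS wins.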

----------

## s4e8

Here are the results.

```

bin # ./gcc -Q --help=target -march=atom  

The following options are target specific:

  -m128bit-long-double                  [disabled]

  -m32                                  [enabled]

  -m3dnow                               [disabled]

  -m3dnowa                              [disabled]

  -m64                                  [disabled]

  -m80387                               [enabled]

  -m96bit-long-double                   [enabled]

  -mabi=                      

  -mabm                                 [disabled]

  -maccumulate-outgoing-args            [disabled]

  -maes                                 [disabled]

  -malign-double                        [disabled]

  -malign-functions=          

  -malign-jumps=              

  -malign-loops=              

  -malign-stringops                     [enabled]

  -march=                               atom

  -masm=                      

  -mavx                                 [disabled]

  -mbranch-cost=              

  -mcld                                 [disabled]

  -mcmodel=                   

  -mcrc32                               [disabled]

  -mcx16                                [disabled]

  -mfancy-math-387                      [enabled]

  -mfma                                 [disabled]

  -mforce-drap                          [disabled]

  -mfp-ret-in-387                       [enabled]

  -mfpmath=                   

  -mfused-madd                          [enabled]

  -mglibc                               [enabled]

  -mhard-float                          [enabled]

  -mieee-fp                             [enabled]

  -mincoming-stack-boundary=  

  -minline-all-stringops                [disabled]

  -minline-stringops-dynamically        [disabled]

  -mintel-syntax                        [disabled]

  -mlarge-data-threshold=     

  -mmmx                                 [disabled]

  -mmovbe                               [disabled]

  -mms-bitfields                        [disabled]

  -mno-align-stringops                  [disabled]

  -mno-fancy-math-387                   [disabled]

  -mno-fused-madd                       [disabled]

  -mno-push-args                        [disabled]

  -mno-red-zone                         [disabled]

  -mno-sse4                             [enabled]

  -momit-leaf-frame-pointer             [disabled]

  -mpc                        

  -mpclmul                              [disabled]

  -mpopcnt                              [disabled]

  -mpreferred-stack-boundary= 

  -mpush-args                           [enabled]

  -mrecip                               [disabled]

  -mred-zone                            [enabled]

  -mregparm=                  

  -mrtd                                 [disabled]

  -msahf                                [disabled]

  -msoft-float                          [disabled]

  -msse                                 [disabled]

  -msse2                                [disabled]

  -msse2avx                             [disabled]

  -msse3                                [disabled]

  -msse4                                [disabled]

  -msse4.1                              [disabled]

  -msse4.2                              [disabled]

  -msse4a                               [disabled]

  -msse5                                [disabled]

  -msseregparm                          [disabled]

  -mssse3                               [disabled]

  -mstack-arg-probe                     [disabled]

  -mstackrealign                        [enabled]

  -mstringop-strategy=        

  -mtls-dialect=              

  -mtls-direct-seg-refs                 [enabled]

  -mtune=                     

  -muclibc                              [disabled]

  -mveclibabi=                

```

----------

## Bircoph

This is odd, I can't see any significant difference.

I wonder what they've done...

----------

## s4e8

 *Bircoph wrote:*   

> This is odd, I can't see any significant difference.
> 
> I wonder what the've done...

 

There's a new file, atom.md, which defines some Atom-specific behavior.

```

......

;; Atom is an in-order core with two integer pipelines.

(define_attr "atom_unit" "sishuf,simul,jeu,complex,other"

  (const_string "other"))

(define_attr "atom_sse_attr" "rcp,movdup,lfence,fence,prefetch,sqrt,mxcsr,other"

  (const_string "other"))

(define_automaton "atom")

;;  Atom has two ports: port 0 and port 1 connecting to all execution units

(define_cpu_unit "atom-port-0,atom-port-1" "atom")

;;  EU: Execution Unit

;;  Atom EUs are connected by port 0 or port 1.

......

```

----------

## hielvc

s4e8, I ran your code on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for "target", I got the same output:

 *Quote:*   

> gcc -Q --help=target -march=k8 |awk '/enabled/ {print $1}' 
> 
> -m64
> 
> -m80387
> ...

 

Using this code 

```
echo 'int main(){return 0;}' > test.c && gcc -v -Q -march=native -O2   test.c -o test && rm test.c test

Using built-in specs.

Target: x86_64-pc-linux-gnu

Configured with: /var/tmp/portage/sys-devel/gcc-4.3.3-r2/work/gcc-4.3.3/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.3 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.3/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --disable-nls --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --disable-libgcj --enable-languages=c,c++,treelang --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.3-r2 p1.2, pie-10.1.5'

Thread model: posix

gcc version 4.3.3 (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) 

COLLECT_GCC_OPTIONS='-v' '-Q'  '-O2' '-o' 'test'

 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/cc1 -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8 -dumpbase test.c -auxbase test -O2 -version -o /tmp/ccoIvqwu.s

ignoring nonexistent directory "/usr/local/include"

ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/include"

#include "..." search starts here:

#include <...> search starts here:

 /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include

 /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/include-fixed

 /usr/include

End of search list.

GNU C (Gentoo 4.3.3-r2 p1.2, pie-10.1.5) version 4.3.3 (x86_64-pc-linux-gnu)

   compiled by GNU C version 4.3.3, GMP version 4.3.1, MPFR version 2.4.1-p5.

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072

options passed:  -v test.c -D_FORTIFY_SOURCE=2 -march=k8-sse3 -mcx16

 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 -mtune=k8

 -O2

options enabled:  -falign-labels -falign-loops -fargument-alias

 -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg

 -fcaller-saves -fcommon -fcprop-registers -fcrossjumping

 -fcse-follow-jumps -fdefer-pop -fdelete-null-pointer-checks

 -fearly-inlining -feliminate-unused-debug-types -fexpensive-optimizations

 -fforward-propagate -ffunction-cse -fgcse -fgcse-lm

 -fguess-branch-probability -fident -fif-conversion -fif-conversion2

 -finline-functions-called-once -finline-small-functions -fipa-pure-const

 -fipa-reference -fivopts -fkeep-static-consts -fleading-underscore

 -fmath-errno -fmerge-constants -fmerge-debug-strings

 -fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move

 -foptimize-sibling-calls -fpeephole -fpeephole2 -freg-struct-return

 -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop

 -fsched-interblock -fsched-spec -fsched-stalled-insns-dep

 -fschedule-insns2 -fsigned-zeros -fsplit-ivs-in-unroller

 -fsplit-wide-types -fstrict-aliasing -fstrict-overflow -fthread-jumps

 -ftoplevel-reorder -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop

 -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts

 -ftree-dse -ftree-fre -ftree-loop-im -ftree-loop-ivcanon

 -ftree-loop-optimize -ftree-parallelize-loops= -ftree-pre -ftree-reassoc

 -ftree-salias -ftree-scev-cprop -ftree-sink -ftree-sra -ftree-store-ccp

 -ftree-ter -ftree-vect-loop-version -ftree-vrp -funit-at-a-time

 -funwind-tables -fvar-tracking -fvect-cost-model -fzero-initialized-in-bss

 -m128bit-long-double -m3dnow -m64 -m80387 -maccumulate-outgoing-args

 -malign-stringops -mcx16 -mfancy-math-387 -mfp-ret-in-387 -mfused-madd

 -mglibc -mieee-fp -mmmx -mno-sse4 -mpush-args -mred-zone -msahf -msse

 -msse2 -msse3 -mtls-direct-seg-refs

Compiler executable checksum: f6e169a902c79329927a6921bcb422f4

 main

Analyzing compilation unit

Performing interprocedural optimizations

 <visibility> <early_local_cleanups> <inline> <static-var> <pure-const>Assembling functions:

 main

Execution times (seconds)

 parser                :   0.01 (100%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      76 kB ( 7%) ggc

 global alloc          :   0.00 ( 0%) usr   0.01 (100%) sys   0.01 (33%) wall       0 kB ( 0%) ggc

 TOTAL                 :   0.01             0.01             0.03               1118 kB

Internal checks disabled; compiler is not suited for release.

Configure with --enable-checking=release to enable checks.

COLLECT_GCC_OPTIONS='-v' '-Q'  '-O2' '-o' 'test'

 /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/as -V -Qy -o /tmp/ccW4B8bR.o /tmp/ccoIvqwu.s

GNU assembler version 2.19.1 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.19.1

COMPILER_PATH=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/libexec/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/bin/

LIBRARY_PATH=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib/:/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../:/lib/:/usr/lib/

COLLECT_GCC_OPTIONS='-v' '-Q'  '-O2' '-o' 'test'

 /usr/libexec/gcc/x86_64-pc-linux-gnu/4.3.3/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtbegin.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../x86_64-pc-linux-gnu/lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../.. /tmp/ccW4B8bR.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.3/../../../../lib64/crtn.o
```

As you can see, it's x86_64-pc-linux-gnu running gcc-4.3.3 with -march=native, which resolves to -march=k8-sse3 -mtune=k8. 3dnow and company, plus a bunch more, are actually enabled. I like my output   :Wink: 
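
For anyone who only wants the resolved -march/-mtune rather than the whole dump, here is a minimal sketch. It assumes the `gcc -v` output contains a line invoking `cc1`, as in the listing above; the function name `extract_arch_flags` is made up for illustration and is not part of gcc.

```shell
# Hypothetical helper (not part of gcc): read `gcc -v` output on stdin
# and print the -march=/-mtune= options that the driver passed to cc1.
extract_arch_flags() {
    # Keep only the cc1 command line, split it into one token per line,
    # keep the -march=/-mtune= tokens, and drop repeats while preserving order.
    grep '/cc1 ' | tr ' ' '\n' | grep -E '^-(march|mtune)=' | awk '!seen[$0]++'
}
```

It could be used as `gcc -march=native -E -v - </dev/null 2>&1 | extract_arch_flags`; on the box above that should print the `-march=k8-sse3` and `-mtune=k8` tokens seen in the cc1 line.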

----------

## s4e8

 *hielvc wrote:*   

> s4e8, I ran your code on my AMD Athlon(tm) X2 Dual Core Processor BE-2300. No matter what I put in for "target", I got the same output. 

 

OK, here's the -Q -v output:

```

GNU C (GCC) version 4.5.0 20090702 (experimental) (i686-pc-linux-gnu)

        compiled by GNU C version 4.5.0 20090702 (experimental), GMP version 4.2.4, MPFR version 2.4.1-p1

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096

options passed:  -v a.c -march=atom -mfpmath=sse -O3 -fomit-frame-pointer

options enabled:  -falign-labels -falign-loops -fargument-alias

 -fauto-inc-dec -fbranch-count-reg -fcaller-saves -fcommon

 -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop

 -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining

 -feliminate-unused-debug-types -fexpensive-optimizations

 -fforward-propagate -ffunction-cse -fgcse -fgcse-after-reload -fgcse-lm

 -fguess-branch-probability -fident -fif-conversion -fif-conversion2

 -findirect-inlining -finline -finline-functions

 -finline-functions-called-once -finline-small-functions -fipa-cp

 -fipa-cp-clone -fipa-pure-const -fipa-reference -fira-share-save-slots

 -fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore

 -fmath-errno -fmerge-constants -fmerge-debug-strings

 -fmove-loop-invariants -fomit-frame-pointer -foptimize-register-move

 -foptimize-sibling-calls -fpcc-struct-return -fpeephole -fpeephole2

 -fpredictive-commoning -fregmove -freorder-blocks -freorder-functions

 -frerun-cse-after-loop -fsched-interblock -fsched-spec

 -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fsigned-zeros

 -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-aliasing

 -fstrict-overflow -fthread-jumps -ftoplevel-reorder -ftrapping-math

 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop

 -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts

 -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-im -ftree-loop-ivcanon

 -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre

 -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink

 -ftree-slp-vectorize -ftree-sra -ftree-switch-conversion -ftree-ter

 -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time

 -funswitch-loops -fvar-tracking -fvect-cost-model

 -fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double

 -maccumulate-outgoing-args -malign-stringops -mcx16 -mfancy-math-387

 -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mmmx -mmovbe -mno-red-zone

 -mno-sse4 -mpush-args -msahf -msse -msse2 -msse3 -mssse3

 -mtls-direct-seg-refs

Compiler executable checksum: f142bf44665c008856fda3c64386a6ca

 main

Analyzing compilation unit

Performing interprocedural optimizations

 <visibility> <early_local_cleanups> <summary generate> <cp> <inline> <static-var> <pure-const>Assembling functions:

 main

Execution times (seconds)

 callgraph construction:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (11%) wall       0 kB ( 0%) ggc

 parser                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.08 (30%) wall     192 kB (23%) ggc

 tree gimplify         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc

 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 7%) wall       0 kB ( 0%) ggc

 tree CFG cleanup      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc

 tree SSA other        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc

 tree CCP              :   0.00 ( 0%) usr   0.01 (100%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc

 expand                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (11%) wall       3 kB ( 0%) ggc

 combiner              :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 7%) wall       0 kB ( 0%) ggc

 scheduling 2          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (11%) wall       0 kB ( 0%) ggc

 machine dep reorg     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 4%) wall       0 kB ( 0%) ggc

 TOTAL                 :   0.01             0.01             0.27                847 kB

Extra diagnostic checks enabled; compiler may run slowly

```

----------

## BillyBoy

```

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          537.24  :      13.78  :       4.52

STRING SORT         :          58.753  :      26.25  :       4.06

BITFIELD            :      1.7623e+08  :      30.23  :       6.31

FP EMULATION        :          54.418  :      26.11  :       6.03

FOURIER             :          7294.8  :       8.30  :       4.66

ASSIGNMENT          :          11.767  :      44.78  :      11.61

IDEA                :          2044.5  :      31.27  :       9.28

HUFFMAN             :           978.9  :      27.14  :       8.67

NEURAL NET          :          7.4568  :      11.98  :       5.04

LU DECOMPOSITION    :           396.2  :      20.53  :      14.82

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 27.142

FLOATING-POINT INDEX: 12.682

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : 4 CPU GenuineIntel Intel(R) Atom(TM) CPU  330   @ 1.60GHz 1596MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.29-gentoo-r5

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.679

INTEGER INDEX       : 6.844

FLOATING-POINT INDEX: 7.034

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

My CFLAGS:

```
CFLAGS="-O2 -march=prescott -mtune=core2 -fomit-frame-pointer -pipe"
```

My uname:

```
Linux atom 2.6.29-gentoo-r5 #3 SMP Wed Jul 29 22:40:06 PDT 2009 i686 Intel(R) Atom(TM) CPU 330 @ 1.60GHz GenuineIntel GNU/Linux
```

My portage:

```
Portage 2.1.6.13 (default/linux/x86/2008.0, gcc-4.3.2, glibc-2.9_p20081201-r2, 2.6.29-gentoo-r5 i686)

=================================================================

System uname: Linux-2.6.29-gentoo-r5-i686-Intel-R-_Atom-TM-_CPU_330_@_1.60GHz-with-glibc2.0

Timestamp of tree: Mon, 27 Jul 2009 10:45:02 +0000
```

My kit (from dmidecode):

```
Base Board Information

        Manufacturer: Intel Corporation

        Product Name: D945GCLF2

        Version: AAE46416-106
```

I have one stick of DDR2 800 but it only runs at 533 (despite the box saying it can do 667!). I'm actually pretty happy with this. For a hundred bucks, I have a completely usable system. Gotta love Gentoo....

----------

## djtreble

Comparing -march=atom to -march=core2:

```
CFLAGS="-O2 -march=core2 -pipe"
```

```
BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          479.28  :      12.29  :       4.04

STRING SORT         :          56.235  :      25.13  :       3.89

BITFIELD            :      1.3752e+08  :      23.59  :       4.93

FP EMULATION        :          46.123  :      22.13  :       5.11

FOURIER             :          7237.1  :       8.23  :       4.62

ASSIGNMENT          :          11.877  :      45.19  :      11.72

IDEA                :          1840.9  :      28.16  :       8.36

HUFFMAN             :          849.82  :      23.57  :       7.53

NEURAL NET          :          6.9442  :      11.16  :       4.69

LU DECOMPOSITION    :          399.44  :      20.69  :      14.94

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 24.182

FLOATING-POINT INDEX: 12.385

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.31-gentoo-r6

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.079

INTEGER INDEX       : 6.001

FLOATING-POINT INDEX: 6.869

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.
```

```
CFLAGS="-O2 -march=atom -pipe"
```

```
BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          512.16  :      13.13  :       4.31

STRING SORT         :          56.093  :      25.06  :       3.88

BITFIELD            :      1.3813e+08  :      23.69  :       4.95

FP EMULATION        :          51.637  :      24.78  :       5.72

FOURIER             :          7118.5  :       8.10  :       4.55

ASSIGNMENT          :          12.773  :      48.60  :      12.61

IDEA                :          1531.4  :      23.42  :       6.95

HUFFMAN             :           868.2  :      24.08  :       7.69

NEURAL NET          :          7.0021  :      11.25  :       4.73

LU DECOMPOSITION    :          379.56  :      19.66  :      14.20

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 24.499

FLOATING-POINT INDEX: 12.143

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.31-gentoo-r6

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.232

INTEGER INDEX       : 6.026

FLOATING-POINT INDEX: 6.735

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.
```

```
gcc version 4.5.0-alpha20091224 20091224 (experimental) (Gentoo 4.5.0_alpha20091224)
```

This shows nothing conclusive  :Sad:  When I ran nbench again it gave different results, so I don't really trust it!
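
Since single runs differ, one way to get a feel for the noise is to repeat the benchmark and look at the spread of the three Linux indexes. A rough sketch follows, assuming the nbench output format shown above; the function names `nbench_linux_indexes` and `summarize_runs` are made up for illustration.

```shell
# Hypothetical helper: read one or more nbench reports on stdin and print
# "NAME value" for the Linux-baseline indexes only (MEMORY, INTEGER, FLOATING-POINT).
nbench_linux_indexes() {
    awk '
        /ORIGINAL BYTEMARK RESULTS/ { linux = 0 }   # Pentium 90 baseline section: skip
        /LINUX DATA BELOW/          { linux = 1 }   # K6/233 baseline section: keep
        linux && /^(MEMORY|INTEGER|FLOATING-POINT) INDEX/ { print $1, $NF }
    '
}

# Hypothetical helper: read "NAME value" lines and print run count, min, max
# and mean per index name.
summarize_runs() {
    awk '
        {
            v = $2 + 0                      # force a numeric comparison
            n[$1]++; sum[$1] += v
            if (n[$1] == 1 || v < min[$1]) min[$1] = v
            if (n[$1] == 1 || v > max[$1]) max[$1] = v
        }
        END {
            for (k in n)
                printf "%s runs=%d min=%.3f max=%.3f mean=%.3f\n",
                       k, n[k], min[k], max[k], sum[k] / n[k]
        }
    ' | sort
}
```

For example, `for i in 1 2 3 4 5; do ./nbench; done | nbench_linux_indexes | summarize_runs`, assuming nbench is started the same way as for the runs above. If the min-to-max spread of one build is as large as the gap between the two builds, the comparison says little.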

----------

## b0nafide

Acer Aspire One D150... 

```
gcc version 4.3.4 (Gentoo 4.3.4 p1.0, pie-10.1.5) 

CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer -pipe"

# nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          525.72  :      13.48  :       4.43

STRING SORT         :          57.211  :      25.56  :       3.96

BITFIELD            :      1.7151e+08  :      29.42  :       6.15

FP EMULATION        :          56.795  :      27.25  :       6.29

FOURIER             :          7329.5  :       8.34  :       4.68

ASSIGNMENT          :          11.688  :      44.48  :      11.54

IDEA                :          2050.2  :      31.36  :       9.31

HUFFMAN             :          964.26  :      26.74  :       8.54

NEURAL NET          :          7.1714  :      11.52  :       4.85

LU DECOMPOSITION    :          405.76  :      21.02  :      15.18

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 26.942

FLOATING-POINT INDEX: 12.638

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.31-gentoo-r6

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.546

INTEGER INDEX       : 6.859

FLOATING-POINT INDEX: 7.009

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

----------

## djselbeck

on HP Mini 5101:

```
CFLAGS="-O2 -march=core2 -mtune=generic -fomit-frame-pointer  -pipe" 

gcc 4.3.4
```

```
BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :           553.8  :      14.20  :       4.66

STRING SORT         :           60.52  :      27.04  :       4.19

BITFIELD            :      1.7867e+08  :      30.65  :       6.40

FP EMULATION        :           59.08  :      28.35  :       6.54

FOURIER             :          7646.5  :       8.70  :       4.88

ASSIGNMENT          :          12.227  :      46.53  :      12.07

IDEA                :          2147.4  :      32.84  :       9.75

HUFFMAN             :          1035.4  :      28.71  :       9.17

NEURAL NET          :          7.5818  :      12.18  :       5.12

LU DECOMPOSITION    :          429.08  :      22.23  :      16.05

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 28.329

FLOATING-POINT INDEX: 13.303

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N280   @ 1.66GHz 1667MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.31-gentoo-r6

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.864

INTEGER INDEX       : 7.227

FLOATING-POINT INDEX: 7.378

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.

```

----------

## Nuteater

I recently upgraded my EEE 901 to a GCC 4.5 prerelease to try -march=atom (and because my system hasn’t been properly broken for a long time  :Wink: ). Here are the results.

With gcc-4.4.1 and

```
CFLAGS="-march=prescott -O2 -fomit-frame-pointer -pipe"
```

```
BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :           527.2  :      13.52  :       4.44

STRING SORT         :          57.857  :      25.85  :       4.00

BITFIELD            :      2.0284e+08  :      34.79  :       7.27

FP EMULATION        :          56.235  :      26.98  :       6.23

FOURIER             :          7325.3  :       8.33  :       4.68

ASSIGNMENT          :          11.777  :      44.81  :      11.62

IDEA                :          1991.2  :      30.46  :       9.04

HUFFMAN             :          869.22  :      24.10  :       7.70

NEURAL NET          :          6.5974  :      10.60  :       4.46

LU DECOMPOSITION    :          310.24  :      16.07  :      11.61

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 27.122

FLOATING-POINT INDEX: 11.237

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 1600MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.32.8

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.966

INTEGER INDEX       : 6.623

FLOATING-POINT INDEX: 6.232

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.
```

With gcc-4.5.0-alpha20100408 and

```
CFLAGS="-march=atom -O2 -mssse3 -mfpmath=sse -fexcess-precision=fast -fomit-frame-pointer -pipe"
```

```
BYTEmark* Native Mode Benchmark ver. 2 (10/95)

Index-split by Andrew D. Balsa (11/97)

Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index

                    :                  : Pentium 90* : AMD K6/233*

--------------------:------------------:-------------:------------

NUMERIC SORT        :          523.92  :      13.44  :       4.41

STRING SORT         :          59.896  :      26.76  :       4.14

BITFIELD            :      1.4147e+08  :      24.27  :       5.07

FP EMULATION        :          54.872  :      26.33  :       6.08

FOURIER             :          7708.9  :       8.77  :       4.92

ASSIGNMENT          :          13.934  :      53.02  :      13.75

IDEA                :          1939.2  :      29.66  :       8.81

HUFFMAN             :          1017.2  :      28.21  :       9.01

NEURAL NET          :          9.6915  :      15.57  :       6.55

LU DECOMPOSITION    :          451.44  :      23.39  :      16.89

==========================ORIGINAL BYTEMARK RESULTS==========================

INTEGER INDEX       : 26.900

FLOATING-POINT INDEX: 14.724

Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0

==============================LINUX DATA BELOW===============================

CPU                 : Dual GenuineIntel Intel(R) Atom(TM) CPU N270   @ 1.60GHz 800MHz

L2 Cache            : 512 KB

OS                  : Linux 2.6.32.8

C compiler          : i686-pc-linux-gnu-gcc

libc                : 

MEMORY INDEX        : 6.610

INTEGER INDEX       : 6.791

FLOATING-POINT INDEX: 8.166

Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

* Trademarks are property of their respective holder.
```

Of course, an artificial benchmark like this doesn’t tell much, but floating-point performance seems to improve significantly (the Linux floating-point index goes from 6.232 to 8.166). This may just be due to the other options, such as -mfpmath=sse, rather than -march=atom itself.
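
To put numbers on the differences between two runs, a small helper like the following can be used. It is only a sketch: `pct_change` is a made-up name, and it just does the arithmetic on two index values copied from the reports.

```shell
# Hypothetical helper: percentage change from the first value to the second,
# printed with one decimal place.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}
```

With the Linux indexes from the two reports above, `pct_change 6.232 8.166` gives 31.0 for floating point and `pct_change 6.623 6.791` gives 2.5 for integer. The two builds differ in compiler version and in several flags at once, so these figures cannot be attributed to any single option.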

----------

