# Is 64-bit/dual-core supposed to make any difference?

## neoklis

Hi all,

I recently upgraded my hardware to a 64-bit dual-core system: ASUS A8N SLI mobo, AMD Athlon 64 X2 2.4 GHz, 1 GB RAM and 2 SATA2 disks. My old machine was a 2 GHz P4 running Arch Linux 0.7.1. Once I got the new hardware running, I installed Arch again since it's much easier and quicker than Gentoo (nice installer and binaries only :Wink: ). But I really needed an optimized 64-bit Linux installation to run a very math-heavy app I have written, to reduce its execution time, which was several minutes per run.

So I installed Gentoo on the second hard disk and, although it took a long time, I was successful the first time (beginner's luck?). When I tried my app (compiled in Gentoo) it was about 5 times faster than in Arch :Very Happy: However, I made some silly mistake and corrupted the file systems on both disks :Embarassed: so I had to re-install. Arch was easy enough; I had it up and running in time for lunch. This time, though, I just couldn't get the Gentoo installation right, probably because I put the various commands in bash scripts to save time, but eventually I got it running again. Now my heavy app runs at the same speed in both Gentoo and Arch Linux! :Shocked:

I am now trying to figure out this puzzle: Have I done something wrong that made Gentoo/AMD64 significantly slower the second time? Have I done something right that made Arch Linux faster after the second install? (I did experiment a lot with kernel compilation in both OSes, so this may be a factor.) One more point: my app ran at the same speed whether I enabled or disabled SMP in the kernel (the only difference I noticed was that XFCE's CPU graph shows 50% load with SMP and 100% without!).

So here I am after two weeks of effort, wondering whether 64-bit is supposed to be faster and, if not, whether I might have been better off with a faster-clocked P4 or Athlon. I would also appreciate any hints on kernel compilation, like enabling SMP or NUMA features etc.

My thanks in advance to all.

----------

## DNAspark99

I'd love an AmdX2. bastard :p

you've seen for yerself it WAS faster, so no doubt you've missed something somewhere

post your make.conf for starters

----------

## bexamous2

"(the only difference I noticed was that XFCE's CPU graph shows 50% load with SMP and 100% without!)."

Keep it enabled. The reason the load is 50% with SMP is that one CPU is at 100% and the other is at 0%. With SMP off, the one CPU is at 100% and the other is not even available to be used. The particular program you're running is not going to take advantage of both processors, which is why you only get to max out one CPU; other programs will use both. Since you wrote this program yourself, you could try to modify it to take advantage of multiple processors. Then you really will see a huge improvement.

----------

## neoklis

 *DNAspark99 wrote:*   

> I'd love an AmdX2. bastard :p

 

I was just curious   :Wink: 

 *DNAspark99 wrote:*   

> you seen yerself it WAS faster, don't doubt that you've missed something somewhere post your make.conf for starters

 

Thanks for the reply! My make.conf is as follows:

CHOST="x86_64-pc-linux-gnu"

CFLAGS="-march=k8 -O2 -pipe"

CXXFLAGS="${CFLAGS}"

USE="X aac acpi alsa avi cdparanoia cdr cscope cups divx4linux dr dvd dvdr dvdread -emacs fortran -gnome gpm gtk2 -kde nls -qt tidy v4l vcd"

MAKEOPTS="-j3"

Perhaps I should try a GRP install, less prone to errors?

----------

## neoklis

 *bexamous2 wrote:*   

> "(the only difference I noticed was that XFCE's CPU graph shows 50% load with SMP and 100% without!)."
> 
> Keep it enabled. The reason the load is 50% with SMP is that one CPU is at 100% and the other is at 0%. With SMP off, the one CPU is at 100% and the other is not even available to be used. The particular program you're running is not going to take advantage of both processors, which is why you only get to max out one CPU; other programs will use both. Since you wrote this program yourself, you could try to modify it to take advantage of multiple processors. Then you really will see a huge improvement.

 

Thanks! Looks like I will need a tutorial on Linux threads   :Very Happy: 

----------

## kornhs4

Maybe you want to increase the "-j" option from 3 to 5 or 6? That might speed up compilation, because more make processes can run side by side when you call emerge.

----------

## luisfelipe

 *neoklis wrote:*   

> 
> 
> Thanks! Looks like I will need a tutorial on Linux threads  

 

http://www.lindaspaces.com/book/

That should help you out a little. Also, which GCC flags do you use to compile your programs?

----------

## neoklis

 *luisfelipe wrote:*   

>  *neoklis wrote:*   
> 
> Thanks! Looks like I will need a tutorial on Linux threads   
> 
> http://www.lindaspaces.com/book/
> ...

 

Looks great but rather "heavy"! Downloading the pdf now...

The GCC flags I use are -march=k8 -O2 -pipe, the same on both the 32-bit and 64-bit platforms. The performance of my program is so similar on both that I suspect the processor is somehow working in the same mode on both platforms (either 32-bit or 64-bit), internally at least. Unless I made a mistake that slows down Gentoo/AMD64 to exactly 32-bit speeds!

----------

## luisfelipe

Well, have you tried enabling more optimizations? -ffast-math should give you a boost, but it can pretty much screw up your results, depending on your application.

At the very least, increasing to -O3 should already help a little.

----------

## lbrtuk

 *neoklis wrote:*   

> Looks like I will need a tutorial on Linux threads  

 

All you need to learn are pthreads. All serious computer systems have a good implementation of pthreads. They're quite easy in themselves. Figuring out how parallelisable your algorithm is is another matter.

----------

## neoklis

 *luisfelipe wrote:*   

> Well, have you tried enabling more optimizations? -ffast-math should give you a boost, but it can pretty much screw up your results depending on your application.
> 
> But at least increasing to -O3 should already help out a little.

 

Well, I will try that, but I suppose performance would increase by approximately the same amount on both platforms. The main thing I wonder about is why it runs at the same speed in both 32-bit and 64-bit mode. Still, it's a lot faster than on my old machine.

----------

## neoklis

 *lbrtuk wrote:*   

>  *neoklis wrote:*   Looks like I will need a tutorial on Linux threads   
> 
> All you need to learn are pthreads. All serious computer systems have a good implementation of pthreads. They're quite easy in themselves. Figuring out how parallelisable your algorithm is is another matter.

 

Thanks, I will have a look at that. But the program is based on an old (ca 1980) DARPA program written in FORTRAN (NEC2, the antenna analysis code) which I translated to C, so without knowing much about pthreads, I expect it to be hard.

----------

## bollucks

 *neoklis wrote:*   

>  *luisfelipe wrote:*   Well, have you tried enabling more optimizations? -ffast-math should give you a boost, but it can pretty much screw up your results depending on your application.
> 
> But at least increasing to -O3 should already help out a little. 
> ...

 

64-bit does not give you any more speed unless the application depends on 64-bit data structures, as some mathematical packages, ray tracers etc. do. Otherwise the difference between 32-bit and 64-bit is generally minimal from the user's perspective. The main advantage on normal machines is that with 64-bit you no longer need to enable highmem when you have >= 1 GB of RAM. Highmem is a performance hit. However, the 64-bit versions of apps also tend to use a lot more memory, depending on how they were coded.

----------

## neoklis

 *bollucks wrote:*   

>  *neoklis wrote:*    *luisfelipe wrote:*   Well, have you tried enabling more optimizations? -ffast-math should give you a boost, but it can pretty much screw up your results depending on your application.
> 
> But at least increasing to -O3 should already help out a little. 
> ...

 

Well, thank you for this. It now seems to me that the original large performance advantage in 64-bit Gentoo must have come from some mistake, either in my tests or in the kernel compilation parameters in 32-bit Arch Linux. I carefully recompiled the Arch kernel (2.6.15) and re-installed just the base of Gentoo from a universal CD, using the latest (2.6.15-r3) kernel, also carefully compiled with parameters as similar as possible. Admittedly, in 64-bit the processor feature options offer different choices (NUMA support, no highmem etc.). I then carried out careful tests using the command-line version of my program, compiled with various CFLAGS.

The results were an eye-opener. In 32-bit Arch, going from -march=i386 to -march=i686 (the Arch default) to -march=k8 makes little difference: execution times for a modest job were 25 ± 1 sec. The same goes for -O2 versus -O3. -ffast-math makes a big difference: 19 sec. The corresponding figures in 64-bit Gentoo are 22 sec and 17 sec, so I guess I got a little something for my money and efforts! :Very Happy:

My thanks to all for the replies and tips.

----------

## piwacet

I believe all X2 CPUs have the SSE3 instruction set, which helps with certain mathematical calculations, so you should be safe to add

```
-msse3
```

to your CFLAGS.

I think the SSE3 instructions help with certain obscure mathematical operations, so you may not see a speedup, depending on what you're doing. But it's worth adding, as your CPU is designed to use these instructions.

----------

## neoklis

 *piwacet wrote:*   

> I believe all X2 CPUs have the SSE3 instruction set, which helps with certain mathematical calculations, so you should be safe to add
> 
> ```
> -msse3
> ```
> ...

 

Well, it seems to make little difference... One good thing about the dual-core CPU, though, is that builds make full use of it with the -j3 make option. I can install Gentoo from scratch to xfce4, firefox, thunderbird, gimp et al. in one day :Very Happy:

----------

