# GCC intermittently failing

## passive

Hi All,

I have a system that's had a variety of errors over the past few months. Very inconsistent, generally, but there has always seemed to be something a little "off".

I just did a fresh install of Gentoo, and while updating the core packages, I kept getting errors. I was unable to rebuild GCC or glibc, for example, but there were also failures in a variety of other packages. Trying to install the most basic KDE, which is about 100 packages, including the ones for X, resulted in about a dozen of them failing.

Some of the errors specifically mentioned GCC failing, and cautioned that it was probably a hardware or OS problem. This would make sense given the history I've seen, but I'm not exactly sure how to track it down. I don't think it's a hard drive problem, as of the two in there, one is brand new, and the other has been fairly thoroughly tested. I suspected memory, and am currently running memtest+, but it's got 18 passes with 0 errors so far.

Are there any tools you could recommend to diagnose what might be failing? Thanks.

----------

## eyoung100

Try using an emerge wrapper first to rebuild your toolchain correctly. I use emwrap. For more information see this post. If there is a lot of cruft (file trash), or you've had your system up for a while, the toolchain can sometimes become unstable. If emwrap fails to rebuild the toolchain, then you can safely rule out gcc as being the cause. If this doesn't work, please reply with the manufacturer of your hard drives.

----------

## tarpman

Overheating is a major possibility.

----------

## eyoung100

 *tarpman wrote:*   

> Overheating is a major possibility.

 

With that in mind, check your BIOS's PC Health section. Somewhere in that section will be a CPU Fan Speed indicator along with a CPU Temperature and a System Temperature. If your CPU Temp is >= 125 F, you may have a cracked CPU. If the System Temp is >= 110 F, you may have a blocked cooling fan or not enough fans. Heat does do major damage, and it isn't slow damage; it can be rather quick.

Let us all know what you find out.

----------

## passive

Ok, well here's where I'm at:

Memtest+ went to 30 runs with 0 errors. Given how frequently I get errors when it's running regularly, I figured it's probably not the memory.

I tried:

emwrap.sh -t world

Which started with glibc, and segfaulted. I rebooted twice, and ran

emwrap.sh -r -t world

each time, with a segfault as my reward.

Now I'm looking into diagnosing the CPU. It's an Athlon 64 at 2.2 GHz, and I have a very new fan on it. I'm currently reading this guide:

http://www.gentoo.org/doc/en/articles/hardware-stability-p1.xml

----------

## redgsturbo

 *passive wrote:*   

> Ok, well here's where I'm at:
> 
> Memtest+ went to 30 runs with 0 errors. Given how frequently I get errors when it's running regularly, I figured it's probably not the memory.
> 
> I tried:
> ...

 

Is the machine overclocked? Did you use a decent thermal paste?

----------

## eyoung100

 *eyoung100 wrote:*   

>  *tarpman wrote:*   Overheating is a major possibility. 
> 
> With that in mind, check your BIOS's PC Health section. Somewhere in that section will be a CPU Fan Speed indicator along with a CPU Temperature and a System Temperature. If your CPU Temp is >= 125 F, you may have a cracked CPU. If the System Temp is >= 110 F, you may have a blocked cooling fan or not enough fans. Heat does do major damage, and it isn't slow damage; it can be rather quick.
> 
> Let us all know what you find out   

 

The BIOS readout is much like what the lm_sensors package described in the doc you are reading reports. Check it before you put the system under load. If the temperature is high while the system is "idling," you have an undercooling issue. That's why we're all asking about things that normally put heat stress on a CPU, e.g. overclocking, cracking, etc.

----------

## passive

I got lm_sensors working, or so it seems. Running CPUburn takes the k8temp sensor from 48 to 76 degrees in just a few minutes. I'm guessing this is the problem.
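
For anyone who wants to log this kind of climb rather than watch it by hand, here is a minimal Python sketch. It assumes the kernel exposes the sensor through the sysfs hwmon interface as `temp*_input` files holding millidegrees Celsius; the exact path varies with kernel version and driver, and the function names are only illustrative.

```python
import glob
import time


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs hwmon reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temps(pattern: str = "/sys/class/hwmon/hwmon*/temp*_input") -> dict:
    """Return {sysfs path: degrees C} for every readable temperature input.

    The default glob is an assumption; older kernels may place the files
    under hwmon*/device/ instead.
    """
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                temps[path] = parse_millidegrees(handle.read())
        except (OSError, ValueError):
            # Skip sensors that cannot be read or return garbage.
            continue
    return temps


def log_temps(samples: int = 5, interval_s: float = 2.0) -> None:
    """Print all temperature inputs a few times, pausing between samples."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for path, celsius in read_temps().items():
            print(f"{stamp} {path}: {celsius:.1f} C")
        time.sleep(interval_s)
```

Calling `log_temps()` in one terminal while CPUburn runs in another should show the same rise that `sensors` reports, provided the assumed path matches the system.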

I don't think the CPU is overclocked, but I got the system as a whole from my brother, so it is a possibility. I'm not entirely sure of the thermal paste used, so I will double check that as well.

Thanks for all the help so far.

----------

## eyoung100

Would you please paste your /etc/make.conf? Let's also make sure your compile settings are not overdone.

----------

## passive

Ok, the CPU was overclocked. It's a 2800+ which is supposed to be at 1.8 GHz. Unfortunately, the only keyboard I have is an iMac USB model, without a delete key, so thus far I am locked out of the BIOS. Fortunately, I've reset enough BIOSes in my time to be able to guess the correct jumper, and I'm now back operating at 1.8 GHz.

I also cleaned off and reapplied the thermal paste, which I believe is Arctic Silver Céramique.

At this point, it idles around 44 degrees. I have a Thermaltake Silent Boost fan, which probably contributes to this relatively high idle temp. Running CPUburn, it's still getting remarkably hot (I've seen 82 degrees so far, though it's taken far longer than before to get to that point), so I wonder if my fan isn't working properly. I did a bit of research before purchasing this model, and it should have no problem cooling this CPU. The heatsink itself gets very hot, so I think it is absorbing heat very well.

Oh, here's my make.conf (I've barely done anything with this system yet):

```
# These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /etc/make.conf.example for a more detailed example.
CFLAGS="-O2 -march=i686 -pipe"
CXXFLAGS="${CFLAGS}"
# This should not be changed unless you know exactly what you are doing.  You
# should probably be using a different stage, instead.
ACCEPT_KEYWORDS="x86"
CHOST="i686-pc-linux-gnu"
USE="-ipv6"
```

At this point, I'm thinking either the fan on the heatsink isn't doing its job, or the CPU has started producing too much heat.

----------

## redgsturbo

 *passive wrote:*   

> Ok, the CPU was overclocked. It's a 2800+ which is supposed to be at 1.8 GHz. Unfortunately, the only keyboard I have is an iMac USB model, without a delete key, so thus far I am locked out of the BIOS. Fortunately, I've reset enough BIOSes in my time to be able to guess the correct jumper, and I'm now back operating at 1.8 GHz.
> 
> I also cleaned off and reapplied the thermal paste, which I believe is Arctic Silver Céramique.
> 
> At this point, it idles around 44 degrees. I have a Thermaltake Silent Boost fan, which probably contributes to this relatively high idle temp. Running CPUburn, it's still getting remarkably hot (I've seen 82 degrees so far, though it's taken far longer than before to get to that point), so I wonder if my fan isn't working properly. I did a bit of research before purchasing this model, and it should have no problem cooling this CPU. The heatsink itself gets very hot, so I think it is absorbing heat very well.
> ...

 

I have a Pentium D that runs perfectly at 2.66 GHz, and runs stable at 3.4 GHz except for very long compiles such as gcc, glibc, or a looped kernel recompile. It had the same behaviour that you are describing (gcc complaining about likely hardware issues, failures that are difficult to reproduce in the same part of a compile, etc.).
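
One crude way to look for this kind of nondeterminism without waiting on a full GCC build is to repeat an identical, deterministic workload and compare the results: a mismatch on identical input points at hardware rather than software. Below is a minimal Python sketch of that idea, assuming a SHA-256 hash chain is an acceptable stand-in workload. The function names and round counts are illustrative, and this loads the CPU far less than CPUburn or a long compile, so a clean run proves very little.

```python
import hashlib


def workload_digest(rounds: int = 200_000, seed: bytes = b"stability-check") -> str:
    """Run a deterministic hash chain and return the final digest as hex."""
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def count_mismatches(runs: int = 10, rounds: int = 200_000) -> int:
    """Repeat the workload and count runs that disagree with the first result.

    On healthy hardware this should always be 0, because the input and the
    algorithm are identical on every run.
    """
    reference = workload_digest(rounds)
    return sum(1 for _ in range(runs) if workload_digest(rounds) != reference)
```

A nonzero result from `count_mismatches()` would be a strong hint of a CPU or memory fault. A zero result only means this light workload did not trigger one.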

----------

## eyoung100

Below 100 F is perfectly acceptable. Try changing -march to k8 in /etc/make.conf, then:

```
emerge --sync
```

followed by:

```
./emwrap.sh -wuDb
```

Let me know if it crashes...

----------

## passive

Oh, all those temperatures are Celsius. 

At the moment, I think everything is ok. I'm compiling GCC, and the temperature is stable around 57 degrees. I will change the -march, and give that a try.

Thanks for all your help.

----------

## NeddySeagoon

passive,

80C is a top limit: hotter and you can expect problems, but it's not a hard limit.

The hot heatsink indicates that the heatsink is in good thermal contact with the CPU but that the heat is not being conducted to the air, or the hot air in the case is unable to get out. The PSU fan and the rear case fan should both be moving hot air out of the case, so air flows in at the front bottom, diagonally across the motherboard, over the CPU and out at the rear.

----------

## tarpman

To take NeddySeagoon's idea one step further, check that vents at the front and back are clear of dust (and the CPU fan while you're at it).

----------

## eyoung100

Congrats on getting the compile started. Sorry for the confusion with the temperatures. I have a feeling the US will never be metric, oh well. Now that the temperature stays constant while working, you're where you need to be. Each system that you build will have its own unique characteristics, one of which is temperature. The reason you could not trust your gut on what the temperature should be is that you received this system from your brother. If the temperature stays constant at around 57 C (135 F), it should never rise or fall more than 5 to 10 degrees in either direction.
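
Since the thresholds earlier in the thread were quoted in Fahrenheit and the sensor readings are in Celsius, a quick conversion puts them on one scale. A minimal Python sketch follows; the function names are only illustrative.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# Sensor readings reported in this thread, in Celsius.
for reading_c in (44, 57, 76, 82):
    print(f"{reading_c} C = {c_to_f(reading_c):.1f} F")

# Thresholds quoted earlier in the thread, in Fahrenheit.
for limit_f in (100, 110, 125):
    print(f"{limit_f} F = {f_to_c(limit_f):.1f} C")
```

By this arithmetic 57 C is about 134.6 F, and the 125 F figure is roughly 51.7 C.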

----------

## passive

Ok, that worked out pretty well. I think the CPU has simply decided not to run at that speed anymore. It's fine with me, it's a server box.

At this point, I've had many packages fail to compile. Is there an easy way to recompile everything?

Thanks again.

----------

## eyoung100

Type:

```
emerge profuse
profuse
```

Set all the USE flags you like, and then:

```
emerge --emptytree --newuse world
```

This should emerge anywhere from 300-450 packages. If it fails, use:

```
emerge --resume
```

Or, if you prefer the emerge wrapper I showed you, I believe it's:

```
emerge profuse
profuse
```

Set all the USE flags you like, and then:

```
./emwrap.sh -weuNb
```

This wrapper is the best tool, as it rebuilds your toolchain each time GCC or GLIBC has an update. This method takes the longest, because GCC is built twice. The first time, your old GCC builds the new GCC; the second time, the new GCC builds GLIBC, GCC and all its associated packages again, which ensures your system compiler is always up to date. The N or --newuse ensures that all the flags you set are used.

----------

