# [SOLVED] AMD cpu getting too hot

## heikole

When the CPU runs under full load, it regularly gets too hot (> 80°C | 176°F). Eventually the computer stops working and needs a hard reboot. Interestingly, this only happens under Gentoo, not under Kubuntu performing the same task.

Cool'n'Quiet is enabled in the BIOS. Gentoo runs cpufrequtils at boot, while Kubuntu seems to prefer powernowd by default. Both activate the ondemand CPU governor. However, the processor gets much hotter on Gentoo than on Kubuntu under full load. The temperature is read from k8temp and thermal_zone.
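For anyone who wants to log the thermal_zone readings instead of eyeballing them, here is a minimal sketch. It assumes a kernel that exposes the zones under `/sys/class/thermal` in millidegrees Celsius (older kernels also had them under `/proc/acpi/thermal_zone`); the function name `read_thermal_zones` is just made up for this example.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: temperature in °C} for every thermal zone found."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        temp_file = zone / "temp"
        if not temp_file.is_file():
            continue  # zone directory without a readable temperature
        try:
            raw = temp_file.read_text().strip()
            temps[zone.name] = int(raw) / 1000.0  # sysfs reports millidegrees
        except (OSError, ValueError):
            continue  # unreadable or non-numeric value: skip this zone
    return temps
```

Calling `read_thermal_zones()` once a second during a compile and writing the result to a file would give a curve that can be compared between the two distros.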

```
berliner ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 75
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
stepping        : 2
cpu MHz         : 1000.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips        : 2009.21
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 75
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
stepping        : 2
cpu MHz         : 1000.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips        : 2009.21
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
```

```
berliner ~ # sensors
it8716-isa-0290
Adapter: ISA adapter
VCore:     +0.99 V  (min =  +0.00 V, max =  +4.08 V)
VDDR:      +3.20 V  (min =  +0.00 V, max =  +4.08 V)
+3.3V:     +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM
+5V:       +4.78 V  (min =  +0.00 V, max =  +6.85 V)
+12V:     +11.78 V  (min =  +0.00 V, max = +16.32 V)
in5:       +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM
in6:       +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM
5VSB:      +4.76 V  (min =  +0.00 V, max =  +6.85 V)
VBat:      +2.99 V
fan1:     2149 RPM  (min =    0 RPM)
fan2:        0 RPM  (min =    0 RPM)
fan3:        0 RPM  (min =    0 RPM)
temp1:       +54°C  (low  =    -1°C, high =  +127°C)   sensor = diode
temp2:       +49°C  (low  =    -1°C, high =  +127°C)   sensor = invalid
temp3:        -8°C  (low  =    -1°C, high =  +127°C)   sensor = invalid
vid:      +0.538 V

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:
             +45°C
Core1 Temp:
             +46°C
```

The values above are taken from idle load. Any ideas what can be done to get a stable full load performance?

TYIA

----------

## CiSC

Ensure that the following option is enabled:

Power management and ACPI options  ---> ACPI (Advanced Configuration and Power Interface) Support  ---> Thermal Zone [*]

I had a similar problem on my laptop because this option was disabled.

----------

## heikole

Have checked this setting, and seen that it is on:

```
CONFIG_ACPI_THERMAL=y
CONFIG_THERMAL=y
CONFIG_THERMAL_HWMON=y
```

I have this problem on a desktop, though, not on a laptop.

----------

## energyman76b

CPU idle PM support

ACPI Processor P-States driver

AMD Opteron/Athlon64 PowerNow!

'ondemand' cpufreq policy governor

'powersave' governor

'performance' governor

Default CPUFreq governor (ondemand)

is the stuff you should have enabled

----------

## heikole

```
CONFIG_CPU_IDLE=y
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_X86_POWERNOW_K8=y
```

Default CPUFreq governor is userspace, I'll check if ondemand as a default will make any difference.
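Checking such a list by hand is error-prone, so here is a minimal sketch that scans the text of a kernel `.config` for the options named above. The first three symbols are the ones already shown; the mapping of the governor menu entries to `CONFIG_CPU_FREQ_GOV_*` and `CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND` is my own assumption, as are the helper names.

```python
import re

# Options from the list above; the governor symbols are an assumed mapping.
WANTED = [
    "CONFIG_CPU_IDLE",
    "CONFIG_X86_ACPI_CPUFREQ",
    "CONFIG_X86_POWERNOW_K8",
    "CONFIG_CPU_FREQ_GOV_ONDEMAND",
    "CONFIG_CPU_FREQ_GOV_POWERSAVE",
    "CONFIG_CPU_FREQ_GOV_PERFORMANCE",
    "CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND",
]


def parse_config(text):
    """Map every 'CONFIG_FOO=value' line to its value; comments are ignored."""
    opts = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            opts[match.group(1)] = match.group(2)
    return opts


def missing_options(text, wanted=WANTED):
    """Return the wanted options that are neither built in (y) nor modular (m)."""
    opts = parse_config(text)
    return [name for name in wanted if opts.get(name) not in ("y", "m")]
```

Feed it the contents of `/usr/src/linux/.config` (or wherever the config of the running kernel lives) and it returns whatever is still missing.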

----------

## energyman76b

ondemand - and make sure that cpufrequtils does not change that.

seriously - userspace is a very bad choice. The worst.

----------

## heikole

Changing the default governor from userspace to ondemand in the kernel config doesn't help at all. I think that's because cpufrequtils sets the governor to ondemand at boot anyway:

```
# Options when starting cpufreq (given to the `cpufreq-set` program)
START_OPTS="--governor ondemand"
```
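To confirm which governor is really active after boot, rather than trusting the init script, something like the following could be used. It is only a sketch and assumes the usual cpufreq sysfs layout (`cpuN/cpufreq/scaling_governor`); `read_governors` is a name invented for the example.

```python
from pathlib import Path


def read_governors(base="/sys/devices/system/cpu"):
    """Return {cpu_name: active governor} for every CPU with cpufreq support."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(base).glob(pattern)):
        cpu = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu] = gov_file.read_text().strip()
    return governors
```

If both cores report `ondemand` here, the default governor compiled into the kernel no longer matters.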

----------

## Cyker

You're getting the problem when the CPU is under full load?

Then you either need to set the cpufreq to powersave or buy a better heatsink+fan  :Smile: 

----------

## energyman76b

since he claims that it does not happen with ubuntu we should be able to rule out hardware. Unless ubuntu clocks down the cpu once a certain temp is reached.

Hmmm...

but before going down that road, I would compare kernel settings  :Wink: 
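A rough sketch of such a comparison, assuming both kernel configs are available as plain text (the function names are made up for illustration; the kernel tree also ships `scripts/diffconfig`, which does the same job more thoroughly):

```python
def load_options(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_configs(text_a, text_b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    a, b = load_options(text_a), load_options(text_b)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }
```

Options missing from one file are treated as `n`, so an option that is unset on one side and absent on the other does not show up as a difference.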

----------

## depontius

Something happened to one of my systems recently...  The temperature was rising much more than I remembered, and even the idle temperature was rather high.  In fact, I have to remove the side panel to emerge major packages or compile a kernel.

One day I had the system open for cleaning or hardware juggling, (forget which) and my finger brushed against the heatsink.  It moved - easily.  I bought the parts for this system in the winter, so I bought the motherboard/CPU/RAM pre-assembled and tested, ($9.95 extra) so I wouldn't have to worry about Vermont winter static and ESD damage.

My current guess is that their heat sink grease gave up the ghost, given a few years.  This system doesn't have a convenient lever - it's get in there with a screwdriver to pry a clamp.  It's also an AthlonXP with the heatsink touching the bare die backside.  So it's most likely a motherboard removal in order to safely clean and regrease the heatsink without cracking a corner of the CPU chip.

I haven't gotten around to it, yet.  For now I remove the side cover for heavy-duty compiling, etc.  One of these days, but well before winter static sets in.

----------

## SteveBallmersChair

 *heikole wrote:*   

> Changing the default governor from userspace to ondemand in the kernel config doesn't help anything. I think that's because cpufreq set the governor at boot anyway to ondemand:
> 
> ```
> # Options when starting cpufreq (given to the `cpufreq-set` program)
> ```
> ...

 

You are correct. Your /proc/cpuinfo says your CPU was running at 1000 MHz. 1000 MHz is the idle frequency for all A64 X2s and your X2 5000+ has a full-load speed of 2.60 GHz. Generally you only see the CPU at its idle speed if you have the ondemand or powersave governors active. Userspace tends to just leave the CPU running at full speed as CPUs boot at full speed and then only throttle down when specifically told to by a CPU frequency scaling governor.
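The same check can be scripted. This is just a sketch: it pulls the `cpu MHz` lines out of `/proc/cpuinfo` text and compares them with the full-load clock, where the 2600 MHz default is the figure for the X2 5000+ mentioned above and the function names are invented for the example.

```python
def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical processor, in file order."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def is_clocked_down(cpuinfo_text, full_mhz=2600.0):
    """True if every core currently reports less than its full-load clock."""
    speeds = cpu_mhz(cpuinfo_text)
    return bool(speeds) and all(speed < full_mhz for speed in speeds)
```

Run against the output posted at the top of the thread, both cores come back at 1000 MHz, so a governor was scaling the CPU down at that moment.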

You either have a mechanical problem that is not allowing for good contact between your CPU and your heatsink or you have absolutely horrible case ventilation if the sensors are telling you the truth. I would be tempted to say so as an overheating CPU could lock up the system and force a reboot. You appear to be using the stock A64 X2 heatsink judging by the fan RPM seen in your sensors output. The X2 5000+ is not a particularly hot-running CPU and a stock heatsink with the fan running at 2150 rpm ought to keep the CPU somewhere around 30 C at idle. Yours is ~50 C, which is roughly what you'd expect with the CPU running at full speed with full loads on both cores. What temps does the sensors output in Kubuntu show? I wonder if the reason your system behaves well in Kubuntu is that you are not doing very much on it that stresses the CPU heavily and it sits there at ~50 C, while you are compiling code on Gentoo and pushing the CPU a lot harder than you are in Kubuntu. If the temps really are about the same at idle in both OSes, I would say the most likely culprit is a heatsink that is not properly (fully) seated on the CPU or issues with the thermal compound (too little, too much, poor quality, or it has dried up). Those are very common problems people run into with heatsinks. 

@CiSC

Laptops will overheat if the thermal module is not loaded as the thermal module is needed to vary the heatsink fan speed. Laptops generally start with the fan off and will turn it on and turn it up as the CPU heats up the cold heatsink. Most desktops have the heatsink fan spinning at a constant RPM all of the time and don't actually need the thermal module to be loaded to stay at an appropriate temperature. Notable exceptions are some OEM machines with ducted CPU cooling such as the Dell Optiplex SFF machines I use at work. Those do vary the fan speed with CPU temp and need the thermal module loaded. 

@energyman76b

AMD CPUs do not decrease their clock speed when they overheat unless somebody modified the OS's ACPI and forced the CPU governor to powersave when the thermal_zone temperature exceeds a certain level. The default behavior for overheating AMD CPUs is to force a critical shutdown when they reach their critical (S5) temperature. Intel P4 and later CPUs are the ones that reduce their clock speed by performing duty cycles (search for #PROCHOT in the Intel literature) but this is done at the CPU level rather than the ACPI level.

----------

## energyman76b

who said automatic?

(in P4 it is actually the mainboard doing the downclocking/idle states. That is why deactivating SMI on intel mobos is a certain way to fry an intel cpu).

But, what can be done from userspace is downclocking a cpu with a simple script that reads sensor values - above a threshold, powersave is set. For an example.

Whatever - OP claimed no problems with ubuntu, problems with gentoo, which doesn't sound like hardware problems.

A cpu overheating with the stock coolers on the other hand does sound like 'badly assembled'.
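The decision logic of such a script could look roughly like this. It is a sketch only: the 70/60 °C thresholds are arbitrary placeholders, the function names are invented, and writing to `scaling_governor` assumes the usual cpufreq sysfs layout and root rights.

```python
def choose_governor(temp_c, current, hot=70.0, cool=60.0):
    """Pick a governor with hysteresis so it does not flap around one value."""
    if temp_c >= hot:
        return "powersave"
    if temp_c <= cool:
        return "ondemand"
    return current  # between the two thresholds: keep what is active


def set_governor(governor, cpu=0, base="/sys/devices/system/cpu"):
    """Write the governor name to sysfs; needs root on a real system."""
    path = f"{base}/cpu{cpu}/cpufreq/scaling_governor"
    with open(path, "w") as handle:
        handle.write(governor)
```

A loop that reads the temperature every few seconds, calls `choose_governor`, and calls `set_governor` for each core only when the answer changes would be enough. The gap between the two thresholds keeps the CPU from bouncing between governors when the temperature hovers around one value.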

----------

## Cyker

Athlon64 should not go over 50-55C even under full load with even half-decent cooling (Well, unless you live in a very hot country!); I think there is something wrong with the cooling.

I didn't know (K)Ubuntu has thermal limiting (I want that for my Gentoo!  :Very Happy: ), but that is likely why the CPU isn't over heating in there but is in Gentoo.

Even your idle temps are unusually high for an Athlon64; I have a particularly poor cooler on mine (Some Scythe POS) and even in high summer (Admittedly this is the UK so we're talking 30C tops) the CPU never idles at more than 40C, but drops to 30 as we head toward winter.

Full load will take it up to 48, slowly rising to 50C depending on how long its running at full load for.

Edit: I just realized my numbers may not be typical as I am running my own hacked powernow-k8.c in the kernel for undervolting, but nonetheless your CPU is still running rather on the high side...

Out of curiosity, what are your idle and full-load temps for both gentoo and (k)ubuntu?

----------

## energyman76b

I am not saying that kubuntu does that - just that it is possible to set up a system that way  :Wink: 

maybe it is just some kernel option. Without comparing configs we will never know.

About heat: well, I have scythe products. Scythe Shuriken, two scythe 12cm fans and a scythe fan controller and I am a very happy customer.

PII 955:

PU Fan:     215 RPM  (min =    0 RPM, div = 32)

Aux Fan:       0 RPM  (min =  703 RPM, div = 128)  ALARM

fan5:          0 RPM  (min =    0 RPM, div = 128)

Sys Temp:    +36.0°C  (high =  -5.0°C, hyst = +125.0°C)  sensor = thermistor

CPU Temp:    +51.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor

also - temps are worthless without the 'surround' temperature in the case. 

Under load this goes up to 58°C and 1500-22000 rpm. Yesterday I tried burnK7 - reaching 67°C top.

And that is well inside the thermal 'envelope' with 65°C Tcase and 70°C Tctl. 

But saying '50°C is too hot' is just wrong. My system is pretty quiet. So it is warm. The critical temps are dependent on the cpu used. I could push my X2 6000 to 90°C without problems ....

----------

## depontius

 *Cyker wrote:*   

> Athlon64 should not go over 50-55C even under full load with even half-decent cooling (Well, unless you live in a very hot country!); I think there is something wrong with the cooling.

 

This is what I was trying to express - it may be as simple as remove, clean, regrease, and reattach the heat sink.  I believe Athlon64 heat sinks are fully indexed, so the "nudge test" won't work like it did (accidentally) for my AthlonXP.

Sometimes it's just a simple physical problem.

----------

## heikole

Thank you, guys.

Sort of solved it *sigh* by buying two chassis fans. Kubuntu's default kernel seems to stress the passively cooled chipset of my mainboard less than my Gentoo kernel config does. I must admit, though, that it was very hot in Germany these days, climate crisis is nigh  :Wink:  Otherwise Gentoo would have worked perfectly, of course.

----------

## Cyker

Apologies for reopening a solved thread, but I have discovered another possibility.

There seems to be a whole bunch of bugs with Asus motherboards and lm_sensors.

I recently swapped my server's A8N32-SLi motherboard for an A8N (The chipset was running at 45C which IMHO is far too high given what I want out of it!)

To my intense frustration, the board and CPU were actually running even hotter! Or so I thought... It turns out lm_sensors was misreporting the temp by anything up to 10C over what the real temp was; I confirmed this with k8temp, whose sensors were reporting much lower temperatures than the it87 was, and because I could hold my hand on the main chipset and CPU heatsinks without being burned.

The BIOS also reports similarly lower temperatures, but I'm not sure how reliable that is since the change in load may have allowed the system to cool down.

I probably should have twigged there was a problem given that it was saying "sensor = invalid" next to the temp reading instead of thermistor or diode...

And just to make things extra peachy, lm_sensors doesn't work at all with kernel 2.6.31 with many Asus motherboards (e.g. this one  :Evil or Very Mad:  ); You have to use lm_sensors v3+, but for some reason this is hardmasked in Portage! :S

----------

## energyman76b

it is hardmasked because of userspace issues. And k8temp - well on some cpu's it reports good temps on others complete garbage. For onboard sensors - it is well known that the readings have to be sanitized first for a lot of boards. If you are bored look into /etc/sensors.conf and be shocked...

----------

## Cyker

Heh, "userspace issues"

Translation - A couple of programs can't read the sensor data anymore  :Razz: 

That part just requires (usually minor) patches to the affected programs, and it does seem like only a few. Heck, even ksensors works and that hasn't been updated in donkeys years!

We won't know exactly which ones until more people start using it, but when 2.6.31 goes stable, the point will be moot as 2.6.31 breaks lm_sensors 2 on a significant number of systems anyway  :Sad: 

As for the sanitized sensor data, it is very common to put the raw data through a conversion algorithm for voltages and stuff, but a lot less common for temps. Usually choosing between diode and thermistor is the only setting, which is why the sensors on this Asus caught me out!
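To illustrate the kind of conversion meant here: a monitoring chip typically sees a rail like +5V or +12V through a resistor divider, and the reading has to be scaled back up, which lm_sensors expresses with `compute` lines in its config. A rough sketch of that scaling, where the resistor values in the usage note are made-up illustrative numbers and not taken from this board:

```python
def scale_divided_voltage(pin_volts, r_top_kohm, r_bottom_kohm):
    """Undo a resistor divider: V_rail = V_pin * (1 + R_top / R_bottom)."""
    return pin_volts * (1.0 + r_top_kohm / r_bottom_kohm)
```

With a hypothetical 6.8k/10k divider, `scale_divided_voltage(2.0, 6.8, 10.0)` scales a 2.0 V pin reading up by a factor of 1.68. If the config assumes the wrong resistor values for a board, every reading on that input is off by a constant factor.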

Supposedly they want us to access the data via ACPI instead of directly, but that needs lm_sensors 3  :Razz: 

Another odd thing is the output of k8temp still hasn't been fixed; It outputs its temps on the line below instead of in-line, which is why it breaks some things like ksensors.

----------

## doublehp

54 and 49 deg C are way too high. In the best case, one is the CPU temp; thus, the other one, HAS TO BE the system temp, understand: the temperature inside the tower. This is WAY TOO HIGH.

My temperatures NOW are 17, 35 and 46.

The inside of the tower should not be more than 10° above room temperature; ideally around 4°. The CPU should not go more than 20° above room temperature. Otherwise, you have a ventilation problem. To save money, sellers usually fit heat sinks and fans that are too small. Always take the biggest heat sink, and put in as many fans as you can (respecting the rules of course; there are many rules !!! ). Take care with the CPU heat sink: cheap MBs are not designed to support a heavy one.

Check that the HDDs sit in some airflow (either in front of a vent or, better, in front of a fan).

If you have too many PCI cards or ribbon cables, internal airflow can be bad: check that air can flow properly inside and reach all components and cards. In some cases, I had to add a fan in the middle of the tower to force hot air away from the CPU (too many ribbon cables around the CPU: the CPU fan just made the air circulate around the CPU, and this air got hotter and hotter).

Once, I had to put a spacer between the MB and the chassis to help airflow BEHIND the chipset and the CPU (the CPU produces hot air on both sides: hot air from the back drifted onto the chipset above and made it lock up). I just added a 1 cm bit of plastic.

NEVER send the hot CPU air to the RAM.

Listen EVERY DAY for the PSU fan, and check it's turning. The CPU has temp protections; the PSU does not: when the PSU fan is dead, some component will overheat, and the PSU is likely to send over-voltage to the MB, and burn many things.

Monitor all your temps all the time. Gkrellm is a good app for this. Do not forget the HDD temps (with SMART). Configure Gkrellm with warnings and alarms. Configure your production server to email or SMS you every day a report about disk occupation and temperature.

As far as you can, also try to stay informed about fan speeds (too few MBs support this feature: it can warn about a dead fan BEFORE the components get hot). Some PSUs also support this feature; not many.

----------

## energyman76b

 *doublehp wrote:*   

> 54 and 49 deg C are way too high. In the best case, one is the CPU temp; thus, the other one, HAS TO BE the system temp, understand: the temperature inside the tower. This is WAY TOO HIGH.
> 
> My temperatures NOW are 17, 35 and 46.
> 
> The inside of the tower should not be more than 10° above room temperature; ideally around 4°. The CPU should not go more than 20° above room temperature. Otherwise, you have a ventilation problem. 
> ...

 

I call bullshit on that. Go to amd's or intel's site and have a look at the thermal design specifications.

----------

## doublehp

My room temp is 23, my CPU is 37. And I tend to never let it go above 45. Other people can play with the limits of the specification; that is none of my concern. I am an electronic engineer, and I know that a component should never be used "near the edge".

The thermal specifications (I did not read them) are likely to say that the CPU can stand up to 85° (AMD has its limit at 92°C, and Intel between 95°C and 98°C); these temps are given for the extreme max, that is, for industrial test conditions: a room temperature of 65°C !!! Which gives exactly what I said before: a delta between the room and the core of 20 K.

Specifications say many things, but technical specs never clearly describe the exact conditions of the test. The test conditions are described in the books about the industrial world: a ready-to-use product (a computer tower, closed, with all components) must be able to work for 1h at 55° for domestic use, or 65° for the industrial market. Of course, if you accept your CPU being at 85°C when your room temp is 22°C in winter, do not be surprised when your box freezes once the sun shines => 40°C.
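The arithmetic behind that last sentence, as a small sketch. It assumes the rise of the CPU above room temperature stays constant for a given load and fan speed, which is a simplification; the function name is invented for the example.

```python
def cpu_temp_at(ambient_c, reference_cpu_c, reference_ambient_c):
    """Project CPU temperature assuming a constant rise above ambient."""
    delta = reference_cpu_c - reference_ambient_c
    return ambient_c + delta


# 85 °C in a 22 °C room is a 63 K rise; the same rise at 40 °C gives 103 °C.
print(cpu_temp_at(40, 85, 22))  # prints 103
```

Under that assumption, a box that only just copes in winter ends up above the limits quoted above on a hot summer day.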

----------

## energyman76b

a processor is not a cheap cap. A processor really does not care if it runs at 45°C or 65°C.

really.

And 'twenty above ambient or your cooling sucks' is just wrong. Wrong. WRONG.

There is much more to consider than plain temperatures. 

60°C with 22°C room and 26°C case is just fine - if you like it QUIET. And none of the temps is critically high.

30°C case is fine

50°C case - that is wrong. And even if your CPU is at 60°C while your case is at 50°C - it is wrong. Failure, your cooling sucks.

Rule of thumb: keep Tcase low and the rest falls into its place. But 'CPU must not be hotter than 20°C above Tcase' that is wrong.

----------

## Cyker

CPUs are rated for much higher temperatures, but a high deltaT isn't good for them, especially if they are subject to it every day. They ain't built on ceramics any more!

But generally they're okay up to the low 60's.

Generally tho', you expect up to high-40s in hot countries down to low-30's for cold countries on standard coolers.

AMD CPU's especially should run very cool when idle with CnQ/PowerNow enabled.

Airflow is the key; My server is built to be silent so maintaining decent airflow is tricky, but even so the CPU is in the low 30's even in summer. The chipset is a lot hotter but that's just because Asus suck  :Evil or Very Mad: 

My main beef is getting the damned temp sensors on this new 'board to read accurately.

I'm seriously considering sticking the A8N32 back in and setting this one on fire!  :Evil or Very Mad: 

(To be fair, I had loads of initial problems with that 'board too; It's just a bitch having to go through it all again. Oh how I hate the modern Asus!!)

----------

## energyman76b

and not being on a ceramic substrate is even better for them. Also the few temp cycles don't kill them. Or laptops would be mass dying all the time.

Oh wait, they do - yes, because nvidia fucked up, not because of the CPUs.

----------

## musv

Only to add an additional comment: 

I have the same problem with my Athlon X2 6000. In normal mode it's running at 1000 MHz and quite cool; gkrellm tells me something around 40°C. When compiling, it rises up to 120°C. But I guess those temperatures aren't really correct. Several times I also had the problem that the computer rebooted when it was getting too hot. 

When I opened the case I found a really simple cooler with a big fan. Nevertheless, that thing seemed to me to be the real reason for the temperature problem. Since then I've been cleaning that fan every 2 months, so it's possible to compile the updates without thermal shock.  :Smile:  Maybe most of the cheaper computers don't have a suitable cooling system. I don't know.

----------

## doublehp

The temperature given by Gkrellm MUST be realistic; if value does not seem ok:

- check the equations in sensord (and the units C/F) => software problem

- check in the bios that the temp is the same as a temp sensor placed at the right place => if values are not good, then the sensors on the MB are borked

- check your system is properly cooled, and that heatsinks are placed correctly => computer assembled improperly

120 seems an equation problem; if the chip was really at 120, it should have burnt (CPU should burn between 92 and 108° depending on models; usually 95 for AMD and 102 for Intel).
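For the unit check in the first point, the two conversions are easy to run by hand or in a few lines. Nothing in this thread establishes that a C/F mix-up is what actually happened; this is only a sketch of one thing to rule out.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(deg_c):
    """Convert a Celsius reading to Fahrenheit."""
    return deg_c * 9.0 / 5.0 + 32.0
```

A displayed "120" that was really Fahrenheit would be about 49 °C, which is a plausible load temperature, while a real 120 °C would be 248 °F.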

----------

## musv

Today I tried to compile qt-webkit. As the result the CPU finally burned through.  :Sad: 

I hate that cheap cooling stuff.

----------

## F1r31c3r

 *musv wrote:*   

> Today I tried to compile qt-webkit. As the result the CPU finally burned through. 
> 
> I hate that cheap cooling stuff.

 

Ouch, liquid nitro next time lol. Akasa fans are cheap enough nowadays, but silicon paste has a tendency to dry out and insulate more than conduct heat, so it's worthwhile renewing it every year or so.

----------

