# Machine is freezing

## ScR4tCh

Hi Folks.

I am not able to find any solution for my current problem.

My new desktop machine (AMD Phenom II 955 4x, MSI 790GX-G58, GB RAM, ...) freezes at irregular intervals; sometimes it resets on its own.

I am using gentoo-sources 2.6.30-r2, haven't tried vanilla yet.

By freezing I mean complete lock-ups: no console, no X, not even SysRq. Also, /var/log/messages and dmesg don't show kernel faults or similarly severe entries.

I am relatively sure that it is not a temperature issue. I checked the temperature after each freeze (right after resetting). Normally the CPU and system temperatures need some time to cool down, but everything seems to be within normal range (~45°C).

I've tried several kernel configurations as well as genkernel to eliminate possible configuration faults of mine, but nothing seems to work.

Even a BIOS update didn't change anything. All components are completely new. I don't think that it could be a PSU issue (because these crashes seem to happen at irregular intervals). Broken memory could be a reason, too; I'm running memtest right now.

My question is whether any of you have had similar problems with multicore AMD systems (it is my first multicore system).

Or are there any hints beyond the Gentoo docs and the well-known settings (such as processor type, CFLAGS, kernel options)?

Thanks in advance!

----------

## judepereira

If it freezes X, just try to use vesa and see if it happens again. If not, then the problem would be with your video driver.

----------

## ScR4tCh

Hi,

I already considered it, because I do have an nvidia card installed. But wouldn't just X stop working, or wouldn't my kernel log some fault, if only the video card caused the crashes? As I said, even SysRq has no effect, and neither does anything else (no console, no ssh, no nothing).

In the meantime I kept memtest running (4 passes so far) and there have been no errors.

But thanks anyway, I'll give it a try.

----------

## pappy_mcfae

Is it a laptop or desktop? Do you have another OS installed? Does it work with that OS? Does your machine work with a different kernel version? Have you opened it up to see if the CPU heatsink might be clogged with dust, or whether or not the fan is running? If not, that would be a really good place to start.

Blessed be!

Pappy

----------

## ScR4tCh

It's a desktop machine, no other OS, and I haven't tried another kernel version yet (but the latest vanilla will be my next choice to test with). I have opened it up; as I said, all components are new and I built it on my own. No dust, fan operational (I already eliminated temperature as the cause).

Maybe it could be a chipset issue? I didn't find much about running the AMD 790GX under Linux.

thx

----------

## krinn

 *ScR4tCh wrote:*   

> (I already eliminated temperature as cause).
> 
> 

 

How did you do that? Just because the BIOS reports 45°?

You should underclock your CPU (if you can; on today's computers that should be possible).

----------

## ScR4tCh

Well, yes and no  :Wink: .

I also clocked the CPU down to 800 MHz (using the powersave governor), with the same effect. But I'll underclock it and test (after my other tests).

But you are right, the BIOS might be lying about the temp ... sadly lm_sensors does not yet support k10 to validate it.

----------

## energyman76b

the internal diode is broken anyway - that is why all mainboards have a standard temp sensor. Install lm_sensors, run sensors-detect.

But seriously, sudden reboot - sounds like triple fault. Freezing is either irq handler dead or kernel panic of the worst kind.

These are usually symptoms of:

bad ram

bad psu

Sometimes increasing the RAM voltage a bit can help (I have a RAM stick that is unstable at 1.80V and rock solid at 1.85V, for example).

Sometimes simply removing the RAM sticks and putting them back helps. Increase the RAM voltage a bit. Problem persists? Try different RAM, then a different PSU.

----------

## ScR4tCh

Hi, here's the current "state". I switched to kernel 2.6.29-r5 (genkernel) and the (Gentoo) stable nvidia drivers. No change, the system froze anyway.

I re-checked the memory manufacturer's recommendation about voltage (between 1.7 and 1.8 volts), and changed from auto to 1.71V.

The system has been up for 7.5 hours now at nearly the same workload as before, with no faulty behaviour so far .... hope dies last  :Wink: .

Funnily enough, "free" shows much less caching than before. (I forgot to mention that I had heavy memory consumption and a quickly rising "cached" value.) I have no idea whether this was a kernel issue or memory related; my other systems (2.6.29x Gentoo, 2.6.2x Kubuntu, both amd64 as well) never showed such behaviour.
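For anyone who wants to watch the "cached" figure over time instead of eyeballing "free", here is a minimal Python sketch. It assumes the usual `/proc/meminfo` layout (`Name:   value kB`); the function names `parse_meminfo`, `cached_fraction` and `report` are made up for this illustration. A growing page cache is normal on Linux by itself, so this only helps to compare behaviour before and after a change.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # most fields are in kB
    return values


def cached_fraction(info):
    """Share of MemTotal currently held by the page cache (0.0 to 1.0)."""
    total = info.get("MemTotal", 0)
    return info.get("Cached", 0) / total if total else 0.0


def report(path="/proc/meminfo"):
    """Read the live file and return a one-line summary string."""
    with open(path) as fh:
        info = parse_meminfo(fh.read())
    return "Cached: {} kB ({:.1%} of RAM)".format(
        info.get("Cached", 0), cached_fraction(info)
    )
```

Calling `print(report())` every few minutes (from a loop or cron) gives a rough trend to compare against the moment of a freeze.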

As for temperature sensors, lm_sensors (sensors-detect) marks the CPU temp sensor as "to be implemented", so there is no way to totally eliminate overheating. I found a kernel patch activating the support, but at this time I'm not willing to risk more instability.

I'm curious whether the machine will be up tomorrow morning or frozen .... Thanks for all replies so far, I hope I can stop bugging you with this problem soon  :Wink: 

----------

## ScR4tCh

Shhhh.... It froze again, ~2.5h after my post, it seems. Now I have increased the voltage to 1.76V (maximum) and will give it another try. Next I'm going to check the PSU. Maybe just swap it and have a look ....
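Since the logs stay silent, the time of a freeze can only be guessed. One way to narrow it down is a small heartbeat file, sketched below in Python. This is only an illustration under stated assumptions: the function names `write_heartbeat` and `last_heartbeat` and any file path you pass in are hypothetical, not part of any existing tool.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append the current Unix time to `path` and force it to disk."""
    stamp = time.time() if now is None else now
    with open(path, "a") as fh:
        fh.write("{:.0f}\n".format(stamp))
        fh.flush()
        os.fsync(fh.fileno())  # a hard lock-up loses anything still in cache
    return stamp


def last_heartbeat(path):
    """Return the last complete timestamp in `path`, or None if there is none."""
    last = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                last = float(line)
            except ValueError:
                continue  # skip a line that was cut off mid-write
    return last
```

Calling `write_heartbeat(...)` once a minute and reading `last_heartbeat(...)` after the reset puts the freeze within about a minute after the last recorded stamp.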

----------

## energyman76b

which Voltage? CPU?

Don't touch that! Only raise memory voltage - and only a small amount!

----------

## aricart

The problem you are experiencing may indeed be a thermal issue. This goes double if the fans aren't being activated to cool things down for some reason. You may want to use those patches to gain thermal support, and then work from there.

----------

## ScR4tCh

Hell no, I'd never touch the CPU voltage. I'm not a total hardware noob  :Wink:  . What I did was change the memory voltage from "Auto" to 1.71V first; after the next crash (approx. 10 hours later) I raised it to the next possible level, 1.76V (according to the manufacturer, these modules work between 1.7V and 1.8V, so I exhausted all possibilities). 1.76V, however, led to instability and I had a freeze after several minutes.

Now with 1.71V it is running again.

I also rechecked that all power connectors are seated tightly, pulled out the modules and put them back in a different order.

The system has been running for ~9 hours so far ... . I also found a k10temp module and am now able to see my CPU temperature. It's OK, nearly constant at 42°C, so I finally guess that there is no severe temperature problem (the box itself is relatively cool inside).
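Once a driver such as k10temp is loaded, its readings are normally exposed through the kernel's hwmon sysfs interface as `temp*_input` files in millidegrees Celsius. The Python sketch below reads them directly; it assumes that layout, and the function name `read_hwmon_temps` is made up for this example. On some kernels the attributes sit under `hwmonN/device/` rather than `hwmonN/`, so both places are checked.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, input_file): degrees_C} for every temp*_input found."""
    temps = {}
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Check hwmonN/ itself and hwmonN/device/ (location varies by kernel).
        for base in (chip_dir, os.path.join(chip_dir, "device")):
            name_path = os.path.join(base, "name")
            if not os.path.isfile(name_path):
                continue
            with open(name_path) as fh:
                chip = fh.read().strip()
            for temp_path in sorted(glob.glob(os.path.join(base, "temp*_input"))):
                try:
                    with open(temp_path) as fh:
                        millideg = int(fh.read().strip())
                except (OSError, ValueError):
                    continue  # sensor not readable right now
                temps[(chip, os.path.basename(temp_path))] = millideg / 1000.0
    return temps
```

Logging the result of `read_hwmon_temps()` at a fixed interval shows whether the temperature climbs before a freeze, which a single reading after the reset cannot tell you.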

----------

## cheater512

I'll throw my 2c in.  :Smile: 

My brand new Phenom X3 also freezes, but very rarely. Perhaps once a week?

Same symptoms. Nothing in the logs. I did run memtest and it came back clean.

With a Kubuntu install disc I was lucky to have a couple of hours. Not sure what was with that.

Temperatures are all around the 40 - 50C mark when idle.

They can start going to just over 50C when under load.

I also have a heater in the room sometimes (it's winter in Aus) but at ground level the air is still cool.

It's not a big deal for me. Just a quirk.

----------

## ScR4tCh

Ok, the machine keeps freezing, but I think that I've probably found the reason (after reading and reading).

It may be an nvidia problem: I got several Xid errors in the messages log and found articles about an "old" bug and possible problems using nvidia drivers along with multicore machines.

The first thing I did was to turn composite off, and the machine kept running for 24 hours. After playing nexuiz for a while, the system crashed again (after leaving the game).

So I'll try to stay away from OpenGL applications for a while to verify it.

@cheater512:

Could it be the same problem with your machine? Are you running an nvidia card with closed-source drivers? Did you encounter any "Xid" messages after or while running OpenGL-capable apps?
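To check a log for such messages, a small Python sketch like the following could count them per error code. It assumes the driver's kernel messages contain the text `NVRM: Xid (<device>): <code>`; the exact device and trailing fields differ between driver versions, and the function name `count_xid_errors` is invented for this example.

```python
import re
from collections import Counter

# Matches e.g. "NVRM: Xid (0001:00): 13, ..." and captures device and code.
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xid_errors(lines):
    """Count Xid messages per numeric error code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def count_xid_in_file(path="/var/log/messages"):
    """Convenience wrapper: scan a log file on disk."""
    with open(path, errors="replace") as fh:
        return count_xid_errors(fh)
```

Running `count_xid_in_file()` after a session with OpenGL applications and again after a session without them would show whether the Xid messages really track 3D use.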

----------

## cheater512

Nope, that can't be the same for mine.

My box has an AMD chipset (ATI is everywhere in my lspci) with an ATI Radeon HD4830.

OpenGL works fine (a bit slowly - open source drivers) with no errors.

Could be that your nVidia stuff is a symptom rather than the cause, or that your problem is completely unrelated to mine.

----------

## ScR4tCh

Okay, ... so the search continues ... . So far it's running quite well ... . The next time it freezes I'm gonna pull out the nvidia board and try my luck with the onboard ATI card. I'm really on my last nerve with this problem ... .

----------

## pappy_mcfae

Did you turn off the internal ATI card in the BIOS?

Blessed be!

Pappy

----------

## 360soso

The limitation of this method:

1. It is impossible for the manufacturers to add all the controllers you need to the database since there are too many of them, not to mention that new controllers keep coming out; users will find that only a limited set of controllers is supported; and once the target controller is not included in the database, users can do nothing but give up.

2. The controller emulator matches the controller by mainly the controller model, which may lead to a false match because even controller chips with the same model number may contain different contents, especially when they are not manufactured by the same factory at the same time.

from:http://www.xlycn.com

----------

## ScR4tCh

@pappy_mcfae

Well, that is the problem: there is no way to turn the card off, I can only set the "first" graphics adapter.

@360soso

I'm not really sure what you are talking about, sorry  :Wink: . Which database?

In the meantime I tried my luck with manually setting the memory timings to the values given by the manufacturer. It seems to run stable ... again.

I did find some posts in other forums describing similar problems with 790GX chipsets and Phenom II multicore processors. One solution that worked for a user was to use only RAM banks 3 and 4, but that makes no sense to me at all.

There are also some older news items describing a lockup bug in AMD processors (but for older Phenom cores).

The Xid errors, however, seem to be an nvidia-only problem (I also found and verified several bugs, for instance SWT applications with a tray icon "destroying" plasma and leading to Xid output in messages).

Uptime ~12h ... Again I'm very curious about how long it'll stay alive today.

Have a nice day

----------

## pappy_mcfae

 *ScR4tCh wrote:*   

> @pappy_mcfae
> 
> Well, that is the problem: there is no way to turn the card off, I can only set the "first" graphics adapter.

 

Yeah, that could well be the issue. I'm glad my BIOS allowed me to turn off the Intel GPU when I installed my nvidia. That eliminated any possibility of such issues. If the issues remain, try using the onboard video and see if that fixes things. If not, I'll go out on a limb and say you probably have mobo issues.

Blessed be!

Pappy

----------

