# Hard freeze and won't boot for several minutes afterwards

## Jenden

Given how few details I have to give, I don't hold out much hope that anyone can help, but hopefully someone can at least point me in the right direction for troubleshooting the issue...

Here's the behavior I'm seeing:

When running a certain application in Wine (Mass Effect) my system will occasionally experience a hard freeze (note that it only happens when running this application).  Generally the game crashes first, then I've got about 30-60 seconds at the desktop before X stops responding, at which point I'm in a hard freeze.  Trying to switch to a virtual terminal with ctrl+alt+F1 just hangs the keyboard (before I try that the numlock light will go on and off when the button is pressed, but afterwards the light stops responding).  I've got the alt-sysrq hotkey set up and have tried alt+sysrq+r to get my keyboard back and switch over to a vterm, but no luck.  I've also tried alt+sysrq+c (I've got kexec compiled in) to at least get a crash report, but that fails as well.  alt+sysrq+b does succeed in rebooting the system.  One thing to note: on the few occasions I've been able to get off a couple of console commands in an open xterm before X freezes on me, it looks like the Windows app ('Mass Effect.exe') is still running but is stuck in uninterruptible sleep.  I have no idea what it's waiting on though.
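One way to see what a process stuck in uninterruptible sleep is waiting on is to look for tasks in state `D` under `/proc` and read their `wchan` entry. The following is only a minimal sketch: it assumes a Linux `/proc` with `stat` and `wchan` files available (the latter depends on kernel configuration), and the function names are made up for illustration.

```python
import os


def parse_stat_state(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"  # wchan not available on this kernel
        yield int(entry), comm, wchan


for pid, comm, wchan in uninterruptible_processes():
    print(pid, comm, wchan)
```

With only 30-60 seconds before X locks up, it would probably need to be started from the xterm as soon as the game crashes, with the output redirected to a file.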

However, once the system reboots it hangs before POST (after it has identified the processor but before it checks RAM/Disks).  So far I've been able to get it working again by shutting off power (hard switch on the power supply) and letting it sit for 5-10 minutes then trying again.

I have no idea what's going on really... If I were to take a guess I'd say it sounds like a bug in wine is exposing a bug in the kernel which is exposing a bug in the BIOS/hardware, but I'm a bit out of my league here.

I've tried running memtest and fsck, both came up clean.  I also tried removing each of my two memory sticks individually (this was while the system was in a non-booting state), but that seemed to have no effect.

Anyone have any suggestions for how to troubleshoot this thing?

System specs:

Motherboard: ASUS M2N-SLI Deluxe

Processor: AMD Athlon 64 X2 6400+

2GB RAM (2x1GB)

Kernel: gentoo-sources-2.6.29-r1 (same issue on 2.6.29)

I'd give more details but my system is currently in the unbootable phase, which seems to be taking longer and longer.

----------

## pappy_mcfae

Sounds to me like it might be time to lift the cover and hit the heat sink(s) with compressed air. Blow out the power supply while you're at it.

Blessed be!

Pappy

----------

## jmartos

It definitely sounds like you have a cooling problem with your system. I would try what Pappy recommended, but also check your other system fans and make sure your box is getting plenty of fresh air. When you do manage to get past the POST, go in and check your idle processor and system temps. You also want to check your fan speeds. Post the temps and speeds here and we can check whether they are within range, or whether they are marginally high at idle and it only takes a high system load to push them over the top.

----------

## Jenden

Here's the output from lm_sensors:

it8716-isa-0290

Adapter: ISA adapter

VCore:     +1.01 V  (min =  +0.00 V, max =  +4.08 V)   

VDDR:      +3.23 V  (min =  +0.00 V, max =  +4.08 V)   

+3.3V:     +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM

+5V:       +4.78 V  (min =  +0.00 V, max =  +6.85 V)   

+12V:     +11.33 V  (min =  +0.00 V, max = +16.32 V)   

in5:       +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM

in6:       +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM

5VSB:      +4.78 V  (min =  +0.00 V, max =  +6.85 V)   

VBat:      +2.98 V

fan1:     2109 RPM  (min =    0 RPM)                   

fan2:     1121 RPM  (min =    0 RPM)                   

fan3:        0 RPM  (min =   10 RPM)                   ALARM

temp1:       +38°C  (low  =    -1°C, high =  +127°C)   sensor = diode   

temp2:       +43°C  (low  =    -1°C, high =  +127°C)   sensor = invalid   

temp3:        -5°C  (low  =    -1°C, high =  +127°C)   sensor = invalid   

vid:      +0.538 V

k8temp-pci-00c3

Adapter: PCI adapter

Core0 Temp:

             +39°C

Core1 Temp:

             +39°C

All the fans seem to be working fine.  I didn't suspect heat at first since it's only triggered by the one game (other games or processor-intensive activities don't cause the same issues), but now that you mention it the pattern does seem to fit.  Especially given that the air temperature here has been hitting record highs the past few days.

I'll pick up some air and clean things out to see if that helps.  I'll also try to get the temperature stats from right before a freeze.
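To capture readings from right before a freeze, the temperatures have to be on disk before the machine locks up. A minimal sketch of such a logger follows; it assumes the kernel exposes the sensors as `temp*_input` files (in millidegrees C) under `/sys/class/hwmon`, and the log file name, interval, and function names are arbitrary choices for illustration.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {path: degrees_C} for every temp*_input file found."""
    readings = {}
    # Depending on kernel version the files sit under hwmonN/ or hwmonN/device/.
    patterns = ("hwmon*/temp*_input", "hwmon*/device/temp*_input")
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(hwmon_root, pattern))):
            try:
                with open(path) as f:
                    readings[path] = int(f.read().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor unreadable, skip it
    return readings


def log_once(log_file, readings, now=None):
    """Append one timestamped line and force it out to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join("%s=%.1fC" % (k, v) for k, v in sorted(readings.items()))
    log_file.write("%s %s\n" % (stamp, fields))
    log_file.flush()
    os.fsync(log_file.fileno())  # so the line survives a hard freeze


def run(log_path="temps.log", interval=5):
    """Log all temperatures every `interval` seconds until interrupted."""
    with open(log_path, "a") as log_file:
        while True:
            log_once(log_file, read_temps())
            time.sleep(interval)
```

Calling `run()` in a terminal before launching the game should leave the last few readings in `temps.log` after the reboot. The numbers are only as trustworthy as the sensor setup itself.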

----------

## pappy_mcfae

Yes, getting the temp before the freeze is a must. I'm pretty sure you'll find that program works the CPU and the GPU almost to death. Also, check the temp under your BIOS, just in case there's a difference between what the sensors are showing, and what the temp actually is.

Blessed be!

Pappy

----------

## NeddySeagoon

Jenden,

I don't believe your sensors output:

VCore: +1.01 V seems very low

VDDR: +3.23 V seems high

+3.3V: +0.00 V is not possible with an ATX PSU

The other voltages are possible.

This makes me think you have not set up sensors.conf for your motherboard, which calls the reported temperatures into question.

I'm with Pappy on the heat issue, although I would not use compressed air. Dry air is a good insulator and applying it to your PC may cause static damage. A stiff brush, to dislodge the grot is far safer.

----------

## Jenden

I hadn't had the sensors up and running previously, so I just ran the sensors-detect script.  Everything seemed to run fine...  I'll check the temperature/voltage readings in the BIOS and see what they say.

Dan

----------

## pappy_mcfae

 *NeddySeagoon wrote:*   

> Dry air is a good insulator and applying it to your PC may cause static damage. A stiff brush, to dislodge the grot is far safer.

 

Actually, the action of a brush will generate more static than moving air, unless the brush is made of ESD-safe plastic or metal. All the computer shops where I worked had a compressor somewhere. When you get a machine that looks like it spent far too much time with Gomez and Morticia Addams, you have to do something.  :)

Blessed be!

Pappy

----------

## eccerr0r

Sometimes I wonder if I'm going to accidentally scrape (an SMT pin), move (a DIP switch), or dislodge (a socketed device) something if I use a stiff brush to clean electronics.  I'll continue to use compressed air to clean dust; the biggest issue I have with compressed air is that it tends to spin fans, sometimes faster than they were designed for...

(I use my car tire/air tool compressor to clean dust.  My biggest worry is transporting the electronic devices such that it's in range of the air hose, and /that/ action is more prone to static damage...  I try not to use lung-compressed air though...spittle can do a number on electronic parts.)

----------

## NeddySeagoon

Jenden,

sensors-detect found your sensors OK, but it cannot know which motherboard you have and what scale factors are applied in the hardware.

You have to poke about in sensors.conf to set up the sensors it found for your motherboard.
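The posted output hints at how this scaling works: the unscaled inputs top out at 4.08 V, while +5V and +12V show maxima of 6.85 V and 16.32 V, which matches multiplying 4.08 V by 1.68 and by 4. Below is a minimal sketch of that arithmetic, assuming resistor-divider ratios of 6.8/10 and 30/10. Those ratios are inferred from the numbers above, not taken from the M2N-SLI Deluxe documentation, and the function names are made up.

```python
def scale_divider(pin_volts, r1_kohm, r2_kohm):
    """Convert the voltage seen at the sensor pin to the rail voltage."""
    return pin_volts * (r1_kohm / r2_kohm + 1.0)


def unscale_divider(rail_volts, r1_kohm, r2_kohm):
    """Inverse conversion, used when writing limits back to the chip."""
    return rail_volts / (r1_kohm / r2_kohm + 1.0)


# Assumed ratios: 6.8k/10k for the +5V input, 30k/10k for the +12V input.
print(round(scale_divider(4.08, 6.8, 10), 2))   # largest +5V reading the chip can report
print(round(scale_divider(4.08, 30, 10), 2))    # largest +12V reading the chip can report
print(round(unscale_divider(12.0, 30, 10), 2))  # pin voltage for a healthy 12 V rail
```

In sensors.conf the same pair of formulas is written as a `compute` line, for example `compute in4 ((30/10)+1)*@, @/((30/10)+1)`. Which input carries which rail, and with what ratio, depends on how the board is wired, so a wrong entry gives plausible-looking but wrong voltages.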

----------

## Jenden

I went and grabbed the sensors.conf file specifically for my board off the lm-sensors.org wiki.  It seems to give similar results (pasted below).  When I get home I'll try rebooting and see what the BIOS reports.

 *Quote:*   

> it8716-isa-0290
> 
> Adapter: ISA adapter
> 
> VCore:     +1.06 V  (min =  +0.00 V, max =  +4.08 V)   
> ...

 

----------

## pigeon768

Note the conspicuous absence of video card temps in your lm_sensors. Given that the lockup happens during gaming, and not, for instance, compiling (which is more CPU intensive than games) I would suspect the problem is with your video card overheating, not your CPU.

Run your system with the case open and visually inspect your fans. If your GPU fan isn't spinning, I guarantee you that's your problem.

----------

## Jenden

A brief update.

I restarted the system and it immediately had problems booting (same issue as before, freeze after detecting processor but before checking memory/disks).  The system had been idling for a number of hours before that, so there shouldn't have been excessive heat generation.

After I managed to get it to start back up I let it idle at the BIOS screen overnight and checked the temperature this morning: it showed a CPU temp of 66 and m/b temp of 49 (with CPU and case fans spinning at 3096 and 1128 respectively).

I guess I shouldn't be too surprised if it is heat issues... the current cooling system is a result of a mistake and has very sub-optimal fan placement... I just didn't suspect it since I haven't changed anything and haven't had heat problems with the system before.

The video card sensors don't get picked up by lm_sensors, but I can get them from the nvidia-settings utility.  It idles at about 67, though from what I read that tends to be pretty average for the card and I shouldn't worry until it breaks 90.
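Since lm_sensors doesn't see the card, the GPU temperature could be polled through nvidia-settings instead. A rough sketch, assuming the `GPUCoreTemp` attribute is queryable with `nvidia-settings -q GPUCoreTemp -t` on this driver and that X is running. The 90 threshold is just the figure mentioned above, and the helper names are hypothetical.

```python
import subprocess


def parse_gpu_temp(output):
    """Return the first line of terse output that is a plain integer, else None."""
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            return int(line)
    return None


def read_gpu_temp():
    """Ask nvidia-settings for the GPU core temperature (degrees C)."""
    try:
        result = subprocess.run(
            ["nvidia-settings", "-q", "GPUCoreTemp", "-t"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # tool missing or not responding
    return parse_gpu_temp(result.stdout)


def describe(temp_c, worry_at=90):
    """Classify a reading against the assumed worry threshold."""
    if temp_c is None:
        return "no reading"
    return "worry" if temp_c >= worry_at else "ok"
```

Calling `describe(read_gpu_temp())` in a loop while the game runs would show whether the card climbs from its 67 idle toward that threshold before the crash.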

----------

## NeddySeagoon

Jenden,

 *Quote:*   

> ... showed a CPU temp of 66 and m/b temp of 49 ...

You don't say if the numbers are C or F.

If that's C, it's far too high for idle; if it's F, you have a low ambient temperature.
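For reference, the two readings of those numbers work out as follows. This is plain unit conversion, with function names chosen for illustration.

```python
def f_to_c(deg_f):
    """Fahrenheit to Celsius."""
    return (deg_f - 32) * 5.0 / 9.0


def c_to_f(deg_c):
    """Celsius to Fahrenheit."""
    return deg_c * 9.0 / 5.0 + 32


for reading in (66, 49):
    print("%d F = %.1f C, %d C = %.1f F"
          % (reading, f_to_c(reading), reading, c_to_f(reading)))
```

Read as Fahrenheit, 66 and 49 would be roughly 19 °C and 9 °C, which would be a cold room; read as Celsius they are about 151 °F and 120 °F.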

----------

## Jenden

Sorry, they were in C... Yeah, that is a really high idle temp.

----------

