# Random Total Freeze

## niskel

I have had this problem for a while now. Often, but not always, usually when there is heavy I/O on an HD, my system will completely freeze. No input is accepted, no ssh connections are accepted, no nothing. I put panic=5 as a kernel parameter but the machine never reboots after it freezes like it is supposed to. I have tried two different hard drives and I have tried both ext3 and ReiserFS on both hard drives. I have not always had this problem, it is new within the last few months. It is terribly frustrating, especially when I am trying to use the machine remotely. Does anyone have any suggestions? My machine specs are in my sig if that helps.
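One thing worth ruling out is whether `panic=5` actually reached the running kernel. Keep in mind that this timeout only fires when the kernel itself detects a panic; a hard hang in which the kernel never panics would not trigger a reboot. The following is a minimal sketch, assuming a Linux system that exposes `/proc/cmdline` and `/proc/sys/kernel/panic`. The helper names are made up for illustration.

```python
from pathlib import Path


def panic_timeout_from_cmdline(cmdline):
    """Return the integer value of the first panic=N token, or None."""
    for token in cmdline.split():
        if token.startswith("panic="):
            try:
                return int(token.split("=", 1)[1])
            except ValueError:
                return None
    return None


def read_text(path):
    """Read a small /proc file; return None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline") or ""
    print("panic= on command line:", panic_timeout_from_cmdline(cmdline))
    # Timeout the kernel is currently using (0 means: do not reboot).
    print("kernel.panic sysctl:", read_text("/proc/sys/kernel/panic"))
```

If both values show 5 and the machine still never reboots, the freeze is probably not being seen by the kernel as a panic at all.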

----------

## papal_authority

I have a machine like this too that I'm working on. I've run memtest (no errors), checked the logs (nothing), disabled all power-saving features (in the BIOS, kernel and services), and now I've replaced every single piece of hardware except the HDs (although of course I've fscked them). I'm replacing the HDs this week and I hope that fixes it, or else I'm just at a loss...

----------

## niskel

Actually, this was happening in Knoppix as well when I was fscking. This indicates to me that it must be some hardware issue or buggy drivers (I don't remember ever having problems back when I had a Windows partition). What HDs are you using that are causing you to have the same problems? What motherboards or ATA/SATA controllers have you tried?

----------

## papal_authority

 *niskel wrote:*   

> Actually, this was happening in Knoppix as well when I was fscking. This indicates to me that it must be some hardware issue or buggy drivers (I don't remember ever having problems back when I had a Windows partition). What HDs are you using that are causing you to have the same problems? What motherboards or ATA/SATA controllers have you tried?

 

The current configuration is:

MoBo: ECS741GX-M

hda: WDC WD400JB-00ETA0 (40 GB)

hdb: WDC WD1200JB-00EVA0 (120 GB)

Anything like yours?

----------

## niskel

Well, the hard drive I am using is a Western Digital (Caviar SE16). But the similarities end there.  :Confused:  The rest of my setup is in my sig.

----------

## brot

Try loading the failsafe defaults in the BIOS and test again. Then, don't set any mem:FSB ratio, and if nothing helps, update your BIOS.

Memtest won't necessarily catch RAM timings that are too tight. I once had four stable runs in memtest, but Linux wouldn't boot with those timings.

----------

## niskel

Just about everything in the BIOS is at its default apart from the boot order. I have never touched RAM timings. Though I really should try memtest because I never have. Can someone point me to some instructions? I know it involves adding a GRUB entry. Also, I am up for updating the BIOS, but how do you do it on a machine without Windows or a floppy drive (Asus A8V Deluxe)?

----------

## jxn

this would probably have no effect, but have y'all tried turning dma on/off?

----------

## niskel

Mine is a SATA drive, so DMA doesn't apply the same way. hdparm doesn't show anything regarding DMA on my drives.

----------

## jxn

 *niskel wrote:*   

> Mine is a SATA drive, so DMA doesn't apply the same way. hdparm doesn't show anything regarding DMA on my drives.

 

ah, I didn't know how SATA drives worked, as I've just got olde skool ATA ones. The only reason I asked is that I was playing around just now and enabled DMA and everything immediately froze up, but it turned out to be a fluke: I tried it a few more times and nothing froze.

----------

## papal_authority

 *brot wrote:*   

> Try loading the failsafe defaults in the BIOS and test again. Then, don't set any mem:FSB ratio, and if nothing helps, update your BIOS.

 

Been there, done that. Didn't solve my problem  :Sad: 

----------

## richard.scott

 *niskel wrote:*   

> I have had this problem for a while now. Often, but not always, usually when there is heavy I/O on an HD, my system will completely freeze. No input is accepted, no ssh connections are accepted, no nothing. I put panic=5 as a kernel parameter but the machine never reboots after it freezes like it is supposed to. I have tried two different hard drives and I have tried both ext3 and ReiserFS on both hard drives. I have not always had this problem, it is new within the last few months. It is terribly frustrating, especially when I am trying to use the machine remotely. Does anyone have any suggestions? My machine specs are in my sig if that helps.

 

I had the same problem with a VIA Mini-ITX board. 

I fixed my problem by removing SMP from the kernel as that seemed to be causing it to hang under load.   :Shocked: 

----------

## lbrtuk

Do you have a null modem (serial) cable and another machine with RS232?

If there's an oops or kernel panic, you'll probably be able to catch it on a serial console. See /usr/src/linux/Documentation/serial-console.txt for more information.
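Once the console output is being captured on the second machine, the interesting part is usually the last error before the freeze. Here is a minimal sketch for picking those lines out of a saved capture. It assumes the frozen machine was booted with a serial console parameter along the lines of `console=ttyS0,115200` and that the other machine has logged the output to a text file. The marker list and the function name are illustrative, not exhaustive.

```python
# Substrings that commonly appear in Linux kernel error reports.
MARKERS = ("Oops", "Kernel panic", "BUG:")


def find_kernel_errors(lines):
    """Return (line_number, text) pairs for lines containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample standing in for a real serial capture.
    sample = [
        "hda: dma_intr: status=0x51\n",
        "Kernel panic - not syncing: Fatal exception\n",
    ]
    for number, text in find_kernel_errors(sample):
        print(f"{number}: {text}")
```

For a real capture, pass an open file object instead of the sample list, for example one opened with `open("serial-capture.log", errors="replace")`, where the file name is hypothetical. If the capture ends with no marker at all, that in itself suggests a hard hang rather than a panic.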

----------

## jxn

speaking of kernels, I'm using gentoo-sources-2.6.15-r1. what is everyone else on, and would it make any difference if I upgraded or downgraded?

----------

## bexamous2

"Though I really should try memtest because I never have. Can someone point me to some instructions? "

You already said you had a Knoppix CD, so just stick it in, and when it gives you the boot prompt for kernel options, type 'memtest' and hit enter and it'll go. No settings needed, just let it run for a while. Usually I let it go overnight. Obvious hardware problems show up pretty fast, but I've had errors occur after 4 hours of perfect runs. Generally AMD64 systems are very picky about their RAM. I have Corsair RAM that says it's CAS 2, but if you read the fine print it says it only supports CAS 2.5 in AMD64 systems.

----------

## jxn

just to let y'all know, I've replaced my HD with a different one, copying everything over to the new one, and I'm having the same exact problem as before, it seems.

----------

## dan2003

This sounds like a hardware problem. I had no issues for months until the other day I gave Doom 3 a shot. After a few minutes it locked up solid, and subsequent attempts did the same. I was all too quick to blame NVIDIA's driver, but it turned out my machine was overclocked by 100 MHz and that was the cause of the problem! I had forgotten all about the overclock and only remembered because I happened to notice in ksensors that it was running at 1900 MHz instead of the stock 1800 MHz of an AMD64 3000+.

I have run memtest for hours and hours, almost 24, with the thing at 1900 MHz and it shows no problems whatsoever, though trying to go much higher than 1900 reveals problems.

Memtest is not the definitive test.

Now your machine may not be overclocked, but that does not necessarily mean it's not a hardware problem. Cooling problems, a weak or failing PSU and the dreaded capacitor plague can all cause similar effects. I have personally suffered from the cap problem in a PSU. The first I knew about it was the colors on my display: I started getting random flickering red dots, especially noticeable when watching films. After this I had a hard drive fail and also started getting lockups. Investigation revealed that the caps in the power supply had bulged and one of them had split at the top. I replaced them and it was back to normal.
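A forgotten overclock like the one above can also be spotted without ksensors. The following is a rough sketch, assuming an x86 Linux system whose `/proc/cpuinfo` has `cpu MHz` lines. The 1800 MHz stock value is the one from this post, and the 2% tolerance is an arbitrary choice. With frequency scaling enabled the reported value is the current clock, which can sit below stock, so only a reading above stock is meaningful here.

```python
STOCK_MHZ = 1800.0  # stock clock of the AMD64 3000+ mentioned above


def cpu_mhz_values(cpuinfo_text):
    """Return every 'cpu MHz' value found in /proc/cpuinfo-style text."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def looks_overclocked(measured_mhz, stock_mhz, tolerance=0.02):
    """True if the measured clock exceeds stock by more than the tolerance."""
    return measured_mhz > stock_mhz * (1 + tolerance)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            text = cpuinfo.read()
    except OSError:
        text = ""
    for mhz in cpu_mhz_values(text):
        status = "above stock" if looks_overclocked(mhz, STOCK_MHZ) else "ok"
        print(f"{mhz:.0f} MHz: {status}")
```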

----------

