# Computer repeatedly crashes.

## johntramp

Hi, I am not sure what is causing this, but every so often my computer will crash to the point where nothing responds and ssh is unreachable. It is happening fairly often but I can see no pattern to it. I can compile programs for hours with no problem and then it will crash while using the internet with no other processes running. It has happened from a knoppix live cd as well as my gentoo install, so that means it is a hardware fault, right? I have run memtest86 for 13 hours with no errors; is this test 100% reliable? Are there any other similar tests or anything else I can try to figure out the problem?

Thanks

-John

----------

## nlightn

What kind of hardware are you running?  What kind of "crash" are you experiencing?  Hard locks?  Random power-offs?  I'd be particularly suspicious of the power supply and/or hard disk.

----------

## solarium_rider

I too have been experiencing this lately. Sometimes it'll happen while idle, sometimes while compiling, sometimes while browsing. I can't really seem to figure it out. I've had a few crashes that actually spit out errors, and the error was related to an "Interrupt," so I think it's kernel related. I didn't write it down, so I forget the exact error message.

Last night I went to recompile firefox and checked it about 15 minutes later, and the video was completely corrupted and everything was locked up. Typically when it locks the cpu also shoots to 100%, which isn't too weird I suppose for a lock. If I have sound playing, sometimes it will keep playing for a short bit but eventually stop. I'm not sure how long though; I usually fall asleep streaming music and sometimes I'll wake up and it'll be dead.

I think we should compile a list of people having the same issues and their relevant hardware/software configurations and see if we can find a pattern.

Hardware:

cpu - x86 athlon xp

video - nvidia

sound - audigy 

input devices - gravis joystick, usb mouse

other - usb printer (though it was crashing before this was connected)

Software:

kernel - 2.6.8.1-ck8 (w/ power management enabled)

X - xorg-x11 6.7.0-r2

video - nvidia drivers

wm - fluxbox

browser - firefox

configs:

CFLAGS="-march=athlon-xp -m3dnow -msse -mfpmath=sse -mmmx -Os 

-pipe -fforce-addr -fomit-frame-pointer -frerun-cse-after-loop

-frerun-loop-opt -maccumulate-outgoing-args -ffast-math"

USE="3dnow mmx -nls sse -kde -gnome tiff -arts alsa mozilla dvd dvdr divx4linux

     joystick xvid directfb fbcon -cups -wmf -tcpd -esd cups ppds usb"

I also have the "cool" bit set on the kt333 chipset to allow the HALT cpu cmd when idling.

I'm quite suspicious of the kernel/kernel configuration; it seems 2.6.8 and above started crashing a lot.

----------

## nxsty

Update your kernel to ck9! Ck8 has a bug that can cause crashes.

----------

## johntramp

I have had some problems in the past with this ram, but it started working again so I left it at that. My setup is as follows...

Linux odysseus 2.6.9-rc1-love1 #2 SMP Fri Oct 1 15:06:34 NZST 2004 i686 AMD Athlon(tm) XP 2600+ AuthenticAMD GNU/Linux

Albatron KX600s Pro motherboard

athlonxp 2600+

nvidia gfx

onboard via sound

x-org

fluxbox

firefox

...so quite similar to yours, solarium_rider.

The thing is, though, it has been happening when running off knoppix as well, so I think it has nothing to do with my software.

I am going to try installing something like debian sid to see if the problem is still there.

----------

## jschellhaass

What does cat /proc/interrupts show (any conflicts with the video card)?

You may want to try with acpi=off.

jeff

----------

## Mben

do you have any dust in your case? overheats get me every few months if i don't blow out my case (i have cats  :Smile:  )

good luck

----------

## firephoto

I just read about the ck8 problem on Con's mailing list. Seems that was what hit me at random sometimes. I also had this happen with ck7 once or twice but with nothing in the logs, just a lockup. It's related to having preempt turned on in the kernel, I guess. Is preempt good or bad? My system seems "quicker" with preempt on.

I ended up going back to plain 2.6.8.1 without reiser4 for now since the newer rc's have nvidia issues.  :Sad: 

----------

## johntramp

 *jschellhaass wrote:*   

> What does cat /proc/interrupts show (any conflicts with the video card)?
> 
> You may want to try with acpi=off.
> 
> jeff

  *Quote:*   

> [root /home/john]$ cat /proc/interrupts                                                                         
> 
>            CPU0       
> 
>   0:     224314    IO-APIC-edge  timer
> ...

How would I go about trying with acpi=off?

What does that mean / do?

----------

## johntramp

 *Mben wrote:*   

> do you have any dust in your case? overheats get me every few months if i dont blow out my case (i have cats  )
> 
> good luck

I don't think it is heat because my bios has a warning / alarm ~5 degrees before the hard shutdown. I will look into that though, maybe set up lm_sensors or whatever it is that measures the temps.
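If lm_sensors does get set up, a small logging loop along these lines could show what the temperatures were just before a lockup. This is only a sketch: `log_temps` and the `SENSOR_CMD` override are made-up names for illustration, and it assumes the `sensors` tool from lm_sensors is installed and configured.

```shell
# Append timestamped sensor readings to a log so the last entry before a
# lockup can be read after the reboot.
# $1: log file, $2: seconds between samples, $3: number of samples
# SENSOR_CMD is a hypothetical override; it defaults to `sensors`.
log_temps() {
    i=0
    while [ "$i" -lt "$3" ]; do
        { date; ${SENSOR_CMD:-sensors}; } >> "$1"
        sync   # flush to disk so a hard lock is less likely to lose the entry
        sleep "$2"
        i=$((i + 1))
    done
}

# usage: log_temps /var/log/temps.log 30 1000 &
```

The `sync` after each sample is the point of the design: without it the last readings may still be sitting in the page cache when the machine hangs.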

----------

## jschellhaass

I don't see nvidia listed anywhere. If you run cat /proc/interrupts within a terminal under X, the nvidia card should show up on one of the interrupts. I'm just wondering if you have an IRQ conflict between the nvidia card and something else.
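A rough way to spot a shared IRQ is to look for lines in /proc/interrupts that list more than one device. This is only a sketch: `shared_irqs` is a made-up helper name, and it assumes the kernel separates the devices on a shared line with a comma and a space.

```shell
# Print /proc/interrupts lines where more than one device shares an IRQ.
# Only numbered IRQ lines are considered; a shared line ends with a
# comma-separated device list such as "eth0, nvidia".
# $1: file to read (defaults to the live table)
shared_irqs() {
    awk '/^ *[0-9]+:/ && /, /' "${1:-/proc/interrupts}"
}

# usage: shared_irqs
```

Sharing an IRQ is not automatically a problem, so any output is a hint about which devices to move or disable first, not proof of a conflict.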

In order to boot without acpi add acpi=off to the kernel line of the boot manager.   In grub.conf  it would be something like this.

```
kernel /bzImage-2.6.8-gentoo-r5 root=/dev/hde3 vga=791 acpi=off
```
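After rebooting, something like this can confirm that the option actually reached the kernel. It is a sketch rather than an official tool: `has_boot_param` is a hypothetical helper that reads /proc/cmdline, where the running kernel exposes its boot line.

```shell
# Succeed if the given parameter appears on the kernel command line.
# $1: parameter to look for, e.g. acpi=off
# $2: file to read (defaults to /proc/cmdline)
has_boot_param() {
    # one parameter per line, then an exact, fixed-string match
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# usage: has_boot_param acpi=off && echo "acpi=off is on the command line"
```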

jeff

----------

## Mben

 *johntramp wrote:*   

> *Mben wrote:* do you have any dust in your case? overheats get me every few months if i don't blow out my case (i have cats  )
>
> good luck
>
> I don't think it is heat because my bios has a warning / alarm ~5 degrees before the hard shutdown. I will look into that though, maybe set up lm_sensors or whatever it is that measures the temps.

 

blow it out anyway. my bios never classifies it as overheated but the computer locks anyway. just take some compressed air or a vacuum (air usually works better, but be careful not to use too high a pressure or let the fans overspeed) to the fans and vents

good luck

----------

## johntramp

I have had a look and there was a little dust that had been through the cpu fan and been blown onto the ram. I have vacuumed this out, and a little around the fans. Also I have swapped the ram with another computer to see how that goes.

 *Quote:*   

> I don't see nvidia listed anywhere. If you run cat /proc/interrupts within a terminal under X the nvidia card should show up on one of the interrupts. I'm just wondering if you have a IRQ conflict between the nvidia card and something else. 

 Does me not having installed the nvidia drivers yet affect this? I am still running the 2d nv drivers that came with the install.

I will try that kernel line as well, to see if that makes a difference too.

Thanks

----------

## johntramp

I have just noticed that when the computer 'hung', the music I was listening to started again about a minute later for a second or so, and then stopped. This happened about once every minute or so, so I was able to reboot the computer without a hard reboot. This is still happening in knoppix too, even with the ram replaced.

Reading from the bios, after the computer had been idle for hours:

 *Quote:*   

> System temp: 28 degrees C / 82 degrees F
> 
> CPU temp: 35 degrees C / 95 degrees F

Any possibilities on what else this could be?

 

----------

## johntramp

I realised that I had put a UPS in line with the computer about a week ago, the same time since this started happening. I have now moved it out of the way and things seem to be looking up.

I had the computer and ups feeding the computer,  maybe that was just too much for it ?

I will see how it is in a couple of hours... hopefully it is sorted  :Very Happy: 

Thanks for your help if so.

----------

## Incabulos

Sounds like your CPU is operating at a pretty normal temperature; the shutdowns/crashing certainly aren't caused by it overheating.

I'd check the load on your UPS too. Most have a serial cable via which you can monitor load, run time remaining, uptime, and so on. If it's overloaded then I assume power will fluctuate to all connected devices in a fairly bad way; sudden shutdowns or lockups might be a power problem.

'dmesg | tail' will show you the last events the kernel has seen, which might help in diagnosing things. You might also want to tone down the more aggressive compiler optimisations in your make.conf if they are set, and recompile the most crucial components with the more conservative settings ( glibc & kernel come to mind ).
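For reference, a more conservative make.conf could look something like the sketch below. The -march value assumes an Athlon XP like the ones mentioned earlier in the thread, so adjust it for your CPU.

```shell
# /etc/make.conf (fragment): conservative flags, assuming an Athlon XP
CFLAGS="-march=athlon-xp -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
```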

HTH.

----------

## Mben

if you have a regular powerstrip try just taking the ups out of the system

----------

## johntramp

yes,  I have taken the ups out, and now it has been up for about 4 hours and it seems to be fine  :Smile: 

there is no serial port out of the ups, I assume it is fairly old as it was given to me for free.

Thanks for your help, 

-John

----------

## johntramp

 :Crying or Very sad:  it's happening again now without the UPS  :Sad: 

----------

## Moloch

Forever I was having problems with my system crashing when using athcool to set the cool bit for my KT333.

I kept it off, finally one day I got tired of having a hot CPU and listening to that damn temperature sensitive processor fan whine at almost high.

So I spent about a week going through kernel settings and found nothing. Then I moved to BIOS settings; after I set my BIOS to safe mode defaults, athcool worked. I believe the problem lies in a couple of settings. First, the enhance performance setting for both RAM and AGP caused the lockups. Also the CPU decode setting. It has 3 options: normal, fast, and ultra. Normal and fast work fine, ultra locks it up.

I really don't notice any performance change between all these settings, so I'm happy to have found the issue.

I've also heard of some boards turning the cooling bit on by default, so you can use athcool to turn it off and see if that makes a difference.

----------

## johntramp

I have had a little look in my bios, I will go and look a little deeper.  I have not done any overclocking tho or changed anything like that in the bios.

Another thing is that I can leave the computer on its own, downloading or compiling or whatever, and it is fine. As soon as I jump back on the internet or anything it will lock up again :S

----------

## Moloch

Well, if it definitely seems internet oriented, then how are you connected? Ethernet, dial-up, some usb crap, etc? What drivers are you using? Kernel modules, something from portage, etc?

----------

## johntramp

well the thing is I don't think it is software, as it happens in knoppix as well. I can also leave my computer on the internet downloading on DC and that can run flawlessly for hours.

I will try installing another distro somewhere else and see if it still happens there, maybe a stable debian.

----------

## johntramp

 *Incabulos wrote:*   

> 'dmesg | tail' will show you the last events the kernel has seen, this might help in diagnosing things. 

  *Quote:*   

> [root /var/log]$ dmesg | tail                                                                                                       
> 
> ReiserFS: sda1: found reiserfs format "3.6" with standard journal
> 
> ReiserFS: sda1: using ordered data mode
> ...

 

----------

## firephoto

Could be an IRQ bug on your mobo, try physically moving the network card to another slot if it's a pci card. There might be an IRQ setting in the bios if it's a built in nic. Also check the "plug n play" setting in your bios. I had mine turned on but for whatever reason I can't enable it in the kernel so I turned it off in the bios.

Could be a flaky psu too: either it's getting old and tired, or it's a cheapy that puts out lots of watts at the not-so-needed voltages. Some Athlon 64 boards take a lot of 12 volt amps, 25+ in some setups.

----------

## Moloch

I noticed in your dmesg the line

 *Quote:*   

> r8169: eth0: link up

 

Now I've never heard of those cards causing stability problems. They are, however, pieces of junk. My last problem with one was that it would run extremely slowly on a 10mb hub with autodetect; I had to manually specify the speed for it to work right. I know you said DC can run fine, but IRQ conflicts could be related. I would at least try to move the card around in different slots or try a completely different ethernet card if possible.

Also, if you look in the kernel config, there are a bunch of different driver options for the rtl8139. You may want to play around there; I have no idea which options would be the best though.

----------

## solarium_rider

 *nxsty wrote:*   

> Update your kernel to ck9! Ck8 has a bug that can cause crashes.

 

I didn't bother with an upgrade right away, since 2.6.8.1-ck7 was crashing on me too.

The 2.6.8.1-ck9 kernel seems stable thus far...been up for 40 hours.

----------

## johntramp

ok, well I have been playing around with it a bit. Ended up pulling the whole thing apart for a good clean out of all the dust. I then moved the pci cards and the ram sticks to other slots.

I formatted my hda drive and installed knoppix onto a partition and it seemed to be running ok. I managed to write about 15 CDs, and internet etc. was fine.

Last night I started installing gentoo again. This time I decided to install stable with reiserfs instead of unstable on reiser4. The kernel is a little older this time too, a 2.6.8 instead of a 2.6.9.

So far I am up to compiling xorg and fluxbox etc. All seems to be well so far, hopefully.. fingers crossed.. it will be good. I will be happy if it doesn't cost me anything, might even go out and get another 512 ram  :Wink: 

ps. Moloch, you say you have had problems with the card? I have 2 cards so I do not know which one you are talking about. The onboard one I used to use was crap; the best speeds I ever got were around 4mb/sec. Since then I have gotten a 1000mbit card which has been great. I have my 1000mbit at eth0 though, which I assume is the r8169, so I don't think I have come across the same problems as you have.

----------

## Moloch

 *johntramp wrote:*   

> ok, well I have been playing around with it a bit. Ended up pulling the whole thing apart for a good clean out of all the dust. I then moved the pci cards and the ram sticks to other slots.
> 
> I formatted my hda drive and installed knoppix onto a partition and it seemed to be running ok. I managed to write about 15 CDs, and internet etc. was fine.
> 
> Last night I started installing gentoo again. This time I decided to install stable with reiserfs instead of unstable on reiser4. The kernel is a little older this time too, a 2.6.8 instead of a 2.6.9.
> ...

 

Oh! crap,  :Shocked:  just realized. I saw the r8169 and read that as 8139. The RTL 8139 chipset is crap. You have an 8169, which I know nothing about. Sorry to confuse you there, oh well. At least it seems like you've managed to get things running correctly.

----------

## johntramp

an easy mistake to make  :Smile:     it is still going, looking good. Thanks for all your help... I have no idea what the problem was. All I can do is hope it doesn't come back I suppose  :Smile: 

----------

## Moloch

 *johntramp wrote:*   

> an easy mistake to make     it is still going, looking good. Thanks for all your help... I have no idea what the problem was. All I can do is hope it doesn't come back I suppose 

 

Well, problems shouldn't just come back. It would obviously have to be a change that was made. If you're like me, you may just do an emerge -u world, install a new kernel, change something in the BIOS, and then have no clue what in the hell broke something. If you're careful you will be able to spot the troublemaker before things become difficult to track.

----------

## johntramp

yeah,  seems like things are breaking every other day....

one little fright tho, my mouse stopped working before, even tho the keyboard was responding, and also the computer was running slow as, like it was over vnc on a 56k connection   :Confused: 

----------

## feld

 *Moloch wrote:*   

> Forever I was having problems with my system crashing when using athcool to set the cool bit for my KT333.........

 

gentoo-dev-sources-r7 has "Make CPU Idle Calls When Idle" under Power Management -> APM. Maybe that works better? I dunno, I'm gonna test here........

-Feld

----------

