# System freezes sometimes and I have no idea why.

## Adam15906

I am using NFS to store MP3s to play and many other files. Can something with this freeze a machine? My machine used to freeze, but only because I had corrupted memory; I changed the memory and that was fixed. The kernel panicked, but I recompiled it successfully and it hadn't frozen since two days ago, and now just 5 or 10 minutes ago it froze. How can NFS cause this with no one else experiencing it? /var/log is telling me nothing. It's a 1.13 GHz/192MB machine. When it freezes, the busy light, as I call it (the one that is yellow on most PCs and has the container symbol), is steadily lit.

----------

## jkt

you can try mounting the NFS share with the `soft,retry=2` options.
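for example, something along these lines. it's only a sketch: `fileserver`, `/export/mp3` and `/mnt/mp3` are made-up placeholders, not names from this thread, and it just prints the mount command so you can check it before running it as root:

```shell
# placeholders (assumptions): replace with your real server, export and mount point
NFS_SERVER="fileserver"
NFS_EXPORT="/export/mp3"
MOUNT_POINT="/mnt/mp3"
NFS_OPTS="soft,retry=2"

# build the mount command as a string so it can be reviewed before it is run
nfs_mount_cmd() {
    printf 'mount -t nfs -o %s %s:%s %s\n' \
        "$NFS_OPTS" "$NFS_SERVER" "$NFS_EXPORT" "$MOUNT_POINT"
}

nfs_mount_cmd   # prints the command; run the printed line as root once it looks right
```

with `soft`, a request to a server that stops answering eventually returns an i/o error to the program instead of hanging forever, and `retry` limits how long (in minutes) mount keeps retrying the mount operation itself.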

you should also check your memory with memtest86

----------

## Adam15906

The kernel is panicking and it seems to be something related to NFS. I am trying to fix it and have already tried recompiling. I am using the NFS support in the kernel. I am also using the latest NFS utilities.

----------

## jkt

a) is your kernel compiled for the correct CPU?

b) do you have memory issues? give memtest86 a try...

----------

## Adam15906

B I know is fine, because memtest86 ran twice with no errors. So I will double-check A.

----------

## jkt

 *Adam15906 wrote:*   

> B I know is fine, because memtest86 ran twice with no errors. So I will double-check A.

 

I'd suggest letting it run through all the tests, not just the default ones.

----------

## Adam15906

It was the KNOPPIX version and I let it go through twice. That is twice on the second bar, the PASS bar, not the STATUS bar.

----------

## jkt

 *Adam15906 wrote:*   

> It was the KNOPPIX version and I let it go through twice. That is twice on the second bar, the PASS bar, not the STATUS bar.

 

yeah, but it doesn't run all tests by default, you have to go to configuration->test selection and select "all tests".

----------

## Adam15906

I see.

----------

## Adam15906

I updated to the latest kernel and installed it, and I hope everything is OK now. I had the kernel messed up before with no devfsd, and I think I messed up the .config. When I redid it, I just copied over it on /dev/hda1, when I think I should have deleted it first and then copied, to make sure.

----------

## Adam15906

Could my machine have panicked because its clock was 5 hours behind the client? I thought the time on it was right, but somehow it wasn't. Must have been due to the storm a while ago.

----------

## jkt

use NTP for synchronization. no, that shouldn't cause such a crash...
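a quick one-off sync could look like this. sketch only: it assumes `ntpdate` is installed and uses the public `pool.ntp.org` servers, which is my pick and not something from this thread. by default it only prints the commands:

```shell
NTP_SERVER="${NTP_SERVER:-pool.ntp.org}"   # assumption: public NTP pool
RUN="${RUN-echo}"   # default: only print the commands; set RUN= (empty) to really run them as root

sync_clock() {
    $RUN ntpdate "$1"        # set the system clock once from the given NTP server
    $RUN hwclock --systohc   # copy the corrected system time to the hardware clock
}

sync_clock "$NTP_SERVER"
```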

anyway, it's possible that the linux kernel is broken on your hw, or (more probably) you have some hw issues...

anyway, if it "freezes", what actually happens? do you get some debug output on console? are you able to toggle numlock status? switch between virtual consoles?

----------

## Adam15906

Well, it panicked and said that it could not sync. I could never really write the panic down with pen and paper, because I usually leave the system with the monitor off and just SSH in. I still leave the monitor off, but I stopped the console from blanking with `setterm -blank 0`. It panicked and said "not syncing".

----------

## Adam15906

I get an error very close to this one:

http://www.linuxsa.org.au/pipermail/linuxsa/2004-November/074841.html

----------

## jkt

you can try some kernel patches like kmsgdump to save these dumps on a floppy. another way is to set up a serial console on the crashing machine and log the messages on another one.

and as the reply on the mailing list suggests, report it (with debugging enabled) either on bugzilla.kernel.org or lkml.

----------

## Adam15906

I will look into the serial console, but first I need more information. Got any links?

----------

## jkt

look for "serial console howto". you'll need a null-modem serial cable and some terminal emulator.
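roughly, the setup looks like this. it's a sketch under assumptions: the first serial port (`ttyS0`) at 9600 baud on both machines, and `panic.log` is just a made-up file name. the kernel on the crashing machine also needs serial console support compiled in.

```shell
# assumptions: first serial port at 9600 baud on both machines
PORT=ttyS0
BAUD=9600

# 1) crashing machine: append these arguments to the kernel line in the boot loader config.
#    kernel messages then go to both the normal screen and the serial port.
echo "console=tty0 console=${PORT},${BAUD}n8"

# 2) logging machine (as root): set the port speed, then append everything received to a file
log_serial() {
    stty -F "/dev/$PORT" "$BAUD" raw
    cat "/dev/$PORT" >> "$1"
}
# log_serial panic.log    # leave it running until the other box panics
```

a terminal emulator such as minicom on the logging machine does the same job as `log_serial`, as long as it is set to the same speed.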

----------

## Adam15906

I only have a serial to parallel cable, can I use it?

----------

## jkt

no. isn't your cable just an adapter between 9-pin and 25-pin d-sub ("cannon") connectors?

----------

## irf2003

 *Adam15906 wrote:*   

> I am using NFS to store MP3s to play and many other files. Can something with this freeze a machine? My machine used to freeze, but only because I had corrupted memory; I changed the memory and that was fixed. The kernel panicked, but I recompiled it successfully and it hadn't frozen since two days ago, and now just 5 or 10 minutes ago it froze. How can NFS cause this with no one else experiencing it? /var/log is telling me nothing. It's a 1.13 GHz/192MB machine. When it freezes, the busy light, as I call it (the one that is yellow on most PCs and has the container symbol), is steadily lit.

 

sounds like some bad hd sectors to me

hth

happy gentooing

----------

## Adam15906

How can I easily fix this? I know how in Windows, but I don't want to wipe things. Also, installing Gentoo takes a lot more than 5 minutes, even for a Gentoo geek, so I don't want to eat up my network bandwidth all over again.

----------

## jkt

 *Adam15906 wrote:*   

> How can I easily fix this? I know how in Windows, but I don't want to wipe things. Also, installing Gentoo takes a lot more than 5 minutes, even for a Gentoo geek, so I don't want to eat up my network bandwidth all over again.

 

there's no way to fix bad blocks on an hdd. if you want to check that the drive is ok (personally I think it is, as a disk failure wouldn't cause a hard lock IMHO), try `dd if=/dev/hda of=/dev/null bs=1M`. it'll take quite a long time, since it reads the content of the entire hard drive. if it freezes, you have a problem related to disk i/o.
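wrapped up as a small helper, with a look at the kernel log afterwards. sketch only: `read_test` is a name I made up, and `/dev/hda` is assumed to be the disk in question, so adjust `DISK` if yours differs.

```shell
DISK="${DISK:-/dev/hda}"   # assumption: first IDE disk

# read every block of the given device (or file) and throw the data away;
# nothing is written to the disk
read_test() {
    dd if="$1" of=/dev/null bs=1M
}

# as root:
#   read_test "$DISK"
#   dmesg | tail -n 20    # look for i/o errors the kernel logged during the read
```

if the read finishes but dmesg shows errors for the drive, that points at the disk or its cabling rather than at NFS.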

----------

## stettler

Have you tried CPUburn? It shows you quite quickly how stable your system is, just to exclude that possibility.

----------

## jkt

or prime95, the mersenne prime number tester. it's available even for windoze. I had some trouble getting the sources, but they are available.

----------

## stettler

To exclude hard disk problems, I would use bonnie++ and smartctl (really useful on every system).

----------

## jkt

 *stettler wrote:*   

> To exclude hard disk problems, I would use bonnie++ and smartctl (really useful on every system).

 

why do you think that a simple `dd` won't do the job?

----------

## stettler

From the smartctl manpage

 *Quote:*   

> smartctl controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive selftests.

which is IMHO a little bit more than simply using a hd by executing 'dd'.
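A typical check could look like the sketch below. It assumes smartmontools is installed and that the disk is `/dev/hda`; `smart_check` is just a name I picked. By default it only prints the commands, so nothing touches the drive until you choose to run them.

```shell
DISK="${DISK:-/dev/hda}"   # assumption: first IDE disk
RUN="${RUN-echo}"   # default: only print the commands; set RUN= (empty) to really run them as root

smart_check() {
    $RUN smartctl -H "$1"            # overall health verdict reported by the drive
    $RUN smartctl -a "$1"            # all SMART information, including attributes and error log
    $RUN smartctl -t long "$1"       # start the extended self-test; it runs inside the drive
    $RUN smartctl -l selftest "$1"   # self-test log; check it again once the test has finished
}

smart_check "$DISK"
```

Unlike a plain read with 'dd', this asks the drive for its own error counters and self-test results.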

----------

## jkt

 *stettler wrote:*   

> From the smartctl manpage
> 
>  *Quote:*   
> 
> > smartctl controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA-3 and later ATA, IDE and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive selftests.
> 
> which is IMHO a little bit more than simply using a hd by executing 'dd'.

 

oh, I see. I assumed it was only some kind of hdd-benchmark tool. thanks for the hint.

----------

## Adam15906

It locked up after dd. Damn, and all my important stuff is on it. Is this going to be the biggest data loss of my life, or just the loss of a hard drive?

----------

## Adam15906

It looks like my problem was a bad secondary IDE configuration.

I had two HDs on the Primary and one HD and one CD-ROM on the Secondary. I removed the two extra HDs, since I never really use them and they are only about 200MB each. One was probably either bad, or I accidentally messed up the jumper settings on the hard drive on the Secondary. Thankfully the kernel is not panicking anymore.

----------

## Adam15906

It looks like fdisk showed a much lower number of cylinders before than it did after I completely wiped my drive and started over. It had only about 2096 before the wipe and showed about 37000 cylinders after the wipe. This sounds to me like it was the problem, how about you?

----------

