# system hangs/freezes intermittently

## kheston

I've got a weird problem.

I'm running 2.6.11-gentoo-r11 on an HP Netserver LPr (dual P3 600) and I'm seeing system freezes from which I can't recover. Today's occurred when I was attempting a CVS commit from another machine (Win2K). However, it has also frozen at other times while I was updating MySQL or PostgreSQL tables. The problem often doesn't occur for a week or two at a time.

I think it may have something to do with disk I/O as it appears to occur during ops that require a disk write, but I can't tie the problem to any single operation.  This particular machine is my PGSQL/MYSQL/CVS server, but I have four of them (same hardware) all running Gentoo.  My web server (same Gentoo version with different packages installed) sees a lot of activity but does not have the freeze/hang problem.  The other two are light-use experimental machines.  Since the drives are swappable, I moved them around and this freeze/hang problem followed the drive.  When I dd'd the drive to another drive, the problem followed the image, too.  It does not appear to be related to a single piece of hardware.

Any ideas about where to look or what to try?

Thanks!

 :Smile: 

----------

## dpc

Are there any kernel messages on your console?  (Ctrl+Alt+F1)  This might help narrow it down.

I had what seems to be a similar problem (on 2.6.11-gentoo-r11 as well), and it turned out my filesystem (reiserfs) was corrupt and I had to use the --rebuild-tree option to correct it. I was worried that I was going to lose data, but it went fine (http://dcortesi.com/2005/08/17/get-your-reiserfsck-on/). I would still recommend taking proper precautions if you go that way.

----------

## didl

Is DMA working properly on the drive?

----------

## kheston

it's a SCSI drive.

----------

## kheston

dpc,

I'll try the keystrokes and the fsck.

Thanks.

----------

## dpc

Oddly enough, I was running subversion on that server (another source code control product like CVS) and ran into the problems when syncing to the repository.  How odd...

If you are using reiserfs, make sure you run reiserfsck --check first to see if there are any issues.

Good luck!

----------

## kheston

I'm on ext2.  I'll post what I end up figuring out.

----------

## didl

You could also use

```
sys-apps/smartmontools
```

to run a self-test on the drive to rule out any hardware problems.
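
As a rough illustration of what that self-test involves, here is a small Python wrapper around `smartctl`, the command-line tool that smartmontools installs. The flags used (`-H`, `-t short`, `-l selftest`, `-d`) are standard `smartctl` options; the function names, the action labels, and the `/dev/sda` device path are placeholders chosen for this sketch, so substitute your own drive.

```python
import subprocess


def smartctl_command(device, action, device_type=None):
    """Build a smartctl argument list; device_type maps to smartctl's -d option."""
    actions = {
        "health": ["-H"],                     # overall health assessment
        "start-short-test": ["-t", "short"],  # kick off a short self-test
        "selftest-log": ["-l", "selftest"],   # results of earlier self-tests
    }
    cmd = ["smartctl"]
    if device_type:
        cmd += ["-d", device_type]            # e.g. "scsi" or "ata"
    return cmd + actions[action] + [device]


def run_smartctl(device, action, device_type=None):
    """Run smartctl (normally needs root) and return its text output."""
    result = subprocess.run(
        smartctl_command(device, action, device_type),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `run_smartctl("/dev/sda", "start-short-test")` starts the test, and `run_smartctl("/dev/sda", "selftest-log")` a few minutes later shows whether it passed. Whether the drive and controller actually expose self-test support is something only the output will tell you.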

----------

## kheston

Happened again.

Ctrl+Alt+F1 doesn't return anything...console is unresponsive. Smartmontools appears to be useless for my configuration. Is S.M.A.R.T. available for SCSI?

I'm looking into whether it has something to do with spin down or some other power-conserving feature.

----------

## kheston

How do I disable all power-saving features?  Also, is there a log that tracks idle time?

----------

## kheston

It wasn't the power-saving features on the motherboard; that was a dead end. It appears to be a memory issue (only 128M installed and running lots of daemons). I'm guessing that upping the RAM will fix it. Will post if it does.

----------

## kheston

I was memory bound.  128MB was apparently not enough.  I've upped the server to 1GB and haven't had any issues.

----------

## kheston

It's back.  I've installed a GIG of RAM and the problem didn't go away.  Funny, it really did look memory bound...thought this was going to do it. 

It isn't a "freeze up" like I initially stated. That is to say, all processes that were previously running stay running. However, the system stops allowing any new processes to spawn. This is true not only of the ones I attempt to spawn from the command line, but also of those spawned by running services like cron and xinetd.

Any pointers you may have are appreciated!
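
Since processes that are already running keep running during the hang, one option is to leave a small logger running beforehand, so the last few samples before the next occurrence show what memory, swap, and the task count looked like. Below is a minimal Python sketch, assuming a Linux `/proc` filesystem. The function names, the log path `/var/log/hang-watch.log`, and the 60-second interval are illustrative choices, not anything established in this thread.

```python
import time


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo contents (most values are in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def parse_loadavg(text):
    """Return (1-minute load, total tasks) from /proc/loadavg contents."""
    fields = text.split()
    # The fourth field looks like "running/total".
    _running, total = fields[3].split("/")
    return float(fields[0]), int(total)


def snapshot_line():
    """Read /proc once and format a single log line."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load, tasks = parse_loadavg(f.read())
    return "%s load=%.2f tasks=%d MemFree=%dkB SwapFree=%dkB" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        load,
        tasks,
        mem.get("MemFree", -1),
        mem.get("SwapFree", -1),
    )


def watch(log_path="/var/log/hang-watch.log", interval=60):
    """Append one snapshot per interval; never forks, so it survives the hang."""
    # Hypothetical path and interval; the log file is opened once and kept open.
    with open(log_path, "a") as log:
        while True:
            log.write(snapshot_line() + "\n")
            log.flush()  # push the line out of Python's buffer right away
            time.sleep(interval)
```

Start it once by calling `watch()` (as root if the log path requires it) and check the tail of the log after the next hang. A steadily climbing task count or shrinking free memory and swap would point at resource exhaustion; flat numbers would suggest looking elsewhere.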

----------

## drwook

Hmm... This almost sounds like a memory leak, if it was less bad with more RAM but still got to this state eventually. Are you running anything unusual, probably a daemon?
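
One rough way to check for a leak is to sample each process's resident memory twice, some hours apart, and see which ones grew. The sketch below assumes a Linux `/proc` filesystem and reads the `Name` and `VmRSS` fields from `/proc/<pid>/status`; the function names are made up for this example.

```python
import os


def parse_status(text):
    """Return (name, rss_kb) from /proc/<pid>/status text.

    rss_kb is None when there is no VmRSS line, as with kernel threads.
    """
    name, rss = None, None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])
    return name, rss


def sample_rss(proc_root="/proc"):
    """Map pid -> (name, rss_kb) for every process that reports VmRSS."""
    sample = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss = parse_status(f.read())
        except OSError:
            continue  # process exited between listdir and open
        if rss is not None:
            sample[int(entry)] = (name, rss)
    return sample


def growth(before, after):
    """List (delta_kb, pid, name) for processes in both samples, largest growth first."""
    rows = []
    for pid, (name, rss) in after.items():
        # Require the same name so a reused pid is not mistaken for growth.
        if pid in before and before[pid][0] == name:
            rows.append((rss - before[pid][1], pid, name))
    return sorted(rows, reverse=True)
```

Take `before = sample_rss()`, wait a while, take `after = sample_rss()`, and look at the first few rows of `growth(before, after)`. A daemon whose resident size keeps climbing across several samples is a reasonable suspect; a single jump on its own proves little.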

----------

## kheston

Here's a process listing (ps -e):

init

migration/0

ksoftirqd/0

migration/1

ksoftirqd/1

events/0

events/1

khelper

kthread

vesafb

kacpid

kblockd/0

kblockd/1

pdflush

pdflush

aio/0

kswapd0

aio/1

cifsoplockd

jfsIO

jfsCommit

jfsCommit

jfsSync

xfslogd/0

xfslogd/1

xfsdatad/0

xfsdatad/1

xfsbufd

kseriod

scsi_eh_0

kirqd

udevd

khubd

syslog-ng

atd

cvsd

named

mysqld_safe

mysqld

mysqld

mysqld

mysqld

mysqld

mysqld

mysqld

mysqld

mysqld

mysqld

mysqld

postmaster

postmaster

postmaster

postmaster

smbd

smbd

nmbd

sshd

cron

agetty

agetty

agetty

agetty

agetty

agetty

smbd

mysqld

postmaster

mysqld

sshd

bash

----------

## medaille

To me, it sounds like something is overheating and the computer either can't interface with it and it freezes or it hits a critical limit and stops working as a safety measure.

----------

## kheston

Well, if that's the case, I have an overheating component in two machines...

The problem is also occurring on my web server, which has the exact same hardware configuration, although it doesn't happen nearly as often as it does on my DB server.

----------

## kheston

Almost a year later and I still get these funny freeze-ups.  Fortunately, it only requires a reboot about twice a month.  It's a bit of a headache, but not enough of one to move me to another distro.   :Smile: 

----------

## mbar

Recently I also had random freezes (a few a day). They were "fixed" by installing a new PSU (power supply unit) and a new CPU. I personally think the old PSU was too weak or b0rked by a power grid spike (I remember one happening on a Sunday).

----------

