# Waking up to a locked up computer..

## Larcen

When I've woken up the past three days, I've sat at my desk and cut my monitor on to a black screen. Common enough, so I move the mouse around..nothing.  Hit a few keys on the keyboard..nothing. Not even the numlock or capslock keys change anything on the keyboard itself. So, I have to hit restart on the case to get any kind of response.  :Sad:   Anyone got any ideas how to track down the problem? I'm using the latest Gentoo sources (2.6.12-r9, I believe), the latest stable nvidia drivers, ReiserFS on my hda disk, 1.5 gigs of RAM, a 1.6 GHz P4 processor, and Fluxbox. I even logged out of Fluxbox one night and left it at the KDM screen overnight, only to wake to the same issue. 

*hangs head in defeat.*

----------

## platojones

First thing to do is check the log files (/var/log/messages and /var/log/syslog) to see if anything unusual happens during the night.  Could be a kernel oops or something else going on.

----------

## Larcen

Oddly enough, I don't have either of those..

```
(root@shell)[/var/log]-> ls

Xorg.0.log      Xorg.1.log  apache2  dmesg       genkernel.log  lastlog  news     samba    scrollkeeper.log  wtmp     xferlog

Xorg.0.log.old  apache      cups     emerge.log  kdm.log        mysql    ntp.log  sandbox  webmin            xdm.log

(root@shell)[/var/log]-> 

```

----------

## platojones

 *Quote:*

> Oddly enough, I don't have either of those..

Wow, that is extremely odd.  Does this box stay on the internet all night by any chance?  Unless there's been massive disk corruption resulting in the loss of your log files, I can't think of a good reason why those aren't there.

----------

## Larcen

My computer has been on, except for power outages and internet outages, 24/7 for approx. 4 years now. Never have I, even when using Windows, had such a consistent problem with lockups. My computer has been running fine all day since 6:30 EST this morning, games have been played on it all day with 0 performance loss, we'll see if it's locked up when I get up in the morning.  Oh, and for the logging..heh..metalog wasn't running, nor had it been added to rc-update until now. Maybe we'll have some logs in the morning, or if I'm really lucky, it won't be locked up instead.  :Sad: 

----------

## Larcen

```
Aug 25 18:48:51 [kernel] ReiserFS: hda3: warning: vs-5150: search_by_key: invalid format found in block 4489311. Fsck?

Aug 25 18:59:18 [kernel] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

```

I am finding a handful of these now in the /var/log/everything/current file. :/
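
For anyone wanting to keep an eye on this, here is a minimal sketch of a helper that counts such lines in a log. The function name `count_disk_errors` and the pattern list are assumptions made for illustration, built only from the two messages quoted above; the default path is the metalog file mentioned in this post.

```shell
#!/bin/sh
# Count kernel log lines that hint at disk or filesystem trouble.
# The two patterns are assumptions taken from the messages quoted above;
# other drivers and filesystems word their errors differently.
count_disk_errors() {
    # $1: log file to scan (defaults to metalog's catch-all log)
    logfile="${1:-/var/log/everything/current}"
    # grep -c prints the number of matching lines (0 if none match)
    grep -c -E 'ReiserFS: .*warning|dma_intr: .*Error' "$logfile"
}
```

Calling `count_disk_errors` with no argument scans /var/log/everything/current. A count that keeps climbing between checks would suggest the problem is ongoing rather than a one-off.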

----------

## platojones

Ok, well, I think your answer is starting to emerge here.  Those kinds of entries in the log are a clear sign that your HD is not happy.  Bad disk drive perhaps?  Depending on the drive, 4 yrs. hits the MTBF window of most disk drives these days.  A dying drive will cause no end of trouble, of which lockups would be only one of many.  I would start shopping immediately, and if you have anything important on that drive, I would look into backing it up ASAP.

Of course, this could be caused by some recent system-level software change, but if you can rule that out, I would take the advice I listed above post-haste!

----------

## Larcen

This particular drive is only about a year and a half old, give or take a few months. :/ Under hdparm tests, it has always returned the same results, high 700s on one, and mid 50s on the other. Assuming it -is- the hard drive, what type and brand would you suggest? I've only ever used Maxtor drives. Both I use now are Maxtor: one is 160 gig, 8 MB cache, 7200 rpm; the hda drive you saw above is 40 gig, 8 MB cache, 7200 rpm.

----------

## platojones

Well, you pay's your price, you takes your chances.  I just bought my first Maxtor last month (200 GB, 7200 RPM, 8MB cache, $120), so I'm no expert on Maxtor.  I use the thing as a backup drive on my LAN.  I have no idea what their quality is, but at that price point (and I've seen the same drive as low as $99.00 at Best Buy a couple of weeks ago), I'm guessing that this particular model isn't made for the long haul.

What I do know is that, in the absence of some kind of massive kernel misconfiguration, those kinds of log entries usually indicate hardware failure.  MTBF is just that:  a "mean time" between failures.  Some drives cruise right through it and go on for years beyond.  Others don't even get anywhere near the window.  I've only had one drive go nuts on me in my life and that was a Quantum.  At the time, they were renowned for high quality, but this one went to hell within 2 years.  And it drove me nuts trying to figure out the problem.  I had it configured for dual-boot with Linux and Windows and it would boot fine under Linux, but would hang under Windows.  Spent 6 hrs and $300 on the line with Windows tech support trying to figure out why it wouldn't boot with their OS, and at the end they mentioned that the disk drive might be bad.  I said impossible, it boots fine under Linux, and they told me that means nothing: Linux may boot a bad drive fine, but they were certain the drive was bad!  They were right.  Within a few days, I started seeing bad crap in my Linux log files like you see now.  Soon after that, nothing would boot.

As for recommendations?  Like I said, I haven't had enough 'good-drive/bad-drive' experience to give you an honest answer.  Maybe someone with more experience can give some better recommendations.  I would expect (though I would never guarantee it) that some of the higher end drives like the Barracuda series might be a good choice.  I have no experience with those however, so take my advice with a grain of salt.

All of this assumes, of course, that you haven't made any drastic kernel changes that might affect the FS or HD drivers.  Going from a stable version of Reiser 3 to unstable Reiser 4 could well account for these kinds of issues too.

----------

## Larcen

I'm still using stable Reiser, for the time being, until Gentoo puts 4 in the LiveCDs. Nevertheless, it wasn't locked up this morning.. But! It is in read-only mode.  Can it go read-only without even logging out of my Fluxbox session? I figured the / had to unmount, then remount to be mounted read-only. I'm no genius however. Least it wasn't locked up, right? :/  I'm trying fsck; if that doesn't take care of it, I guess I'll start shopping for a new main hard drive.

----------

## Larcen

Well, I ran fsck and it said it finished successfully. Being everything was still read only, I was going to restart. Bad idea. *cries.*  Now, when it comes to the 'Mounting root...' part of booting I get the horrid 'couldnt mount root, /newroot invalid argument type shell or specify which partition to boot etc etc etc'  I ran out of time, so when I get home I'll boot the livecd, run fsck again and pray I can mount at least long enough to get some shit off it.  :Sad: 

----------

## platojones

Yep, the filesystem driver will automatically switch the partition to read-only if it encounters too many errors.  BTW, if you need to fsck the '/' partition, the only reliable way to do it is to boot from a live CD and run fsck from there.  Running fsck on a mounted partition doesn't work so well.
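
A rough sketch of that advice, assuming the root filesystem is ReiserFS on /dev/hda3 (the partition named in the earlier log lines) and that you have booted from a live CD. The helper names `is_mounted` and `check_if_unmounted` are made up for this example; `reiserfsck --check` only reports problems and does not repair them.

```shell
#!/bin/sh
# Succeed (exit 0) if device $1 is listed in the mount table $2.
# $2 defaults to /proc/mounts; passing another file is only for testing.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Run a consistency check, but only if the device is not mounted.
check_if_unmounted() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it or boot a live CD first" >&2
        return 1
    fi
    reiserfsck --check "$1"
}
```

From the live CD this would be run as `check_if_unmounted /dev/hda3`. One caveat: the root device can show up under another name (such as /dev/root) in /proc/mounts, so a "not mounted" answer for the root partition of a running system should not be trusted blindly.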

----------

## Larcen

So I see, I finally got the livecd to boot, and looking in /dev I only find hda, not hda1 or 2 or 3. *hangs his head in defeat.*  Is it a total loss you think, honestly? Hell, I can't even fdisk /dev/hda without getting a screen full of those seek errors and bad crc's  :Sad: 

----------

## platojones

 *Quote:*

> Is it a total loss you think, honestly? Hell, I can't even fdisk /dev/hda without getting a screen full of those seek errors and bad crc's

Honestly, I think so, as far as keeping this drive alive much longer.  Two good bits of news.  First, as I reported earlier, big, fast drives are pretty cheap.  I can't attest to the quality, since you only know how good they are after a period of time, but for $$$/spec, I think drives are cheaper than they have ever been in history ($.50 per GB).  Second, you can probably salvage a good bit of stuff off the old drive, so don't just chuck it in the garbage if you've got stuff you want on it.  I'm no expert in data recovery either, but there are some outstanding free utils out there (knoppix comes to mind right away) with detailed docs about how you might go about it.  You might be able to keep it installed (as a second drive) and mount it read-only to see what you can get off of it.  As long as the disk is spinning and anything shows up at all, you can probably recover stuff.  The only caveat there is that in the 2 'trashed drive' cases I've dealt with (one mine and the other my father's), it seems the only files I could not recover were the ones that were important  :Very Happy:    I just chalk that up to personal bad luck though.

Anyway, I would *not* keep it running any longer until you get a replacement installed.  It's clearly on its last legs and things will only get worse for your data from here on out.  Just find the replacement drive of your choice, and focus on getting what you can off the old one.

----------

## Larcen

Thanks for your help and insight platojones, just figured I would let you know where I'm at now. I finally got the livecd to see my drive and partitions, I ran a fsck check and --rebuild-tree, and to my utter amazement and joy, it fixed my problems..for now.  I've been running 12 hours now and been keeping an eye on my log file, and there haven't been any of those strange entries yet, thankfully. Hopefully the corruptions were fixed, and all will be well again. I'm still going to go buy a nice new hard drive, and keep everything important backed up in case I have to make the switch. Again, many thanks.

----------

## scoon

Hey there, 

Interesting post.  I have had the same problem as you.  Other things to try are different IDE cables.  I run 2 maxtors 24/7 as well (a 120g and a 250).  Also, I had a power supply that was going bad and the timings on my mobo were all afu, so I put in a new power supply and all is well again (there was nothing wrong with the drive to begin with).

regards, 

scoon

----------

## Larcen

Yeah, I had to change my IDE cables as well, they were giving some weird names to my drives on 'detecting drives' at the main BIOS boot screen. Nevertheless, cables changed, disks were fsck'd and it seems to be running just fine, pushing 14 hours.  :Smile: 

----------

## doblebo

good to hear of your luck. it brings me hope. i was running a 2.6 kernel with reiserfs and got the same dmesg output. i'm now running fsck.reiserfs --rebuild-tree (second time as the first attempt resulted in kernel panic - on the latest livecd).

----------

## Kaapeli

If you have any doubt about the disk's condition, you should check the SMART values on the disk. That's the easiest way to see if the disk is about to die.

emerge smartmontools

smartctl -a /dev/hdX

That will show you the SMART data on the disk.

smartctl -t long /dev/hdX

That will command the disk to do a self-test which will read the whole disk surface and stop on the first unreadable sector. The disk can be used normally during the test. After the test is finished (it may take a couple of hours on big disks), run smartctl -a again to view the test results.
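
If you just want a quick yes/no before digging through the full report, `smartctl -H` prints an overall health verdict. The sketch below pulls that verdict out of the output; the function name `smart_health` is hypothetical, and it assumes the ATA-style line `SMART overall-health self-assessment test result: ...` appears in the output.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin and print PASSED, FAILED, or UNKNOWN.
# Assumes the ATA-style verdict line; if it is missing, the result is UNKNOWN.
smart_health() {
    # Strip the fixed prefix and keep whatever verdict follows it
    result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p' | head -n 1)
    case "$result" in
        PASSED) echo PASSED ;;
        "")     echo UNKNOWN ;;
        *)      echo FAILED ;;   # any verdict other than PASSED
    esac
}
```

It would be used as `smartctl -H /dev/hdX | smart_health`. A PASSED verdict is not a clean bill of health, so the attribute table and self-test log from `smartctl -a` are still worth reading.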

----------

## Larcen

thanks, I'll have to give this a try when I get home from work.

----------

