# Linux hangs/freezes - hdb: lost interrupt

## hannson

Hi, I've got a weird problem with my computer. I left it on overnight as always, and when I came back the next morning everything was in a frozen-like state. I could switch terminals (Alt 1-7) but couldn't log in on any of them until the computer snapped out of it. This has been happening repeatedly since yesterday and I have no idea what's wrong.

It's definitely not a DMA problem.

I switched from X to modular X (because I first thought it was that kind of problem).

I checked my HDD with a SMART tool and it passed the test.

I also checked the temperature, and it was 35°C on both the motherboard and the CPU.

My computer setup is:

AMD64 3000+

512 MB RAM

Asus A8N-Deluxe:

nForce4 SLI chipset

Right now (I'm writing this from another computer) I'm running System Monitor on the broken computer, and it shows 25% memory usage and 100% CPU, dropping to 1% every 4-6 minutes.

It took around 7 minutes to move between pages in Firefox.

Any ideas?

BTW, I can't see any process using more than 2-7% CPU, maybe 14% at most all together. Still, the monitor says 100% usage.

*Last edited by hannson on Wed Jul 12, 2006 8:28 pm; edited 1 time in total*

----------

## coolsnowmen

100% memory at idle? What process is doing this?

Running top should tell you.
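To illustrate roughly what top does per process, here is a minimal Python sketch, assuming a Linux /proc filesystem. It samples each process's user and system CPU ticks (fields 14 and 15 of /proc/<pid>/stat) twice and reports the difference as a percentage of one CPU. The names `read_proc_ticks` and `top_cpu` are hypothetical helpers, not part of any tool. Time a process spends blocked waiting on disk I/O is not counted in these ticks, so a machine stuck in I/O wait can show very little per-process CPU.

```python
import os


def read_proc_ticks():
    """Return {pid: (comm, utime + stime)} for every readable /proc/<pid>/stat."""
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # comm sits in parentheses and may itself contain spaces or ')',
        # so locate it via the first '(' and the *last* ')'.
        comm = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 2:].split()
        # fields[0] is field 3 (state), so utime (14) and stime (15)
        # land at indices 11 and 12.
        ticks[int(entry)] = (comm, int(fields[11]) + int(fields[12]))
    return ticks


def top_cpu(before, after, interval_s, hz, limit=5):
    """Return [(percent_of_one_cpu, pid, comm), ...] sorted busiest first."""
    usage = []
    for pid, (comm, ticks_after) in after.items():
        if pid in before:  # ignore processes that started mid-interval
            delta = ticks_after - before[pid][1]
            usage.append((100.0 * delta / (hz * interval_s), pid, comm))
    usage.sort(reverse=True)
    return usage[:limit]
```

For example, take one sample with `read_proc_ticks()`, wait two seconds, take another, and pass both to `top_cpu(first, second, 2, os.sysconf("SC_CLK_TCK"))`.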

----------

## hannson

No, sorry, I meant 100% CPU usage. I'll try 'top' when I get home!

----------

## coolsnowmen

Yeah, post whatever process is taking up 100%.

Also, can you restart and see if it still goes to 100%?

I've had nano get stuck at 100% (after being quit badly) for no apparent reason often enough that I now put USE=minimal in front of it.

----------

## hannson

OK. I've tried restarting. I've had to hard-reboot several times because the GUI hangs for too long and I can't log in on other terminals in the meantime.

----------

## coolsnowmen

I think you need to find out what is breaking.

So I would boot the live CD, chroot into your system, and delete xdm from the runlevel.

This will let you test each part of the GUI independently, and first prove that the problem is in the GUI and not in something that starts even earlier.

Then just start X, etc.

----------

## hannson

Something tells me this is a damaged HDD problem...

The CPU is pegged at 99% or 100%, with most of the utilization in 'wa' (wa stands for "I/O wait state").

Am I right? :-S

EDIT: My dmesg is filling up with "hdb: lost interrupt" messages.
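A quick way to see how often the error occurs, and on which device, is to tally the messages. The minimal Python sketch below assumes the lines look like the one quoted above, optionally preceded by a printk timestamp; `count_lost_interrupts` and `read_dmesg` are hypothetical helper names. Running it periodically and comparing the counts with the times the system froze could help tie the hangs to the disk.

```python
import re
import subprocess
from collections import Counter

# Matches lines such as "hdb: lost interrupt", optionally preceded by a
# "[  123.456789] " timestamp (only printed when CONFIG_PRINTK_TIME is set).
LOST_IRQ = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?(hd[a-z]+): lost interrupt\b")


def count_lost_interrupts(dmesg_text):
    """Return a Counter mapping IDE device name -> number of 'lost interrupt' lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LOST_IRQ.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def read_dmesg():
    """Return the kernel ring buffer as text (may need root on some systems)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

Usage: `print(count_lost_interrupts(read_dmesg()))`. Because the kernel ring buffer has a fixed size, older messages eventually scroll out, so the counts only cover the period since the oldest line still in the buffer.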

----------

## coolsnowmen

Did you start it without xdm?

How about without ANY services (networking/samba/cups/...)?

*Quote:*

> EDIT: My dmesg is getting filled with hdb: lost interrupt

That's not good; something is wrong somewhere along the HD chain. I'd put the drive in another computer, try to read it, and back up anything important on it if you haven't already. When you put it in the other computer, use a known-good cable. Maybe you're lucky and it's just a bad cable...

hdb? What happened to hda? What does your fstab look like?

----------

## hannson

No, I was running everything. My guess is that either the cable isn't good enough or the drive is overheating, because it works for a while before it hangs. I recently moved the hard drive to another cable (because the old one couldn't use DMA for some reason) and moved it into the HDD rack inside the case (before that it was in a hot-swap enclosure in a 5.25" bay). Now I'm using the ASUS cable that came with my motherboard; it should be good enough!

My root/boot drive is /dev/hdb; there is no hda.

Still, I used smartmontools (smartctl) to check the drive and it passed the test. :-/
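A passing SMART result mainly reflects the drive's own view of its health; a bad cable is generally not something the overall health check is designed to flag. On many drives, interface transfer errors of the kind a bad cable causes are counted in the raw value of attribute 199 (UDMA_CRC_Error_Count) in the output of `smartctl -A`. The minimal Python sketch below pulls that value out, assuming the usual ten-column attribute table; attribute naming and layout vary between drive vendors and smartctl versions, and `smart_raw_values` / `crc_error_count` are hypothetical helpers.

```python
def smart_raw_values(smartctl_output):
    """Map attribute ID -> (name, raw value text) from `smartctl -A` output.

    Assumes the ten-column attribute table: ID# ATTRIBUTE_NAME FLAG VALUE
    WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE. Lines that don't fit
    (headers, banners, other layouts) are skipped.
    """
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            # RAW_VALUE can carry extra text, e.g. "35 (Min/Max 20/45)",
            # so keep everything from the tenth column onwards.
            attrs[int(fields[0])] = (fields[1], " ".join(fields[9:]))
    return attrs


def crc_error_count(smartctl_output):
    """Raw UDMA CRC error count (attribute 199), or None if not reported."""
    entry = smart_raw_values(smartctl_output).get(199)
    if entry is None:
        return None
    # Assumes a plain decimal raw value as the first token.
    return int(entry[1].split()[0])
```

A raw value that keeps climbing between runs would tend to point toward the cable or controller rather than the drive's media.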

----------

## ZomAur

I'm having a similar problem on my server. The motherboard is a VIA. Since DMA isn't working, I've turned it off. It was working fine until about a week ago, when I started to get lockups once or twice a week. Now, sometimes I can't even boot!

The drive isn't even five months old, so I'm more suspicious of the motherboard. I've had this problem before: one drive just died, and the other got the "lost interrupt" errors, so I put it away in a box. Since those two were over two years old, I just thought I was unlucky. Now I'm not so sure.

I'm using kernel 2.6.17-gentoo-r7, and all the disks have been IDE disks. Temperature is normal; it has never been above 40 as far as I know.

----------

## st0ne

Hi,

Same problem on my Core 2 Duo system with SATA hard drives...

Sometimes the whole system hangs: the HDD LED stays lit the whole time and everything freezes completely. :(

The only solution is a hard reset...

I don't know what it is, but I think it's a kernel problem. I also have problems with heavy IDE transfers: the CPU goes to almost 99% wa (on one core only) in top, so the system becomes unresponsive to any interaction.

My kernel is gentoo-sources-2.6.18-r3.

I've tested the kernel with and without preemption, but it has no effect.

greez st0ne

----------

## neonman

Same problem here. https://forums.gentoo.org/viewtopic.php?p=3746291

The problem isn't the CPU wait %; the core of the problem is that the disks become extremely slow, which in turn causes the high wait % (processes waiting for I/O to complete show up as wait % in top). So this has nothing to do with CPU usage etc.; it's an I/O problem.
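This accounting can be seen directly in /proc/stat, whose aggregate "cpu" line lists cumulative jiffies spent in user, nice, system, idle, iowait and further states (see proc(5)). The minimal Python sketch below, assuming that column order, diffs two samples into percentages; `read_proc_stat`, `parse_cpu_line` and `cpu_breakdown` are hypothetical helpers. A monitor that reports "usage" as everything except idle would then show close to 100% even when user and system time are small, which fits the earlier observation that no single process used more than a few percent.

```python
def read_proc_stat():
    """Return the current contents of /proc/stat (Linux only)."""
    with open("/proc/stat") as f:
        return f.read()


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters (in jiffies) as a list of ints."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):  # per-core lines are "cpu0", "cpu1", ...
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before_text, after_text):
    """Percent of elapsed jiffies spent in each state between two samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    # Only the first eight columns (user .. steal) are summed: on newer
    # kernels the guest columns are already included in user/nice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total == 0:
        raise ValueError("samples are identical; wait longer between them")
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}
```

Sampling `read_proc_stat()` twice a few seconds apart and passing both results to `cpu_breakdown()` would, on a machine in the state described in this thread, be expected to show most of the non-idle share under iowait.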

I got this problem after upgrading my CPU, RAM and mobo. Same disks, and they worked just fine before the upgrade.

I can also add that I get the same problem with both the ata_piix driver and ahci.

After a reboot, my system can work just fine for anywhere from 10 minutes up to a day or so; then this starts.

Something is causing the SATA ports to time out; the driver then rescans the port and re-initializes it. After this happens the system becomes slow as hell, and I get the low I/O performance. So maybe there are 2 problems: the first is that the port resets (check my post for dmesg output etc.), and the second is that after the port re-initializes, performance suffers.
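To check whether the slowdowns line up with those port resets, one could count libata's reset and exception messages per port. The exact wording of these messages varies between kernel versions, so the minimal Python sketch below only assumes that a relevant line contains an ataN: or ataN.DD: prefix followed by a word starting with "reset" or "exception"; the pattern and the `ata_events_per_port` helper are assumptions, not a documented log format.

```python
import re
from collections import Counter

# ASSUMPTION: libata prefixes its messages with "ataN:" (port N) or
# "ataN.DD:" (device DD on port N). The words after the prefix differ
# between kernel versions, so we only look for a word starting with
# "reset" (e.g. "resetting") or "exception".
ATA_EVENT = re.compile(r"\b(ata\d+)(?:\.\d+)?: .*?\b(reset|exception)")


def ata_events_per_port(dmesg_text):
    """Return a Counter keyed by (port, "reset" | "exception")."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ATA_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Feeding it the dmesg output every few minutes and noting when the counts jump could show whether a reset precedes each slowdown.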

----------

