# Entire computer hard-locks...

## MadOtis

Hello all,

I have a problem that has finally driven me to the edge of throwing this damn computer out a window...

It locks up for no apparent reason. I am fairly confident that it has something to do with X or video, since I can usually reproduce it by dragging anything around on the screens (I am running Xinerama with 1 LCD panel and 1 monitor). But it doesn't matter if I am dragging from one screen to the other, or moving something around in just one screen. It will lock up 100% of the time after playing Aisle Riot and dragging a card around, too. There is nothing in the log files, because as soon as it locks up, it does a complete system lock up. I cannot <Ctrl><Alt><Backspace>, or <Ctrl><Alt><Fx> to get to a different tty, etc. I cannot SSH in from another machine. The only option is to power-reset by unplugging and plugging the power back in.

Here are my machine particulars:

NVidia 5600 FX Ultra w/ 256MB RAM (AGP)

AMD Athlon 64 processor (2GHz)

DFI LanParty nF3 250Gb motherboard (with bios at "failsafe" settings)

Dual Maxtor 160Gig drives (NOT RAID) on motherboard ports

OS: Started as Gentoo 2004.3 (64 bit) but has been updated with 'emerge -uDN world' about once a month since then.

Modular X (it did it with pre 7.x versions as well, I only upgraded to see if I could make the problem go away)

Gnome (latest in portage)

NVidia drivers: 8756 (tried all releases back to 6111 with associated GLX emerges as well) 

Here is the output of 'emerge --info'

```
Portage 2.1_pre10-r5 (default-linux/amd64/2005.0, gcc-3.4.6, glibc-2.4-r2, 2.6.16-gentoo-r7 x86_64)
=================================================================
System uname: 2.6.16-gentoo-r7 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.6.14
ccache version 2.3 [disabled]
dev-lang/python:     2.3.5-r2, 2.4.2
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.3
dev-util/confcache:  0.4.2
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=athlon64 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /opt/OpenNMS/etc /opt/OpenNMS/webapps/opennms/WEB-INF/web.xml /usr/kde/2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib64/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/eselect/compiler /etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=athlon64 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/home/portage"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 X alsa apache2 avi berkdb bitmap-fonts cli crypt cups dri dvd dvdr eds emboss encode foomaticdb fortran gif gimpprint gnome gpm gstreamer gtk gtk2 gtkhtml imlib innodb isdnlog jpeg kde lzw lzw-tiff mp3 mpeg ncurses nls nptl nptlonly opengl pam pcre pdflib perl png ppds pppd python quicktime readline reflection sdl session spell spl ssl tcpd tiff truetype-fonts type1-fonts usb xinerama xorg xpm xv zlib elibc_glibc input_devices_mouse input_devices_keyboard kernel_linux userland_GNU video_cards_nvidia"
Unset:  ASFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS
```

Any help or suggestions would be HUGELY appreciated at this point, I've lost too much work to let this happen any longer, so, if it means replacing X or replacing hardware, I'll do it, I just need to know what to replace.

Thanks in advance!

----------

## didymos

"Doc, it hurts when I do this..."

"OK, don't do that then."

Does anything besides Aisle Riot cause trouble? I know the answer is yes, but I mean, what specifically?

----------

## NeddySeagoon

MadOtis,

I suspect the PSU or overheating. Does running with the case open help?

If you have variable speed fans, turn them to max.

Check the CPU heatsink for grot and clean with a stiff brush.

A power supply can only be tested by substitution. Can you borrow one for a few days?

PSUs are commodities - you get what you pay for. Middle of the range ones are quite good enough.

----------

## MadOtis

As I mentioned, aside from Aisle Riot, just about any time I do anything in Gimp, it hard locks there as well.  Basically, anything that seems to be graphics intensive like dragging or manipulating graphics in Gimp.

I don't think it is a heat issue... I can easily reproduce it with the cover off.  I am very anal about keeping dust and crap out of my computer and I vacuum it out about once every month or so (so I know none of my fans are clogged).  I've even let the machine cool overnight (powered off), fired it up first thing in the morning and immediately logged in to X and started Aisle Riot immediately and caused it to lock inside of a minute or so.  Basically, however long it took to boot, start X, log in, start Aisle Riot, and drag a few cards around.

----------

## didymos

How about stuff that does graphics using the console framebuffer?  Also, I'm curious as to what happens when Xinerama is disabled.

----------

## bobbymcsteels

I'm having a similar problem when I emerge something... it downloads the file fine but locks up during compiling. I had the same trouble with mplayer as well...

----------

## whig

Check your ram with memtest86. Do the lock ups happen in Windows? FreeBSD?

----------

## bobbymcsteels

the only problems I had with windows was that it was windows  :Razz: 

I tried emerge --update --deep --newuse world in Gentoo with X and it froze up, so I rebooted, logged in, hit Ctrl+Alt+F1 and tried it there, and it worked fine. I haven't had that problem since, but I think it's a problem with X.

----------

## troymc

Anything in /var/log/messages? /var/log/Xorg.0.log? dmesg?

If you have xdm/gdm/kdm set up to start automatically, then /var/log/Xorg.0.log will get overwritten unless you catch it during the boot (or boot single-user). Disable xdm "rc-update del xdm", cause a lock-up & reboot. The log file won't be overwritten.

dmesg is volatile, so you have to be active in catching it. Open a console & run something like: "while sleep 10; do dmesg > /tmp/dmesg.out; done".  This will dump the dmesg output every 10 seconds into a file. Now just cause a lock-up & maybe some info will show up.
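One weakness of a plain redirect is that the snapshot may still be sitting in the page cache when the machine freezes. Here is a minimal sketch of a variant that forces each snapshot to disk. The function name `snapshot_dmesg`, the `DMESG` override variable and the output path are illustrative assumptions, not part of the suggestion above.

```shell
#!/bin/sh
# Sketch: write one snapshot of the kernel ring buffer and flush it to disk.
# DMESG can be overridden for testing; it defaults to the real dmesg command.
snapshot_dmesg() {
    out="$1"
    # Write to a temporary file first, so a freeze in the middle of the write
    # cannot truncate the last good snapshot.
    ${DMESG:-dmesg} > "$out.tmp" 2>/dev/null && mv "$out.tmp" "$out"
    # Ask the kernel to flush dirty buffers so the file survives a hard reset.
    sync
}

# Usage (runs until interrupted or until the box locks up):
#   while sleep 10; do snapshot_dmesg /root/dmesg.out; done
```

Writing somewhere other than /tmp may be worthwhile if /tmp is a tmpfs on your system, since a tmpfs does not survive the reboot.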

*MadOtis wrote:*

> The only option is to power-reset by unplugging and plugging the power back in.

A kinder, gentler solution may help here. Hopefully your kernel has MagicSysRq built in.

When it "hard locks", press and hold ALT-SysRq and then press - in order - S, U, B. (SysRq is the Print Screen Key)

This will sync the disks, remount the filesystems read-only & reboot the system, and possibly save you some filesystem corruption.

If you are on a text console, you will actually get some output from these, but in X you may see nothing until the reboot.
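Before relying on the key sequence, it may be worth checking that SysRq is actually enabled on the running kernel. A small sketch follows; `sysrq_state` is a made-up helper name, and it only reads `/proc/sys/kernel/sysrq`, which exists when the kernel was built with CONFIG_MAGIC_SYSRQ.

```shell
#!/bin/sh
# Sketch: report whether Magic SysRq is usable on the running kernel.
# The optional argument lets you point at a different file for testing.
sysrq_state() {
    f="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$f" ]; then
        echo "unavailable"   # file missing: kernel likely lacks CONFIG_MAGIC_SYSRQ
    elif [ "$(cat "$f")" = "0" ]; then
        echo "disabled"      # as root: echo 1 > /proc/sys/kernel/sysrq
    else
        echo "enabled"       # 1 = all functions; other values are a bitmask
    fi
}

sysrq_state
```

If it prints "disabled", writing 1 to that file as root turns all SysRq functions on until the next reboot.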

troymc

----------

## Cintra

How is your RAM organised?

I just upgraded 768MB to 1.5GB and started getting hard locks, probably a mismatch between the memory cards, so I took out the 512MB and put in another 1GB card of the same make. Today I managed a 4-hour update with no problems.. 

But, it might be something like a loose cable e.g. to a firewire or USB drive..

Mvh

----------

## MadOtis

ok, let me try to take some of these in order...

I don't run anything regularly that uses the console framebuffer.  I am about 99.8% of the time running X so I can leverage multiple windows (I'm not even close to proficient with SCREEN).

I've not disabled Xinerama yet.  To be honest, I never thought it could be the problem...  I've had Xinerama configured in since the machine was new and hadn't started hard-locking until a few months ago.  I'll try disabling it and going back to one monitor for a while and see if it makes a difference.

I've run memtest86 overnight for a period of about 14 hours...  I didn't get a single error.  I don't get the lockups in windows, and I like to play HL2 and that DOES exercise the GPU pretty heavily.  I've tried a knoppix cd a few times and so far, no lock-ups there.  But, it's not configured for Xinerama, etc.  Haven't tried any BSD systems, I don't have any partitions available to try and install it.

So far, I get NOTHING in any logfiles, but, I will follow those directions (as soon as I finish this post) and see if I can capture anything at all.

I guess it could be a ram panel mismatch, however, I have been pretty careful to keep both panels (I've got a 1Gig and a 512Meg panel) as DDR400.  But, I did the memory upgrade about a year ago and it's only been locking up for a few months.

And now that you mention it, I have been having problems with my printer lately.  It's a USB printer and unless I have the cable pushed all the way in and slightly wedged upward, the machine goes into a hard-lock state during boot-up, specifically, when cups starts up, and that does indeed happen in Windows too (at some point during boot-up).  But, how can a loose or problematic USB port or cable affect me when I am dragging things around the screen?  I never put those two things together.

----------

## NeddySeagoon

MadOtis,

I think the agpgart driver may be implicated.

Some combinations of North Bridge chips and graphics cards don't play nicely together with all the speed options turned on. 

Windows knows this in many cases and turns them off. In Linux, you have to do this by hand.

Some good things to turn off are fast writes and side band addressing. How you do this depends on your video card driver.

It only applies to AGP cards.

----------

## MadOtis

Cool!!!  The SysRq keys worked!!!  However, I did still have some Reiser transactions replayed when I restarted... I assume that was due to having stuff loaded in memory when I had to restart.

ok, here's the butt-end of my Xorg.0.log

```
(--) Mouse0: PnP-detected protocol: "ExplorerPS/2"
(II) Mouse0: ps2EnableDataReporting: succeeded
(II) 3rd Button detected: disabling emulate3Button
(WW) NVIDIA(1): WAIT (0, 7, 0x8000, 0x00007700, 0x00007700, 0)
(WW) NVIDIA(1): WAIT (2, 6, 0x8000, 0x00007700, 0x00007724, 0)
```

here's (I guess) the relevant portion of /var/log/messages when I hard-locked

```
May 13 11:00:01 grunt-call cron[14722]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
May 13 11:00:01 grunt-call cron[14724]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.hourly)
May 13 11:10:01 grunt-call cron[14740]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
May 13 11:16:06 grunt-call NVRM: Xid (0001:00): 13, 0002 01023900 00000039 00000328 00000000 00000100
May 13 11:16:13 grunt-call SysRq : Emergency Sync
May 13 11:16:13 grunt-call Emergency Sync complete
May 13 11:16:14 grunt-call SysRq : Emergency Remount R/O
May 13 11:51:52 grunt-call syslog-ng[8272]: syslog-ng version 1.6.9 starting
```

The 35 minute time difference is thanks to NTP... later on in the file when ntp starts up, I get this:

```
May 13 11:52:13 grunt-call usbcore: registered new driver usb-storage
May 13 11:52:13 grunt-call USB Mass Storage support registered.
May 13 11:18:11 grunt-call ntpd[9239]: ntpd 4.2.0a@1.1190-r Mon Mar 27 21:10:33 EST 2006 (1)
May 13 11:18:11 grunt-call ntpd[9239]: precision = 1.000 usec
May 13 11:18:11 grunt-call ntpd[9239]: Listening on interface wildcard, 0.0.0.0#123
May 13 11:18:11 grunt-call ntpd[9239]: Listening on interface lo, 127.0.0.1#123
```

... and bios actually has the correct time... go figure

So, the reason I was never seeing the NVRM: log message before was probably the freeze, with the log file never being flushed to disk before I power-reset.  If this does turn out to be a video card issue, any suggestions on what I should replace it with?  I've always used medium-grade nVidia cards and if I need to replace, I want something that will "rock my world."
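For anyone wanting to pull just the NVRM reports out of a long messages file after a reboot, a rough sketch is below. The helper name `nvrm_xid_codes` is made up, and the pattern is only fitted to the one Xid line format shown in the log above; other driver versions may format the line differently.

```shell
#!/bin/sh
# Sketch: print the numeric Xid code from each "NVRM: Xid" line.
# Reads the files given as arguments, or standard input if there are none.
nvrm_xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9]*\),.*/\1/p' "$@"
}

# Usage:
#   nvrm_xid_codes /var/log/messages
#   grep 'NVRM: Xid' /var/log/messages    # full lines, with timestamps
```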

Thanks again!

----------

## troymc

My google'ing seems to indicate that this may be a driver bug.

What drivers are you using?

troymc

----------

## Nate_S

I had this exact problem like a year ago.  I finally figured out that my Geforce 5200 and my Via KT333 northbridge weren't getting along, despite the fact that the card worked in every other computer I tried it in.  I finally just gave up and traded it with my brother's ati radeon 9000, which I use to this day (and the 5200 runs beautifully in his box, too... go figure.)  

What northbridge and graphics card do you have, out of curiosity?

-Nate

----------

## olger901

What power supply do you have?

Not just in terms of the amount of watts but the brand as well please.

----------

## NeddySeagoon

MadOtis,

Let's see if some remote diagnostics can narrow the problem down a little.

Set up sshd and VNCserver (do not export your root window) on the PC that hangs and a ssh client and VNCviewer on another PC.

Log in remotely over ssh and start something CPU intensive, like

```
emerge system
```

After that's been running for a while, so the box is nicely warmed up, open a VNC session and play solitaire - you reported that solitaire causes lock-ups earlier in the thread.

You are using the CPU and memory on the problem machine but not the AGP slot or graphics card. All that's happening there is the graphics card is refreshing the display - nothing new is being drawn. The VNC server is drawing in a piece of main memory and the image is compressed and exported over the network.

I suspect that this will work. If so, the finger points to the combination of North Bridge and Graphics Card.

----------

## MadOtis

It's looking more like the video card and/or drivers now...  I did exactly that (sshd and VNCserver) and no lock ups after about 20 games of solitaire and Gimp image manipulation simultaneously.  Went back to the machine in question, exited out of screen blanker, started up Aisle Riot, dragged one card and it locked up.

ok, so, now any suggestions on what I can tell my wife as to why I want to spend $400 on a new video card instead of spending $49 at the local wal-mart for one?   :Smile: 

----------

## NeddySeagoon

MadOtis,

What Northbridge and what Video card do you have?

lspci output will probably tell all. Post your motherboard make and type too please.

There may be a documented hardware bug that prevents them working together properly.

There may also be a workaround but it will probably involve a loss of performance.

It's too soon to think of spending any money yet.

----------

## MadOtis

Drat... burst my dreams of a new "sooper-dooper-gamestravaganza-mondo-mack-daddy" video card!   :Wink: 

Seriously, I have an nVidia card on an nVidia chipset mobo... I figured they should work well together.  Regardless, here is the output of lspci:

```
00:00.0 Host bridge: nVidia Corporation nForce3 250Gb Host Bridge (rev a1)
00:01.0 ISA bridge: nVidia Corporation nForce3 250Gb LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation nForce 250Gb PCI System Management (rev a1)
00:02.0 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.2 USB Controller: nVidia Corporation nForce3 EHCI USB 2.0 Controller (rev a2)
00:05.0 Bridge: nVidia Corporation CK8S Ethernet Controller (rev a2)
00:06.0 Multimedia audio controller: nVidia Corporation nForce3 250Gb AC'97 Audio Controller (rev a1)
00:08.0 IDE interface: nVidia Corporation CK8S Parallel ATA Controller (v2.5) (rev a2)
00:09.0 IDE interface: nVidia Corporation CK8S Serial ATA Controller (v2.5) (rev a2)
00:0a.0 IDE interface: nVidia Corporation CK8S Serial ATA Controller (v2.5) (rev a2)
00:0b.0 PCI bridge: nVidia Corporation nForce3 250Gb AGP Host to PCI Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation nForce3 250Gb PCI-to-PCI Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV31 [GeForce FX 5600 Ultra] (rev a1)
02:06.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
```

Hopefully there is a documented or easy fix.

Thanks again!

----------

## NeddySeagoon

MadOtis,

Google is silent on problems with your chipset/graphics card combination.

Here are some low cost things to try, in increasing order of expense ...

Look in /proc/driver/nvidia/agp/status, you will find something like

```
Status:          Enabled
Driver:          AGPGART
AGP Rate:        8x
Fast Writes:     Disabled
SBA:             Enabled
```

Try different drivers, you have the choice of agpgart provided by the kernel or by the nVidia module. You may need a kernel rebuild, since the nVidia provided agpgart cannot run if the kernel module is loaded.

Try different AGP rates - I think this is either a BIOS option or an nvidia kernel module option.  Look in /etc/modules.d/nvidia

Turn off fast writes if it's on. It's unreliable on many motherboards.

Try various Side Band Addressing (SBA) settings. This is really an EMC measure but it's worth trying.
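As a rough sketch of what the module-option route could look like: the helper below just prints an `options` line for /etc/modules.d/nvidia. The option names `NVreg_EnableAGPFW`, `NVreg_EnableAGPSBA` and `NVreg_ReqAGPRate` are the ones I recall from the README shipped with the legacy NVIDIA driver, so treat them as an assumption and check the README for your driver version. The helper name `nvidia_agp_options` is made up.

```shell
#!/bin/sh
# Sketch: build an "options" line for the nvidia kernel module.
#   $1 = fast writes (0 = off, 1 = on)
#   $2 = side band addressing (0 = off, 1 = on)
#   $3 = requested AGP rate (for example 4 for AGP 4x)
# The NVreg_* names are assumed from the legacy driver README; verify them.
nvidia_agp_options() {
    printf 'options nvidia NVreg_EnableAGPFW=%s NVreg_EnableAGPSBA=%s NVreg_ReqAGPRate=%s\n' \
        "$1" "$2" "$3"
}

# Example: fast writes off, SBA off, AGP 4x.
nvidia_agp_options 0 0 4
```

The printed line would go into /etc/modules.d/nvidia. After regenerating the module configuration (on Gentoo of this era probably `modules-update`) and reloading the nvidia module with X stopped, /proc/driver/nvidia/agp/status should show whether the new settings took effect.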

After that - try replacing the power supply. If you can borrow one, do that first, since a PSU replacement is probably less time consuming.

A marginal or cheap PSU can cause the symptoms you report. What PSU - make and model do you have now?

You need to look at it to determine that.

Now it gets really expensive. It's 50/50 your graphics card or motherboard. Possibly static damage to one or the other.

Can you try the card in another PC? If it works, it doesn't prove anything.

----------

## zxy

I encountered almost the same problem on my brother's computer.

It freezes from time to time.  Sometimes only amule runs, today it froze when he ran unrar with amule in background.

Here is the part from /var/log/messages

```
Sep 15 03:01:01 localhost cron[5726]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.daily)
Sep 15 03:10:01 localhost cron[5728]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Sep 15 03:20:01 localhost cron[5750]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Sep 15 03:30:01 localhost cron[5762]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Sep 15 09:18:55 localhost syslog-ng[3654]: syslog-ng version 1.6.11 starting
Sep 15 09:18:55 localhost syslog-ng[3654]: Changing permissions on special file /dev/tty12
Sep 15 09:18:55 localhost 000003000 end: 0000000017ff3000 type: 4
```

and here is the second 

```
Sep 15 09:20:01 localhost cron[5115]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Sep 15 09:30:01 localhost cron[5132]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Sep 15 09:37:03 localhost syslog-ng[3760]: syslog-ng version 1.6.11 starting
Sep 15 09:37:03 localhost syslog-ng[3760]: Changing permissions on special file /dev/tty12
Sep 15 09:37:04 localhost 000003000 end: 0000000017ff3000 type: 4
Sep 15 09:37:04 localhost copy_e820_map() start: 0000000017ff3000 size: 000000000000d000 end: 0000000018000000 type: 3
Sep 15 09:37:04 localhost copy_e820_map() start: 00000000ffff0000 size: 0000000000010000 end: 0000000100000000 type: 2
```

It always freezes when cron is active. 

Any ideas what would be the problem and possibly solution here?

zxy

----------

## zxy

I just checked out the cron jobs.

Daily cronjobs were prelink and slocate,   and monthly was makewhatis

Maybe prelink and slocate together with amule caused the problem. It could be something with the hard drive, because the hard drive was being used heavily at that time.

----------

