# obscure bug in nvidia-drivers or udev?

## Bloodsurfer

Well, I don't know whether it is nvidia's, udev's, the kernel's or even my hardware's fault, but I thought starting here first would be appropriate.

Sometimes when booting my system, exactly at the moment udev is inserting the nvidia module, a strange bug appears. It happens maybe once a week or so, sometimes once every few days. After a reboot it's gone.

When that bug appears, the system continues to boot, but xorg won't start; instead I only get a black screen with a (non-blinking) cursor. The system is still running: I can shut it down by pressing the power button. The screen stays black, but the machine does shut down after a few seconds.

Here is a picture I took of the error: http://img293.imageshack.us/img293/3817/nvidiafehleruq8.jpg

I didn't find anything similar, neither here nor with Google.

Well, here is some general info for you:

```
emerge --info
Portage 2.1.2-r10 (default-linux/amd64/2006.1/desktop, gcc-4.1.2, glibc-2.5-r0, 2.6.20-gentoo x86_64)
=================================================================
System uname: 2.6.20-gentoo x86_64 Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 21 Feb 2007 12:20:01 +0000
dev-java/java-config: 1.3.7, 2.0.31-r3
dev-lang/python:     2.4.4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.18.1
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.17
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.23b
virtual/os-headers:  2.6.20
ACCEPT_KEYWORDS="amd64 ~amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=nocona -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=nocona -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ "
LINGUAS="de"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/usr/portage/tmp"
PORTDIR="/usr/portage/tree"
PORTDIR_OVERLAY="/usr/portage/local/layman/xeffects /usr/portage/local/layman/berkano /usr/portage/local/mixed"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X aac aalib alsa amd64 berkdb bitmap-fonts branding cairo cdr cli color-console cracklib crypt css cups dbus divx dri dv dvd dvdnav dvdr emboss encode esd fam filepicker filter_default firefox flac fortran gdbm gif glitz gpm gtk2 h264 hal iconv imagemagick isdnlog java jpeg kde kdehiddenvisibility kqemu lame ldap libcaca libg++ mad midi mikmod mozbranding moznopango mp3 mp4 mpeg musepack ncurses newspr nls nptl nptlonly nvidia offensive ogg opengl oss pam pcre perl png ppds pppd python qt3 qt4 quicktime readline realmedia reflection sdl session smp spell spl ssl svg tcpd theora truetype truetype-fonts type1-fonts unicode utf8 vcd vorbis wmp x264 xml xorg xscreensaver xv xvid zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" USERLAND="GNU" VIDEO_CARDS="nvidia vesa fbdev nv"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS
```

System is Core2Duo E6600, 2GB Corsair RAM, nvidia GeForce 7900 GS, passively cooled, on an Asus P5B deluxe mainboard.

Using nvidia-drivers-1.0.9746.

Does anyone have a clue what this could be? It is otherwise a stable system: even after running for two days and emerging lots of stuff, it doesn't crash. This bug while booting is the only one; once it runs, it runs.

----------

## mark_alec

Moved from Desktop Environments to Kernel & Hardware.

Looks like a kernel bug to me.

----------

## davidgurvich

Try disabling preemption in the kernel. Also, the following post from LKML seems to suggest there may be a hardware problem, at least with the Opteron.

http://lkml.org/lkml/2005/6/7/34

 *Quote:*   

> From: Jacob Martin
> 
> Subject: PROBLEM: OOPSes in PREEMPT SMP for AMD Opteron Dual-Core with Memhole Mapping
> 
> Date: Tue, 7 Jun 2005 03:18:33 -0400
> ...

 

----------

## Bloodsurfer

Thanks for that quote. It is rather old, though, so I think that bug should be fixed by now. Nonetheless I disabled preemption, just to do at least something, and will now wait and see how long the machine runs/boots without that error (it hasn't appeared again for three or four days...). Let's hope for the best...
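
For what it's worth, a quick way to double-check that a rebuilt config really has preemption off could look like this. It is only a sketch: the function names are made up for illustration, and it assumes the usual `CONFIG_PREEMPT...` option names found in 2.6 kernel configs.

```python
def preempt_lines(config_text):
    """Return every config line that mentions PREEMPT, whether set or unset."""
    return [line.strip() for line in config_text.splitlines() if "PREEMPT" in line]


def preemption_enabled(config_text):
    """True if full kernel preemption (CONFIG_PREEMPT=y) is switched on."""
    return any(line.strip() == "CONFIG_PREEMPT=y" for line in config_text.splitlines())


# Hypothetical usage; /usr/src/linux/.config is an assumed location:
#   text = open("/usr/src/linux/.config").read()
#   print(preempt_lines(text), preemption_enabled(text))
```

With preemption disabled, `preemption_enabled` should return `False` and `preempt_lines` should show `# CONFIG_PREEMPT is not set`.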

----------

## Bloodsurfer

Nope, disabling preemption didn't help. This morning I had to wait almost ten minutes till my box was finally running, after x reboots...

Another screenshot, more readable than the first one... I had time to take that one because a filesystem check ran after the error  :Laughing: 

I very much hope it is not hardware related, and I find that hard to imagine, because otherwise the machine runs just fine and is very stable, even when the graphics card and the rest are under full load for hours... It runs for days without crashing.

----------

## Captain Newbie

 *Bloodsurfer wrote:*   

> snip

 

The next message in that thread says it all:

 *Quote:*   

> Reproduce without the nvidia driver please.

 

----------

## Simius

Is your box overclocked?

Maybe you should run memtest86 for a day or so, and then some serious number crunching test for just as long, and see if any errors pop up. If none do, that would rule out hardware errors.
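
As for the number crunching, here is a rough sketch of the idea, not a replacement for memtest86 or a proper stress tool: repeat the same deterministic calculation and check that every pass gives the same answer. The function names, the seed and the default counts are made up for illustration, and it only loads a single core.

```python
import hashlib


def crunch(seed: bytes, rounds: int) -> str:
    """Chain SHA-256 `rounds` times starting from `seed`; fully deterministic."""
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def consistency_check(passes: int = 5, rounds: int = 200_000) -> bool:
    """Repeat the same computation and report whether every pass agrees."""
    reference = crunch(b"stress-seed", rounds)
    for i in range(1, passes):
        if crunch(b"stress-seed", rounds) != reference:
            # Identical input gave a different result: suspect CPU or RAM.
            print(f"pass {i}: MISMATCH")
            return False
    return True
```

On healthy hardware `consistency_check()` should always return `True`; raise `passes` to keep it running longer.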

 *Bloodsurfer wrote:*   

> Nope, disabling preemption didn't help. This morning I had to wait almost ten minutes till my box was finally running, after x reboots...
> 
> Another screenshot, more readable than the first one... I had time to take that one because a filesystem check ran after the error 
> 
> I very much hope it is not hardware related, and I find that hard to imagine, because otherwise the machine runs just fine and is very stable, even when the graphics card and the rest are under full load for hours... It runs for days without crashing.

 

----------

## Bloodsurfer

 *Simius wrote:*   

> Is your box overclocked?

 

Nope, everything's standard, nothing overclocked.

I will do that memtest thing, but that has to wait till the weekend...

Can you recommend a particular "number cruncher"?  :Wink: 

 *Captain Newbie wrote:*   

> Reproduce without the nvidia driver please.

 

Well, as you can imagine, that's kind of hard with a bug that only occurs when inserting the nvidia module...  :Confused: 

I'm afraid I can't do that.

----------

## Bloodsurfer

In the meantime I ran some memtests and such; they all ran fine without errors for days...

I also compiled a new kernel (2.6.20-gentoo-r1), installed the new nvidia-drivers and updated the BIOS of my motherboard. 

The problem still persists, same as before.

Playing with my kernel settings (disabling preemption, disabling the kernel agpgart to use the one from nvidia, etc.) also didn't help. 

Here's a new screenshot (the old one was rather difficult to read, so I took a better one this morning).

----------

## Bloodsurfer

Well, after trying almost everything, I have come to the conclusion that this bug seems to be gentoo-sources related.

I tried it this morning with vanilla-sources-2.6.20.2, and the bug does not happen there! I sat there for a full 45 minutes doing nothing but reboots, and there was no bug. Not a single oops in 30-40 reboots. That was with the same config (I don't use anything in my config that isn't in vanilla-sources too).

Then I rebooted into gentoo-sources, and the oops was back on the first reboot. So it really seems one of the gentoo patches is causing this...

----------

## Matteo Azzali

 *Bloodsurfer wrote:*   

> Well, after trying almost everything, I have come to the conclusion that this bug seems to be gentoo-sources related.

 

Did you use the exact same config file for both kernels? If not, please test with it.
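
If you're not sure, something like this minimal sketch would show where two configs differ. It assumes both are plain kernel `.config` files; the function names are made up, and the two paths in the usage comment are placeholders rather than anything from your setup.

```python
def parse_config(text):
    """Map each CONFIG_ option to its value; '# CONFIG_X is not set' becomes 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for every option that differs.

    An option missing from one side is reported as None for that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical usage with placeholder paths:
#   gentoo = parse_config(open("gentoo-sources.config").read())
#   vanilla = parse_config(open("vanilla-sources.config").read())
#   for name, (g, v) in diff_configs(gentoo, vanilla).items():
#       print(name, g, v)
```

An empty result means the two kernels were built with the same options, so any difference in behaviour would have to come from the sources themselves.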

----------

