# nvidia + X + kernel => high latency/crashing

## redwood

I recently retired my old Socket A 2.8 GHz Athlon T-bird to service as a Zaptel/Asterisk server and bought a new AMD64 4200+ (AM2) / ASUS M2N4-SLI / NVIDIA GeForce 7300 GS / 1 GB RAM barebones computer.

I put my old hard drives and Turtle Beach sound card into the new computer and compiled a new i386 kernel for it, using the NVIDIA modules for the motherboard's chipset, bridges, Ethernet and AC'97 sound. I also recompiled nvidia-drivers and alsa-driver.

```
gentoo-sources => 2.6.18-gentoo-r3
nvidia-drivers => 1.0.9631
alsa-driver => 1.0.13
gcc => i686-pc-linux-gnu-4.1.1

CFLAGS="-O2 -march=k8 -pipe"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
ALSA_CARDS="cs46xx intel8x0"
INPUT_DEVICES="keyboard mouse evdev"
VIDEO_CARDS="nv nvidia vesa"
```

As soon as I start X, the system starts randomly stalling/freezing and the hard disk light stays lit. Sometimes the system recovers and I can get ~10 minutes of work done before it locks up again.

Other times it eventually crashes and corrupts my ext2 LVM2 volumes /var/tmp, /tmp and /mnt/backups.

At first I thought the problem was due to a recent "emerge -uvD world", which upgraded dbus and broke nearly all of GNOME and a good portion of KDE. I solved the dbus problem by installing dbus-glib and dbus-qt3-old, running a complete "revdep-rebuild", and then upgrading hal/pmount to the ~x86 testing versions. According to revdep-rebuild, my system is now OK.

But as soon as I started X and began some work, the system hung for a few minutes.

So I went to another computer on my network and looked at the tail of dmesg, which shows a lot of the following:

```
gameport: CS46xx Gameport is pci0000:01:06.0/gameport0, speed 1704kHz
PCI: Setting latency timer of device 0000:05:00.0 to 64
NVRM: loading NVIDIA Linux x86 Kernel Module  1.0-9631  Thu Nov  9 17:38:10 PST 2006
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
ata4.00: (BMDMA stat 0x20)
ata4.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
ata3.00: (BMDMA stat 0x20)
ata3.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata4: soft resetting port
ata3: soft resetting port
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
ata3.00: (BMDMA stat 0x20)
ata3.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
ata3: soft resetting port
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata4.00: (BMDMA stat 0x21)
ata4.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata4: port is slow to respond, please be patient
ata4: port failed to respond (30 secs)
ata4: soft resetting port
ata4: port is slow to respond, please be patient
ata4: port failed to respond (30 secs)
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ATA: abnormal status 0xD0 on port 0x967
ata4.00: qc timeout (cmd 0xec)
ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata4.00: revalidation failed (errno=-5)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata3.00: (BMDMA stat 0x21)
ata3.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: soft resetting port
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ATA: abnormal status 0xD0 on port 0x9E7
ata3.00: qc timeout (cmd 0xec)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: hard resetting port
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
ata3: EH complete
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
```
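The hex fields in these libata messages can be decoded bit by bit. Below is a rough Python sketch, assuming the standard SATA SError and ATA Status/Error register layouts. The short bit names only approximate what newer libata versions print, the opcode table covers just the two commands seen in this thread, and the `BMDMA stat` value is deliberately skipped because it is a different (bus-master DMA) register.

```python
import re

# Bit positions follow the SATA SError register and the ATA Status/Error
# registers; the short names are approximations of newer libata output.
SERR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}
STATUS_BITS = {0: "ERR", 2: "CORR", 3: "DRQ", 4: "DSC", 5: "DF", 6: "DRDY", 7: "BSY"}
ERROR_BITS = {0: "AMNF", 1: "NM", 2: "ABRT", 3: "MCR", 4: "IDNF", 5: "MC", 6: "UNC", 7: "ICRC"}
# Only the ATA opcodes that appear in this thread.
COMMANDS = {0xC8: "READ DMA", 0xEC: "IDENTIFY DEVICE"}

# The lookbehind skips "BMDMA stat", which is not the ATA status register.
FIELD_RE = re.compile(r"(?<!BMDMA )\b(SErr|cmd|stat|err) 0x([0-9a-fA-F]+)")


def decode(value, table):
    """Return the names of the set bits, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


def describe(line):
    """Decode the SErr/cmd/stat/err fields found in one libata dmesg line."""
    tables = {"SErr": SERR_BITS, "stat": STATUS_BITS, "err": ERROR_BITS}
    out = {}
    for field, hexval in FIELD_RE.findall(line):
        value = int(hexval, 16)
        if field in tables:
            out[field] = decode(value, tables[field])
        else:
            out[field] = COMMANDS.get(value, hex(value))
    return out
```

Fed the lines above, it reports 10B8B, Dispar and BadCRC for `SErr 0x380000`, DRDY/DSC/ERR for `stat 0x51` and ICRC/ABRT for `err 0x84` during a READ DMA (`cmd 0xc8`). Those are link-level transmission errors, which are commonly traced to cabling, power or the controller side rather than to the filesystem; treat that as a starting point, not a diagnosis.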

These problems with my hard drives locking up only seem to happen when I'm running X.

I've tried kde/gnome/xfce4/fluxbox/twm with the exact same problem.

THANKS in advance for any ideas/suggestions on what is amiss.

Here's some more info on my system:

```
# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev f3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev f2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev f3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev f3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:06.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24/30 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
05:00.0 VGA compatible controller: nVidia Corporation GeForce 7300 GS (rev a1)
```

```
# cat /proc/interrupts
           CPU0       CPU1
  0:    5671297     187676          XT-PIC  timer
  1:       3238          9    IO-APIC-edge  i8042
  6:          0          3    IO-APIC-edge  floppy
  8:          1          1    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 12:      12103        115    IO-APIC-edge  i8042
 14:        177         12    IO-APIC-edge  ide0
 15:        273         12    IO-APIC-edge  ide1
 50:          0          0   IO-APIC-level  ehci_hcd:usb1
 58:          0          0   IO-APIC-level  CS46XX
217:     593317          1   IO-APIC-level  ohci_hcd:usb2, eth0
225:       1340          5   IO-APIC-level  libata, NVidia CK804
233:      52406          9   IO-APIC-level  libata
NMI:          0          0
LOC:    5858860    5858859
ERR:          0
MIS:          0
```

```
# lsmod
Module                  Size  Used by
nvidia               4705972  0
snd_cs46xx             71624  0
snd_intel8x0           26140  0
snd_ac97_codec         84004  2 snd_cs46xx,snd_intel8x0
snd_ac97_bus            2304  1 snd_ac97_codec
snd_seq_midi            6048  0
snd_pcm_oss            34976  0
snd_mixer_oss          13568  1 snd_pcm_oss
snd_seq_oss            27392  0
snd_seq_midi_event      5888  2 snd_seq_midi,snd_seq_oss
snd_seq                41808  5 snd_seq_midi,snd_seq_oss,snd_seq_midi_event
via_rhine              18056  0
snd_rawmidi            17440  2 snd_cs46xx,snd_seq_midi
snd_seq_device          6284  4 snd_seq_midi,snd_seq_oss,snd_seq,snd_rawmidi
snd_pcm                59140  4 snd_cs46xx,snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              16772  2 snd_seq,snd_pcm
i2c_nforce2             6016  0
snd                    40036  11 snd_cs46xx,snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_seq_oss,snd_seq,snd_rawmidi,snd_seq_device,snd_pcm,snd_timer
soundcore               7136  1 snd
snd_page_alloc          7304  3 snd_cs46xx,snd_intel8x0,snd_pcm
```

```
# emerge --info
Portage 2.1.2_rc3-r3 (default-linux/x86/2006.0, gcc-4.1.1, glibc-2.4-r4, 2.6.18-gentoo-r3 i686)
=================================================================
System uname: 2.6.18-gentoo-r3 i686 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
Gentoo Base System version 1.12.6
Last Sync: Thu, 14 Dec 2006 11:00:01 +0000
ccache version 2.3 [enabled]
dev-java/java-config: 1.3.7, 2.0.30
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.3
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=k8 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/lib/fax /usr/lib/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config /var/spool/fax/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/splash /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -march=k8 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distcc distlocks loadpolicy metadata-transfer parallel-fetch sandbox sfperms"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distfiles.xgl-coffee.org/ http://www.schokokeks.org/~hanno/snapshots ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo ftp://ftp.ussg.iu.edu/pub/linux/gentoo ftp://ftp.ucsb.edu/pub/mirrors/linux/gentoo/ http://gentoo.seren.com/gentoo http://gentoo.chem.wisc.edu/gentoo/ ftp://gentoo.chem.wisc.edu/gentoo/ http://cudlug.cudenver.edu/gentoo/ ftp://cudlug.cudenver.edu/pub/mirrors/distributions/gentoo/ http://gentoo.mirrors.pair.com/ ftp://gentoo.mirrors.pair.com/ http://gentoo.ccccom.com ftp://gentoo.ccccom.com http://gentoo.mirrors.tds.net/gentoo ftp://gentoo.mirrors.tds.net/gentoo http://gentoo.netnitco.net ftp://gentoo.netnitco.net/pub/mirrors/gentoo/source/ http://mirror.tucdemonic.org/gentoo/ http://mirrors.acm.cs.rpi.edu/gentoo/ ftp://ftp.ndlug.nd.edu/pub/gentoo/ ftp://gentoo.agsn.ca/ http://open-systems.ufl.edu/mirrors/gentoo http://gentoo.llarian.net/ ftp://gentoo.llarian.net/pub/gentoo http://gentoo.binarycompass.org http://gentoo.mirrored.ca/ ftp://gentoo.mirrored.ca/ http://mirror.datapipe.net/gentoo http://mirror.datapipe.net/gentoo http://gentoo.eliteitminds.com http://gentoo.cs.lewisu.edu/gentoo/ ftp://linux.cs.lewisu.edu/gentoo/ http://prometheus.cs.wmich.edu/gentoo http://modzer0.cs.uaf.edu/public/gentoo/ http://mirror.usu.edu/mirrors/gentoo/ ftp://mirror.usu.edu/mirrors/gentoo/ http://lug.mtu.edu/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/portage/local/layman/xeffects"
SYNC="rsync://deeds.acjlaw.net/gentoo-portage"
USE="x86 16bit 3dnow 3dnowext 7zip S3TC X X509 Xaw3d a52 aac aalib acl acpi activefilter akode alsa alsa_cards_cs46xx alsa_cards_intel8x0 amd ao aotuv apache2 apm arts artswrappersuid artworkextra asf audiofile bash-completion bdf beagle berkdb bidi bigpatch bitmap-fonts bittorrent bl bonobo cairo cdda cdf cdio cdparanoia cdr cdrom cgi chroot cjk clamav clamd clanJavaScript cli corba cpudetection cracklib crypt css cups curlwrappers dbus dbx dga dhcp dillo dio directfb dlloader dmi dpms dri dts dv dvb dvbplayer dvd dvdr dvdread dynagraph eap-tls ecc edl eds effects elibc_glibc emboss emoticon encode enscript epiphany epson escreen esd evo exif exscalibar fame fastcgi fat fb ffmpeg firefox flac flash flatfile fmod font-server fontconfig foomaticdb fortran fping fpx ftp gb gcj gdbm gif gimp gimpprint glitz gnokii gnome gnustep gnutls gphoto2 gpm graphviz gs gstreamer gstreamer010 gstreamer08 gtk gtk2 gtkhtml gzip hal hardened hardenedphp hash hbci hddtemp hdf hdf5 hfs hlapi hpn iconv id3 idn imagemagick imap imlib inkjar input_devices_evdev input_devices_keyboard input_devices_mouse ipod ipv6 isdnlog jack jack-tmpfs java javascript jbig jce jfs jikes joystick jpeg jpeg2k jumpplay justify kde kdeenablefinal kdepim kdexdeltas kerberos kernel_linux kexi kipi koffice-plugin kqemu krb4 ladcca ladspa lame lapack lcms libcaca libclamav libg++ libgda libsamplerate libwww lids live lm_sensors logitech-mouse logrotate lpr lua lzo lzw mad maildir math matroska maya-shaderlibrary mbox mbrola mcve md5sum menubar mgetty mhash mikmod mime ming mjpeg mmx mmxext mod modplug motif mozsvg mp3 mp4 mp4live mpd-mad mpeg mpeg2 mpi mplayer musicbrainz mysql mysqli nas ncurses netjack netpbm network nextaw nforce2 nfs nls nptl nptlonly nsplugin ntfs nvidia odbc ofx ogg oggvorbis ole on-the-fly-crypt openal openexr opengl oss pam panel-plugin pcre pda pdf pdfkit perl pfpro php plotutils plugin pmu png posix postgres povray ppds pppd preview-latex python qemu-fast qt3 qt4 quicktime quotas quotes rar rc5 rdesktop readline real reflection reiser4 reiserfs rtc rtsp sasl sblive scanner sdl sensord server session seti setup-plugin sftp sftplogging shout silverxp skins slp smartcard sms sndfile sockets sox speedo spell spl spreadsheet sql sqlite sqlite3 sse sse2 ssl stream submenu subtitles svg svga svgz tcpd tetex tga theora thesaurus threads thumbnail thunar-vfs tidy tiff timidity tokenizer tomsfastmath toolbar transcode truetype truetype-fonts type1 type1-fonts udev unicode userland_GNU v4l vcd vcdimager vdr vfat vhosts video_cards_nv video_cards_nvidia video_cards_vesa videos vidix vim vim-with-x visualization vmdbmysql vmdbpostgres vorbis win32codecs wma wordperfect wsconvert wv x264 xanim xattr xcomposite xfs xine xml xorg xpm xprint xscreensaver xsettings xv xvid xvmc yaepg yv12 zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS
```

----------

## yabbadabbadont

Try using the 'nv' driver instead of 'nvidia', and be sure to use eselect to switch your OpenGL implementation to the xorg version. If it helps, then at least you know where to start experimenting to get a solution.

----------

## redwood

I regenerated xorg.conf with "Xorg --configure" and moved xorg.conf.new to /etc/X11/xorg.conf. Then I started an xfce4 session, opened an xterm and ran "emerge -puvDt world" to generate some disk activity, while watching `tail -f /var/log/everything/current`:

```
Dec 15 01:39:27 [kernel] ata4.00: limiting speed to UDMA/66
Dec 15 01:39:27 [kernel] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
Dec 15 01:39:27 [kernel] ata4.00: (BMDMA stat 0x21)
Dec 15 01:39:27 [kernel] ata4.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
Dec 15 01:39:34 [kernel] ata4: port is slow to respond, please be patient
Dec 15 01:39:57 [kernel] ata4: port failed to respond (30 secs)
Dec 15 01:39:57 [kernel] ata4: soft resetting port
Dec 15 01:40:01 [cron] (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Dec 15 01:40:04 [kernel] ata4: port is slow to respond, please be patient
Dec 15 01:40:27 [kernel] ata4: port failed to respond (30 secs)
Dec 15 01:40:27 [kernel] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 15 01:40:27 [kernel] ATA: abnormal status 0xD0 on port 0x967
                - Last output repeated 4 times -
Dec 15 01:40:57 [kernel] ata4.00: qc timeout (cmd 0xec)
Dec 15 01:40:57 [kernel] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 15 01:40:57 [kernel] ata4.00: revalidation failed (errno=-5)
Dec 15 01:40:57 [kernel] ata4: failed to recover some devices, retrying in 5 secs
Dec 15 01:41:02 [kernel] ata4: hard resetting port
Dec 15 01:41:03 [kernel] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 15 01:41:03 [kernel] ata4.00: configured for UDMA/66
Dec 15 01:41:03 [kernel] ata4: EH complete
Dec 15 01:41:03 [kernel] SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
Dec 15 01:41:03 [kernel] sdc: Write Protect is off
Dec 15 01:41:03 [kernel] SCSI device sdc: drive cache: write back
```

And some more dmesg output:

```
ata3: COMRESET failed (device not ready)
ata3: hardreset failed, retrying in 5 secs
ata3: hard resetting port
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: COMRESET failed (device not ready)
ata3: reset failed, giving up
ata3.00: disabled
ata3: EH complete
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 35252182
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 70644942
Buffer I/O error on device dm-5, logical block 1053
lost page write due to I/O error on dm-5
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 70886502
Buffer I/O error on device dm-5, logical block 61456
lost page write due to I/O error on dm-5
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76407982
Buffer I/O error on device dm-6, logical block 393241
lost page write due to I/O error on dm-6
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76408014
Buffer I/O error on device dm-6, logical block 393245
lost page write due to I/O error on dm-6
Buffer I/O error on device dm-6, logical block 393246
lost page write due to I/O error on dm-6
Buffer I/O error on device dm-6, logical block 393247
lost page write due to I/O error on dm-6
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76417382
Buffer I/O error on device dm-6, logical block 395611
lost page write due to I/O error on dm-6
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76432854
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76449574
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76457318
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76468070
Buffer I/O error on device dm-6, logical block 408372
lost page write due to I/O error on dm-6
Aborting journal on device dm-5.
ReiserFS: dm-0: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [21119 260 0x0 SD]
ext3_abort called.
EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 2:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdb, sector 76469094
Buffer I/O error on device dm-6, logical block 408690
lost page write due to I/O error on dm-6
ata3: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0x2 frozen
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
Buffer I/O error on device dm-6, logical block 409085
lost page write due to I/O error on dm-6
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 76470118
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
raid1: Disk failure on sdb3, disabling device.
        Operation continuing on 1 devices
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
__journal_remove_journal_head: freeing b_committed_data
sd 2:0:0:0: rejecting I/O to offline device
EXT3-fs error (device dm-5): ext3_find_entry: reading directory #327701 offset 0
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb3
 disk 1, wo:0, o:1, dev:sdc3
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sdc3
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
REISERFS: abort (device dm-3): Journal write error in flush_commit_list
REISERFS: Aborting journal for filesystem on dm-3
sd 2:0:0:0: rejecting I/O to offline device
sd 2:0:0:0: rejecting I/O to offline device
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: soft resetting port
ata3: port is slow to respond, please be patient
ata3: port failed to respond (30 secs)
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3: EH pending after completion, repeating EH (cnt=4)
ata3: EH complete
ata3.00: detaching (SCSI 2:0:0:0)
```

So I'm going to reboot this system before my volumes are trashed.
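The `SCSI error: return code = 0x...` lines can be split into the conventional driver/host/message/status bytes of the Linux SCSI result. A minimal sketch follows; the host-byte names are the `DID_*` constants from the kernel's SCSI headers, and it is worth checking the numbers against the headers of the kernel you actually run.

```python
# Host-byte values as defined by the Linux SCSI midlayer (DID_* constants).
HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
}


def split_result(code):
    """Split a 32-bit SCSI result into its driver/host/message/status bytes."""
    return {
        "driver": (code >> 24) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "status": code & 0xFF,
    }


def host_name(code):
    """Name of the host byte, or its hex value if it is not in the table."""
    host = split_result(code)["host"]
    return HOST_BYTE.get(host, hex(host))
```

Both codes in this log have only the host byte set: `0x00040000` maps to DID_BAD_TARGET and the later `0x00010000` to DID_NO_CONNECT. That is consistent with the sequence above, where the errors start right after `ata3.00: disabled` and the device then goes offline, rather than with the drive reporting media errors itself.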

----------

## Dominique_71

It looks like a hardware problem to me. /proc/interrupts shows that nvidia shares its IRQ with libata. Try changing the IRQ settings in your BIOS and/or moving some card(s) around in your box. Further reading: http://www.gentoo.org/doc/en/articles/hardware-stability-p2.xml
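To see at a glance which IRQ lines carry more than one handler, /proc/interrupts can be parsed. A small Python sketch, assuming the 2.6.18-era layout posted earlier in this thread (per-CPU counters, a single controller-type token such as `IO-APIC-level`, then comma-separated handler names); newer kernels print extra columns, so the parsing would need adjusting there.

```python
def shared_irqs(text):
    """Return {irq: [handler, ...]} for IRQ lines that list more than one handler.

    Assumes the 2.6.18-era /proc/interrupts layout: "IRQ:", one counter per
    CPU, a single controller-type token (e.g. IO-APIC-level), then the
    comma-separated handler names.
    """
    shared = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # header row, or summary rows such as NMI/LOC/ERR/MIS
        tokens = rest.split()
        i = 0
        while i < len(tokens) and tokens[i].isdigit():
            i += 1  # skip the per-CPU counters
        handlers = " ".join(tokens[i + 1:])  # drop the controller-type token
        names = [name.strip() for name in handlers.split(",") if name.strip()]
        if len(names) > 1:
            shared[int(irq)] = names
    return shared
```

Run against the listing posted above, it returns IRQ 217 (`ohci_hcd:usb2`, `eth0`) and IRQ 225 (`libata`, `NVidia CK804`). That listing has no nvidia entry at all, so it is probably worth reading /proc/interrupts again while X is running before concluding which devices share a line.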

----------

## redwood

I upgraded to the Beta nvidia-drivers.

I did still get a system error message:

```
Dec 18 12:29:19 [kernel] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x380000 action 0x2
Dec 18 12:29:19 [kernel] ata4.00: (BMDMA stat 0x20)
Dec 18 12:29:19 [kernel] ata4.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
Dec 18 12:29:19 [kernel] ata4: soft resetting port
```

But the SATA hard drive recovered and seems to be working without further errors in `tail -f /var/log/everything/current`.

I'll keep monitoring it for a while, though.
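For keeping an eye on the log over time, a quick tally of libata error-handling events per port can show whether one port (or cable) is consistently worse than the other. A rough sketch; the message strings it matches are the ones quoted in this thread, so other kernel versions may word them differently.

```python
import re
from collections import Counter

# Matches e.g. "ata4.00: exception ...", "ata3: hard resetting port",
# "ata4.00: limiting speed to UDMA/66" and "ata3.00: disabled".
EVENT_RE = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: (exception|soft resetting port|hard resetting port"
    r"|limiting speed to \S+|disabled)"
)


def ata_events(lines):
    """Count (port, event) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1), match.group(2)] += 1
    return counts
```

Any iterable of lines works, so an open file object for /var/log/everything/current can be passed straight to `ata_events` and the resulting Counter printed sorted by port.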

I'm not convinced that libata is sharing an IRQ with the video card. From reading a forum, I learned that not all video cards require an IRQ.

The "NVidia CK804" entry may be the onboard AC'97 sound (it's kind of confusing, since all the motherboard chips are "nForce4 CK804").
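One way to settle which device a handler such as "NVidia CK804" belongs to is to ask sysfs which PCI functions were assigned each IRQ, then compare the slot addresses with the lspci listing (there, 00:04.0 is the CK804 AC'97 controller and 00:07.0/00:08.0 are the Serial ATA controllers). A minimal sketch, assuming sysfs is mounted at /sys and each PCI device exposes an `irq` attribute; the numbers should correspond to the ones in /proc/interrupts, but that is worth double-checking on an IO-APIC system.

```python
import os
from collections import defaultdict


def pci_devices_by_irq(root="/sys/bus/pci/devices"):
    """Map IRQ number -> list of PCI slot names, skipping devices with no IRQ."""
    by_irq = defaultdict(list)
    for slot in sorted(os.listdir(root)):
        try:
            with open(os.path.join(root, slot, "irq")) as f:
                irq = int(f.read().strip())
        except (OSError, ValueError):
            continue  # no readable irq attribute for this entry
        if irq:
            by_irq[irq].append(slot)
    return dict(by_irq)
```

Slots that end up under the same key are sharing a line; for example, `pci_devices_by_irq().get(225)` would list the PCI functions behind IRQ 225 on that box.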

I also tried the VESA driver, and everything worked OK without hard disk problems, but graphics were very, very slow. I could not get xorg.conf working with "nv": all I got was a black screen with no X.

----------

## Dominique_71

If you are using the nvidia driver, it will use an IRQ (needed for 3D, as far as I know). The 2D nv driver does not use any IRQ.

IRQs in a PC are a mess. The original 8088-based PC had a PIC with only 8 IRQ channels. With the 286, the PIC was extended to 16 IRQs. That is not much for a modern PC, so we now have the APIC, which can manage more IRQs. The problem with the APIC is that it is not a completely new interface but a new layer on top of the PIC. I think that is why you have this shared IRQ.

If you look in your BIOS, you will see that the BIOS IRQ settings know only about the PIC (16 IRQs). So if you want to track down this issue (and I recommend you do, since you will get a more reliable and stable system), the best thing to do is to disable the APIC, either in Linux with the noapic boot parameter in GRUB or in the BIOS. Then you will see the same IRQs in the BIOS as in /proc/interrupts, and it will be easier to find your way through this problem. Once that is done, it is up to you whether to use the PIC or the APIC. Some people say that the APIC has more overruns than the PIC.

A simple way to free up an IRQ is to add the acpi=off boot parameter in GRUB. It completely disables ACPI, so it is not an option on a laptop.
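After adding noapic or acpi=off to the kernel line in grub.conf and rebooting, it is worth confirming that the running kernel actually received them. A tiny sketch that parses the kernel command line as exposed in /proc/cmdline; the sample command line used to exercise it is hypothetical.

```python
def boot_params(cmdline):
    """Parse a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params
```

For example, `"noapic" in boot_params(open("/proc/cmdline").read())` should be true after the change. With the APIC disabled, the type column of /proc/interrupts should also show XT-PIC instead of IO-APIC-* entries.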

----------

## kamelli952

Try using the 2.6.19.1 kernel. I had about the same problem, and that seemed to solve it.

----------

## Stolz

 *redwood wrote:*   

> As soon as I start X the system starts randomly stalling/freezing and the hard disk light stays lit. Sometimes the system recovers and I can get ~10 minutes of work done before it again locks up.
> 
> Other times I eventually crash and corrupt my ext2 lvm2 volumes /var/tmp /tmp and /mnt/backups.

 

I was having similar problems, but I didn't see any errors in dmesg. The problem was that kernels >=2.6.18 force IOMMU, and IOMMU forces the kernel's AGP. The solution is in this post.

Hope it helps.

--Stolz

----------

## redwood

I upgraded the kernel to gentoo-sources 2.6.19-r2 and the M2N4-SLI BIOS to version 704.

(That did not go off without glitches. After using the EZ-Flash BIOS update utility, the system would only beep on reboot. I had to unplug the computer, remove the battery and re-jumper the BIOS pins to reset the flash RAM. After doing all that and turning the system back on, it finally booted with the new 704 BIOS.)

I probably also made some changes to the kernel configuration, but I've finally got a system that seems to run X without trashing my filesystems.

Today (via ssh) I'm getting error messages in dmesg like the following (the oom-killer traces from audacious and metalog are interleaved):

```
ReiserFS: dm-13: warning: vs-13075: reiserfs_read_locked_inode: dead inode read from disk [295593 297014 0x0 SD]. This is likely to be race with knfsd. Ignore
audacious invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
 [<c013ac4d>] out_of_memory+0x6c/0x18f
 [<c013c0b3>] __alloc_pages+0x1fa/0x284
 [<c013d590>] __do_page_cache_readahead+0xbd/0x1e8
 [<c04bf15a>] io_schedule+0x26/0x30
 [<c041622a>] <4>metalog invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
dm_table_any_congested+0x32/0x48
 [<c0414881>] dm_any_congested+0x2f/0x35
 [<c013a0aa>] filemap_nopage+0x176/0x348
 [<c013ac4d>]  [<c01426bc>] __handle_mm_fault+0x166/0x7a0
 [<c012d334>] out_of_memory+0x6c/0x18f
hrtimer_try_to_cancel+0x3c/0x42
 [<c013c0b3>] __alloc_pages+0x1fa/0x284
 [<c012d44e>] hrtimer_nanosleep+0x3d/0xf0
 [<c013d590>]  [<c0112a78>] do_page_fault+0x219/0x51d
 [<c012d1ce>] __do_page_cache_readahead+0xbd/0x1e8
hrtimer_wakeup+0x0/0x18
 [<c0425c1c>]  [<c011285f>] sock_aio_write+0xf6/0x102
do_page_fault+0x0/0x51d
 [<c04c0679>] error_code+0x39/0x40
 [<c013a0aa>]  =======================
Mem-info:
DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:   4   Cold: hi:   62, btch:  15 usd:  15
CPU    1: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  47
Active:104768 inactive:105694 dirty:0 writeback:0 unstable:0 free:1822 slab:4282 mapped:115 pagetables:1093
filemap_nopage+0x176/0x348
DMA free:3544kB min:68kB low:84kB high:100kB active:3608kB inactive:3408kB present:16256kB pages_scanned:13290 all_unreclaimable? yes
 [<c01426bc>] lowmem_reserve[]:__handle_mm_fault+0x166/0x7a0
 [<c012ae9c>]  0 873
```

But at least the system isn't crashing. I had been running audacious + audacity just fine before the holidays (but judging from the dmesg above, maybe something has crashed? I'll see when I get back into the office).
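The page counts in the `Active:... free:...` summary line of the Mem-info output are easier to read in MiB. A quick sketch, assuming 4 KiB pages as on this i386 kernel. On that line it gives roughly 409 MiB active plus 413 MiB inactive (about 822 MiB together) and only about 7 MiB free, on a 1 GB machine, so memory does look nearly exhausted at the moment the OOM killer fires; watching which process grows (audacious is the first one named) may be the next step.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, the i386 default


def page_counts_mib(line, page_kb=PAGE_KB):
    """Convert the 'name:pages' pairs of an OOM Mem-info summary line to MiB."""
    return {name: int(pages) * page_kb / 1024
            for name, pages in re.findall(r"(\w+):(\d+)", line)}
```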

Following the Gentoo Wiki on installing ardour + jack, I tried to compile a kernel with "realtime lsm" built as a module, but kept getting a kernel panic during boot, despite adding the modules "capability" and "realtime" to /etc/modules.autoload.d/kernel-2.6.

----------

## Dominique_71

If you want to do serious audio work, I recommend installing a kernel from this overlay: Pro-Audio Gentoo Overlay (Wiki, forum thread). Both realtime-lsm and rlimits (with and without PAM) are supported for managing priorities. But be aware that you will get into trouble with such an rt kernel and your shared IRQ. Both 2.6.16-rt29 and 2.6.19-rt15 work fine on my box with rt-lsm, gensplash and the nvidia driver (2.6.19-rt15 doesn't work with alsa-driver but works fine with the in-kernel ALSA driver; 2.6.16-rt29 works fine with both ALSA drivers).

Capability and realtime (realcap on 2.6.19) must be built as modules, but only realtime (or realcap) should be listed in /etc/modules.autoload.d/kernel-2.6. You must add an option telling the module to grant realtime capability to the audio group, and you must be in the audio group:

```
realtime   gid=18
```
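On Gentoo the audio group conventionally has gid 18, which is where the `gid=18` above comes from. Rather than hardcoding it, a small sketch can look the number up and also check the second requirement (being in the group). The helper names are hypothetical, and `realcap` is the module name given above for 2.6.19.

```python
import grp
import pwd


def autoload_line(module="realtime", group="audio"):
    """Build the modules.autoload.d entry, e.g. 'realtime gid=18' (hypothetical helper)."""
    return f"{module} gid={grp.getgrnam(group).gr_gid}"


def user_in_group(user, group="audio"):
    """True if the group is the user's primary group or lists the user as a member."""
    g = grp.getgrnam(group)
    return user in g.gr_mem or pwd.getpwnam(user).pw_gid == g.gr_gid
```

On a stock Gentoo install, `autoload_line()` would be expected to produce `realtime gid=18`, and `user_in_group("yourname")` tells you whether you still need to add yourself to the audio group.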

----------

