# System unstable

## Lawless

I know it's nearly impossible to tell me how to solve this, but I'll try anyway.

What I have:

500 MHz VIA C3 Eden machine with new RAM modules

What goes wrong:

The system is highly unstable. There are weeks when it crashes 3 to 4 times a day and then there are times when it runs for 20 days or more till the next crash.

And when it crashes it really hangs - no log entry, no kernel panic - the system simply stops.

The system hangs - totally random - while:

- copying large amounts of data over the network (onboard nic)

- plugging in or out some of my usb devices

- working with my wlan stick

- emerge metadata

- playing an mp3 (while playing a wmf stream does work forever)

- doing just nothing!

- a lot of other things...

In short - it crashes in nearly any situation but it's not always reproducible.

What I did so far:

- I installed new RAM

- I ran memtest and others for hours and days without a single error

- I ran stress with cpu, io, mem and hdd workers for days without a crash

- I completely reinstalled Gentoo more than once

- I rebuilt world without any cpu optimizations

What system do I use:

I've had these problems since kernel 2.4.x and with nearly every 2.6 kernel release. I'm currently running 2.6.16-r1.

I tried several different kernel configs (e.g. with and without preempt, etc...)

I frequently update the system (with the stable Portage tree) so there are no old buggy packages that could cause this.

What else?...

Since the machine survives every stress and hardware test, and compiles packages without ever hitting seg faults, internal compiler errors or the like, I can't really believe that it's a hardware thing.

It's just like it sometimes wants a break from work....

So if you have any additional suggestions on what I could do to track down the cause of this behaviour...

----------

## unclecharlie

lawless,

Weird question- Do you hear anything when the computer crashes? (i.e.- the 'click' of a bad HD...)

If you can make it crash, open up the case and listen to the hard drive. If it lets out a noticeable 'click' or 'click click click' when the computer crashes then you may need a new hard drive. (This happens because of a bad block on the HD or a contaminant like a piece of dirt... It's intermittent because the computer only crashes when the HD reads/writes to THAT block or hits the dirt/dust/whatever. Usually there is also an audible 'click' as the drive head hits that bad block.)

If there is no 'click' then get a multimeter and check the power supply voltages (or just swap it for one that you know is good, if you can). A bad power supply also causes weird intermittent errors like the ones you're having...

If the power supply and the hard drive are o.k. then you may be looking at a software problem.

Charlie

----------

## Lawless

The hard disk has never caused any problems - no sounds and no warnings from SMART.

The power supply voltages also look good and stable. No ups and downs...

----------

## gwolf

Had a similar problem a few years ago and the cause was a defective motherboard.

----------

## Voltago

Please post your

```
emerge info
```

Just in case you are using -march=c3, consider using -march=i686 instead; the c3 option is, at least in my experience, somewhat dangerous. An EPIA ME6000 server box of mine has been running for almost a year now (with single uptimes of several weeks each) without stability problems using

```
CHOST="i686-pc-linux-gnu"

CFLAGS="-Os -mtune=i686 -pipe -s"
```

Oh, and you need >=gcc-3.4 to use the c3 processor as an i686-pc-linux-gnu arch.

----------

## Lawless

The old VIA C3 is not a real i686 because it lacks the cmov instruction.

So I was always using i586 - but now I see that with the new kernel the c3 arch option results in i686 and I don't know why...

That can't be _the_ cause since I had i586 with all the older kernels... 
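Whether the kernel sees cmov on a given CPU can be read from the flags line of /proc/cpuinfo. A minimal sketch, assuming an x86 Linux box. The function name `has_cpu_flag` and its optional file argument are made up for illustration:

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line.
# Name and optional file argument are illustrative, not a standard tool.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Match the flag only when delimited by whitespace or end of line,
    # so "cmov" does not match a longer flag name that contains it.
    grep -Eq "^flags[[:space:]]*:.*[[:space:]]${flag}([[:space:]]|\$)" "$file"
}

if has_cpu_flag cmov; then
    echo "cmov: present"
else
    echo "cmov: missing"
fi
```

If this reports the flag as missing, binaries built for a full i686 target may contain instructions the CPU cannot execute.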

My current CFLAGS - I've used a lot of other combinations. I think I will remove the frame-pointers again...

```

CFLAGS="-mcpu=i585 -fomit-frame-pointer"

CHOST="i586-pc-linux-gnu"

```

My emerge info

```

Portage 2.0.54 (default-linux/x86/2005.0, gcc-3.4.4, glibc-2.3.5-r2, 2.6.16-rc1 i686)

=================================================================

System uname: 2.6.16-rc1 i686 VIA Samuel 2

Gentoo Base System version 1.6.14

dev-lang/python:     2.3.5-r2, 2.4.2

sys-apps/sandbox:    1.2.12

sys-devel/autoconf:  2.13, 2.59-r6

sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1

sys-devel/binutils:  2.16.1

sys-devel/libtool:   1.5.22

virtual/os-headers:  2.6.11-r2

ACCEPT_KEYWORDS="x86"

AUTOCLEAN="yes"

CBUILD="i586-pc-linux-gnu"

CFLAGS="-mcpu=i585 -fomit-frame-pointer"

CHOST="i586-pc-linux-gnu"

CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"

CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"

CXXFLAGS="-mcpu=i585 -fomit-frame-pointer"

DISTDIR="/usr/portage/distfiles"

FEATURES="autoconfig distlocks sandbox sfperms strict"

GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo/ ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo ftp://linux.rz.ruhr-uni-bochum.de/gentoo-mirror/ ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.join.uni-muenster.de/pub/linux/distributions/gentoo ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://mirrors.sec.informatik.tu-darmstadt.de/gentoo/"

MAKEOPTS="-j2"

PKGDIR="/usr/portage/packages"

PORTAGE_TMPDIR="/var/tmp"

PORTDIR="/usr/portage"

SYNC="rsync://rsync.gentoo.org/gentoo-portage"

USE="x86 acpi alsa apache2 apm audiofile avi berkdb bitmap-fonts bzip2 cdparanoia crypt cups curl dba eds emboss encode expat ffmpeg foomaticdb fortran gd gdbm gpm gstreamer imlib ipv6 libg++ libwww lirc mad mhash mikmod motif mp3 mpeg ncurses nls nptl nptlonly ogg oggvorbis oss pam pcre pdflib perl python quicktime readline samba session spell ssl tcpd tiff truetype-fonts type1-fonts udev usb userlocales v4l vorbis xml2 zlib userland_GNU kernel_linux elibc_glibc"

Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY

```

----------

## ebichu

Try disabling the "Longhaul" driver in the kernel (Power management options -> CPU Frequency scaling -> VIA Cyrix III Longhaul) or any CPU frequency governor daemons you have running. You will lose the power saving features but gain much stability. The Longhaul driver seems to cause DMA lock-ups.
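To see whether Longhaul is actually in use before rebuilding the kernel, something like the following rough sketch may help. It assumes sysfs is mounted at /sys and, for the second part, that the kernel was built with /proc/config.gz support. The function name `cpufreq_driver` and its optional directory argument are made up for illustration:

```shell
#!/bin/sh
# cpufreq_driver [SYSFS_CPU_DIR]
# Prints the active cpufreq scaling driver for cpu0, or "none" if no
# cpufreq driver is loaded. Name and argument are illustrative only.
cpufreq_driver() {
    f="${1:-/sys/devices/system/cpu}/cpu0/cpufreq/scaling_driver"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "none"
    fi
}

echo "cpufreq driver: $(cpufreq_driver)"

# If the running kernel exposes its config, show the Longhaul setting.
# Prints nothing when the option is absent from the config.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep 'CONFIG_X86_LONGHAUL' || true
fi
```

A result of `longhaul` would mean the driver is active; `none` would mean no frequency scaling driver is loaded at all.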

Also, there are BIOS upgrades on the VIA website that change the initialization of the southbridge chip to avoid some problem with large DMA transfers locking up.

In my case, the BIOS upgrade seemed to make the lock-ups less frequent, but since disabling "longhaul" my little server at home has stayed up for a couple of weeks. (But as luck would have it I just tried ssh'ing to it from work and it timed out, so maybe it has crashed today! [EDIT] The server was still running when I got home, but the internet connection had gone down due to an unrelated problem with my modem.)

----------

## Lawless

I will definitely test this out. Thank you.

If I can say this in that situation - I'm happy that I'm not the only one ;)

I had Debian installed to make sure it's not a Gentoo-I-compiled-something-wrong thing - but I simply couldn't stand that distribution ;)

So I copied my Gentoo back...

Hmm, now I have to burn a CD for the update - no floppy here...

----------

## ebichu

 *Lawless wrote:*   

> Hmm, now I have to burn a cd for the update - no floppy here...

 

I ended up installing DOS onto a spare hard drive, so I could save the old BIOS. I'm not sure which mobos have (or need) the BIOS update for this problem. Certainly the Epia PD has it.

----------

## Lawless

The system has now been up for over three days since disabling cpufreq - which is a record for recent weeks...

So I'll try the BIOS update if it crashes again.

----------

