# Random reboot

## Henrik Olsen

I am plagued by random reboots on my server.

I was mystified, and thought I had finally cracked the nut today when memtest86+ showed my single RAM DIMM to be faulty. I searched the basement, found another DIMM, and was sure that was it. But no, oh no :(

The problem has accelerated over the last few days and the machine now reboots very often. A snippet of `last` output shows:

```
henriko  pts/0        10.0.0.40        Sun Feb 17 17:14   still logged in
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:38          (00:35)
henriko  pts/0        10.0.0.40        Sun Feb 17 17:09 - crash  (00:-30)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:32          (00:41)
henriko  pts/0        10.0.0.40        Sun Feb 17 17:04 - crash  (00:-32)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:26          (00:48)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:22          (00:52)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:15          (00:59)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:08          (01:05)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:03          (01:10)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:57          (01:17)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:51          (01:22)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:45          (01:28)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:36          (01:37)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:15          (00:00)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:14          (00:00)
henriko  pts/0        10.0.0.40        Sun Feb 17 15:29 - crash  (00:-15)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 15:11          (00:03)
henriko  tty1                          Sun Feb 17 14:52 - down   (00:00)
henriko  tty1                          Sun Feb 17 14:52 - 14:52  (00:00)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 14:23          (00:28)
henriko  pts/0        10.0.0.40        Sun Feb 17 13:28 - down   (00:30)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:47          (05:11)
henriko  pts/0        10.0.0.40        Sun Feb 17 13:16 - crash  (-4:-28)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:37          (05:21)
henriko  pts/0        10.0.0.40        Sun Feb 17 13:13 - crash  (-4:-36)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:27          (05:30)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:19          (05:39)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:17          (05:41)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:11          (05:46)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 08:02          (05:55)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 07:52          (06:05)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 07:50          (06:07)
```
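
To see how closely spaced the reboots are, the `reboot   system boot` records can be turned into gaps between consecutive boots. The following is only a minimal sketch, assuming the `last` layout shown in this post and English month abbreviations. The function names `boot_times` and `boot_gaps_minutes` are hypothetical, and the year is passed in as an assumption because this `last` output does not print it.

```python
from datetime import datetime


def boot_times(last_output, year=2008):
    """Return boot timestamps, oldest first, parsed from `last` output.

    `year` is an assumption: the listing itself carries no year.
    """
    times = []
    for line in last_output.splitlines():
        fields = line.split()
        # Boot records look like:
        # reboot system boot 2.6.23-gentoo-r3 Sun Feb 17 16:38 (00:35)
        if len(fields) >= 8 and fields[:3] == ["reboot", "system", "boot"]:
            month, day, clock = fields[5], fields[6], fields[7]
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M"
            )
            times.append(stamp)
    return sorted(times)


def boot_gaps_minutes(times):
    """Return the minutes between each pair of consecutive boots."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


# A few lines copied from the listing; login records are skipped by the parser.
sample = """\
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:38          (00:35)
henriko  pts/0        10.0.0.40        Sun Feb 17 17:09 - crash  (00:-30)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:32          (00:41)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:26          (00:48)
reboot   system boot  2.6.23-gentoo-r3 Sun Feb 17 16:22          (00:52)
"""

gaps = boot_gaps_minutes(boot_times(sample))
print(gaps)  # [4, 6, 6]
```

On a live system the same functions could be fed the text of `last` itself. Gaps of only a few minutes, as in this sample, mean the box barely gets through a boot before it restarts.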

Unfortunately my cdrom is broken mechanically so I haven't tried a livecd, but can borrow an external drive and try it.

All fans are working and nothing seems too hot.

Besides that, where do I look for errors on a problem like this? What log could be good for troubleshooting? 

I guess it's still a hardware problem despite the replaced dimm module.

I'm running a fairly new "2.6.23-gentoo-r3 #1 PREEMPT Mon Jan 21 08:36:06 CET 2008 i686 VIA Nehemiah CentaurHauls GNU/Linux".

```
$ emerge --info
Portage 2.1.3.19 (default-linux/x86/2006.1, gcc-4.1.2, glibc-2.6.1-r0, 2.6.23-gentoo-r3 i686)
=================================================================
System uname: 2.6.23-gentoo-r3 i686 VIA Nehemiah
Timestamp of tree: Sat, 09 Feb 2008 15:00:04 +0000
ccache version 2.3 [enabled]
app-shells/bash:     3.2_p17-r1
dev-lang/python:     2.4.4-r6
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.3
sys-apps/baselayout: 1.12.10-r5
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.7.7, 1.9.6-r2, 1.10
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.4.3-r1, 1.5.24
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -Os -pipe -mmmx -msse -mfpmath=sse -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.1/share/config /usr/share/config /var/www/localhost/htdocs//mythweb/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=i686 -Os -pipe -mmmx -msse -mfpmath=sse -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acpi alsa apache2 berkdb bitmap-fonts cdr cli cracklib crypt cups dri dvd dvdr evms2 foomaticdb fortran gdbm gpm iconv imagemagick imap ipv6 isdnlog libwww lirc maildir midi mmx mudflap mysql ncurses nls nptl nptlonly openmp pam pcre perl ppds pppd python readline reflection samba sasl session spl sse ssl tcpd tiff truetype-fonts type1-fonts unicode usb x86 xorg zlib" ALSA_CARDS="trident" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="apm ark chips cirrus cyrix dummy fbdev glint i128 i740 i810 imstt mach64 mga neomagic nsc nv r128 radeon rendition s3 s3virge savage siliconmotion sis sisusb tdfx tga trident tseng v4l vesa vga via vmware voodoo"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
```

----------

## pdr

If this has been a long-term problem, and especially if it is a newer motherboard/RAM (e.g. DDR2), go into the BIOS and check that the memory voltages are correct for your DIMMs; I have had a number of recent-ish motherboards not get it right and set Vdimm at 1.8v when it was supposed to be 2.1v...

----------

## Henrik Olsen

 *pdr wrote:*   

> If this has been a long-term problem, and especially if it is a newer motherboard/RAM (e.g. DDR2), go into the BIOS and check that the memory voltages are correct for your DIMMs; I have had a number of recent-ish motherboards not get it right and set Vdimm at 1.8v when it was supposed to be 2.1v...

 

It's a low-power VIA C3M266-L motherboard from 2003 using DDR RAM. The BIOS settings seem correct, and I have also tried loading the fail-safe defaults and manually setting everything as conservatively as possible.

----------

## Henrik Olsen

I cannot even complete a successful boot anymore. It repeatedly reboots a couple of seconds into the Gentoo boot sequence (after I have chosen the kernel in grub). I tried replacing the CPU too (I had another compatible VIA processor), no luck :(

Now data recovery is the first priority. I had set up different volumes and RAID levels over two disks with EVMS. How do I proceed from here? Gentoo 2007.0 LiveCD (does it support EVMS)?

----------

## Henrik Olsen

 *Henrik Olsen wrote:*   

> How do I proceed from here? Gentoo 2007.0 LiveCD (does it support EVMS)?

 

I have burned the image, but the machine won't even boot the LiveCD. Same pattern: it reboots after a few seconds of normal boot process. It reboots at the same place each time, at 'gentoo.igz...'.

I am running memtest86+ overnight now on the replacement DIMM to see whether this one is faulty too, but I doubt it.

And as I wrote earlier, even the CPU has been replaced. I already replaced the motherboard last year (with one just like it). Could it be yet another one for the garbage?

----------

## pdr

The only times I've had spontaneous reboots have been from overheating, and from the BIOS having the wrong Vdimm voltage set (in particular, with DDR2). Sorry.

----------

## TheMightyAeyeaws

lol, is the CPU fan spinning? Look up on the net how to measure power supply voltages, under load too.

Look for the leaky capacitors on the motherboard, and check whether the motherboard is grounded or not. Sometimes after years of heat they get brittle, and you move it or lay it on its side and the PCB cracks.

umm.

I'd swap in another power supply, and if it doesn't help, take the MB out, say "I'm mad as hell and I'm not gonna take it anymore", and drive over it with your oversized rug office chair coaster.
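
For the power supply measurement suggested above, here is a minimal sketch of checking multimeter readings against the usual ATX tolerances (±5% on the +3.3 V, +5 V and +12 V rails, ±10% on -12 V). The function name `out_of_spec` and the sample readings are hypothetical and were not measured on the machine in this thread.

```python
# Nominal rail voltage -> allowed fractional deviation (ATX tolerances).
ATX_TOLERANCE = {3.3: 0.05, 5.0: 0.05, 12.0: 0.05, -12.0: 0.10}


def out_of_spec(readings):
    """Return (nominal, measured) pairs that fall outside the ATX tolerance.

    `readings` maps a nominal rail voltage to the value read on a multimeter.
    """
    bad = []
    for nominal, measured in readings.items():
        tol = ATX_TOLERANCE[nominal]
        # sorted() keeps low <= high for the negative rail as well.
        low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
        if not (low <= measured <= high):
            bad.append((nominal, measured))
    return bad


# Hypothetical readings taken under load; the +5 V rail sags below 4.75 V.
readings = {3.3: 3.31, 5.0: 4.62, 12.0: 11.9, -12.0: -11.5}
print(out_of_spec(readings))  # [(5.0, 4.62)]
```

A rail that only drops out of range under load would fit a machine that reboots a few seconds into booting, but readings inside the range do not rule out other causes such as failing motherboard capacitors.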

----------

## Henrik Olsen

 *TheMightyAeyeaws wrote:*   

> lol, is the CPU fan spinning?

 

Working fine, both in the PSU and on the CPU. I even have an extra one for the hard disks, also working.

 *TheMightyAeyeaws wrote:*   

> Look for the leaky capacitors on the motherboard

 

Actually, 2 or 3 of the capacitors next to the CPU had a little brownish something on their tops which looked like rust. I removed it with the tip of my finger the first day I looked at the restart problem last week. Perhaps that's a sign...

Update: I just looked at Wikipedia (http://en.wikipedia.org/wiki/Capacitor_plague) and indeed at least one of the capacitors looked exactly like the one with the brown residue on top shown in the picture on the right of that page. So I think you nailed it here! Thanks for the help identifying the possible problems.

Crazy that I ran into both a faulty DIMM and blown capacitors.

 *TheMightyAeyeaws wrote:*   

> I'd swap in another power supply, and if it doesn't help, take the MB out, say "I'm mad as hell and I'm not gonna take it anymore", and drive over it with your oversized rug office chair coaster.

 

I don't have easy access to another PSU right now, but I will try it asap.

----------

