# Random segfaults

## kev009

I'm seeing a lot of random segfaults on a new server.  The server has ECC memory and EDAC is compiled into the kernel.  As best I can tell, these are not memory-related errors.  I'm really at a loss as to what could be causing them and would like some help debugging.

Example (no, it's not just distcc that segfaults, but it is a pretty easy workload for triggering them):

```
distccd[15409]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]
distccd[1687]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]
distccd[19943]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]
distccd[22155]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]
distccd[21948]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]
```
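
For anyone reading along, here is a minimal Python sketch of how the fields in these lines can be decoded. It assumes the kernel prints the `error` value in hexadecimal and that it follows the usual x86 page-fault bit layout; `parse_segfault` and `decode_error` are names made up for illustration, not part of any tool.

```python
import re

# Matches the "segfault at ..." kernel log lines shown above.
# The trailing " in object[base+size]" part is optional.
LINE_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(err):
    """Decode an x86 page-fault error code (assumed bit layout)."""
    if err & 0x10:
        access = "instruction fetch"
    elif err & 0x2:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": access,
        "mode": "user" if err & 0x4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    info = {"comm": m["comm"], "pid": int(m["pid"]), "obj": m["obj"]}
    for key in ("addr", "ip", "sp", "err"):
        info[key] = int(m[key], 16)  # all of these fields are hex in the log
    info.update(decode_error(info["err"]))
    if m["base"] is not None:
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info
```

Under those assumptions, the distccd lines decode as a user-mode read of an unmapped page at address 1, at offset 0x80e50 into the libc-2.9.so mapping.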

----------

## kev009

Ran memtest for 6 hours, no errors found and it didn't segfault.  Any ideas on what to test?

```
[root@X3650-A1 mc0]# pwd
/sys/devices/system/edac/mc/mc0
[root@X3650-A1 mc0]# cat ce_count
0
[root@X3650-A1 mc0]# cat ce_noinfo_count
0
[root@X3650-A1 mc0]# cat ue_count
0
[root@X3650-A1 mc0]# cat ue_noinfo_count
0

distccd[15409]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]

distccd[1687]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]

distccd[19943]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]

distccd[22155]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]

distccd[21948]: segfault at 1 ip 00000000006f3e50 sp 00007fff138dfda8 error 4 in libc-2.9.so[673000+168000]

ld-linux-x86-64[2568]: segfault at 608000 ip 00000000001fd58f sp 00007fff1b91cd78 error 6 in ld-2.9.so[1e5000+20000]

ld-linux-x86-64[2583]: segfault at 400000 ip 0000000000400dbb sp 00007fff67b99040 error 7 in doexec[400000+1000]

ld-linux-x86-64[2598] general protection ip:1f01f0 sp:7fff78bf7230 error:0 in ld-2.9.so[1e8000+20000]

ld-linux-x86-64[2726]: segfault at bff92a ip 0000000000bff92a sp 00007fff78f433a8 error 15 in fcore[afe000+23f000]

ld-linux-x86-64[2729]: segfault at 3fff89 ip 000000000040692a sp 00007fffc11d6678 error 6 in dvips[400000+2e000]

ld-linux-x86-64[2820]: segfault at 2a8 ip 00000000004ff99b sp 00007fff64f4eb20 error 4 in ld-2.9.so[4fc000+20000]

ld-linux-x86-64[2822]: segfault at d3d000 ip 0000000000a0e58b sp 00007fff4a2296b8 error 6 in ld-2.9.so[9f6000+20000]

ld-linux-x86-64[3029]: segfault at 604dbb ip 0000000000604dbb sp 00007fff2309b530 error 15 in gssdp-device-sniffer[604000+1000]

ld-linux-x86-64[3036]: segfault at 81eb70 ip 000000000081eb70 sp 00007fffee2e2ea8 error 15 in unzip[81e000+1000]

ld-linux-x86-64[3039]: segfault at 806178 ip 0000000000806178 sp 00007fff652c7e40 error 15 in indxbib[806000+1000]

__ratelimit: 2 callbacks suppressed

ld-linux-x86-64[3252]: segfault at 76492a ip 000000000076492a sp 00007fff231b5618 error 15

ld-linux-x86-64[3301]: segfault at 44992a ip 0000000000439dbb sp 00007fffde682b20 error 7 in qcatool2[400000+45000]

ld-linux-x86-64[3375]: segfault at 400000 ip 000000000040192a sp 00007fff5652c9c8 error 6 in asciitopgm[400000+2000]

ld-linux-x86-64[3557]: segfault at 66b92a ip 000000000066b92a sp 00007fffec0454a8 error 15

ld-linux-x86-64[3783] general protection ip:21f1f0 sp:7fffe94ccb00 error:0 in ld-2.9.so[217000+20000]

ld-linux-x86-64[4108]: segfault at 2a8 ip 00000000004ab99b sp 00007fff280e4cb0 error 4 in ld-2.9.so[4a8000+20000]

ld-linux-x86-64[4184]: segfault at 400039 ip 000000000040bddd sp 00007fff0c9fcea0 error 6 in ktuberling[400000+1a000]

ld-linux-x86-64[4185]: segfault at d3d000 ip 000000000092758b sp 00007fff616e3b68 error 6 in ld-2.9.so[90f000+20000]

ld-linux-x86-64[4209]: segfault at d4d92a ip 0000000000d4d92a sp 00007fff9a1025a8 error 15

ld-linux-x86-64[4317]: segfault at bb092a ip 0000000000bb092a sp 00007fff37d3d1d8 error 15 in fdebuginfo[afe000+23f000]

ld-linux-x86-64[4325] general protection ip:2531f0 sp:7fff15a720a0 error:0 in ld-2.9.so[24b000+20000]

ld-linux-x86-64[4398] general protection ip:4d992a sp:7fff5e1b3628 error:0 in fmaps[400000+4fe000]

ld-linux-x86-64[4560]: segfault at 3fff89 ip 000000000044f92b sp 00007fff68e5d2f8 error 6 in rhythmbox[400000+9f000]

ld-linux-x86-64[4649]: segfault at 400000 ip 000000000044adbb sp 00007fff2d52e9a0 error 6 in icedax[400000+4b000]

ld-linux-x86-64[4818]: segfault at 616000 ip 000000000020958b sp 00007fff7fbdf068 error 6 in ld-2.9.so[1f1000+20000]

ld-linux-x86-64[4842]: segfault at 93b92a ip 000000000093b92a sp 00007fffecdd5238 error 15

ld-linux-x86-64[4857]: segfault at 809590 ip 0000000000809590 sp 00007fffac062c38 error 15 in tree[809000+1000]

ld-linux-x86-64[4928] general protection ip:1ee1f0 sp:7fff4ace5310 error:0 in ld-2.9.so[1e6000+20000]

ld-linux-x86-64[5025]: segfault at b5b92a ip 0000000000b5b92a sp 00007fff7cdae248 error 15 in fhpd[afe000+23f000]

ld-linux-x86-64[5291]: segfault at 602000 ip 0000000000602000 sp 00007fff367952f8 error 15 in tiff2rgba[602000+1000]

ld-linux-x86-64[5352]: segfault at 69e000 ip 000000000020558b sp 00007fffc371db78 error 6 in ld-2.9.so[1ed000+20000]

ld-linux-x86-64[5353]: segfault at 600178 ip 0000000000600178 sp 00007fffeb21cd90 error 15 in khelpcenter[600000+5000]

ld-linux-x86-64[5479]: segfault at 602dbb ip 0000000000602dbb sp 00007fff15b79010 error 15 in kcmshell4[600000+5000]

ld-linux-x86-64[5564]: segfault at 63992a ip 000000000063992a sp 00007fff2ebd0068 error 15 in korgac[617000+48000]

ld-linux-x86-64[5667]: segfault at 400000 ip 000000000044e92a sp 00007fff49824c88 error 6 in openjade[400000+9b000]

ld-linux-x86-64[5970]: segfault at 316 ip 00000000004341f0 sp 00007fff8404a680 error 4 in ld-2.9.so[42f000+1d000]

ld-linux-x86-64[5997]: segfault at 60692a ip 000000000060692a sp 00007fff014168a8 error 15 in gcj-dbtool[605000+3000]

ld-linux-x86-64[6039] trap invalid opcode ip:402dbb sp:7fff05746be0 error:0 in envsubst[400000+6000]

ld-linux-x86-64[6183] general protection ip:2431f0 sp:7fff75b52180 error:0 in ld-2.9.so[23b000+20000]

ld-linux-x86-64[6241]: segfault at c3a92a ip 0000000000c3a92a sp 00007fff7d202698 error 15

ld-linux-x86-64[6378]: segfault at 400000 ip 000000000040192a sp 00007fffb6947de8 error 6 in pbmtoepsi[400000+2000]

ld-linux-x86-64[6584]: segfault at ca1100 ip 000000000063292c sp 00007fff85e9a2f8 error 4 in fdebugdump[400000+4fe000]

ld-linux-x86-64[6602]: segfault at 18 ip 0000000000400274 sp 00007fff0862cc60 error 6 in gouldtoppm[400000+1000]

ld-linux-x86-64[6770]: segfault at 7a7000 ip 00000000002d8593 sp 00007ffff0da11f8 error 6 in ld-2.9.so[2c0000+20000]

ld-linux-x86-64[6861]: segfault at 400056 ip 000000000040fdbe sp 00007fff78ba7020 error 7 in omfonts[400000+1b000]

ld-linux-x86-64[6894]: segfault at 1d ip 000000000041492f sp 00007fff234138a0 error 4 in gpk-repo[400000+28000]

ld-linux-x86-64[7054]: segfault at 400000 ip 0000000000400dbd sp 00007fffb73f5890 error 7 in kabcdistlistupdater[400000+3000]

ld-linux-x86-64[7154]: segfault at 600f70 ip 0000000000600f70 sp 00007fff4a755328 error 15 in grmic[600000+2000]

ld-linux-x86-64[7207]: segfault at 400000 ip 000000000040892c sp 00007fffff9a1e08 error 7 in gprof[400000+16000]

ld-linux-x86-64[7268]: segfault at 400000 ip 0000000000402dbb sp 00007ffff583bce0 error 6 in tail[400000+d000]

ld-linux-x86-64[7270]: segfault at 2a8 ip 00000000004dc99b sp 00007fff77c34800 error 4 in ld-2.9.so[4d9000+20000]

ld-linux-x86-64[7279]: segfault at 40ed60 ip 00000000001f2978 sp 00007fff71243e10 error 7 in ld-2.9.so[1ef000+20000]

ld-linux-x86-64[7280] trap invalid opcode ip:41f92a sp:7fff2842d8c8 error:0 in kttsd[400000+36000]

ld-linux-x86-64[7309]: segfault at 23fffff ip 00000000004d392a sp 00007ffff4a28d88 error 6 in oparchive[400000+13f000]

ld-linux-x86-64[7377]: segfault at 1f992a ip 00000000001f992a sp 00007fff6e4e0948 error 15

ld-linux-x86-64[7448]: segfault at 8eb92a ip 00000000008eb92a sp 00007fff5c24f6e8 error 15

__ratelimit: 1 callbacks suppressed

ld-linux-x86-64[7676]: segfault at b33000 ip 00000000003b557b sp 00007fff13789c18 error 6 in ld-2.9.so[39d000+20000]

ld-linux-x86-64[7699]: segfault at 62ddbb ip 000000000062ddbb sp 00007ffffa0ab520 error 15 in toc2cddb[62c000+9000]

ld-linux-x86-64[7939]: segfault at 161004011d5 ip 00000000001ea1f0 sp 00007fff1eeb14f0 error 4 in ld-2.9.so[1e2000+20000]

ld-linux-x86-64[7959]: segfault at 62e398 ip 0000000000402470 sp 00007ffffaae0f40 error 4 in git-unpack-file[400000+2e000]

ld-linux-x86-64[8066]: segfault at 31c ip 00000000004461f0 sp 00007fff729f6020 error 4 in ld-2.9.so[43e000+20000]

ld-linux-x86-64[8302] general protection ip:4de932 sp:7fffc9782be0 error:0 in myisampack[400000+140000]

ld-linux-x86-64[8304] trap invalid opcode ip:63192a sp:7fff869f1e88 error:0 in scribus[400000+c50000]

ld-linux-x86-64[8818]: segfault at 6066d0 ip 00000000006066d0 sp 00007fff0a9304a8 error 15 in zipnote[606000+1000]

ld-linux-x86-64[8826]: segfault at 806270 ip 0000000000806270 sp 00007fff82353980 error 15 in htstat[806000+5000]

ld-linux-x86-64[8828]: segfault at 815288 ip 0000000000481a48 sp 00007fff88f3b3d0 error 4 in ekiga[400000+209000]

ld-linux-x86-64[8839]: segfault at 6bf92a ip 00000000006bf92a sp 00007fff48427898 error 15

ld-linux-x86-64[9157]: segfault at 400065 ip 0000000000400dbd sp 00007fff47a0feb0 error 7 in pbmtopgm[400000+1000]

ld-linux-x86-64[9171]: segfault at 400000 ip 000000000042992a sp 00007ffffe23f6d8 error 6 in kspaceduel[400000+2b000]

ld-linux-x86-64[9230] general protection ip:436b96 sp:7fff0cc97870 error:0 in ld-2.9.so[434000+20000]

ld-linux-x86-64[9482]: segfault at 2a8 ip 000000000049099b sp 00007fff23dc2990 error 4 in ld-2.9.so[48d000+20000]

ld-linux-x86-64[9729]: segfault at 404c20 ip 00000000001fb93f sp 00007fff0c0624f8 error 6 in ld-2.9.so[1e4000+20000]

ld-linux-x86-64[9939]: segfault at 637dbb ip 0000000000637dbb sp 00007ffff0fc2440 error 15 in tc[637000+4000]

ld-linux-x86-64[9990] general protection ip:40092a sp:7fff275d9a78 error:0 in runlevel[400000+b000]

ld-linux-x86-64[10113] general protection ip:1e91f0 sp:7fffefb6a1a0 error:0 in ld-2.9.so[1e1000+20000]

ld-linux-x86-64[10266]: segfault at 0 ip 0000000000000000 sp 00007fffff1c3650 error 14 in useradd[400000+11000]

ld-linux-x86-64[10402]: segfault at 2a8 ip 000000000056099b sp 00007fff03126cf0 error 4 in ld-2.9.so[55d000+20000]

ld-linux-x86-64[10525]: segfault at 64d92a ip 000000000064d92a sp 00007fff8ae3e2a8 error 15

ld-linux-x86-64[10612]: segfault at 605ab8 ip 0000000000605ab8 sp 00007fffdbfa4838 error 15 in lgroupmod[605000+2000]

ld-linux-x86-64[10677]: segfault at 3cd8 ip 0000000000406ab8 sp 00007fffaad1d888 error 6 in grpconv[400000+7000]

ld-linux-x86-64[10952]: segfault at 600f70 ip 0000000000600f70 sp 00007fff8a6241e8 error 15 in cracklib-unpacker[600000+1000]

ld-linux-x86-64[11476]: segfault at 7fff5d060120 ip 0000000000bdf92a sp 00007fff38c600d8 error 4 in cc1plus[400000+7e5000]

ld-linux-x86-64[11483] trap invalid opcode ip:63c92a sp:7fff5baeaf68 error:0 in f951[400000+79a000]

ld-linux-x86-64[11496]: segfault at ce5e18 ip 0000000000433d70 sp 00007fff910073f8 error 4 in jc1[400000+6e5000]

ld-linux-x86-64[11956]: segfault at 620ac0 ip 0000000000406e78 sp 00007fffec951de0 error 4 in gvfsd-smb[400000+21000]

ld-linux-x86-64[12187]: segfault at 10 ip 00000000001ea207 sp 00007fff4ea1f050 error 4 in ld-2.9.so[1e2000+20000]

ld-linux-x86-64[12296]: segfault at 400000 ip 0000000000419dbb sp 00007ffffe0034a0 error 6 in gvfsd-network[400000+1d000]

ld-linux-x86-64[12335]: segfault at 48723005 ip 000000000063c92a sp 00007fff3b3477a8 error 4 in mysqld[400000+723000]

ld-linux-x86-64[14213]: segfault at 61b000 ip 000000000020f57f sp 00007fff79d7e1b8 error 6 in ld-2.9.so[1f7000+20000]

ld-linux-x86-64[14333]: segfault at 601590 ip 0000000000601590 sp 00007fff24c387e8 error 15 in unopkg.bin[601000+1000]

```
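
The four EDAC counters read by hand above could also be collected for every memory controller in one go. This is only a sketch: the default path is the one from the shell session, the counter names are the four files that were `cat`ed, and `read_edac_counters` and `has_errors` are hypothetical helper names.

```python
from pathlib import Path

# The four counter files checked by hand in the shell session.
COUNTERS = ("ce_count", "ce_noinfo_count", "ue_count", "ue_noinfo_count")


def read_edac_counters(root="/sys/devices/system/edac/mc"):
    """Return {controller: {counter: value}} for every mc* directory under root."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # EDAC not available, or a different sysfs layout
    for mc in sorted(base.glob("mc[0-9]*")):
        counts = {}
        for name in COUNTERS:
            path = mc / name
            if path.is_file():
                counts[name] = int(path.read_text().strip())
        result[mc.name] = counts
    return result


def has_errors(counters):
    """True if any corrected or uncorrected error count is non-zero."""
    return any(v > 0 for counts in counters.values() for v in counts.values())


if __name__ == "__main__":
    counters = read_edac_counters()
    for mc, counts in counters.items():
        print(mc, counts)
    print("ECC errors logged" if has_errors(counters) else "no ECC errors logged")
```

All-zero counters only say that the ECC logic did not report anything; they do not rule out faults elsewhere in the system.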

----------

## eccerr0r

Can you try underclocking the CPU?  You could have gotten a bad CPU.

If it were an older machine, I'd also check the CPU fan for dust buildup.
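
Underclocking is often done in the BIOS, but where the kernel's cpufreq driver is active, one possible approach is to cap each core at its lowest supported frequency through sysfs. The sketch below assumes the standard `cpuinfo_min_freq` and `scaling_max_freq` files exist on the machine; `cap_cpu_frequency` is a made-up name, and it defaults to a dry run that changes nothing.

```python
from pathlib import Path


def cap_cpu_frequency(root="/sys/devices/system/cpu", dry_run=True):
    """Plan (or apply) scaling_max_freq = cpuinfo_min_freq for each CPU.

    Returns {cpu_name: frequency_in_kHz}. CPUs without a cpufreq
    directory are skipped.
    """
    plan = {}
    for freq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        min_khz = (freq_dir / "cpuinfo_min_freq").read_text().strip()
        plan[freq_dir.parent.name] = int(min_khz)
        if not dry_run:
            # Needs root; the kernel may clamp or reject the value.
            (freq_dir / "scaling_max_freq").write_text(min_khz)
    return plan


if __name__ == "__main__":
    # Dry run: only report what would be written.
    for cpu, khz in cap_cpu_frequency().items():
        print(f"{cpu}: would cap at {khz} kHz")
```

If the segfaults disappear with the cap applied, that would point toward the CPU, its cooling, or its power delivery rather than memory or disks.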

----------

## BitJam

It is interesting that memtest neither segfaulted nor detected any errors.

Underclocking would test your CPU and cooling, which is a good idea, but I suspect the problem is related to your power supply and hard drives.  Are you using RAID?  What is your hard drive configuration?

Maybe you can track down the problem area by seeing what activities increase the failure rate.   For example, does the failure rate increase if you increase the disk activity?
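
One rough way to put a number on the failure rate is to run the same command many times and count how often it is killed by a signal. This is a sketch only: `count_crashes` is a hypothetical helper, it relies on the POSIX convention that Python's `subprocess` reports a signal death as a negative return code, and the command in the usage example is an arbitrary placeholder.

```python
import signal
import subprocess


def count_crashes(cmd, runs=100):
    """Run cmd repeatedly; return {signal_name: count} for runs killed by a signal."""
    crashes = {}
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if proc.returncode < 0:
            # On POSIX a negative return code is the negated signal number.
            name = signal.Signals(-proc.returncode).name
            crashes[name] = crashes.get(name, 0) + 1
    return crashes


if __name__ == "__main__":
    # Placeholder workload; substitute something that segfaults on the box.
    print(count_crashes(["ls", "-l", "/"], runs=20))
```

Running it once on an idle system and once while something else hammers the disks gives two counts that can be compared directly.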

You can also try using the cpuburn program (emerge cpuburn).  If cpuburn causes failures then it is unlikely that the fault lies in the hard drives.  But if cpuburn does not cause failures then the hard drives and power supply are more likely the culprits.  The cpuburn program comes with this warning:

```
*** WARNING ***    This program is designed to heavily load CPU chips.
Undercooled, overclocked or otherwise weak systems may fail causing data
loss (filesystem corruption) and possibly permanent damage to electronic
components.  Nor will it catch all flaws.   *** USE AT YOUR OWN RISK ***
```

----------

## kev009

The system is an IBM System x3650 with two Xeon E5420s and redundant power supplies.  Temperatures were very stable (under 120°F under full sustained load).

On a hunch, I reseated the CPUs and that solved it.  I remember having this problem a couple of years ago in another system.  CPUs seem to unseat fairly easily during shipping and cause random flakiness.

Thanks for the pointers!  I will keep cpuburn in mind for any future problems.

----------

