# Sudden random general protection faults out of nowhere

## mpeg4v3

So over the past week or so I've been setting up my new HTPC. Everything was going fine, and I've slowly gotten everything configured.

Then today I came home from class and went to emerge a program. It errored out midway through. I tried again, and the system hung entirely. After the next reboot I checked the messages log and saw that I got a "general protection fault: 0000 [2] SMP".

Now this happens completely at random, with everything I do. I can compile gcc fine once, but if I try to do it again it'll fault instantly in the unpacking phase. Or it'll get halfway through compiling and fault. Or it'll fault while running configure.

At first I figured that, despite everything being brand new, this was a hardware problem. So I ran memtest: nothing, the RAM is fine. I tried running various CPU testing programs, but they all error out immediately in Linux. However, if I boot a BartPE CD with Orthos on it, which heavily stresses both cores, it runs perfectly fine with no errors. So I'm pretty sure it's not the CPU or RAM but something else.

I've also tried turning off SpeedStep in the BIOS, and still no luck.

This problem is making me want to stab something and yell loud curses into the air repeatedly. I can't figure out what the hell is going on or what caused it. The last thing I did to this machine was enable cpufreq, and turning it off has no effect. I've even tried the `echo 0 > /proc/sys/kernel/randomize_va_space` tip I saw in another thread, and still nothing.
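Since the `echo 0` tip only lasts until the next reboot, it is worth reading the setting back to confirm it actually took effect before testing. A minimal Python sketch; the helper names `describe_aslr` and `read_aslr` are hypothetical, and the mode descriptions paraphrase the kernel's sysctl documentation for `randomize_va_space`.

```python
from pathlib import Path
from typing import Optional

# Paraphrased meanings of kernel.randomize_va_space (kernel sysctl docs).
ASLR_MODES = {
    0: "disabled",
    1: "conservative: stack, mmap base and VDSO randomized",
    2: "full: as 1, plus heap (brk) randomized",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw contents of randomize_va_space into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


def read_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> Optional[str]:
    """Return the current ASLR mode, or None if the file does not exist."""
    try:
        return describe_aslr(Path(path).read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(read_aslr())
```

If it still reports anything other than "disabled" after the `echo`, the tip was never actually in effect during the failing compiles.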

Edit: oh, also, the PC specs are: Asus P5E-VM HDMI mobo, Intel E8400 CPU, G.Skill 4 GB DDR2-800 RAM, and an old Seagate 200 GB PATA hard drive.

----------

## mpeg4v3

So I left memtest running overnight last night for 8 hours: no errors. I also left the BartPE boot disk running Orthos on both cores for about 10.5 hours today, and there were no errors either. I'm at my wit's end with this. I'm reduced to turning off random services, attempting compiles, and seeing what happens. Would this be better posted in the general problems forum rather than AMD64?

----------

## Maedhros

Moved from Gentoo on AMD64 to Kernel & Hardware (by request).

----------

## coolsnowmen

If you have a bootable Linux disk (one that is working), can you start with that working kernel version and .config?

The only thing you might have to do is change your motherboard and HD drivers to built-in.

Unless you are doing something special, this looks like a [kernel] driver problem...

----------

## mpeg4v3

I just got home and booted off my amd64 LiveUSB stick, which booted fine. I chrooted in and ran revdep-rebuild, which didn't find anything important. I then tried emerging again and it once again faulted out, which I guess proves that it isn't the kernel either. I forgot to mention in my original post that I also ran a long test with smartctl, and it returned no errors. And I don't think it's the motherboard itself, because I figure Orthos would have stressed it pretty heavily during the 10.5 hours it was running.

I already have everything compiled into the kernel.

The only other thing I can think of that I've done recently was try to get my Belkin UPS working by installing their shitty Java software, but I couldn't even get it to detect the UPS, and I'm not running it on boot, so I doubt it's affecting anything. I guess I'll try uninstalling it and unplugging the UPS next, but that's just a long shot.

----------

## mpeg4v3

Well, uninstalling the UPS software of course didn't help. Here's the log of what happened just now when I ran revdep-rebuild. Note that it happens not just with sed but with whatever random program is running: emerge, cc1, anything.

```
Feb 29 03:10:01 bluebeard stack segment: 0000 [1] SMP
Feb 29 03:10:01 bluebeard CPU 0
Feb 29 03:10:01 bluebeard Modules linked in:
Feb 29 03:10:01 bluebeard Pid: 20119, comm: sed Not tainted 2.6.23-gentoo-r8 #7
Feb 29 03:10:01 bluebeard RIP: 0010:[<ffffffff8025fbf9>]  [<ffffffff8025fbf9>] __alloc_pages+0x24/0x2b4
Feb 29 03:10:01 bluebeard RSP: 0018:fbff81010a783d18  EFLAGS: 00010202
Feb 29 03:10:01 bluebeard RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000000
Feb 29 03:10:01 bluebeard RDX: ffff810000010370 RSI: 0000000000000000 RDI: 00000000000200d2
Feb 29 03:10:01 bluebeard RBP: 00000000000200d2 R08: 00002aaaab0066f0 R09: ffff81012245f9c0
Feb 29 03:10:01 bluebeard R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Feb 29 03:10:01 bluebeard R13: ffff810000010370 R14: ffff81012b6857f0 R15: 0000000000000000
Feb 29 03:10:01 bluebeard FS:  00002aaaab0066f0(0000) GS:ffffffff809b7000(0000) knlGS:0000000000000000
Feb 29 03:10:01 bluebeard CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 29 03:10:01 bluebeard CR2: 00002aaaaaacc000 CR3: 000000010a0a4000 CR4: 00000000000026e0
Feb 29 03:10:01 bluebeard DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 29 03:10:01 bluebeard DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 29 03:10:01 bluebeard Process sed (pid: 20119, threadinfo ffff81010a782000, task ffff81012b6857f0)
Feb 29 03:10:01 bluebeard Stack: <0>general protection fault: 0000 [2] SMP
Feb 29 03:10:01 bluebeard CPU 0
Feb 29 03:10:01 bluebeard Modules linked in:
Feb 29 03:10:01 bluebeard cron[20121]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Feb 29 03:10:01 bluebeard Pid: 20119, comm: sed Not tainted 2.6.23-gentoo-r8 #7
Feb 29 03:10:01 bluebeard RIP: 0010:[<ffffffff8020c8da>]  [<ffffffff8020c8da>] _show_stack+0xa7/0xf1
Feb 29 03:10:01 bluebeard RSP: 0018:ffffffff80a6fe68  EFLAGS: 00010046
Feb 29 03:10:01 bluebeard RAX: ffffffff809b7001 RBX: fbff81010a783d18 RCX: 0000000000000000
Feb 29 03:10:01 bluebeard RDX: fbff81010a783d00 RSI: ffffffff80a6ff58 RDI: 0000000000000000
Feb 29 03:10:01 bluebeard RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000010
Feb 29 03:10:01 bluebeard R10: 0000000000000001 R11: ffff810129b5e4e0 R12: ffffffff80a6efc0
Feb 29 03:10:01 bluebeard R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff80a6afc0
Feb 29 03:10:01 bluebeard FS:  00002aaaab0066f0(0000) GS:ffffffff809b7000(0000) knlGS:0000000000000000
Feb 29 03:10:01 bluebeard CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 29 03:10:01 bluebeard CR2: 00002aaaaaacc000 CR3: 000000010a0a4000 CR4: 00000000000026e0
Feb 29 03:10:01 bluebeard DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 29 03:10:01 bluebeard DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 29 03:10:01 bluebeard Process sed (pid: 20119, threadinfo ffff81010a782000, task ffff81012b6857f0)
Feb 29 03:10:01 bluebeard Stack:  ffffffff808fcd07 ffffffff80a6ff58 fbff81010a783d18 ffff81012b6857f0
Feb 29 03:10:01 bluebeard ffffffff80a6ff58 0000000000000000 fbff81010a783d18 ffff81012b6857f0
Feb 29 03:10:01 bluebeard 0000000000000000 ffffffff8020c9b1 ffffffff80a6ff58 ffffffff80a6ff58
Feb 29 03:10:01 bluebeard Call Trace:
Feb 29 03:10:01 bluebeard <#SS>  [<ffffffff8020c9b1>] show_registers+0x8d/0x100
Feb 29 03:10:01 bluebeard [<ffffffff806f3a65>] __die+0x80/0xbf
Feb 29 03:10:01 bluebeard [<ffffffff8020cb29>] die+0x41/0x53
Feb 29 03:10:01 bluebeard [<ffffffff8020d257>] do_stack_segment+0x5f/0x6d
Feb 29 03:10:01 bluebeard [<ffffffff8020c40d>] stack_segment+0x7d/0x90
Feb 29 03:10:01 bluebeard [<ffffffff8025fbf9>] __alloc_pages+0x24/0x2b4
Feb 29 03:10:01 bluebeard <<EOE>>
Feb 29 03:10:01 bluebeard
Feb 29 03:10:01 bluebeard Code: 48 8b 33 48 c7 c7 d9 71 84 80 31 c0 48 83 c3 08 49 ff c6 e8
Feb 29 03:10:01 bluebeard RIP  [<ffffffff8020c8da>] _show_stack+0xa7/0xf1
Feb 29 03:10:01 bluebeard RSP <ffffffff80a6fe68>
```
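One detail in this oops is worth a closer look. The first fault's `RSP` is `fbff81010a783d18`, which is not a canonical x86-64 address (assuming 48-bit virtual addresses, bits 63 through 47 must be all 0s or all 1s), and on x86-64 a stack access through a non-canonical address raises a stack-segment fault (#SS) rather than a general protection fault. Flipping a single bit gives `ffff81010a783d18`, which lands just above the `threadinfo` address printed for the same process. The second fault appears to be `_show_stack` tripping over the same bad value, which shows up again in `RBX`. A lone flipped bit like this is consistent with, though not proof of, a hardware problem. A minimal Python sketch of the check; `is_canonical` and `single_bit_fix` are hypothetical helper names.

```python
# Values copied from the first oops in the log above.
BAD_RSP = 0xFBFF81010A783D18
THREADINFO = 0xFFFF81010A782000


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) are all 0 or all 1 (x86-64 canonical form)."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def single_bit_fix(addr: int, va_bits: int = 48):
    """Return (bit, fixed_addr) for the first single-bit flip that makes addr canonical."""
    for bit in range(va_bits - 1, 64):
        candidate = addr ^ (1 << bit)
        if is_canonical(candidate, va_bits):
            return bit, candidate
    return None


bit, fixed = single_bit_fix(BAD_RSP)
print(is_canonical(BAD_RSP))    # False
print(bit, hex(fixed))          # 58 0xffff81010a783d18
print(hex(fixed - THREADINFO))  # 0x1d18
```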

----------

## jcat

Problems like this are almost always hardware, and are almost always very hard to pin down.

I had a problem on a computer recently that was related to RAM. Two 8-hour memtest runs found absolutely nothing, but removing the offending pair of memory modules cured the issue.

So, although it's hard to speculate what the issue could be exactly, I wanted to say that just because memtest gave you the "all clear" doesn't mean it's not the memory!

It could also be a PSU issue.

Did you build this system yourself, or do you have a company to complain to?

Cheers,

jcat

----------

## mpeg4v3

Well, it turned out to be something beyond totally random: Gentoo did not like my CPU's multiplier being at 9.0. I dropped the multiplier to 7.5 and raised the FSB to 400 MHz (the exact same 3 GHz it was at with 9 × 333), and it works perfectly fine now. In the 12 or so years I've been working on computers, this is pretty much the most random fix I've ever seen.

----------

## drescherjm

Was the memory frequency changed as a result?

----------

## mpeg4v3

No, I kept it at the exact same frequency.

----------

## jcat

It's not unusual for overclocking to cause such issues; not sure why you think it's a random solution ;)

I'm glad it's sorted now anyway :)

Cheers,

jcat

----------

## mpeg4v3

The thing is, I wasn't overclocking.

I was running it at its stock speed of 3 GHz, which is a multiplier of 9 × a 333 MHz FSB. I fixed it by changing the multiplier to 7.5 and the FSB to 400 MHz, which still results in 3 GHz.
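The arithmetic behind "still 3 GHz" is just multiplier × FSB clock. A small Python sketch (the function names are hypothetical); it uses `Fraction` because the E8400's nominal "333 MHz" FSB is really 1000/3 MHz. It also shows what did change between the two settings: the quad-pumped bus went from 1333 to 1600 MT/s while the core clock stayed put.

```python
from fractions import Fraction


def core_clock_mhz(multiplier: Fraction, fsb_mhz: Fraction) -> Fraction:
    """Core clock = CPU multiplier x front-side-bus clock."""
    return multiplier * fsb_mhz


def bus_rate_mts(fsb_mhz: Fraction) -> Fraction:
    """The Core 2 FSB is quad-pumped: four transfers per bus clock."""
    return 4 * fsb_mhz


stock_fsb = Fraction(1000, 3)  # nominal "333 MHz"
tweaked_fsb = Fraction(400)

stock = core_clock_mhz(Fraction(9), stock_fsb)
tweaked = core_clock_mhz(Fraction(15, 2), tweaked_fsb)  # 7.5x

print(stock, tweaked)  # 3000 3000
print(int(bus_rate_mts(stock_fsb)), int(bus_rate_mts(tweaked_fsb)))  # 1333 1600
```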

----------

## jcat

Hmmmm, maybe your RAM doesn't like running at 333 MHz and performs better at 400 MHz. I realise you're supposed to be able to underclock RAM without issue, but RAM can be funny sometimes!

Whatever the reason, you can be sure it's not because Gentoo didn't like your multiplier; it has to be some aspect of your hardware that didn't like it.

Cheers,

jcat

----------

