# [solved] (heat) Hardware trouble with gentoo box

## schmeggahead

I have the following logs, collected by tenshi, from a quad-core box:

   8: ata5: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1

   8: dhfis 0x1 dmafis 0x1 sdbfis 0x0

   8: ata5: tag : dhfis dmafis sdbfis sacitve

   8: ata5: tag 0x0: 1 1 0 1

   8: ata5.00: exception Emask 0x1 SAct 0x1 SErr 0x400000 action 0x6 frozen

   8: sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00

   8: sd 4:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)

   8: ata5: EH complete

   8: ata5: hard resetting link

   8: ata5.00: error: { ICRC ABRT }

   8: ata5.00: Ata error. fis:0x21

   8: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

   8: sd 4:0:0:0: [sdc] Write Protect is off

   8: ata5: SError: { Handshk }

   8: ata5.00: status: { DRDY ERR }

   8: ata5: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0

   8: ata5: ATA_REG 0x41 ERR_REG 0x84

   7: ata5.00: configured for UDMA/133

   6: res 41/84:04:3c:89:71/84:00:03:00:00/40 Emask 0x10 (ATA bus error)

   6: ata5.00: cmd 61/58:00:3c:89:71/00:00:03:00:00/40 tag 0 ncq 45056 out

   5: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

   3: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

   3: dnsmasq: reading /etc/resolv.conf

   2: RAID10 conf printout:

   2: disk 0, wo:0, o:1, dev:sdd5

   2: --- wd:1 rd:2

   1: res 41/84:04:44:ab:14/84:00:05:00:00/40 Emask 0x10 (ATA bus error)

   1: sd 4:0:0:0: [sdc] Result: hostbyte=0x00 driverbyte=0x08

   1: ata5.00: configured for UDMA/100

   1: raid10: Operation continuing on 1 devices.

   1: disk 1, wo:1, o:0, dev:sdc5

   1: Descriptor sense data with sense descriptors (in hex):

   1: raid10: Disk failure on sdc5, disabling device.

   1: 72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00

   1: ata5.00: cmd 61/08:00:44:ab:14/00:00:05:00:00/40 tag 0 ncq 4096 out

   1: ata5.00: cmd 61/18:00:dc:95:50/00:00:00:00:00/40 tag 0 ncq 12288 out

   1: sd 4:0:0:0: [sdc] Sense Key : 0xb [current] [descriptor]

   1: ata5: limiting SATA link speed to 1.5 Gbps

   1: ata5.00: limiting speed to UDMA/100:PIO4

   1: 03 71 89 3c

   1: res 41/84:04:dc:95:50/84:00:00:00:00/40 Emask 0x10 (ATA bus error)

   1: sd 4:0:0:0: [sdc] ASC=0x47 ASCQ=0x0

   3: DR6: ffff0ff0 DR7: 00000400

   3: Modules linked in:

   3: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000

   3:

   2: Pid: 0, comm: swapper Not tainted (2.6.26-hardened-r9 #17)

   2: DS: 0068 ES: 0068 FS: 00d8 GS: 0000 SS: 0068

   1: ESI: 00000003 EDI: 00000000 EBP: 00000000 ESP: f7c83fa8

   1: CR0: 8005003b CR2: a4e1b6c0 CR3: 30c05000 CR4: 000006f0

   1: EAX: 00000000 EBX: c09ca000 ECX: 00000000 EDX: 00000000

   1: EIP: 0060:[<c0408abb>] EFLAGS: 00000246 CPU: 3

   1: CR0: 8005003b CR2: abdb1de0 CR3: 008fc000 CR4: 000006f0

   1: [<c05c3f67>] ?  [<c04756af>] ?  [<c04756af>] ?  [<c05c40ef>] ?  [<c05526db>]  [<c04660ee>]  [<c04662ad>]  [<c0466546>]  [<c0402d12>]  =======================

   1: EAX: ec9410eb EBX: 0000000c ECX: de4dbda4 EDX: de4dbda4

   1: [<c0408a91>]  [<c0401482>]  =======================

   1: EIP: 0060:[<c0408abb>] EFLAGS: 00000246 CPU: 0

   1: BUG: soft lockup - CPU#2 stuck for 124s! [touch:11495]

   1: ESI: 00000050 EDI: c0957100 EBP: de4dbf20 ESP: de4dbd28

   1: DS: 0068 ES: 0068 FS: 00d8 GS: 0033 SS: 0068

   1: Pid: 11495, comm: touch Not tainted (2.6.26-hardened-r9 #17)

   1: EAX: 00000000 EBX: f7c82000 ECX: 00000000 EDX: 00000000

   1: BUG: soft lockup - CPU#0 stuck for 124s! [swapper:0]

   1: ESI: 00000000 EDI: c08f8000 EBP: 0129f067 ESP: c09cbfdc

   1: EIP: 0060:[<c07b7f71>] EFLAGS: 00000293 CPU: 2

   1: [<c0408a91>] ?  [<c0401482>] ?  =======================

   1: BUG: soft lockup - CPU#3 stuck for 124s! [swapper:0]

   1: CR0: 8005003b CR2: a1065624 CR3: 35ca5000 CR4: 000006f0

I'm thinking it is time to replace CPU and motherboard.

I've replaced the power supply and that didn't affect it (1000 watts should be enough!)

If this could instead be a kernel configuration problem, pointers would be helpful.

Last edited by schmeggahead on Fri Nov 06, 2009 4:26 pm; edited 1 time in total
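For anyone reading the log excerpt above, the `ATA_REG 0x41 ERR_REG 0x84` line can be decoded by hand. The following is a minimal Python sketch, assuming the two values are the ATA status and error registers (which matches the `status: { DRDY ERR }` and `error: { ICRC ABRT }` lines in the same log). The helper names `decode` and `parse_reg_line` are made up for illustration, and the bit tables cover only the flags libata prints in those two lines.

```python
import re

# Status and error register bits, limited to the flags that show up in
# libata's "status: { ... }" and "error: { ... }" messages.
STATUS_BITS = {0x80: "BUSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# Hypothetical helper: matches the "ATA_REG 0x41 ERR_REG 0x84" log line.
REG_LINE = re.compile(r"ATA_REG (0x[0-9a-fA-F]+) ERR_REG (0x[0-9a-fA-F]+)")


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def parse_reg_line(line):
    """Decode a log line of the form '... ATA_REG 0x41 ERR_REG 0x84'.

    Returns (status_flags, error_flags), or None if the line does not match.
    """
    match = REG_LINE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode(status, STATUS_BITS), decode(error, ERROR_BITS)


print(parse_reg_line("ata5: ATA_REG 0x41 ERR_REG 0x84"))
```

ICRC is an interface CRC error, meaning data was corrupted in transit between the drive and the controller rather than on the platters. That narrows things to the link side (cabling, connectors, controller, or something upsetting signal integrity), but it does not by itself say which of those is at fault.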

----------

## Vorlon

It sounds like this could be almost anything, but I vote for motherboard/CPU too.

You might try removing every possible thing from the motherboard.  You may also want to try swapping out some RAM to see if the problem changes.

Also, are you sure you are cooling the CPU properly?  This might happen if you overheat your CPU, or give it the wrong overclocking settings.

Good luck!

----------

## pappy_mcfae

What kind of CPU? Did it overheat? Looks like all CPU to me.

Blessed be!

Pappy

----------

## schmeggahead

I'm thinking it might be worth my time to pull the CPU (a quad core), clean it, and reapply the thermal compound.

The case has more fans than imaginable, but I'll look at the air flow around the CPU.

It takes several days for the problem to materialize.

----------

## pappy_mcfae

Be very careful. CPUs are 100% CMOS circuitry. That means they are incredibly sensitive to static discharge. Touch the connector pads as little as possible (if at all).

Blessed be!

Pappy

----------

## schmeggahead

Well - I did the reapplication of the diamond thermal paste.

Then I found the sensors setup - boy was it enlightening.

At idle, all 4 cores were at 65 Celsius.

Then I put them under load, and they climbed to 100 Celsius and crashed the machine.
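For anyone who wants to watch for this automatically, here is a rough Python sketch that pulls per-core temperatures out of `sensors`-style text and flags cores at or above a limit. The sample text is made up to mirror the idle readings above, the line format is assumed to look like typical coretemp output, and the 80 degree limit and the function names are placeholders, not recommended values.

```python
import re

# Matches lines such as "Core 0:  +65.0°C  (high = +80.0°C, crit = +100.0°C)".
# The exact layout is an assumption; adjust the pattern to your own output.
CORE_LINE = re.compile(r"^(Core \d+):\s+\+?(-?\d+(?:\.\d+)?)\s*°?C")

# Made-up sample mirroring the idle readings from the post.
SAMPLE = """\
coretemp-isa-0000
Core 0:       +65.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:       +65.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:       +65.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:       +65.0°C  (high = +80.0°C, crit = +100.0°C)
"""


def core_temps(sensors_text):
    """Return {core name: temperature in Celsius} for each 'Core N:' line."""
    temps = {}
    for line in sensors_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if match:
            temps[match.group(1)] = float(match.group(2))
    return temps


def hot_cores(temps, limit_c):
    """Return the sorted names of cores at or above limit_c."""
    return sorted(name for name, temp in temps.items() if temp >= limit_c)


idle = core_temps(SAMPLE)
print(idle)
print(hot_cores(idle, 80.0))
```

Feeding it the real output of `sensors` instead of `SAMPLE` (for example, read from standard input in a cron job) would turn this into a crude over-temperature alarm.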

So I bought a new motherboard (P7P55D EVO) and an i7 860, intending to replace both the motherboard and CPU.

I got them home, and then I found the BIOS setting to throttle the CPU voltage and clock based on temperature.

That kept it from crashing, but it still climbed to 100 and dropped back to 88, then went back up. I found a number of people reporting heat problems with my quad-core CPU.

Then one of my P4s crashed completely, leaving me a viable case. Oy.   :Confused:

Another trip to the store, this time to buy a better-than-stock CPU cooler in the hopes of getting the quad core running cooler.

Settled on an OCZ Vendetta 2.   :Very Happy:

After installing that behemoth (it leaves a few inches of clearance in the case with a 120mm fan on the side):

idle is around 43 Celsius, and under load the cores are hitting 50 Celsius and screaming!   :Cool:

Moved it into the old P4 case and it is running cool & great.

So now I am off to work through the bleeding edge of the i7 / p5 / Realtek 8168 mess. Creating another thread for that adventure.

Thanks for all the help.

----------

