# CPU temp above threshold

## LIsLinuxIsSogood

Turning to the forum for some help with whatever is going on with ACPI and CPU temp (especially!). What is this, and how should I go about fixing it? Do these messages seem at all related?

```
dmesg

[    0.580397] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.581508] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT5._GTF] (Node ffff8802160af460), AE_NOT_FOUND (20170303/psparse-543)

[    0.583843] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.585060] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT3._GTF] (Node ffff8802160af370), AE_NOT_FOUND (20170303/psparse-543)

[    0.587609] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.588316] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT1._GTF] (Node ffff8802160af280), AE_NOT_FOUND (20170303/psparse-543)

[    0.589741] ata2.00: ATAPI: HL-DT-ST DVD-RW GSA-H60L, DC07, max UDMA/100

[    0.590490] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.591246] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT0._GTF] (Node ffff8802160af208), AE_NOT_FOUND (20170303/psparse-543)

[    0.592843] ata6.00: ATA-9: WDC WD30EZRX-00DC0B0, 80.00A80, max UDMA/133

[    0.593623] ata6.00: 5860533168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA

[    0.594412] ata4.00: ATA-8: ST1000LM024 HN-M101MBB, 2AR10002, max UDMA/133

[    0.595193] ata4.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA

[    0.596252] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.597070] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT1._GTF] (Node ffff8802160af280), AE_NOT_FOUND (20170303/psparse-543)

[    0.598716] ata2.00: configured for UDMA/100

[    0.599565] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.600423] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT5._GTF] (Node ffff8802160af460), AE_NOT_FOUND (20170303/psparse-543)

[    0.601820] usb 1-1: new high-speed USB device number 2 using ehci-pci

[    0.603619] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[    0.604547] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT3._GTF] (Node ffff8802160af370), AE_NOT_FOUND (20170303/psparse-543)

...

[  407.369289] ata1.00: exception Emask 0x10 SAct 0x400000 SErr 0x280100 action 0x6 frozen

[  407.369292] ata1.00: irq_stat 0x08000000, interface fatal error

[  407.369294] ata1: SError: { UnrecovData 10B8B BadCRC }

[  407.369296] ata1.00: failed command: READ FPDMA QUEUED

[  407.369300] ata1.00: cmd 60/00:b0:28:00:07/01:00:12:00:00/40 tag 22 ncq dma 131072 in

                        res 40/00:b0:28:00:07/00:00:12:00:00/40 Emask 0x10 (ATA bus error)

[  407.369302] ata1.00: status: { DRDY }

[  407.369305] ata1: hard resetting link

[  407.679403] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

[  407.690316] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[  407.690322] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT0._GTF] (Node ffff8802160af208), AE_NOT_FOUND (20170303/psparse-543)

[  407.710284] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[  407.710290] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT0._GTF] (Node ffff8802160af208), AE_NOT_FOUND (20170303/psparse-543)

[  407.720350] ata1.00: configured for UDMA/133

[  407.720363] ata1: EH complete

...

[  883.456241] CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)

[  883.456243] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)

[  883.456246] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)

[  883.457245] CPU0: Core temperature/speed normal

[  883.457247] CPU1: Package temperature/speed normal

[  883.457249] CPU0: Package temperature/speed normal

[ 1183.689492] CPU0: Core temperature above threshold, cpu clock throttled (total events = 48376)

[ 1183.689494] CPU1: Package temperature above threshold, cpu clock throttled (total events = 48376)

[ 1183.689496] CPU0: Package temperature above threshold, cpu clock throttled (total events = 48376)

[ 1183.690530] CPU0: Core temperature/speed normal

[ 1183.690532] CPU0: Package temperature/speed normal

[ 1183.690543] CPU1: Package temperature/speed normal

[ 3869.784891] CPU0: Core temperature above threshold, cpu clock throttled (total events = 49284)

[ 3869.784893] CPU1: Package temperature above threshold, cpu clock throttled (total events = 49284)

[ 3869.784895] CPU0: Package temperature above threshold, cpu clock throttled (total events = 49284)

[ 3869.785917] CPU0: Core temperature/speed normal

[ 3869.785919] CPU1: Package temperature/speed normal

[ 3869.785921] CPU0: Package temperature/speed normal

```

----------

## eccerr0r

They look unrelated to each other. Your CPU probably really was overheating if you were running it under heavy load. I get those a lot during the summer; I may also need to really clean my heatsink/fan and get some fresh thermal interface material...
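
If you want to watch the throttling without scrolling through dmesg, the kernel also keeps per-CPU counters in sysfs. A rough sketch, assuming an Intel CPU and a kernel that exposes the `thermal_throttle` directories (the `throttle_counts` name and its optional directory argument are made up for this example):

```shell
# Print the core and package throttle counters for each CPU.
# The optional argument overrides the sysfs CPU directory (handy for testing).
throttle_counts() {
    cpuroot=${1:-/sys/devices/system/cpu}
    for dir in "$cpuroot"/cpu[0-9]*/thermal_throttle; do
        [ -d "$dir" ] || continue
        cpu=${dir%/thermal_throttle}
        cpu=${cpu##*/}
        # package_throttle_count may be missing on CPUs without a package sensor.
        core=$(cat "$dir/core_throttle_count" 2>/dev/null || echo n/a)
        pkg=$(cat "$dir/package_throttle_count" 2>/dev/null || echo n/a)
        echo "$cpu core=$core package=$pkg"
    done
}
```

Run `throttle_counts` before and after a big compile; if the numbers jump, the load is probably what's tripping the threshold.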

----------

## russK

I don't know what the ACPI issue is.

The CRC error with the disk is potentially bad, but the disk may have handled it gracefully.  It happened around 8 minutes before the CPU throttling messages.

It's a good idea to keep heatsinks free of dirt and fans running OK.  If the machine is in a warm place, help it out.

Heat can affect a hard drive too.  You may want to check the hard drive's smartctl info.  smartctl can also tell you the temperature.

https://wiki.gentoo.org/wiki/Smartmontools
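
For the temperature part, something like this should pull the reading out of the attribute table. It is only a sketch: `smart_temp` is a name invented here, and it assumes the drive reports attribute 194 (`Temperature_Celsius`) or 190 (`Airflow_Temperature_Cel`) in the usual `smartctl -A` table layout, which not every drive does.

```shell
# Read `smartctl -A` output on stdin and print "<attribute name> <raw value>"
# for the temperature attributes. Column 10 is RAW_VALUE in the standard table.
smart_temp() {
    awk '$2 == "Temperature_Celsius" || $2 == "Airflow_Temperature_Cel" { print $2, $10 }'
}

# Typical use, as root (the device name is only an example):
#   smartctl -A /dev/sda | smart_temp
```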

----------

## Section_8

I see those temperature messages sometimes, as my system runs a couple of boinc projects.  I wouldn't be surprised to see them if you're compiling some big package.

----------

## LIsLinuxIsSogood

The disk error is still happening today... Could I please get some help with these very foreign-looking messages? I do not need a detailed explanation, just a simple overview.

Thanks to those with suggestions, such as the steps I will be taking to set up SMART. In the meantime, what can I do to query the hardware to check it or find out more info?  I still don't understand how the device comes to be referred to as ata1 in the log output.  How does that correspond to the /dev/sdX naming of disks?  Does it refer to the entire disk or just a partition of the disk, and where can I find this information?

I can't make sense of the messages, except that "bus error" and "hard resetting link" don't seem good... I know that much, but that's all I know. Help!

```

[68437.088944] ata1.00: exception Emask 0x10 SAct 0x10 SErr 0x280100 action 0x6 frozen

[68437.088946] ata1.00: irq_stat 0x08000000, interface fatal error

[68437.088948] ata1: SError: { UnrecovData 10B8B BadCRC }

[68437.088950] ata1.00: failed command: READ FPDMA QUEUED

[68437.088954] ata1.00: cmd 60/00:20:80:00:de/01:00:03:00:00/40 tag 4 ncq dma 131072 in

                        res 40/00:20:80:00:de/00:00:03:00:00/40 Emask 0x10 (ATA bus error)

[68437.088955] ata1.00: status: { DRDY }

[68437.088959] ata1: hard resetting link

[68437.394033] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

[68437.404948] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[68437.404957] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT0._GTF] (Node ffff8802160af208), AE_NOT_FOUND (20170303/psparse-543)

[68437.424951] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20170303/psargs-364)

[68437.424961] ACPI Error: Method parse/execution failed [\_SB.PCI0.SAT0.SPT0._GTF] (Node ffff8802160af208), AE_NOT_FOUND (20170303/psparse-543)

[68437.434961] ata1.00: configured for UDMA/133

[68437.434977] ata1: EH complete
```

----------

## russK

I'm no expert but I'll take a stab at some of it ...

The errors are a repeat of, and similar to, those in the first post.

Both times, the initial messages happened within one second, as shown in the 2nd episode by the [68437.nnnnnn] timestamps.

Within about 15 microseconds of the first message (68437.088944 to 68437.088959), the driver decided to reset the link.  I suspect the drive was trying in earnest to read some unrecoverable data at 68437.088944, 68437.088946, 68437.088948 and 68437.088950, and the driver was either not prepared to hear the bad news or simply impatient.  The driver may have been upset about how long the drive took to respond.  Some of this is conjecture on my part.

Finally the driver decided to reset the link to get back to a known state.  At 68437.434977 the reset was complete and the drive was ready.
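
To check the timing for yourself rather than reading timestamps by eye, a small awk filter can measure each error-handling episode from the `exception` line to `EH complete`. This is a sketch: `eh_durations` is a made-up name, and it assumes the libata message wording seen in the logs in this thread.

```shell
# Read dmesg output on stdin and print "<port> <seconds>" for every
# episode that starts with "ataN.MM: exception" and ends with "ataN: EH complete".
eh_durations() {
    awk '
    # Pull the timestamp out of "[  407.369289]" and return it as a number.
    function stamp(line,    s) {
        if (!match(line, /\[ *[0-9]+\.[0-9]+\]/)) return -1
        s = substr(line, RSTART + 1, RLENGTH - 2)
        return s + 0
    }
    / ata[0-9]+\.[0-9]+: exception / {
        port = $0; sub(/^.*\] /, "", port); sub(/\..*$/, "", port)
        start[port] = stamp($0)
    }
    / ata[0-9]+: EH complete/ {
        port = $0; sub(/^.*\] /, "", port); sub(/:.*$/, "", port)
        if (port in start) {
            printf "%s %.6f\n", port, stamp($0) - start[port]
            delete start[port]
        }
    }'
}

# Typical use:
#   dmesg | eh_durations
```

Going by the timestamps quoted in this thread, both episodes took roughly 0.35 seconds from the first exception line to `EH complete`.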

I suspect there is some data on the disk that was not recorded properly.  Every time the drive tries to read the data, it retries over and over and then finally gives up.  During this period, the driver gets a little impatient and resets the link.

smartctl can tell you about the health of the drive.

There are tools and web pages about discovering bad blocks and getting them re-allocated.

ddrescue is a good tool for recovering data from a bad drive.

Your drive may or may not be in need of replacement.  I would use smartctl to see if the drive believes it is healthy.

Depending on how many drives you have, something like this might be useful:

```
# Run as root: dump the full SMART report for every /dev/sd? disk, paged through less.
for d in /dev/sd? ; do echo "========= DISPLAY INFO FOR $d ==================" && smartctl -a "$d" ; done | less

```
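
As for which /dev/sdX is behind ata1: as far as I know, ata1 is the SATA port and ata1.00 is the device on that port, so it means the whole drive, not a partition. On reasonably recent kernels the sysfs path of each disk includes the port name, so a sketch like this can do the lookup (`ata_to_disk` and its optional second argument are invented for the example):

```shell
# Print the disk(s) attached to a given libata port, e.g. "ata1".
# The second argument (the sysfs block directory) only exists to make the
# function testable; it defaults to /sys/block.
ata_to_disk() {
    port=$1
    blockdir=${2:-/sys/block}
    for link in "$blockdir"/sd*; do
        [ -e "$link" ] || continue
        # Each entry is a symlink whose resolved path includes the ataN port.
        case "$(readlink -f "$link")" in
            */"$port"/*) echo "/dev/${link##*/}" ;;
        esac
    done
}

# Typical use:
#   ata_to_disk ata1
```

The slashes around the port name in the pattern keep `ata1` from also matching `ata10`.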

----------

## russK

Also note, sometimes errors like this can be due to the cables or bad connections.
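
One hedged way to tell a cable problem from a media problem: many SATA drives keep a count of link CRC errors as SMART attribute 199 (`UDMA_CRC_Error_Count`). A sketch, assuming the drive reports that attribute under that name (`crc_errors` is a name made up here):

```shell
# Read `smartctl -A` output on stdin and print the raw UDMA_CRC_Error_Count.
# Prints nothing and returns non-zero if the drive does not report the attribute.
crc_errors() {
    awk '$2 == "UDMA_CRC_Error_Count" { print $10; found = 1 } END { exit !found }'
}

# Typical use, as root (the device name is only an example):
#   smartctl -A /dev/sda | crc_errors
```

If that number keeps climbing as new errors show up in dmesg, I'd reseat or swap the SATA cable before blaming the disk.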

----------

## Small_Penguin

You can also investigate with sys-apps/gsmartcontrol which has a more user-friendly interface.

----------

## bunder

```
[ 3869.784893] CPU1: Package temperature above threshold, cpu clock throttled (total events = 49284)

[ 3869.784895] CPU0: Package temperature above threshold, cpu clock throttled (total events = 49284)

[ 3869.785917] CPU0: Core temperature/speed normal

[ 3869.785919] CPU1: Package temperature/speed normal

[ 3869.785921] CPU0: Package temperature/speed normal
```

That's Intel Turbo Boost.  You can disable it (as root, with the intel_pstate driver in use) with

```
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
```

but you'll probably have to keep toggling it if you're on a laptop, since every time you plug or unplug the charger it gets reset back to 0 (on).
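
Since the setting may not stick, a pair of small helpers makes it easier to check and re-apply. This is just a sketch assuming the intel_pstate driver is in use; `turbo_status` and `set_turbo` are names invented here, and the optional file argument only exists so they can be tried against a scratch file.

```shell
# Report whether turbo is currently allowed (no_turbo: 0 = turbo on, 1 = turbo off).
turbo_status() {
    f=${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}
    [ -r "$f" ] || { echo "unknown (cannot read $f)"; return 1; }
    case "$(cat "$f")" in
        0) echo "turbo enabled" ;;
        1) echo "turbo disabled" ;;
        *) echo "unexpected value"; return 1 ;;
    esac
}

# Turn turbo on or off; needs root when writing to the real sysfs file.
set_turbo() {
    f=${2:-/sys/devices/system/cpu/intel_pstate/no_turbo}
    case "$1" in
        on)  echo 0 > "$f" ;;
        off) echo 1 > "$f" ;;
        *)   echo "usage: set_turbo on|off [file]" >&2; return 2 ;;
    esac
}
```

After plugging or unplugging, `turbo_status` shows whether it got flipped back, and `set_turbo off` puts it right again.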

----------

