# Laptop switches off due to ACPI thermal zone [solved]

## mv

Hello,

I have an old Gericom laptop with a probably broken BIOS and perhaps broken thermal sensors.

For ages, this laptop used to slow down after about 15 minutes, apparently because it reported overheating.

Removing the thermal module avoided this slowdown.

For the last several kernel versions there is no longer a slowdown; instead the laptop simply switches off (after about 15 minutes, or even earlier if it is hot), which makes it practically unusable.

I am fully aware that running the laptop possibly overheated may break it, but I am willing to take this risk (after all, it worked for years without the slowdown, and the thermal sensors probably report far too high a temperature). However, it seems that I am unable to configure the kernel to avoid this switching off.

This is what I tried: removing the thermal module does not help. After loading the thermal module, I see in /sys/devices/virtual/thermal/thermal_zone0/ the data that obviously causes the problem, in files like:

 *sys/devices/virtual/thermal/thermal_zone0/ wrote:*   

> trip_point_0_temp
> 
> trip_point_0_type # content: "critical"
> 
> trip_point_1_temp
> ...

 

I do not remember the content of trip_point_?_temp, but these numbers are rather low (much lower than the examples in the kernel sources), and unfortunately these files are not writable.
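For reference, the trip points can be dumped in one go with a small shell function. This is only a sketch: `list_trip_points` is a made-up helper name, and it assumes the usual sysfs layout in which each `trip_point_N_temp` file (millidegrees Celsius) has a matching `trip_point_N_type` file.

```shell
# Print index, type and temperature of every trip point in one thermal zone.
# Argument: path to a zone directory (hypothetical helper, not a standard tool).
list_trip_points() {
    zone=$1
    for temp_file in "$zone"/trip_point_*_temp; do
        [ -r "$temp_file" ] || continue       # glob matched nothing readable
        n=${temp_file##*/trip_point_}         # strip directory and prefix
        n=${n%_temp}                          # strip suffix, leaving the index
        kind=$(cat "$zone/trip_point_${n}_type" 2>/dev/null || echo unknown)
        millideg=$(cat "$temp_file")
        echo "trip $n: $kind at $((millideg / 1000)) C"
    done
}
```

Calling `list_trip_points /sys/devices/virtual/thermal/thermal_zone0` would print one line per trip point, which makes it easy to note the values down before the machine switches off.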

This is what is happening when I switch on the cool laptop: The fan is active, but nevertheless temp is increasing. When it reaches trip_point_1_temp after about 2-3 minutes, the fan is switched off (I guess the opposite should be happening, but as I said, the BIOS is probably rather broken. BTW: the cooling_device0 (=fan) entry shows in all cases that the fan is switched on).

The temp continues to rise until it reaches trip_point_0_temp, and then the machine switches off.
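To see how quickly the temperature approaches the shutdown threshold, the current reading can be compared with the critical trip point. A minimal sketch, assuming the zone exposes a `temp` file in millidegrees Celsius and that trip point 0 is the critical one, as described above; `critical_headroom` is a hypothetical helper.

```shell
# Print how far the current temperature is below the critical trip point.
# Assumes trip point 0 is the critical one; values are millidegrees Celsius.
critical_headroom() {
    zone=$1
    now=$(cat "$zone/temp") || return 1
    crit=$(cat "$zone/trip_point_0_temp") || return 1
    # Integer division: the result is truncated to whole degrees.
    echo "$(( (crit - now) / 1000 )) C below critical"
}
```

Run in a loop, for example `while sleep 5; do critical_headroom /sys/devices/virtual/thermal/thermal_zone0; done`, this would show whether the reading climbs steadily or jumps; appending the output to a file and calling `sync` after each line should keep the last reading across the power-off.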

The bad thing is that the only writable file is "mode" with content "enabled", but even if I switch it to "disabled", nothing changes.

It seems that the only reasonable thing I might do is to change the "critical" behaviour of trip_point_0_type, but as mentioned above these files are not writable. On the other hand, this cannot be purely built into the BIOS, because with previous kernels it led only to a slowdown (and by removing the kernel module the action could be suppressed completely). So I guess that I would have to patch the kernel somehow, but it is not even roughly clear to me which kernel code is responsible for the switching off: after all, the switching off also happens if the thermal module is removed.
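One thing that might be worth checking before patching: the kernel's parameter documentation lists options for the ACPI thermal driver such as `thermal.nocrt` (take no action on critical trip points) and `thermal.crt` (override the critical temperature). Whether they have any effect here is doubtful, since the shutdown also happens with the module removed. The sketch below only prints whatever parameters are exposed; the default path `/sys/module/thermal/parameters` is an assumption about where they appear, and `show_module_params` is a hypothetical helper.

```shell
# Print name=value for every readable module parameter in a directory.
# The default path is an assumption; not every parameter is exported to sysfs.
show_module_params() {
    dir=${1:-/sys/module/thermal/parameters}
    for f in "$dir"/*; do
        [ -r "$f" ] || continue               # directory missing or empty
        echo "${f##*/}=$(cat "$f")"
    done
}
```

To try one of the options, something like `thermal.nocrt=1` would be appended to the kernel command line in the boot loader.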

I know, everything would be a hack, but OTOH, everything (including death of laptop by overheating) is better than having a practically unusable laptop.

Last edited by mv on Thu Aug 15, 2013 12:33 am; edited 1 time in total

----------

## eccerr0r

The emergency shutdown could also be triggered by hardware as a failsafe for software ignoring the error condition.  What happens if you go back to the old kernel?

I'd personally check to make sure the heat problem isn't a real problem like heatsink paste cracking or poorly installed/broken heatsinks.

----------

## mv

 *eccerr0r wrote:*   

> The emergency shutdown could also be triggered by hardware as a failsafe for software ignoring the error condition.

 

Yes, that's what I thought, too, but then why didn't it happen previously? The time it runs before switching off is almost exactly the time after which previous kernels wanted to slow down the processor (which I suppressed).

 *Quote:*   

> What happens if you go back to the old kernel?

 

Unfortunately, I no longer have such an old kernel - I do not even know when it started, since I switched the laptop on only very briefly in recent months. The oldest kernel I have is 3.8.something, which already shows the problem. Of course, I might go by trial-and-error, but since I do not expect that this solves my problem...

 *Quote:*   

> I'd personally check to make sure the heat problem isn't a real problem like heatsink paste cracking or poorly installed/broken heatsinks.

 

It is not possible to open anything in this laptop, and it does not feel hotter than it did previously. I used a vacuum cleaner just to be sure, and I cool it from outside with a fan, but it does not change anything. The trigger temperature is just too low.

----------

## PaulBredbury

Hack the DSDT, remove/alter the trip-point, then compile a new kernel to use your amended DSDT.

Some commands to point you in the right direction:

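```
emerge sys-power/iasl

# Dump the DSDT currently provided by the BIOS
cat /sys/firmware/acpi/tables/DSDT > dsdt.dat

# Disassemble to dsdt.dsl, then edit the trip point in dsdt.dsl
iasl -d dsdt.dat

# To compile to hex (writes dsdt.hex)
iasl -tc dsdt.dsl

cp dsdt.hex /lib/firmware/
```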

The kernel must be recompiled to incorporate the contents of /lib/firmware/dsdt.hex; simply updating /lib/firmware/dsdt.hex and rebooting is not enough.

Edit: Also, kernel config:

```
$ zgrep DSDT /proc/config.gz 

CONFIG_ACPI_CUSTOM_DSDT_FILE="/lib/firmware/dsdt.hex"

CONFIG_ACPI_CUSTOM_DSDT=y
```
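A quick way to confirm that the running kernel was actually built with the custom table is to extract the configured path from its config. A sketch, assuming `/proc/config.gz` is available (it requires CONFIG_IKCONFIG_PROC); `custom_dsdt_path` is a hypothetical helper that reads config text on stdin.

```shell
# Read kernel config text on stdin and print the custom DSDT path, if any.
# Prints nothing when CONFIG_ACPI_CUSTOM_DSDT_FILE is absent.
custom_dsdt_path() {
    sed -n 's/^CONFIG_ACPI_CUSTOM_DSDT_FILE="\(.*\)"$/\1/p'
}
```

Usage would be `zcat /proc/config.gz | custom_dsdt_path`; empty output means the kernel that is currently running was built without a custom DSDT file.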

----------

## eccerr0r

Wait - will the machine shut down if the ACPI interpreter in the kernel isn't enabled (thus DSDT isn't used)?

----------

## mv

 *eccerr0r wrote:*   

> Wait - will the machine shut down if the ACPI interpreter in the kernel isn't enabled (thus DSDT isn't used)?

 

How do I disable the ACPI interpreter? By deselecting ACPI completely in the kernel config? I tried this, and the laptop still switches off.
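Apart from deselecting ACPI in the kernel config, the interpreter can also be turned off at boot with the `acpi=off` kernel parameter. A small sketch to confirm how a given boot was started; `cmdline_has` is a hypothetical helper, and its optional second argument exists only so that a file other than `/proc/cmdline` can be checked.

```shell
# Succeed if the kernel command line contains exactly the given parameter.
# $1: parameter to look for, $2: command-line file (default /proc/cmdline).
cmdline_has() {
    # One parameter per line, then match a whole line as a fixed string.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `cmdline_has acpi=off && echo "ACPI disabled at boot"`.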

----------

## PaulBredbury

 *mv wrote:*   

> when I switch on the cool laptop: The fan is active, but nevertheless temp is increasing. When it reaches trip_point_1_temp after about 2-3 minutes, the fan is switched off (I guess the opposite should be happening

 

LOL, yeah. Maybe your CPU needs its thermal grease re-applied - that stuff is absolutely critical, along with a good connection to the cooler, otherwise the CPU can critically overheat in literally seconds, and then the CPU itself might be telling the motherboard to shut down before melting.

Also, use the Linux tools I mentioned, to get the dsdt.dsl file for your BIOS, and take a look for obvious brokenness. Also of interest is different behaviour when Linux is detected, or indeed even different versions of Windows. E.g.:

```
If (_OSI ("Windows 2001 SP1"))
```
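To get an overview of every operating system string the table tests for, the disassembled source can be searched for `_OSI` calls. A sketch, assuming the `dsdt.dsl` produced by `iasl -d`; `list_osi_strings` is a hypothetical helper.

```shell
# Print each distinct string passed to _OSI in a disassembled DSDT.
# $1: path to the .dsl file.
list_osi_strings() {
    grep -o '_OSI *("[^"]*")' "$1" |          # keep only the _OSI ("...") calls
        sed 's/.*("\(.*\)")/\1/' |            # reduce each call to its string
        sort -u
}
```

Running `list_osi_strings dsdt.dsl` with empty output simply means the table contains no `_OSI` checks.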

----------

## mv

 *PaulBredbury wrote:*   

> Maybe your CPU needs its thermal grease re-applied - that stuff is absolutely critical, along with a good connection to the cooler, otherwise the CPU can critically overheat in literally seconds, and then the CPU itself might be telling the motherboard to shut down before melting.

 

Since switching ACPI off did not solve the problem, I now guess that I cannot solve it from the software side.

 *Quote:*   

> Also, use the Linux tools I mentioned, to get the dsdt.dsl file for your BIOS, and take a look for obvious brokenness.

 

I did not understand much of the output and did not even find the values for the trip points (although I found a section called "thermal device"). There are no strange "if" conditions for the system - probably the laptop is too old for this.
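In case it helps with finding the values: in ACPI the trip points of a thermal zone are normally returned by methods named `_CRT` (critical), `_HOT`, `_PSV` (passive) and `_ACx` (active), and temperatures are given in tenths of a Kelvin, so they do not look like Celsius values at first sight. A sketch under those assumptions; both helper names are made up, and the conversion assumes a temperature above freezing.

```shell
# List lines in a disassembled DSDT that mention a thermal zone or trip point.
find_thermal_objects() {
    grep -n -E 'ThermalZone|_CRT|_HOT|_PSV|_AC[0-9]|_TMP' "$1"
}

# Convert an ACPI temperature (tenths of a Kelvin, decimal or 0x hex)
# to degrees Celsius with one decimal place.
decikelvin_to_celsius() {
    tenths=$(( $1 - 2732 ))                   # 273.2 K corresponds to 0 C
    echo "$((tenths / 10)).$((tenths % 10))"
}
```

For example, a method returning `0x0E94` would mean `decikelvin_to_celsius 0x0E94`, which prints `100.0`.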

----------

## eccerr0r

I would definitely take the machine apart and inspect the thermal interface from the CPU to its heatsink. I'm not saying it's easy to disassemble the laptop, but I've taken many laptops apart and worked with the heatsink, and yes, there tend to be a lot of hidden screws everywhere... Look for a service manual if possible.

I guess this will be a semi-plug for at least some HP laptops, there are service manuals available that describe how to open them... I'm sure other manufacturers have them too, it's worth checking.  High tech high stakes jigsaw puzzle?

Oh, and make sure you remember where each screw came from...

----------

## mv

I have now managed to open the laptop by complete induction from the opposite end. After using a vacuum cleaner once more and keeping it open, it has now been running for 2 hours, so I guess it really is a hardware problem.

----------

