# [TIP] stopping the box on dead fan or overheat

## doublehp

Since I was born ... I have wanted my desktop to shut down automatically in any of these cases:

- UPS has low battery

- CPU overheat

- any fan is dead

Low UPS battery is the trivial case: I want to sync disks before the box crashes :)

CPU overheat is also an obvious case, but few people implement it. You just need ACPI or sensors monitoring.
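For the ACPI route, a minimal sketch could read the kernel's thermal zone directly. This assumes a Linux box that exposes `/sys/class/thermal/thermal_zone0/temp` in millidegrees Celsius; the zone number and the helper name `read_temp_c` are illustrative assumptions, not something every machine provides.

```shell
#!/bin/bash
# Convert a sysfs thermal reading (millidegrees Celsius) to whole degrees.
# The default path is an assumption: zone numbering differs between machines.
read_temp_c() {
        file="${1:-/sys/class/thermal/thermal_zone0/temp}"
        milli=$(cat "$file" 2>/dev/null) || return 1
        [ -n "$milli" ] || return 1
        echo $(( milli / 1000 ))
}

# Example: warn when the first thermal zone is above 70 degrees.
if t=$(read_temp_c) && [ "$t" -gt 70 ]
then
        echo "CPU is hot: ${t} C"
fi
```

If the file is missing, the function returns non-zero and the example stays silent, so it is safe to try on any box.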

Dead fan is a bit more tricky. A standard system can rarely monitor more than one or two fans. No standard desktop can monitor the most important fan: the PSU fan!!!

Many system monitoring daemons can do all this for you: monitor probes, send emails when WARNING events occur. Fewer can take real action, like ... actually shutting the system down. After spending two days with MRTG, Nagios, Munin, and Cacti ... I ended up with nice web pages generated by MRTG and Munin, but none of them could perform actions. So I now have to perform the actions myself.

The PSU was the most important problem for me. Most of us can configure the BIOS or sensord to track the CPU fan, but the PSU fan gives no feedback at all. You can track a CPU fan problem indirectly, by comparing the CPU temperature against idle time and frequency. But there is no direct way to probe the PSU fan. When this fan dies, the PSU overheats, burns, and *may* inject *any* voltage into your computer ... including 400V directly to the CPU or the HDD. This is something you NEVER want to happen.

So, I had to change the hardware: buy a new 3-wire fan, remove the PSU fan, fit the new one, and plug it into the motherboard. My Antec PSU adapts the fan speed to the required power and the temperature, so I had to keep powering the fan from the PSU and plug only the speed feedback wire into the MB.

Once this is done, we can write the software. First configure upsd and sensors; then this script will shut down the box when a fan stops or a temperature stays above 70 for about 4 minutes (sooner above 80, immediately above 90):

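```
#!/bin/bash
# Halt the box on low UPS battery, a stopped fan, or an overheating sensor.
# Needs a working NUT (upsc) and lm_sensors (sensors) setup.

# Print one variable of the UPS "mgeups_a@leon", e.g. battery.charge.
ups_var() {
        upsc mgeups_a@leon | grep "^$1:" | cut -d ' ' -f2
}

C=0     # number of consecutive bad checks
H=0     # set to 1 when the current check found a problem

while /bin/true
do
        # On battery (no bypass voltage): warn below 48 % charge, halt below 40 %.
        if [ "$(ups_var input.bypass.voltage)" = "0.0" ]
        then
                charge=$(ups_var battery.charge)
                charge=${charge%%.*}    # keep the integer part only
                if [ -n "$charge" ] && [ "$charge" -lt 48 ]
                then
                        /usr/bin/wall "!!! Shutting down _ $(hostname) _ SOON !!!"
                fi
                if [ -n "$charge" ] && [ "$charge" -lt 40 ]
                then
                        /sbin/halt
                fi
        fi

        H=0

        # Any fan below 200 RPM counts as dead.
        for i in $(/usr/bin/sensors | grep -i RPM | cut -d ":" -f2 | cut -d "(" -f1 | sed 's/RPM//' | sed 's/ //g')
        do
                if [ "$i" -lt 200 ]
                then
#                       echo $i less than 200
                        /bin/echo "Some fan is too slow ..." | /usr/bin/logger
                        /usr/bin/wall "Some fan is too slow ..."
                        /usr/bin/sensors | /usr/bin/logger
                        H=1
                fi
        done

        # Temperatures, reduced to whole degrees ("+45.5°C" becomes "45").
        for i in $(/usr/bin/sensors | grep -i temp | cut -d ":" -f2 | cut -d "(" -f1 | sed 's/[^0-9.-]//g; s/\..*//')
        do
                if [ "$i" -gt 70 ]
                then
                        /bin/echo "Some temp is too hot ..." | /usr/bin/logger
                        /usr/bin/wall "Some temp is too hot ..."
                        /usr/bin/sensors | /usr/bin/logger
                        H=1
                fi
                if [ "$i" -gt 80 ]
                then
                        # Skip the grace period: the counter check below halts in this cycle.
                        H=1
                        C=99
                fi
                if [ "$i" -gt 90 ]
                then
                        # Critical: power off right away.
                        /sbin/halt -hp
                fi
        done

        if [ "$H" -eq 1 ]
        then
                echo HALT ME
                C=$((C+1))
                # Halt once the problem has been seen on more than 3 checks in a row.
                if [ "$C" -gt 3 ]
                then
#                       echo DOING HALT NOW
                        /bin/echo "Fan too slow or temp too hot ... HALTING NOW" | /usr/bin/logger
                        /usr/bin/wall "Fan too slow or temp too hot ... HALTING NOW"
                        /sbin/halt
                fi
        else
                C=0
        fi

#       echo "h=$H c=$C"
        sleep 60
done
```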

Start this script from inittab, and you are done :) Feel free to make the loop faster, or lower the thresholds.
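As a rough sketch, the inittab entry could look like the line below. This assumes sysvinit, and that the script was saved as `/usr/local/sbin/fanwatch.sh`; both that path and the `fw` id are hypothetical, so pick your own.

```
# /etc/inittab: id:runlevels:action:process
# "respawn" makes init restart the watchdog if it ever dies.
fw:2345:respawn:/usr/local/sbin/fanwatch.sh
```

After editing `/etc/inittab`, `telinit q` makes init re-read the file without a reboot.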

While we are at it: monitoring daemons remain useful for one thing: keeping an eye on the CPU temperature. If your CPU temp increases over time while the system temp stays the same, it means your heatsink is getting dusty! => clean it!!!
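If you do not want a full monitoring daemon just for that, here is a small sketch that logs the gap between the two temperatures. The labels "CPU Temp" and "Sys Temp" are assumptions (check what `sensors` prints for your chip), and the helper names `sensor_temp` and `temp_gap` and the log path are made up for this example.

```shell
#!/bin/bash
# Print the whole-degree reading of one labelled line of `sensors` output.
# Reads the sensors text from stdin; $1 is the label without the colon.
sensor_temp() {
        awk -F: -v label="$1" 'tolower($1) == tolower(label) && !seen++ { print $2 }' |
                cut -d "(" -f1 | sed 's/[^0-9.-]//g; s/\..*//'
}

# Print "CPU minus system" in degrees; fails if either label is missing.
temp_gap() {
        out=$(cat)
        cpu=$(printf '%s\n' "$out" | sensor_temp "$1")
        sys=$(printf '%s\n' "$out" | sensor_temp "$2")
        [ -n "$cpu" ] && [ -n "$sys" ] || return 1
        echo $(( cpu - sys ))
}

# Example (e.g. from a daily cron job): append a dated sample to a log.
# sensors | temp_gap "CPU Temp" "Sys Temp" | sed "s/^/$(date +%F) /" >> /var/log/temp-gap.log
```

A gap that slowly grows over the weeks, at comparable load, is the dusty heatsink signal described above.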

----------

## Mad Merlin

FWIW, apcupsd can already monitor your (APC) UPS's battery life and automagically shut down one or more computers as appropriate.

----------

