# What is this log entry: "Machine check events logged"?

## snIP3r

Hi all!

Recently I checked my /var/log/messages and found these entries:

```
Jan  1 17:48:58 area52 Machine check events logged
Jan  1 17:55:13 area52 Machine check events logged
Jan  1 18:01:19 area52 Machine check events logged
Jan  1 18:15:41 area52 Machine check events logged
Jan  1 18:21:56 area52 Machine check events logged
Jan  1 18:29:26 area52 Machine check events logged
Jan  1 18:38:10 area52 Machine check events logged
Jan  1 18:45:40 area52 Machine check events logged
Jan  1 18:53:10 area52 Machine check events logged
Jan  1 18:58:47 area52 Machine check events logged
Jan  1 19:05:02 area52 Machine check events logged
```
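Since these entries arrive every few minutes, measuring the spacing between them can help when correlating them with other activity on the box. A minimal Python sketch, assuming the classic syslog line format shown above (`Mon DD HH:MM:SS host message`). Because syslog omits the year, the sketch assumes a fixed leap year, so logs that cross New Year would be mis-ordered. The helper names `mce_times` and `gaps_seconds` are hypothetical.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Jan  1 17:48:58 area52 <message>" (no year).
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) \S+ (?P<msg>.*)$"
)
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def mce_times(lines, needle="Machine check events logged"):
    """Datetimes of the syslog lines whose message contains `needle`.

    syslog omits the year, so the leap year 2000 is assumed (keeps "Feb 29"
    valid); logs that span New Year will therefore be ordered incorrectly.
    """
    times = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m and m.group("mon") in MONTHS and needle in m.group("msg"):
            times.append(datetime(2000, MONTHS[m.group("mon")], int(m.group("day")),
                                  int(m.group("h")), int(m.group("m")), int(m.group("s"))))
    return times


def gaps_seconds(times):
    """Whole seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    sample = [
        "Jan  1 17:48:58 area52 Machine check events logged",
        "Jan  1 17:55:13 area52 Machine check events logged",
        "Jan  1 18:01:19 area52 Machine check events logged",
        "Jan  1 18:15:41 area52 Machine check events logged",
    ]
    for gap in gaps_seconds(mce_times(sample)):
        print(f"{gap // 60} min {gap % 60:02d} s")
```

For the eleven entries above, the gaps fall between roughly 5.5 and 9 minutes, with one gap of about 14 minutes.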

There are no other entries like these, neither before nor after. I don't know what they mean or what produced them. I also checked my emerge log: I haven't installed any package at or before that time that could produce these messages.

Can someone explain what this means? Do I have to worry?

thx in advance

snIP3r

----------

## bunder

Those come from the kernel's machine check exception (MCE) support.

http://en.wikipedia.org/wiki/Machine_Check_Exception

 *Quote:*   

> MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
> 
> Bank 1: 9400000000000151
> 
> MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
> ...

 

Not sure why yours doesn't give more detail than the one I pasted above, though. I'd check for overheating or RAM errors.
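For the overheating check, here is a Python sketch that reads the Linux hwmon sysfs interface, where each `temp*_input` file holds a temperature in millidegrees Celsius. It assumes a sensor driver is loaded (for example `k8temp` on AMD K8 CPUs); on some older kernels the files sit one level down under `hwmonN/device/`, so both locations are scanned. `read_hwmon_temps` is a hypothetical helper; the `sensors` tool from lm-sensors shows the same data more conveniently.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip, sensor, celsius) from the hwmon sysfs interface.

    Each temp*_input file holds a value in millidegrees Celsius. Older kernels
    keep the files under hwmonN/device/, so both locations are scanned.
    An empty list means no readable sensors (or no sysfs at all).
    """
    readings = []
    patterns = [os.path.join(root, "hwmon*", "temp*_input"),
                os.path.join(root, "hwmon*", "device", "temp*_input")]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            sensor_dir = os.path.dirname(path)
            try:
                with open(os.path.join(sensor_dir, "name")) as f:
                    chip = f.read().strip()
            except OSError:
                chip = sensor_dir  # no name file: fall back to the directory path
            try:
                with open(path) as f:
                    millideg = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor missing or not currently readable
            sensor = os.path.basename(path)[: -len("_input")]
            readings.append((chip, sensor, millideg / 1000.0))
    return readings


if __name__ == "__main__":
    for chip, sensor, celsius in read_hwmon_temps():
        print(f"{chip:>12} {sensor:<8} {celsius:5.1f} °C")
```

Logging this periodically (e.g. from cron) alongside the MCE messages would show whether the errors line up with temperature spikes.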

cheers

----------

## snIP3r

Is this because I don't have the mcelog package installed?

----------

## snIP3r

OK, I installed mcelog and found numerous errors like this one:

```
MCE 31
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache TSC 1c39af3c77c74b
ADDR 4e74af80
  Data cache ECC error (syndrome 97)
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  memory/cache error 'data read mem transaction, data transaction, level 2'
STATUS d44bc00000000136 MCGSTATUS 0
```
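The STATUS value in that output can be decoded by hand to see what the hardware is actually claiming. This minimal Python sketch assumes the architectural MCi_STATUS layout: bits 63 to 57 are flags, and bits 15 to 0 are the MCA error code, which for memory-hierarchy errors has the form `000F 0001 RRRR TTLL`. Bit 46 is model-specific. It is read as "corrected ECC" here only because mcelog labelled it that way above. `decode_status` is a hypothetical helper, not part of mcelog.

```python
# Architectural MCi_STATUS flag bits (same positions on Intel and AMD CPUs).
STATUS_FLAGS = {
    63: ("VAL", "register contains a valid error"),
    62: ("OVER", "overflow: another error arrived before this one was read"),
    61: ("UC", "error was NOT corrected"),
    60: ("EN", "error reporting was enabled"),
    59: ("MISCV", "MCi_MISC holds extra information"),
    58: ("ADDRV", "MCi_ADDR holds the error address"),
    57: ("PCC", "processor context may be corrupt"),
}

# Fields of a memory-hierarchy error code, 000F 0001 RRRR TTLL (F = filter bit).
REQUEST = {0x0: "generic", 0x1: "read", 0x2: "write", 0x3: "data read",
           0x4: "data write", 0x5: "instruction fetch", 0x6: "prefetch",
           0x7: "eviction", 0x8: "snoop"}
TRANSACTION = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
LEVEL = {0b00: "level 0", 0b01: "level 1", 0b10: "level 2", 0b11: "generic level"}


def decode_status(status):
    """Return (flag names, error-code description, bit-46 flag) for a STATUS value."""
    flags = [name for bit, (name, _) in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # bits 15..0: MCA error code
    if (code & 0xEF00) == 0x0100:  # memory hierarchy error, ignoring the F bit
        request = REQUEST.get((code >> 4) & 0xF, "reserved")
        transaction = TRANSACTION.get((code >> 2) & 0x3, "reserved")
        level = LEVEL[code & 0x3]
        detail = f"memory hierarchy: {request}, {transaction} transaction, {level}"
    else:
        detail = f"error code 0x{code:04x} (other format, not decoded here)"
    # Model-specific, NOT architectural: mcelog labelled bit 46 on this AMD CPU
    # as "corrected ecc error"; other CPU families may use bit 46 differently.
    bit46 = bool((status >> 46) & 1)
    return flags, detail, bit46


if __name__ == "__main__":
    flags, detail, bit46 = decode_status(0xD44BC00000000136)
    for bit, (name, meaning) in STATUS_FLAGS.items():
        print(f"{name:<6} {'set' if name in flags else 'clear':<6} {meaning}")
    print(detail)
    print("bit 46 (corrected ECC, per mcelog above):", bit46)
```

For `d44bc00000000136` this reports VAL, OVER, EN and ADDRV set with UC and PCC clear. That means the error was corrected, at least one more error overflowed the register, and the failing access was a level 2 data read. This agrees with mcelog's own decoding. Which component is ultimately at fault still needs the kind of investigation discussed in this thread.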

Does this mean that my CPU is broken? Or my RAM? I haven't noticed any problems with the CPU or RAM; everything runs fine so far. I have a _very_ stable system, no lockups so far. It's a pity that no timestamp is printed, so I cannot determine exactly when the errors occurred.

Perhaps someone can give me some advice!

EDIT: Something strange is happening: now these messages are accumulating. In the last 45 minutes I got 7 messages!?

EDIT2: I did a little research on my Gentoo box and found that the load was increasing steadily while no one was doing anything on it. Then I saw a bash process using 100% CPU, and the CPU temperature had gone up as well. After I killed this bash process, everything went back to normal, and the messages disappeared too. For 1 hour I got no more "Machine check events logged" messages ;)

But nevertheless, I still want to know: do these messages mean a hardware error?

greets

snIP3r

----------

## bunder

 *Quote:*   

> but nevertheless i still want to know if these messages mean an hardware error? 

 

Yep. If you read the Wikipedia link I pasted, it lists many possible causes of these errors.

cheers

----------

## snIP3r

Hi again!

I tried to reproduce the error messages by compiling the kernel with MAKEOPTS="-j3" while, in parallel, unraring a big file on an encrypted filesystem (overall I got a load of 3, with core0 at 30° and core1 at 45°), and no error messages were displayed. I also checked the logs again and noticed that the messages appear between 03:00 and 06:30 in the morning. That is when my cron.daily scripts are executed:

```
area52 cron.daily # ls -la
total 32
drwxr-x---  2 root root  115 Jan  8 16:29 .
drwxr-xr-x 67 root root 8192 Jan  8 17:24 ..
-rw-r--r--  1 root root    0 Apr 17  2007 .keep
-rw-r--r--  1 root root    0 Sep 28 21:15 .keep_sys-process_cronbase-0
-r-xr-xr-x  1 root root 7386 Jul  5  2007 dccd
-rwxr-xr-x  1 root root   52 Jul  5  2007 logrotate.cron
-rwxr-xr-x  1 root root  115 Jul  5  2007 makewhatis
-rwxr-xr-x  1 root root  121 Jan  8 16:29 mcelog
```

But looking at the Munin and HotSaNIC data, I cannot see any stress on the CPU (no load, no higher CPU usage or temperature), so I don't know which script triggers the messages. I also ran every script manually, and none of them produces the messages, so I'm back to square one.

And again: the machine runs normally, and everything seems to be OK.

greets

snIP3r

----------

## snIP3r

Hi all!

I still hope someone can help me with this. I noticed that the issue has something to do with the cron jobs running on my system. Today I changed the hour at which the cron.daily jobs run, and about 10 minutes after the cron.daily jobs started, the messages appeared in my log.

But executing the scripts manually still does not trigger the messages, so I don't know what else might be causing them.
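One way to check the "about 10 minutes after cron.daily starts" pattern across several days is to measure, for each day, how long after the cron start the first entry appears. The Python sketch below does that, assuming syslog lines in chronological order and in the format shown at the top of this thread. `first_delay_per_day` and the `03:10` start time are placeholders, not values taken from this system.

```python
import re

# Classic syslog prefix, e.g. "Jan  8 03:21:30 area52 <message>" (no year).
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) \S+ (?P<msg>.*)$"
)


def first_delay_per_day(lines, cron_start, needle="Machine check events logged"):
    """Minutes from cron_start ("HH:MM") to the first matching entry of each day.

    Returns {"Mon D": minutes}. Lines must be in chronological order; entries
    before cron_start are ignored, and days with no later entry are omitted.
    """
    start_h, start_m = (int(x) for x in cron_start.split(":"))
    start = start_h * 3600 + start_m * 60  # seconds since midnight
    delays = {}
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m or needle not in m.group("msg"):
            continue
        day = f"{m.group('mon')} {int(m.group('day'))}"
        t = int(m.group("h")) * 3600 + int(m.group("m")) * 60 + int(m.group("s"))
        if t >= start and day not in delays:
            delays[day] = (t - start) / 60
    return delays


if __name__ == "__main__":
    sample = [  # hypothetical entries, for illustration only
        "Jan  8 03:21:30 area52 Machine check events logged",
        "Jan  9 03:19:00 area52 Machine check events logged",
    ]
    for day, minutes in first_delay_per_day(sample, "03:10").items():
        print(f"{day}: first entry {minutes:.1f} min after cron start")
```

Against a real log this would be called as `first_delay_per_day(open("/var/log/messages", errors="replace"), "03:10")`, substituting the actual cron.daily start time. A consistent delay points at one specific long-running job rather than at random load.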

Can anyone help?

btw: calling AMD technical support didn't bring a solution either :(

greets

snIP3r

----------

