# [SOLVED] Comm: swapper/1 Not tainted

## mcbarlo

Several times per day I notice this in logs:

```
May  9 09:46:30 bgp kernel: NMI backtrace for cpu 1
May  9 09:46:30 bgp kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.12.13-gentoo #5
May  9 09:46:30 bgp kernel: Hardware name: IBM IBM System X3250 M4 -[2583E1G]-/00D3729, BIOS -[JQE142CUS-1.01]- 05/14/2012
May  9 09:46:30 bgp kernel: task: ffff88007eb73d50 ti: ffff88007eb9e000 task.ti: ffff88007eb9e000
May  9 09:46:30 bgp kernel: RIP: 0010:[<ffffffff812ab127>]  [<ffffffff812ab127>] intel_idle+0xc7/0x130
May  9 09:46:30 bgp kernel: RSP: 0018:ffff88007eb9fdf8  EFLAGS: 00000046
May  9 09:46:30 bgp kernel: RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
May  9 09:46:30 bgp kernel: RDX: 0000000000000000 RSI: ffff88007eb9ffd8 RDI: 0000000000000001
May  9 09:46:30 bgp kernel: RBP: ffff88007eb9fe28 R08: 0000000000001fd5 R09: 0000000000000018
May  9 09:46:30 bgp kernel: R10: 000000000000351f R11: 0000000000008d59 R12: 0000000000000004
May  9 09:46:30 bgp kernel: R13: 0000000000000020 R14: 0000000000000003 R15: ffffffff817ac678
May  9 09:46:30 bgp kernel: FS:  0000000000000000(0000) GS:ffff88007ee80000(0000) knlGS:0000000000000000
May  9 09:46:30 bgp kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 09:46:30 bgp kernel: CR2: 00007fe831f07030 CR3: 000000000176d000 CR4: 00000000000407e0
May  9 09:46:30 bgp kernel: Stack:
May  9 09:46:30 bgp kernel: ffff88007eb9fe28 000000018107cedd ffff88007ee98300 ffffffff817ac500
May  9 09:46:30 bgp kernel: 00002a71dc2520e1 0000000000000004 ffff88007eb9fe88 ffffffff8142991a
May  9 09:46:30 bgp kernel: 000000000000001f 0000000002333743 000000000000001f 0000000002333743
May  9 09:46:30 bgp kernel: Call Trace:
May  9 09:46:30 bgp kernel: [<ffffffff8142991a>] cpuidle_enter_state+0x4a/0xd0
May  9 09:46:30 bgp kernel: [<ffffffff81429a3e>] cpuidle_idle_call+0x9e/0x150
May  9 09:46:30 bgp kernel: [<ffffffff8100a409>] arch_cpu_idle+0x9/0x20
May  9 09:46:30 bgp kernel: [<ffffffff81076551>] cpu_startup_entry+0x91/0x170
May  9 09:46:30 bgp kernel: [<ffffffff81028afa>] start_secondary+0x19a/0x1f0
May  9 09:46:30 bgp kernel: Code: 48 8b 34 25 b0 b7 00 00 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e8 0f 01 c9 <85> 1d 6b 17 50 00 75 0e 48 8d 75 dc bf 05 00 00 00 e8 03 80 dd
May  9 09:46:30 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:30 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:30 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:31 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:31 bgp bird: xxx: Received: Hold timer expired
May  9 09:46:31 bgp bird: xxx: Received: Hold timer expired
```

At that moment the load is very high, although the CPU, disks, etc. are doing nothing. The worst part is that the BGP sessions disconnect. Could it be a hardware problem?

Last edited by mcbarlo on Wed May 14, 2014 10:07 am; edited 1 time in total

----------

## blu3bird

 *mcbarlo wrote:*   

> ```
> May  9 09:46:30 bgp kernel: NMI backtrace for cpu 1
> ```
> ...

 

Do you know what's causing the NMI (http://en.wikipedia.org/wiki/Non-maskable_interrupt)? Does your server have some sort of IML?
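
One way to narrow this down might be to watch the per-CPU NMI counters in `/proc/interrupts` and compare them before and after an incident. A minimal sketch, assuming the usual `NMI:` row layout of that file; the helper name `nmi_counts` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/interrupts-style text on stdin and print
# one "cpuN count" line per CPU column of the "NMI:" row.
nmi_counts() {
    awk '$1 == "NMI:" {
        # Numeric fields after the label are the per-CPU counters;
        # stop at the first non-numeric field (the description text).
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            printf "cpu%d %s\n", i - 2, $i
    }'
}

# Usage:
#   nmi_counts < /proc/interrupts
```

If the counters jump right when the backtrace shows up in the log, that at least confirms the timing; it does not tell you who raised the NMI.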

----------

## mcbarlo

Unfortunately, I don't know.

----------

## aCOSwt

Hmmm... the hardware watchdog timer, I presume.

I'm sure you know this one: http://publib.boulder.ibm.com/infocenter/systemx/documentation/index.jsp?topic=/com.ibm.sysx.2583.doc/c_using_imm.html

Could you post the output of

```
grep NMI /usr/src/linux/.config
```

?

And BTW, check your BIOS version. There are several fixes regarding NMI: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5086587
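
The BIOS version is also printed in the `Hardware name:` line of the kernel backtrace, so it can be read without physical access. A rough sketch that pulls that field out of a log; the function name `bios_from_log` and the log path in the usage comment are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print whatever
# follows "BIOS " on each "Hardware name:" line (version and date),
# with trailing whitespace removed.
bios_from_log() {
    sed -n 's/.*Hardware name:.* BIOS \(.*\)$/\1/p' | sed 's/[[:space:]]*$//'
}

# Usage (log path is an assumption, adjust to your syslog setup):
#   bios_from_log < /var/log/messages | sort -u
```

On kernels with DMI sysfs support, `cat /sys/class/dmi/id/bios_version` should give the same information directly.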

----------

## mcbarlo

```
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
```

IMM is not configured (default settings). 

I will check the BIOS version when I have physical access to the server. One of my x3250s has an updated BIOS, but I'm not sure about this one.

What should I do? Turn off watchdog in IMM?

----------

## aCOSwt

 *mcbarlo wrote:*   

> What should I do? Turn off watchdog in IMM?

 

 :Shocked:   :Confused: 

Hmmm... only you can answer this one I presume.

Of course, as long as *you* decide *you* do not care about detecting hangs, *you can* disable the watchdog.

http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaai.crashdump%2Fliaaicrashdumpnmiwatch.htm

BTW, telling you whether you should or should not care about detecting hangs can't be part of the help I provide here.

----------

## mcbarlo

OK, I understand. I will move the disks to another server. This should tell me whether it is a hardware or a software problem. Thank you for your reply.

----------

## mcbarlo

I think I solved the problem: I turned off the C1E state in the BIOS. The router has now been running for about 30 hours without any problems.
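
For anyone who wants to see which idle states the kernel is actually using after such a change, the cpuidle sysfs directory can be listed. A minimal sketch, assuming the standard `stateN/name` and `stateN/usage` files; the function name `list_cstates` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print "stateN name usage" for every idle state
# found under a cpuidle sysfs directory (defaults to cpu0).
list_cstates() {
    base=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for d in "$base"/state*; do
        # Skip anything that is not a readable state directory.
        [ -r "$d/name" ] || continue
        printf '%s %s %s\n' "${d##*/}" "$(cat "$d/name")" "$(cat "$d/usage" 2>/dev/null)"
    done
}

# Usage:
#   list_cstates
#   list_cstates /sys/devices/system/cpu/cpu1/cpuidle
```

The `usage` column is the number of times the state was entered, so comparing two runs shows which states the CPU still goes into.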

----------

