# Kernel Bug net/core/skbuff.c:127

## Jarli

*Quote:*

> [870499.556664] skbuff: skb_over_panic: text:ffffffff816ba890 len:1568 put:289 head:ffff880f76ca7000 data:ffff880f76ca7160 tail:0x780 end:0x6c0 dev:<NULL>
> 
> [870499.556685] ------------[ cut here ]------------
> ...

 

This is a bug I hit this morning on my cluster. I am not the most skilled Gentoo person around, and I am still learning the system. It was set up by a vendor, and we are trying to find out what caused this and what the fix is.

Any help would be greatly appreciated. 

A Google search turns up numerous reports that are similar but not exact, for example:

http://www.serverphorums.com/read.php?12,341550

The one above is plastered just about everywhere I can find.

----------

## ce110ut

Hello Jarli,

Can you share the following information:

- what kernel version are you using?

The following will list the kernels installed on the host in question:

```
equery list sys-kernel/*
```

The following will show which one is active (if multiple kernels are installed):

```
eselect kernel list
```
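One caveat worth noting: `eselect kernel list` only shows where the `/usr/src/linux` symlink points, which can differ from the kernel actually booted (e.g. if the symlink was changed after the last reboot). To see the running kernel:

```shell
# The eselect symlink can differ from the booted kernel;
# uname reports the kernel that is actually running right now.
uname -r
```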

- how often does this happen?

- any diagnostics you can share?  Is the host under unusual load when this happens?

----------

## Jarli

```
cannon1 ~ # equery list sys-kernel/*
 * Searching for * in sys-kernel ...
[I--] [??] sys-kernel/gentoo-sources-3.2.1-r2:3.2.1-r2
[IP-] [  ] sys-kernel/gentoo-sources-3.3.8:3.3.8
[IP-] [  ] sys-kernel/gentoo-sources-3.4.9:3.4.9
[I--] [??] sys-kernel/gentoo-sources-3.5.2:3.5.2
[IP-] [  ] sys-kernel/gentoo-sources-3.5.7:3.5.7
[I--] [??] sys-kernel/git-sources-3.3_rc1:3.3_rc1
[IP-] [  ] sys-kernel/linux-headers-3.4:0

cannon1 ~ # eselect kernel list
Available kernel symlink targets:
  [1]   linux-3.2.1-gentoo-r2
  [2]   linux-3.3.8-gentoo
  [3]   linux-3.4.9-gentoo
  [4]   linux-3.5.2-gentoo
  [5]   linux-3.5.7-gentoo *
```

This is the first time we've had this issue occur. Just two weeks ago we had a raid controller boot from the wrong drive and revert all of the data by a week. Fortunately I have backups of all the system data and was able to restore it. I have a BIOS update from the motherboard manufacturer that I still have to apply to fix that issue.

This seems to have occurred last evening at some point, but I can't be certain. Everything was operational at 5PM when I left for the day.

----------

## ce110ut

Is your raid controller 'new' or recent?

I ask because the message may be misleading.

The last time I dealt with this was several years ago. The company where I worked at the time (~2004) migrated from the 2.4 to the 2.6 kernel. Most of our gear was older and its drivers worked, except for the new network controllers we had.

The vendor didn't officially support the 2.6 kernel, but they did provide us with a release-candidate driver. We noticed intermittent kernel panics, and the last log line mentioned skbuff and SMP - just like yours.

It turned out that the driver didn't take locks on a certain buffer. The driver presumably worked fine on a single-core processor / system. With SMP, that buffer was prone to a race condition, which led the skbuff facility to throw a panic.

That said, I can only see this happening if you're using new drivers.  If you're running hardware that requires external drivers, that MAY be the problem.  Other than that, I'm guessing you'll have to test and see if you can forcibly reproduce the panic.

----------

## Jarli

The cluster system is barely a year old.

Drivers could well be an issue; as I said before, I do have to update the BIOS on the raid controller to resolve another problem.

So at this point you believe a driver issue with the raid controller is causing this?

----------

## ce110ut

It's hard to say definitively, but given the information you've shared, that is where my money is. I strongly recommend that your next course of action be research.

I'd ask your team the following questions:

- Did it only panic the two times you mentioned?

- Do you have any monitoring that reports host activity for all the nodes in the cluster?

- If so, how are the other nodes behaving?

- Are the nodes the same build, tin, and OS?

----------

