# Bonding stopped working after power outage

## chrisk2305

Hi,

I have had a problem with my dual 10GbE NIC and bonding since a power outage a few days ago. After cold booting I noticed that my server had no connectivity, although the bond was brought up normally. I checked the switch config and everything seemed fine. I cleared the MAC address table on the switch and rebooted, but still no luck.

I haven't seen any errors in the log.

I could ping other devices in the same subnet, but I was not able to reach the gateway properly. I did a traceroute and it took 3000 ms to reach the gateway! Then I disconnected one of the two fibre cables and voilà, internet etc. was working again.

Do you guys have any idea what the problem could be?

Thanks in advance,

Christian

----------

## szatox

I suppose the bond is connected with at least 2 wires to a managed switch.

Is the same mode configured on both ends of the link? With a mismatch it will only work by accident (so you would have _sort of_ connectivity), and the packet loss that can occur in such a scenario would make any smart protocol retransmit at a reduced rate, then reduce the rate and retransmit again, and so on.

Just a guess.

Providing some more details on your setup and pointing out which devices were affected by the power outage could allow for another guess.

Also, do you often restart pieces of your equipment? Perhaps you hotfixed the setup at runtime on some device and forgot to make the change permanent.

----------

## chrisk2305

Hi,

Sorry, I did not provide enough information. Yes, the bond consists of two LC cables with the appropriate SFP+ modules. It has been working for 6 months without a problem.

The switch is a D-Link DGS-1510-28X and the NIC in the server is a dual-port 10GbE card with a Broadcom chipset (NetXtreme II driver). The bond is configured via netctl.

Here is the output of the bonding status with one NIC (eth3) disconnected:

```
cat /proc/net/bonding/bond4

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 74:d0:2b:98:c2:25
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 13
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 74:d0:2b:98:c2:25
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 74:d0:2b:98:c2:25
    port key: 13
    port priority: 255
    port number: 1
    port state: 77
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eth3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: 74:d0:2b:98:c2:27
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 2
Partner Churned Count: 2
details actor lacp pdu:
    system priority: 65535
    system mac address: 74:d0:2b:98:c2:25
    port key: 0
    port priority: 255
    port number: 2
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1
```
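One thing that stands out in the status output above is that the partner MAC address is all zeros for both slaves, which usually means the bond has not received any LACPDUs from the switch. As a rough sketch (not part of the original setup), this can be checked automatically in Python, assuming the `/proc/net/bonding/<bond>` text layout shown above. The function names `partner_macs` and `slaves_without_partner` and the trimmed `SAMPLE` text are made up for illustration.

```python
import re

ZERO_MAC = "00:00:00:00:00:00"

# Trimmed excerpt in the same layout as the status output above (illustrative only).
SAMPLE = """\
Slave Interface: eth2
details actor lacp pdu:
    system mac address: 74:d0:2b:98:c2:25
details partner lacp pdu:
    system mac address: 00:00:00:00:00:00
Slave Interface: eth3
details actor lacp pdu:
    system mac address: 74:d0:2b:98:c2:25
details partner lacp pdu:
    system mac address: 00:00:00:00:00:00
"""


def partner_macs(status_text):
    """Map each slave name to the partner MAC found in its 'details partner' section."""
    result = {}
    slave = None
    in_partner = False
    for raw in status_text.splitlines():
        line = raw.strip()
        match = re.match(r"Slave Interface:\s*(\S+)", line)
        if match:
            slave = match.group(1)
            in_partner = False
            continue
        if line.startswith("details partner lacp pdu"):
            in_partner = True
            continue
        if line.startswith("details actor lacp pdu"):
            in_partner = False
            continue
        match = re.match(r"system mac address:\s*(\S+)", line)
        if match and slave is not None and in_partner:
            result[slave] = match.group(1).lower()
    return result


def slaves_without_partner(status_text):
    """Return slaves whose partner MAC is all zeros (no LACP partner learned)."""
    macs = partner_macs(status_text)
    return sorted(name for name, mac in macs.items() if mac == ZERO_MAC)


print(slaves_without_partner(SAMPLE))
```

On a live system the same functions could be fed the contents of `/proc/net/bonding/bond4` instead of `SAMPLE`. A non-empty result is only a hint that the switch side is not speaking LACP on those ports; it does not say why.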

Here is the netctl service file:

```
Description='Bond Interface'
Interface='bond4'
Connection=bond
BindsToInterfaces=('eth2' 'eth3')
IP=static
Address=('192.168.1.2/24')
Gateway=('192.168.1.1')
DNS=('192.168.1.1')
```

Kernel options in grub.conf:

```
title Gentoo Linux 4.6.4
root (hd0,0)
kernel /boot/vmlinuz-4.6.4-gentoo root=/dev/md125 init=/usr/lib/systemd/systemd bonding.mode=4 bonding.miimon=100
```

Thanks!

----------

## bbgermany

Hi,

Have you checked whether your switch still has a valid bond/LACP/EtherChannel/port-channel configuration on the ports your server is attached to?

greets, bb

----------

## chrisk2305

Yes, I checked the switch and everything is fine there. I just rebooted the server with only one cable attached (which had worked) and had no connectivity. Just out of curiosity I attached the second cable (the bond came up fine and enslaved eth3), but still no connectivity. Then I removed the second cable again and voilà, connectivity was there.

Here is the dmesg output:

```
[   97.020592] bond4: link status definitely up for interface eth3, 10000 Mbps full duplex
[   97.021183] bond4: first active interface up!
[   97.021788] IPv6: ADDRCONF(NETDEV_CHANGE): macvtap0: link becomes ready
[  241.396098] bond4: Removing an active aggregator
[  241.396397] bond4: Releasing backup interface eth2
[  241.396682] bond4: the permanent HWaddr of eth2 - 74:d0:2b:98:c2:25 - is still in use by bond4 - set the HWaddr of eth2 to a different address to avoid conflicts
[  241.397328] bond4: first active interface up!
[  241.676587] bond4: Removing an active aggregator
[  241.676848] bond4: Releasing backup interface eth3
[  242.069901] bond4 (unregistering): Released all slaves
[  242.102068] IPv6: ADDRCONF(NETDEV_UP): bond4: link is not ready
[  242.102404] 8021q: adding VLAN 0 to HW filter on device bond4
[  242.651348] bnx2x 0000:03:00.0 eth2: using MSI-X  IRQs: sp 55  fp[0] 57 ... fp[7] 64
[  242.902742] 8021q: adding VLAN 0 to HW filter on device eth2
[  242.940906] bnx2x 0000:03:00.0 eth2: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[  242.942916] bond4: Enslaving eth2 as a backup interface with an up link
[  243.459481] bnx2x 0000:03:00.1 eth3: using MSI-X  IRQs: sp 65  fp[0] 67 ... fp[7] 74
[  243.716956] 8021q: adding VLAN 0 to HW filter on device eth3
[  243.755862] bnx2x 0000:03:00.1 eth3: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[  243.757492] bond4: Enslaving eth3 as a backup interface with an up link
[  243.758101] IPv6: ADDRCONF(NETDEV_CHANGE): bond4: link becomes ready
[  286.267443] bnx2x 0000:03:00.1 eth3: NIC Link is Down
[  286.268483] bnx2x 0000:03:00.1 eth3: speed changed to 0 for port eth3
[  286.301446] bond4: link status definitely down for interface eth3, disabling it
```

----------

## chrisk2305

Hi again,

I double-checked the switch config and saw that the protocol had been changed to static instead of LACP.

Thanks for your help, guys... I was just blind ;)
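For anyone hitting the same symptom: the `port state` values in the bonding status posted earlier in this thread (77 and 69 for the actors, 1 for the partner) are bit fields. Below is a small Python sketch that decodes them, assuming the standard IEEE 802.3ad LACP state bit layout. The helper name `decode_port_state` and the short flag names are illustrative and are not taken from the kernel.

```python
# LACP actor/partner state bits, lowest bit first (IEEE 802.3ad layout).
LACP_STATE_BITS = [
    (0x01, "activity"),         # active LACP (vs. passive)
    (0x02, "timeout"),          # short timeout (fast rate) requested
    (0x04, "aggregation"),      # link may be aggregated
    (0x08, "synchronization"),  # link is in sync with its aggregator
    (0x10, "collecting"),       # receiving frames is enabled
    (0x20, "distributing"),     # sending frames is enabled
    (0x40, "defaulted"),        # using default partner info, no LACPDU received
    (0x80, "expired"),          # receive state machine has expired
]


def decode_port_state(value):
    """Return the names of all flags set in an LACP port state byte."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]


# Values taken from the /proc/net/bonding/bond4 output in this thread.
for label, state in [("eth2 actor", 77), ("eth3 actor", 69), ("partner", 1)]:
    print(label, state, decode_port_state(state))
```

Both actor values include the `defaulted` flag, which indicates that the bond was running on default partner information rather than on data learned from the switch. That fits a switch port group set to static aggregation while the Linux side runs 802.3ad.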

----------

