# Bonding issues

## JeffBlair

Hi all,

I just got a new Intel dual NIC. It's working fine, but it's not bonding the way I want it to.

I'm trying to put it into mode 4 (802.3ad), but it isn't taking.

Also, after I reboot the PC I have to add the route manually; for some reason it's not picking up the route statement in my config.

What am I doing wrong, oh network gurus?

```
net:

dns_domain_lo="jeffhome.us"
modules=("ifconfig")
conifg_eth2=( "none" )
conifg_eth3=( "none" )

preup() {
      # Adjusting the bonding mode / MII monitor
      # Possible modes are : 0, 1, 2, 3, 4, 5, 6,
      #     OR
      #   balance-rr, active-backup, balance-xor, broadcast,
      #   802.3ad, balance-tlb, balance-alb
      # MII monitor time interval typically: 100 milliseconds
      if [[ ${IFACE} == "bond0" ]] ; then
#              BOND_MODE="balance-alb"
              BOND_MODE="4"
              BOND_MIIMON="100"
              echo ${BOND_MODE} >/sys/class/net/bond0/bonding/mode
              echo ${BOND_MIIMON}  >/sys/class/net/bond0/bonding/miimon
              einfo "Bonding mode is set to ${BOND_MODE} on ${IFACE}"
              einfo "MII monitor interval is set to ${BOND_MIIMON} ms on ${IFACE}"
      else
              einfo "Doing nothing on ${IFACE}"
      fi
      return 0
}

slaves_bond0="eth2 eth3"
config_bond0="192.168.0.15/24"
routes_bond0="default gw 192.168.0.1"
dns_servers_bond0="68.94.156.1 8.8.8.8"

depend_bond0() {
        need net.eth2 net.eth3
}
```

```
Modules:

modules_2_6="vboxdrv vboxnetflt vboxnetadp"
modules_2_6="${modules_2_6} bonding"
module_bonding_args_2_6="miimon=100 mode=4"
```

```
ifconfig:

bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 192.168.0.15  netmask 255.255.255.0  broadcast 192.168.0.255
        ether 00:15:17:cd:19:14  txqueuelen 0  (Ethernet)
        RX packets 18753324  bytes 17337285865 (16.1 GiB)
        RX errors 0  dropped 28  overruns 0  frame 0
        TX packets 18652383  bytes 17651960459 (16.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        inet 192.168.0.100  netmask 255.255.255.0  broadcast 192.168.0.255
        ether 00:15:17:cd:19:14  txqueuelen 1000  (Ethernet)
        RX packets 3165919  bytes 277580763 (264.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9331558  bytes 8833223059 (8.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xfdee0000-fdf00000

eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        inet 192.168.0.103  netmask 255.255.255.0  broadcast 192.168.0.255
        ether 00:15:17:cd:19:14  txqueuelen 1000  (Ethernet)
        RX packets 15587405  bytes 17059705102 (15.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9320825  bytes 8818737400 (8.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 17  memory 0xfdea0000-fdec0000

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 16436
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 52052  bytes 20374589 (19.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52052  bytes 20374589 (19.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
```

```
ifenslave:

The result of SIOCGIFFLAGS on bond0 is 1443.
The result of SIOCGIFADDR is 00.00.ffffffc0.ffffffa8.
The result of SIOCGIFHWADDR is type 1  00:15:17:cd:19:14.
```

----------

## gerdesj

802.3ad needs support on the switch side; have you enabled LACP there?

Your routes_bond0 needs to read "default via w.x.y.z".

I notice you are using the ifconfig module. Try iproute2 instead: it's much better in general and provides a lot more functionality. Simply remove the modules=("ifconfig") line, as iproute2 is the default nowadays, and make sure you have emerged iproute2 or set the USE flag and had it pulled in by something else. You can verify it's installed with:

```
# ip a
```

Which will print out your IP address config.
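To see which mode the bond actually came up in, the bonding driver exposes a status file under /proc/net/bonding/. A minimal sketch, assuming the bond is called bond0 as in the config above; the function names and the optional path argument are made up for illustration (the path argument is only there so it can be tried against a saved copy of the file):

```shell
# Print the mode the bonding driver reports for a bond.
# $1 = bond name (default bond0), $2 = status file (default /proc/net/bonding/<bond>)
bond_mode() {
    local bond="${1:-bond0}"
    local status="${2:-/proc/net/bonding/${bond}}"
    [ -r "${status}" ] || { echo "cannot read ${status}" >&2; return 1; }
    # The status file contains a line such as
    #   Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    sed -n 's/^Bonding Mode: //p' "${status}"
}

# Succeeds only if the reported mode is 802.3ad (mode 4).
bond_is_lacp() {
    bond_mode "$@" | grep -q '802\.3ad'
}
```

Running `bond_mode bond0` on a bond that fell back to mode 0 should print the round-robin mode name rather than the 802.3ad one.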

Cheers

Jon

----------

## JeffBlair

gerdesj,

Thanks for the reply. I've taken the ifconfig line out of the config.

I also changed the gateway line, but I still couldn't route out with that change. I then changed the variable name to "routes_" and it works fine now.

Now the problem I'm having is that my eth3 isn't grabbing an IP address, or it only grabs one every once in a while. I know the cable's good, so I'm not sure what's going on there. Also, I think ifenslave might be reporting wrong: the /sys/class/net/bond0/bonding/mode file lists the mode as 0. So at least it's load balancing. And yes, I have the 2 ports in the same LAG group on my Dell switch.

Any ideas about eth3 grabbing a dummy address?

```
3: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:cd:19:14 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.106/24 brd 192.168.0.255 scope global eth2
4: eth3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
    link/ether 00:15:17:cd:19:14 brd ff:ff:ff:ff:ff:ff
    inet 169.254.85.192/16 brd 169.254.255.255 scope global eth3
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:15:17:cd:19:14 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.15/24 brd 192.168.0.255 scope global bond0
    inet6 fe80::215:17ff:fecd:1914/64 scope link
       valid_lft forever preferred_lft forever
```
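A mode of 0 in sysfs after asking for mode 4 suggests the write in preup() did not take. As far as I know the kernel only accepts a mode change while the bond is down and has no slaves, so that is one possible cause, not a confirmed one. A small sketch for comparing the requested mode with what sysfs reports; `check_bond_mode` is a made-up helper name, and the optional path argument exists only so it can be tried on a scratch file:

```shell
# Compare the mode we asked for with what the kernel reports.
# The sysfs mode file holds "<name> <number>", e.g. "balance-rr 0" or "802.3ad 4".
# $1 = wanted mode (name or number), $2 = mode file (defaults to bond0's)
check_bond_mode() {
    local want="$1"
    local file="${2:-/sys/class/net/bond0/bonding/mode}"
    local name num
    read -r name num < "${file}" || return 2
    if [ "${want}" = "${name}" ] || [ "${want}" = "${num}" ]; then
        echo "ok: mode is ${name} (${num})"
    else
        echo "mismatch: wanted ${want}, kernel reports ${name} (${num})"
        return 1
    fi
}
```

Calling `check_bond_mode 4` from postup() (or by hand after boot) would make a silent fallback to round-robin visible.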

```
Boot messages:

e1000e: Intel(R) PRO/1000 Network Driver - 1.9.5-k
e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
e1000e 0000:01:00.0: Disabling ASPM  L1
e1000e 0000:01:00.0: (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0000:01:00.0: irq 45 for MSI/MSI-X
e1000e 0000:01:00.0: eth0: (PCI Express:2.5GT/s:Width x4) 00:15:17:cd:19:14
e1000e 0000:01:00.0: eth0: Intel(R) PRO/1000 Network Connection
e1000e 0000:01:00.0: eth0: MAC: 0, PHY: 4, PBA No: C57721-005
e1000e 0000:01:00.1: Disabling ASPM  L1
e1000e 0000:01:00.1: (unregistered net_device): Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0000:01:00.1: irq 46 for MSI/MSI-X
e1000e 0000:01:00.1: eth0: (PCI Express:2.5GT/s:Width x4) 00:15:17:cd:19:15
e1000e 0000:01:00.1: eth0: Intel(R) PRO/1000 Network Connection
e1000e 0000:01:00.1: eth0: MAC: 0, PHY: 4, PBA No: C57721-005
e1000e 0000:01:00.0: irq 45 for MSI/MSI-X
e1000e 0000:01:00.0: irq 45 for MSI/MSI-X
e1000e 0000:01:00.0: eth2: Jumbo frames cannot be enabled when both receive checksum offload and receive hashing are enabled.  Disable one of the receive offload features before enabling jumbos.
e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000e 0000:01:00.1: irq 46 for MSI/MSI-X
e1000e 0000:01:00.1: irq 46 for MSI/MSI-X
e1000e 0000:01:00.1: eth3: Jumbo frames cannot be enabled when both receive checksum offload and receive hashing are enabled.  Disable one of the receive offload features before enabling jumbos.
e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000e 0000:01:00.0: irq 45 for MSI/MSI-X
e1000e 0000:01:00.0: irq 45 for MSI/MSI-X
e1000e 0000:01:00.1: irq 46 for MSI/MSI-X
e1000e 0000:01:00.1: irq 46 for MSI/MSI-X
e1000e 0000:01:00.0: eth2: Jumbo frames cannot be enabled when both receive checksum offload and receive hashing are enabled.  Disable one of the receive offload features before enabling jumbos.
e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
```

Now if they would just fix the bug with the Intel driver and jumbo packets. :)

----------

## gerdesj

It might be worth trying a different DHCP client. I find dhcpcd the best of the lot. Is eth3 a typo? From your config it looks like the bond is made up of eth2 and eth3, so eth2 and eth3 should not have their own addresses at all.
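The 169.254.85.192/16 address on eth3 is an IPv4 link-local address, which is typically what a DHCP client assigns itself when it gets no lease. So it probably means a DHCP client is being run on the slave. A tiny sketch for spotting such addresses; `is_link_local` is a made-up helper name:

```shell
# Succeeds if an IPv4 address (with or without /prefix) is in 169.254.0.0/16.
is_link_local() {
    case "${1%%/*}" in
        169.254.*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, `is_link_local 169.254.85.192/16` succeeds, while `is_link_local 192.168.0.106/24` fails.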

If you are using Dell PowerConnects then make sure that you are using the right hashing method: IP, MAC or both. I think they call it Layer 2/3 (MAC = Layer 2, IP = Layer 3). Make sure your Gentoo box and the switch agree! If you have a 28xx series, I don't think those can do LACP, so it may not function properly at all.

Cheers

Jon

----------

## JeffBlair

Jon,

  I do have dhcpcd installed already.

  And, nope, that eth3 isn't a typo. For some reason when my system starts I see a message that udevd renamed eth0 to eth3... ok, I just looked at the logs again, and it's renaming eth0 to both eth2 and eth3. Odd to say the least.

```
[    7.265917] udevd[1123]: renamed network interface eth0 to eth2
[    7.417154] udevd[1123]: renamed network interface eth0 to eth3
```

So I just made symlinks for eth0 and eth1 (I didn't have them in there), and it's now renaming eth0 to eth2, and eth1 to eth3:

```
[    7.537409] udevd[1124]: renamed network interface eth0 to eth2
[    7.544176] udevd[1123]: renamed network interface eth1 to eth3
```

Yeah, I'm not sure why they're grabbing an IP address. I'm posting my current config file again to see if you see any mistakes that I made.

I've got the PowerConnect 2716. It says that it does LAG, but this is the first time I'm really putting it to the test.

```
/etc/conf.d/net:

dns_domain_=(
        "jeffhome.us"
)

routes_=(
  "default via 192.168.0.1"
)

conifg_eth2=( "none" )
conifg_eth3=( "none" )

preup() {
      # Adjusting the bonding mode / MII monitor
      # Possible modes are : 0, 1, 2, 3, 4, 5, 6,
      #     OR
      #   balance-rr, active-backup, balance-xor, broadcast,
      #   802.3ad, balance-tlb, balance-alb
      # MII monitor time interval typically: 100 milliseconds
      if [[ ${IFACE} == "bond0" ]] ; then
#              BOND_MODE="balance-alb"
              BOND_MODE="4"
              BOND_MIIMON="100"
              echo ${BOND_MODE} >/sys/class/net/bond0/bonding/mode
              echo ${BOND_MIIMON}  >/sys/class/net/bond0/bonding/miimon
              einfo "Bonding mode is set to ${BOND_MODE} on ${IFACE}"
              einfo "MII monitor interval is set to ${BOND_MIIMON} ms on ${IFACE}"
      else
              einfo "Doing nothing on ${IFACE}"
      fi
      return 0
}

slaves_bond0="eth2 eth3"
config_bond0="192.168.0.15/24"
dns_servers_bond0="68.94.156.1 8.8.8.8"

depend_bond0() {
        need net.eth2 net.eth3
}
```

Thanks for all your help,

  Jeff

----------

## gerdesj

Have a look in /etc/udev/rules.d for 70-persistent-net.rules. That file holds the rules that cause your network devices to be renamed, and you can edit it as you like. I imagine this box has had several NICs in it. I suggest clearing that file and rebooting, then editing /etc/conf.d/net to reflect eth0 and eth1. I also think you should set config_eth0="null".

Ahh, just noticed the syntax you are using. Get rid of the brackets (parentheses):
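Before clearing the file it can help to see which MAC address is pinned to which name. A rough sketch, assuming the rules follow the usual generated layout with an `ATTR{address}=="..."` match and a `NAME="..."` assignment on the same line; `list_net_rules` is a made-up helper name:

```shell
# Print "MAC -> name" for every rule that pins an address to an interface name.
# $1 = rules file (defaults to the persistent-net rules file)
list_net_rules() {
    local rules="${1:-/etc/udev/rules.d/70-persistent-net.rules}"
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 -> \2/p' "${rules}"
}
```

If the two ports of the dual NIC (00:15:17:cd:19:14 and 00:15:17:cd:19:15 in the boot messages) show up mapped to eth2 and eth3, that would account for the renames udevd logs at boot.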

```
config_eth0="null"
config_eth1="null"
```
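Put together with the rest of the settings from this thread, the file might end up looking something like the sketch below. It assumes the NICs come up as eth0 and eth1 once the udev rules are cleared, and it leaves out the preup() and depend_bond0() parts. Note the spelling: the configs posted above read `conifg_eth2` and `conifg_eth3`. As far as I know a misspelled name is just an unused shell variable to the net scripts, and an interface without a `config_` entry falls back to DHCP, which would explain the slaves picking up their own addresses.

```shell
# /etc/conf.d/net (sketch; assumes the NICs are named eth0 and eth1)

# Slaves carry no addresses of their own
config_eth0="null"
config_eth1="null"

# The bond gets the static address, the default route and the DNS servers
slaves_bond0="eth0 eth1"
config_bond0="192.168.0.15/24"
routes_bond0="default via 192.168.0.1"
dns_servers_bond0="68.94.156.1 8.8.8.8"
```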

Check this doc out (your version may be different) or if you are not using OpenRC then /etc/conf.d/net.example:

```
$ less /usr/share/doc/openrc-0.9.9.2/net.example.bz2
```

The switch will probably only do static LAG, without LACP. Once you are up and running, test it well. This explains it quite well: http://en.wikipedia.org/wiki/Link_aggregation#Link_Aggregation_Control_Protocol
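As a rough guide to which Linux bonding modes fit which kind of switch, going by the kernel's bonding documentation: modes 0, 2 and 3 expect the switch ports to be grouped statically, mode 4 needs LACP, and modes 1, 5 and 6 need nothing special on the switch. A sketch of that lookup; `bond_switch_needs` is a made-up helper name:

```shell
# Print what a bonding mode expects from the switch.
# $1 = bonding mode, by number or by name
bond_switch_needs() {
    case "$1" in
        0|balance-rr|2|balance-xor|3|broadcast)
            echo "static LAG" ;;
        4|802.3ad)
            echo "LACP" ;;
        1|active-backup|5|balance-tlb|6|balance-alb)
            echo "none" ;;
        *)
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}
```

So with a switch that only does static LAG, `bond_switch_needs balance-xor` (mode 2) gives a candidate that matches, whereas mode 4 does not.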

Cheers

Jon

----------

## gerdesj

Just had a look at a PowerConnect 2824. To set the load balance mode go to Switch -> Ports -> LAG Configuration. There are three options: Layer 2, Layer 3 and Layer 2/3.

I've done a quick hunt around and I think you will want Layer 2/3. It looks like the 802.3ad Linux driver uses both MAC and IP in its "Transmit Hash Policy". However, it also seems it will want LACP enabled at the other end. You may have to use one of the other modes on the Gentoo box or get a fancier switch, but they are a bit expensive. You do get what you pay for, though: I have recently seen an entire small business network apparently improve across the board when I replaced their "core" 48 port Gigabit Netgear with a Dell PC 7xxx. Even though everything is still using 1 Gb or 100 Mb connections, it all seemed to move rather better afterwards. There are plenty of other factors, and with Dells I highly recommend you keep the firmware up to date. I've fixed some funny problems this way on several occasions in the last year alone.

You must be really careful when reading other people's take on LAG/LACP etc., or indeed anything to do with networking (jumbo frames and 802.1Q VLANs are other topics for howlers). Quite often you will see incorrect "recommendations".
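On the Linux side the hash is controlled by the bond's xmit_hash_policy setting, which as far as I know defaults to layer2 (MAC only), so a MAC+IP hash has to be asked for explicitly. A sketch of how the two ends might be lined up; the pairing of the switch labels with the Linux policy names is a guess from the names rather than something taken from Dell's documentation, and both function names are made up:

```shell
# Map the switch's LAG hash label to a Linux xmit_hash_policy name.
# Assumption: "Layer 2" ~ layer2 and "Layer 2/3" ~ layer2+3.
linux_hash_policy() {
    case "$1" in
        "Layer 2")   echo "layer2" ;;
        "Layer 2/3") echo "layer2+3" ;;
        *) echo "no obvious Linux equivalent for '$1'" >&2; return 1 ;;
    esac
}

# Write the policy to the bond; meant to be called from preup().
# $1 = policy name, $2 = sysfs file (defaults to bond0's)
set_hash_policy() {
    local policy="$1"
    local file="${2:-/sys/class/net/bond0/bonding/xmit_hash_policy}"
    echo "${policy}" > "${file}"
}
```

In preup() this could be used as `set_hash_policy "$(linux_hash_policy "Layer 2/3")"`, next to the existing mode and miimon writes.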

Cheers

Jon

----------

