# Intel Pro 1000, Linux, and Advanced Networking Services

## sparks

I just purchased an Intel Pro 1000 MT Dual Port Server Adapter.  After looking at Intel's webpage, it looks like their ANS software, which allows NIC teaming for load balancing, only works with 2.4 series kernels.  It would be a disappointment if load balancing were not an option with a 2.6 kernel.  Can anyone offer any first-hand experience or suggestions about teaming Intel NICs with a 2.6 kernel?

----------

## sparks

Anyone have any thoughts on teaming NICs in general?  Is it possible with 2.6?

----------

## tkdfighter

What you're looking for is called "bonding". You can enable support for it in the kernel. I don't really have any first-hand experience configuring NICs to bond, although I've used computers that had their NICs bonded. So I did a quick Google search. I know I found a how-to on tldp a month or two ago, but I can't find it anymore. Oh well.

What are you setting up that needs this much bandwidth?

----------

## tgh

Aye, look for bonding.  It supposedly works very well in Linux, with multiple modes (although I'm still working out where I went wrong setting it up on my system).

/usr/src/linux/Documentation/networking/bonding.txt

```
mode

        Specifies one of the bonding policies.  The default is
        balance-rr (round robin).  Possible values are:

        balance-rr or 0

                Round-robin policy: Transmit packets in sequential
                order from the first available slave through the
                last.  This mode provides load balancing and fault
                tolerance.

        active-backup or 1

                Active-backup policy: Only one slave in the bond is
                active.  A different slave becomes active if, and only
                if, the active slave fails.  The bond's MAC address is
                externally visible on only one port (network adapter)
                to avoid confusing the switch.

                In bonding version 2.6.2 or later, when a failover
                occurs in active-backup mode, bonding will issue one
                or more gratuitous ARPs on the newly active slave.
                One gratuitous ARP is issued for the bonding master
                interface and each VLAN interface configured above
                it, provided that the interface has at least one IP
                address configured.  Gratuitous ARPs issued for VLAN
                interfaces are tagged with the appropriate VLAN id.

                This mode provides fault tolerance.  The primary
                option, documented below, affects the behavior of this
                mode.

        balance-xor or 2

                XOR policy: Transmit based on the selected transmit
                hash policy.  The default policy is a simple [(source
                MAC address XOR'd with destination MAC address) modulo
                slave count].  Alternate transmit policies may be
                selected via the xmit_hash_policy option, described
                below.

                This mode provides load balancing and fault tolerance.

        broadcast or 3

                Broadcast policy: transmits everything on all slave
                interfaces.  This mode provides fault tolerance.

        802.3ad or 4

                IEEE 802.3ad Dynamic link aggregation.  Creates
                aggregation groups that share the same speed and
                duplex settings.  Utilizes all slaves in the active
                aggregator according to the 802.3ad specification.

                Slave selection for outgoing traffic is done according
                to the transmit hash policy, which may be changed from
                the default simple XOR policy via the xmit_hash_policy
                option, documented below.  Note that not all transmit
                policies may be 802.3ad compliant, particularly in
                regards to the packet mis-ordering requirements of
                section 43.2.4 of the 802.3ad standard.  Differing
                peer implementations will have varying tolerances for
                noncompliance.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving
                the speed and duplex of each slave.

                2. A switch that supports IEEE 802.3ad Dynamic link
                aggregation.

                Most switches will require some type of configuration
                to enable 802.3ad mode.

        balance-tlb or 5

                Adaptive transmit load balancing: channel bonding that
                does not require any special switch support.  The
                outgoing traffic is distributed according to the
                current load (computed relative to the speed) on each
                slave.  Incoming traffic is received by the current
                slave.  If the receiving slave fails, another slave
                takes over the MAC address of the failed receiving
                slave.

                Prerequisite:

                Ethtool support in the base drivers for retrieving the
                speed of each slave.

        balance-alb or 6

                Adaptive load balancing: includes balance-tlb plus
                receive load balancing (rlb) for IPV4 traffic, and
                does not require any special switch support.  The
                receive load balancing is achieved by ARP negotiation.
                The bonding driver intercepts the ARP Replies sent by
                the local system on their way out and overwrites the
                source hardware address with the unique hardware
                address of one of the slaves in the bond such that
                different peers use different hardware addresses for
                the server.

                Receive traffic from connections created by the server
                is also balanced.  When the local system sends an ARP
                Request the bonding driver copies and saves the peer's
                IP information from the ARP packet.  When the ARP
                Reply arrives from the peer, its hardware address is
                retrieved and the bonding driver initiates an ARP
                reply to this peer assigning it to one of the slaves
                in the bond.  A problematic outcome of using ARP
                negotiation for balancing is that each time that an
                ARP request is broadcast it uses the hardware address
                of the bond.  Hence, peers learn the hardware address
                of the bond and the balancing of receive traffic
                collapses to the current slave.  This is handled by
                sending updates (ARP Replies) to all the peers with
                their individually assigned hardware address such that
                the traffic is redistributed.  Receive traffic is also
                redistributed when a new slave is added to the bond
                and when an inactive slave is re-activated.  The
                receive load is distributed sequentially (round robin)
                among the group of highest speed slaves in the bond.

                When a link is reconnected or a new slave joins the
                bond the receive traffic is redistributed among all
                active slaves in the bond by initiating ARP Replies
                with the selected mac address to each of the
                clients. The updelay parameter (detailed below) must
                be set to a value equal or greater than the switch's
                forwarding delay so that the ARP Replies sent to the
                peers will not be blocked by the switch.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving
                the speed of each slave.

                2. Base driver support for setting the hardware
                address of a device while it is open.  This is
                required so that there will always be one slave in the
                team using the bond hardware address (the
                curr_active_slave) while having a unique hardware
                address for each slave in the bond.  If the
                curr_active_slave fails its hardware address is
                swapped with the new curr_active_slave that was
                chosen.
```
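As a toy illustration of the balance-xor slave selection above: the arithmetic works out in plain shell.  (The MAC octets here are hypothetical, and the real driver XORs the full source and destination MAC addresses before taking the modulo; this sketch only uses the last octets.)

```shell
# Toy model of the balance-xor default hash: (src MAC ^ dst MAC) % slave count.
# Hypothetical last octets of the source and destination MACs; the real
# driver hashes the full addresses.
src=0x1a     # source MAC, last octet
dst=0x2b     # destination MAC, last octet
slaves=2     # number of slaves in the bond
idx=$(( (src ^ dst) % slaves ))
echo "packet goes out slave $idx"
# -> packet goes out slave 1
```

The point is just that the same source/destination pair always hashes to the same slave, which is why a single flow never exceeds one NIC's bandwidth in this mode.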

----------

## sparks

 *tkdfighter wrote:*   

> What you're looking for is called "bonding". It you can enable support for it in the kernel. I don't really have any first-hand experience configuring NICs to bond, although I've used computers which had their NICs bonded. So I did a quick Google search. I know that I found a how-to on tldp a month or two ago, but I can't find it anymore. Oh well.
> 
> What are you setting up that needs this much bandwidth?

 

Thanks for pointing me in the correct direction.  I will look into setting this up on my server.  I am setting up a staging farm for the company I work for.  We will constantly be pulling large images to multiple computers, so fast hard drives and lots of network bandwidth are what I'm aiming for.  Thanks again for the information.

----------

## tgh

Any luck?  I'm having limited success with my pair of Intel Pro/1000 dual-port PCIe cards.

I think I'll be spending some time tonight reading up on the bonding HOWTOs again and Linux networking... it looks like it should be working, but isn't.

----------

## tgh

Okay, I have bonding working with the onboard NICs on my M32N-SLI Deluxe motherboard but not with the Intel Pro/1000 NICs.  Still, it's a start, right?  "atop" comes in handy when troubleshooting and testing because it shows you the NICs and the bonded NIC device along with the number of packets/period.  For instance, I can see that all of the inbound packets are on eth5 and that eth6 is mostly quiet, but outbound packets are balanced round-robin style across eth5+eth6.  

(Which means that even with mode=0, inbound traffic from a single source cannot exceed 1Gbps?)

"atop" is also good for showing you what happens when you pull one of the cables.  You'll see the inbound traffic shuffle over to the second NIC in the series.  Then as you swap the cables again (keeping one connected at all times) traffic will move from NIC to NIC as needed.

I have the SMCGS16 switch set up with ports 4+5 trunked together, which I think was needed to work with mode=0 (round-robin).  I may experiment with modes 5 and 6, which don't require switch support.  The balance-alb (6) mode looks like what I'm really after, but I need to think about it some more.

Docs: /usr/src/linux/Documentation/networking/bonding.txt

Now for the gory details, as I understand them; there's still one issue that I'm attempting to solve.  I've listed these steps roughly in order of my confidence that they're correct (i.e. I'm almost positive the first steps are correct; things get a bit hazy towards the end).

1) Set CONFIG_BONDING=m (Device Drivers -> Networking support -> Bonding driver support), recompile your kernel, and install the new kernel in grub/lilo.  You should do step #2 before the reboot; that will allow the bonding module to load with its default arguments.

2) Configure /etc/modules.autoload.d/kernel-2.6

```
# /etc/modules.autoload.d/kernel-2.6:  kernel modules to load when system boots.
#
# Note that this file is for 2.6 kernels.
#
bonding miimon=100 mode=0
```

Not sure if the options here override options later on, or if options here are considered to be default options for all bonded NIC sets.  Or even whether the "option" lines in the modprobe.conf file are looked at by the bonding module?

If you rebooted before editing this file (or for testing), you can use modprobe to load (or modprobe -r to unload) the bonding module.
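For instance, something like this (the module parameters mirror the ones in the autoload file above; adjust to taste, and note these need root):

```shell
# load bonding by hand with explicit parameters
modprobe bonding miimon=100 mode=0
# confirm it loaded
lsmod | grep bonding
# unload it again when done testing
modprobe -r bonding
```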

3) Set up your links to the /etc/init.d/net.lo file (note that I am only showing examples for bond0=eth5+eth6, but I actually have room for 3 bonds of 2 NICs each that will be configured later)

```
# ln -sf /etc/init.d/net.lo /etc/init.d/net.bond0
# ln -sf /etc/init.d/net.lo /etc/init.d/net.eth5
# ln -sf /etc/init.d/net.lo /etc/init.d/net.eth6
# ls -l net*
lrwxrwxrwx  1 root root    18 Sep 11 19:10 net.bond0 -> /etc/init.d/net.lo
lrwxrwxrwx  1 root root    18 Sep 11 19:10 net.bond1 -> /etc/init.d/net.lo
lrwxrwxrwx  1 root root    18 Sep 11 19:10 net.bond2 -> /etc/init.d/net.lo
lrwxrwxrwx  1 root root     6 Aug 25 18:09 net.eth0 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 01:22 net.eth1 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 01:22 net.eth2 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 01:22 net.eth3 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 01:22 net.eth4 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 01:22 net.eth5 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 01:22 net.eth6 -> net.lo
lrwxrwxrwx  1 root root     6 Aug 26 00:46 net.eth7 -> net.lo
-rwxr-xr-x  1 root root 24317 Feb  9  2006 net.lo
-rwxr-xr-x  1 root root  3055 Feb  9  2006 netmount
```

All pretty simple, everything simply points at net.lo via a link.

4) Add net.bond0 to the startup (no need to add the slaves).  If your slave NICs (eth#) are set to automatically start, remove them using "rc-update del net.eth5" and "rc-update del net.eth6".

```
rc-update add net.bond0 default
```

That makes sure that the bond comes up at restart.  It also allows you to say "/etc/init.d/net.bond0 stop|restart|status|start".

5) Emerge the following tools.  The first one is required (ifenslave), the other two are optional (except if you are doing mode=5 or mode=6 in which case you need "ethtool" as well).

```
emerge ifenslave
emerge ethtool
emerge atop
```

"atop" is optional, but useful.  None of the packages require configuration after being installed.

6) Edit your /etc/modules.d/bond file

```
alias bond0 bonding
```

Make sure to run modules-update after editing this file.

Note that the file shows that you can supposedly add "option" lines to pass options to the bonding module.  But I'm not exactly sure if that works.  I'm also not 100% sure how to configure multiple bonded sets in this file (it should simply be "alias bond# bonding" for each additional one).  I'm getting an error that "bond1 does not exist" when I attempt to start it later on.

You may need to do this step before being able to "modprobe bonding".

Note #2: Gentoo allows us to edit files in the /etc/modules.d tree and use modules-update to create both modules.conf and modprobe.conf on-the-fly.  For other Linux systems you'd probably need to edit modprobe.conf (for 2.6 kernels) directly, or figure out how they want you to set things up.

7) Finally, we edit /etc/conf.d/net

```
slaves_bond0="eth5 eth6"
config_bond0=( "192.168.142.110 netmask 255.255.255.0" )
routes_bond0=( "default gw 192.168.142.1" )
```

Note, in my configuration, I use a different interface (eth4) as my "default" gateway, so the 3rd line is commented out.  In my case, eth4 is my "management" NIC that I connect directly to the workstation LAN while my bonded NICs are in their own VLAN or on a dedicated switch for SAN traffic.  For testing purposes, I have everything on the same switch right now, not even VLAN'd.

8) You should now be able to start the bond.  It should either start or give you errors.

```
/etc/init.d/net.bond0 start
```
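Once it starts, /proc/net/bonding/bond0 is the place to confirm the mode and slave states.  Here's a quick sanity check run against a captured sample rather than the live file, since the exact wording can vary between bonding driver versions (the sample text below is assumed, not copied from my box):

```shell
# Hypothetical excerpt of /proc/net/bonding/bond0; the first
# "MII Status" line describes the bond itself, the rest are slaves.
sample='Bonding Mode: load balancing (round-robin)
MII Status: up
Slave Interface: eth5
MII Status: up
Slave Interface: eth6
MII Status: down'

# count the "up" lines, then subtract the bond's own line
up=$(printf '%s\n' "$sample" | grep -c '^MII Status: up')
echo "$((up - 1)) slave(s) up"
# -> 1 slave(s) up
```

Against the live system you'd just `cat /proc/net/bonding/bond0` instead of using the here-string.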

...

According to "svn status" in my /etc folder, those are all of the files that I touched while setting up bond0.  (A good reason to keep /etc in subversion.)

Now I have to go figure out why the Intel NICs aren't cooperating like the onboard Broadcom/Marvell NICs are.

I also have to figure out why bond1 and bond2 aren't working.  It could simply be that there are problems with the Intel NICs (even though they show up in the "ifconfig -a" listing as eth0-eth3).

Edited: To clean up the display and make it pretty...

----------

## tgh

Well, with mode=0, I'm seeing about 11MB/s maximum from my single-NIC gigabit workstation to the SAN unit over iSCSI.  Average is more like 8-9MB/s (the disks inside are capable of 30-80MB/s since it's a 4-disk RAID10 set).  Reading from the SAN gives me 22MB/s and the packets are spread across both NICs (disks are at 50-60% with this sequential read going on, more spindles would've been good).  

Which sounds slow, except I don't have jumbo frames turned on yet (or the traffic shuffled into its own VLAN yet to allow for that without affecting other machines on the switch).  

Pulling the plug changes my throughput from 17-25MB/s outbound from the SAN to... 17-25MB/s (since I haven't saturated a single NIC yet).
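For the record, turning on jumbo frames should just be an MTU change on the bond; the bonding driver propagates it to the slaves.  Sketch only: the switch and every peer on the segment must also support ~9000-byte frames, and I haven't verified that all e1000 variants do.

```shell
# bump the MTU on the bond interface (requires root);
# slaves inherit the new MTU from the bonding driver
ifconfig bond0 mtu 9000
```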

----------

## tgh

http://sourceforge.net/projects/e1000

http://downloadmirror.intel.com/df-support/9180/ENG/ldistrib.txt

http://downloadmirror.intel.com/df-support/9180/ENG/README.txt

- I see a note that there may be problems with 2.6.11 - 2.6.15 when combined with a UP/SMP kernel and the "noapic" kernel option.  This may apply in my case, as I currently have to use "noapic" to get the machine to boot.

- Intel's driver readme says "This driver is only supported as a loadable module at this time." (which applies to the e1000 module).  So I may switch back to marking the e1000 option in the kernel with "M" instead of "*".

----------

## tgh

Hmmm, changing e1000 to load as a module instead of built-in had the following effects (remember to add it to your /etc/modules.autoload.d/kernel-2.6 file).

- All of my ethernet devices renumbered themselves.  Which is fine, that puts my 3c509 at eth0 now instead of eth4, and the onboard NICs are now eth1/eth2 with the e1000 NICs at eth3 to eth6.  That's actually more straightforward for me (grin).

- Bonding automatically worked when I rebooted, even though it picked the wrong pair of adapters due to the renumbering, and even though I don't have those 2 ports trunked on the switch.  Whether the switch automatically configured the bonding or simply doesn't care, I'm not entirely sure.

- Both interfaces show inbound traffic packets.

----------

## sparks

I still have not had any luck.  I am also bridging my eth0 with tap0 to make br0 for OpenVPN, which complicates things a bit.  I wonder if you can bond a bridged port?  My /etc/conf.d/net file is below.

```
config_eth0=( "null" )
config_tap0=( "0.0.0.0 promisc" )
bridge_br0=( "eth0 tap0" )
config_br0=( "192.168.1.15/24" )
brctl_br0=( "stp off" )
depend_br0() {
        need net.eth0 net.tap0
}
```
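In principle the bond would become the bridge member instead of eth0, something like the following untested sketch (eth1 is a hypothetical second NIC; whether baselayout brings the pieces up in the right order is exactly what I'm unsure about):

```shell
# /etc/conf.d/net -- UNTESTED sketch: bond eth0+eth1, then bridge the bond
config_eth0=( "null" )
config_eth1=( "null" )
slaves_bond0="eth0 eth1"
config_bond0=( "null" )
bridge_br0=( "bond0 tap0" )
config_br0=( "192.168.1.15/24" )
brctl_br0=( "stp off" )
depend_br0() {
        need net.bond0 net.tap0
}
```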

----------

## etomsch

I set up bonding on different servers with a 3-interface configuration, but since the latest updates /etc/init.d/net.bond0 doesn't seem to work any more.  /etc/conf.d/net is as described in /etc/conf.d/net.example, but the init script isn't doing anything.

If I do all the bonding stuff (ifenslave, ifconfig, ...) by hand it works as expected.

Any suggestions?

Thanks!

----------

## tgh

There should be error messages that will point you in the proper direction.  Also look at the files mentioned in my earlier posting.  Once I got e1000 loaded as a module instead of built-in, things worked very easily and as expected.

You also didn't specify what you updated.  The kernel?  Something else?

(When working with complicated configurations, always put your /etc under version control or at least have a daily rsnapshot / dirvish / rsync local backup using hard links so that you can go back to any point in time.  Then there's no need to guess about what was changed on purpose or by accident.)

----------

## etomsch

 *tgh wrote:*   

> Should be error messages that will point you in the proper direction.  Also look at the files mentioned in my earlier posting.  Once I got the e1000 loaded as a module instead of built-in, things worked very easily and as expected.

 

That's one of the main problems: there aren't any error messages.  The init script isn't outputting anything either.

 *tgh wrote:*   

> You also didn't specify what you updated.  The kernel?  Something else?

 

I didn't change any files in /etc except the /etc/init.d/net.lo init script, via the usual Gentoo update.  I can't reliably say what packages got updated, because I did an emptytree world after upgrading to gcc-4.1.1 and glibc-2.4.  I'm not sure which update caused these problems, because this machine ran for about 150 days without a reboot until yesterday (I don't usually reboot server systems that often ;-)).

I get from your post that you don't have any problems with the init-script when using bonding, so I'll go into my configuration a little deeper again, and again, and... until I find something.

Thanks for your hints.

----------

## tgh

Date on my /etc/init.d/net.lo is Feb 2006...  I haven't tried bonding on the new system that has a newer Sep 2006 date for net.lo.

Sometimes etc-update auto-merges stuff in /etc, which always makes me nervous (and is why I'm a bit paranoid about using subversion and a hard linked backup).

----------

## tgh

Note: (not sure if I mentioned this)

If you have multiple NICs and multiple bonds, your /etc/modules.d/bond file will look like:

```
# read /usr/src/linux/Documentation/networking/bonding.txt for help!
alias bond0 bonding
alias bond1 bonding
alias bond2 bonding
#options bond0 mode=0 miimon=100
#alias bond1 bonding
#options bond1 -o bonding1 arp_interval=200 arp_ip_target=10.0.0.1

# Parameters:
# max_bonds int, description "Max number of bonded devices"
# miimon int, description "Link check interval in milliseconds"
# use_carrier int, description "Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default)"
# mode string, description "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor"
# arp_interval int, description "arp interval in milliseconds"
# arp_ip_target string array (min = 1, max = 16), description "arp targets in n.n.n.n form"
# updelay int, description "Delay before considering link up, in milliseconds"
# downdelay int, description "Delay before considering link down, in milliseconds"
# primary string, description "Primary network device to use"
# multicast string, description "Mode for multicast support : 0 for none, 1 for active slave, 2 for all slaves (default)"
# lacp_rate string, description "LACPDU tx rate to request from 802.3ad partner (slow/fast)"
```

I have the option lines commented out because I'm setting the option in /etc/modules.autoload.d/kernel-2.6:

```
e1000
bonding miimon=100 mode=0
```

I guess the option lines in /etc/modules.d/bond could be used to override the mode for an individual bond.  Otherwise you can specify it system-wide in kernel-2.6 and not worry about it in modules.d/bond.

----------

## tgh

Follow-up note:

If you are trying to set up multiple bonds (for instance, a 2-interface firewall), then you need to pass the "max_bonds" parameter to the bonding module.  Otherwise you'll get this error message:

```
* Starting bond1
*   network interface bond1 does not exist
*   Please verify hardware or kernel module (driver)
```

See the file: /etc/modules.autoload.d/kernel-2.6

```
# old line (only supports a single bond)
# bonding miimon=100 mode=0

# new line (supports 2 bonds)
bonding miimon=100 mode=0 max_bonds=2
```

You may need to reboot to get this to function correctly.
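After the module reloads with the new parameter, a quick check that both bond interfaces actually exist (this obviously only shows anything on a box where bonding is loaded with max_bonds=2):

```shell
# both bond0 and bond1 should appear once max_bonds=2 is in effect
ifconfig -a | grep '^bond'
```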

----------

