# Bonding, KVM and NFS

## PietdeBoer

Hi,

I'm setting up an HA cluster using KVM for the hypervisors and an NFS server for shared storage.

Currently I have:

- A switch supporting LACP

- An NFS server with an Intel quad-port Gbit NIC

- Two hypervisors with an Intel dual-port Gbit NIC

I would like to bond the interfaces on the servers into one big, fast interface, to get the highest possible speed from the hypervisors to the NFS shared storage.

I am confused about which bond mode I should use; I would love fault tolerance, but I also want the highest speed per connection.

Any hints?

----------

## bbgermany

Hi,

if your switch supports LACP, you should have a look at mode 4, i.e. 802.3ad. You will find more information about this in /usr/src/linux/Documentation/networking/bonding.txt

 *bonding.txt wrote:*   

> 
> 
>         802.3ad or 4
> 
>                 IEEE 802.3ad Dynamic link aggregation.  Creates
> ...
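For reference, a minimal sketch of bringing up an 802.3ad bond with iproute2 might look like this (the interface names, the address, and the miimon/lacp_rate values are illustrative assumptions; adapt them to your NICs and your distro's init scripts):

```shell
# Illustrative sketch: load the bonding driver in 802.3ad (mode 4)
modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast

# Enslave both ports of the dual-port NIC (names are assumptions)
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0

# Bring the bond up and give it an address
ip link set bond0 up
ip addr add 192.168.0.10/24 dev bond0
```

The switch side needs a matching LACP port-channel configured on the two ports, or the bond will not negotiate.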

 

bb

----------

## PietdeBoer

Thanks for your reply,

When I team two Gbit network connections using dynamic LACP (mode 4) on two servers on the same switch, will I be able to reach speeds of up to 2Gbps with a single file transfer? Or is my speed limited to 1Gbps per session?

----------

## bbgermany

Hi,

I haven't done this with Linux so far, but I have with Windows. On the switch side we have Cisco switches, where you create a port-channel. My Windows machine shows 2Gbit, so it should be double the speed. Judging by file copies (in my case to the backup server), I get the full 2Gbit.

bb

----------

## PietdeBoer

 *bbgermany wrote:*   

> Hi,
> 
> I haven't done this with Linux so far, but I have with Windows. On the switch side we have Cisco switches, where you create a port-channel. My Windows machine shows 2Gbit, so it should be double the speed. Judging by file copies (in my case to the backup server), I get the full 2Gbit.
> 
> bb

 

Thanks bbgermany,

With that in mind, reading the docs again, I think that the balance mode balances every packet over the available interfaces instead of balancing whole TCP sessions.

This means that I should get a 2Gbit uplink in the case of a dual-port Gbit trunk, and a 4Gbit uplink in the case of a quad-port Gbit trunk.

Now all that remains is to run real tests to verify the theory  :Smile: .

Thanks!

----------

## thisnickistaken

 *bbgermany wrote:*   

> Hi,
> 
> I haven't done this with Linux so far, but I have with Windows. On the switch side we have Cisco switches, where you create a port-channel. My Windows machine shows 2Gbit, so it should be double the speed. Judging by file copies (in my case to the backup server), I get the full 2Gbit.
> 
> bb

 

This is not correct.

The only way I have ever seen single TCP streams of > 1Gbps on 1Gbps Ethernet actually work is bonding mode 0 (balance-rr, round-robin). The catch? It only works directly between two nodes using rr bonding. Also, you will not see TCP streams at the full theoretical speed of the bonded link. On a 4x1Gbps link that I have set up between two storage servers for their replication (and heartbeat) link (not running jumbo frames), iperf shows about 2.80-3.05 Gbps in a single TCP stream. The problem is the race condition caused by sending packets that are "in a sequence" over parallel connections: you have no guarantee that they will arrive at the destination in the same order. That causes issues for TCP, which performs poorly when receiving packets out of order. That being said, ~3Gbps out of a quad NIC is very nice  :Smile: 
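The reordering effect can be illustrated with a toy model (pure Python, hypothetical tick-based latencies, not real packet timing):

```python
# Toy model: stripe numbered packets round-robin over parallel links with
# different fixed latencies, then observe the order the receiver sees.

def round_robin_send(packets, link_latencies):
    """Assign each packet to a link in turn and compute its arrival time."""
    arrivals = []
    for seq, _ in enumerate(packets):
        link = seq % len(link_latencies)
        # Each link delivers one packet per "tick" plus its fixed latency.
        arrival_time = (seq // len(link_latencies)) + link_latencies[link]
        arrivals.append((arrival_time, seq))
    # The receiver sees packets ordered by arrival time, not sequence number.
    return [seq for _, seq in sorted(arrivals)]

# Two links, the second slower by just over one tick: packets interleave
# out of order, which is exactly what hurts TCP.
order = round_robin_send(range(8), [0.0, 1.2])
print(order)  # → [0, 2, 1, 4, 3, 6, 5, 7]
```

With equal latencies the sequence arrives in order; any per-link skew larger than the inter-packet gap produces the reordering described above.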

The reason this does not work when using a switch:

Switches know 802.3ad (now if anyone can find one that is rr capable, PLEASE let me know). In 802.3ad there are a few different ways to decide which member port packets are sent out on; the basics are usually layer2 or layer3+4 header hashing (xor). With layer2 hashing, only frames with the same layer2 header are sent down a given link; in other words, links are split by the MAC addresses accessing them. With layer3+4 hashing things get a little more interesting: since the hash includes both IP addresses and ports, it is possible here to have links split by port (I've even seen a TCP stream coming in on one eth and going out on the other!). This method lets two servers with multiple links reach a collective bandwidth, over more than one TCP stream, approaching the theoretical throughput for the number of links; however, you will still not see a single TCP stream over 1Gbps.
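A rough sketch of the idea behind those hash policies (not the kernel's exact formulas; the function names and the use of Python's `hash()` are illustrative):

```python
# Sketch of the slave-selection hashes described in bonding.txt.
# The real kernel formulas differ; this just shows the principle:
# a deterministic hash of header fields, taken modulo the slave count.

def layer2_slave(src_mac, dst_mac, n_slaves):
    """layer2 policy: xor of the last MAC bytes -> one slave per MAC pair."""
    return (src_mac[-1] ^ dst_mac[-1]) % n_slaves

def layer3_4_slave(src_ip, dst_ip, src_port, dst_port, n_slaves):
    """layer3+4 policy: mixes IPs and ports, so different TCP sessions
    between the same two hosts can land on different slaves."""
    ip_bits = hash((src_ip, dst_ip))
    port_bits = src_port ^ dst_port
    return (ip_bits ^ port_bits) % n_slaves

# One NFS session between two hosts: every packet of the flow hashes the
# same, so a single TCP stream never exceeds one slave's bandwidth.
flow = ("10.0.0.1", "10.0.0.2", 40001, 2049)
slaves = {layer3_4_slave(*flow, n_slaves=2) for _ in range(100)}
print(slaves)  # a single slave index, e.g. {0} or {1}
```

The takeaway matches the post: the hash is deterministic per flow, so extra links only help once you have multiple flows with different hash inputs.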

All that being said, I have seen one other method proposed to do rr through non-rr switches. I have not personally tested this, but figured I would include it as it seems sound, and now I plan on testing it  :Smile: 

If you have, say, 2 servers and 2 switches: hook each server into each switch with one link, and hook the switches together (assuming you need that for machines on the network with a single link). Set both servers to do rr bonding. Because each switch has a direct path to your MAC address, any packet it receives goes down that path. This means the sending side controls which port the receiving side will see packets on, so if it round-robins, the traffic passes through the switches, because each one is just handing the packet to the port it knows for each server.

Like I stated, I have not actually tested this method, but everything I know about networking tells me it should work. The obvious problem: want 4 links, get 4 switches lol... hmmm, or maybe VLANs.

Lastly  :Smile:  I just couldn't leave this out, to put a bug in the head of anyone who has the resources lying around to test this  :Smile: 

I have considered setting up a box with several PCIe 2.0 x4 slots and several 4x1Gbps NICs, having each set of 4 NICs rr'd together, adding all the bonds to a bridge, hooking other servers in via 4x1Gbps rr bonds to the newly created "rr-bond switch", and testing throughput. I think that would be a fun project; I would just like to see if the cost would justify it in comparison to FC or 10GbE.

----------

