# Best bang for the buck:  Fast networking.

## 1clue

Hi.

I've been trying to figure out how to get better than 1 gigabit ethernet speeds, and this bottleneck is holding up my new network and hardware purchases.

My current setup has significant network bottlenecks.  I'm trying to figure out how best to deal with that.

10GbE cards are actually affordable now, but there are problems:

Real-world throughput is reportedly only around 1 Gbit/s, according to widespread benchmarking.

You can't really go peer-to-peer effectively: you need a switch.

Switches cost a fortune, and in the end you don't get much better than regular 1GbE.

So I guess I'm after some advice from people with real experience like this.  Here's the sort of thing I can think of:

Get 4x ethernet cards* and bond them together as one logical interface, hooked up to a decent-quality switch.

Get 4x ethernet cards and set them up as separate interfaces, hoping TCP/IP load-balances across them, hooked up to a decent-quality switch.

Get 4x ethernet cards and set them up as separate interfaces, host to host in various configurations, to get shortest-path benefits.

* By 4x 1GbE ethernet cards I mean a single PCI Express card with four network ports on it.
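For reference, the first option above can be sketched with iproute2's bonding support. This is a minimal sketch, not a tested config: the interface names eth0-eth3, the bond mode, and the address are placeholders to adjust for your hardware and switch.

```shell
# Bond four 1GbE ports into one logical interface. 802.3ad (LACP) mode
# needs a switch that speaks LACP; balance-rr works without switch support.
ip link add bond0 type bond mode 802.3ad miimon 100
for nic in eth0 eth1 eth2 eth3; do
    ip link set "$nic" down          # slaves must be down before enslaving
    ip link set "$nic" master bond0
done
ip link set bond0 up
ip addr add 192.168.10.2/24 dev bond0   # example address
```

Note that a single TCP stream still rides one slave link; bonding helps aggregate throughput across many streams, not a single transfer.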

My application is this:

2 or 3 QEMU hosts

1 or more NAS machines

A few discrete machines which are not related to QEMU but which need at least 1gbe performance.

VM guests on any host are not all on the same network.

I need to be able to migrate VMs from host to host without changing network configuration.

----------

## vaxbrat

I'd say you are trying to cross the steep gap from commodity hardware to the much smaller (and pricier) high-performance club.  Have you looked at InfiniBand versus 10GbE?  It all boils down to performance in the switches (and sometimes hardware and firmware revs in them).  I've seen Mellanox cards quadruple in speed just because a QLogic switch needed a firmware upgrade.

Then again, this is all at work where somebody else has gone out and bought the hardware.  I have yet to try pricing and cobbling together something for my home network.

I've never tried the bonding driver so don't know what to tell you there.  You might get a win separating the NAS traffic out to its own dedicated lan and switch.

----------

## 1clue

 *vaxbrat wrote:*   

> I'd say you are trying to cross the steep gap from commodity hardware to the much smaller (and pricier) high performance club.

 

Exactly.

 *Quote:*   

> 
> 
> Have you looked at InfiniBand versus 10GbE?  It all boils down to performance in the switches (and sometimes hardware and firmware revs in them).  I've seen Mellanox cards quadruple in speed just because a QLogic switch needed a firmware upgrade.
> 
> Then again, this is all at work where somebody else has gone out and bought the hardware.  I have yet to try pricing and cobbling together something for my home network.
> ...

 

I have not tried InfiniBand.  I just started looking at 10GbE and at actual real-world performance figures, and realized that it's almost no upgrade at all: you spend the money and get virtually nothing back.  This thread is to give me ideas about what to search on, if nothing else.  I've looked at Fibre Channel; it's way out of the budget.

I can spend USD 250 or maybe a bit more on each server for network performance, and maybe get a switch for $1k or so (fudge that all around a bit), but the four-digit price tag for a Fibre Channel card is out of the question.

 *Quote:*   

> You might get a win separating the NAS traffic out to its own dedicated lan and switch.

 

Now this is the sort of thing I'm talking about.  I haven't bought anything yet; maybe it would be worthwhile to build my own NAS, put an open source NAS package on it, and run a NIC or two to each virtualization host.  I just don't know.

Maybe I'm over-thinking this, but I know my network speed is the limiting factor in my existing setup.  The prices I've seen so far in no way justify the cost to me, and I can see why the world is still stuck at 1 Gbit/s: not even the enterprise guys seem to be interested in the price/performance of 10GbE.

I'll look into InfiniBand and the Mellanox cards.  It will at least give me an idea of what else is out there.

Thanks.

----------

## 1clue

Ugh.

I could see spending $500-$600 per VM host and NAS if it actually got me faster-than-SATA3 transfer speeds, but those switches would kill me.

I had to deal with Cisco for networking gear at one point, with their required support contracts.  I can smell the money pouring out through this thing even after it's paid for.

Fibre Channel seems to be one of the few technologies that actually delivers high transfer rates.  It's out of my reach because of those switches.

Thanks though.

----------

## vaxbrat

If your bottleneck is disk I/O going out through the LAN to a NAS, you might consider bringing the hot files in closer with one of these two schemes:

Turn on CONFIG_BCACHE in the kernel and put the cache on an SSD or a larger local spinning rust drive.  Your system drive is already on an SSD, I hope  :)
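For what it's worth, the bcache side of that can be sketched like this.  The device names are assumptions, `make-bcache` comes from bcache-tools, and this destroys whatever is on the named devices:

```shell
# sdb = big spinning-rust backing device, sdc = SSD cache device.
# Registering both in one call attaches the cache set automatically.
make-bcache -B /dev/sdb -C /dev/sdc

# The combined device appears as /dev/bcache0; treat it like any disk.
mkfs.ext4 /dev/bcache0
mount /dev/bcache0 /mnt/vmstore
```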

If your total storage needs are sensible enough that you can fit everything into 2 or 3 local multi-terabyte drives, consider rolling out Ceph on top of btrfs.  You can then tweak Ceph to keep a copy of everything on each local drive set while it pushes the changes around in the background.

One final thing to think about is to turn on jumbo frames on the NICs if you haven't already.  That can be tricky sometimes, depending on the level of stupidity in your switches.
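Turning on jumbo frames is a one-liner per NIC, but every hop in the path has to agree on the MTU.  A quick sketch, with the interface name and peer address as placeholders:

```shell
# Raise the MTU to 9000 on this NIC (the switch must allow it too).
ip link set eth0 mtu 9000

# Verify the path end to end: 8972 = 9000 minus 28 bytes of IP+ICMP header.
# -M do forbids fragmentation, so this fails if any hop can't carry 9000.
ping -c 3 -M do -s 8972 192.168.10.1
```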

Ceph is one of those things I intend to look into in my copious spare time.  I've already had a taste of working with Lustre in a high performance cluster, but that shit is just way too brittle when it comes to installation.

----------

## 1clue

Each VM host has an SSD and at least one "spinning rust" drive -- I'm gonna shamelessly plagiarize that.  I'm contemplating non-RAID on the VM hosts since RAID slows things down so much.

The NAS will have an SSD for booting and such, or maybe a high quality USB3 stick.  It will also have a slot-loading SATA3 hard drive for backups, if I can find the enclosure.

I'm thinking about a network-boot type setup with the image stored locally; the network side just checks a checksum to see if there's a new image.  In that sense there will be one central repository for images, which also has a backup facility, and then possibly a traditional NAS.  I haven't decided on that yet; I need a way to cut down on network activity.  But my network activity isn't constant -- it spikes at high-load moments when I need a lot more bandwidth.
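That checksum check could be a very short script.  The host name `imagehost` and both paths here are hypothetical stand-ins:

```shell
# Pull a fresh copy of the VM image only when the server's checksum differs
# from the locally cached copy.
remote_sum=$(ssh imagehost sha256sum /images/vm.img | awk '{print $1}')
local_sum=$(sha256sum /var/lib/images/vm.img | awk '{print $1}')

if [ "$remote_sum" != "$local_sum" ]; then
    rsync imagehost:/images/vm.img /var/lib/images/vm.img
fi
```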

Most of the VMs will be relatively tiny.  The big stuff is either special purpose (a database, for example) or some sort of file server, which might be the traditional NAS.

----------

## vaxbrat

If your network bottleneck is the initial pull of the VM image, jumbo frames will help if you don't already have them turned on and can manage to do it.  Ceph can replicate objects across the pool of stores (btrfs filesystems, for example) that is given to it.  If you tweak it to do n-way replication across a pool of n object stores, it will automatically push a changed VM image to each local host's store.  Because it is COW-savvy, you could probably take snapshots and have Ceph copy around only the changed extents in the VM instead of the entire image file or all of the extents in the sparse filesystem.

There may be a way to do that by hand with just btrfs.  There was a recent writeup by one of the Linux Journal authors on btrfs that covers incremental replication of a btrfs filesystem to a remote host using "btrfs send".
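The "btrfs send" approach looks roughly like this.  The subvolume paths, snapshot names, and the `nas` host are assumptions:

```shell
# Take a read-only snapshot of the VM store (send requires read-only).
btrfs subvolume snapshot -r /vmstore /vmstore/.snap-today

# Send only the delta between the previous snapshot and the new one,
# and replay it on the remote host.
btrfs send -p /vmstore/.snap-yesterday /vmstore/.snap-today | \
    ssh nas btrfs receive /backup/vmstore
```

The `-p` parent snapshot is what makes it incremental; without it you ship the whole subvolume every time.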

----------

## 1clue

My current setup does not do a network boot.  Network load at boot doesn't really bother me, and if the image is cached on the machine it won't be a big deal anyway.

My network bandwidth issues come during high-activity periods between app server and database, and on bulk data transfers from one host to the next.  I haven't tried jumbo frames; my current gear won't take them.

I haven't tried Ceph; I will look into it.  What you're talking about here is interesting, but I have a lot of reading to do before I try that -- no sense making you type it all when it's on a documentation site somewhere.  I mainly started this thread to get ideas; I don't really know where to look.

Thanks for the ideas, and if anyone else has some I'm definitely listening.

----------

## thegeezer

1. Jumbo frames make a difference.  If your switch can't do them, there is a good chance your switch fabric can't keep up and it's the switch itself that is slowing things down.

2. Go SAN, not NAS.  iSCSI is faster.

3. Have you tried read speed tests on the storage device to see your actual throughput?

4. There are very many ways to bond in Linux.  I'd also suggest looking at a switch that understands LACP.
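On points 3 and 4, it can be worth measuring the raw disk and the raw network separately before blaming either.  A minimal sketch, assuming /dev/sda for the disk and a NAS reachable as nas.local with `iperf3 -s` already running on it:

```shell
# Raw sequential read from the disk itself, bypassing the filesystem.
hdparm -t /dev/sda

# Raw TCP throughput between two hosts for 10 seconds; compare this
# against the disk number to see which one is the bottleneck.
iperf3 -c nas.local -t 10
```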

----------

## Mad Merlin

One thing to consider: dual port 10gige cards are commonplace and much cheaper per port than a single port card. If you have 2-3 VM hosts, you can throw a dual port card in each and directly connect them all back to back, skipping the 10gige switch. However, down the road when you want to expand, you'll need more cards or a switch. Though, 10gige switch prices are dropping now, so in a year or two they may be somewhat reasonably priced.

I don't actually have any 10gige equipment yet, but my understanding is that with some tuning, you can get full wire speed out of them without too much effort.

----------

## 1clue

@thegeezer,

I hear you.  None of the new equipment is purchased yet.  I'm asking around so I don't get caught with my pants down.

SAN: a bit out of my league.  This is a one-man shop.  It's for business, but it's coming out of my pocket.

@Mad Merlin,

The problem is I can find zero claims online of anything near wire speed in actual benchmarks.  I want to see it before spending that kind of cash.

Bonding 1GbE cards can get better performance and is probably cheaper in the long run anyway.

For sure the next switch will understand jumbo frames and some sort of bonding that Linux understands.

----------

## Mad Merlin

 *1clue wrote:*   

> @Mad Merlin,
> 
> The problem is I can find zero claims online of anything near wire speed for actual benchmarks.  I want to see it before spending that kind of cash.
> 
> Bonding 1gbe cards can get better performance and is probably cheaper in the long run anyway.
> ...

 

Here's a good article on tuning: http://dak1n1.com/blog/7-performance-tuning-intel-10gbe

Vanilla: 4.7 Gbit/s

Tuned: 9.9 Gbit/s

Even the untuned result is much faster than you'll see out of quad 1gige bonded with LACP (speaking from experience on that one).

Here's a paper from 2009, which is far more in depth, but still manages ~8 Gbit/s out of hardware that is at least 5 years old: http://landley.net/kdocs/ols/2009/ols2009-pages-169-184.pdf

Also, there have been a number of not particularly recent (2.6.35-2.6.38 era) improvements regarding network throughput by taking advantage of multiple cores, see more here: https://www.kernel.org/doc/Documentation/networking/scaling.txt
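For a flavour of what that tuning involves, the usual first step is raising the kernel's socket buffer ceilings so TCP can open a window big enough for 10GbE.  These are the kind of knobs that article turns; the values below are illustrative, not a recommendation for any particular card:

```shell
# Allow up to 16 MB socket buffers (defaults are far smaller).
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# min / default / max auto-tuning bounds for TCP receive and send buffers.
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```

Persist anything that helps in /etc/sysctl.conf, since `sysctl -w` only lasts until reboot.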

----------

## thegeezer

http://scst.sourceforge.net/SCST_Gentoo_HOWTO.txt

This lets you run an iSCSI host.

You can have that sitting on a bunch of disks using ZFS or LVM or mdadm, as you prefer.

Crucially, this is almost guaranteed to be faster than whatever NAS you are using.

It also means it's more flexible than your existing NAS: you can more easily add memory and disks, or optimise for speed, e.g. put all VM root partitions at the beginning of the spinning disk, or get a couple of SSDs for the DB app.

Your existing VMs then connect using iSCSI.

YMMV, but if you are looking at doing things on a budget, and your switch can't do simple things like jumbo frames, you might want to consider adding cards to the Gentoo iSCSI host and having a dedicated connection for iSCSI between the VM servers and the data.

It's easier to put 3x NICs in this than to add NICs to a NAS, and a NAS with an ARM processor will not be as fast when doing RAID parity work.
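On the initiator side, the VM hosts would attach with open-iscsi.  The portal address and target IQN below are made-up examples:

```shell
# Discover the targets offered by the iSCSI host...
iscsiadm -m discovery -t sendtargets -p 192.168.20.5

# ...then log in to one; the LUN then appears on the VM host as a
# normal block device (e.g. /dev/sdX) that QEMU can use directly.
iscsiadm -m node -T iqn.2014-01.lan.storage:vmstore -p 192.168.20.5 --login
```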

----------

