# Stealthy GigE Corruption between Windows & Linux [SOLVED]

## yottabit

Well, it took me some time to figure this one out...

Server runs standard x86 32-bit Gentoo, and has a direct cross-over link from a D-Link GigE card (model DGE-530T with Marvell 88E8003-LKG chipset) using the SysKonnect (skge) drivers to my workstation. For this initial report, the workstation is running 32-bit Windows XP SP2, and is terminating that link with the on-board Marvell Yukon GigE port.

I set the driver in Windows to use 9 kB MTU jumbo frames, and in Linux set the D-Link to use 9k MTU jumbo frames. Windows accesses the data on the server via Samba/CIFS. The problem is that almost all large data transfers are corrupting. Windows copies anything I ask without complaining, but then when I try to use the data (installer, extract an archive, etc.) I always get errors about CRC failures and corruptions!

So I thought, hmmm, maybe the problem is the stupid Windows driver. I set the MTUs for both NICs back down to the standard 1500 bytes and then, via the command line, told WinRAR to verify the integrity of all archives in a directory of very large compressed ISOs. It went for a full 5 minutes without any problems, and just when I thought that might be my solution, bam! CRC error.  :Wink:  So next I had the server verify the same archive that showed corruption, using RAR for Linux, and it tested fine. Finally, I opened the same archive again with WinRAR, and this time it tested successfully. Nice, eh?

So, party people, my next step is to switch my Windows system to the 3C905B-TX NIC interfacing to the same D-Link GigE NIC in the server and see if the problem persists... If this proves the Marvell Yukon Windows driver is junk, I'll boot my Kubuntu partition and run the tests again with the Marvell Yukon Linux driver and see if the same problems occur...

In the meantime, if anyone has any pointers, I'd appreciate it.  :Wink:

----------

## yottabit

Right, so here's what happened when I tested directly from my 3C905B-TX in Windows to the D-Link/SysKonnect in Linux... 16 GB of data tested, and only 2 CRC errors. That's quite an improvement, but I'm still not happy with it. Why even have a 32-bit CRC check in the Ethernet frame if it isn't catching this?

So for my third trick I'm running the same 16 GB data verification from my 3C905B-TX in Windows, through my 16-port L2 switch, to the nForce2 embedded NIC in Linux. Surprise... no errors!!

Sooo, early theory here then, is that the problem actually lies with the D-Link/Syskonnect NIC in Linux... To be sure, I'm going to now test to the same Linux NIC from my Marvell Yukon NIC in Windows.... (and to think, I was nearly ready to throw another D-Link/SysKonnect NIC into my Windows workstation!)

----------

## yottabit

Well, that last test was successful, too. It's looking bad for the D-Link/SysKonnect...

I'm running kernel 2.6.17 right now, so I'll check the release notes on .18 and .19 and see if they've updated that SysKonnect driver. If not, can anyone recommend a GigE 32-bit PCI NIC for Linux with a good driver that is actually supported and maintained by the manufacturer, instead of having to rely on our poor 'nix community hacking together a driver for undocumented hardware with no support from the manufacturer?   :Confused:   Of course it should not cost me a lot of money either...   :Rolling Eyes:

For my next trick, I'm going to copy all 16 GB to my workstation, and then do an md5sum on the server & the workstation to make sure they really match. And then I'll boot Linux on my workstation and test using the original GigE link that was giving me problems... doubtful it will accomplish much more, but at least I'll know 100% for sure then that the Windows driver on the workstation side had absolutely nothing to do with it.
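For anyone wanting to repeat the md5sum comparison, here is a minimal sketch of one way to do it. The function names `make_manifest` and `verify_copy` are made up for illustration, and it assumes GNU `md5sum` is available on both ends (on Windows that means a port such as the one shipped with Cygwin). Keep the manifest file outside the directory being hashed, and pass it to `verify_copy` as an absolute path.

```shell
#!/bin/sh
# make_manifest DIR MANIFEST
# Run on the server: hash every file under DIR and write the list to MANIFEST.
# (Hypothetical helper name; MANIFEST should live outside DIR.)
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 ) > "$2"
}

# verify_copy DIR MANIFEST
# Run on the workstation: re-hash the copied tree and compare with MANIFEST.
# MANIFEST must be an absolute path because we cd into DIR first.
# Prints only the files that fail; returns non-zero if any file differs.
verify_copy() {
    ( cd "$1" && md5sum --quiet -c "$2" )
}
```

Usage would look like `make_manifest /data/isos /tmp/isos.md5` on the server and `verify_copy /mnt/copy /tmp/isos.md5` on the receiving side; both paths are placeholders.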

----------

## yottabit

md5 sums came out perfect...

All that is left is to check with Linux on the workstation for one final sanity measure.

FYI, I read the release notes for Linux kernel 2.6.18, 2.6.19, and 2.6.20 (RCs), and found that there will be an upstream SysKonnect driver update in 2.6.20... guess I'll try it again when 2.6.20 is released!  :Sad: 

----------

## yottabit

So, since my last post, I've attempted the test by using md5sum (it's faster than using RAR since I don't really need to decompress the data, but only test for integrity) on both GigE interfaces on the Windows workstation... using latest drivers for the built-in GigE nVidia nForce PHY, and for the built-in GigE Marvell PHY, and then even added the exact same D-Link DGE-530T NIC, using direct cross-over cable to the Gentoo server with the D-Link DGE-530T (skge driver). All three NICs show corruption in mismatched md5 sums.

Next I tested from the D-Link NIC in the Gentoo server to my laptop running Windows with an Intel PRO/1000PL GigE NIC. Even though Intel says this NIC supports Jumbo Frame, the option to configure it was not in fact present in the Advanced settings for the driver, despite what they say in their documentation, hehe. But even with MTU 1500 there was md5sum corruption present.

Aha, so the problem must be the D-Link NIC in the Gentoo server, eh? One would think... 4 NICs tested to it, from 2 different machines and hardware platforms...

But then I booted my main workstation into Kubuntu and tested from the Marvell to the D-Link on the Gentoo server, and guess what!!! No problems!!!

The problem? Windows. WTF?! I just can't believe Microsoft's TCP/IP stack is that terrible... Next I will test from Windows to Windows and see what happens.

----------

## yottabit

So I tested all three GigE NICs in Windows to the GigE NIC on my laptop running Windows... no problems. I just can't believe that Windows-to-Windows is fine, and Linux-to-Linux is fine, but Linux-to-Windows is not. The only thing I can imagine is that there is some sort of incompatibility between Samba and Windows XP??? But in that case, my 3Com 100-Base-TX NIC in Windows should exhibit the same corruptions... I guess I'll have to test that one 3 or 5 times in a row to make sure it really is stable.

If anyone else has any suggestions, I'd appreciate it. BTW, in Samba I was using the IPTOS_THROUGHPUT flag, and just to make sure, tried the default TCP_NODELAY flag instead, and it didn't make a difference.

----------

## yottabit

For my last test I ran 5 iterations of md5sum across the network from Windows using the stable 3Com NIC running at 100/Full/1500 through my L2 switch. All tests passed just fine.

So then it occurred to me that the problem really seems to be the GigE speed since that's the only commonality between them all so far... So I manually set the direct link from the Windows Marvell Yukon to the Linux D-Link DGE-530T (skge) to 100/Full/1500 and surprise, no problems! I wondered whether I could actually do Jumbo Frame over a 100Mb link since both drivers supported it, so I thought hey, let's try. And wouldn't you know it, the tests over 100/Full/9000 passed with flying colors, too!
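On the Linux end, forcing the link down to 100/Full with a 9000-byte MTU can be sketched roughly as below. This is an assumption about how one might do it, not a record of the exact commands used here: `eth3` is the interface name that shows up later in this thread and may differ on your machine, the commands need root, and the other end of the link has to be set to matching speed and duplex.

```shell
# Force 100 Mb/s full duplex and disable autonegotiation (run as root).
ethtool -s eth3 speed 100 duplex full autoneg off

# Set the jumbo-frame MTU on the same interface.
ip link set dev eth3 mtu 9000

# Confirm what the link actually came up as.
ethtool eth3 | grep -E 'Speed|Duplex'
ip link show dev eth3
```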

So by now I'm enlisting the help of one of my 'nix buddies and he suggests trying a different protocol to see if Samba was somehow the culprit. So I set up an FTP server on the Linux host, and all of the data transfers successfully over 1000/Full/9000, but of course the md5sums fail! Then I thought that perhaps by using SFTP which will do a sort of upper-layer error-checking on its own (through decryption of the packets), I would get some sort of better indication. And guess what! SFTP reports "Invalid MAC Received" literally every 5-10 seconds and bails on the connection!!!! So this is my first validation of a Layer-4 problem that isn't being caught by the operating systems' network stacks!

Now since I've always been transmitting from Linux and receiving in Windows, and I therefore think it's Windows' fault for not rejecting the corrupt frames, I'm about to try the opposite transmission direction, from Windows to Linux, and see what happens. I can't wait!  :Smile:

----------

## yottabit

PROBLEM SOLVED! It turns out that the D-Link DGE-530T GigE NIC in the Gentoo server apparently is incapable of generating correct TCP checksums in offloading mode... I used ethtool to turn off the checksum offloading and the problems miraculously disappeared. I have verified 5 times so far with the 16 GB of data via md5sum.

```
hal temp # ethtool -k eth3
Offload parameters for eth3:
Cannot get device tcp segmentation offload settings: Operation not supported
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off

hal temp # ethtool -K eth3 tx off rx off
```

Setting only Tx to off seems to have solved the problem, but I figured I may as well set Rx to off too, just in case. I haven't done extensive testing in that direction so I'd rather be safe.
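To confirm the change actually took effect, the `ethtool -k` output can be checked again. The small helper below is only a sketch (the name `checksum_offload_enabled` is made up); it reads `ethtool -k` output on stdin and succeeds if either rx or tx checksumming is still reported as on.

```shell
#!/bin/sh
# checksum_offload_enabled
# Reads `ethtool -k` output on stdin.
# Exit status 0 if rx- or tx-checksumming is reported "on", non-zero otherwise.
# (Hypothetical helper name, matching the output format shown in this thread.)
checksum_offload_enabled() {
    grep -Eq '^[[:space:]]*(rx|tx)-checksumming:[[:space:]]*on'
}

# Intended use (eth3 is the interface from this thread):
#   if ethtool -k eth3 | checksum_offload_enabled; then
#       echo "checksum offload is still on for eth3"
#   fi
```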

I still can't figure out logically how this is causing a problem though... if the NIC calculates the wrong checksum for an outgoing packet, my receiving host should reject it... but obviously it isn't, which is why the corrupt data is able to survive the transit. The only way I can honestly believe this is happening is for one of these two conditions to be met:

The receiving host's TCP checksum doesn't work--and I've tried offload and no offload--or,

The checksum offloading operation of the transmitting NIC is actually corrupting the data BEFORE the checksum is calculated and added to the packet, so the checksum is VALID on the receiving host, but the payload is still corrupt. I wonder if this could have anything to do with the MMAP option I'm using to speed up PCI transfers in the kernel????
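To make the second possibility concrete, here is a toy sketch of the 16-bit one's-complement Internet checksum that TCP uses. It is deliberately simplified: real TCP also sums a pseudo-header and the TCP header, and the payload words below are invented for the demo. The point is only that a checksum computed over already-corrupted data verifies perfectly at the receiver.

```shell
#!/bin/sh
# inet_checksum WORD...
# Each WORD is one 16-bit value in hex (e.g. 4500). Prints the one's-complement
# checksum of all the words as 4 hex digits. A receiver runs the same sum over
# data plus checksum and accepts the segment when the result is 0000.
inet_checksum() {
    sum=0
    for w in "$@"; do
        sum=$(( sum + 0x$w ))
    done
    # Fold any carries above 16 bits back into the low 16 bits.
    while [ $(( sum >> 16 )) -ne 0 ]; do
        sum=$(( (sum & 0xFFFF) + (sum >> 16) ))
    done
    printf '%04x\n' $(( ~sum & 0xFFFF ))
}

# Hypothetical payload: the sender meant "4865 6c6c 6f21", but the last word
# was damaged to 6f20 BEFORE the checksum was computed.
csum=$(inet_checksum 4865 6c6c 6f20)

# Receiver check over the damaged data: prints 0000, so it is accepted.
inet_checksum 4865 6c6c 6f20 "$csum"

# Same checksum against the data the sender intended: prints fffe (rejected).
inet_checksum 4865 6c6c 6f21 "$csum"
```

In other words, the receiver can only catch damage that happens after the checksum is generated, which fits the symptoms if the corruption is introduced on the sending side.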

I have made the skge driver maintainer aware of this issue and he has graciously lent an ear and provided some suggestions. I will notify him of this last update in this post. He may have the technical expertise to figure out whether this is a driver issue, hardware issue, or MMAP issue, where I certainly do not.  :Wink:  I could of course turn off MMAP in my kernel, and I may soon just to rule out that last possibility, but I can't reboot my server right now.

I guess another thing that has me completely baffled is why this only occurs Linux -> Windows, whereas Linux -> Linux doesn't exhibit the problem. Totally bizarre...

In short, if you're using skge in GigE mode (regardless of whether you're using Jumbo Frame), and definitely if you're using the D-Link DGE-530T, I would highly recommend you make use of ethtool in your /etc/conf.d/local.start to disable Rx & Tx checksum offloading.
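A minimal sketch of what that local.start entry could look like is below. It assumes the skge interface is `eth3`, as in the ethtool output above, and that `ethtool` is on root's PATH at boot; adjust both to suit your system.

```shell
# /etc/conf.d/local.start
# Disable Rx and Tx checksum offloading on the skge NIC at boot.
# eth3 is the interface name used in this thread; change it to match yours.
if command -v ethtool >/dev/null 2>&1; then
    ethtool -K eth3 tx off rx off
fi
```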

----------

## Chris_Hird

yottabit

I was really concerned because of problems I was having. I initially had a 100 Mb card in the Linux server running kernel 2.6.18-r2. I went up to the latest 2.6.19-r5 so I could use a couple of new DGE-530T cards I had purchased. I had hoped to improve on a transfer I make from an IBM System i5 to the FTP server (pure-ftp) running on the Linux box. So I installed the card, built the new kernel, and rebooted into it. I changed the MTU to 9000 and tried a transfer! I saw a best throughput of 2.1 MB/s! This was much worse than the 7.1 MB/s I was getting from the old 2.6.18-r2 kernel running over a 100 Mb card. I had changed the switch to a new managed switch which allows 1 Gb links, so I spent hours trying to understand why the transfer rates were so poor. After a number of changes to the switch settings and checking all of the cables etc., I still saw no improvement. So I tried turning off the rx/tx offload switches using ethtool. This stopped the transfers altogether?

I didn't have the ability to set the frame defrag and packet size from the IBM System i5, so I was playing around with other PCs running Windows and Mac OS to determine if the problem was the new switch. It turns out it was a problem with the kernel I built.

I rebuilt the kernel, removing a lot of the options which are set by default and which I had no need for, and after the rebuild the system now works with a transfer rate of 17 MB/s. This is not the 100 MB/s I would have expected, but it is a lot better than 2.1 MB/s! I think the rest is to do with drive speed and CPU capabilities. I only have a 1.4 GHz AMD Athlon with 1.5 GB of memory and two old WDC disks with a spin speed of 7200 RPM. I did leave the old card in, and when I transfer over that card I now get 11.7 MB/s, so whatever I removed from the kernel helped that as well? All I need to do now is find out what the hell I changed! And keep skinnying it down further to see what I can get! I don't run this box for anything but FTP, HTTP and Samba. Obviously it has other utils on there such as PHP and other related functionality.

I did try transferring to a dual Xeon processor server and saw rates as high as 27 MB/s before I made the network changes. I hope after the changes it will be better? I almost went back to the old HS hub and 100 Mb cards!

Chris...

----------

## yottabit

Hi Chris,

Sounds like you're on the right track. In your kernel make sure MMAP features for the NIC are enabled if possible. This can greatly enhance performance. Also keep in mind that using Jumbo Frames (9k MTU) will help performance mainly because there is less overhead per byte of data transferred, and less overhead generation means less CPU time needed, so this can especially help low-speed processors.
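As a rough worked example of the overhead difference, the sketch below estimates what share of the bytes on the wire is actual TCP payload at a given MTU. The function name is made up, and it assumes plain IPv4 and TCP headers with no options (40 bytes) plus the usual 38 bytes of Ethernet framing per packet (preamble, header, FCS and inter-frame gap).

```shell
#!/bin/sh
# payload_efficiency MTU
# Prints the percentage of on-the-wire bytes that are TCP payload when every
# frame is full size. (Hypothetical helper; header sizes are assumptions.)
payload_efficiency() {
    awk -v mtu="$1" 'BEGIN {
        l3l4 = 40   # IPv4 header (20) + TCP header (20), no options
        wire = 38   # preamble+SFD (8) + Ethernet header (14) + FCS (4) + gap (12)
        printf "%.2f\n", 100 * (mtu - l3l4) / (mtu + wire)
    }'
}

payload_efficiency 1500   # prints 94.93
payload_efficiency 9000   # prints 99.14
```

So the gain on the wire is only a few percent. The bigger win is that a 9000-byte MTU carries 8960 bytes of payload per packet instead of 1460, so the hosts handle roughly six times fewer packets for the same amount of data.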

The server I have the GigE link on is an Athlon XP 1800+ with very fast Hitachi SATA hard drives in RAID stripes, and the workstation initiating the xfers is an Athlon64 X2 3800+ with a very fast Hitachi SATA hard drive (single). I get very high disk utilization during these transfers, so I know the 1800+ processor on the server is fast enough to handle a high efficiency on the xfers.

You can't ever expect a 10x increase in speed when going from 100 Mb to GigE... it just doesn't work that way. Ethernet is not super efficient, and speed depends on so much more than the raw interface speed. You can expect a performance boost of course, but not 10x.

Also, if you're using a layer-2 switch in the middle, this will definitely affect performance because each packet needs to be examined in order for the switch to send it to the appropriate port. And not all switches are equal; some are very nice high-performance types, and others can be very slow and cheap. Keep in mind that the switch must also support Jumbo Frame and be configured for it (it will not work automatically, and every device on the particular VLAN must be set to use the same frame size as provisioned on the switch, because Ethernet has no mechanism for negotiating the frame size).

To troubleshoot speeds, try connecting the two computers directly with a cross-over Cat5 or Cat6 cable instead of going through the switch.

Hope this helps!!! And hello to my neighbors up north.  :Smile: 

J

----------

## Chris_Hird

Thanks for the followup! I have purchased a managed switch which was very cheap! But it does make a big difference compared to my old 10/100 Mb High Speed Hub from D-Link. I can't face spending another $1,500 on a better switch just to step it up a bit more. It is a lot better than I was doing before, and it's only a test environment I play in, so no big heartache if it only achieves 200-300% better than before. I have a big dual Xeon server with RAIDed SATA drives and 2 GB of memory which I was tempted to put a new Linux install on, but it's running Windows 2003 SBS and I use it for developing MS CRM (it's pretty bad! I kept saying to the client look at Linux and SugarCRM, but there you go). Anyhow, I will keep playing as I get time and possibly post the results here so it's all together in one place! You never know, I may just rip out that SBS install and see what I end up with.

Let me know how you get on as well. I will review the kernel config again to make sure I have MMAP turned on for the NIC.

Chris...

----------

