# [solved] Rsync stalls while downloading

## grant123

I'm trying to use rsync to pull a series of archives from a remote system connected to a business DSL line to another remote system hosted in a data center. The rsync seems to stall at about the same point every time, and I'm sure that both remote systems are online throughout the stall. Should I investigate further or contact the DSL ISP?

*Last edited by grant123 on Tue May 06, 2014 9:50 pm; edited 1 time in total*

----------

## khayyam

 *grant123 wrote:*   

> The rsync seems to stall at about the same point every time and I'm sure that both of the remote systems are online throughout the stall.  Should I investigate further or contact the DSL ISP?

 

grant123 ... is this 'rsync -e ssh' or not, and what version of rsync? I seem to remember a similar report of stalling (I can't remember exactly where, perhaps here or on b.g.o). Anyhow, more information is required ...

best ... khay

----------

## grant123

It is indeed 'rsync -e ssh' and rsync-3.0.9-r3 on both client and server.

----------

## khayyam

 *grant123 wrote:*   

> It is indeed 'rsync -e ssh' and rsync-3.0.9-r3 on both client and server.

 

grant123 ... sorry, I should have also asked if the 'hpn' USE flag is enabled (I'll assume yes, as it's the default on Gentoo). So, the bug I was referring to above (OK, it's old and marked RESOLVED/FIXED, but I don't know what your setup is, so it may be of some help) ... it seems the solution (or at least something to try) is 'TcpRcvBufPoll no' in /etc/ssh/sshd_config (on the receiving/server end).

HTH & best ... khay

----------

## grant123

'TcpRcvBufPoll no' doesn't fix it unfortunately.  I tried -hpn on the client and that didn't do it but I'll be able to try -hpn on the server tomorrow and I will report back.

----------

## grant123

I'll be testing with -hpn on the server shortly but I thought I'd mention that I can push to the server without any problem.  The issue only manifests itself when pulling from the server.  I also tested with scp with the same problem.

----------

## grant123

Unfortunately, -hpn on both the client and the server does not fix the stall.  Any other ideas?

----------

## grant123

What would be the easiest way to test file transfer without openssh?  I don't currently run a webserver on the backup server.

----------

## khayyam

 *grant123 wrote:*   

> Unfortunately -hpn on both the client and the server do not fix the stall.  Any other ideas?

 

grant123 ... well, other than going through the logs of the server, client and gateways (if possible), particularly if iptables is in use, then no, not really. You might try disabling compression on the client end (but I doubt this is the issue) ...

~/.ssh/config

```
Host foo
  Hostname foo.domain.tld
  Compression no
```

 *grant123 wrote:*   

> What would be the easiest way to test file transfer without openssh? I don't currently run a webserver on the backup server.

 

probably 'netcat' (here using 'ncat' from net-analyzer/nmap[ncat] ... though there are (many) other netcats)

```
# ncat -v -l -p 80 < file
```

... and the receiving end ...

```
# ncat foo.domain.tld 80 > file
```
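
netcat itself does no integrity checking, so a finished ncat transfer only shows that bytes arrived. A minimal sketch for confirming the received copy matches the original, assuming GNU coreutils `sha256sum` is available on both machines (the `digest` helper name is hypothetical), is to hash the file on each side and compare the two values:

```shell
#!/bin/sh
# Hypothetical helper: print only the SHA-256 digest of a file, without the filename.
# Run it on the sending side before the transfer and on the receiving side after it;
# identical output means the received copy almost certainly matches the original.
digest() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Example: digest file
```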

HTH & best ... khay

----------

## khayyam

 *khayyam wrote:*   

> [...] you might try and disable compression on the client end (but I doubt this is the issue) ...

 

correction ... that should be "enable" not "disable" as per comment 29 in the above bug.

~/.ssh/config

```
Host foo
  Hostname foo.domain.tld
  Compression yes
```

... sorry for any confusion ... khay

----------

## grant123

Wow, I'm surprised to find that even though rsync consistently fails to pull 50MB without stalling, netcat transferred that amount successfully the first time.  So there's an SSH pull problem (pushing works fine).  I've compiled SSH without HPN on both the client and server and I've enabled compression on the client with -C, but the rsync transfer still stalls.  Where else should I look?

----------

## khayyam

 *grant123 wrote:*   

> Wow, I'm surprised to find that even though rsync consistently fails to pull 50MB without stalling, netcat transferred that amount successfully the first time. So there's an SSH pull problem (pushing works fine).

 

grant123 ... the strange thing is that this has all the hallmarks of the HPN issue. You did restart sshd after updating (and log out and back in, so that the ssh session was not using some HPN-enabled process ... if, say, 'ForwardAgent yes' was enabled)?

 *grant123 wrote:*   

> I've compiled SSH without HPN on both the client and server

 

I should have stated that it probably wasn't necessary to re-emerge with -hpn as hpn can be disabled via "HPNDisabled yes" in /etc/ssh/sshd_config.

 *grant123 wrote:*   

> [...] and I've enabled compression on the client with -C, but the rsync transfer still stalls.  Where else should I look?

 

I would probably try the transfer using '-vvv' (maximum verbosity) and see if anything is reported when it stalls. I would also look at any other possible contributing factors (kernel version, firewall, etc.) and, most importantly (as stated above), the log files.

best ... khay

----------

## grant123

 *Quote:*   

> the strange thing is this has all the hallmarks of the HPN issue, you did restart sshd on updating (and logged out and back in so that the ssh session was not using some HPN enabled process ... if say 'ForwardAgent yes' was enabled)?

 

Yes, I've restarted the server and client since then and I tested again today with the same result.

 *Quote:*   

> I should have stated that it probably wasn't necessary to re-emerge with -hpn as hpn can be disabled via "HPNDisabled yes" in /etc/ssh/sshd_config.

 

I tried setting HPNDisabled, but it was unrecognized, which I think confirms that my install does not include HPN.

 *Quote:*   

> I would probably try the transfer using '-vvv' (maximum verbose) and see if anything is reported when it stalls, I would also look at any other possible contributing factors (kernel version, firewall, etc) most importantly (as stated above) log files.

 

I upgraded the kernel to 3.14.2 with the same result, and -vvv was unfortunately not enlightening.

----------

## khayyam

 *grant123 wrote:*   

> I tried setting HPNDisabled but it was unrecognized which I think confirms that my install does not include HPN.

 

grant123 ... yes, obviously. What I meant was that there was actually no need to re-emerge with -hpn; it could have been toggled via HPNDisabled.

Anyhow, I wonder if this isn't an MTU/fragmentation problem ... try setting the MTU to 576 as suggested.
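
One way to test the MTU theory before changing interface settings is to send pings with the Don't Fragment bit set and find the largest size that gets through. A minimal POSIX sh sketch, assuming Linux iputils `ping` (`-M do` forbids fragmentation, `-s` sets the ICMP payload size, `-W` the reply timeout) and that ICMP echo is not filtered along the path; the `payload_for_mtu` and `probe_mtu` names are hypothetical:

```shell
#!/bin/sh
# ICMP payload size that produces a packet of exactly the given MTU:
# 20-byte IPv4 header (no options) + 8-byte ICMP echo header = 28 bytes.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Binary search for the largest MTU between lo and hi that reaches host unfragmented.
# Assumes packets of size lo always get through.
probe_mtu() {
    host=$1
    lo=${2:-576}
    hi=${3:-1500}
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always makes progress
        if ping -M do -c 1 -W 2 -s "$(payload_for_mtu "$mid")" "$host" >/dev/null 2>&1; then
            lo=$mid                     # mid fits: search the upper half
        else
            hi=$(( mid - 1 ))           # mid was dropped or rejected: search below it
        fi
    done
    echo "$lo"
}

# Example: probe_mtu foo.domain.tld
```

Since the stall only happens when pulling, the probe is probably most meaningful when run from the sending side (here the DSL-side server) toward the receiver. A result well below 1500 points at the same problem, and the reported value may be a less drastic setting than 576.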

best ... khay

----------

## grant123

 *Quote:*   

> I wonder if this isn't a MTU/fragmentation problem ... try setting the mtu to 576 as suggested.

 

You nailed it!  I changed both interfaces to MTU 576 on the server and client and the problem disappeared.  I suspect the root of the problem is that I'm using an AT&T modem/router with the server.  I've had trouble using a proxy server there before and I discovered that the attached modem/router doesn't send ICMP responses.  The solution proposed by AT&T was to put the modem/router into bridged mode but it's remote so I haven't been able to do that.

Could this MTU discovery also point to the modem/router and could putting it into bridged mode solve the problem?  What can I do in the meantime?

Is it strange that netcat works with MTU 1500 and openssh doesn't?

----------

## khayyam

 *grant123 wrote:*   

>  *Quote:*   
> 
> > I wonder if this isn't a MTU/fragmentation problem ... try setting the mtu to 576 as suggested.
> 
> You nailed it!  I changed both interfaces to MTU 576 on the server and client and the problem disappeared.

 

grant123 ... ok, good.

 *grant123 wrote:*   

> I suspect the root of the problem is that I'm using an AT&T modem/router with the server. I've had trouble using a proxy server there before and I discovered that the attached modem/router doesn't send ICMP responses. The solution proposed by AT&T was to put the modem/router into bridged mode but it's remote so I haven't been able to do that.

 

You could probably put DD-WRT or OpenWrt on it (see: list of router or firewall distributions for other possible options) ... but this might violate your terms of service. As for bridging, well, yes, if you have a second device behind it (basically, putting it in bridge mode will turn it from a modem/router into a modem).

 *grant123 wrote:*   

> Could this MTU discovery also point to the modem/router and could putting it into bridged mode solve the problem?

 

I can't say for sure if it's the router or not, but it's the most likely culprit. As for solving the problem, well, yes, if your aching feet can be solved by cutting your legs off :) ... basically, you're just disabling its ability to do routing.

 *grant123 wrote:*   

> What can I do in the meantime?

 

Well, you can set the MTU in /etc/conf.d/net ...

```
mtu_eth0="576"
```
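
The conf.d/net setting is only read when the interface is started (with netifrc, e.g. on `/etc/init.d/net.eth0 restart`) or at the next boot. A sketch for applying and checking the change straight away, assuming iproute2 and an interface named eth0 (the `set_mtu` and `current_mtu` helpers are hypothetical; changing the MTU needs root):

```shell
#!/bin/sh
# Hypothetical helpers; the interface name is passed in (eth0 in this thread).

# Change the MTU immediately; this does not persist across reboots.
set_mtu() {
    ip link set dev "$1" mtu "$2"
}

# Read back the MTU the kernel is actually using for the interface.
current_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Example (as root):
#   set_mtu eth0 576
#   current_mtu eth0
```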

Otherwise, find better firmware for the device or replace it with something else. Only the other day I found a Comtrend CT-6373 wireless router (4x gigabit Ethernet, 1x a/b/g wireless, 8 MB flash, 32 MB RAM, USB) that's supported by OpenWrt. I haven't installed anything on it yet, but once I do it'll be able to do the equivalent of a fairly decent Cisco router/WAP. So, just look around: there is a ton of this kind of hardware thrown out every day (ISPs tend to bundle it with the ADSL package, and when some feature is upgraded ... such as 802.11g to 802.11n ... the old units just get thrown out).

 *grant123 wrote:*   

> Is it strange that netcat works with MTU 1500 and openssh doesn't?

 

ummm ... doesn't that link I provided explain why it affects ssh?

best ... khay

----------

## grant123

 *Quote:*   

> As for bridging, well, yes if you have a second device behind it (basically, putting it in bridge mode will turn it from a modem/router into a modem).

 

I have a Gentoo (laptop) router behind it so I think that's what I'll do.

 *Quote:*   

> doesn't that link I provided explain why it effects ssh?

 

I just read it again but I'm not seeing the answer.  Could you give me a clue?

Thanks a lot for all your help.

----------

## khayyam

 *grant123 wrote:*   

>  *Quote:*   
> 
> > doesn't that link I provided explain why it effects ssh?
> 
> I just read it again but I'm not seeing the answer.  Could you give me a clue?

 

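grant123 ... I would if I could remember. I didn't actually read the link I provided; I just remembered the MTU issue with ssh from some years back, and that link looked like it explained it. As I remember, this also affects SSL (perhaps the reason your proxy had issues), and basically it boils down to encrypted protocols being bad at handling packet fragmentation (by design and for good reason). The issue comes about due to the overhead of datagram encapsulation (as the packet passes between routers, headers are added to it), so (again, as I remember) because the packet is marked "don't fragment", the increase in size pushes it above the MTU and it stalls. Normally the router that has to drop such a packet sends back an ICMP "fragmentation needed" message so the sender can switch to smaller packets; if that message never arrives (and you mentioned your modem/router doesn't send ICMP responses), the oversized packets simply disappear and the transfer hangs.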

Why does this not affect everyone? Well, I'm not sure exactly. I assume it's due to some routes being less congested than others, and/or how well the router and first hop negotiate.

best ... khay

----------

## grant123

Why not just leave the MTU at 576?  Does that reduce bandwidth?

----------

## khayyam

 *grant123 wrote:*   

> Why not just leave the MTU at 576?  Does that reduce bandwidth?

 

grant123 ... a larger MTU is more efficient (a larger data payload per packet, so fewer packets and proportionally less header overhead), hence an increase in throughput ... but it's not as simple as it sounds (higher does not necessarily equal faster). TCP checks for transmission errors, and any packet that fails the check will be retransmitted (at the same MTU) ... so if there are errors, more has to be sent to acquire the same payload, and with larger packets each retransmission costs more.

1500 is just the maximum allowed on standard Ethernet, and in most cases there is little reason to set it lower. In your case setting it to 576 avoids the problem, but you might get the same result with 1400.
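
To put rough numbers on the efficiency point: assuming a 20-byte IPv4 header and a 20-byte TCP header with no options (real connections usually also carry a 12-byte timestamp option, and Ethernet framing adds more on the wire), the share of each full-size packet that is actual data can be estimated as below. The `efficiency` helper is purely illustrative:

```shell
#!/bin/sh
# Percentage of a full-size packet that is TCP payload, assuming 40 bytes of
# IPv4 + TCP headers and no TCP options (an idealised estimate).
efficiency() {
    awk -v mtu="$1" 'BEGIN { printf "%.1f\n", 100 * (mtu - 40) / mtu }'
}

for mtu in 1500 1400 576; do
    echo "MTU $mtu: $(efficiency "$mtu")% payload"
done
```

This prints about 97.3% for 1500, 97.1% for 1400 and 93.1% for 576, so 576 costs a few percent of raw throughput (and roughly 2.7 times as many packets for the same data), while 1400 costs almost nothing.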

best ... khay

----------

## grant123

Got it, thank you for your help.

----------

