# Postfix conversation with .. timed out while sending message

## lostinspace2011

I managed to run my mail server for some time without any issues. In my setup I am using Postfix, Amavisd, SpamAssassin and Clamd as described in http://www.gentoo.org/doc/en/mailfilter-guide.xml. However, for the past two days I have been struggling with messages getting stuck in the outgoing queue.

sendmail -bp shows the following status for most of my outgoing messages.

 *Quote:*   

> 
> 
> conversation with mx1.bt.mail.yahoo.com[212.82.111.207] timed out while sending message body
> 
> ...
> ...

 

Most of the messages are very small (<10K) and one is larger (>350K), so I don't think this problem is linked to the size of the message.

I have tried to disable amavisd by commenting out the following in /etc/postfix/main.cf

```
#content_filter = smtp-amavis:[localhost]:10024
```

However, this did not resolve the problem either. I can see outbound traffic, and some of the messages still seem to be received at their destination, yet they are still shown in the outgoing queue.

I have already tried sending messages via telnet directly to the same destination address, and this works, so I am led to rule out a network problem. I also tried using telnet to send via my own server on ports 25, 10024 (amavisd) and 10025 (smtp-delivery); however, the message got stuck every time. I tried this with the firewall both on and off, with the same results.

Are there any other suggestions on what I can do to debug and analyse the cause of this problem?

Thanks in advance

Alex

----------

## Anarcho

There are suggestions on the net to lower the MTU to 1400 or even 1000 and try again.

If this helps, maybe your server doesn't get the ICMP messages for path MTU discovery?

See also: http://www.postfix.org/faq.html#timeouts
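One way to check for this kind of path MTU black hole is to send pings with the Don't Fragment bit set at several packet sizes. The following is only a sketch, assuming Linux iputils ping; the helper names `payload_for_mtu` and `probe_mtu` and the host `mx.example.com` are made up for illustration. Many mail hosts drop ICMP echo entirely, so a failed probe on its own does not prove an MTU problem.

```shell
#!/bin/sh
# ICMP echo payload that yields an IPv4 packet of exactly $1 bytes:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send one ping of the given total packet size with fragmentation forbidden.
# -M do sets the Don't Fragment bit (Linux iputils ping); -W 2 waits 2 seconds.
# Returns ping's exit status: 0 if a reply came back.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 1 -W 2 -s "$(payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1
}

# Example (hypothetical host name; substitute the remote mail server):
#   for mtu in 1500 1492 1400 1000; do
#       probe_mtu mx.example.com "$mtu" && echo "$mtu ok" || echo "$mtu no reply"
#   done
```

If the small sizes get replies and the large ones silently fail, that would be consistent with ICMP "fragmentation needed" messages being lost somewhere on the path.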

----------

## lostinspace2011

I decided to leave this issue for now and do something else for an hour or two. When I got back, all messages had been sent. I then re-enabled the amavis integration and tried sending new messages. All went through without any issue. I am not sure what caused this. Maybe it was just some strange network congestion which only affected certain email recipients. Oh well. I will still keep an eye on this one for a couple of days.

Thanks for your help and the link

Alex

----------

## lostinspace2011

It would appear that after some time the problem came back. I haven't been able to figure out exactly what is causing it yet, so I am still looking for any other suggestions on what can be done about this.

Could it be that one of the timeout settings in postfix is too small?

postconf | grep timeout

```
smtp_connect_timeout = 30s
smtp_data_done_timeout = 600s
smtp_data_init_timeout = 120s
smtp_data_xfer_timeout = 180s
smtp_helo_timeout = 300s
smtp_mail_timeout = 300s
smtp_quit_timeout = 300s
smtp_rcpt_timeout = 300s
smtp_rset_timeout = 20s
smtp_starttls_timeout = 300s
smtp_tls_session_cache_timeout = 3600s
smtp_xforward_timeout = 300s
smtpd_policy_service_timeout = 100s
smtpd_proxy_timeout = 100s
smtpd_starttls_timeout = 300s
smtpd_timeout = 300s
smtpd_tls_session_cache_timeout = 3600s
```
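Only the smtp_* parameters apply to outbound delivery; the smtpd_* ones govern inbound connections. A small filter along these lines narrows the list to the client side. It is just a sketch, and the function name `smtp_client_timeouts` is made up for illustration.

```shell
#!/bin/sh
# Keep only the SMTP client (smtp_*) timeout parameters from postconf output
# read on stdin. The underscore after "smtp" excludes the smtpd_* server settings.
smtp_client_timeouts() {
    grep -E '^smtp_[a-z_]*_timeout[[:space:]]*='
}

# Usage on a live system (requires Postfix):
#   postconf | smtp_client_timeouts
```

Note that this also keeps smtp_tls_session_cache_timeout, which is a cache lifetime rather than a network timeout.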

Thanks in advance

Alex

----------

## lostinspace2011

It would appear that the messages are timing out after 5 minutes (300 seconds).

 *Quote:*   

> 
> 
> messages:Oct 24 11:03:53 bumblebee amavis[5533]: (05533-05) sending SMTP response: "250 2.0.0 Ok, id=05533-05, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as E1EFB6283CC"
> 
> messages:Oct 24 11:03:53 bumblebee postfix/smtp[8654]: 169A66283C0: to=<REMOVED@iinet.net.au>, relay=localhost[127.0.0.1]:10024, delay=2.9, delays=0.07/0.01/0.02/2.8, dsn=2.0.0, status=sent (250 2.0.0 Ok, id=05533-05, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as E1EFB6283CC)
> ...

 

However, this message is rather small:

 *Quote:*   

> 
> 
> E1EFB6283CC    10631 Sun Oct 24 11:03:53  REMOVED @ sender.xyz (lost connection with as-av.iinet.net.au[203.0.178.180] while sending end of data -- message may be sent more than once)
> 
>                                          REMOVED @ iinet.net.au
> ...

 

I'd say it must be some sort of network issue, but it affects most of my messages, not all of them. I managed to send an 18 MB email to a test account and also received it back, so I don't think it is network related. I did some ping measurements to this particular server, with the following results:

 *Quote:*   

> bumblebee ~ # ping 203.0.178.180
> 
> PING 203.0.178.180 (203.0.178.180) 56(84) bytes of data.
> 
> 64 bytes from 203.0.178.180: icmp_req=1 ttl=60 time=20.4 ms
> ...

 

I eventually tried changing the MTU on my ADSL router to 1400 and issued sendmail -q to retry message delivery. Shortly afterwards all messages had been delivered.

I then disabled path MTU discovery using the following command and set the MTU back to 1492.

```
sysctl -w net.ipv4.ip_no_pmtu_disc=1
```

Then I resent some test messages, all of which were delivered. Upon further testing, other messages got stuck again and were only delivered after I changed the MTU to 1400.

Thanks in advance

Alex

----------

## lostinspace2011

Using an MTU of 1492 on my router I get:

```
bumblebee log # tracepath -n web47.justhost.com
 1:  192.168.0.3                                           0.223ms pmtu 1500
 1:  192.168.0.1                                           0.976ms
 1:  192.168.0.1                                           0.883ms
 2:  192.168.0.1                                           0.966ms pmtu 1492
 2:  203.215.5.244                                        36.195ms
 3:  203.215.4.18                                         80.277ms
 4:  203.215.20.68                                        91.846ms asymm  5
 5:  114.31.193.237                                       93.607ms asymm  6
 6:  114.31.193.237                                       96.809ms
 7:  208.178.246.85                                      296.737ms asymm  8
 8:  208.178.246.85                                      368.962ms
 9:  69.31.111.94                                        313.895ms asymm 13
10:  69.31.111.94                                        315.777ms asymm 13
11:  99.198.126.118                                      317.966ms asymm 14
12:  no reply
13:  no reply
```

And using an MTU of 1400 on my router I get:

```
bumblebee log # tracepath -n web47.justhost.com
 1:  192.168.0.3                                           0.233ms pmtu 1500
 1:  192.168.0.1                                           0.997ms
 1:  192.168.0.1                                           0.902ms
 2:  192.168.0.1                                           0.983ms pmtu 1400
 2:  203.215.5.244                                        34.550ms
 3:  203.215.4.36                                         35.384ms
 4:  203.215.20.6                                         88.938ms
 5:  203.215.20.148                                       92.369ms asymm  4
 6:  114.31.199.58                                       293.905ms
 7:  114.31.199.58                                       296.088ms asymm  6
 8:  69.31.110.229                                       314.788ms asymm 12
 9:  69.31.110.229                                       316.066ms asymm 12
10:  99.198.126.118                                      314.154ms asymm 14
11:  99.198.126.118                                      315.451ms asymm 14
12:  no reply
13:  no reply
```

I'm not quite sure what this means, though. I will be consulting the man pages.

Last edited by lostinspace2011 on Mon Oct 25, 2010 3:34 pm; edited 1 time in total
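For reading tracepath output like the above: the pmtu annotation is the path MTU known so far, and it changes at the hop where a smaller limit is discovered. In both runs the value drops once, at the router (hop 2), to the router's configured MTU, and no further reduction is reported before replies stop at hop 12. To pull out just the final value, something like the following could be used; it is a sketch, and the function name `last_pmtu` is made up for illustration.

```shell
#!/bin/sh
# Print the last "pmtu" value that appears in tracepath output read on stdin.
# Prints nothing if no pmtu annotation is present.
last_pmtu() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "pmtu") v = $(i + 1) }
         END { if (v != "") print v }'
}

# Usage on a live system:
#   tracepath -n web47.justhost.com | last_pmtu
```

A value that never drops below the router's setting does not rule out a smaller limit further along the path, since a hop that silently discards oversized packets reports nothing.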

----------

## lostinspace2011

After playing with various MTU values on both the server and the router (trying 1400 on both), and then various permutations of PPPoA and PPPoE, messages were still getting stuck. At this point, setting my server's MTU back to 1500 and the router's to 1400 and issuing sendmail -q did not deliver the messages. In this case a reboot and some patience resolved the problem.

It is still very confusing, and there seem to be several contributing factors.

Alex

----------

