flameproof
Frequent Visitor
Topic Author
Posts: 75
Joined: Tue Sep 01, 2015 3:17 pm

Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 12:43 pm

Hi all,

I have a CCR1016 configured with a default gateway on ETH1 (10.20.10.1), distance 1, check gateway via ping. Then, another route on ETH5 (10.20.16.1) with distance 2. When I disable the primary interface on the upstream router, the CCR correctly marks the primary route as unreachable, and makes the secondary route active.
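For reference, the routing setup described above would look roughly like this on RouterOS. The gateway addresses and distances are taken from the post (reading 10.20.10.1 and 10.20.16.1 as the next hops, which is an assumption); the `comment` labels are illustrative and not part of the original configuration.

```routeros
/ip route
# Primary default route, monitored by ping (next hop reached via ETH1)
add dst-address=0.0.0.0/0 gateway=10.20.10.1 distance=1 check-gateway=ping comment="wan-primary"
# Backup default route (next hop reached via ETH5), used once the primary is unreachable
add dst-address=0.0.0.0/0 gateway=10.20.16.1 distance=2 comment="wan-secondary"
```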

However, almost no traffic is sent towards the secondary route in this scenario. I have Hotspot and PPPoE servers on the CCR, pulling in some 250Mbps. All sessions stay active, the RADIUS server is reachable on the secondary route, and I can run a bandwidth test on the secondary route against the upstream router, all good.

Any ideas as to reasons for this?
 
Anumrak
Forum Guru
Posts: 1055
Joined: Fri Jul 28, 2017 2:53 pm

Re: Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 1:48 pm

Are you using masquerade? Have you set an out-interface for the masquerade rule in the failover direction as well?
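To make the question concrete: a masquerade rule is usually bound to an outgoing interface, so the failover path needs a rule of its own. A minimal sketch, assuming the WAN ports are named `ether1` and `ether5` (the thread only calls them ETH1 and ETH5, so the exact names are a guess):

```routeros
/ip firewall nat
# NAT for traffic leaving through the primary uplink
add chain=srcnat out-interface=ether1 action=masquerade comment="NAT via primary"
# NAT for traffic leaving through the backup uplink
add chain=srcnat out-interface=ether5 action=masquerade comment="NAT via secondary"
```

If only the first rule exists, traffic that leaves through the backup interface is not translated at all.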
 
stoser
Member Candidate
Posts: 107
Joined: Sun Aug 21, 2016 12:04 am

Re: Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 4:29 pm

I have seen that when the WANs are from different providers, each with its own public IP and gateway subnet, many stateful connections hang. This occurs when connection marks are used and the corresponding connections have not expired. Over time the traffic slowly increases on the new gateway, and within a few minutes most connections are passing traffic correctly. I think this happens because the servers interpret the change in client IP as a man-in-the-middle attack. Sometimes I have had to delete all active connections associated with the old WAN. This of course forces the affected users to sign in to some services again, but that is a small price compared to no access at all.

I have also seen that if the new WAN has dynamically added DNS entries (as with PPPoE connections and some DHCP client connections), I have had to flush the DNS cache after a WAN change. It appears that my ISPs override the DNS entries of many services to keep the traffic local, so a switch causes problems with the most common services such as YouTube, Gmail, etc.

If you find that this is your problem, you could write a scheduled script that checks which WAN interface is active and, when it detects a change, takes the appropriate action, such as deleting the corresponding active connections and/or clearing the DNS cache.
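A rough sketch of such a script, not a tested implementation. It assumes the primary default route carries the comment `wan-primary` (a hypothetical label), and it clears every tracked connection rather than only the affected ones, which may be too heavy-handed on a busy router. How the `active` flag is returned by `get` can differ between RouterOS versions, so the comparison should be checked on the target device.

```routeros
# Sketch only: "wan-primary" / "wan-secondary" are assumed labels, not real config.
:global lastActiveWan
:local activeWan "wan-secondary"

# Treat the primary as active only if its route currently has the active flag
:if ([/ip route get [find comment="wan-primary"] active] = true) do={
    :set activeWan "wan-primary"
}

# Act only when the active WAN differs from the one seen on the previous run
:if ($activeWan != $lastActiveWan) do={
    :log warning ("Active WAN changed to " . $activeWan)
    # Drop tracked connections so sessions re-establish over the new path
    /ip firewall connection remove [find]
    # Discard cached DNS answers learned through the old WAN
    /ip dns cache flush
    :set lastActiveWan $activeWan
}
```

Saved as a script named, say, `wan-watch`, it could be run periodically with `/system scheduler add name=wan-watch interval=10s on-event=wan-watch`. Note that the first run after a reboot also triggers the cleanup, because `lastActiveWan` is still unset at that point.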

Hope this helps. Kind regards -
 
Anumrak
Forum Guru
Posts: 1055
Joined: Fri Jul 28, 2017 2:53 pm

Re: Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 4:38 pm

Connections only hang in the connection tracker if you use a source NAT action other than masquerade. Masquerade drops all of its connections by itself when the route through the masqueraded interface becomes unreachable.
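The two actions side by side, as a sketch. The interface name `ether1` and the address 192.0.2.10 are placeholders, not values from this thread; only one of the two variants would be used for a given uplink.

```routeros
/ip firewall nat
# Variant A: fixed source address. Entries already in the connection tracker
# keep this translation after the route changes, so they can hang.
add chain=srcnat out-interface=ether1 action=src-nat to-addresses=192.0.2.10

# Variant B: masquerade. RouterOS purges the related connection tracking entries
# when the interface disconnects or its IP address changes.
# add chain=srcnat out-interface=ether1 action=masquerade
```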
 
stoser
Member Candidate
Posts: 107
Joined: Sun Aug 21, 2016 12:04 am

Re: Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 5:02 pm

Anumrak, thank you for this info. There is one interface on which I was using src-nat instead of masquerade. It was the main interface, carrying the majority of the traffic. This explains why most connections would hang. Thank you again, I will change it and test.
 
Anumrak
Forum Guru
Posts: 1055
Joined: Fri Jul 28, 2017 2:53 pm

Re: Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 5:05 pm

Be careful with processor utilization.
 
stoser
Member Candidate
Posts: 107
Joined: Sun Aug 21, 2016 12:04 am

Re: Failover route fails to carry traffic upon primary failure

Fri Sep 28, 2018 5:51 pm

Anumrak: I can confirm that your suggestion solved my problem, thanks again. The routing failover is now very fast and the connections do not hang. CPU is good. Hopefully the OP has the same problem, and this will help him as well. Back to the OP's original topic ...
 
flameproof
Frequent Visitor
Topic Author
Posts: 75
Joined: Tue Sep 01, 2015 3:17 pm

Re: Failover route fails to carry traffic upon primary failure

Tue Oct 02, 2018 10:37 am

@stoser how exactly did you fix it? I believe our issue is how our masquerade is configured: we masquerade PPPoE to a specific upstream IP address, which of course breaks when the secondary route takes over, as the upstream IP range is different.

I'm simulating all this in the lab but have a few details to work out.
