New Load Balanced Setup - Poor Performance

That should work

/ip route
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=55.188.40.1 pref-src=\
    "" routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.254 \
    pref-src="" routing-table=main scope=30 suppress-hw-offload=no \
    target-scope=10
	
add disabled=no distance=1 dst-address=108.177.122.100/32 gateway=55.188.40.1 \
    pref-src="" routing-table=main scope=12 suppress-hw-offload=no \
    target-scope=11
add disabled=no distance=1 dst-address=74.6.143.25/32 gateway=192.168.1.254 \
    pref-src="" routing-table=main scope=12 suppress-hw-offload=no \
    target-scope=11
	
add check-gateway=ping distance=1 gateway=108.177.122.100 routing-table=to-ISP1 \
    target-scope=12
add check-gateway=ping distance=2 gateway=74.6.143.25 routing-table=to-ISP1 \
    target-scope=12
add check-gateway=ping distance=1 gateway=74.6.143.25 routing-table=to-ISP2 \
    target-scope=12
add check-gateway=ping distance=2 gateway=108.177.122.100 routing-table=to-ISP2 \
    target-scope=12

@sindy

After disabling those routes, still having issues with existing/new connections. I can make WAN2 fail in simulation, still working on WAN1 simulation. Here is some data:

Route Export:

/ip route
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=55.188.40.1 pref-src=\
    "" routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.254 \
    pref-src="" routing-table=main scope=30 suppress-hw-offload=no \
    target-scope=10
add disabled=yes distance=1 dst-address=0.0.0.0/0 gateway=55.188.40.1 pref-src=\
    "" routing-table=to-ISP1 scope=30 suppress-hw-offload=no target-scope=10
add disabled=yes distance=1 dst-address=0.0.0.0/0 gateway=192.168.1.254 \
    pref-src="" routing-table=to-ISP2 scope=30 suppress-hw-offload=no \
    target-scope=10
add disabled=no distance=1 dst-address=108.177.122.100/32 gateway=55.188.40.1 \
    pref-src="" routing-table=main scope=11 suppress-hw-offload=no \
    target-scope=10
add disabled=no distance=1 dst-address=74.6.143.25/32 gateway=192.168.1.254 \
    pref-src="" routing-table=main scope=11 suppress-hw-offload=no \
    target-scope=10
add check-gateway=ping disabled=no distance=1 dst-address=0.0.0.0/0 gateway=\
    108.177.122.100 pref-src="" routing-table=to-ISP1 scope=30 \
    suppress-hw-offload=no target-scope=11
add check-gateway=ping distance=2 gateway=74.6.143.25 routing-table=to-ISP1 \
    target-scope=11
add check-gateway=ping distance=1 gateway=74.6.143.25 routing-table=to-ISP2 \
    target-scope=11
add check-gateway=ping distance=2 gateway=108.177.122.100 routing-table=to-ISP2 \
    target-scope=11

Here is route Print Detail BOTH WANs WORKING:

 /ip route print detail
Flags: D - dynamic; X - disabled, I - inactive, A - active; 
c - connect, s - static, r - rip, b - bgp, o - ospf, d - dhcp, v - vpn, m - mode>
H - hw-offloaded; + - ecmp 
 0  Xs   dst-address=0.0.0.0/0 routing-table=to-ISP1 pref-src="" 
         gateway=55.188.40.1 distance=1 scope=30 target-scope=10 
         suppress-hw-offload=no 

 1  Xs   dst-address=0.0.0.0/0 routing-table=to-ISP2 pref-src="" 
         gateway=192.168.1.254 distance=1 scope=30 target-scope=10 
         suppress-hw-offload=no 

 2  As + dst-address=0.0.0.0/0 routing-table=main pref-src="" 
         gateway=192.168.1.254 immediate-gw=192.168.1.254%ether2-ISP2 
         distance=1 scope=30 target-scope=10 suppress-hw-offload=no 

 3  As + dst-address=0.0.0.0/0 routing-table=main pref-src="" gateway=55.188.40.>
         immediate-gw=55.188.40.1%ether1-ISP1 distance=1 scope=30 
         target-scope=10 suppress-hw-offload=no 

   DAc   dst-address=10.10.10.0/24 routing-table=main gateway=LAN 
         immediate-gw=LAN distance=0 scope=10 suppress-hw-offload=no 
         local-address=10.10.10.1%LAN 

   DAc   dst-address=55.188.40.0/20 routing-table=main gateway=ether1-ISP1 
         immediate-gw=ether1-ISP1 distance=0 scope=10 suppress-hw-offload=no 
         local-address=55.188.40.126%ether1-ISP1 

 4  As   dst-address=74.6.143.25/32 routing-table=main pref-src="" 
         gateway=192.168.1.254 immediate-gw=192.168.1.254%ether2-ISP2 
         distance=1 scope=11 target-scope=10 suppress-hw-offload=no 

 5  As   dst-address=108.177.122.100/32 routing-table=main pref-src="" 
         gateway=55.188.40.1 immediate-gw=55.188.40.1%ether1-ISP1 distance=1 
         scope=11 target-scope=10 suppress-hw-offload=no 

   DAc   dst-address=192.168.1.0/24 routing-table=main gateway=ether2-ISP2 
         immediate-gw=ether2-ISP2 distance=0 scope=10 suppress-hw-offload=no 
         local-address=192.168.1.105%ether2-ISP2 

 6  As   dst-address=0.0.0.0/0 routing-table=to-ISP1 pref-src="" 
         gateway=108.177.122.100 immediate-gw=55.188.40.1%ether1-ISP1 
         check-gateway=ping distance=1 scope=30 target-scope=11 suppress-hw-offload=no 

 7   s   dst-address=0.0.0.0/0 routing-table=to-ISP1 pref-src="" 
         gateway=74.6.143.25 immediate-gw=192.168.1.254%ether2-ISP2 
         check-gateway=ping distance=2 scope=30 target-scope=11 
         suppress-hw-offload=no 

 8   s   dst-address=0.0.0.0/0 routing-table=to-ISP2 pref-src="" 
         gateway=108.177.122.100 immediate-gw=55.188.40.1%ether1-ISP1 
         check-gateway=ping distance=2 scope=30 target-scope=11 
         suppress-hw-offload=no 

 9  As   dst-address=0.0.0.0/0 routing-table=to-ISP2 pref-src="" 
         gateway=74.6.143.25 immediate-gw=192.168.1.254%ether2-ISP2 
         check-gateway=ping distance=1 scope=30 target-scope=11 
         suppress-hw-offload=no

WAN2 DOWN Detail:

 /ip route print detail
Flags: D - dynamic; X - disabled, I - inactive, A - active; 
c - connect, s - static, r - rip, b - bgp, o - ospf, d - dhcp, v - vpn, m - mode>
H - hw-offloaded; + - ecmp 
 0  Xs   dst-address=0.0.0.0/0 routing-table=to-ISP1 pref-src="" 
         gateway=55.188.40.1 distance=1 scope=30 target-scope=10 
         suppress-hw-offload=no 

 1  Xs   dst-address=0.0.0.0/0 routing-table=to-ISP2 pref-src="" 
         gateway=192.168.1.254 distance=1 scope=30 target-scope=10 
         suppress-hw-offload=no 

 2  As + dst-address=0.0.0.0/0 routing-table=main pref-src="" 
         gateway=192.168.1.254 immediate-gw=192.168.1.254%ether2-ISP2 
         distance=1 scope=30 target-scope=10 suppress-hw-offload=no 

 3  As + dst-address=0.0.0.0/0 routing-table=main pref-src="" gateway=55.188.40.>
         immediate-gw=55.188.40.1%ether1-ISP1 distance=1 scope=30 
         target-scope=10 suppress-hw-offload=no 

   DAc   dst-address=10.10.10.0/24 routing-table=main gateway=LAN 
         immediate-gw=LAN distance=0 scope=10 suppress-hw-offload=no 
         local-address=10.10.10.1%LAN 

   DAc   dst-address=55.188.40.0/20 routing-table=main gateway=ether1-ISP1 
         immediate-gw=ether1-ISP1 distance=0 scope=10 suppress-hw-offload=no 
         local-address=55.188.40.126%ether1-ISP1 

 4  As   dst-address=74.6.143.25/32 routing-table=main pref-src="" 
         gateway=192.168.1.254 immediate-gw=192.168.1.254%ether2-ISP2 
         distance=1 scope=11 target-scope=10 suppress-hw-offload=no 

 5  As   dst-address=108.177.122.100/32 routing-table=main pref-src="" 
         gateway=55.188.40.1 immediate-gw=55.188.40.1%ether1-ISP1 distance=1 
         scope=11 target-scope=10 suppress-hw-offload=no 

   DAc   dst-address=192.168.1.0/24 routing-table=main gateway=ether2-ISP2 
         immediate-gw=ether2-ISP2 distance=0 scope=10 suppress-hw-offload=no 
         local-address=192.168.1.105%ether2-ISP2 

 6  As   dst-address=0.0.0.0/0 routing-table=to-ISP1 pref-src="" 
         gateway=108.177.122.100 immediate-gw=55.188.40.1%ether1-ISP1 
         check-gateway=ping distance=1 scope=30 target-scope=11 
         suppress-hw-offload=no 

 7  IsH  dst-address=0.0.0.0/0 routing-table=to-ISP1 pref-src="" 
         gateway=74.6.143.25 immediate-gw="" check-gateway=ping distance=2 
         scope=30 target-scope=11 suppress-hw-offload=no 

 8  As   dst-address=0.0.0.0/0 routing-table=to-ISP2 pref-src="" 
         gateway=108.177.122.100 immediate-gw=55.188.40.1%ether1-ISP1 
         check-gateway=ping distance=2 scope=30 target-scope=11 
         suppress-hw-offload=no 

 9  IsH  dst-address=0.0.0.0/0 routing-table=to-ISP2 pref-src="" 
         gateway=74.6.143.25 immediate-gw="" check-gateway=ping distance=1 
         scope=30 target-scope=11 suppress-hw-offload=no

What kind of issues in particular? Some connections do not get through although you assume they should, some connections do get through although you assume they should not, or some connections establish via a WAN that is simulated to be down?


Both print outputs show that everything works as expected:

  • when both WANs are simulated to be OK, routes 6 and 9 are active because their recursive gateways are reachable, and routes 7 and 8 are not used (not marked as Active) simply because they have higher distance values than routes 6 and 9.
  • when WAN 2 is simulated to be down, routes 6 and 8 are active whereas 7 and 9 are marked as Inactive because their recursive gateway (74.6.143.25) is unreachable. So even though route 8 has higher distance than route 9, it becomes Active.
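
One way to confirm which route is currently selected in each table is to print only that table. This is a minimal sketch that assumes the table names to-ISP1 and to-ISP2 from the export above; in each output, the route carrying the A flag is the one in use.

```
# Show the routes of each policy table; the A (active) flag marks the one in use.
/ip/route/print detail where routing-table=to-ISP1
/ip/route/print detail where routing-table=to-ISP2
```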

So what is the issue when WAN 2 is simulated to be down, i.e. what behaves differently from what you expect?

As for simulation of WAN 1 failure, it is enough to add the following firewall rule:

/ip/firewall/filter
add chain=output out-interface=ether1-ISP1 protocol=icmp dst-address=108.177.122.100 action=drop

This will prevent the check-gateway pings from reaching the recursive gateway, having the same effect as if it becomes unreachable due to some issue further away in the network.
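
For symmetry, the same idea can be sketched for WAN 2. Treat this as an illustration under stated assumptions rather than part of the original advice: the interface name ether2-ISP2 and the address 74.6.143.25 are taken from the route listings above, and the comment text is a hypothetical label used only to find the rule again. It also assumes that no earlier rule in chain=output accepts ICMP first.

```
/ip/firewall/filter
# Added disabled, so nothing changes until the simulation is started.
# comment="sim-wan2-down" is a hypothetical label, not something from the thread.
add chain=output out-interface=ether2-ISP2 protocol=icmp dst-address=74.6.143.25 \
    action=drop comment="sim-wan2-down" disabled=yes
# Start the simulated WAN 2 failure.
enable [find where comment="sim-wan2-down"]
# End the simulation and let the check-gateway pings through again.
disable [find where comment="sim-wan2-down"]
```

Toggling the rule instead of adding and removing it keeps the test repeatable.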

So what is the issue when WAN 2 is simulated to be down, i.e. what behaves differently from what you expect?

Clients that are already connected have no path to the internet. In most cases they are unable to re-establish existing connections. It seems the router is still trying to send traffic to the WAN that is down.

Connections that are established at the moment WAN 2 goes down cannot continue if there is NAT, because connection tracking keeps sending the packets belonging to these connections out with the source address of the WAN 2 interface even though they are now routed via WAN 1. So either the routers on the path to the destination drop them, or the server sends the response to this private address, which is then routed within the context of the server’s network, so the response can reach your router via neither WAN 1 nor WAN 2. The clients have to initiate new connections whose first packet will be sent via WAN 1 and NATed accordingly. E.g. in the case of ping, you have to wait 10 seconds so that the tracked connection is forgotten, and then try again.
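
If waiting for the tracked connections to expire is not acceptable, one possible approach is to remove the entries that were NATed to the failed WAN's address, so that the next packet from the client creates a fresh connection. This is a minimal sketch, not something suggested in the thread. It assumes the WAN 2 address 192.168.1.105 shown in the route listing, and it assumes that interrupting those sessions is acceptable.

```
/ip/firewall/connection
# Remove tracked connections whose replies are addressed to the WAN 2 address,
# i.e. connections that were src-NATed to 192.168.1.105 (taken from the listing above).
# The dots in the pattern act as regex wildcards, which is harmless for this match.
remove [find where reply-dst-address~"^192.168.1.105"]
```

Run it manually after the failure has been detected. Whether to automate it is a separate design decision.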

You can use /tool/sniffer and /ip/firewall/connection/print detail where src-address~"sour.ce.i.p:port" dst-address~"de.st.i.p:port" to visualise this.
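
As an illustration of these two tools, the following sketch lists the tracked ICMP connections of a single LAN client and then watches ICMP on each uplink. The client address 10.10.10.50 is hypothetical (any host in the 10.10.10.0/24 LAN would do), and the interface names are taken from the route listings.

```
# List tracked ICMP connections originated by the (hypothetical) client.
/ip/firewall/connection/print detail where protocol=icmp src-address~"^10.10.10.50"
# Watch which uplink the echo requests actually leave through; run one at a time.
/tool/sniffer/quick interface=ether1-ISP1 ip-protocol=icmp
/tool/sniffer/quick interface=ether2-ISP2 ip-protocol=icmp
```

In the printed connection entry, reply-dst-address shows the address the connection was NATed to, which is what reveals a connection still bound to the WAN that is down.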

Thanks @sindy. So from your perspective, my setup is working as expected based on the information I was able to provide. I’m going to keep testing scenarios. Truly appreciate all the help this group has provided.

If you can confirm that clients who use routing table to-ISP2 establish their connections via WAN 2 while the internet is reachable through WAN 2, and establish new connections via WAN 1 while it is not, then yes, it works as expected. Including the fact that existing connections via WAN 2 do not automatically continue via WAN 1 after internet becomes unreachable via WAN 2.

@sindy are the 10 seconds the TCP timeout you’re referring to?

10 seconds is the default ICMP timeout. For TCP, there are different timeouts depending on the current state of the session (24 h if everything has been ACKed, just 5 min, if I remember well, if there is unACKed data, etc.). I was referring specifically to ICMP as it is simple to test and its timeouts are short.
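
These timeouts are settings of the connection tracking module and can be inspected on the router. A minimal sketch follows; the property names in the comment are the ones that, to the best of my knowledge, correspond to the values mentioned above, so check them against your own output.

```
# Shows the current connection tracking timeouts, including icmp-timeout,
# tcp-established-timeout and tcp-unacked-timeout.
/ip/firewall/connection/tracking/print
```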

So the current solution is having problems with clients that have active sessions (which we know won’t work) AND with clients trying to establish new connections. If I use the drop script OR another method (disconnect the WAN, but the gateway IP remains up), traffic is still being sent to both WANs, even though the WAN looks down in the routing table.

I’m going to re-do this setup using ROSv6. Wonder if something is not ‘baked’ right in ROSv7 just yet. Will report my findings back soon.

OK @sindy, so you’re referring to the check-gateway ping timeout…
Yes indeed, for unACKed and retransmitted TCP packets the default timeout is 5 minutes…
But there are also 10-second timeouts for some TCP wait states etc., and I thought you were referring to them.

Not even that, it’s a coincidence (or maybe not?) that the lifetime of a pinhole (tracked connection) in firewall is 10 s by default for pings, and that the check-gateway pings are also being sent 10 s apart.

I really only suggested a connection type (ping) that is the fastest one to die off from the connection tracking module of the firewall. It allows you to initiate a new connection, rather than reusing the existing one, very soon after simulating the failure of the uplink: wait at most 10 seconds for the failure to be detected, then stop the outgoing ping for at least 10 seconds so that the pinhole can be dropped, and then ping again to see whether it succeeds via the working uplink or not (or, as @gutowscr471 says to be the case in ROS 7.1, whether it continues to be sent via the uplink that should be down).

I still don’t understand that behaviour, as when the route via WAN 2 in table to-ISP2 becomes inactive because the check-gateway ping fails, there is still the route via WAN 1 in that table, so there is no reason why packets bearing a routing-mark to-ISP2 should fall back to routing table main and use the route via WAN 2 from that table. So maybe the routing-mark is actually not assigned properly, but I’m tired of asking people over and over again to post the complete configurations rather than just those parts they deem related to their issue.
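
A few read-only commands would show whether the routing-mark is assigned as intended. This is only a sketch of what to look at. It assumes ROS 7 and that the marks are set by mangle rules with action=mark-routing; the actual mangle configuration was not posted in this thread.

```
# Routing tables that exist (to-ISP1 and to-ISP2 should be listed).
/routing/table/print
# Policy routing rules, if any are used instead of or alongside mangle.
/routing/rule/print
# Mangle rules that assign the routing-mark.
/ip/firewall/mangle/print detail where action=mark-routing
```

Posting the output of these together with the route export would cover the parts of the configuration this question depends on.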