Some users complained about connection interruptions every 10-30 minutes. After looking deeper into the problem, we determined that the Linux kernel initiates a routing table flush every 10 minutes. After each flush, connections are assigned to gateways once again and may or may not end up on the same gateway.
If you have a fully routed network (the client's address can be routed via all available gateways), a change of gateway has no ill effect. If you use masquerade, however, a change of gateway changes the packet’s source address and the connection is dropped.
This flush was introduced by the Linux kernel developers to eliminate the possibility of DoS attacks on your routers.
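To make the masquerade case concrete, here is a minimal Python sketch. It is a toy model, not RouterOS configuration: the gateway names, the public addresses and the `RemoteServer` class are assumptions made up for illustration. The only point is that the remote end identifies a connection by the source address it sees, so the same flow leaving via the other gateway looks like an unknown connection.

```python
# Hypothetical public addresses of the two masquerading gateways.
WAN_ADDRESS = {"wan1": "198.51.100.1", "wan2": "203.0.113.1"}


def masquerade(packet, gateway):
    """Rewrite the source address to the outgoing gateway's public address."""
    return {**packet, "src": WAN_ADDRESS[gateway]}


class RemoteServer:
    """Toy server that tracks connections by (source address, ports)."""

    def __init__(self):
        self.established = set()

    def receive(self, packet):
        key = (packet["src"], packet["sport"], packet["dport"])
        if packet["syn"]:
            self.established.add(key)
            return "accepted"
        # A packet for a connection the server has never seen is rejected.
        return "ok" if key in self.established else "reset"


server = RemoteServer()
syn = {"src": "192.168.0.10", "sport": 40000, "dport": 80, "syn": True}
data = {**syn, "syn": False}

first = server.receive(masquerade(syn, "wan1"))           # connection opens via wan1
same_gateway = server.receive(masquerade(data, "wan1"))   # still via wan1
other_gateway = server.receive(masquerade(data, "wan2"))  # moved to wan2 after a flush
print(first, same_gateway, other_gateway)
```

In a fully routed network the source address would not be rewritten at all, so the server would keep seeing the same key whichever gateway carried the packet.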
Hi Normis, thanks for the information re ECMP. I have changed my ECMP config to NTH load balancing with masquerade as per the Wiki. Everything seems to work fine except that the load balancing is very skewed towards the “odd” connection.
Below find the config that I have implemented on ROS3.22:
If I investigate my firewall connections, address-lists and interface loads I see the following:
[admin@MikroTik] /ip firewall address-list> print
Flags: X - disabled, D - dynamic
 #   LIST  ADDRESS
 0 D odd   192.168.254.2
 1 D odd   192.168.0.101
 2 D odd   192.168.0.62
 3 D even  192.168.0.34
 4 D odd   192.168.0.66
 5 D odd   192.168.0.32
 6 D even  192.168.254.1
 7 D odd   192.168.0.15
 8 D odd   192.168.0.31
 9 D odd   192.168.0.10
10 D even  192.168.0.5
11 D odd   192.168.0.40
12 D odd   192.168.0.63
13 D odd   192.168.0.250
Most of the IPs are assigned to the “odd” address list, i.e. only 3 out of 14 are in “even”. As a result, the load on my odd interface is also much higher than on the even interface. Am I missing something, or what could be the reason for the skewed allocation?
I have changed my config to the one listed below in order to resolve my problem, where my “odd” address list contained about 3-4 times more addresses than my “even” address list.
With the above config I do not get any duplication of addresses across the address lists, and out of a total of 15 IPs I had a split of 8/7 between my odd and even address lists.
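For reference, an 8/7 split is exactly what strict alternation produces for 15 hosts. The Python sketch below is not the mangle rules referred to above (those are not reproduced here); it only models the assumed ideal behaviour in which each newly seen address is placed in "odd" and "even" in turn. The host addresses are invented for the example.

```python
def assign_alternating(addresses):
    """Place each newly seen address in 'odd' and 'even' in turn."""
    lists = {"odd": [], "even": []}
    for index, address in enumerate(addresses, start=1):
        name = "odd" if index % 2 else "even"
        lists[name].append(address)
    return lists


# Fifteen hypothetical LAN hosts, in the order they first open a connection.
hosts = [f"192.168.0.{n}" for n in range(1, 16)]
lists = assign_alternating(hosts)

odd_count = len(lists["odd"])
even_count = len(lists["even"])
overlap = set(lists["odd"]) & set(lists["even"])
print(odd_count, even_count, len(overlap))
```

Any result far from this, such as the earlier 11/3 listing, suggests that the counter is being advanced by traffic other than the first connection of each new host.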
Yes, this seems to be doing much better. I successfully downloaded an entire ISO without the connection dropping. Obviously the flushing of the routing table was causing my issues. This solution is OK, but ECMP was better in that opening multiple connections from one host gave more “bandwidth” to that host, since those connections would be spread over both links. I have not fully completed my setup, but it does split the connections based on IP. I used the above poster’s mangle rules, but alas I have fewer hosts at this location, so I could not really see whether the odd and even marks were distributed any better. Once I have this locked down I will try one of my other test sites and report back.
Will there be any changes made by MikroTik to enable us to use the ECMP solution with masqueraded connections again? I really liked the connections getting balanced. Thanks for digging into this more deeply.
Does it affect the TCP performance when the routing table is flushed? Do we have packet drops or packet delays because of it?
To our Latvian friends: I see you have added my “Connections to the router itself” config, which we developed with mcgaiver, to the ECMP wiki, but editing is disabled, so I was not able to add the other part of my config, which works around the flushing problem so that connections stay on their proper gateways even after a flush…
Oh and by the way, could we use Private Messages on this forum again please?
If you wanted to post it, I would be happy to test it. My only problem with ECMP was the routing table flushing. If it was posted in your other thread, I missed it.
Hi,
I have an RB1000 with 2 WANs and 1 local IP… I have tried these settings, but all the connections were either odd or even; there is no balancing and only one WAN is working…
I don't want to balance between them because one WAN is 4Mb and the other is 1Mb.
Is there any setting to do this?
Thanks.
I think the best way to go is to use ECMP, which can balance between different links (4M and 1M) and distribute across all links in proportion, and to implement the workaround for the routing table flushing that I have been using for months already. In the setups of all my remote clients (who contact me via givememorebandwidth AT gmail DOT com with WinBox login info) it works perfectly, so it is tested.
Setting up a proper ECMP route gives you a controlled way to balance: you can send 4/5 of all connections to one gateway and 1/5 to another. It also provides good fail-over: it can detect whether the gateway IP replies to ICMP or ARP requests, and it can work with gateway interfaces instead of gateway IP addresses, making it perfect for the case of one ISP, the same gateway and multiple connections. Another good thing about it is that when one user downloads over several TCP connections, they are distributed across the gateways, so he can have ALL the bandwidth for himself when he needs it.
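The 4/5 to 1/5 split can be pictured as a gateway list in which the faster link simply appears four times and the slower one once. The Python sketch below only models that ratio; the gateway names are hypothetical, and the strict round-robin selection is a simplifying assumption (the kernel picks a gateway per source/destination pair, so real traffic only approaches the ratio on average).

```python
from collections import Counter
from itertools import cycle, islice


def expand_gateways(weights):
    """Repeat each gateway name according to its integer weight."""
    slots = []
    for gateway, weight in weights.items():
        slots.extend([gateway] * weight)
    return slots


def distribute(weights, n_connections):
    """Assign connections to gateway slots in strict rotation (idealised)."""
    slots = expand_gateways(weights)
    return Counter(islice(cycle(slots), n_connections))


# Hypothetical names for a 4 Mbit and a 1 Mbit uplink, weighted 4:1.
shares = distribute({"wan_4M": 4, "wan_1M": 1}, 1000)
print(shares["wan_4M"], shares["wan_1M"])
```

Weighting by link speed like this keeps the slower line from being handed as many connections as the faster one.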
I actually am using gateway names with the nth config too. This configuration has been solid. I will be rolling this to a few test sites and then going back to the ECMP configuration. I will figure out the way to work around the flushes and post it for everyone.
I agree that nth load balancing is not as granular as ECMP. However, it does not have any of the issues that require one to implement policy routing to make ECMP work properly. Furthermore, once you have forced certain connections via a particular interface with policy routing, your load balancing is skewed anyway, and you are still uncertain whether you have catered for all “exceptions”. Also, as far as I’m concerned, multiple PPTP connections will not work properly with ECMP because of the issues with policy routing PPTP.
Also please note that the order of the mangle rules in any of the nth load balancing examples is important. It will not work properly if you change the order of some of the rules.
By the way, PPTP works with ECMP + route table flush workarounds, but L2TP does not. I mean from the router itself to an external router somewhere in the wild. Tested. I wonder why L2TP does not work; what is so special about it that is different from PPTP, for example? L2TP gets hit by the flush, and the tunnel drops every 10 minutes, give or take 2 or 3. I guess the connection-state=new does not catch it.
Are you using L2TP/IPsec? If so, then yes, it would be affected by the flush, since it uses a higher-level protocol than the routing protocols. Thus when the routes are flushed, everything in the upper layers has to reconnect. Since IPsec is considered layer 4/5 (depending on who you talk to), it will have to reconnect on a route flush, since that happens at layer 3.
The mangle rules that we use (route table flush workarounds, as I call 'em) fix the TCP connections: no reconnects and no loss in performance (though performance is not 100% analyzed), so they should fix everything. But somehow we miss L2TP. Maybe it is missed by connection-state=new, as I said. Not sure.
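The workaround rules themselves are not posted in this thread, so the following Python sketch only illustrates the general idea as an assumption: remember a gateway per connection when it is first seen (the equivalent of matching connection-state=new) and keep using that record afterwards, so a flush of the route cache cannot move an existing connection. The class, the gateway names and the random gateway choice are all invented for the example.

```python
import random


class RouterSketch:
    """Toy model: a flushable route cache plus per-connection pinning."""

    def __init__(self, gateways, seed=1):
        self.gateways = list(gateways)
        self.rng = random.Random(seed)
        self.route_cache = {}  # (src, dst) -> gateway, lost on every flush
        self.conn_marks = {}   # connection tuple -> gateway, survives a flush

    def flush(self):
        """Model the periodic flush: every cached route decision is forgotten."""
        self.route_cache.clear()

    def _cached_route(self, src, dst):
        key = (src, dst)
        if key not in self.route_cache:
            self.route_cache[key] = self.rng.choice(self.gateways)
        return self.route_cache[key]

    def forward_unpinned(self, conn):
        """Plain ECMP: the gateway may change after each flush."""
        src, _sport, dst, _dport = conn
        return self._cached_route(src, dst)

    def forward_pinned(self, conn):
        """Mark the connection once when it is new, then always reuse the mark."""
        if conn not in self.conn_marks:
            src, _sport, dst, _dport = conn
            self.conn_marks[conn] = self._cached_route(src, dst)
        return self.conn_marks[conn]


router = RouterSketch(["wan1", "wan2"])
conn = ("192.168.0.10", 40000, "203.0.113.5", 80)

pinned_path = []
for _ in range(50):
    pinned_path.append(router.forward_pinned(conn))
    router.flush()  # the flush happens, but the mark is kept

print(set(pinned_path))
```

If a tunnel's traffic is never seen as a new connection by the marking step, it would get no mark and would fall back to the unpinned path, which matches the guess above about L2TP.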