Load Balancing not Balanced ?!?!

Hello All,
I have a few locations that are using Mikrotik routers. At these locations there are 50-100 users connecting through the system, so to accommodate the bandwidth needs of these people we have brought in 3 cable modems that dump directly into the Mikrotik into different ethernet ports. We then have a LAN ethernet port that the users connect through.

I have set up the load balancing as explained in the wiki, and I have also tried the persistent load balancing that creates per-user src address lists. Both work somewhat, but for some reason a lot more traffic goes out of the default WAN connection than the other two!

Here are the graphs from my 3 connections:
[Graphs: WAN1, WAN2, WAN3]

WAN3 is the ‘default’ WAN connection, because it is the connection the default route points to. If I change the default 0.0.0.0 route to WAN1 or WAN2, the new interface starts seeing most of the traffic.

It seems like the mangle rules are separating the traffic equally… so it seems there is a lot of traffic that is bypassing the mangle rules?

Has anyone experienced this?

Any ideas?

Thanks!
-Matt

Do you use web-proxy? If yes, then the ‘additional’ traffic is generated by the proxy on your ‘default’ interface (WAN3). If no, then you do not need default routes without connection marks.

No, I don’t use web-proxy.
And I only connection-mark the traffic from the users’ LAN, so I need the default route for unmarked traffic (traffic from the router itself and from management hardware not on the users’ subnet).

-Matt

Of course it will be like this. It all depends on how much traffic is coming from one group and how much from the other. Users are not drones that transmit at a fixed speed all the time; each of them uses the internet differently. It also depends on how you set up your load balancing rules. If you used the multiple gateways method, then it can’t be any different: no two connections are alike. Maybe all the torrent users are in one group and the passive emailers are in another. We need to see which load balancing method you used to understand the issue.
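
To illustrate the point above, here is a minimal Python sketch (not RouterOS code) of what happens when new connections are handed out to three links in strict rotation. The connection sizes are made up: a heavy-tailed mix is assumed, with many small connections and a few very large ones. The function name and the numbers are hypothetical and only serve the illustration.

```python
import random


def round_robin_by_connection(conn_sizes, n_links=3):
    """Assign each new connection to the next link in turn.

    Returns (connections per link, bytes per link).
    """
    counts = [0] * n_links
    totals = [0] * n_links
    for i, size in enumerate(conn_sizes):
        link = i % n_links          # strict rotation over the links
        counts[link] += 1
        totals[link] += size        # the whole connection stays on one link
    return counts, totals


# Assumed traffic mix (not measured data): Pareto-distributed sizes in bytes.
rng = random.Random(1)
sizes = [int(rng.paretovariate(1.2) * 10_000) for _ in range(3000)]

counts, totals = round_robin_by_connection(sizes)
print("connections per link:", counts)
print("byte share per link (%):",
      [round(100 * t / sum(totals), 1) for t in totals])
```

The connection counts come out equal, while the byte shares depend entirely on which link happened to receive the large transfers. Note that this kind of skew moves around at random; it would not by itself explain an imbalance that always follows the default route.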

I’ve tried the nth load balancing from the wiki, as well as the ‘persistent’ load balancing that adds users to src address lists. It always shows significantly more traffic going out of the default gateway.
In a high-usage period, if I swap the unmarked default gateway to one of the other modems, that modem then starts taking on more traffic. So it doesn’t look like one torrenting user is making one group use more than another at that time; it looks more like there is traffic that is escaping the route-marking.

If it helps here’s my config:

MANGLE:

 2   ;;; Load Balance - If user is already in SRC list bypass load balancing
     chain=prerouting action=mark-connection new-connection-mark=first 
     passthrough=yes in-interface=LAN src-address-list=first 

 3   chain=prerouting action=mark-routing new-routing-mark=first passthrough=n>
     in-interface=LAN src-address-list=first 

 4   chain=prerouting action=mark-connection new-connection-mark=second 
     passthrough=yes in-interface=LAN src-address-list=second 

 5   chain=prerouting action=mark-routing new-routing-mark=second 
     passthrough=no in-interface=LAN src-address-list=second 

 6   chain=prerouting action=mark-connection new-connection-mark=third 
     passthrough=yes in-interface=LAN src-address-list=third 

 7   chain=prerouting action=mark-routing new-routing-mark=third passthrough=n>
     in-interface=LAN src-address-list=third 

 8   ;;; Load Balance -  Spread traffic out between n gateways
     chain=prerouting action=mark-connection new-connection-mark=first 
     passthrough=yes connection-state=new in-interface=LAN 
     src-address=!10.3.0.0/24 nth=2,1,0 src-address-list=!Mesh 

 9   chain=prerouting action=add-src-to-address-list in-interface=LAN 
     connection-mark=first address-list=first address-list-timeout=4m 

10   chain=prerouting action=mark-routing new-routing-mark=first passthrough=n>
     in-interface=LAN connection-mark=first 

11   chain=prerouting action=mark-connection new-connection-mark=second 
     passthrough=yes connection-state=new in-interface=LAN 
     src-address=!10.3.0.0/24 nth=2,1,1 src-address-list=!Mesh 

12   chain=prerouting action=add-src-to-address-list in-interface=LAN 
     connection-mark=second address-list=second address-list-timeout=4m 

13   chain=prerouting action=mark-routing new-routing-mark=second 
     passthrough=no in-interface=LAN connection-mark=second 

14   chain=prerouting action=mark-connection new-connection-mark=third 
     passthrough=yes connection-state=new in-interface=LAN 
     src-address=!10.3.0.0/24 nth=2,1,2 src-address-list=!Mesh 

15   chain=prerouting action=add-src-to-address-list in-interface=LAN 
     connection-mark=third address-list=third address-list-timeout=4m 

16   chain=prerouting action=mark-routing new-routing-mark=third passthrough=n>
     in-interface=LAN connection-mark=third

NAT:

 0   chain=srcnat action=src-nat to-addresses=97.81.xxx.xxx to-ports=0-65535 connection-mark=third 

 1   chain=srcnat action=src-nat to-addresses=24.178.xxx.xxx to-ports=0-65535 connection-mark=second 

 2   chain=srcnat action=src-nat to-addresses=70.155.xxx.xxx to-ports=0-65535 connection-mark=first 

 3   ;;; masquerade hotspot network
     chain=srcnat action=masquerade src-address=10.0.8.0/24

Route:

13   S 0.0.0.0/0                          r 70.155.xxx.xxx 1        WAN1     
14 A S 0.0.0.0/0                          r 70.155.xxx.xxx          WAN1     
15 A S 0.0.0.0/0                          r 24.178.xxx.xxx          WAN2     
16 A S 0.0.0.0/0                          r 97.81.xxx.xxx             WAN3     
17   S 0.0.0.0/0                          r 24.178.xxx.xxx   5        WAN2     
18 A S 0.0.0.0/0                          r 97.81.xxx.xxx    0        WAN3

Am I missing something? :confused:
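
One way to reason about which traffic can escape the marking is to model the match conditions of the mangle rules listed above. The Python sketch below is a simplified model, not RouterOS behaviour: it ignores the 4m address-list timeout, treats the nth counter as an input, and looks only at the first packet of a new connection. The example addresses and the contents of the Mesh list are hypothetical. A result of None means the packet gets no routing mark and therefore follows the unmarked default route.

```python
from ipaddress import ip_address, ip_network

# src-address=!10.3.0.0/24 appears in rules 8, 11 and 14
EXCLUDED_NET = ip_network("10.3.0.0/24")
MARKS = ("first", "second", "third")


def routing_mark(in_interface, src, mesh_list, persistent, nth_slot):
    """Return the routing mark for a new connection's first packet, or None.

    persistent: dict mapping src address -> mark (models the address lists
                first/second/third filled by rules 9, 12 and 15)
    nth_slot:   0, 1 or 2, i.e. which of the three nth rules fires
    """
    if in_interface != "LAN":
        return None                 # every rule has in-interface=LAN
    if src in persistent:
        return persistent[src]      # rules 2-7: user already in a list
    if ip_address(src) in EXCLUDED_NET or src in mesh_list:
        return None                 # rules 8, 11, 14 do not match
    mark = MARKS[nth_slot]
    persistent[src] = mark          # rules 9, 12, 15 remember the user
    return mark


lists = {}
mesh = {"10.0.8.200"}               # hypothetical Mesh list entry
print(routing_mark("LAN", "10.0.8.25", mesh, lists, 1))   # ordinary user
print(routing_mark("LAN", "10.0.8.25", mesh, lists, 2))   # same user again
print(routing_mark("LAN", "10.3.0.7", mesh, lists, 0))    # excluded subnet
print(routing_mark("LAN", "10.0.8.200", mesh, lists, 0))  # Mesh list member
```

In this model, anything from 10.3.0.0/24, anything in the Mesh list, and anything not arriving on LAN ends up unmarked. Whether that accounts for the volume seen on the default WAN would need to be checked against real counters on the router.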

Interesting this, as I posted a couple of weeks back with exactly the same problem, and still don’t have a workable solution. We tried it with 4 gateways, with the config exactly as the wiki, except modified for the additional gateways.
We’re not so inexperienced that we’d assume all users’ traffic was alike, and we made changes similar to what you have done: we added a default gateway for ‘unmarked’ traffic, and saw that 90% of Internet-bound traffic (upload) always went out whichever interface was assigned the default route.

Nothing we tried made any sort of improvement on the flow - and we monitored it over a few days.

Eventually we gave up trying to implement a working persistent-session setup and left it as simple ECMP with static routing rules for certain protocols that tend to break with ECMP. Back to the long clumsy list of routing policy rules…

As for me, I do not use any default gateway for non-marked packets. All works fine, and the balance seems to be balanced =) I use my own address-list-based balancing method.

I never figured this out… it seems that mangle was missing packets.
What I ended up doing was creating an EoIP tunnel on each WAN interface to a 100 Mbit connection at another location, bonding the EoIP tunnels together, and creating a default route out of the bond. Now the traffic on all links is really balanced. I wish I could have gotten it to work without doing this though :confused:

-Matt
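
A rough Python sketch of why a bond can look evenly balanced where per-connection marking does not. The post does not say which bonding mode was used; the sketch assumes a mode that rotates individual packets across the links (round robin), and the 1500-byte packet size is also an assumption. The function name is hypothetical.

```python
def round_robin_by_packet(conn_sizes, n_links=3, packet_size=1500):
    """Split every connection into packets and rotate packets over the links.

    Returns the number of bytes sent on each link.
    """
    totals = [0] * n_links
    slot = 0
    for size in conn_sizes:
        remaining = size
        while remaining > 0:
            chunk = min(packet_size, remaining)
            totals[slot] += chunk           # this packet goes out on `slot`
            slot = (slot + 1) % n_links     # next packet uses the next link
            remaining -= chunk
    return totals


# One large transfer and two small ones, sizes in bytes (made-up numbers).
print(round_robin_by_packet([1_000_000, 3000, 3000]))
```

Because the large transfer is spread packet by packet, no single link carries it alone, which is the difference from handing out whole connections.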