Load Balancing over 2 Gateways

Gunzoid · Thu Oct 21, 2010 3:55 am

I have two identical DSL circuits, one connected to ether1 and the other to ether2.
They both have /30 public IPs.
Ether3 is the single local interface.

I want to send half the client IPs out ether1 and the other half out ether2.

I have read everything I could find on Load Balancing and implemented Odd/Even load balancing using nth and masquerade as in this article: http://wiki.mikrotik.com/wiki/NTH_load_ ... masquerade
It didn't work well.

Just discovered this more recent article: http://wiki.mikrotik.com/wiki/Manual:PCC

Before I implement that, does anyone have experience with what I'm trying to do or have some recommendations?

fewi · Thu Oct 21, 2010 4:33 am

When you say you want to send half the clients out one circuit and the rest out the other, do you mean you want to dynamically balance connections 50:50, or do you mean you have a /24 and want to send one /25 out one circuit and the other /25 out the other circuit?

The former, in my opinion, is best done with PCC. Pick the classifier carefully, or possibly use different classifiers for different types of traffic. HTTPS works best if the client sticks to a circuit by IP, but traffic is more evenly balanced when using ports as well. You can extend the wiki example with more lines and make different PCC decisions based on protocol and port. http://wiki.mikrotik.com/wiki/How_PCC_works_(beginner) has details on classifiers.
The latter is done by just assigning routing marks based on source address. It is easier to set up and probably easier to troubleshoot, but it potentially won't distribute traffic as evenly.
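As a rough illustration of the second (static) option, here is a minimal Python sketch of the decision a source-address rule makes: split the LAN into two /25s and pin each half to one circuit. The LAN prefix 192.168.88.0/24 and the mapping of halves to ether1/ether2 are assumptions for the example only; on the router this would be done with mangle rules and routing marks, not with Python.

```python
import ipaddress

def circuit_for_client(client_ip, lan="192.168.88.0/24"):
    """Return the circuit for a client based on which half of the LAN it is in.

    The LAN prefix and the half-to-circuit mapping are illustrative assumptions.
    """
    # Split the LAN into its two equally sized halves (two /25s for a /24).
    lower_half, upper_half = ipaddress.ip_network(lan).subnets(prefixlen_diff=1)
    addr = ipaddress.ip_address(client_ip)
    if addr in lower_half:
        return "ether1"
    if addr in upper_half:
        return "ether2"
    raise ValueError(f"{client_ip} is not inside {lan}")

print(circuit_for_client("192.168.88.10"))   # lower /25
print(circuit_for_client("192.168.88.200"))  # upper /25
```

The split is fixed, so troubleshooting is simple, but the balance depends entirely on which clients happen to be busy in each half.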

Gunzoid · Thu Oct 21, 2010 5:09 am

I originally picked odd/even load balancing based on IP address because it seemed to require minimal processing, but it did not work correctly, at least when following the wiki article on it.
No, I'm not splitting a /24.
PCC looks good, and thanks for the link.
If there are any other methods that people have tried and that work as expected, I'd like to hear about them.

WirelessRudy · Sat Oct 30, 2010 3:03 am

....
The former - in my opinion - is best done with PCC. Pick the classifier carefully, or possibly use different qualifiers for different types of traffic. HTTPS works best if the client sticks to a circuit by IP, but traffic is more evenly balanced when using ports as well. You can extend the wiki example by more lines and make different PCC decisions based on protocol and port. http://wiki.mikrotik.com/wiki/How_PCC_works_(beginner) has details on classifiers.
The latter is done by just assigning routing marks based on source address and is easier to set up and probably easier troubleshoot, but it potentially won't distribute traffic as evenly.

Hi "fewi", are you the author of the article you link to? Very nice job!
I would still like to discuss a few points.

First, until I read your wiki article I never realized that with the PCC "src-address only" classifier, traffic coming from one client ALWAYS produces the same PCC result. If a routing mark, and thus the route, follows from that result, packets from this user will always take the same route.
I expected that even "src-address only" would have a random factor in it, so that the same input (IP address) could still produce different outputs.
Do you know the process well enough to confirm or refute this?

If not, would MT be able to build in some extra input, so that more different outputs could be produced while the src-address stays the same?
I am thinking maybe of using the connection tracker, where connections are registered. These have a limited time to live. They must also have a memory address in the router. Use that field as an input to the PCC process as well. Next time, after the first connection is erased from memory ("timed out"), a connection with exactly the same parameters will be registered at a different memory location, and thus we have a new data value to feed into the PCC process. This could give the process an even more 'round robin' effect.

The reason for this is the following:
- With "src-address only" PCC I get an imbalance over my links, even with 150+ users over weeks of usage. Roughly a 60/40 split.
- Any other available option gives better load balancing but breaks the 'login request > authentication server' flow at many secure sites.
Up to now the only way I could stop these 'broken' or failing authentications is to use the "src-address only" option.
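One way to see how such an imbalance can arise is a small simulation. The sketch below is only an illustration under stated assumptions: zlib.crc32 stands in for the real PCC hash (which it is not), client addresses are made up, and per-client traffic is drawn from a heavy-tailed Pareto distribution chosen for the example. It shows that pinning whole clients to a circuit balances client counts, not traffic volume.

```python
import random
import zlib

def traffic_shares(num_clients=150, circuits=2, seed=1):
    """Return each circuit's share of total traffic when clients are pinned by source IP.

    crc32 is a stand-in hash and the Pareto traffic model is an assumption.
    """
    rng = random.Random(seed)
    totals = [0.0] * circuits
    for i in range(num_clients):
        ip = f"10.0.{i // 250}.{i % 250 + 1}"   # hypothetical client address
        volume = rng.paretovariate(1.5)          # a few clients are much heavier than the rest
        totals[zlib.crc32(ip.encode()) % circuits] += volume
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

for seed in range(3):
    shares = traffic_shares(seed=seed)
    print(seed, [f"{s:.0%}" for s in shares])
```

Changing the seed changes which clients are heavy, and with it the split, even though the client-to-circuit assignment stays identical.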

Your suggestion to set up different PCCs for different protocols or ports looks tempting, but I am not sure of the repercussions. I have never tried it, though.
PCC for port 443 only does not solve the problem of remote secure servers registering different IPs for the same client. Often the initial login page request is made over HTTP, and only after that does the browser start using HTTPS.

The only possible option I see is maybe to separate the UDP and TCP protocols. But then I won't be surprised if some P2P programs (like Skype!) don't like it. They use both TCP and UDP in their data connections, so it would be nice to have them coming from the same IP.

I agree that the larger the number of users (different IPs), the better the load distribution. But this PCC tool is a very nice one for relatively small (W)ISPs. Usually, once you get so big that you have several hundred users, you start working with big single pipelines to the backbone. Then the need for PCC is gone.
So it would be nice to develop a way to make the tool usable for relatively small networks with a small to medium number of users (50-200).

fewi · Sat Oct 30, 2010 3:53 am

Thanks! I am glad you found it helpful.

Using src-address only will put the same client always on the same circuit. I am sure of that both in theory and in practice.

I don't understand how adding more options of fields to feed into the hashing algorithm solves your problem. Connections are specific to a TCP 4-tuple consisting of the two IP addresses and ports. Using the connection identifier would mean each connection would potentially be on a different circuit, potentially breaking secure sites.

Client sessions for online banking will pretty much always consist of several connections with different 4-tuples, with changing destination IP addresses (server for dynamic content, server for pictures, etc.) and changing source and destination ports (every connection will be different here for different servers; the same server may keep a TCP connection open for multiple requests, depending on implementation). The only thing static across all that is the source IP address of the client. Since there are several connections involved, including the internal connection identifier doesn't work to keep the output of the hashing algorithm the same. The connections are unrelated to each other from the viewpoint of the router. In the case of SSL it isn't even possible for the router to look inside the payload to see any relation, if there were any to find through inspection, which there isn't.

I really don't believe you have any other options available, even for Mikrotik to develop for you. If you're already at 60/40, then using src-address for tcp/80 and tcp/443 traffic and using both-addresses-and-ports for everything else should get you to 55/45, which is hopefully close enough.
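The mixed policy suggested here can be sketched in a few lines of Python. This is only a model of the decision, not RouterOS configuration: zlib.crc32 is a stand-in for the real PCC hash, and the function names and the string key format are made up for the example. Web traffic (tcp/80 and tcp/443) is keyed on the source address alone, everything else on both addresses and ports.

```python
import zlib

WEB_PORTS = {80, 443}

def classifier_for(protocol, dst_port):
    """Pick the classifier per the policy: src-address for web, 4-tuple otherwise."""
    if protocol == "tcp" and dst_port in WEB_PORTS:
        return "src-address"
    return "both-addresses-and-ports"

def circuit(src_ip, src_port, dst_ip, dst_port, protocol, circuits=2):
    """Return the circuit index for a connection (crc32 is a stand-in hash)."""
    if classifier_for(protocol, dst_port) == "src-address":
        key = src_ip
    else:
        key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}"
    return zlib.crc32(key.encode()) % circuits

# One banking session: several connections, different ports and servers.
session = [(50001, "203.0.113.10", 80), (50002, "203.0.113.10", 443),
           (50003, "203.0.113.77", 443)]
print({circuit("10.0.0.5", sport, dst, dport, "tcp") for sport, dst, dport in session})
```

All web connections of one client land on a single circuit, so the remote site sees one public IP, while non-web traffic is still spread per connection.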

WirelessRudy · Sat Oct 30, 2010 1:55 pm

I don't understand how adding more options of fields to feed into the hashing algorithm solves your problem. Connections are specific to a TCP 4-tuple consisting of the two IP addresses and ports. Using the connection identifier would mean each connection would potentially be on a different circuit, potentially breaking secure sites.

That's why we use "src-address only" as the classifier. But OK, I will try to explain better.

We use "src-address only" in PCC (the other options break connections and are thus not recommended).
Now, a client (always the same IP) will always produce the same hash output, and if the result is a routing decision, his traffic will always take the same route out of my router.

Like your article explained:

quote: the hash function is fed 1.1.1.1 as the source IP address, 10000 as the source TCP port, 2.2.2.2 as the destination IP address and 80 as the destination TCP port. The output will be 1+1+1+1+10000+2+2+2+2+80 = 10092, the last digit of that is 2, so the hash output is 2. It will produce 2 every time it is fed that combination of IP addresses and ports. unquote.
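The simplified hash from the quoted article is easy to reproduce. The sketch below follows the arithmetic in the quote (sum the address octets and the ports, keep the last digit); it is the article's teaching model, not the actual RouterOS hash, and the function name is made up for the example.

```python
def simplified_hash(src_ip, src_port, dst_ip, dst_port):
    """Sum all address octets and both ports, then keep the last decimal digit."""
    octets = [int(octet) for ip in (src_ip, dst_ip) for octet in ip.split(".")]
    total = sum(octets) + src_port + dst_port
    return total % 10

# 1+1+1+1 + 10000 + 2+2+2+2 + 80 = 10092, last digit 2
print(simplified_hash("1.1.1.1", 10000, "2.2.2.2", 80))
```

The same inputs always give the same digit, which is exactly why a fixed classifier input pins a client to one circuit.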

Now suppose we add a new digit "1", produced by the connection tracker for new connections coming from the same client, and this digit stays the same as long as the client's IP has connections alive, even in a waiting state.
Then, if the client doesn't produce any connections for some time and the old ones have all timed out in the connection tracker, ROS has to monitor connection activity for that source address (so the router needs to maintain a table of source addresses), and the next time this source address produces new connections it will get a new digit "2".

So, an example as a variation on the lines from your article:

The hash function is fed 1.1.1.1 as the source IP address, 10000 as the source TCP port, 2.2.2.2 as the destination IP address, 80 as the destination TCP port, and one (NEW) digit "1". The output will now be 1+1+1+1+10000+2+2+2+2+80+1 = 10093, the last digit of that is 3, so the hash output is 3. It will produce 3 every time it is fed that combination of IP addresses, ports and digit.
Now, after some timeout, the extra digit changes to "2". The hash output changes to "4" for otherwise identical connections, so PCC will have a different output, mangle can then give it a different WAN routing mark, and the client's connections will start to use a different WAN port!
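To make the proposal concrete, here is the same toy hash with the proposed extra input added. This is purely a sketch of the idea being discussed: RouterOS has no such input, and the parameter name extra_digit is invented for the example.

```python
def hash_with_extra_digit(src_ip, src_port, dst_ip, dst_port, extra_digit):
    """Toy article-style hash plus the proposed (hypothetical) per-client digit."""
    octets = [int(octet) for ip in (src_ip, dst_ip) for octet in ip.split(".")]
    total = sum(octets) + src_port + dst_port + extra_digit
    return total % 10

# Same connection parameters, only the extra digit differs.
print(hash_with_extra_digit("1.1.1.1", 10000, "2.2.2.2", 80, extra_digit=1))  # 10093 -> 3
print(hash_with_extra_digit("1.1.1.1", 10000, "2.2.2.2", 80, extra_digit=2))  # 10094 -> 4
```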

In this scenario the timeout setting of the connection tracker becomes a factor too.
The shorter the timeouts, the more often a new digit will be produced for a given source address and the better the chance of good load balancing.

To summarize:
We are looking to give the hash function an extra digit that is based on a time factor (or any other factor not related to the other 4 connection tracker parameters).

What do you think of this so far?

fewi · Sat Oct 30, 2010 6:08 pm

I see.
What about when the client spends 5 minutes looking at his bank account before refreshing that page? No active connections during those five minutes, a new digit, a new circuit, and the banking web site throws an alarm and logs the client out.

WirelessRudy · Sun Oct 31, 2010 3:34 am

I see.
What about when the client spends 5 minutes looking at his bank account before refreshing that page? No active connections during those five minutes, a new digit, a new circuit, and the banking web site throws an alarm and logs the client out.

Nothing happens. These TCP connections stay in the tracker database for 24 hours by default, I believe. As long as any connection from that client's IP is still in the connection tracker database, the digit should stay the same. So new connections (after the 'reload' action) would have the same src-address and still get the same digit, and thus the same hash input and the same PCC result.
Only after all connections from the client have really 'died' (timed out) in the connection tracker's database is the digit allowed to change.

(In this scenario it might be advisable to set the default TCP connection timeout to only a few hours instead of 24. If the 24-hour default is kept and the client logs in every day, nothing changes.)

All MT needs to do is add an extra digit as input to the hash function. This digit depends on the src-address only. The digit only changes when that src-address has not been listed in the connection tracker in any way for a while and a new connection from that src-address gets listed. (So ROS needs to keep track of used IPs, even if they are no longer in the connection tracker.)

By setting the timeouts in the connection tracker smartly, say 6 hours, with the "idle time for a given LAN IP with regard to connection tracker listing" at one hour, the digit would probably only change each new day. I presume most people sleep at least 7 hours; during this time the connections will die in the tracker, the idle timeout will run out, and the moment the client starts up his PC again and goes online he will get a different digit than the day before.

I think this would produce much better load balancing over some days, even on relatively small networks, while at the same time guaranteeing the highest stability of connections.
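The bookkeeping this proposal would require can be sketched as a small state tracker. Everything here is hypothetical: the class, its method names and the one-hour idle default are invented to model the idea, and nothing like it exists in RouterOS as far as this thread establishes.

```python
class SourceDigitTracker:
    """Hypothetical per-source digit that advances only after an idle gap."""

    def __init__(self, idle_seconds=3600):
        self.idle_seconds = idle_seconds
        self.active = {}      # src -> number of connections still tracked
        self.last_empty = {}  # src -> time the last tracked connection expired
        self.digit = {}       # src -> current digit fed to the hash

    def connection_opened(self, src, now):
        """Register a new connection and return the digit to use for it."""
        if src not in self.digit:
            self.digit[src] = 1
        elif self.active.get(src, 0) == 0 and now - self.last_empty[src] >= self.idle_seconds:
            # No tracked connections and the idle period has passed: new digit.
            self.digit[src] += 1
        self.active[src] = self.active.get(src, 0) + 1
        return self.digit[src]

    def connection_expired(self, src, now):
        """Register that one tracked connection of this source timed out."""
        self.active[src] -= 1
        if self.active[src] == 0:
            self.last_empty[src] = now

tracker = SourceDigitTracker(idle_seconds=3600)
print(tracker.connection_opened("10.0.0.5", now=0))      # first connection
tracker.connection_expired("10.0.0.5", now=600)
print(tracker.connection_opened("10.0.0.5", now=900))    # idle gap too short
tracker.connection_expired("10.0.0.5", now=1000)
print(tracker.connection_opened("10.0.0.5", now=10000))  # idle gap long enough
```

Note that the router would have to keep this table for every source address it has ever seen, beyond the lifetime of the connection tracker entries themselves.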

WirelessRudy · Thu Nov 25, 2010 4:04 pm

Fewi,

No more comments on our discussion?

Anyway, I have a new problem now.
A week ago I added a 3rd ADSL line to my router and thus changed the PCC 2/0-2/1 mangle filters to PCC 3/0-3/2. (All other appropriate rules were also added for the third line.)

Got things working fine, with customers using all three lines. Traffic is going in and out of these lines.

Until people started calling in: "No Internet!"

So I checked what the real problem was, and the issue is that authentication sessions are not possible or take so much time that they time out.

I have now spent 3 days trying to find the problem, but I can't put my finger on it.
PCC has src-address only as the classifier, which should mean connections are grouped by their source IP. That should mean authentication sessions (jumping from HTTP to HTTPS and vice versa) take place in the same group and thus give the same output, which means the same routing mark.
When I set up a test PC and change the IP each time, I can make it work fine on each of the three connections. Except that for connection no. 3 it seems that authentications don't work... very weird, since this filter also has the "src-address only" classifier.
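The expectation described here, that three src-address rules with remainders 0, 1 and 2 partition clients cleanly, can be written down as a quick model. This is a sketch only: zlib.crc32 stands in for the real PCC hash, and the connection-mark names are hypothetical.

```python
import zlib

# (denominator, remainder, connection mark); the mark names are hypothetical.
RULES = [(3, 0, "WAN1_conn"), (3, 1, "WAN2_conn"), (3, 2, "WAN3_conn")]

def pcc_matches(src_ip, denominator, remainder):
    """Model of a src-address classifier rule (crc32 is a stand-in hash)."""
    return zlib.crc32(src_ip.encode()) % denominator == remainder

def mark_for(src_ip):
    """Return the single mark whose rule matches this source address."""
    matched = [mark for den, rem, mark in RULES if pcc_matches(src_ip, den, rem)]
    if len(matched) != 1:
        raise RuntimeError(f"expected exactly one matching rule, got {matched}")
    return matched[0]

for ip in ("10.0.0.5", "10.0.0.6", "10.0.0.7"):
    print(ip, mark_for(ip))
```

In this model every source address matches exactly one rule and always the same one, so a client that lands on the 3/2 rule should behave no differently from one on 3/0 or 3/1.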

I made a new topic: http://forum.mikrotik.com/viewtopic.php?f=2&t=46915
which gives the exact console output, but there are no comments on that one so far.

The strange thing is that when a connection is classified by PCC 3/2 (so only the 3rd filter) and routed to ANY of the three WANs (I just change the connection mark in the rule, whereupon the routing-mark rule gives the connection another routing mark and thus the router picks another route), the problem is there. Connections running through the first two filters are OK.*

But when I just make a 'static' or 'manual' rule to force that connection to get a connection mark from any of the three options, the problem with authentication sites disappears... Meaning that the WAN route in itself doesn't have a problem.

So, I don't know how deep your knowledge of the real PCC process goes, but it looks like PCC with 3 lines has a problem.

When I had PCC with only 2 lines I did not have the problem...

Any ideas?

*How to test each filter? By changing the source IP and then running a bandwidth test so I can see which filter gets the bulk of the traffic. I cannot force a certain IP to use one of the three PCC classifiers. The next day the same IP may even get a different PCC classifier.
