It appears the two RouterOS methods for outbound load balancing are ECMP and source routing.
Source routing is fine when you have only a few clients, but once you have hundreds and have to rely on multiple (cheap) uplinks to distribute your bandwidth evenly, source routing isn’t ideal, because you can’t always predict how busy specific groups of customers will be.
ECMP works really well in this regard but has its share of problems, such as downloads that break and certain connection-oriented applications (MSN, VoIP, etc.) that rely on packets arriving in order or from the same source address.
The advantage appears to be that “Because per-destination load balancing depends on the statistical distribution of traffic, load sharing becomes more effective as the number of source-destination pairs increase.”
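To see why the quoted claim holds, here is a rough Python simulation (not RouterOS code). It assumes a stable hash of each source/destination pair picks one of two uplinks and gives every pair a random, made-up amount of traffic; the seed, address ranges and traffic sizes are all hypothetical. With only a few pairs one link can end up carrying almost everything, while with thousands of pairs the busiest link settles close to its fair share.

```python
import hashlib
import ipaddress
import random


def pick_gateway(src: str, dst: str, n_gateways: int) -> int:
    """Map a source/destination IP pair to a gateway index (ports ignored).

    Assumption: a stable hash of the pair stands in for the router's per-pair
    choice, so every packet of the same pair uses the same gateway.
    """
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_gateways


def imbalance(n_pairs: int, n_gateways: int = 2, seed: int = 1) -> float:
    """Busiest gateway's load divided by a perfectly even share (1.0 is ideal)."""
    rng = random.Random(seed)
    loads = [0] * n_gateways
    for i in range(n_pairs):
        src = str(ipaddress.IPv4Address(0x0A000000 + i))       # clients from 10.0.0.0 up
        dst = str(ipaddress.IPv4Address(rng.getrandbits(32)))  # random remote server
        traffic = rng.randint(1, 100)                          # hypothetical load per pair
        loads[pick_gateway(src, dst, n_gateways)] += traffic
    return max(loads) / (sum(loads) / n_gateways)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000, 10000):
        print(f"{n:>6} pairs: busiest link carries {imbalance(n):.3f}x its fair share")
```

Nothing here tries to balance actively: the evening-out comes purely from the law of large numbers, which is exactly why it works better with hundreds of customers than with a handful.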
Now wouldn’t this be a super feature for RouterOS?
ECMP cannot break any connection, as it is not connection-based but IP-pair-based.
A quick Google search finds the following (something similar is mentioned in the docs, I believe):
The Linux kernel performs multipath routing on a flow-based principle. “Flow” here means “all connections with the same source and the same destination”, so if you test your ECMP setup by starting multiple FTP transfers from client A to server B, everything will flow through one interface.
A new gateway is chosen for each new source/destination IP pair. This means that, for example, one FTP connection will use only one link, but a new connection to a different server will use another link. ECMP routing has another good feature: packets of a single connection do not get reordered and therefore do not kill TCP performance.
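A toy Python model of the behaviour described in that quote: the first packet of an unseen source/destination pair picks a gateway and the choice is remembered, while ports are ignored. Picking round-robin on a miss is my simplifying assumption (the kernel's actual selection may differ), and the gateway names and addresses are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class PairRouteCache:
    """Toy model of per source/destination pair multipath selection.

    Assumption: the first packet of an unseen (src, dst) pair picks a gateway
    round-robin and the choice is cached; ports are ignored entirely.
    """
    gateways: list[str]
    _cache: dict[tuple[str, str], str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self._next_gateway = cycle(self.gateways)

    def route(self, src: str, dst: str, sport: int, dport: int) -> str:
        key = (src, dst)  # sport/dport deliberately NOT part of the key
        if key not in self._cache:
            self._cache[key] = next(self._next_gateway)
        return self._cache[key]


if __name__ == "__main__":
    rt = PairRouteCache(["gw-uplink1", "gw-uplink2"])  # hypothetical gateway names
    # Three FTP transfers from client A to server B: one pair, so one gateway.
    ftp = {rt.route("10.0.0.5", "203.0.113.10", sport, 21) for sport in (40001, 40002, 40003)}
    print(ftp)  # {'gw-uplink1'}
    # The same client talking to a different server is a new pair.
    print(rt.route("10.0.0.5", "198.51.100.7", 40004, 80))  # gw-uplink2
```

Because the key is the address pair, packets of one connection can never be split across links, which is where the "no reordering" property comes from.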
The problem is that these IM programs do not keep a single connection open the entire time you are connected; they are very transient even though they use TCP. So when you log in, you might be using one IP for the first 5 minutes, but when their systems decide to move you to another server (they use load balancing on their side too!), that server does not associate you with your previous IP and gets confused. This causes symptoms like logging out and back in repeatedly, etc. These are problems with the application’s protocol implementation, not with MikroTik.
ECMP works per IP pair and does not take ports into account, so connections to a single server on the other end should have no problems with ECMP. The trouble starts when you add many TCP connections to different machines that all work together behind one remote service, e.g. a site like Hotmail or Yahoo webmail.
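Here is a sketch of why that bites multi-server services. It assumes each uplink masquerades client traffic to its own public address (the addresses below are hypothetical) and models per-pair balancing as round-robin on each new pair. Repeated connections to one server always show the same source IP, but one login spread over several servers of the same service reaches them from different IPs.

```python
from itertools import cycle

# Hypothetical: each uplink masquerades client traffic to its own public address,
# so whichever uplink a pair is assigned decides the source IP the far end sees.
UPLINK_PUBLIC_IPS = ["192.0.2.1", "198.51.100.1"]


def public_sources_seen(client: str, servers: list[str]) -> dict[str, str]:
    """Map each contacted server to the public source IP it sees from `client`.

    Assumption: per-IP-pair balancing, modeled as round-robin on each new
    (client, server) pair; repeat connections to a known pair reuse its uplink.
    """
    next_uplink = cycle(UPLINK_PUBLIC_IPS)
    chosen: dict[tuple[str, str], str] = {}
    seen: dict[str, str] = {}
    for server in servers:
        pair = (client, server)
        if pair not in chosen:
            chosen[pair] = next(next_uplink)
        seen[server] = chosen[pair]
    return seen


if __name__ == "__main__":
    # Many connections to a single server: always the same source IP.
    single = public_sources_seen("10.0.0.5", ["203.0.113.10"] * 3)
    print(sorted(set(single.values())))  # ['192.0.2.1']
    # One login spread across several servers of the same (hypothetical) service.
    cluster = ["203.0.113.10", "203.0.113.11", "203.0.113.12", "203.0.113.10"]
    spread = public_sources_seen("10.0.0.5", cluster)
    print(sorted(set(spread.values())))  # ['192.0.2.1', '198.51.100.1']
```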
The only true way to get rid of all these issues, IMO, is to be able to send packets out of both gateways using the same source IP range, or to use policy routing and design it around how your network is tuned.
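A minimal sketch of the policy-routing alternative, assuming a hypothetical table of customer subnets, each pinned to one uplink and evaluated first-match in order. Every connection from a given client leaves through the same uplink whatever the destination, so remote services always see one source address; the price is that you balance the subnets by hand.

```python
import ipaddress

# Hypothetical policy table: customer subnets pinned to uplinks, first match wins.
POLICY = [
    (ipaddress.ip_network("10.10.0.0/24"), "uplink-A"),
    (ipaddress.ip_network("10.10.1.0/24"), "uplink-B"),
    (ipaddress.ip_network("10.10.0.0/16"), "uplink-A"),  # everything else in the /16
]


def policy_gateway(src: str) -> str:
    """Pick the uplink from the client's source address only; the destination is ignored."""
    addr = ipaddress.ip_address(src)
    for network, gateway in POLICY:
        if addr in network:
            return gateway
    raise LookupError(f"no policy rule matches {src}")


if __name__ == "__main__":
    for client in ("10.10.0.20", "10.10.1.37", "10.10.7.2"):
        print(client, "->", policy_gateway(client))
```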
They should, but they don’t. Classic downloads from a single server, classic chat systems, etc. are broken with ECMP and nobody knows what to do about it. I used ECMP for a few weeks, trying to make it work and asking MT support and other local “gurus”… and I lost a few customers during that time. ECMP doesn’t work well on MT at all. Forget about this “feature”.
I personally see the role of ECMP only in a controlled environment, and the “open internet” surely does not belong there. Come to think of it, I could use ECMP to almost double wireless transfer speeds: instead of one antenna on each end, you would use two (four in total for one complete link). Then, with ECMP, you can double (or almost double) transfer speeds.
Your outermost connection to the internet has to be a single one.
Let’s compare the advantages and disadvantages of two ECMP-“bonded” wireless links versus nstreme2:
ECMP advantages: (theoretically) higher throughput for all combined TCP/UDP streams, as this should bond two independent 20Mbps links into one “almost” 40Mbps-capable link. This 40Mbps capacity could be spread dynamically over the download/upload channels, so you can have 30Mbps down/10Mbps up one moment, then 5Mbps down/35Mbps up another moment. Although a single client will not see a faster speed on one download thread, you get better total throughput, which is a very significant advantage for you.
Nstreme2: almost true “full-duplex” operation with separate TX/RX channels. Total bandwidth is not doubled: combine two normal 20Mbps half-duplex wireless links and you get one with 20Mbps down AND CONCURRENTLY 20Mbps up. So your total throughput will not be 40Mbps in one direction as in the first case, but you will have a PERFECT 20Mbps in each direction. Sometimes, this is important.
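The capacity arithmetic from the comparison above, as a small Python sketch. It assumes the ECMP pair behaves as one 40Mbps pool shared between directions (split in proportion to demand when over-subscribed), that a single IP pair stays on one 20Mbps link, and that nstreme2 gives each direction a fixed 20Mbps. It only models capacity; it does not capture the other benefits of separate TX/RX channels that the post says sometimes matter.

```python
def ecmp_pair(down: float, up: float, per_link: float = 20.0) -> tuple[float, float]:
    """Two ECMP-bonded half-duplex links modeled as one shared 2 * per_link pool.

    Assumption: when demand exceeds the pool, it is split in proportion to demand.
    """
    pool = 2 * per_link
    total = down + up
    if total <= pool:
        return down, up
    scale = pool / total
    return down * scale, up * scale


def ecmp_single_flow(down: float, per_link: float = 20.0) -> float:
    """One IP pair stays on one link, so a lone download never exceeds per_link."""
    return min(down, per_link)


def nstreme2_pair(down: float, up: float, per_link: float = 20.0) -> tuple[float, float]:
    """Separate TX/RX channels: each direction is capped at per_link on its own."""
    return min(down, per_link), min(up, per_link)


if __name__ == "__main__":
    for down, up in ((30, 10), (5, 35), (40, 40)):
        print(f"demand {down}/{up}: ECMP {ecmp_pair(down, up)}, nstreme2 {nstreme2_pair(down, up)}")
    print("single download thread over ECMP:", ecmp_single_flow(35))
```

Under these assumptions ECMP wins on lopsided demand (30/10, 5/35), a single download thread is still capped at one link's 20Mbps, and under heavy traffic in both directions (40/40) both setups end up at 20Mbps each way.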