Merge the bandwidth of 2 ISPs into one

Hi,

I have one PTCL ISP and one static ISP in my office, and I want to merge the internet from both ISPs into one, like 15 Mbps (ISP1) + 20 Mbps (ISP2) = 35 Mbps.
These 35 Mbps should be available to all users in my DHCP pool, not to specific users. I have one pool and have already configured automatic failover; I just want to merge them into one.

Is there any help?!

You won’t be able to let a single user use the full 35 Mbps without a LOT of work/cost, and even then the results vary.

If you just want to load-balance (use both at once), it’s simple: you can add your default route with multiple gateways, “ECMP” style, like this:

/ip route add check-gateway=ping distance=1 gateway=1.2.3.4,9.8.7.6

This will essentially send connections 50/50 out of each gateway (there are 2 defined).
You can load one up more by listing it multiple times, for instance for a 66/33 split:

/ip route add check-gateway=ping distance=1 gateway=1.2.3.4,1.2.3.4,9.8.7.6

Assuming a standard NAT setup, this will still keep established connections on the same WAN. It doesn’t differentiate between connections, so one using heaps of bandwidth counts the same as one using very little; it’s more of a round-robin style of load balancing, but effectively it should split your traffic fairly well.
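A minimal sketch of what a “standard NAT setup” could look like alongside that ECMP route, assuming RouterOS v6 and hypothetical WAN interface names ether1-wan1 (ISP1) and ether2-wan2 (ISP2). Masquerade picks the address of the outgoing interface when a connection is created, and connection tracking then keeps that mapping for the rest of the connection.

```routeros
# Hypothetical interface names: ether1-wan1 = ISP1, ether2-wan2 = ISP2.
# One masquerade rule per WAN, so each new connection is src-NATed to the
# address of whichever WAN the ECMP route sent its first packet through.
/ip firewall nat
add chain=srcnat out-interface=ether1-wan1 action=masquerade
add chain=srcnat out-interface=ether2-wan2 action=masquerade
```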

That’s funny, I’ve never seen load balancing for dual-WAN (or larger) setups without mangling.
Here is one such example:

https://mum.mikrotik.com/presentations/US12/steve.pdf
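For contrast with the plain ECMP route, here is a minimal PCC (per-connection classifier) sketch of mangle-based balancing in RouterOS v6 syntax. It is not copied from the linked PDF; the interface names (bridge-lan, ether1-wan1, ether2-wan2), mark names and gateway IPs are hypothetical placeholders, and it still needs one masquerade rule per WAN. RouterOS v7 replaces routing-mark with routing tables, so the syntax differs there.

```routeros
# Hash source+destination address of each new LAN connection into one of
# two groups and pin the whole connection to a WAN with a connection mark.
/ip firewall mangle
add chain=prerouting in-interface=bridge-lan dst-address-type=!local \
    connection-mark=no-mark per-connection-classifier=both-addresses:2/0 \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes
add chain=prerouting in-interface=bridge-lan dst-address-type=!local \
    connection-mark=no-mark per-connection-classifier=both-addresses:2/1 \
    action=mark-connection new-connection-mark=wan2_conn passthrough=yes
# Every packet of a marked connection gets the matching routing mark.
add chain=prerouting in-interface=bridge-lan connection-mark=wan1_conn \
    action=mark-routing new-routing-mark=to_wan1 passthrough=no
add chain=prerouting in-interface=bridge-lan connection-mark=wan2_conn \
    action=mark-routing new-routing-mark=to_wan2 passthrough=no
# One marked default route per WAN, with the other WAN as a distance=2 backup.
/ip route
add dst-address=0.0.0.0/0 gateway=1.2.3.4 routing-mark=to_wan1 check-gateway=ping distance=1
add dst-address=0.0.0.0/0 gateway=9.8.7.6 routing-mark=to_wan1 distance=2
add dst-address=0.0.0.0/0 gateway=9.8.7.6 routing-mark=to_wan2 check-gateway=ping distance=1
add dst-address=0.0.0.0/0 gateway=1.2.3.4 routing-mark=to_wan2 distance=2
```

Because the classifier hashes both addresses, a given client/server pair always lands on the same WAN, which avoids the source-address problems discussed further down at the cost of less even spreading.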

Damn, that was a VERY interesting document!
Thanks for sharing.

It’s the dummies’ guide and the best document on load balancing I have come across.

I use my example without mangle rules just fine, even for a national ambulance service headquarters.

I use mangle rules for inbound connections (port forwards, VPN, etc.) to make sure the return traffic stays on the same WAN, but outbound NAT traffic is handled just by normal connection tracking and basic masquerade rules, and it balances quite well between their 2 providers.
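A minimal sketch of that kind of inbound pinning, again in RouterOS v6 syntax with hypothetical interface names, mark names and gateway IPs: new connections arriving on a WAN are marked, and their replies (from LAN hosts behind port forwards, or from the router itself for a VPN it terminates) are routed back out the same WAN.

```routeros
/ip firewall mangle
# Remember which WAN a new inbound connection arrived on.
add chain=prerouting in-interface=ether1-wan1 connection-state=new \
    connection-mark=no-mark action=mark-connection new-connection-mark=wan1_in passthrough=yes
add chain=prerouting in-interface=ether2-wan2 connection-state=new \
    connection-mark=no-mark action=mark-connection new-connection-mark=wan2_in passthrough=yes
# Replies from LAN hosts (port forwards) leave via the WAN they came in on.
add chain=prerouting in-interface=bridge-lan connection-mark=wan1_in \
    action=mark-routing new-routing-mark=to_wan1 passthrough=no
add chain=prerouting in-interface=bridge-lan connection-mark=wan2_in \
    action=mark-routing new-routing-mark=to_wan2 passthrough=no
# Replies generated by the router itself (e.g. a VPN server on the router).
add chain=output connection-mark=wan1_in action=mark-routing new-routing-mark=to_wan1 passthrough=no
add chain=output connection-mark=wan2_in action=mark-routing new-routing-mark=to_wan2 passthrough=no
/ip route
add dst-address=0.0.0.0/0 gateway=1.2.3.4 routing-mark=to_wan1
add dst-address=0.0.0.0/0 gateway=9.8.7.6 routing-mark=to_wan2
```

Unmarked outbound LAN traffic still follows the unmarked ECMP default route and the per-WAN masquerade rules.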

EDIT: it works as @joegoldman describes when the routing cache is on. The only drawback may be that the routing cache is flushed every 10 minutes, according to the manual.

If the routing cache is disabled, a mandatory prerequisite for this to work is that neither ISP cares about the source IP addresses of packets coming from the client, which, I admit, is often the case. That’s also, by the way, why DNS-based DDoS attacks are so easy to carry out. Plus, to work flawlessly, the bandwidth and transport delay of both uplinks must be almost the same.

The router receives a SYN packet from a LAN client, and ECMP sends it “randomly” out via WAN #1 (it’s actually not random, but let’s say it’s too complex to predict), so the whole connection gets src-NATed to ip.of.wan.1. The server sends its SYN+ACK response to ip.of.wan.1, so the response arrives at WAN #1; all good. The client’s ACK packet then goes out via WAN #2 due to ECMP, but connection tracking nevertheless changes its source address to ip.of.wan.1. If the ISP serving WAN #2 delivers such a packet, the server is happy, as it comes from the same IP address as the SYN. If the ISP serving WAN #2 drops packets from “illegal” addresses, that packet won’t get through, so the client has to retransmit it; if the retransmission hits WAN #1 due to ECMP, the connection still succeeds. And this repeats throughout the whole lifetime of the connection.

So if the ISPs don’t validate source addresses, it’s effectively the same as a random distribution of connections (not packets) in the internet->LAN direction, and a random distribution of packets (not connections) in the LAN->internet direction. If the ISPs do validate source addresses, there are many retransmissions with acknowledged transport protocols, and up to 50 % plain packet loss with transport protocols that have no acknowledgement.
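A rough worked version of the source-validating case, assuming (as a simplification, since the choice is not truly random) that each transmission independently leaves via WAN #1 or WAN #2 with probability 1/2, and that WAN #2’s ISP drops every packet carrying WAN #1’s source address. Let p be the probability that a transmission goes out the “correct” WAN, and N the number of transmissions a packet needs:

$$
p = \tfrac{1}{2}, \qquad
\mathbb{E}[N] = \sum_{k \ge 1} k\,(1-p)^{k-1}\,p = \frac{1}{p} = 2, \qquad
P(\text{first } k \text{ transmissions all dropped}) = (1-p)^{k} = 2^{-k}
$$

With a single transmission and no acknowledgement, the loss is 1 - p = 1/2, which is the “up to 50 %” figure; with acknowledgements, each packet costs two transmissions on average, plus the retransmission timeouts.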

There is also another drawback: some protocol stacks aren’t happy to receive packets out of order. If the transport delay between the LAN client and the server on the internet is about the same via each ISP, everything is fine; if there is a difference, packets sent via the faster ISP regularly overtake those sent via the slower one, often triggering unnecessary retransmissions again.

The Discher PDF works in all cases, as it controls incoming and outgoing traffic flow more closely and doesn’t rely on ISP characteristics or handling.
It’s what I would choose.

In this particular instance, using NAT only, that is not my experience. I agree it could/would happen that way in non-NAT situations, but essentially, as the connection starts, it uses the ECMP route to choose which next hop it will use and establishes the connection with that WAN’s NAT src-ip; that specific connection then stays tracked via that route/next hop until it is destroyed. The hashing algorithm seems to hold it well enough that new connections within the same session don’t flip-flop and cause issues, and all packets within an established connection seem to stay on that WAN without issue.

Not sure if this goes against others’ experience, but we’ve had no odd issues or complaints, and one of the ISPs does do source address filtering.

My bad. It’s the routing cache that makes the difference, not the connection tracking in the firewall. If it is enabled, all packets to the same destination keep using the same gateway, as you describe. If it is disabled, it behaves the way I’ve described above (per-packet distribution).

The manual states that the routing cache is flushed every 10 minutes, but maybe that is no longer true?
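On RouterOS v6 the routing cache is a global IP setting, so a quick way to check it and turn it on is sketched below. This assumes v6, where the route-cache property exists and defaults to yes; newer releases may not offer it, so check what `/ip settings print` shows on your version first.

```routeros
# Show the global IP settings, including route-cache on RouterOS v6.
/ip settings print
# Enable the routing cache; per the discussion above, with it on, packets
# to the same destination keep using the same ECMP gateway until the
# cache entry is flushed.
/ip settings set route-cache=yes
```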


Can you elaborate on what exactly you mean by connection and session here? For me, a “connection” is the internal representation of a “session” in a firewall.