EMULATING peplink BONDING with RoS

anav · October 23, 2024, 1:29pm

https://www.youtube.com/watch?v=g7-44SOtEXw

It would appear that a vendor is selling the ability to "BOND" two ISP connections, such as Starlink, so that both are utilized at the same time.
I am not sure how this is any better than, or different from, load balancing.
Trying to understand it, I think it's more like VRF (with no master and slave, but using both links), or more clearly, load balancing the traffic behind the VRF.

So the challenge is to emulate the same capability with RoS, assuming that one has access to a CHR in the cloud.
It may already be doable with current functionality that I don't use, like OSPF, BFD, BGP, etc…

Amm0 · October 27, 2024, 5:29pm

The answer I’d like to give is use /zerotier multipath settings to do your desired bonding: https://docs.zerotier.com/multipath/
Sadly that is NOT an option.

Since I occasionally use the Peplink things… I kinda know how Peplinks generally work… Also note there are additional recurring costs to do bonding (or buying TWO Peplink routers, one for each end). But they do put a “QuickSet”-like UI around anything “multiwan”, which isn’t bad. So if you want bonding or load balancing, you change some radio buttons and dropdowns in various places. It’s not really magic: they just use multiple VPN tunnels to some cloud or another Peplink to create a bond, so two ends are required for bonding, same as every other approach. They support load balancing and failover in the web UI too, and like Mikrotik those don’t require “another side”, since load balancing happens locally on the Peplink if you select that mode.

But Peplinks do focus on multiwan, so some RouterOS things are just not there or are just a simple checkbox like “Enable DHCP Server”… so for fine-grained control over things OTHER than multiwan, Peplinks are more limited in options than RouterOS.

I cannot stress enough there is a cost to bonding in latency and packet overhead. And whether bonding adds or reduces resiliency & redundancy… is a bit more nuanced question IMO. It depends on the specific WAN types/speeds and your typical traffic, i.e. a bond makes all traffic go out one place to the internet, through many steps (Peplink, Mikrotik, whatever), while something like load balancing is just simpler & lower-overhead. Most web traffic is very transitory, so handling requests with the lowest latency is often better than having more bandwidth for web pages. Now, I totally get that bonding may look good in a speedtest app. And there are legit use cases for bonding, but even with Peplink’s UI, it’s not some plug-and-play thing to bond multiple internet sources.

I didn’t watch the video, but I see two Starlinks. The WAN types matter a lot for how well bonding will work. So for discussion here let’s go with the 2 Starlinks, and let’s also keep it simple and assume the Starlinks have a public IP and are in bypass mode too. And trying to keep within the “@anav protocol family”, we’re at using two EoIP tunnels, one for each WAN, and then using a /interface/bonding interface.

To do this, you need a CHR in the cloud with good internet, and some Mikrotik router with the two Starlinks giving RouterOS two public IPs. The same applies in most cases where you have public IPs everywhere. If you don’t have a public IP, then you do need to use EoIP with IPSec/IKEv2 for sure (perhaps EoIP+WG works too), with the CHR having a public IP and being the “responder” for either IPSec/WG. Basically, RouterOS does need to “steer” each EoIP+GRE[+WG/IPSec] tunnel over one of the WANs to the “hub CHR”. The EoIP interfaces themselves should not need any /ip/address AFAIK, although one is perhaps helpful for pinging/etc.
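A minimal sketch of the plain-EoIP case (public IPs on both ends, no IPSec/WG), assuming RouterOS v7 syntax. Every address, gateway, tunnel-id and interface name here is a made-up placeholder, and pinning each tunnel to a WAN by source address with /routing/rule is just one possible way to do the “steering”, so treat it as a starting point rather than a tested config.

```routeros
# "dual starlink" router (placeholder IPs: WAN1=198.51.100.10, WAN2=203.0.113.20, CHR=192.0.2.1)
/interface eoip add name=eoip-wan1 local-address=198.51.100.10 remote-address=192.0.2.1 tunnel-id=101
/interface eoip add name=eoip-wan2 local-address=203.0.113.20 remote-address=192.0.2.1 tunnel-id=102

# One routing table per WAN, so each tunnel leaves via its own uplink
# (gateways are hypothetical; use whatever each Starlink hands out)
/routing table add name=via-wan1 fib
/routing table add name=via-wan2 fib
/ip route add dst-address=0.0.0.0/0 gateway=198.51.100.1 routing-table=via-wan1
/ip route add dst-address=0.0.0.0/0 gateway=203.0.113.1 routing-table=via-wan2
/routing rule add src-address=198.51.100.10/32 action=lookup table=via-wan1
/routing rule add src-address=203.0.113.20/32 action=lookup table=via-wan2

# "hub CHR": one tunnel per remote WAN address, tunnel-id must match the other end
/interface eoip add name=eoip-wan1 local-address=192.0.2.1 remote-address=198.51.100.10 tunnel-id=101
/interface eoip add name=eoip-wan2 local-address=192.0.2.1 remote-address=203.0.113.20 tunnel-id=102
```

Both tunnels point at the same CHR address, so without the per-WAN tables and rules they would both simply follow the default route out of one WAN.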

So assuming the two EoIP tunnels are set up and working between “dual starlink” and “hub CHR”… You just add a new /interface/bonding interface, add the two EoIP tunnels as its slaves (under “Bonding” on the new interface), and set “balance-rr” as the mode. The latter controls “how it bonds”, and in “balance-rr” Mikrotik will split packets evenly between the two (or more) links. Then in /ip/address assign the bonding interface an IP address on each end within the same subnet (like 192.168.22.1/24 on “hub CHR” and 192.168.22.254/24 on “dual starlink”). You can then add a routing table that uses 192.168.22.1 as the 0.0.0.0/0 gateway & a /routing/rule to set which IPs use the bond. The “hub CHR” will need to treat the bonding interface as “LAN”, and the “hub CHR” should do the NAT on the hub side for any local subnets coming from the remote “dual starlink” side. Obviously you do NOT want to do any NAT’ing before the bonding, nor use the EoIP links directly. Traffic gets routed via the “hub CHR” bonding IP address, since you want /ip/firewall/connection tracking working on the “hub” end with the stable/unchanging static IP there!
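The steps above can be sketched roughly as follows, assuming RouterOS v7 and two already-working EoIP interfaces on each end. The names eoip-wan1/eoip-wan2, bond-chr/bond-remote, via-bond, the LAN subnet 192.168.88.0/24 and ether1 as the CHR’s internet-facing port are all hypothetical; only the 192.168.22.x addresses come from the description above.

```routeros
# "dual starlink" router
/interface bonding add name=bond-chr slaves=eoip-wan1,eoip-wan2 mode=balance-rr
/ip address add address=192.168.22.254/24 interface=bond-chr

# Send a chosen LAN subnet (assumed 192.168.88.0/24, single simple LAN) out via the bond
/routing table add name=via-bond fib
/ip route add dst-address=0.0.0.0/0 gateway=192.168.22.1 routing-table=via-bond
/routing rule add src-address=192.168.88.0/24 action=lookup table=via-bond

# "hub CHR"
/interface bonding add name=bond-remote slaves=eoip-wan1,eoip-wan2 mode=balance-rr
/ip address add address=192.168.22.1/24 interface=bond-remote

# Route back to the remote LAN over the bond, and NAT it only here on the hub
/ip route add dst-address=192.168.88.0/24 gateway=192.168.22.254
/ip firewall nat add chain=srcnat src-address=192.168.88.0/24 out-interface=ether1 action=masquerade
```

Note there is deliberately no NAT rule on the “dual starlink” side for bonded traffic, so the hub sees the original LAN addresses and does the only translation.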

Since nothing about this is automatic, these are just starting suggestions & a generalized approach. But that will basically work…

And a bonding interface over EoIP+IPSec is at its core what Peplink’s “Speedfusion” does… except it uses MORE proprietary bits to do more sophisticated things to deal with uneven balancing, duplicating traffic over multiple links, etc. With Mikrotik’s bonding interface, you have a more limited set. Mainly “balance-rr”, which does a “round-robin” between bonded interfaces, so with two links the split is always 50/50. This has a side-effect… the max bonding speed is twice the speed of the slowest connection. So if you have a 1G link and a 100Mb link, “balance-rr” will get you a 200Mb link, leaving 900Mb of the 1G link unused.

But there are a lot of devils in the details. For example, you likely want a /queue/tree on the WAN to enforce the SAME max-limit on both WANs to deal with non-bonding traffic, and to help balance-rr even-ness. Also, with the increased latency of bonding, using “fq_codel” would almost certainly be a good idea too. But… setting up queues gets tougher when something like Starlink has variable speed… so the queues have to be sized to the LOWEST average/minimum speed to help enforce the 50%/50% split. The bonded link should be able to use ARP monitoring for basic “dead link” detection, but you’d still be advised to do the same checks on each WAN as you would for failover (i.e. /tool/netwatch, check-gateway=, recursive routes).
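A rough sketch of the queue and link-monitoring ideas above, assuming RouterOS v7 (where the fq-codel queue kind is available). The WAN interface names ether1/ether2, the 10M figure, the bond name bond-chr and the ARP target 192.168.22.1 are placeholders to adapt; the max-limit should be whatever the slower WAN can reliably sustain on upload.

```routeros
# One fq_codel queue type, reused for both WANs
/queue type add name=fq-codel-wan kind=fq-codel

# Same upload cap on each WAN (10M is a placeholder), covering all traffic leaving that port
/queue tree add name=wan1-up parent=ether1 queue=fq-codel-wan max-limit=10M
/queue tree add name=wan2-up parent=ether2 queue=fq-codel-wan max-limit=10M

# ARP-based dead-link detection on the bond, probing the far end's bonding IP
/interface bonding set [find name=bond-chr] link-monitoring=arp arp-ip-targets=192.168.22.1 arp-interval=100ms
```

The far end would need the mirror-image link-monitoring setting pointing back at this side’s bonding address.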

I’ll note Peplink is actually “less sophisticated” in the WAN monitoring options; /tool/netwatch is way richer. And their “WAN smoothing” still requires a lot of tweaking, despite the marketing. Now, it’s a reasonable drop-down UI for doing that, but it still does not solve all problems, especially if the WAN is shitty/variable/etc. But Peplink does more out of the box than Mikrotik for multiwan. Now, Peplink’s “tooltips” are actually quite helpful at navigating these things, and that part Winbox4 should pick up… since it’s like having the description from help.mikrotik.com right in the UI next to the control. So if you want to know what some option means, you click the (?) button. For example, having the description of “Bonding Modes” from https://help.mikrotik.com/docs/spaces/ROS/pages/8323193/Bonding#Bonding-Bondingmodes would be awfully helpful in the above process…

And I only cover using /interface/bonding with EoIP+IPsec, which I think is the closest poor-man’s version of Speedfusion. Others can chime in… For example, with other approaches for the “multiple starlink bonding” case:

/interface/bonding just needs an “ethernet-like” interface, so VXLAN could replace EoIP in the above. (And WG will NOT, since it’s an L3 protocol & those do NOT work with /interface/bonding.)
“Classic” bonding using MLPPP is only supported on PPPoE, not L2TP or even PPTP, which would be more useful (at least AFAIK). And even with two PPPoE internet links, MLPPP isn’t going to help if they are from separate providers.
MPLS might be an option here too… I don’t know MPLS+VPLS+TE well enough generally, or on Mikrotik, although I’d think it’s possible… but even so, I’m not sure it doesn’t just add more overhead here, given a starting MTU of 1500 and the added MPLS complexity for only two ends.
BGP gets you BFD… but similarly I’m not sure of a simple way to use it for this case (and you’d likely run into some Mikrotik BGP limitation someplace)… but there are some creative BGP folk here.

Anyway, my two cents here.

anav · October 27, 2024, 11:09pm

I am in your camp, this is nothing more than automating some tunnels (I would use EoIP and WireGuard myself) and using OSPF BFD functionality to ensure the smoothest transition between WAN links to a common CHR cloud access to the internet. The additional bit is that their concern is not transparent primary/secondary failover, it's using both WAN connections of the HOME router at the same time… more like load balancing over EoIP/WireGuard, with two connections to the net to access a CHR, and using the CHR to reach the www. If that makes any sense?

Amm0 · October 28, 2024, 12:11am

Load balancing is more effective at using all available bandwidth and easier/less complex & more straightforward on RouterOS, which is why I pitch it… But you’re right, failover is going to be noticeable since it’s connection-based. And “hitless failover” and magic bonding is what Peplink pitches. I get that the latter pitch sounds better, but it’s a lot of complexity versus potentially telling folks to “refresh your browser” when WAN failures are hopefully very rare…

So if the need is more “hitless failover”, /interface/bonding is still what you need. And since, on Mikrotik, it needs an ethernet-like interface… EoIP is the way to meet the bonding interface’s requirement. It’s entirely possible WG might be better than IPSec in this case, IDK… Since you’re the expert on WG, that seems like a fine approach to combine with EoIP and bonding. On older routers, IPSec is likely to be offloaded (where WG might not be), but on anything modern/powerful, I’m not sure IPSec vs WG matters too much here.

Also… If the traffic is truly internet traffic, you technically only need 2 x EoIP without any “VPN”, but still with the /interface/bonding interface. Now, without any VPN, GRE does need both sides to have a public IP. So using WG adds BOTH a way to deal with only one side (i.e. the CHR) having a public IP, as well as another layer of encryption. But if the bond carries ONLY internet-bound traffic, it would have already been “exposed” leaving directly via a “local” WAN to the internet… So adding encryption on the one hop to the CHR may not offer anything but overhead… Peplink supports running its tunnels without encryption to save on the overhead.
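For the case where only the CHR has a public IP, a minimal sketch of running one EoIP tunnel inside WireGuard might look like this, assuming RouterOS v7. The interface names, port 13231, the 10.255.1.0/30 transfer subnet, tunnel-id and the key placeholders are all invented for illustration, and only ONE of the two tunnels is shown. The second would repeat the pattern with its own WG interface, port and subnet, plus whatever per-WAN steering of the WG traffic you choose, which is not covered here.

```routeros
# "dual starlink" router: WG initiator towards the CHR's public IP (placeholder 192.0.2.1)
/interface wireguard add name=wg-wan1 listen-port=13231
/interface wireguard peers add interface=wg-wan1 public-key="<CHR-public-key>" endpoint-address=192.0.2.1 endpoint-port=13231 allowed-address=10.255.1.1/32 persistent-keepalive=25s
/ip address add address=10.255.1.2/30 interface=wg-wan1
# EoIP rides on the WG addresses, giving bonding the ethernet-like interface it needs
/interface eoip add name=eoip-wan1 local-address=10.255.1.2 remote-address=10.255.1.1 tunnel-id=101

# "hub CHR": WG responder (no endpoint set), UDP 13231 must be allowed in the input chain
/interface wireguard add name=wg-wan1 listen-port=13231
/interface wireguard peers add interface=wg-wan1 public-key="<remote-public-key>" allowed-address=10.255.1.2/32
/ip address add address=10.255.1.1/30 interface=wg-wan1
/interface eoip add name=eoip-wan1 local-address=10.255.1.1 remote-address=10.255.1.2 tunnel-id=101
```

The stacking costs MTU on every layer (WG, then GRE/EoIP), which is the overhead trade-off mentioned above.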

One more detail, if “super reliable” is needed: you can just duplicate all packets over BOTH WANs on Mikrotik using a different mode in /interface/bonding, i.e. “broadcast”, which would be even more “hitless failover”.
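As a small sketch, assuming an existing bonding interface on each end (hypothetically named bond-chr on the “dual starlink” router and bond-remote on the “hub CHR”), switching to packet duplication is a one-line change per side:

```routeros
# "dual starlink" router
/interface bonding set [find name=bond-chr] mode=broadcast

# "hub CHR" (mode should match on both ends)
/interface bonding set [find name=bond-remote] mode=broadcast
```

The trade-off is that every packet is sent on every slave, so usable throughput is capped by the slowest WAN rather than added up.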

When you have an office full of people and generally reliable internet (that does not often fail)… you just don’t get a lot out of it. But if your use case is that you have some critical traffic where, if the connection drops, it’s a real problem… then yeah, bonding is useful. But you can see from the length of these posts, which just gloss over details, that it’s not as simple…

Last tip… Just don’t ignore /queue/tree with /interface/bonding. Things like Peplink do some QoS/queuing internally, but they are using the same Linux queue things that RouterOS can (just with queue parameters pre-determined by Peplink UI settings). Typically they’re not very complex, but they do prevent a single WAN from saturating, which is even more important to do with bonding. But you do not have to create queues for everything/all traffic a la @pcunite’s older QoS guide (although you could). Just setting each WAN’s max-limit= to the actual minimum speed of the line really helps here to avoid bufferbloat/other related issues. /interface/bonding will only make things like bufferbloat worse… so even just a queue on upload traffic would help with any use of EoIP+bonding. I’d get it working first, but “playing” with some queueing things can do similar “WAN smoothing” as Peplink.