We have a network setup with two RB750UP routers connected through a bonding interface over two redundant links (the bond uses eth1 and eth2 on both routers). The links between the routers are provided by WiFi modules and are monitored with ARP pings. We've tried different bonding modes (LACP, active-standby, RR, XOR) with similar results. For details of the network config, please see the attached picture.
The problem is as follows: when the WiFi module for the eth2 link fails on one side (we power the module off during tests), we lose packets (~5%). The timestamps of the lost packets correspond to the times when the surviving WiFi module on the failed link sends packets towards the router (CDP or autodiscovery packets). So it seems that on receiving ANY packet on the failed link, the router decides the link has recovered and sends several packets over it until the next ARP ping fails.
The problem doesn't appear when we break the link on eth1.
network.gif
Is this all one big flat network segment, or do you have different IP ranges at ether3 on each end of this link?
(I highly recommend that it is a routed connection and not a completely bridged connection.)
If layer3, try using OSPF+BFD to load balance the links. It will naturally detect and remove the failed link / add it back when it’s up.
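A rough sketch of what that could look like on the site A router, in RouterOS v6-style syntax (menu paths and parameter names differ in other versions, so check against your release). All addresses, the /30 link subnets, and the LAN range are made up for illustration. Site B would mirror this with the other address in each /30 and its own LAN subnet.

```
# Site A (hypothetical addressing; adjust to your network)
# One small routed subnet per wireless link
/ip address add address=10.0.1.1/30 interface=ether1
/ip address add address=10.0.2.1/30 interface=ether2
# Local LAN (site B would use a different range, e.g. 192.168.2.0/24)
/ip address add address=192.168.1.1/24 interface=ether3

# Advertise both links and the LAN in the backbone area
/routing ospf network add network=10.0.1.0/30 area=backbone
/routing ospf network add network=10.0.2.0/30 area=backbone
/routing ospf network add network=192.168.1.0/24 area=backbone

# Enable BFD on the two wireless-facing interfaces for fast failure detection
/routing ospf interface add interface=ether1 use-bfd=yes
/routing ospf interface add interface=ether2 use-bfd=yes
# No OSPF neighbors are expected on the LAN side
/routing ospf interface add interface=ether3 passive=yes
```

With equal interface costs, each router should learn two equal-cost routes to the remote LAN and spread traffic across both links. When BFD stops hearing the neighbor on one link, the route through that link is withdrawn.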
It’s actually one flat segment with bridges inside RB.
Since bonding doesn't work as expected, we are now considering layer 3 for balancing and failover.
But bonding is the simpler alternative, which should work, and our local reseller assured us before purchase that it works properly on RBs.
I wouldn’t blame the Mikrotik - this is just a case of having a screwdriver and needing a socket wrench…
(meaning the ethernet bonding)
Ethernet bonding is designed to work with two directly-connected devices over links that have consistent latency and bandwidth, and are fairly reliable links.
This application is the polar opposite.
When bonding two bridged wireless links, you have multiple ethernet devices involved, so determining link failure is a workaround at best, broken at worst. There are wireless bridge devices which supply link fault propagation, but most Ubiquiti devices I've seen do not have such a feature. The very nature of wireless is unpredictable (compared to fiber or in-house wiring): bandwidth can vary from second to second, the medium is half-duplex, it is prone to transmission errors, etc. If one link is suddenly much slower than the other for some reason, it could even cause out-of-order packet issues in worst-case conditions (granted, the Internet never promised in-order delivery, but reordering is rare nowadays, so many applications assume in-order packets).
There’s nothing wrong with bonding wireless links into a single aggregate “link” but if you want to do it at a low layer, then it is best to use equipment that is designed specifically for that task and knows how to take on the challenges of wireless and hide them from the upper layers.
In the case of two RB750 devices, since they aren’t participating in the wireless, it’s better to design a link that considers the two links as “potentially unreliable by nature.”
Just my $0.02’s worth.
If you want to bridge the two LANs between the sites, I would suggest using an EoIP tunnel between the two Mikrotik routers, but using IP/OSPF between them to manage the link bonding. This will give you a bridge that is much more tolerant of topology changes and imbalanced link performance.
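As a minimal sketch for the site A router (RouterOS v6-style syntax; the interface names, loopback addresses, and tunnel-id are invented for the example, and the local-address parameter on EoIP is assumed to be available in your version): the tunnel runs between loopback addresses that OSPF advertises over both wireless links, so it stays up as long as either link works. This assumes eth1 and eth2 are already routed and running OSPF. Site B mirrors it with the two loopback addresses swapped.

```
# Site A (hypothetical names and addresses)
# An empty bridge serves as a loopback interface
/interface bridge add name=lo0
/ip address add address=10.255.255.1/32 interface=lo0
# Advertise the loopback so it is reachable over either wireless link
/routing ospf network add network=10.255.255.1/32 area=backbone

# EoIP tunnel between the loopbacks; tunnel-id must match on both ends
/interface eoip add name=eoip-to-b local-address=10.255.255.1 remote-address=10.255.255.2 tunnel-id=10

# Bridge the local LAN port with the tunnel
/interface bridge add name=br-lan
/interface bridge port add bridge=br-lan interface=ether3
/interface bridge port add bridge=br-lan interface=eoip-to-b
```

Keep in mind that EoIP adds encapsulation overhead, so the MTU across the wireless links is worth checking if large frames get fragmented.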