I’m just curious.
Why do you want to monitor the link/circuit using a software-based “in-band” solution?
The answer is hardware-based “out-of-band” link/circuit monitoring. This means the “monitoring” is done by underlay L1 hardware, with the “network” (routers) on top of it; the underlay is invisible to the upper layers. There are vendors who already make this kind of equipment (Carrier Ethernet), so BFD is not the answer.
BFD handles multiple things. Link drops, obviously, but BFD also catches STP blocks, as well as basically anything between the routers that might hard- or soft-fail.
It’s also radio/link agnostic.
What about latency and jitter?
This is not possible to measure or solve “in-band”.
Essentially irrelevant, because we don’t have a routing protocol that can account for that. Latency and jitter issues should be reported to the operator. Since a radio doesn’t have enough info to decide whether increased latency should take a link offline, there’s nothing to do.
One thing that has surprised me is that MikroTik did not develop an in-house protocol, or extensions to an existing routing protocol, that make routing decisions depend on actual wireless link characteristics such as latency, jitter due to retransmissions, and packet loss (as observed by the link equipment itself), and that learn/know the actual topology well enough to make smart decisions in a wireless network. For example, when you have these links:
A ===== B ===== C
\===============/
It could see that the two short links, A->B and B->C, may yield a better path from A to C than the longer direct link from A to C.
It would then normally route traffic via B, unless B (or one of the links through B) fails, in which case it would route directly from A to C.
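A minimal sketch of that path choice, assuming each link gets a cost derived from its measured latency in milliseconds (the numbers are made up for illustration). A plain shortest-path computation picks A->B->C while everything is up and falls back to the direct A->C link once B is gone. The shortest-path part is what OSPF already does with static costs; what is being asked for here is costs that follow the measured link quality.

```python
import heapq

def shortest_path(links, src, dst, failed_nodes=frozenset()):
    """Dijkstra over an undirected graph; links maps (u, v) -> cost.
    Returns (total_cost, path) or None if dst is unreachable."""
    # Build an adjacency list, skipping links that touch a failed node.
    adj = {}
    for (u, v), cost in links.items():
        if u in failed_nodes or v in failed_nodes:
            continue
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None

# Hypothetical latency-based costs (ms): two short hops vs. one long, slower hop.
LINKS = {("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 9}

print(shortest_path(LINKS, "A", "C"))                      # (4, ['A', 'B', 'C'])
print(shortest_path(LINKS, "A", "C", failed_nodes={"B"}))  # (9, ['A', 'C'])
```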
BGP cannot do that without extensive tweaking.
As wireless is a core business of MikroTik, I would have expected something like this to be added years ago, but it never happened.
This is what OLSR is for. MikroTik could make a fork of it and implement some sensors. We had this routing protocol running on Ubiquiti hardware with sensors for AirMax Quality and AirMax Capacity, and OLSR made smart decisions about how to route the traffic. Sometimes the routing was asymmetric, because the upload was way better than the download due to RF interference on the wireless link.
OK, I did not know about that protocol… Reading about it, it seems that “ad hoc” operation is an important part of it, and that (while valuable) is not critically important in our network.
It would be sufficient if every link had to be configured manually; links could even be entered in a central database with geographical information (which we already have).
But of course, a manufacturer designing and implementing a protocol would not want it to depend on that.
What I have been considering is a service running on some external device(s), or in a container now that containers exist, which gathers link information using SNMP and/or the API, recalculates the optimal routing, and sends updates to the routers at a regular interval to tweak the operation of an existing routing protocol (BGP or OSPF) based on the measured parameters. The actual routing would still be done by BGP or OSPF, assisted by BFD for link failures, but the routes chosen when all links are up could be optimized.
For that, it would be valuable to have an API for “temporary configuration changes”, where you could tweak e.g. routing filters or link metrics frequently without storing these changes on flash.
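A rough sketch of one iteration of such a service, assuming the link statistics have already been collected. The SNMP/API polling and the push back to the routers are left out, since they depend on the equipment. `LinkStats`, `link_cost`, `plan_updates` and the cost weights are all hypothetical; the idea is that only meaningful cost changes get pushed, which also keeps the number of configuration writes down.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkStats:
    latency_ms: float  # average RTT reported by the link equipment
    jitter_ms: float   # RTT variation
    loss_pct: float    # packet loss in percent

def link_cost(stats, base=10):
    """Hypothetical cost formula: base cost plus penalties for latency,
    jitter and loss. The weights are placeholders to be tuned per network."""
    cost = base + stats.latency_ms + 2 * stats.jitter_ms + 20 * stats.loss_pct
    # Clamp to the 16-bit range of an OSPF interface cost (1..65535).
    return max(1, min(65535, round(cost)))

def plan_updates(measurements, current_costs, min_change=5):
    """Return {link: new_cost} only for links whose cost moved by at least
    min_change, so routers are not flooded with tiny metric updates."""
    updates = {}
    for link, stats in measurements.items():
        new_cost = link_cost(stats)
        if abs(new_cost - current_costs.get(link, 0)) >= min_change:
            updates[link] = new_cost
    return updates

measurements = {
    "A-B": LinkStats(latency_ms=2.0, jitter_ms=0.5, loss_pct=0.0),
    "A-C": LinkStats(latency_ms=9.0, jitter_ms=3.0, loss_pct=1.5),
}
current = {"A-B": 12, "A-C": 20}
print(plan_updates(measurements, current))  # {'A-C': 55}
```

Each repetition interval would feed fresh measurements in and apply the returned changes; with an API for temporary (non-flash) configuration changes, that write could happen every cycle without wearing out storage.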
As intriguing as I find that WiFi routing topic, can we please stick to the topic of this thread?
There is not much to discuss w.r.t. the topic. v7 has no BFD; we all know that. It has been claimed to be “a work in progress” for over 1.5 years now, but nothing is happening.
What we discuss here (and in other topics) apparently has no influence at all; it keeps getting delayed.
OLSR doesn’t solve any of this. It makes too many design assumptions that all links are roughly equivalent in nature. OLSR would offer little to no benefit for a standard WISP design. You cannot build business-class networks on it; you can build ‘good enough in a pinch’ networks.
We have two functional protocols in OSPF and BGP that are as effective, and they already support BFD. WISPs do not generally run the lossy networks that most of the mesh protocols are meant to handle; they are also typically built with different-speed backhauls and various eras of links across the network, and with a fixed design. Much of the ‘mesh’ toolkit is wasted and most of the smarts are counterproductive. I would love to see a more advanced wireless-network-specific protocol, but that’s beyond the scope of the request for BFD. Along those lines, I’d love to see batman-adv with its link quality metrics added, as it truly steers around half-duplex links and low-quality links without taking a link down. OSPF and BGP, especially with BFD thrown into the mix, can take a network hard down when it has 2-3% packet loss due to weather or whatever, while something like batman-adv would just send us down the least lossy path.
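To illustrate “take the least lossy path” versus “take the link down”, here is a simplified, hypothetical model. It is only loosely in the spirit of batman-adv's link quality metric, not its actual algorithm: each path's end-to-end delivery probability is the product of its hops' delivery ratios, and traffic follows the best path while nothing is declared down. The loss figures are invented.

```python
from math import prod

def path_delivery(hop_losses):
    """End-to-end delivery probability: product of per-hop delivery ratios."""
    return prod(1.0 - loss for loss in hop_losses)

def best_path(paths):
    """Name of the path with the highest end-to-end delivery probability."""
    return max(paths, key=lambda name: path_delivery(paths[name]))

# Made-up per-hop loss fractions for the two ways from A to C.
paths = {"direct": [0.03], "via_B": [0.005, 0.005]}
print(best_path(paths))  # via_B  (0.990 vs 0.970)

# Weather degrades the B hops: traffic shifts back, nothing is taken down.
paths["via_B"] = [0.04, 0.02]
print(best_path(paths))  # direct (0.970 vs 0.941)
```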
BFD is a requirement for building those business-class networks, i.e. rapid failover from primary paths. It’s up to the operator to steer traffic and decide which links are better than others or are preferred for a specific site. Again, OSPF and BGP offer the tools to do this, but both of them fall down when it comes to rapid link failure detection. And while both can be tuned to be super sensitive, they then both fall down on reliability. BFD handles that link state much better and dramatically improves OSPF and BGP behavior.
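The “rapid failover” point can be put in numbers. Below is a minimal sketch of the Asynchronous-mode detection-time rule from RFC 5880 (section 6.8.4). The protocol carries intervals in microseconds; milliseconds are used here for readability, and the 300 ms / x3 values are just an example, not a recommended setting.

```python
def bfd_detection_time_ms(local_required_min_rx_ms, remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Detection time in Asynchronous mode (RFC 5880, 6.8.4): the remote
    Detect Mult times the agreed remote transmit interval, which is the
    greater of our RequiredMinRxInterval and the remote DesiredMinTxInterval."""
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval

# 300 ms intervals with a multiplier of 3: failure detected in about 900 ms.
print(bfd_detection_time_ms(300, 300, 3))  # 900
```

For comparison, a typical OSPF dead interval is 40 seconds (four missed 10-second hellos), and pushing hello/dead timers down far enough to match BFD is exactly the “super sensitive” tuning that hurts reliability.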
What has recently crossed my mind is this: as soon as anything regarding BFD, even the least significant thing, shows up in the next v7 releases, a horde of users will rush for B(FD), and certainly things will not work everywhere right away. Surely not! Imagine all the (sometimes weird) setups we have all seen with MikroTik devices.
Imagine some people with CCRs in their cores suddenly seeing a changelog with BFD stuff: “boom, yeah BFD, finally, click, activate, halt”, and the whiplash continues.
I hope MikroTik has been doing some serious testing for these 1.5+ years and that we get a really good (new?) BFD implementation in v7 that works solidly again.
Lest we forget: BFD is not a “just click this button to activate, set’n’forget” protocol! A lot depends on it, and it has to run very consistently!
Hopefully MikroTik will get back to us soon with some good news…
A long time ago I requested an option for OSPF to increase the cost by X instead of taking a link down, partly because I wanted a link to detect issues without actually failing. I.e., when there's a little packet loss, I want it to become a last-resort route.
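A sketch of what such a “degrade instead of down” option could look like. This is not an existing OSPF or RouterOS feature; the function name, the penalty of 1000 and the 1% / 0.5% thresholds are assumptions. Separate enter and exit thresholds (hysteresis) keep the route from flapping when loss hovers around a single value.

```python
def adjusted_cost(base_cost, loss_pct, penalized, penalty=1000,
                  enter_pct=1.0, exit_pct=0.5):
    """Hypothetical policy: add `penalty` to the link cost while loss is high,
    instead of declaring the link down. Returns (cost, penalized_state)."""
    if not penalized and loss_pct >= enter_pct:
        penalized = True    # loss crossed the upper threshold: demote the link
    elif penalized and loss_pct <= exit_pct:
        penalized = False   # loss fell below the lower threshold: restore it
    cost = base_cost + penalty if penalized else base_cost
    return min(cost, 65535), penalized  # stay within the 16-bit OSPF cost range

state = False
for loss in [0.0, 1.2, 0.8, 0.4]:
    cost, state = adjusted_cost(10, loss, state)
    print(loss, cost)
# 0.0 10
# 1.2 1010
# 0.8 1010   (still penalized: loss has not dropped to exit_pct yet)
# 0.4 10
```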
Well, I should say that I require BFD in two different networks:
- one is at work, where we use GRE/IPsec tunnels between offices, running over fiber and DSL internet connections, with additional L2TP/IPsec over 4G as backup
- the other is the amateur radio hobby network (HAMNET/AMPRnet), where WiFi PtP links are combined with some internet tunnels to form a network
In the first case, it works perfectly with the v6 BFD implementation, and I am only requesting that the same thing come back in v7: just a quick check that a link is down, and BGP switching to the alternative link quickly enough that VoIP users experience only a brief hiccup and computer users do not notice it at all.
Link failures are rare, but it is still important to recover from them without irritating users. The network is a star with some partial mesh (a link between branch offices), and there is nothing to be done for “optimal path” selection; it is just “working path” selection (i.e. when the DSL retrains, use the 4G).
In the second case, where the network is much larger and some links are not working that well, BFD is used to remove links that do not work, but it would be nice (as mentioned above) to have a more gradual approach. That will probably not really be possible with BFD, and I certainly do not want to see a BFD implementation delayed over and over again to improve it to the point where it could do that. Release the d*mn thing so that we can have link failure detection again!
…or at least let us help them TEST that thing and provide feedback on it!
(OT: are you a HAMNET admin from Germany?)
No, from the Netherlands. You can see my username, right?
Okay. No, I cannot see any signatures (also, I noticed last week that I cannot change mine).
I did not know about HAMNET until today. Great work. Currently reading the documentation about it. Cheers
MikroTik has not made BFD, or routing protocols in general, a priority in v7; as far as I or anyone else can tell, the priority is nice-to-have features like containers, and otherwise, as long as the routing stack is functional enough to run a basic lab network, MikroTik is satisfied. Why make excuses for MikroTik and their failures with v7? BFD is not nearly as hard to implement as OSPF or BGP, both of which MikroTik reimplemented in v7, and MikroTik implemented BFD in v6. Clearly MikroTik's developers are capable of doing the work, but someone decided that actually meeting their users' needs is not worth the time.
Well, I can understand that not all developers are equal. Being tasked with something like new storage options is a different level from writing a BGP routing engine.
But what I find most surprising is that there apparently is a developer capable of extending the “netwatch” tool (which was very limited in the past, but in v7 has been reworked to allow different “watch types”, including a ping that tolerates some packet loss), yet this same developer could not write a BFD implementation, even if only as a new watch type in Netwatch.
Frankly, we don’t know who is directing their priorities. It could be a big customer they are trying to appease. Rather than shit on them for not having the same priorities as us, we just need to tell them how important and widespread the need is.
IMO, the worst-case scenario would be if someone made the disastrous decision to develop an in-house solution from scratch, which is a pretty tough challenge given the complexity. If that’s the case, we’ll probably have to wait a long time for a first alpha release and even longer for a stable one. I can think of several reasons why it might go south.
https://github.com/dyninc/OpenBFDD
https://github.com/FRRouting/frr