I have an issue where OSPF randomly started bouncing on a fiber link.
It has been up and running for over 2 years when all of a sudden we started getting Hello timer mismatches.
I triple checked the settings and logs and nothing had changed.
To fix this problem I added encryption to the link turned the hello packet to 10s, rebooted and upgraded all routers to 6.34.1. It seemed to elevate the problem.
Until this morning when I received “Database description packet has different master status flag OSPF issue” and OSPF dropped and routed traffic stopped. But pings across the circuit have not dropped.
I have started seeing this since an upgrade to 6.35.2… on a wireless link where we have had some link stability issues.
Was about to start rolling out some upgrades after a couple of watchdog reboots over the past few days, but don’t want to end up with broken OSPF across a bunch of remote sites.
Ubiquity AirFiber A5X… and interestingly it is only on those links where we have seen the database errors.
Previously we have used broadcast network types and dynamic discovery of interfaces and neighbours without issue across Ubiquiti and other licensed link devices without issue.
These are the only ones reporting the database errors.
One of the routers is a RB951 the other one is a CRS can’t remember the model atm. No firewall rules whatsoever are being used, OSPF configuration is fine, this is a small network with less than 8 routers all of them working fine. Just this pair link isn’t working. Both devices are using the latest bug fix image.
Both routers involved have connections to other routers and no issues either with the adjacency, tested MTU, it’s as intended passing 1500 bytes packets, so no problems there whatsoever either.
In any case I tend to blame a link fault but so far we haven’t been able to find an issue with it. The link is idle when we do the test, same issue doesn’t matter if hellos are multicast o unicast.
Check the MTU on both ends of the link - all the parameters must match exactly or an adjacency won’t form. You could try debug logging for OSPF to see why it’s getting mad.
It would keep the adjacency on ex-start but on this case it forms (goes to full) and flaps randomly after x amount of time. May be one minute may be be 3 minutes but at the end it flaps, in any case the ospf configuration is right no issues there, no issues with the MTU capabilities of the transport. I’ve yet to try with some other devices to validate whether it’s the link (in my mind this is the only logical explanation other than a weird bug) , OP had the same issue with a different scenario though same issue.
If it’s flapping after establishing, I’ve seen similar behavior over UBNT P2P links (AirMax, not AirFiber), and it pretty much appeared to be related to packet loss due to either congestion spikes, or modulation changes, etc. (and this was using Cisco-to-Cisco OSPF, not Mikrotik)
Have you tried setting DSCP to EF on OSPF packets so that the UBNT gear knows that these are high-priority packets?
No, link failures produce other messages - “OSPFv2 neighbor x.x.x.x: state change from (Init|Full) to Down” should read as “dead timer expired” (no OSPF incoming packets in 40 seconds).
“packet has different master status flag” - issue with election master/slave.
I just had this exact same issue today with a client network on bugfix 6.34.6 and I resolved it by setting ospf interface priority back to the default of 1. It had been set to different integer values on several routers to force a DR Master.
Have you set an interface priority on any of the routers?
First off, thanks everyone for sharing your insights, I’ll answer some of your questions but the issue seems to have been solved (adjancency’s been up the past 3 days), it was as I suspected an issue with the transit network, the provider didn’t exactly tell me what they did, but they called me telling they discovered a fault and fixed it (didn’t touch any of our routers), everything’s been fine so far.
Haven’t changed the priority, will take this into account
Tried Broadcast, NBMA and PTP as well, same behavior
On this particular case, no wireless is involved, the transit provider reach us via FO then their CPEs provide us with copper interface, OP is the one that had the same issues with a wireless link I wrote here because I was experiencing the same problems
Router configuration is fine, no issues there, it’s pretty much a basic configuration on this particular point, the adjyacency forms both randomly happens this, I do think that it should show a different message just like you, I checked the system a few days ago and noticed that randomly I could also see LSA retransmisions (around 3 - 5, not always like 1 each hour) this is why I was blaming the link.
I’ve experienced this same behaviour for many years and the common denominator has always been Mikrotik. I used to think this was a Ubiquiti problem but i have experienced this using the following brands of wireless equipment
i’ve also experiment with different MTU’s, different priorities for NBMA Neighbors, tried point to point, broadcast, nbma etc
I’ve hunted through the forums for a long time without conclusive evidence what causes this. Strangely it seems some people are affected and others not.