For months now we have been struggling with OSPF “flapping” where OSPF will simply lose adjacency with a neighbor and then come back up.
I opened an issue with Mikrotik about it back in November and was told to upgrade to RouterOS v6.x. We have thirty+ routers, but I upgraded where I could. The issue was not resolved.
In a nutshell, we use OSPFv2 and run NBMA with MD5 authentication. We run across wireless links, which in almost all cases are UBNT RocketM5s on v5.5.4 or v5.5.6. Trust me when I say that we have tweaked OSPF settings and tried broadcast, PtP, etc. Nothing helped. Same issues.
After months of troubleshooting I think I have finally pinpointed the root cause of the problem: EoIP Tunnels!
The common denominator at all the locations which are affected is the use of EoIP on the same links that run OSPF.
The common denominator at the sites that are not affected is that their links run just OSPF and no EoIP.
Yesterday I set up a new EoIP tunnel between two of the NON-AFFECTED sites and within 24 hours the flapping started. These two sites never had an OSPF issue until that EoIP tunnel was set up. Now I’m receiving alerts as frequently as at all my other affected sites.
To Mikrotik Support
I opened Ticket#2013110566000895 on 11/6/2013 about this. I sent you very detailed configuration information for my three routers in our backbone area. As stated above, I was told to upgrade to RouterOS v6.x - which we did.
I truly believe that something else is happening here that needs attention.
What traffic levels do you typically see on the relevant interfaces? Can you share any more configuration details? It would be interesting to see if this can be replicated in a lab setup.
That’s a hard question to answer because of the diversity of sites we have and user levels.
First off, we use EoIP to keep an old legacy layer 2 network running while we migrate people off it to UBNT and onto our routed L3 network. So at most of our sites we have legacy gear + new gear and a router. We are tearing down legacy backhauls, which are all part of this old layer 2 network, and using EoIP to tunnel the layer 2 traffic where it needs to go. The layer 2 traffic is riding on new UBNT links we are putting up.
So in a single sentence, we have bits and pieces of this massive layer 2 network sitting behind our routed sites and we are using the new UBNT links to move the layer 2 traffic via EoIP.
As for traffic, it varies, as I stated. When you read this, remember that I have many EoIP tunnels and many backhaul links running OSPF. During peak hours (6 p.m. to 2 a.m.) I can see EoIP tunnel traffic anywhere from 6 Mbps to 20 Mbps. This traffic then hops onto the link running OSPF, which is moving all our routed L3 traffic from whichever site. My UBNT links can see on average up to 40-50 Mbps during peak hours. As we approach our CORE location, two things happen:
All this legacy layer 2 traffic rides a dedicated RocketM5 link to our CORE router.
The routed traffic rides a dedicated AirFiber link to our CORE router.
Now, two more things to consider:
When the OSPF flaps it only affects our routed network customers (the majority).
The EoIP is flawless and never drops. EoIP only goes down if I lose a link or power at a site.
And to throw a curve ball into this whole mix, sometimes the OSPF flaps during off peak hours when hardly any traffic is moving across a link.
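One way to check whether the flaps really track load is to split the flap timestamps into peak and off-peak buckets. Below is a minimal Python sketch of that idea. The 6 p.m. to 2 a.m. window is the peak period mentioned above; the function names and the sample timestamps are hypothetical and only there for illustration, not taken from any real log.

```python
from datetime import datetime, time

# Peak window as described in the post: 6 p.m. to 2 a.m. (wraps past midnight).
PEAK_START = time(18, 0)
PEAK_END = time(2, 0)


def is_peak(ts: datetime) -> bool:
    """Return True if the timestamp falls inside the 6 p.m. to 2 a.m. window."""
    t = ts.time()
    # The window crosses midnight, so it is "late evening OR early morning".
    return t >= PEAK_START or t < PEAK_END


def split_flaps(timestamps):
    """Split flap timestamps into (peak, off_peak) lists."""
    peak = [ts for ts in timestamps if is_peak(ts)]
    off_peak = [ts for ts in timestamps if not is_peak(ts)]
    return peak, off_peak


if __name__ == "__main__":
    # Hypothetical flap times, made up purely to exercise the function.
    flaps = [
        datetime(2014, 1, 10, 19, 5),
        datetime(2014, 1, 11, 1, 30),
        datetime(2014, 1, 11, 4, 45),
        datetime(2014, 1, 11, 12, 0),
    ]
    peak, off_peak = split_flaps(flaps)
    print(f"peak: {len(peak)}, off-peak: {len(off_peak)}")
```

If a meaningful share of flaps lands in the off-peak bucket, link saturation alone is unlikely to explain them.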
As for configuration, here’s a basic example with some items masked:
You’re not alone. I’m doing some work for a firm on some WiFi stuff, and a problem in the core is causing the service to drop clients. I’ve found myself looking at OSPF issues over these MikroTik routers. The same issue is happening, with a router losing adjacency daily. Until reading this I hadn’t considered the EoIP thing. I’ve looked, and EoIP is also terminating on the OSPF interface of this particular device. I need to find out if this is the only device with the issue. They’re not running any wireless in this network, or using MD5 or NBMA, so this is a core OSPF issue somewhere.
When you say OSPF and no EoIP, do you mean no EoIP at all on the router? Or just no EoIP tunnels terminating on the OSPF physical interface (or on its IP address, if there are VLANs)?
I’ve looked through the configuration and we only appear to have OSPF enabled on the internal/core interface (which can include sub-interfaces). The majority of these routers have public IPs on the Internet where some tunnels terminate on other devices, but obviously OSPF isn’t enabled on that physical interface. They don’t have loopbacks set up, so I’m going to recommend that those be implemented as well, for obvious reasons.
When I say OSPF and no EoIP I basically mean a wireless link that does not have an EoIP tunnel configured between the two routers the link spans.
That doesn’t mean that either router doesn’t have EoIP configured on another link.
We do not use VLANs.
The issue is prevalent across any links stemming from router(s) that have EoIP configured. Routers that do not have any EoIP configured do not experience the issue.
I can almost certainly assure Mikrotik that there is an issue here related to EoIP and OSPF. About a week ago I enabled an EoIP tunnel between two routers that never had EoIP configured. Up until that point my redundant ring of links (3 links) never had an OSPF hiccup. Almost within hours of configuring a single EoIP link I began to receive OSPF flaps. We continue to get them day in and day out.
It is extremely frustrating that Mikrotik has not chimed in on this thread. We sink thousands of dollars into their equipment throughout our network.
If they are going to offer EoIP functionality in their software then they need to support it. I understand that it is not the most solid option but in my particular case I do not have many other choices.
+1. Same problem here. We were also told to move to RoS v6.x RCdunnowhat when it came out. Same problem still on v6.9. Eventually I got tired of trying to find an error and moved to RIP.
OSPF is very MTU sensitive. If the MTUs don’t match or the MTU you are sending is larger than the MTU of the transport segment, OSPF will flap.
Have you identified and calculated all of your L2 and L3 MTUs in the path to see if there is a mismatch? If you connect two test routers with the same config via an Ethernet cable and build EoIP tunnels, does the issue persist?
I hear ya’ about RIP! Trust me, I’d use it exclusively if I could, but I need the instantaneous failover that OSPF provides.
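To make the MTU question concrete, here is a rough Python sketch of the arithmetic. The 42-byte figure is the EoIP overhead given in MikroTik’s documentation (8-byte GRE + 14-byte Ethernet + 20-byte IP). The function names are made up for illustration, and the 1500-byte values are example inputs, not measurements from anyone’s network in this thread.

```python
# EoIP encapsulation overhead per MikroTik's documentation:
# GRE header + inner Ethernet header + outer IPv4 header.
GRE_HEADER = 8
INNER_ETHERNET_HEADER = 14
OUTER_IPV4_HEADER = 20
EOIP_OVERHEAD = GRE_HEADER + INNER_ETHERNET_HEADER + OUTER_IPV4_HEADER  # 42 bytes


def eoip_outer_packet_size(inner_mtu: int) -> int:
    """Outer IP packet size needed to carry one full-size tunneled frame."""
    return inner_mtu + EOIP_OVERHEAD


def fits_without_fragmentation(inner_mtu: int, transport_mtu: int) -> bool:
    """True if a full-size tunneled frame fits in the transport link's L3 MTU."""
    return eoip_outer_packet_size(inner_mtu) <= transport_mtu


def max_inner_mtu(transport_mtu: int) -> int:
    """Largest tunnel MTU that avoids fragmentation on the transport link."""
    return transport_mtu - EOIP_OVERHEAD


if __name__ == "__main__":
    # Example inputs only: a 1500-byte tunnel MTU over a 1500-byte wireless link.
    print(eoip_outer_packet_size(1500))            # 1542
    print(fits_without_fragmentation(1500, 1500))  # False
    print(max_inner_mtu(1500))                     # 1458
```

In other words, a 1500-byte MTU on the EoIP interface over a 1500-byte transport link means full-size tunneled frames get fragmented, so "everything is set to 1500" does not by itself rule MTU out.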
IPAN -
Regarding MTU, that makes the most sense, and we have already addressed it at a few locations, but the problem persists. I’m assuming you are referring to ensuring that switch ports, router ports and EoIP bridge interfaces all have consistent MTU settings (1500)?
I’m having a similar problem. Is anyone from Mikrotik looking into this issue? This seems to be a major bug. Any advice from those who have solved it would be greatly appreciated.
Thanks a lot for that hint. I had a neighbor flapping every few minutes on a broadcast network because one router had BFD enabled on its link but others not.
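A mismatch like that is easy to miss by eye across many routers. As a rough sketch in Python, the per-interface OSPF settings could be compared across all routers on a segment. The setting names (`use_bfd`, `network_type`, `mtu`) and router names below are hypothetical placeholders, not RouterOS export fields; how the settings are collected from each router is left out.

```python
def find_setting_mismatches(routers):
    """Return {setting: {router: value}} for every setting whose value
    is not identical on all routers sharing the segment."""
    # Collect every setting name seen on any router.
    keys = set()
    for settings in routers.values():
        keys.update(settings)

    mismatches = {}
    for key in sorted(keys):
        # A router that lacks the setting entirely shows up as None.
        values = {name: settings.get(key) for name, settings in routers.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    # Hypothetical settings for three routers on one broadcast segment.
    segment = {
        "router-a": {"use_bfd": True, "network_type": "broadcast", "mtu": 1500},
        "router-b": {"use_bfd": False, "network_type": "broadcast", "mtu": 1500},
        "router-c": {"use_bfd": False, "network_type": "broadcast", "mtu": 1500},
    }
    for setting, values in find_setting_mismatches(segment).items():
        print(setting, values)
```

With the sample data only `use_bfd` is reported, which is the kind of one-sided BFD configuration described above.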