Slow OSPF SPF calculation of Mikrotik

paolobyte · Thu Sep 01, 2022 5:26 am

Hi, I've seen an issue in our network, where we use Mikrotik a lot. In the diagram below, the border routers are Juniper devices while the PEs are Mikrotiks. They are all part of OSPF area 0.
- The Junipers peer via eBGP with the upstream to receive the default route
- The Mikrotiks peer via iBGP with the Junipers to receive the default route
- The Junipers and Mikrotiks are in OSPF area 0 to learn each other's loopbacks
- Every OSPF link uses network type "point to point" with BFD on it (5 x 100ms interval)
- PE1 and PE2 are CCR2004 while the rest are RB4011. All of them are running 6.49.3
- There are 60 point-to-point OSPF links in the network in total. PEs have backup links.

The issue is observed on PE3. PE3 has downstream sites in real life, but for the purpose of demonstration I did not include them in the diagram. PE3 chooses the path via PE1 to reach the border routers. The issue is observed when:
- PE1 dies or is rebooted
- The PE1 to PE3 link goes offline

What happens is that the iBGP session on PE3 goes inactive because the border loopbacks are unreachable. The border loopbacks are no longer in PE3's routing table, BUT there is still an LSA for them. PE4 and PE5 are not affected because their path goes via PE2. It takes a while for PE3 to install the routes to the border loopbacks, sometimes 5 seconds, sometimes 20 seconds or more. This causes major disruption in the network.

I used to work at a different company that used the same design but with Cisco as the vendor, and never had this problem.

So is this normal? I am assuming that the delay in recovery is caused by Mikrotik's SPF calculation, in that it takes time for the device to work out that the best path to border1 and border2 is via PE4 -> PE5 -> PE2. Note that PE3 and its downstream sites recover at the same time. I am running a ping test to all of them and they all come back at exactly the same time.
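To make the expected SPF result concrete, here is a minimal Python sketch of Dijkstra's algorithm run over a guess at the topology. The link list and the uniform cost of 10 are assumptions, since the diagram is not reproduced here; only the node names and the PE3 -> PE4 -> PE5 -> PE2 backup path come from the description above. It is a model of what any SPF run should produce, not RouterOS's actual implementation.

```python
import heapq

def build_graph(links):
    """Turn a list of (a, b, cost) links into an undirected adjacency map."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) or (None, []) if unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None, []

def without_node(links, node):
    """Drop every link that touches the given node (models a router failure)."""
    return [link for link in links if node not in (link[0], link[1])]

# ASSUMED topology and costs; the real network has 60 links.
LINKS = [
    ("Border1", "PE1", 10), ("Border1", "PE2", 10),
    ("Border2", "PE1", 10), ("Border2", "PE2", 10),
    ("PE1", "PE3", 10),
    ("PE3", "PE4", 10), ("PE4", "PE5", 10), ("PE5", "PE2", 10),
]

before = shortest_path(build_graph(LINKS), "PE3", "Border1")
after = shortest_path(build_graph(without_node(LINKS, "PE1")), "PE3", "Border1")
print(before)  # (20, ['PE3', 'PE1', 'Border1'])
print(after)   # (40, ['PE3', 'PE4', 'PE5', 'PE2', 'Border1'])
```

On a graph this size the computation itself is trivial, so if the model is roughly right, the delay is more likely in detecting the failure or installing the routes than in the SPF run as such.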

nichky · Thu Sep 01, 2022 7:07 am

When you say slow, approximately how long does it take?
Try disabling BFD, and can you also provide tool/traceroute address=from A to B?

paolobyte · Thu Sep 01, 2022 9:27 am

As mentioned above, the time it takes is very random. The worst was 20-30 seconds. I did not time it properly, as I was capturing stats/info during the downtime window. Disabling BFD doesn't seem right, because that would also result in complete loss of forwarding and could prolong the downtime. Traceroute from PE3 to the border shows nothing because there's no route for it. From the border to PE3 it does work, because the border has a route to PE3 via PE2. However, the last hop is missing because PE3 can't respond to the trace.
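Since the outage was not timed precisely, one way to get a repeatable number is to log ping results with timestamps and compute the outage window afterwards. The sketch below is only an illustration: `outage_windows` is a hypothetical helper, and the sample data is synthetic, not a capture from this network.

```python
def outage_windows(samples):
    """Find runs of lost probes.

    samples: list of (timestamp_ms, reply_received), sorted by time.
    Returns a list of (start_ms, end_ms, duration_ms), where start is the
    first lost probe and end is the first reply after it. A run that is
    still down at the end of the capture gets end and duration of None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts, ts - start))
            start = None                    # outage ends
    if start is not None:
        windows.append((start, None, None))
    return windows

# Synthetic example: one probe per second, replies lost from 3 s to 43 s.
samples = [(t, not (3000 <= t < 43000)) for t in range(0, 46000, 1000)]
print(outage_windows(samples))  # [(3000, 43000, 40000)]
```

The resolution is limited by the probe interval, so a faster ping interval gives a tighter estimate.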

connectlife · Fri Sep 02, 2022 11:30 am

Hi, we also have a similar network: two CCR1072 routers (R1 / R2) doing eBGP with the upstreams.

Two PEs (R3 / R4), also CCR1072, do iBGP with each other and with R1 / R2. The PEs have 2 uplinks on 2 different switches. If I restart a switch and thereby lose the OSPF path over that link, the PEs also take 30 seconds to redirect traffic to the active OSPF link. BFD is enabled. RouterOS v6.49.6. In my opinion the slowdown is due to the routing engine not inserting the route into the table. RouterOS 6 in fact uses a single CPU (always at 100% in our case) for BGP.

paolobyte · Sat Sep 03, 2022 1:12 pm

I timed the downtime I am experiencing. It is always close to 40 seconds. That is really odd, because isn't that the default dead timer? So is there a connection? Did you find any workaround?
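The arithmetic behind that suspicion can be sketched as follows. The BFD values (100 ms x 5) are from the first post; the 10-second hello and the dead interval of four hellos are the commonly used OSPF defaults and are assumed here, not read from these routers. The function names are made up for illustration.

```python
def bfd_detection_time_s(interval_ms, multiplier):
    """Time for BFD to declare a neighbor down, in seconds."""
    return interval_ms * multiplier / 1000

def ospf_dead_interval_s(hello_s, dead_multiplier=4):
    """OSPF dead interval, assumed to be a multiple of the hello interval."""
    return hello_s * dead_multiplier

def closest_timer(observed_s, timers):
    """Name of the timer whose value is nearest to the observed outage."""
    return min(timers, key=lambda name: abs(timers[name] - observed_s))

timers = {
    "bfd_detection": bfd_detection_time_s(100, 5),   # 0.5 s
    "ospf_dead_interval": ospf_dead_interval_s(10),  # 40 s (assumed defaults)
}
print(timers)
print(closest_timer(40, timers))  # ospf_dead_interval
```

If BFD were driving the recovery, the outage should be nearer to half a second. An outage of about 40 seconds is at least consistent with the adjacency only being torn down by the dead timer, although this does not prove it.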

connectlife · Sat Sep 03, 2022 2:49 pm

Yes, we also see these times. I have to try v7, even though it doesn't have BFD. What version are you using?

paolobyte · Sat Sep 03, 2022 3:10 pm

6.49.3
Can't live without BFD, so ROS7 is not an option for us.
Lowering the OSPF timers causes issues for us. Sometimes OSPF randomly flaps.

paolobyte · Mon Sep 19, 2022 3:46 pm

Any help out there?
