Hi, I've run into an issue in our network, where we use Mikrotik a lot. In the diagram below, the border routers are Juniper devices while the PEs are Mikrotiks. They are all part of OSPF area 0.
- The Junipers peer over eBGP with the upstream to receive the default route
- The Mikrotiks peer over iBGP with the Junipers to receive the default route
- The Junipers and Mikrotiks are all in OSPF area 0 to learn each other's loopbacks
- Every OSPF link uses network type "point to point" with BFD on it (5 x 100ms interval)
- PE1 and PE2 are CCR2004s while the rest are RB4011s. All of them are running 6.49.3
- There are 60 point-to-point OSPF links in the network in total. The PEs have backup links.
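For reference, here is a rough sketch of the failure detection time those BFD settings should give. It assumes "5 x 100ms" means a multiplier of 5 with a 100 ms interval, configured symmetrically on both ends of each link; the function name is made up for illustration.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate BFD detection time with symmetric timers.

    Assumption: both ends use the same interval and multiplier, so the
    session is declared down after `multiplier` consecutive missed packets.
    """
    return interval_ms * multiplier


# Settings from the list above: 100 ms interval, multiplier 5
print(bfd_detection_time_ms(100, 5))  # 500
```

So under those assumptions a dead link or neighbour should be detected in roughly half a second.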
The issue is observed on PE3. PE3 has downstream sites in real life, but for the purpose of demonstration I did not include them in the diagram. PE3 prefers the path via PE1 to reach the border routers. The issue is observed when:
- PE1 dies or is rebooted
- The PE1 to PE3 link goes offline
What's happening is that the iBGP sessions on PE3 go inactive because the border loopbacks become unreachable. The border loopbacks are no longer in PE3's routing table BUT PE3 still has LSAs for them. PE4 and PE5 are not affected because their path goes via PE2. It takes a while for PE3 to reinstall the routes to the border loopbacks: sometimes 5 seconds, sometimes 20 seconds or more. This causes major disruption in the network.
I previously worked at a different company that used the same design with Cisco as the vendor, and never had this problem.
So is this normal? I am assuming that the delay in recovery is caused by Mikrotik's SPF calculation, in that it takes time for the device itself to work out that the best path to border1 and border2 is via PE4 -> PE5 -> PE2. Note that PE3 and its downstream sites recover at the same time. I am running a ping test to all of them and they all come back at exactly the same time.
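To make the expected SPF result concrete, here is a minimal Python sketch of a Dijkstra run from PE3 before and after the failure. The link list is reconstructed from the description only (the real diagram and OSPF costs are not reproduced here), and the uniform cost of 10 per link is an assumption; `build_graph` and `shortest_path` are illustrative names, not anything from RouterOS.

```python
import heapq

# Assumed topology, reconstructed from the description above.
LINKS = [
    ("border1", "PE1"), ("border2", "PE1"),
    ("border1", "PE2"), ("border2", "PE2"),
    ("PE1", "PE3"), ("PE3", "PE4"),
    ("PE4", "PE5"), ("PE5", "PE2"),
]


def build_graph(links, cost=10, down_nodes=(), down_links=()):
    """Build an adjacency map, skipping failed routers and failed links."""
    graph = {}
    for a, b in links:
        if a in down_nodes or b in down_nodes:
            continue
        if (a, b) in down_links or (b, a) in down_links:
            continue
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (total_cost, path) or None if unreachable."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, link_cost in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(queue, (dist + link_cost, nbr, path + [nbr]))
    return None


# Normal state: PE3 reaches border1 via PE1.
print(shortest_path(build_graph(LINKS), "PE3", "border1"))
# (20, ['PE3', 'PE1', 'border1'])

# PE1 down: the only remaining path is PE4 -> PE5 -> PE2.
print(shortest_path(build_graph(LINKS, down_nodes=("PE1",)), "PE3", "border1"))
# (40, ['PE3', 'PE4', 'PE5', 'PE2', 'border1'])
```

Passing `down_links=[("PE1", "PE3")]` instead models the second failure case and gives the same backup path. This only shows the result SPF should converge to; it says nothing about how RouterOS schedules the calculation or installs the routes.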