OSPF Transit Fabric - Issues with OSPF/BGP Drops

Hello!

I have implemented the OSPF transit fabric in my network following a post and recommendation from IP Architects. This was a great fix for having unequal links between towers, and it lets us use OSPF with ECMP to load balance across the two or three backhauls between our towers and fiber uplinks. This has been working great for some time, but as outlined in this post http://forum.mikrotik.com/t/ospf-transit-fabric/136307/1 we have had some similar issues.

This seems to only affect our MikroTik routers with OSPF and BGP enabled (CCR1036 and RB4011). When we shut off a VLAN with OSPF running (for example, a 60 GHz radio goes down in rain, or we manually shut down the VLAN interface to do work), that router shuts down its OSPF and BGP processes, loses all neighbors, and has to reconnect them. During that time, I completely lose access to the router if it is remote.

We are currently running more than 63 VLANs with OSPF between 13 routers. Our core route reflector is a Dell S4048-ON, which does not have a problem when OSPF goes up or down within the network. I have seen this issue happen with most of the MikroTik routers on the network: if I enable or disable a VLAN that has OSPF on it, that device will usually drop all OSPF and BGP connections.

I am in the process of raising the L2 MTU to the maximum on all switches and routers, as suggested in the previous post. However, I am curious whether there have been any fixes for this in newer firmware. We are running long-term releases on ours: 6.47.9 (CCR1036), 6.48.6, 6.48.2, 6.48.3 (RB4011). I was also thinking of disabling connection tracking, if that would resolve this issue. We do not use NAT, so I do not believe disabling it would be a problem, unless it is necessary for ECMP?

As discussed in the previous post, I was also seeing the same log messages, “discarding packet : locally sourced” as well as “Received update of self-originated LSA”, which made me think for quite some time that there was a loop somewhere.

I am also curious whether anyone else is using this transit fabric on a larger network, to see if there are any scalability issues I might want to consider.

@gius64 @IPANetEngineer

Thank you

Hi
I was about to post something similar when I found your post.

We have been using Network Fabric (or Transit Fabric) for some time now, but only between two towers, and it worked great. (We have multiple sites that are not directly connected to each other, each with a couple of WiFi links across a larger area, but we’d usually use the fabric only for the link between the internet-connected MT and one of the other towers. So it was a tower-to-tower configuration on multiple unconnected sites.) We started with rOS 6.x but had to move to 7 (don’t ask me why) and, apart from seeing a bunch of “routes” that were (I guess) hidden under rOS 6, everything was working fine…

Just recently, we added three pairs of links on top of this configuration and we’re seeing OSPF drops, but we’re still unable to figure out what’s going on. We placed everything in one OSPF area (I wonder if it would be better to use a different area for each pair of towers) and we use MPLS/VPLS on top of it (so I’m not sure where it would leave me if I had multiple areas: would I still be able to “route” everything correctly?).

Now I’m afraid that adding more NF/TF links to the configuration will make the whole system unstable. Will losing one VLAN/WiFi link take down the whole OSPF network, and MPLS/VPLS with it, until the routes are recalculated?

I’ll try to investigate more and post my findings, but if anybody has similar issues and/or success in making NF/TF stable, please share :)

Thanks

Interesting, suddenly seeing the same behaviour.

http://forum.mikrotik.com/t/ospf-link-change-causes-all-bgp-sessions-to-drop/173659/1