sri2007
Member Candidate
Topic Author
Posts: 178
Joined: Wed May 20, 2015 10:14 pm
Location: Quito

Jumbo Frames, L2MTU mismatch with RouterOS crashing

Tue Apr 23, 2019 8:35 pm

Hello everyone! I hope you can help us. I'd like to check whether I'm doing something wrong or RouterOS has a bug (we're currently on 6.42.12 long-term).

Our topology consists of about 350 routers deployed around the country, running OSPF (multi-area) + BGP. Everything was going fine until we got the requirement to allow jumbo frames on our network, so we started moving everything to an L2MTU of 10000 (on CCRs only). That was fine too, but a few weeks ago most routers suddenly went down. We checked everything, and the affected routers fail at layer 2 (no MAC-Telnet, no RoMON, no ARP, no LLDP); everything recovers once the router is rebooted (manually or via watchdog).

After a few weeks of testing different scenarios, we found that all of the affected routers are configured with the default L2MTU (1580 to 1600) and are connected to a new router whose interface has an L2MTU of 10000. The trigger was an LSA flood (after a path inside the area reconverged): the router went down and the watchdog rebooted it.
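
A minimal sketch of how this kind of mismatch could be flagged from an inventory, assuming you already have the L2MTU of both ends of every link exported somewhere. The inventory format, the router and interface names, and the function name `find_l2mtu_mismatches` are all hypothetical; nothing here talks to RouterOS.

```python
from typing import NamedTuple


class LinkEnd(NamedTuple):
    router: str      # hypothetical router name
    interface: str   # hypothetical interface name
    l2mtu: int       # L2MTU configured on that interface


def find_l2mtu_mismatches(links):
    """Return the links whose two ends have different L2MTU values."""
    return [(a, b) for a, b in links if a.l2mtu != b.l2mtu]


# Hypothetical inventory: one jumbo-to-default link, one jumbo-to-jumbo link.
links = [
    (LinkEnd("core-1", "sfp-sfpplus1", 10000), LinkEnd("edge-7", "ether1", 1580)),
    (LinkEnd("core-1", "sfp-sfpplus2", 10000), LinkEnd("core-2", "sfp-sfpplus1", 10000)),
]

for a, b in find_l2mtu_mismatches(links):
    print(f"{a.router}/{a.interface} L2MTU {a.l2mtu} != "
          f"{b.router}/{b.interface} L2MTU {b.l2mtu}")
```

With the sample inventory only the core-1 to edge-7 link is reported, which is the default-versus-10000 combination described above.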

Do you know if an L2MTU mismatch can cause this issue?
MikroTik Support and Consulting - Español / English +593 98 709 3502
https://www.safenet.ec/consultoria.html/ soporte@safenet.ec
 
sri2007

Re: Jumbo Frames, L2MTU mismatch with RouterOS crashing

Mon May 13, 2019 6:17 pm

Hi everyone, or at least those who read this earlier and had no idea about a solution. I think we found the real issue: it was related to RoMON. We had that feature enabled across our entire network for a long time (back when we were using the default L2MTU), and it kept running once part of the network moved to a different L2MTU. We found that when a path goes down, OSPF is not the only thing that converges; RoMON does too. For some reason (MikroTik is still analyzing the supout info), when a MikroTik with a different L2MTU gets a RoMON update with a higher L2MTU, the interface crashes. In some cases it produces a kernel failure with the message "out of memory condition was detected"; on other routers we only got "kernel failure in previous boot".

We discovered this by accident: we added a new router with RoMON enabled for testing and the network went down (the watchdog rebooted everything). After analyzing that, we started removing RoMON, and every single time we removed RoMON from a device, all of the routers connected to that device were immediately rebooted by the watchdog. The same scenario occurred later with different MikroTik devices.

At the end of that day, we started by disabling RoMON on all of the routers with an L2MTU of 10000. That didn't cause a crash, and we were then able to remove RoMON from the entire network without crashing any routers.
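
The order that worked for us (routers with an L2MTU of 10000 first, everything else afterwards) can be sketched as a simple sort. This is only an illustration: the router list, the one-L2MTU-per-router simplification, and the function name `romon_disable_order` are hypothetical, and the script only computes an order, it does not change anything on a router.

```python
def romon_disable_order(routers, jumbo_l2mtu=10000):
    """Put routers at the jumbo L2MTU first and keep the rest in their original order.

    sorted() is stable and False sorts before True, so routers whose L2MTU
    equals jumbo_l2mtu come first without reshuffling either group.
    """
    return sorted(routers, key=lambda r: r["l2mtu"] != jumbo_l2mtu)


# Hypothetical inventory, one L2MTU value per router for simplicity.
routers = [
    {"name": "edge-7", "l2mtu": 1580},
    {"name": "core-1", "l2mtu": 10000},
    {"name": "edge-9", "l2mtu": 1600},
    {"name": "core-2", "l2mtu": 10000},
]

order = [r["name"] for r in romon_disable_order(routers)]
print(order)
```

In this sample the two routers at 10000 (core-1, core-2) are listed before the two routers still on the default L2MTU.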
 
vecernik87
Long time Member
Posts: 619
Joined: Fri Nov 10, 2017 8:19 am

Re: Jumbo Frames, L2MTU mismatch with RouterOS crashing

Tue May 14, 2019 5:42 am

Thanks for sharing! This is actually very interesting to know.
I wouldn't have expected it, but I'm also not very surprised, since RoMON has unresolved issues when a link has an MTU below 1500 (typically L2 tunnels, etc.).
 
sri2007

Re: Jumbo Frames, L2MTU mismatch with RouterOS crashing

Tue May 14, 2019 6:33 am

No problem! I think that's the whole idea of the forum :)

But yes, I still can't believe it. It's been about a week since we made that change and the network has stayed stable. An interesting point here: I do have some GRE tunnels between cities with a lower MTU, with RoMON enabled, and that was still working great. The big issue started when the path with an L2MTU of 10000 started to flap.
