About 3 months ago we noticed on our core routers that routing was going down for a minute or two and then coming back by itself. It happens about once a week: OSPF and BGP on this core router drop and then come back. A real problem when it’s one of your core routers.
So we did the usual and took supout RIFs and sent them through to MT at the start of July.
Over two months later MT came back with:
Routing crashed on the router. Most likely it happens due to interface status change on which OSPF is running. We are trying to fix this problem. Do you have any more details on how to repeat the problem?
Which is nice to know; shame it took 2 months to get a response. We decided to do a clean netinstall of the router a week later, after MT said that might help. The problem still happens each week, under different loads, route table sizes, etc. Our last advice from MT was in the middle of September, and since then 3 e-mails have gotten no response apart from “We’re at MUM”.
So the question is: is anyone else here seeing routing issues like this in ROS 4.x? And MT, can you please respond (Ticket 2010090766000034)?
To amend what I said in that post, we’ve now seen issues in various 4.x versions as well. As far as we can tell (echoing the response you posted), an OSPF state change occurs somewhere in the network, and this causes various other routers to crash their routing, but this is an educated guess. Furthermore, I suspect it’s rapid OSPF state changes in succession that trigger it, perhaps when OSPF is still performing an SPF calculation and gets a new one to perform, or when the underlying database changes while the SPF calculation is in progress.
We’re currently working around it by using the ping watchdog timer to watch an address that won’t be reachable if routing crashes.
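The logic behind that workaround, as a minimal Python sketch rather than actual RouterOS configuration: watch an address that only answers while routing is healthy, and act once several pings in a row have failed. The function name and the threshold of 3 are made up for illustration; the real watchdog has its own timing.

```python
from typing import Iterable


def watchdog_should_reboot(ping_results: Iterable[bool], max_failures: int = 3) -> bool:
    """Return True once `max_failures` consecutive pings have failed.

    `ping_results` is a sequence of booleans (True = reply received) for an
    address that is only reachable while routing is working.
    `max_failures` is a hypothetical threshold, not a RouterOS default.
    """
    consecutive = 0
    for ok in ping_results:
        # Any successful reply resets the failure counter.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= max_failures:
            return True
    return False
```

Requiring consecutive failures keeps a single lost ping from rebooting a core router that is otherwise fine.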
We’ve noticed, however, that sometimes routing will recover on its own after a minute or two (especially in the later betas), and sometimes generating a supout will make it recover; but it doesn’t always have a consistent routing table after this happens (observable by traceroutes going a different direction than IP route says they should).
As mentioned in my other post, we’re using OSPF and BGP, with both IPv4 and IPv6, and we were also starting to deploy MPLS, so it’s been rather difficult trying to track down the issue. We’ve backed out of MPLS for now, but we are still seeing the issue.
We’ve sent various supouts, and received similar responses. I’m hoping the supouts contain crash dumps or similar to help MikroTik’s programmers find the problem and fix it.
I just found a response from MikroTik in my email (I should have finished going through my email before posting; I’ve been off for a couple of days) saying that they’ve been able to reproduce what we’ve been seeing, and that it should hopefully be fixed in 5.0rc2 (or possibly the version after that).
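That kind of check can be roughly sketched in Python: take the routes the router claims to have, do a longest-prefix match for a destination, and compare the result with the first hop a traceroute actually shows. The route list format and function names here are hypothetical, and it assumes the first traceroute hop answers from the gateway address listed in the route, which is not always the case.

```python
import ipaddress


def expected_next_hop(routes, destination):
    """Longest-prefix match over (prefix, next_hop) pairs; None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        # Only compare addresses of the same IP version.
        if net.version == dest.version and dest in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, next_hop)
    return best[1] if best else None


def route_is_consistent(routes, destination, traceroute_first_hop):
    """True when the routing table and the observed first hop agree."""
    return expected_next_hop(routes, destination) == traceroute_first_hop


# Example with documentation addresses (not from any real router).
routes = [
    ("0.0.0.0/0", "192.0.2.1"),
    ("10.1.0.0/16", "192.0.2.2"),
    ("10.1.5.0/24", "192.0.2.3"),
]
```

With these example routes, a traceroute to 10.1.5.9 whose first hop is 192.0.2.2 would be flagged, because the table says the more specific /24 via 192.0.2.3 should be used.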
I had that problem too. One of our PoPs had an electrical power problem and the power inverter was not sending 110 V to the power supply. Then we had quick interface flapping over and over, and OSPF crashed!
With 5.01 and r5.02 the problem is worse: the router reboots almost every day, and it caused a lot of interface RX drops, so I had to disable it. But the support people were very kind and are working on fixing those problems with OSPF.
We’ve seen a slowdown in the crashes since moving from tested cat5e to tested cat6, but the fact remains that a link state change shouldn’t cause a crash of core routing.
An electrician is working to fix the power issue. It couldn’t be the Ethernet cable. I use cat6e direct burial. When I disable OSPF totally and use static routing I have no issues. I tried setting the OSPF router dead interval to 120 seconds and that didn’t make a difference. 4 hours ago there was no packet loss, yet OSPF went off and then back on (Down, then Full). Is there anything you want me to try?
Buried cat6? Was it tested, and are the 2 ends of the link on the same mains supply? I’ve seen Ethernet drop and blow ports when it was strung between 2 different grounding circuits, as it can induce a voltage on the ground line.
The cable is 100 ft long. I replaced the full cable 2 times. I tested it, no errors. One end is a 50 tower and the other end is in the building. Both are connected to earth ground. Well, for the building wiring I am just trusting that the electrician did the wiring OK. Right now I have OSPF running on 2 MikroTiks. The routes are all static. Even when there are no errors on Ethernet, OSPF goes to Down and then back to Full. I am running netwatch pinging every 1 second to test the Ethernet connectivity and I get no errors. What settings will keep OSPF from going to the Down state?
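On the dead interval mentioned above: as far as I understand OSPF (RFC 2328), the inactivity timer only declares a neighbor Down when no hello has been heard for the whole dead interval. Here is a small Python sketch of that rule, not RouterOS code; the function name and the timestamps are hypothetical.

```python
def inactivity_expiry(hello_times, dead_interval, observed_until):
    """Return the time the inactivity timer would fire, or None if it never does.

    hello_times    -- times (seconds) at which hellos were received from the neighbor
    dead_interval  -- router dead interval in seconds
    observed_until -- end of the observation window in seconds
    """
    last = None
    # Treat the end of the window as a final checkpoint so that trailing
    # silence after the last hello is also detected.
    for t in sorted(hello_times) + [observed_until]:
        if last is not None and t - last > dead_interval:
            return last + dead_interval
        last = t
    return None
```

With a 120 second dead interval, hellos would have to go missing for two full minutes before this timer fires. If 1 second pings see no loss over the same period and the neighbor still goes Down, the drop is probably being triggered by something other than lost hellos, for example an interface state change.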