In my case, we are not presently using BGP or IPv6; we are, however, using OSPF over a number of wireless links, which may cause routes to change relatively frequently. We plan to start using BGP soon, and IPv6 may come further down the road, but for right now it's IPv4 only and OSPF.
We’ve now confirmed this in RouterOS 6.6, as we upgraded one of the routers from 6.4. I’ve also managed to get a supout file and will be emailing it to support.
I am experiencing this in 6.7 on one particular router running as an x86 VM. I was able to see the route cache completely full and networking shut down. Since it was a VM, I was able to reboot it and resolve the issue. It does happen regularly with this router. Other than a scheduled reboot, are there other workarounds?
I read in this thread (http://forum.mikrotik.com/t/5-x-routing-cache-bug-dropped-packets-lost-network/46491/1) that some were running dynamic protocols and some were not. The router on which I am experiencing the issue is not running any dynamic routing protocols, but there are frequent PPTP connections. The PPTP server does create quite a few dynamic interfaces and routes on the router, and I’m wondering if maybe that is the issue.
Is anyone else using dynamic protocols or inbound VPN connections when they experience the problem?
Yes, the issue is still very evident. We’ve got most of our network running 6.7 and it does still occur. We believe we’ve narrowed down the issue in our case to IPIP tunnels. We have a large number of devices that don’t run IPIP tunnels but do run dynamic protocols like OSPF, and we have not seen this issue on them. However, ALL of the devices running IPIP tunnels have runaway route caches and will eventually stop working properly unless rebooted.
For now we have implemented a reboot script that prevents the device from hanging altogether, but having our routers reboot every so often is still VERY annoying. I will see about upgrading one of our IPIP-enabled routers to 6.9 to see if it remains an issue.
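For anyone wanting the same band-aid, it’s nothing more than a scheduler entry; a minimal sketch (the name and time below are just examples, and you may need to make sure the scheduler’s policy includes reboot):

# example only: reboot every night so the route cache never gets a chance to fill up
/system scheduler add name=nightly-reboot interval=1d start-time=03:00:00 on-event="/system reboot"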
We’ve also added monitoring of the route cache to our Cacti server, which does help to keep an eye on these issues. I wasn’t able to find an OID for it, so I implemented it via a script that logs in with a read-only user and parses the output for Cacti.
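On the router side, all the Cacti poller needs is a read-only account to log in with; a minimal sketch (the username, source address, and password below are just placeholders):

# example only: read-only account for the monitoring host to log in with
/user add name=cacti-ro group=read address=192.0.2.10/32 password="change-me"

The actual parsing of the output lives on the Cacti side, so nothing else has to change on the router.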
It looks like this issue is still evident in 6.12. Given the changelog entry:
*) l2tp - fixed “no buffer space available” problem;
I had hoped that this issue might be resolved for tunnels in general, but it appears it is still present on devices using IPIP / IPsec tunnels.
Mikrotik: PLEASE PLEASE PLEASE PLEASE PLEASE FIX THIS. Or at least give us some form of communication that you’re working on the issue!! I’ve emailed support twice about this issue and beyond the automatic response have gotten NO communication back regarding this.
“The more things change, the more they stay the same.” Welcome to 2017.
After 4 years in production, we have one RB2011 that suddenly developed this same problem shortly after upgrading from 6.37.5 to 6.39.1. (Upgrading to 6.39.2 didn’t help.)
MT looked at the supout and pointed out that the route cache was full (which I hadn’t even heard of before this), but their explanations (hacking or torrents?) don’t fit the situation. This router manages traffic for a hotel, and is at least three private-IP hops away from an edge router with a public IP. It isn’t using BGP, IPv6, or PPP; just OSPF, bridges, VLANs, and some queues to prioritize SIP, just like we have in every building on our network. And in this case, there is only one route out of the building, so routes shouldn’t be changing; everything is either on a local interface or goes through a single interface on a rooftop backhaul router.
Taking their advice, I set /ip settings route-cache=no. The cache-size would sit in the single digits for hours, then suddenly start climbing into the thousands, and over the course of ~15 minutes fill completely (in this case, 16K). With /ip settings route-cache=no!
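For anyone wanting to check or toggle the same setting, the commands are (as far as I can tell on 6.39.x) simply:

/ip settings set route-cache=no
/ip settings print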
I downgraded everything in the building to 6.37.5, and replaced the 2011 with an 1100XH2 (cache size 512K). 16 hours later, I’m seeing cache sizes in the 2K-3K range (with route-cache=yes). It is set up to start sending me emails if it exceeds 10K. If this proves stable, I’ll upgrade to 6.39.2 and post an update.
Fingers crossed that it keeps working (so we have a stable fallback position), but I’m never thrilled when we don’t have an explanation for a problem.
Recently I experienced this issue on the CCR1009 7G and the CCR1009 8G. I had a number of tunnels flapping due to service provider instability. I am running OSPF, so routes come and go automatically. The cache grows quickly when this happens, and the router stops responding on all ports. Disabling the IP route cache resolves the issue. I tried different RouterOS versions; no change. I am currently running the latest Bugfix release. With tunnels enabled and the cache enabled, the router’s cache sits at around 90; if the tunnels drop in and out, it quickly grows. I disabled the cache again once I saw it go above 1,000 in less than 30 seconds. It would be great if a solution could be found for this.
I now have this problem on a RouterBoard that provides backhaul to a motel via 5GHz. It’s running OSPF just like the rest of our equipment, including devices on other buildings doing the same thing with the same configuration.
With route-cache enabled, the cache grows consistently by about 1,000 per day. So it can run for ~2 weeks before it has to be rebooted, to clear the route cache.
With route-cache disabled, the cache grows consistently by about 1 per second. 4.75 hours later, like clockwork, it’s full and has to be rebooted. Ran it like this for days at a time, and saw no more than 15 minutes variation in uptime. :-/
Something is wrong there. But without being able to view the cache contents for clues, and without any way to purge it without rebooting, what can I do? I leave route-cache enabled and schedule a script to run daily at 3 AM: check the cache size, and if it’s over 14000/16384, reboot. It’s a band-aid, but at least it’s an automatic band-aid.
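For what it’s worth, the band-aid is roughly the sketch below. The script name is arbitrary, and the line that obtains the current cache size is only a placeholder, since I read that number the same way my monitoring does rather than from one built-in command; the script also needs the reboot policy allowed.

# example script body (e.g. /system script add name=route-cache-guard policy=read,test,reboot)
# placeholder: set cacheSize from however you read the route cache size on your version
:local cacheSize 0
:if ($cacheSize > 14000) do={
    /log warning "route cache at $cacheSize of 16384 - rebooting"
    /system reboot
}

# and a scheduler entry to run it every night at 3 AM
/system scheduler add name=route-cache-guard-3am interval=1d start-time=03:00:00 on-event=route-cache-guard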
My routers with OSPF max out at around 154-200 cached routes with the IP route cache disabled. How many devices do you have on either side sharing OSPF routes?
We have a couple hundred MT devices with OSPF enabled, so a couple hundred examples working perfectly well. With the exception of our edge routers, all our MTs have <1K routes cached (some <20) with route-cache enabled. And this one isn’t even an especially busy one. The motel is at the end of its particular branch, so the link isn’t carrying traffic for any other location: just for the motel itself. A couple Mbps each way, nothing much. (It’s directly on the beach, so most guests aren’t there for the Wi-Fi.)
A dozen messages have passed between me and MT about this issue. They’re convinced there must be something about the traffic at that location that is causing the route cache to continually grow. I suspect there’s a bug behind some combination of settings that must be unique to this board, because the route cache fills 75x faster when route-cache is disabled. (Not 75% faster; 75 times faster.)
Hi guys!
I have the same issue on my CCR1009-7G-1C-1S+ running RouterOS 6.40.1.
Support case opened: #2017071322000426. The answer from support: “Waiting for problem re-appears”
My router had been operating properly the whole time until this started: with /ip settings route-cache=yes, the cache increases by ~12,000 per day!
While the problem occurs, the router loses connectivity completely on all interfaces. The interface status still shows as connected, but when I try to send a ping from the CLI, the status is: 105 (No buffer space available).
I don’t use OSPF, IPv6, or BGP; only static routing and OVPN.