ip route cache BUG

The problem is the following: the oldest entries are not flushed from the cache.

Within one day we got:
/ip route cache print
cache-size: 52442
max-cache-size: 65536

This bug is observed only on OS version 6.x.

OS version 5.x did not have this issue.

Please provide a workaround: HOW TO FLUSH THE CACHE.

/ip route cache print
cache-size: 53528
max-cache-size: 65536

Please help, our routers have to be rebooted regularly!

I use this script, scheduled at 04:00 nightly:

# Current date and time, used in the notification e-mail
:global datum [/system clock get date];
:global time [/system clock get time];

# Route cache usage as a percentage of the configured maximum
:local percentused ((100 * [/ip route cache get cache-size]) / [/ip route cache get max-cache-size]);

:log info "RouteCacheUsed: $percentused %";

# Above 70% usage: send an alert e-mail and reboot to clear the cache
:if ($percentused > 70) do={
    /tool e-mail send server=10.10.10.10 to="monitoring@email" subject=[/system identity get name] from="router@email" body=("On $datum at $time the route cache on the router got to $percentused %");
    /system reboot;
};
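
The scheduler entry that drives it nightly is roughly the following (the script name "route-cache-check" is just a placeholder for whatever name the script is saved under in /system script):

# Assumes the script above is stored in /system script as "route-cache-check"
/system scheduler add name=route-cache-check start-time=04:00:00 interval=1d on-event=route-cache-check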

I’m seeing this as well on a couple of RB2011s. Networking stops working, and if I get in through the console, I see:

[user@Router] > /system resource print        
                   uptime: 2w10h57m37s
                  version: 6.4
               build-time: Sep/12/2013 13:52:41
              free-memory: 95.3MiB
             total-memory: 128.0MiB
                      cpu: MIPS 74Kc V4.12
                cpu-count: 1
            cpu-frequency: 600MHz
                 cpu-load: 1%
           free-hdd-space: 110.6MiB
          total-hdd-space: 128.0MiB
  write-sect-since-reboot: 24631
         write-sect-total: 106765
               bad-blocks: 0%
        architecture-name: mipsbe
               board-name: RB2011UAS
                 platform: MikroTik
[user@Router] > ip route cache print   
      cache-size: 16383
  max-cache-size: 16384
[user@Router] > ping 8.8.8.8
HOST                                     SIZE TTL TIME  STATUS                                                           
8.8.8.8                                                 timeout                                                          
8.8.8.8                                                 timeout                                                          
8.8.8.8                                                 timeout                                                          
                                                        132 (No buffer space available)                                  
                                                        132 (No buffer space available)                                  
                                                        132 (No buffer space available)                                  
    sent=6 received=0 packet-loss=100% 

[user@Router] >

Guys, are you using BGP and IPv6?

In my case, we are not presently using BGP or IPv6, we are however using OSPF over a number of wireless links, which may cause routes to be changing relatively frequently. We plan to start using BGP soon, and IPv6 may come farther down the road, but for right now, it’s v4 only and OSPF.

We’ve now confirmed this in RouterOS 6.6, as we upgraded one of the routers from 6.4. I’ve also managed to get a supout file and will be emailing it to support.

[user@Router] > /ip route cache print
      cache-size: 16383
  max-cache-size: 16384
  
[user@Router] > /system resource print
                   uptime: 1w4d14h11m49s
                  version: 6.6
               build-time: Nov/07/2013 13:04:08
              free-memory: 96.3MiB
             total-memory: 128.0MiB
                      cpu: MIPS 74Kc V4.12
                cpu-count: 1
            cpu-frequency: 600MHz
                 cpu-load: 0%
           free-hdd-space: 108.6MiB
          total-hdd-space: 128.0MiB
  write-sect-since-reboot: 54179
         write-sect-total: 168056
               bad-blocks: 0%
        architecture-name: mipsbe
               board-name: RB2011UAS
                 platform: MikroTik

I am experiencing this in 6.7 on one particular router running as an x86 VM. I was able to see the route cache completely full and networking shut down. Since it was a VM, I was able to reboot it and resolve the issue. It does happen regularly with this router. Other than a scheduled reboot, are there other workarounds?

See this thread: http://forum.mikrotik.com/t/problem-with-mount-point/94/1

It suggests the issue is fixed in 6.5, but I don’t think it is. It is still evident in 6.7.

I read in this thread (http://forum.mikrotik.com/t/5-x-routing-cache-bug-dropped-packets-lost-network/46491/1) that some people were running dynamic routing protocols and some were not. The router on which I am experiencing the issue is not running any dynamic routing protocols, but it does handle frequent PPTP connections. The PPTP server creates quite a few dynamic interfaces and routes on the router, and I’m wondering if maybe that is the issue.

Is anyone else using dynamic protocols or inbound VPN connections when they experience the problem?

Yes, the issue is still very evident. We’ve got most of our network running 6.7 and it does still occur. We believe we’ve narrowed down the issue in our case to IPIP tunnels. We have a large number of devices that don’t run IPIP tunnels but do run dynamic protocols like OSPF, and those have not shown this issue. However, ALL of the devices running IPIP tunnels have runaway route caches and will eventually stop working properly unless rebooted.

For now we have implemented a reboot script that prevents the devices from hanging altogether, but it is still VERY annoying that our routers reboot every so often. I will see about upgrading one of our IPIP-enabled routers to 6.9 to see if it remains an issue.

We’ve also added monitoring of the route cache to our Cacti server, which does help to keep an eye on these issues. I wasn’t able to find an OID for it, so I implemented it via a script that logs in with a read-only user and parses the output for Cacti.
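
In case it helps anyone, the value itself can be read with a one-liner, so all the Cacti data-input script has to do is run something like the following over SSH with the read-only account and split the output (the name:value output format here is just how I chose to present it, not a requirement):

# Prints route cache usage in a simple name:value form that an external
# poller (e.g. a Cacti data-input script calling in over SSH) can parse.
:put "cache_size:$[/ip route cache get cache-size] max_cache_size:$[/ip route cache get max-cache-size]"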

I am hitting this bug too, running RouterOS 6.10.

I have three RB2011s with similar configs; however, the problematic one is running an L2TP/IPsec VPN and the other two are not.

I obtained this info via the serial console, as all other networking was unavailable:

uptime: 2d20h16m
version: 6.10

cache-size: 16384
max-cache-size: 16384

Other two routers which appear stable:

uptime: 2w4d17h15m15s
version: 6.10

cache-size: 716
max-cache-size: 16384



uptime: 2w3d15h19m13s
version: 6.10

cache-size: 53
max-cache-size: 16384

A reboot will fix the problem for about a day.

Cheers

This problem appears to be much worse in 6.11. See the thread here: http://forum.mikrotik.com/t/rb2011uas-2hnd-stops-responding-spontaneously/75301/18

All indications are that it is a bug in the PPP code from MikroTik. My problem routers are the ones with L2TP set up.

It looks like this issue is still evident in 6.12. I had hoped with the changelog noting:

*) l2tp - fixed “no buffer space available” problem;

that this issue might be resolved for tunnels in general, but it appears the issue is still around on devices using IPIP/IPsec tunnels.

Mikrotik: PLEASE PLEASE PLEASE PLEASE PLEASE FIX THIS. Or at least give us some form of communication that you’re working on the issue!! I’ve emailed support twice about this issue and beyond the automatic response have gotten NO communication back regarding this.

“The more things change, the more they stay the same.” Welcome to 2017.

After 4 years in production, we have one RB2011 that suddenly developed this same problem shortly after upgrading from 6.37.5 to 6.39.1. (Upgrading to 6.39.2 didn’t help.)

MT looked at the supout and pointed out that the route cache was full–which I hadn’t even heard of before this–but their explanations (hacking or torrents?) don’t fit the situation. This router manages traffic for a hotel, and is at least three private-IP hops away from an edge router with public IP. It isn’t using BGP, IPv6, or PPP; just OSPF, bridges, VLANs, and some queues to prioritize SIP–just like we have in every building on our network. And in this case, there is only one route out of the building, so routes shouldn’t be changing; everything is either on a local interface, or goes through a single interface on a rooftop backhaul router.

Taking their advice, I set /ip settings route-cache=no. cache-size would sit in the single digits for hours, then suddenly start climbing into the thousands, and over the course of ~15 minutes fill completely (in this case, 16K). With /ip settings route-cache=no!

I downgraded everything in the building to 6.37.5, and replaced the 2011 with an 1100XH2 (cache size 512K). 16 hours later, I’m seeing cache sizes in the 2K-3K range (with route-cache=yes). It is set up to start sending me emails if it exceeds 10K. If this proves stable, I’ll upgrade to 6.39.2 and post an update.

Fingers crossed that it keeps working (so we have a stable fallback position), but I’m never thrilled when we don’t have an explanation for a problem.

Recently I experienced this issue on the CCR1009-7G and the CCR1009-8G. I had a number of tunnels flapping due to service provider instabilities. I am running OSPF, so routes come and go automatically. The cache grows quickly when this happens and the router stops responding on all ports. Disabling the IP route cache resolves the issue. I tried different RouterOS versions; no change. Currently running the latest bugfix version. With tunnels enabled and the cache enabled, the route cache sits at around 90; if the tunnels drop in and out it quickly grows. I disabled the cache again once I saw it go above 1000 in less than 30 seconds. It would be great if a solution could be found for this.
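
For reference, the toggle being discussed lives under /ip settings on 6.x; from the console it is roughly:

# Turn the kernel route cache off (RouterOS 6.x)
/ip settings set route-cache=no

# Check the current value
/ip settings print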

I now have this problem on a RouterBoard that provides backhaul to a motel via 5GHz. It’s running OSPF just like the rest of our equipment, including devices on other buildings doing the same thing with the same configuration.

With route-cache enabled, the cache grows consistently by about 1,000 per day. So it can run for ~2 weeks before it has to be rebooted, to clear the route cache.

With route-cache disabled, the cache grows consistently by about 1 per second. 4.75 hours later, like clockwork, it’s full and has to be rebooted. Ran it like this for days at a time, and saw no more than 15 minutes variation in uptime. :-/

Something is wrong there. But without being able to view the cache contents for clues, and without any way to purge it without rebooting, what to do? I leave route-cache enabled, and schedule a script to run daily at 3am: Check the cache size, and if it’s over 14000/16384, reboot. It’s a band-aid, but at least it’s an automatic band-aid.
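
For anyone who wants the same band-aid, a minimal sketch of that nightly check (the 14000 threshold matches my 16384 maximum; adjust to suit):

# Run daily from /system scheduler at 03:00; reboot only when the route
# cache is nearly full.
:local cacheSize [/ip route cache get cache-size];
:if ($cacheSize > 14000) do={
    :log warning "route cache at $cacheSize, rebooting to clear it";
    /system reboot;
}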

My routers with OSPF max out at around 154-200 routes cached, with the IP route cache disabled. How many devices do you have on either side sharing OSPF routes?

We have a couple hundred MT devices with OSPF enabled–a couple hundred examples working perfectly well. With the exception of our edge routers, all our MTs have <1K routes cached (some <20) with route-cache enabled. And this one isn’t even an especially busy one. The motel is at the end of its particular branch, so the link isn’t carrying traffic for any other location: just for the motel itself. Couple Mbps each way, nothing much. (It’s directly on the beach, so most guests aren’t there for the Wifi.)

A dozen messages have passed between me and MT about this issue. They’re convinced there must be something about the traffic at that location that is causing the route cache to continually grow. I suspect there’s a bug behind some combination of settings that must be unique to this board, because the route cache fills 75x faster when route-cache is disabled. (Not 75% faster; 75 times faster.)

Hi guys!
I have the same issue on my CCR1009-7G-1C-1S+ running ROS 6.40.1.
Support case opened: #2017071322000426, and the answer from support: “Waiting for problem re-appears”.
My router operated properly the whole time, until:

  • 2 months and crash,
  • 4 days and crash,
  • 3 weeks and crash,
  • now 12 days without failures.

[admin@mikrotik] > /ip route cache pr
cache-size: 143684
max-cache-size: 262144

With /ip settings route-cache=yes it increases by ~12 000 per day!

While the problem occurs, the router is completely cut off on all interfaces. The interfaces show as connected, but when I tried to send a ping from the CLI, the status was: 105 (No buffer…
I don’t use OSPF, IPv6, or BGP, only static routing and OVPN.

Waiting for solution…