RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Mar 17, 2014 2:56 pm
by Fraction
Hi,

I have been suffering from a quite annoying problem with my RB2011UAS-2HnD (ROS 6.10) for the last few weeks. It first happened on the same day I upgraded from ROS 6.9 to 6.10, so I'm not sure whether it is related to the ROS version, the RB itself, or something else. I didn't change my configuration at that time (it has been working almost untouched for over a year now).

Anyway, my problem is that my RB occasionally stops responding to any network traffic (including pings to the localhost address from the device itself). This happens maybe once per week, and a reboot resolves the issue until the next time.

When the device is jammed I can connect to it via serial and it seems to be working as expected, but a ping to even its own localhost address returns only the response: "132 (No buffer space...".

Because the device is in an unmanned location, I set up the Watchdog timer to watch address 127.0.0.1, and it works as a kind of workaround: it reboots the device and soon everything is working again.

Am I the only one suffering from this? I found some quite old threads with the same symptoms but no actual resolutions.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Mar 18, 2014 12:20 pm
by makuk66
I have a RB2011UiAS-RM that seems to lock up regularly, with 6.10 and an older version before that. Without the watchdog, it ends up not routing or allowing login and the touch-screen becomes unresponsive but the ethernet ports still flash. With the watchdog it duly resets. This seems to happen at least once a day, sometimes more often. Monitoring health shows ok temps and voltage, monitoring the OS shows free memory and a partly idle CPU, the logs show nothing particularly dubious. I'm currently experimenting with excluding functionality (I was using Traffic Flow and Web Proxy), to see if I can make it more stable. I'm not set-up for serial right this minute, but that's a good idea.

Any other suggestions appreciated.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Wed Mar 19, 2014 7:59 pm
by JanezFord
Hello,

I have experienced the same issue as Fraction described. The unit was at a remote location and was rebooted by a helper using the LCD display and PIN. Another time I believe the same thing happened on one of our RB450G units. All systems run the latest firmware and v6.10.

Edit: The first unit was RB2011UAS-RM.

JF.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Thu Mar 20, 2014 11:55 am
by Majklik
This problem with "No buffer space available" and stopped IPv4 communication is not specific to ROS 6.10. I see it on some of my routers (RB800, RB450G) across the whole ROS 6 line. If you look at "/ip route cache print", you will see that the cache is full, so IPv4 communication stops (IPv6 still works). In my configuration this problem is related to SSTP server operation; if SSTP is disabled, I do not have this problem. The cache used to fill up after a few days, but since the update to ROS 6.10 it is full after 12~24 hours. A firewall is configured that limits connections to the SSTP server to allowed addresses only. Only three SSTP clients are connected to the affected routers, and during the whole router uptime only around 20~50 connection attempts are made to the SSTP server (port TCP/443).
I have other systems with a similar configuration where I do not see this problem (RB1100AHx2, ROS 6.5); on these systems the SSTP server is not protected by a firewall and about 100 clients are connected full time.
On the affected systems I use this scheduled script to recover from this state with a reboot:
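# Read the current and maximum IPv4 route cache sizes
:local act [/ip route cache get cache-size]
:local max [/ip route cache get max-cache-size]

# Reboot when 2048 or fewer free cache entries remain
:if (($max - $act) <= 2048) do={
  /system reboot
}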
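
For readers who want to check the threshold arithmetic off the router, the condition in the script can be sketched in Python. This is only an illustration of the comparison; the function name `should_reboot` and the `headroom` parameter are made up for this example and are not part of RouterOS.

```python
def should_reboot(cache_size: int, max_cache_size: int, headroom: int = 2048) -> bool:
    """Mirror the script's test: reboot when `headroom` or fewer free entries remain."""
    return (max_cache_size - cache_size) <= headroom


# With a 16384-entry cache the reboot triggers once the size reaches 14336.
print(should_reboot(16384, 16384))  # True
print(should_reboot(14336, 16384))  # True
print(should_reboot(56, 16384))     # False
```

With a maximum of 16384 entries, the reboot therefore fires at a cache size of 14336 or more.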

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Fri Mar 21, 2014 3:23 am
by JanezFord
TNX for sharing info and your script with us Majklik. I don't use SSTP on any of those devices that suffered from this issue.

JF.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Fri Mar 21, 2014 12:06 pm
by Majklik
This problem with the route cache has been around for a long time, with different ROS versions and different configurations too.
It is a pity that ROS does not allow showing the contents of this cache or flushing it. On Linux this can be done with "ip route show cache" and "ip route flush cache".
This problem will probably be solved for good when ROS switches to Linux kernel 3.6, because in that version the route cache was removed from the kernel, with this argument:
"The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks."

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Sun Mar 23, 2014 7:10 pm
by uldis
Make sure that you all upgrade RouterOS to v6.11 and also upgrade the RouterBoard firmware to the latest available (at least v3.11 or newer) - that will guarantee that the gigabit Ethernet ports don't have temp hangs.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Sun Mar 23, 2014 7:59 pm
by JanezFord
uldis wrote: "Make sure that you all upgrade RouterOS to v6.11 and also upgrade the RouterBoard firmware to the latest available (at least v3.11 or newer) - that will guarantee that the gigabit Ethernet ports don't have temp hangs."
Hello Uldis

I have already upgraded my devices to the latest software/firmware. This "temp" hang lasted for about 12 hours before I could get someone to reboot my device (remote location)... Until this bug is confirmed fixed, I prefer using the watchdog (ping gateway or localhost) or Majklik's script just to be on the safe side.

JF.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Mar 25, 2014 6:46 am
by timk
I have also hit this problem on an RB2011 running L2TP/IPSec VPN server as described here:
http://forum.mikrotik.com/viewtopic.php?f=14&t=78107

I obtained this info via the serial console, all other networking is unavailable:
uptime: 2d20h16m
version: 6.10

cache-size: 16384
max-cache-size: 16384
Rolling back to 5.25 fixed the issue. I haven't had a chance to test 6.11, but MikroTik support said they could re-create my issue and would try to have the fix included in it.

Cheers

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Mar 25, 2014 9:07 am
by Majklik
Yes, I still have this situation with ROS 6.11 on the RB800 and RB450G: no more than one day of uptime. Disabling SSTP eliminates the problem. But many other services are in use, so this bug may be a combination with something else (VRRP on all interfaces, bonding, bridges, VLANs, GRE/IPsec, SIT tunnels, OSPFv2/v3, BGP).

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Mar 25, 2014 3:48 pm
by NathanA
So, I have run into this bug recently, too. This post should probably be re-titled and moved to a different sub-forum, because this is not an RB2011 problem or even a RouterBoard problem. We are experiencing this bug with x86 RouterOS, too.

Ever since upgrading one of our x86 boxes to 6.11 from 5.26, the router has stopped responding once a day on account of this bug, and has needed to be rebooted. We are not running SSTP on this box, although we do run an L2TP server on it. I am not 100% convinced that L2TP is what is causing the Linux route cache to balloon in our case, though: usually after a reboot, the route cache size stays pretty low and doesn't change much, even if I rapidly make several L2TP connections/disconnections to it in the span of a few seconds.

I once was able to catch it in the act of failing, though, before it had completely done so: I logged into the box, peeked at the route cache, and it was increasing at the rate of about 20-30 per second, and sometimes faster (100+). In the span of a few minutes, it had reached 24K entries out of a maximum (on this box) of 32K. Nothing that I attempted to do managed to stop the rate of growth, and this is on a box that has a pretty simple configuration, has a very small routing table, and doesn't participate in any dynamic route exchange/forwarding protocols. So it was very strange to see this behavior. After a reboot, the route cache got up to a little > 100, and then stayed there. Part of me wonders if it is a bot mounting some kind of DoS/intentional route cache poisoning attack on vulnerable Linux boxes.
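
To put those growth rates in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "24K" and "32K" mean 24576 and 32768 entries and that growth is linear, neither of which is stated exactly in the post; `seconds_until_full` is a hypothetical helper name.

```python
def seconds_until_full(current: int, maximum: int, rate_per_second: float) -> float:
    """Estimate the time until the route cache is full, assuming linear growth."""
    if rate_per_second <= 0:
        return float("inf")  # a cache that is not growing never fills
    return max(maximum - current, 0) / rate_per_second


# Assumed figures: 24K = 24576 entries used, 32K = 32768 entries maximum.
print(round(seconds_until_full(24576, 32768, 25)))   # 328 seconds at ~25 entries/s
print(round(seconds_until_full(24576, 32768, 100)))  # 82 seconds at ~100 entries/s
```

Under these assumptions only a few minutes remain before the cache is full, which leaves little time to react by hand.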

I don't know yet what causes the route cache to go into a tailspin, but since this router's utility is so small and isn't configured to do much, I'm hopeful that if I try to replicate in a lab environment, I will eventually be able to find the trigger. I'll file a report with MikroTik if I am able to do so. Hopefully they'll be able to do something about it, even if the problem ends up being in Linux itself rather than anything RouterOS-specific.

Very much looking forward to RouterOS 7, which will surely use a version of the Linux kernel >= 3.6...

-- Nathan

P.S. -- Not sure whether it is wise to mention this or not, but I did at least run across a workaround. I'm not sure what the possible negative effects and implications of this workaround might be, but if you can gain access to the 'devel' account on your specific router (...that's as much as I will say about that...), you can both manually flush the Linux IPv4 route cache as well as tweak the Linux route cache settings to auto-flush old entries at a much faster rate, which should prevent the cache from reaching max-cache-size once it starts going crazy.

This shell command will flush the cache:

echo 1 > /proc/sys/net/ipv4/route/flush

These two commands will dramatically increase the rate at which the route cache garbage collector expires entries, which should help it keep up when the growth rate decides to spontaneously explode:

echo 5 > /proc/sys/net/ipv4/route/gc_interval
echo 5 > /proc/sys/net/ipv4/route/gc_timeout

These changes are not permanent, and will revert to default settings (60 for gc_interval, 300 for gc_timeout) when you reboot the router.
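
The same two writes can be sketched in Python in a way that also records the previous values, so they can be put back without a reboot. This is a minimal illustration that assumes a shell with Python available, which is not the case on a stock router; `tune_route_gc` is a hypothetical helper, and the only thing taken from the commands above is the /proc path and the file names.

```python
from pathlib import Path

# Directory taken from the shell commands above.
ROUTE_SYSCTL_DIR = Path("/proc/sys/net/ipv4/route")


def tune_route_gc(interval: int, timeout: int, base: Path = ROUTE_SYSCTL_DIR) -> dict:
    """Write new gc_interval/gc_timeout values and return the previous ones."""
    previous = {}
    for name, value in (("gc_interval", interval), ("gc_timeout", timeout)):
        path = base / name
        previous[name] = int(path.read_text().strip())  # remember the old setting
        path.write_text(f"{value}\n")
    return previous


# Example (needs root on a kernel that still has the IPv4 route cache):
#   old = tune_route_gc(5, 5)
#   tune_route_gc(old["gc_interval"], old["gc_timeout"])  # restore later
```

The `base` parameter exists only so the function can be tried against a scratch directory instead of the live /proc tree.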

EDIT: This workaround turns out not to always be effective; see my next post in this thread for details.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Mar 25, 2014 10:51 pm
by timk
Nice discovery Nathan!

Have you tried the '-C' option to the Linux route command within the devel login? It would be interesting to see what all the entries are!

Cheers

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Mar 25, 2014 11:47 pm
by NathanA
timk wrote: "Have you tried the '-C' option to the Linux route command within the devel login? It would be interesting to see what all the entries are!"
Neither the 'ip' nor 'route' commands appear to be part of the busybox binary that MikroTik ships with RouterOS, which is why I am interacting with the 'proc' virtual filesystem directly. I have plans to build and try a more complete, statically-linked busybox binary that includes the 'ip' command, though I have not checked yet to see how complete busybox's version of 'ip' is.

-- Nathan

EDIT: Update:

I now have a statically-linked busybox binary that includes both 'ip' and 'route'. The busybox version of 'route' doesn't support the -C parameter, but 'ip route show cache' does work, and if I run that on the box in question while the route cache size is going berserk, it doesn't show anything abnormal: route cache size says 10000+ entries, but 'ip route show cache' only shows the 5 or so routes that I would expect to see on this particular router. So that's interesting.

The problem (well, at least, my problem) is definitely related to MikroTik's new PPP code, though. I'll be attempting to put together a step-by-step method of reproducing the bug once I have it 100% nailed down, but right now it appears to be triggered if you are running a PPP-based server (PPTP, L2TP, SSTP, maybe even PPPoE, etc.) and you have several connections to it go up and down. Eventually, after one of the PPPs disconnect, it seems like there might be some kind of race condition that occurs when it tries to tear the PPP interface down. 'dmesg' output shows a huge number of messages like this on my router shortly after the route cache starts spinning out of control:
unregister_netdevice: waiting for ppp1 to become free. Usage count = 1
unregister_netdevice: waiting for ppp1 to become free. Usage count = 1
unregister_netdevice: waiting for ppp1 to become free. Usage count = 1
unregister_netdevice: waiting for ppp1 to become free. Usage count = 1
[...]
...and something just keeps repeating that message over and over again. This is despite the fact that 'ppp1' as an interface no longer exists:
# busybox ifconfig ppp1
ifconfig: ppp1: error fetching interface information: Device not found
...so it's trying to unregister a device that doesn't exist?
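
If you have saved 'dmesg' output like the above to a file, a short Python sketch such as the following can tally which interfaces are stuck. It assumes the message format is exactly as shown in the lines above; `count_stuck_interfaces` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the kernel message format shown in the dmesg excerpt above.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. Usage count = (\d+)"
)


def count_stuck_interfaces(dmesg_text: str) -> Counter:
    """Count 'waiting for <iface> to become free' messages per interface."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = "\n".join(
    ["unregister_netdevice: waiting for ppp1 to become free. Usage count = 1"] * 4
)
print(count_stuck_interfaces(sample))  # Counter({'ppp1': 4})
```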

More bad news: once it gets to this state, it appears that it is impossible to flush the cache of the entries being added to it (I assume by PPP?). It would seem that something in the PPP subsystem has a lock on those entries and they can't be freed. Tweaking with the route cache garbage collector values doesn't make a difference, either...the number doesn't go down. I know that those proc/sysctl values actually work because I tested them on a MikroTik with a fairly large route cache, but one that wasn't spiraling out of control, and was able to successfully flush the cache and visibly see the garbage collector behavior change. Once this particular bug is triggered, however, the only thing that can cure it is a reboot. I even tried killing the ppp and ppp-worker processes, but although they cleanly exited after being sent SIGTERM, the route cache remained bloated and the "unregister_netdevice" console errors continued apace.

Finally, it may have something to do with MPPE. I notice that after the problem starts, the 'ppp_mppe' kernel module shows that it is in use by something and cannot be unloaded, even after I have terminated all PPP tunnels and shut down PPP services:
# lsmod | busybox grep mppe
ppp_mppe 5585 6 - Live 0x90d5d000
This might just be another symptom, though, rather than a cause, especially if people are also running into this problem with SSTP, which should have no use for MPPE.

EDIT 2: Well, now this is interesting. I think I have managed to find a way to reproduce a version of this bug, but now that I've done so, I tried manually flushing the route cache again, and this time doing so has an effect. It will continue to grow and grow on its own even after a flush, but executing the flush actually works this time and clears the cache (temporarily). Weird.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Wed Mar 26, 2014 10:03 am
by Majklik
I also think the problem is related to the new PPP package. The route cache problem has been much worse since ROS 6.10.
There is one more test, which I reported yesterday ([Ticket#2014032566001708]). I have two metarouters. The first runs an SSTP server with one dead connection (in some configurations the keepalive timeout does not work on the SSTP server side and the server does not close the dead connection; this problem was the primary reason for the test). The second metarouter is a client that keeps trying to connect to the server, but the connection fails because the SSTP server allows only one connection. I see that the route cache slowly fills up until the server stops responding completely after some hours (if the SSTP server is disabled or there is only one live connection, the metarouter stays up for days). If I leave it in this state, the metarouter reboots after a few hours. If the SSTP server is disabled and the connection closed before the metarouter hangs, the cache is flushed after some time.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Wed Mar 26, 2014 10:18 am
by NathanA
I came up with a similar test, but one that I run with L2TP instead, which I am preparing a description of for MikroTik at this moment. Rather than limiting the connection to 1, however, I purposefully mismatched the encryption requirement between the server and the client: the server requires encryption but the client refuses it. The client tries to rapidly connect to the server over and over again and this quickly causes the scenario that I described in my last post, where something gets "stuck" trying to tear down one of the old pppX interfaces. It also generates several holds on the ppp_mppe kernel module as well. Interestingly, every time the L2TP client tries to connect again, it actually causes the route cache to be flushed. But if you let the L2TP client repeatedly try and fail to connect for 2-3 minutes, and then disable it, the route cache on the server will have a mind of its own and just grow and grow and grow after this.

-- Nathan

EDIT: Actually, I'm beginning to think there are 2 issues: 1) the explosive growth of the route cache when something in the PPP subsystem gets stuck, and 2) the route cache getting into a state where it cannot be flushed any longer. I know how to make #1 happen...that's easy. However, when #1 is happening, the route cache garbage collector seems to be able to keep up with it, so if #1 is happening but #2 is not, you probably still won't see a crash. The real problem happens when #1 is combined with #2, and that's what I experienced when I tried to flush the cache after it started growing and found that I couldn't...the cache size would not go down when I tried a manual flush, and the garbage collector was not doing anything. I don't yet know how to reproduce that state of things, but I have observed it once.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Thu Apr 03, 2014 7:52 pm
by iprob
We are seeing this issue with RouterOS x86 machines that have a lot of inbound VPN connections. These are a combination of on-demand L2TP and site-to-site IPSec tunnels. Failure occurs in less than 24 hours.

We've implemented the check for the route cache to automatically reboot.

This issue has been around a long time but clearly was made MUCH worse with the 6.11 release. The 6.11 release is not really usable at this time.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Thu Apr 03, 2014 9:44 pm
by iprob
I know it isn't nearly as useful as Nathan's detailed information... but here is an odd scenario that happened.

- The MikroTik running RouterOS x86 crashed about 16 hours after the upgrade
- Rebooted and one L2TP connection was made
- Crashed again within 45 minutes
- Two L2TP connections were made (I saw the route cache flush happen when the second connection was made). One connection was the same user as in the previous session. The second connection was from a different IP/user.
- Route cache memory looks stable at this point (I see it increase and decrease).

I don't have all the detailed tools you are using to capture the data, Nathan. Thanks for your work!

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Fri Apr 04, 2014 1:08 am
by iprob
Up time was only 5 hours last go around. What version is the suggested downgrade? I'd prefer not to have to go all the way back to 5.24 since the queues are redone for version 6 and all the ipsec configuration scripts had to be updated with "aes-256-cbc" instead of aes-256. I don't know why MikroTik changes simple things like that which break backwards compatibility with configuration scripts.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 3:36 pm
by lotnybartek
Same problem here using RB2011UAS-2HnD and latest firmware / software.

It has happened a few times already (I have had this router for 3 weeks), always while L2TP/IPSec clients were connected (at the last crash, 5 clients were connected). I can't ping it and I can't log in to it (ssh, telnet, winbox, web). Only a reboot fixes this. Mikrotik - please solve this bug.

For the time being I'll just use Majklik's script.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 3:46 pm
by makuk66
I upgraded to 6.12 13 days ago and have not seen a reboot since.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 3:52 pm
by iprob
Have you tried version 6.12? I haven't seen the issue yet with 6.12 although I still leave the automatic reboot scripts in place.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 4:30 pm
by lotnybartek
Yes, I have 6.12 / 3.14.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 4:36 pm
by iprob
If you are seeing the same "No buffer space available" then I would recommend contacting support. They reported this bug as fixed in 6.12 and I haven't seen it so far on any of the 26 routers we upgraded to 6.12. Unfortunately, you'll need to be on the router to verify that the problem is the buffer space. I was able to do this pretty easily with my x86 VM's since I could still connect to the console via the VM manager. I couldn't do that with the RB951 models we have because they were remote. You could also try monitoring the route cache available with a script and writing out the values to a persistent file on the routerboard so you can at least get those statistics and read them after the reboot.
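
As one possible way to work with such statistics off the router, here is a small Python sketch that parses text in the "cache-size: N" / "max-cache-size: N" form that "/ip route cache print" shows elsewhere in this thread and turns it into a CSV line. The function names and the timestamp are made up for this example, and it assumes you have already copied the logged text off the router.

```python
import re

# One "name: number" pair per line, as in the "/ip route cache print" output.
LINE = re.compile(r"^[ \t]*([\w-]+):[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_route_cache(text: str) -> dict:
    """Return the numeric fields of the printed route cache status."""
    return {key: int(value) for key, value in LINE.findall(text)}


def csv_line(timestamp: str, stats: dict) -> str:
    """Format one record as: timestamp,cache-size,max-cache-size."""
    return f"{timestamp},{stats['cache-size']},{stats['max-cache-size']}"


stats = parse_route_cache("cache-size: 16384\nmax-cache-size: 16384\n")
print(stats)  # {'cache-size': 16384, 'max-cache-size': 16384}
print(csv_line("2014-04-28T16:36", stats))  # hypothetical timestamp
```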

Sorry I can't be of more help.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 4:57 pm
by Majklik
I still see this problem with ROS 6.12 on my routers. But I use SSTP, and the changelog for 6.12 mentions only L2TP.
This problem is not PPP-specific only; a "full route cache" can come from other places too, because I have this problem on RB1100AH/AHx2 routers (with ROS 6.7) where PPP is not used, after 100~150 days of uptime - maybe related to the GRE/IPsec tunnels.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 5:02 pm
by iprob
Good point, I only saw the issue when using L2TP. I never did see the issue with site-to-site IPSec tunnels, I'm not running any ppp on any "core" BGP routers and I don't use SSTP.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 6:42 pm
by kazi33
JanezFord wrote: "I have already upgraded my devices to the latest software/firmware. This 'temp' hang lasted for about 12 hours before I could get someone to reboot my device (remote location)... Until this bug is confirmed fixed, I prefer using the watchdog (ping gateway or localhost) or Majklik's script just to be on the safe side."
Hi JF,
I see exactly the same issue on my router. It's running 6.7 though; it hangs for 12 hours or so every 4-5 days and then comes back up. Did you have to change something in your configuration to avoid this, or are you still suffering from this ROS issue?

Kz

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Apr 28, 2014 11:21 pm
by lotnybartek
:local act [/ip route cache get cache-size]
:local max [/ip route cache get max-cache-size]

# print some debug info
:log info ("Actual route cache size: $act")
:log info ("Max. route cache size: $max")
:log info ("Reboot is triggered when route cache size reaches 14336 (max 16384 minus 2048)")

# reboot when 2048 or fewer free cache entries remain
:if (($max - $act) <= 2048) do={
  /system reboot
}
Original script written by Majklik.

I just added some log output so you'd know a little earlier that a reboot is coming soon.

This version is for the RB2011UAS-2HnD with "Max. route cache size: 16384".

I'm a newbie, so please correct it if there is something wrong.

Mikrotik - please fix this VERY annoying issue.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Thu May 01, 2014 12:56 am
by thayward
We're still seeing this ballooning route cache issue on 6.12 on routers utilizing ipip tunnels.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Fri May 02, 2014 6:36 pm
by mcooper06
We are on an RB2011UAS using 6.12 and firmware 3.14 - still occurring here.

We have this machine set up to run the following:

L2TP over IPSec for Road Warriors
Site to Site IPSec

I am planning on downgrading to 6.9 today if possible.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon May 05, 2014 8:29 pm
by mcooper06
I downgraded to 6.9 and the issue persisted - I checked and my firmware was still 3.14 showing an upgrade available to 3.10. I applied the firmware (I assume 3.10 is the latest for use with 6.9) and rebooted. More info to follow.

Michael

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon May 05, 2014 11:00 pm
by lotnybartek
Apparently this issue has been fixed in 6.13. Since yesterday, all clients (6 clients using L2TP/IPSec) have been connected. Today the cache size is 56. Normally it would be somewhere between 2k and 4k.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue May 20, 2014 6:28 pm
by Fraction
lotnybartek wrote: "Apparently this issue has been fixed in 6.13. Since yesterday, all clients (6 clients using L2TP/IPSec) have been connected. Today the cache size is 56. Normally it would be somewhere between 2k and 4k."
Has not happened for me either with 6.13, so this looks promising!

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue May 20, 2014 7:13 pm
by iprob
We opened up a support ticket about this issue and they indicated it is fixed in 6.13. We're only now beginning to roll it out so I don't have any definitive results yet.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue May 20, 2014 8:31 pm
by lotnybartek
So I can only confirm. This one is fixed in 6.13. No problems for couple of days.

Thank you Mikrotik ;-)

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Sun Jun 01, 2014 12:16 pm
by kazi33
I just filed a support ticket, Ticket#2014060166000098. My RB2011UiAS-IN falls into a reboot loop on power cycle. It recovers after 2-3 hours. One router has been rebooting for a few days. On the LCD, it says
-Loading kernel from nand
-Starting services
After around 30 seconds it falls into the same loop.
I saw this issue with 6.12 and was hoping to have it fixed in 6.13, but in my case the upgrade to 6.13 did not help.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Sun Jun 01, 2014 4:15 pm
by jarda
Kazi33, have you tried netinstall? If not, try it.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Sun Jun 01, 2014 4:22 pm
by iprob
I've upgraded several routers to 6.13 and haven't had an issue and the failures related to route cache have stopped. We only have one open bug. That is in a scenario with dual ISP setups and marking PPTP packets. L2TP/IPSec was fixed in 6.13 and PPTP is expected to be fixed in 6.14.

I agree with jarda, if you're having that reboot loop then try a netinstall and restore the config. I haven't seen that issue on any of the hardware we've upgraded (RB2011, RB751, RB951 and x86).

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Thu Jun 05, 2014 7:24 pm
by kazi33
Sorry, I did not get email notifications for the past few days about both of your postings above. The problem was having the "Dude" package installed on the mikrotik RB2011UiAS-IN. After I removed it using "system package uninstall dude" and rebooted the router, the problem disappeared. This happened with their pre-release build 6.14c25 as well. I think our IT team installed dude on the mikrotik for monitoring purposes. It's like a poison pill on mikrotik: after the 2nd or 3rd reboot, it falls into a reboot loop when "dude" is running as a service on the mikrotik. I was in Aruba test engineering for about 8 years and we used to fix this kind of issue asap in the next release. Not sure how seriously mikrotik takes this issue. If a package acts like a poison pill, the software should block its installation on the router.

To deal with this:

Check dude:
>store print
Flags: X - disabled, A - active
 #   NAME    TYPE   DISK     S
 0 A dude1   dude   system   a

To uninstall dude:
[admin@ccc] > system package uninstall dude
[admin@ccc] > system package print
Flags: X - disabled
 #   NAME             VERSION    SCHEDULED
 0   advanced-tools   6.13
 1   dude             4.0beta3   scheduled for uninstall
 2 X wireless         6.13

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Fri Jun 06, 2014 5:11 am
by jarda
Welcome to the club!

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Mon Jul 07, 2014 10:36 am
by Pengu1n
Hello
Same problem with the route cache on an RB1100AHx2.
ROS v.6.15, fw 6.10
I am attaching a memory usage graph. When usage reaches about 400MB, the router stops responding to IPv4 and can be accessed only by MAC telnet.
The router is actively used for site-to-site and L2TP VPNs.
[Attachment: graph1.JPG]

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Tue Jan 13, 2015 3:47 pm
by mvalsasna
just happened again with 6.21.1 on RB433AH

/ip route cache print
cache-size: 16384
max-cache-size: 16384

no ppp, one IPSEC tunnel and one VRRP instance

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Thu Jan 07, 2016 1:12 pm
by Kraken2k
Solved this issue finally! (tested on version 6.32.1)

I had these problems since upgrading from 6.25 to a newer version (at the last incident it was 6.30.2) on an RB1100AHx2 - after a few days, the router stopped responding. It was still running and reacted to cable connect/disconnect, but there was no response on the Ethernet ports.

I was able to connect using the serial port and found everything running as expected, just with no response to network traffic. When I attempted to ping local addresses or even 127.0.0.1, I got the error message "No buffer space available" - later I found that an issue with the same symptoms had existed in the past and was claimed to be already fixed (route cache overflow).

The interesting thing is that we have two RB1100AHx2 routers with the same configuration (just a few different IP addresses), but the second one is just a backup with little traffic and deactivated IPsec tunnels - that one works without any issue.

(Months with daily reboots passed)

Yesterday, I finally managed to resolve this issue on the router with ~20 IPsec tunnels.

tl;dr version: Guess what... it was solved by turning the ip cache feature back on.

This setting had no effect in version 6.30.2. When I opened the ticket back then, I got this advice from MT support: "turn the IP cache feature off", but it had no effect, and the setting stayed that way.

But turning on the route cache (/ip settings set route-cache=yes) in 6.32.1 (I have not tested later versions yet) actually forces the cache to work as it should. However, if you change it on a running system, the change affects only cache records created from that point on - cache entries created before you turn the route cache feature on stay there forever, until the router is restarted.

It almost looks like IPsec tunnels use the route cache regardless of the cache on/off setting, but if the cache is turned off in IP settings, nothing takes care of the records in the cache any more, so it will overflow in the end, causing all IPv4 traffic to stop. Turning on the cache feature forces all records to be managed by the regular cache algorithm, so it works as it should.

Re: RB2011UAS-2HnD stops responding spontaneously

Posted: Sat Jan 09, 2016 4:12 pm
by whitbread
Kraken2k wrote:
"Solved this issue finally! (tested on version 6.32.1)
...
it was solved by turning the ip cache feature back on.
..."
I can confirm, that enabling /ip settings route-cache works on RB750GL, RB951G-2HnD both on Rel. 6.34rc34 :D