I’m on an RB532, running RouterOS 3.10. I have been seeing this problem for the past several months, across the last few ROS versions.
After the router has been up about 5 days, the load average slowly starts to climb, then it spikes and hangs the router sometime late in the 6th day or into the 7th day. Sometimes it’s been closer to 8 days, sometimes closer to 6.
Without a PS command, I am unable to tell what is taking the CPU time.
When the load spikes, the router slows down on passing traffic and gradually stops answering queries from the console or via winbox. So I am unable to generate a supout at the time when I need it the most.
I don’t believe I’m doing anything funky in my configuration, and I don’t see any other changes in my traffic graphs to indicate that the router is getting overworked with traffic at these ~6 day points.
Is anyone else experiencing anything similar?
Any ideas on how to generate the supout if I can’t get the CPU’s attention long enough to generate it? I got close last week and let the supout generate for 30 minutes, but then the console timed out and kicked me out, and no supout file was left behind.
I am about two days away from this happening again, so I would appreciate any thoughts before I go through the next cycle…
Hmmm, current is 2.12, I see now that there is a 2.15 (there wasn’t one, last time I looked)
I will upgrade it now, which will buy me another week.
I checked the changelog for 2.15, but there doesn’t appear to be anything related to my problem. Was there a discussion of this problem somewhere else?
Did you notice progressive memory consumption over the period of uptime, not just CPU usage? In my case, less and less RAM was available to the router, and at around 20% available memory it slowed down so much that you couldn’t even log in on the console. IMHO that leak is what drove CPU usage to 100%. I had that experience on one router, and an upgrade solved the problem. I’m not sure it is directly related to your problem, since in my case it was an RB112. I replied because they use the same RB500 packages.
There was indeed progressive memory consumption, but not to the point where the router should have become unusable. There is usually 10-12 MB remaining free on my 32 MB RB532.
I’m attaching RRD graphs of the past month, where you can see the trends. The Load Avg ramp-up usually starts 1.5-2 days before I reboot (or, in some cases, power-cycle).
I’m seeing this on some routers as well. One is an RB232 running a hotspot. I don’t see memory leaking, just the CPU running up to 100%. I’m waiting to see whether this shows up on other routers that are not running hotspots.
I do have a PPTP client connected to my RB532 most of the time. But I haven’t correlated that connection being up with the CPU load. I’m not seeing an influx of traffic around my high-load times either.
Okay, here’s another thing to try, but it’s a long shot. Export your config with /export file=xxx1, then do a /system reset. Edit the dumped config down to just what you changed from the default, and paste it back in. See if that clears up the problem. The theory here is that something under the covers is whacked due to the 2.9 to 3.0 upgrade. The /system reset should get everything back to known values, and then you program back only what you need.
Write a script which periodically creates a supout, maybe every couple of hours, with an initial delay after startup. You can have a startup script which disables the periodic script and enables another script to start at t+1 hours. This second script then re-enables the periodic dump script. That gives you a fairly large window to download the last supout file after a reboot, before it gets overwritten. When the router gets to the 100% state, reboot it and copy off the last supout file.
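A rough sketch of that setup using the scheduler, in case it helps. Treat it as untested: the entry names (periodic-supout, hold-supout) and the file name are made up for illustration, and I’m assuming your ROS version accepts start-time=startup on scheduler entries, name= on /system sup-output, and a time value like 1h for :delay. Check each of those on your box before relying on it. I’ve also folded the "second script" into a single startup entry that waits an hour and then re-enables the periodic one.

```
# Periodic dump: overwrite the same supout file every 2 hours.
# (name= and the file name are assumptions; adjust for your version.)
/system scheduler add name=periodic-supout interval=2h \
    on-event="/system sup-output name=periodic-supout.rif"

# Startup entry: hold off the periodic dump for the first hour after boot,
# so the last file from before the reboot can be copied off first.
/system scheduler add name=hold-supout start-time=startup \
    on-event="/system scheduler disable periodic-supout; :delay 1h; /system scheduler enable periodic-supout"
```

Because the periodic entry always writes to the same file name, the file on the router after a hang and reboot is the last one generated before things went bad, and you have roughly an hour to download it before it is replaced.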