I wanted to revisit this post as I am still noticing this problem on a different router now.
As for the router I originally was posting about, the problem magically went away after a number of reboots and months of running.
I've got another router with a very similar config that is starting to do the same thing. I'm noticing a pattern here: it takes months to "ramp up" to this state. This particular router was originally set up with 2.9.17, then upgraded to 2.9.24 when that version was released. Now I'm on 2.9.26 in the hope that the DNS bugs are fixed. Versions and reboots don't seem to affect this problem, but at some point it just goes away.
Here's how both of these routers have started to do this:
- This only begins after an uptime of a month or more.
- Once this starts, we see random CPU spikes. They are not very noticeable, except in our SNMP monitoring, where the graphs start to show missing data at random once or twice a day.
- Within another month or so of the first random spikes, they become more frequent. At this point we also notice that, when connected to the router with Winbox, opening a terminal window can take some time, and the CPU usage hits 100% during this delay. We get more and more gaps in our SNMP graphs because our scripts log in to these routers every 5 minutes. The login now takes 5-10 seconds, which is too long, so the scripts time out and no data is recorded.
- After 2-3 months, we see a lot of CPU spikes: every second or two we hit 50% or more, even with the router completely idle. Opening a terminal from within Winbox or over a direct SSH connection takes at least 5 seconds (it used to be much faster), and typically longer. At this point, our SNMP graphs have more missing data than graphed data, sometimes hours of nothing. This is *not* caused by our network connection or our SNMP server; the login scripts are simply timing out.
- The base memory usage after a reboot steadily increases over time. The router now starts up with 10MB or more additional memory in use compared with what a fresh reboot used to show. This number slowly climbs over months, as does the baseline CPU usage at idle.
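To put a number on the "slowly climbs over months" observation, the post-reboot memory readings could be fitted with a straight line. This is only a rough sketch: the function name `linear_trend` and the sample readings are made up for illustration and are not measurements from either router.

```python
def linear_trend(samples):
    """Return (slope, intercept) of a least-squares line through (x, y) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical readings: (days since first sample, MB in use right after a reboot)
baseline = [(0, 40.0), (30, 43.0), (60, 46.0), (90, 49.0)]
slope, intercept = linear_trend(baseline)
print(f"baseline memory grows by about {slope * 30:.1f} MB per 30 days")
```

A slope that stays clearly positive across several reboots would back up the impression that the baseline itself is creeping upward, rather than just varying from one boot to the next.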
I'm beginning to wonder if this problem is caused by our SNMP scripts logging into these routers every 5 minutes for months on end, with each login leaving behind some memory or something similar. We never see this problem on wireless access points that don't have login scripts hitting them or a hotspot package running on them. Like I said, at some point (within 4 or 5 months maybe) this problem can completely disappear, but it gets bad before this happens.
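One way to gather evidence for or against this would be to have the polling script record how long each login takes, instead of only noticing when it times out. Below is a minimal sketch of such a wrapper. The name `timed_run` is hypothetical, and the two demo commands just start a local Python interpreter so the example runs anywhere; it is not what our scripts currently do.

```python
import subprocess
import sys
import time


def timed_run(cmd, timeout_s):
    """Run cmd and return (elapsed_seconds, status).

    status is "ok" (exit code 0), "error" (non-zero exit or the command
    could not be started) or "timeout" (killed after timeout_s seconds).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if result.returncode == 0 else "error"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except OSError:
        status = "error"
    return time.monotonic() - start, status


# Stand-in commands: one that finishes quickly, one that outlives its timeout.
fast = timed_run([sys.executable, "-c", "pass"], timeout_s=30)
slow = timed_run([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print("fast:", fast)
print("slow:", slow)
```

In real use the command would be whatever the polling script already runs, for example an SSH call such as `["ssh", "admin@router.example", "/system resource print"]` (the user and host here are placeholders). Logging the elapsed time on every run would show whether login time creeps up gradually over the months or jumps suddenly.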
I'm sending a supout today from our current problematic router. I wanted to post here in case anyone else is still seeing this, any of the previous posters in this thread have seen anything related since last posting, or a workaround has been discovered.