ROS Versions Newer Than ~3.20 Cause 100% CPU Usage

I had an RB150 that I used for a couple of years running ROS v3.13. I used this little RouterBoard primarily as a firewall to automatically catch and block spam and to deny certain types of traffic. It also acted as a transparent proxy: address lists and a NAT redirect cut off past-due customers by sending their web traffic to a page telling them they hadn’t paid their bill. I thought it was a pretty standard setup.
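To give an idea of the shape of that past-due redirect, here is a rough sketch rather than my actual config. The list name "pastdue", the address 192.0.2.10, and proxy port 8080 are placeholders made up for illustration.

```
# Hypothetical example: put a past-due customer on an address list
/ip firewall address-list add list=pastdue address=192.0.2.10

# Redirect web traffic from anyone on that list to the local proxy port
/ip firewall nat add chain=dstnat protocol=tcp dst-port=80 src-address-list=pastdue action=redirect to-ports=8080

# Enable the web proxy on the port the NAT rule redirects to
/ip proxy set enabled=yes port=8080
```

The proxy then serves the "you haven’t paid" page to whatever it receives on that port.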

Around two years ago I remember upgrading it to 3.20 (or maybe 3.30) and seeing the CPU max out after about 24 hours. The high CPU load pushed ping times into the 1000s across my whole network and made the internet virtually unusable for my customers. I rebooted it and it went back to normal, but around 24 hours later the same thing happened, and it kept happening after every reboot. I figured there was a bug in the version I had upgraded to, so I downgraded, and my router was working and reliable again.

When ROS version 4 came out I upgraded to it. However, the same high CPU usage happened time and time again. I tried a couple of different versions of ROS 4, but they all seemed unreliable.

About a year ago I finally outgrew my RB150, so I decided to get an RB450G for the extra CPU power, since my RB150’s CPU was at about 70-80% during my peak hours. Sadly, I was never able to get my RB450G to work well. It did the same thing as my RB150 did after a ROS upgrade: even on the more powerful 450G, the CPU would climb to 100% for seemingly no reason and at random times. So, I put the 450G aside and went back to my RB150 with v3.13, since it worked better.

Over the last year I have tried a few times to get a new ROS working on my 450G. Every version I have tried has the same problem with the CPU maxing out. I thought I might have had a faulty RB450G, so I ordered a new one, but it has the same problem. The only thing that still works is my old RB150 with ROS v3.13.

Naturally, I thought there might be something in my configuration that, once carried over through an upgrade, causes new versions of ROS to have high CPU load on my RouterBoards. So, I started my configuration from scratch. I only pasted my firewall configuration so I wouldn’t have to retype all my rules and address list entries. I still can’t prevent my RB150 or my two RB450Gs from malfunctioning on me with new ROS versions.

Two weeks ago I thought I finally had this fixed. I started a config from scratch on my newest RB450G with ROS v4.15. I entered as much of my configuration as I could by hand. I only pasted in my firewall config and my proxy config. My 450G ran fine for about a week. Then the CPU maxed out at a random time, not even during peak hours. Then, today (8 days later), it did the same thing. A reboot fixed it, but the CPU maxed out again about 2 hours later.

Over the last couple of years of dealing with all of this, I thought that the recent versions of ROS were just full of bugs that caused my problem. However, I haven’t heard of this exact problem from anyone else, and I can’t imagine that these bugs would have existed for almost 2 full versions of ROS. Now, I am thinking that there is something wrong with my configuration.

My configuration is very simple and I don’t have a lot of firewall rules or anything that would obviously tax the CPU. I can’t imagine what it could be. Is there any way to view the process list in ROS to allow me to see what part of the OS would be causing problems?

One interesting thing I have noticed is that during high CPU load, if I disable and then re-enable ANY firewall rule, the CPU goes back down to below 10% and stays there for a long time. It seems to solve the problem just like a reboot would. So, that makes me think that my problem is firewall related.
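In terminal terms, that toggle amounts to something like the following. This is only a sketch: item number 0 is an arbitrary placeholder (any rule seems to have the same effect), and the numbers come from the preceding print.

```
# Check the current CPU load
/system resource print

# List the filter rules so the item numbers are assigned
/ip firewall filter print

# Disable and then re-enable one rule (0 is just an example)
/ip firewall filter disable 0
/ip firewall filter enable 0

# Check the CPU load again
/system resource print
```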

Would it be helpful to post my config, or is there something better that I can do to help solve this problem?

Thanks to anyone who has any ideas.

P.S. Does this problem sound familiar to anyone?