100% CPU usage on firewall with 6.37.1 and 6.36.4

I have a Xeon box that was humming along today and then became unresponsive. I drove to the data center and found the firewall using every bit of CPU it had. Normally CPU usage sits between 0% and 3% under "load". The upstream interface was carrying only about 3 Mbit/s. When I disconnect the upstream cable, the load goes away; when I reconnect it, the load comes back. I can't generate a supout while the box is loaded, and I can't seem to get one to complete even when it isn't.

I downgraded from 6.37.1 to 6.36.4 with no change, then upgraded to 6.38rc25 and the problem went away. I don't like running a release candidate on my PE router, but I also don't want to downgrade just to see whether the problem is still there.

I added firewall rules to accept only DNS, Winbox and SSH and drop everything else. It made no difference to how the box behaved.

What could have been going on? I can provide a 20 MB PCAP to qualified parties. I should have taken a much larger one.

This is guesswork at this point. There are at least a few entries in the changelog that could explain it. Usually, 100% CPU comes down to some kind of crash.

But then why would unplugging and replugging the upstream interface make a difference?

Are you sure you aren't receiving excessive traffic (a potential DDoS) from your upstream?

My upstream interface was showing only about 3 Mbit/s at peak. It certainly wasn't a volumetric attack, but that doesn't rule out other kinds of attack.

Also, it's an 8-core Xeon box. I only have 1 Gbit/s of upstream, so I'm not sure I could max out the CPU under normal or volumetric conditions anyway.

Your badly organised firewall, receiving a huge number of small packets, can easily flood the CPU. Check Profiler and watch Torch and you will get a clue. You have to know what kind of traffic is causing it before you can fight it.
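Whether small packets could plausibly explain the load depends on packet rate rather than bit rate. A quick back-of-the-envelope sketch in Python, assuming standard Ethernet framing (64- to 1518-byte frames plus 20 bytes of preamble and inter-frame gap on the wire). Interface counters may not include that wire overhead, so treat the figures as approximate; `packets_per_second` is a hypothetical helper, not a RouterOS tool.

```python
# Per-frame overhead on the wire that interface bit rates may or may not include:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(bits_per_second: float, frame_bytes: int) -> float:
    """Packet rate needed to fill `bits_per_second` with frames of `frame_bytes`.

    `frame_bytes` is the Ethernet frame size including the 4-byte FCS
    (64 for minimum-size frames, 1518 for full-size untagged frames).
    """
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_per_second / bits_per_frame


if __name__ == "__main__":
    # The two rates from the thread: the observed upstream load and the link capacity.
    for label, rate in (("3 Mbit/s", 3e6), ("1 Gbit/s", 1e9)):
        for frame in (64, 1518):
            pps = packets_per_second(rate, frame)
            print(f"{label:>8} with {frame:>4}-byte frames: {pps:>12,.0f} packets/s")
```

With 64-byte frames, 3 Mbit/s works out to roughly 4,460 packets per second, while a fully loaded 1 Gbit/s link carries about 1.49 million.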

I did use Profiler, and Torch to some degree. I also took a packet capture for later analysis.
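Since a capture was taken, one way to find out what kind of traffic was involved is to summarize it offline: packet rate, average frame size, share of small frames, and the busiest IPv4 sources. This is a minimal Python sketch assuming the file is in classic libpcap format (not pcapng) with an Ethernet link type. It decodes only untagged IPv4 frames, and the function name and the 128-byte "small frame" threshold are illustrative choices, not anything taken from RouterOS.

```python
import struct
import sys
from collections import Counter

# Classic libpcap magic numbers -> (struct byte order, timestamp fraction unit).
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanosecond timestamps
}
LINKTYPE_ETHERNET = 1


def summarize_pcap(buf, small_frame=128, top_n=5):
    """Summarize a classic libpcap capture held in memory (pcapng is not handled)."""
    if len(buf) < 24 or buf[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic libpcap file")
    endian, ts_unit = PCAP_MAGICS[buf[:4]]
    linktype = struct.unpack_from(endian + "I", buf, 20)[0]
    count = small = total_bytes = 0
    first_ts = last_ts = None
    sources = Counter()
    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(buf):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", buf, pos)
        pos += 16
        frame = buf[pos:pos + incl_len]
        if len(frame) < incl_len:
            break  # truncated final record
        pos += incl_len
        ts = ts_sec + ts_frac * ts_unit
        if first_ts is None:
            first_ts = ts
        last_ts = ts
        count += 1
        total_bytes += orig_len
        if orig_len <= small_frame:
            small += 1
        # Untagged Ethernet + IPv4 only: EtherType at bytes 12-13, source address at bytes 26-29.
        if linktype == LINKTYPE_ETHERNET and len(frame) >= 30 and frame[12:14] == b"\x08\x00":
            sources[".".join(str(b) for b in frame[26:30])] += 1
    duration = last_ts - first_ts if count > 1 else 0.0
    return {
        "packets": count,
        "duration_s": duration,
        "pps": count / duration if duration > 0 else None,
        "avg_frame_bytes": total_bytes / count if count else 0.0,
        "small_fraction": small / count if count else 0.0,
        "top_ipv4_sources": sources.most_common(top_n),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for key, value in summarize_pcap(f.read()).items():
            print(f"{key}: {value}")
```

Run it as `python summarize_pcap.py capture.pcap` (hypothetical file names). A high packet rate made up mostly of small frames would support the small-packet theory; a low rate would make a software fault more plausible.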

Can 3 Mbit/s of traffic really topple eight Xeon cores?

It shouldn't. So, what were the results?