NV2, CPU and Latency

Hello All,
We have a working box that has been upgraded to NV2 recently, and we’ve had more than our fair share of issues on this puppy. Thankfully we’ve had most of them solved, but we have one last lingering issue.
The original symptoms are that customers complain about frequent disconnects. Normally the internet is fine and dandy, but then it just drops out for anywhere from 20 seconds to a few minutes, and these dropouts are frequent enough that using the connection for gaming, etc. becomes impossible.


After numerous hours of debugging, we’ve come to the point where we are pretty sure that CPU usage spikes on certain types of traffic with NV2, causing latency for all clients on the box (not just the affected radio card!) to go to crap. The box is an RB800 with 3 R52Hns, and all the clients are in station-bridge mode. That includes a link to another AP with the same setup (two AP cards plus the link), and that box is suffering similar symptoms. Signals are good. The AP is on ROS 5.11 and the clients are on either 5.6 or 5.11; the ROS version doesn’t seem to have any effect on the issue.

When do the CPU spikes occur, you ask? Well, that’s a fun question, because they don’t happen all the time. In fact, the box regularly does ~7 meg with less than 20% CPU, as long as TX/RX packets stay under roughly 1,000 total. I can even run bandwidth tests of 8-9 meg to clients and CPU hovers around 30%, no problem. But if any single card starts doing over 1,000 packets by itself, CPU spikes to 70-80% and latency on all the cards goes to crap: frequent time-outs and 300ms+ to the gateway (normally under 10ms), and worse as CPU climbs. These packets usually all come from a single TCP connection, so the card isn’t trying to pass an excessive number of connections, just a lot of packets. Sum total bandwidth during these periods is usually under 3 megs, because the primary card holds that high packet-count connection while the other cards die a miserable throughput death and can’t sustain more than a meg. I can’t Winbox into anything and telnet sessions struggle; it is very indicative of what our users would see as a “drop.”
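A quick back-of-envelope check on the numbers above, assuming “packets” means packets per second and “meg” means Mbit/s (both are assumptions, the post doesn’t spell out the units). Dividing throughput by packet rate gives the average packet size, and the function name here is just for illustration:

```python
def avg_packet_bytes(mbps: float, pps: float) -> float:
    """Average packet size in bytes for a given throughput and packet rate.

    Assumes mbps is megabits per second (1 Mbit = 1,000,000 bits)
    and pps is packets per second.
    """
    if pps <= 0:
        raise ValueError("packet rate must be positive")
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second / pps


# Figures from the post; units are assumed, not confirmed.
# Calm: ~7 meg while staying under ~1000 packets -> this is a LOWER bound on size.
calm = avg_packet_bytes(7, 1000)
# Spike: under 3 megs total while one card exceeds 1000 packets -> an UPPER bound.
spike = avg_packet_bytes(3, 1000)

print(f"calm:  at least ~{calm:.0f} bytes/packet")
print(f"spike: at most  ~{spike:.0f} bytes/packet")
```

Under those assumptions the calm case works out to at least ~875 bytes per packet and the spike case to at most ~375, which would be consistent with the CPU tracking packet count rather than raw bandwidth.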

Now I’ve stripped out all our QoS and firewall rules, so this is just a dumb box running a bridge from the ether to the cards. I’ve tried adjusting the AMPDU settings just to see, but I need ideas on what could be causing the CPU spikes, how to prevent them, and hopefully how to stabilize the connections. I’m not sure what information I’ve missed, but I’ll be happy to provide anything else.

Bumping this in the hope that someone has an answer on why we’re seeing CPU spikes when NV2 sees a TX rate greater than 1,000 packets on an RB800 with R52Hns…