I have several CCR2216s (running ROS 7.16.1) that I use for a few purposes, but I have observed some issues, mainly with TCP performance, which gets progressively worse as traffic volume and user count increase and as latency gets higher (80ms, 120ms, 150ms, 200ms).
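For context on why latency matters so much: a single TCP flow can have at most one window of data in flight per round trip, so its throughput is capped at roughly window / RTT no matter how fast the link is. Below is a minimal Python sketch of that arithmetic. The 4 MiB window is a hypothetical value chosen for illustration (the real figure depends on the end hosts' buffer settings, not on the router), and the function names are made up for this example.

```python
def max_tcp_throughput_mbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound for one TCP flow: one window of data per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def window_needed_bytes(target_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain target_mbps."""
    return target_mbps * 1e6 / 8 * (rtt_ms / 1000)


# Hypothetical effective window of 4 MiB (an assumption, not a measured value).
ASSUMED_WINDOW_BYTES = 4 * 1024 * 1024

if __name__ == "__main__":
    # RTT values taken from the post above.
    for rtt in (80, 120, 150, 200):
        cap = max_tcp_throughput_mbps(ASSUMED_WINDOW_BYTES, rtt)
        need = window_needed_bytes(1000, rtt) / 1e6
        print(f"RTT {rtt} ms: single-flow cap {cap:.0f} Mbps, "
              f"window for 1 Gbps {need:.1f} MB")
```

The point is only that the cap falls in proportion to RTT, so any extra queueing delay or loss on the path hurts long-RTT flows the most.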
Running tests in a lab with a few devices does not showcase the issue much, and we can get very good speeds; however, in a production environment with, for example, over 500,000 packets per second, we start to see issues.
I can’t use L3HW offloading at all, for two reasons: on some routers I have very few routes but multiple VRFs, so offloading is not an option, and on other routers I have multiple full BGP routing tables, so offloading doesn’t help much there either.
After reading some forums I saw a lot of comments about people’s performance with only-hardware-queue, along with recommendations to use multi-queue-ethernet (mq pfifo) and adjust the queue size accordingly.
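One common rule of thumb for "adjusting the queue size accordingly" is to pick how many milliseconds of line-rate traffic the queue should be able to absorb and convert that into a packet count, since pfifo-style limits are expressed in packets. The sketch below assumes a 10 ms buffering target and 1500-byte packets; both are illustrative assumptions rather than MikroTik recommendations, and the function name is hypothetical.

```python
import math


def pfifo_limit_packets(link_gbps: float, buffer_ms: float,
                        avg_packet_bytes: int = 1500) -> int:
    """Packets needed to buffer `buffer_ms` of traffic at line rate.

    avg_packet_bytes defaults to an assumed full-size 1500-byte packet;
    smaller real-world averages mean more packets for the same buffer time.
    """
    bytes_buffered = link_gbps * 1e9 / 8 * (buffer_ms / 1000)
    return math.ceil(bytes_buffered / avg_packet_bytes)


if __name__ == "__main__":
    # Assumed 10 ms target on a 10 Gbps port (illustrative only).
    print(pfifo_limit_packets(10, 10))
```

A deeper queue absorbs bursts at the cost of added latency under load, so the buffering target is a trade-off to test rather than a fixed answer.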
One odd thing when I use mq pfifo is that I still see all Ethernet traffic running on CPU 0 only. Isn’t the whole point of mq pfifo to distribute the load across all cores?
I tested this and saw some improvements, but I wanted to open a discussion to get everyone else’s opinion on the best parameters and settings to use based on packets/s or traffic volume, as well as any recommendations for improving TCP performance.
I need to cater for up to 1 million packets per second and a traffic volume of around 15Gbps per router.
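To put the 1 million packets per second target in perspective, here is a small sketch of the average time available per packet, first if a single core handles everything (the CPU 0 situation described above) and then if the load is spread evenly. The 16-core figure is an assumption for illustration (check your model's spec), perfectly even distribution is also an assumption, and the function name is hypothetical.

```python
def per_packet_budget_ns(pps: float, cores_sharing_load: int = 1) -> float:
    """Average time available per packet, in nanoseconds.

    Assumes packets are spread perfectly evenly over `cores_sharing_load`
    cores, which real traffic distribution will not achieve exactly.
    """
    return cores_sharing_load / pps * 1e9


if __name__ == "__main__":
    target_pps = 1_000_000   # figure from the post above
    assumed_cores = 16       # assumption, not taken from the thread
    print(f"1 core:   {per_packet_budget_ns(target_pps):.0f} ns per packet")
    print(f"{assumed_cores} cores: "
          f"{per_packet_budget_ns(target_pps, assumed_cores):.0f} ns per packet")
```

If everything really does land on one core, the budget is about a microsecond per packet, which is why the per-core distribution is worth checking in the profiler.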
I haven’t noticed similar issues, but then I’m only pushing about 3Gbps of traffic, and at the moment (morning) we’re only around 200Kpps. (I haven’t looked at PPS during peak hours.)
On all but one of my CCR2116s (three as border routers, two as BGP core and two as CGNAT), I created an fq-codel queue and kept the default settings. I named it ‘fq-codel-ethernet-default’ and assigned it as the queue for each of the Ethernet interfaces on the routers.
I’m also running 7.16.1. I looked at the profiler and it appears everything spreads across all cores.
I’d be curious to see if that helps your situation at all.
One setup is connected to upstreams with BGP etc. It has FastPath enabled and no VRFs, and when I run tests with this setup I get the above results with a single TCP connection.
One setup acts as a core router. It also has FastPath enabled, and it has VRFs.
One setup is an edge router (this has VRFs as well). It has a few firewall rules, so it can’t use FastPath, and I have not set up any FastTrack rules. Tests run from behind it give the slowest results; when I disable the filter/mangle rules and FastPath gets enabled, I get results similar to the previous two.
It’s now peak hours and I checked in on one of my BGP core routers. It has no forward firewall rules, only input. It runs at about 10% CPU pushing 3.5-4Gbps and 400K packets per second. All RX traffic is hitting FastPath. This is over a bonded pair of SFP+ interfaces.
I also looked at the busiest border router, and it’s doing a similar amount of traffic (3Gbps, 400Kpps). It does have a few more firewall rules on the forward chain, and a bunch more on the input chain. CPU is 20%, virtually all RX traffic is hitting FastPath.
I have connection tracking either off or tuned in with mangle rules to bypass almost everything.
L3HW offload is off (I’m not seeing an improvement with it turned on, and on other routers in the network it occasionally gets “stuck”).
I only started to notice issues and get complaints after reaching around 4.5Gbps to 5Gbps, when my traffic graphs started to look as if traffic was being “shaped” even though I knew the bandwidth should be higher.
My solution at that time was to introduce a new core router to distribute traffic across two routers, which worked, but I’m at that point again where traffic is being “shaped”, and it’s not practical to add more routers when the current ones should be able to handle more traffic.