I finally got a 1 Gbps uplink from my ISP, but after setting up queue trees on my CCR1009-7G-1C-1S+, single TCP streams never seem to get past 600-700 Mbps. Disabling the queues immediately allows full 1 Gbps throughput. The queue is limited at 950M and never drops packets, so the queue itself shouldn't be the limiting factor; I also tried different queue types as well as different interface queues with no effect. Multiple TCP streams have no problem pushing 1 Gbps, so it seems like something is bottlenecking single TCP stream performance. CPU load looks evenly balanced according to the profiler.
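Roughly, the queue tree looks like this (a simplified sketch; the interface names and packet mark below are placeholders, not my exact config):

```
/queue tree
add name=up parent=ether1 max-limit=950M
add name=up-bulk parent=up packet-mark=bulk max-limit=950M queue=default
```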
Introducing a queue affects latency, and that added latency may be what is visible in your case.
To reduce latency, use hardware-only queues and no other buffering.
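That is set per interface; something like the following (standard RouterOS syntax, with "ether1" as a stand-in for your WAN interface):

```
/queue interface set ether1 queue=only-hardware-queue
```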
Window sizes are fine, and the test server is only 3 ms away, so latency shouldn't be an issue. I want to use queues for traffic shaping so that a single HTTP download doesn't starve more important traffic, so using only the hardware queue is not really an option.
After further testing, there is still a bottleneck even with the queues disabled. Removing some of the queue mangle rules improves speed, and turning on fasttrack goes faster still, so I guess there is some single-stream / single-core limit in the firewall engine that is capping TCP speeds.
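For reference, fasttrack was enabled with the standard default-config rule pair below. Note that fasttracked packets bypass most firewall, mangle, and queue processing, which is presumably why it is faster, but it also means fasttracked traffic won't be shaped by the queue trees:

```
/ip firewall filter
add chain=forward action=fasttrack-connection connection-state=established,related
add chain=forward action=accept connection-state=established,related
```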
And that objective has been achieved, so maybe you should just leave it at this?
This situation should be fine for typical usage; maybe you should stop benchmarking and speedtesting and just make normal use of the connection?
There will be plenty of high bandwidth TCP connections in real world usage (lots of large file uploads for example). If they can’t use the full connection capacity that’s a bit disappointing.
The queue type is irrelevant, since the throughput limitation happens even with queuing disabled. Something in the netfilter processing seems to bottleneck a single TCP stream: removing mangle rules improves speeds, as does enabling fasttrack, even though the CPU shows only 6-7% load.
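The mangle rules in question are the usual mark-connection/mark-packet pairs feeding the queue tree, something of this general shape (a generic sketch, not my exact rules), and every forwarded packet has to traverse them:

```
/ip firewall mangle
add chain=forward protocol=tcp dst-port=80,443 action=mark-connection new-connection-mark=bulk passthrough=yes
add chain=forward connection-mark=bulk action=mark-packet new-packet-mark=bulk passthrough=no
```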
pe1chl, for my situation there will often be single high bandwidth TCP streams. Yes, most people can ignore this since with enough users and flows the capacity will easily be maxed, but we have a video department that is uploading very large raw 4k footage over a single connection. I want them to be able to take full advantage of the 1 gbps connection in such a case.
In test 3), there is very little CPU usage: < 20% (~10% in firewall and ~6% in networking). Something does appear to be bottlenecking single streams, but none of the available profiling tools make it evident what is causing it.
Always check the detailed load of the CCR in tools->profile by selecting CPU: all.
When you get 10 or 20% CPU load on a CCR1009 it can mean that one or two cores are fully loaded and the others are almost idle.
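The same per-core view is available from the terminal (standard RouterOS command):

```
/tool profile cpu=all
```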
The CPU is still the bottleneck in that case because it apparently is a single-threaded task.
Well, that can still happen when the task is single-threaded and limited by CPU.
The immediate performance is limited by a single CPU core, but the core actually running the code is switched a few times per second, so you still see evenly loaded processors in the profiler.
There are two distinct things that you need to consider separately:
1. how can a single-threaded process on a multi-core system be processor-bound even when the system load appears to be low
2. how can such a comparatively simple task take up so many resources on a supposedly powerful router
My own connections are not fast enough for me to get into this territory on the CCRs that I manage.
But I can explain the first point theoretically. Only a single processor can run the thread at any one time; when the thread is regularly
scheduled on a different processor, the average load of each processor can look low even when the thread is CPU-bound.
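To put numbers on it: the CCR1009 has 9 cores, so a single thread that fully saturates whichever core it happens to be running on averages out to about

```latex
\frac{1\ \text{saturated core}}{9\ \text{cores}} \approx 11\%\ \text{overall load}
```

which is right in the 10-20% range mentioned earlier in this thread.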