CCR1009-7G-1C-1S+ single stream TCP performance limit with queues

If that’s what is actually happening, I would consider that a pretty big flaw in the scheduler. It shouldn’t take away the CPU of a 100% CPU bound process when there are plenty of other CPUs available.

That is of course not what is happening. The processing of each packet in your TCP session is a separate event, and it may well be handled by a different CPU every time, yet several CPUs cannot work on events from a single session in parallel. That is unsurprising, because TCP has a sequence number which of course has to be handled atomically.

When you have several TCP sessions in parallel, different CPUs can each work on a different session at the same time, and the resulting total throughput is higher. The 1Gbps link can probably be saturated by just 2 sessions, given the measurements done by you and others.

I don’t agree that a single stream has to be handled in a single thread. I’ll agree to disagree and leave it at that.

I disagree. While it might be implemented like this in RB devices, intermediate devices (e.g. routers) are not required to process TCP packets in sequence. Every receiver’s TCP stack has to implement an out-of-order delivery mechanism (incidentally, this mechanism is also used when doing retransmits).
I don’t think that even NAT should change this much, as NAT is done at L3 while TCP is a layer higher, so the sequence numbers of TCP packets shouldn’t be affected by NAT.
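
To illustrate the receiver-side reordering mentioned above, here is a rough Python sketch. It is not how any real TCP stack is written: the function name `reassemble` and the tuple layout are made up for the example, and it assumes non-overlapping segments and ignores sequence number wraparound.

```python
def reassemble(segments, initial_seq):
    """Deliver payload bytes in order, buffering segments that arrive early.

    segments: iterable of (seq, payload) tuples in arrival order.
    Simplifying assumptions: no overlapping segments, no 32-bit wraparound.
    """
    pending = {}              # seq -> payload for segments ahead of next_seq
    next_seq = initial_seq    # next byte the application is waiting for
    delivered = bytearray()

    for seq, payload in segments:
        if seq < next_seq or seq in pending:
            continue          # duplicate (e.g. a retransmit), drop it
        pending[seq] = payload
        # Drain everything that is now contiguous.
        while next_seq in pending:
            data = pending.pop(next_seq)
            delivered += data
            next_seq += len(data)

    return bytes(delivered)


# Segments arrive as 1st, 3rd, 2nd, plus a retransmit of the 2nd.
arrivals = [(100, b"AB"), (104, b"EF"), (102, b"CD"), (102, b"CD")]
result = reassemble(arrivals, initial_seq=100)
```

The point is only that the ordering work can be left to the endpoint; nothing here requires a router in the middle to forward the segments in order.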

That is true, but the connection tracking has to implement a sliding sequence number window to be able to reject segments with a bad sequence number as “invalid”.
So while the segments themselves may arrive out of sequence and should not be queued (some segments may not be seen at all, e.g. when load balancing is in use), the acked sequence numbers still have to be tracked.
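
A minimal sketch of that kind of sliding window check, in Python. This is a simplification for illustration, not RouterOS or Linux conntrack code; the class `FlowWindow` and its fields are hypothetical, and real trackers keep considerably more state per direction.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


class FlowWindow:
    """Tracks one direction of a flow: accepts any segment inside the window."""

    def __init__(self, start, size):
        self.start = start    # lowest sequence number still considered valid
        self.size = size      # window size in bytes (assumed fixed here)

    def accept(self, seq):
        # Modular distance from the window start; segments may arrive in any
        # order, or be missing entirely, as long as they fall in the window.
        return (seq - self.start) % MOD < self.size

    def on_ack(self, ack):
        # Slide the window forward only; stale or wild ACKs are ignored.
        advance = (ack - self.start) % MOD
        if 0 < advance <= self.size:
            self.start = ack


w = FlowWindow(start=0xFFFFFF00, size=1000)
in_window = w.accept(100)       # wrapped past 2**32, still inside the window
too_far = w.accept(5000)        # beyond the window, would be "invalid"
w.on_ack(200)                   # peer acked up to 200, window slides
stale = w.accept(100)           # now behind the window
```

Note that the shared `start` value is exactly the piece of per-session state that has to be updated atomically.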

I can’t think of a case where NAT would not sit on top of load balancing. E.g. how could NAT work if not all packets passed through a single NAT instance (the physical link on either side might be load-balanced)? Or am I misunderstanding what you wrote about packets not being seen when load balancing?

[edit] Argh … I was too deep into my thinking of NAT to notice you explicitly wrote about conn tracking. My bad.

Actually, in NAT the TCP sequence numbers ARE affected!
This happens only in very special cases, where an IP address appears inside a data packet in string representation and has to be NAT’ted.
For example (and that is the only example I know), in FTP there is a PORT command over the control connection to initiate a transfer. The address and port are sent as six comma-separated decimal bytes, like this:
PORT 10,0,0,1,0,20
This has to be translated to the external address like this:
PORT 123,123,123,123,0,20
See that it can be longer?
The NAT layer adjusts the sequence numbers to accommodate the extra bytes, and from then on all sequence numbers in the session will be translated.
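
Roughly, the bookkeeping looks like the Python sketch below. It assumes the comma-separated PORT argument format from RFC 959 (h1,h2,h3,h4,p1,p2); the function names are invented for the example and this is not taken from any actual NAT helper.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def rewrite_port_command(payload, inside_ip, outside_ip):
    """Swap the address in an FTP PORT line; return (new_payload, delta)."""
    old = inside_ip.replace(".", ",").encode()
    new = outside_ip.replace(".", ",").encode()
    rewritten = payload.replace(old, new, 1)
    # delta is how many bytes the stream grew (or shrank) at this point
    return rewritten, len(rewritten) - len(payload)


def translate_seq(seq, delta):
    """Sequence numbers of later segments from the inside host, as seen outside."""
    return (seq + delta) % MOD


def translate_ack(ack, delta):
    """ACK numbers coming back from outside, as the inside host expects them."""
    return (ack - delta) % MOD


new_payload, delta = rewrite_port_command(
    b"PORT 10,0,0,1,0,20\r\n", "10.0.0.1", "123.123.123.123"
)
outside_seq = translate_seq(0xFFFFFFFE, delta)   # wraps around past zero
inside_ack = translate_ack(outside_seq, delta)   # maps back to the original
```

With these addresses the command grows by 7 bytes, so every later sequence number in that direction is shifted by 7 and every returning ACK is shifted back.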

@pe1chl, thanks a lot for the explanation. FTP with its control/data port kludge completely slipped my mind (I can’t remember when I last used the FTP protocol … it used to be a really common protocol back in the previous millennium).

We are having a similar problem with queueing of 1Gbps of MPLS traffic. In our case it isn’t single stream performance that we are hitting but the total MPLS traffic across the interface, yet the limit exactly matches your 700Mbps figure. I suspect that the router is treating the bulk MPLS traffic (which just has labels identifying the packets) similarly to how it would treat a single TCP stream, resulting in similar bottlenecks.

Did you get any answer from MT?
I too opened a similar topic, and it was left in the dust.

I think for this kind of situation a CCR2116 has a good chance of improving things, because of its 2 GHz CPU clock, which is higher than that of the CCR10xx series with only a 1.2 GHz clock and lower performance.

I was never able to solve this; it was probably just low per-core speed causing the bottleneck. The hardware was replaced with a Xeon E-2388G based router, which has no problem with 3+ Gbps single connection TCP performance.

The Xeon E-2388G has a very high base frequency of 3.20 GHz and is a state-of-the-art x86 heavy core with out-of-order (OoO) execution.

It is no surprise that it can beat a 1.2 GHz in-order light core like the ones in the CCR10xx product line.

The CCR2116 has OoO ARM cores at 2.0 GHz; I think it can put up a good fight in heavy processing scenarios like this.

Hi @mducharme, we have experienced the same behavior with ARM devices. On what devices have you experienced this behavior? Have you found any way to solve it?