R1CH
Forum Guru
Topic Author
Posts: 1107
Joined: Sun Oct 01, 2006 11:44 pm

CCR1009-7G-1C-1S+ single stream TCP performance limit with queues

Sat Mar 03, 2018 2:43 pm

I finally got a 1 Gbps uplink to my ISP, and after setting up queue trees on my CCR1009-7G-1C-1S+, single TCP streams never seem to get past 600-700 Mbps. Disabling the queues immediately allows the full 1 Gbps of throughput. The queue is limited to 950M and never drops packets, so the queue itself shouldn't be the limiting factor; I tried different queue types as well as different interface queues, with no effect. Multiple TCP streams work fine and push 1 Gbps with no problem, so something seems to be bottlenecking single-stream TCP performance. The CPU load looks balanced according to the profiler.

Anyone have any ideas?

[Image: CPU profiler screenshot showing the load spread across the cores]
 
sebastia
Forum Guru
Posts: 1782
Joined: Tue Oct 12, 2010 3:23 am
Location: Antwerp, BE

Sat Mar 03, 2018 4:20 pm

The speed of a single stream depends on:
* the speed of the channel
* the window size
* the latency of the connection

Introducing the queue affects latency, and that effect may just be visible in your case.
To reduce latency, use hardware-only queues and no other buffering.
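The window/latency point above is usually summarized as the bandwidth-delay product: a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is capped at window / RTT. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration, the 3 ms RTT is the test-server figure mentioned in this thread, and loss, slow start and header overhead are ignored.

```python
def max_single_stream_bps(window_bytes: float, rtt_s: float) -> float:
    """Window-limited ceiling for one TCP stream: at most one window of
    unacknowledged data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


def required_window_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    rtt = 0.003  # 3 ms round trip (assumed example value)
    print(f"window needed for 1 Gbps: {required_window_bytes(1e9, rtt):,.0f} bytes")
    # A 64 KiB window without TCP window scaling, over the same path:
    print(f"ceiling with a 65535-byte window: {max_single_stream_bps(65535, rtt) / 1e6:.0f} Mbps")
```

With these inputs, sustaining 1 Gbps needs about 375,000 bytes in flight, while a 64 KiB window without window scaling would cap a stream at roughly 175 Mbps.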
 
R1CH
Forum Guru
Topic Author
Posts: 1107
Joined: Sun Oct 01, 2006 11:44 pm

Sat Mar 03, 2018 5:49 pm

Window sizes are fine, and the test server is 3 ms away, so latency shouldn't be an issue. I want to use queues for traffic shaping so that a single HTTP download doesn't starve more important traffic, so using only the hardware queue is not really an option.

After further testing, there is still a bottleneck even with the queues disabled. Removing some of the queue mangle rules improves speed, and turning on fasttrack makes it even faster, so I guess there is some single-stream / single-core limit in the firewall engine that is limiting TCP speeds.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Sat Mar 03, 2018 8:28 pm

I want to use queues for traffic shaping so a single HTTP download doesn't starve more important traffic
And that objective has been achieved, so maybe you should just leave it at that?
This should be fine for typical usage. Maybe you should stop benchmarking and speed-testing and just make normal use of the connection?
 
R1CH
Forum Guru
Topic Author
Posts: 1107
Joined: Sun Oct 01, 2006 11:44 pm

Sun Mar 04, 2018 12:24 am

There will be plenty of high bandwidth TCP connections in real world usage (lots of large file uploads for example). If they can't use the full connection capacity that's a bit disappointing.
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Sun Mar 04, 2018 12:52 am

Looks like you're using queue trees. How does it do with simple queues?
 
marcin21
Member Candidate
Posts: 215
Joined: Tue May 04, 2010 4:50 pm

Sun Mar 04, 2018 9:49 am

Try changing the default queue type to "sfq".
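For context, the rough idea behind SFQ (stochastic fairness queuing) is to hash each flow into one of a fixed number of sub-queues and serve those sub-queues round-robin, so one bulk flow can't monopolize the link. The toy Python model below illustrates only that idea; the bucket count, the CRC32 hash and the `perturb` value are arbitrary assumptions, not RouterOS's actual `sfq` parameters or code.

```python
import zlib
from collections import deque


class ToySFQ:
    """Toy stochastic fairness queue (not RouterOS's implementation):
    each flow is hashed into one of a fixed number of sub-queues, and the
    sub-queues are served round-robin, one packet per turn."""

    def __init__(self, buckets: int = 16, perturb: int = 0):
        self.queues = [deque() for _ in range(buckets)]
        self.perturb = perturb  # changing this reshuffles which flows share a bucket
        self.next_bucket = 0

    def bucket_of(self, flow: tuple) -> int:
        # Every packet of a given flow lands in the same bucket.
        return zlib.crc32(repr((self.perturb, flow)).encode()) % len(self.queues)

    def enqueue(self, flow: tuple, packet) -> None:
        self.queues[self.bucket_of(flow)].append(packet)

    def dequeue(self):
        # Visit buckets round-robin, starting after the one served last time.
        for _ in range(len(self.queues)):
            q = self.queues[self.next_bucket]
            self.next_bucket = (self.next_bucket + 1) % len(self.queues)
            if q:
                return q.popleft()
        return None  # all sub-queues are empty


if __name__ == "__main__":
    sfq = ToySFQ()
    bulk = ("10.0.0.5", 50000, "198.51.100.7", 443, "tcp")
    voip = ("10.0.0.9", 40000, "198.51.100.8", 5060, "udp")
    for i in range(5):
        sfq.enqueue(bulk, f"bulk-{i}")
    sfq.enqueue(voip, "voip-0")
    print([sfq.dequeue() for _ in range(6)])
```

Unless the two flows happen to hash into the same bucket, `voip-0` leaves within the first two dequeues instead of waiting behind all five bulk packets. Because every packet of a flow maps to the same bucket, this kind of queue divides capacity between flows; it doesn't make any one stream faster.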
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Sun Mar 04, 2018 11:48 am

There will be plenty of high bandwidth TCP connections in real world usage (lots of large file uploads for example). If they can't use the full connection capacity that's a bit disappointing.
But they can, just not in a single TCP session. When many users are uploading and downloading lots of things, it will work just fine.
 
R1CH
Forum Guru
Topic Author
Posts: 1107
Joined: Sun Oct 01, 2006 11:44 pm

Sun Mar 04, 2018 6:33 pm

The queue type is irrelevant, since the throughput limitation still happens even without queuing. Something in the netfilter processing seems to bottleneck a single TCP stream: removing mangle rules improves speeds, as does using fasttrack, even though the CPU shows only 6-7% load.

pe1chl, in my situation there will often be single high-bandwidth TCP streams. Yes, most people can ignore this, since with enough users and flows the capacity will easily be maxed out, but we have a video department that uploads very large raw 4K footage over a single connection. I want them to be able to take full advantage of the 1 Gbps connection in that case.
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Sun Mar 04, 2018 10:29 pm

I can reproduce this on my CCR1009. I tested using iperf between two VLANs. Here are my results.

1) Firewall rules on; mangle rules, queues and fasttrack off: 925 Mbps.
2) Firewall rules on; mangle rules off; queues on (but effectively inactive, since there are no mangle rules); fasttrack off: 800 Mbps.
3) Firewall rules, mangle rules and queues on; fasttrack off: 575 Mbps.
4) Firewall rules, mangle rules, queues and fasttrack all on: 925 Mbps.

In test 3 there is very little CPU usage: < 20% (~10% in firewall and ~6% in networking). Something does appear to be bottlenecking single streams, but none of the available profiling tools shows what is causing it.
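One way to read these numbers: if the packets of a single stream are handled strictly one after another (an assumption, not something the profiler shows), each throughput figure implies a per-packet time budget. The sketch below assumes every packet is 1500 bytes; it is back-of-the-envelope arithmetic, not a measurement of RouterOS internals.

```python
# Single-stream iperf results from the four tests above, in Mbps.
RESULTS_MBPS = {
    "1) mangle off, queues off, fasttrack off": 925,
    "2) mangle off, queues on, fasttrack off": 800,
    "3) mangle on, queues on, fasttrack off": 575,
    "4) mangle on, queues on, fasttrack on": 925,
}

PACKET_BYTES = 1500  # assumption: every packet is a full 1500-byte IP packet


def per_packet_us(mbps: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Time available per packet if one stream's packets are handled strictly
    one after another. bits / (Mbit/s) gives microseconds."""
    return packet_bytes * 8 / mbps


if __name__ == "__main__":
    baseline = per_packet_us(RESULTS_MBPS["1) mangle off, queues off, fasttrack off"])
    for name, mbps in RESULTS_MBPS.items():
        t = per_packet_us(mbps)
        print(f"{name}: {t:5.2f} us/packet ({t - baseline:+.2f} us vs. test 1)")
```

Under those assumptions, test 3 works out to about 20.9 µs per packet against about 13.0 µs for test 1, i.e. roughly 8 µs of extra serial work per packet if this model holds.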
 
R1CH
Forum Guru
Topic Author
Posts: 1107
Joined: Sun Oct 01, 2006 11:44 pm

Mon Mar 05, 2018 1:43 pm

That's good to hear it is reproducible. I will contact Mikrotik support and hope for an explanation.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Mon Mar 05, 2018 2:55 pm

Always check the detailed load of the CCR in tools->profile by selecting CPU: all.
When you get 10 or 20% CPU load on a CCR1009 it can mean that one or two cores are fully loaded and the others are almost idle.
The CPU is still the bottleneck in that case because it apparently is a single-threaded task.
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Mon Mar 05, 2018 4:03 pm

Always check the detailed load of the CCR in tools->profile by selecting CPU: all.
When you get 10 or 20% CPU load on a CCR1009 it can mean that one or two cores are fully loaded and the others are almost idle.
The CPU is still the bottleneck in that case because it apparently is a single-threaded task.
As R1CH showed in his image, no single CPU is bottlenecking. The load is nicely distributed across all cores.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Mon Mar 05, 2018 4:18 pm

Well, that can still happen when the task is single-threaded and limited by CPU.
The immediate performance is limited by the single CPU, but the actual CPU running the code is switched a few times per second, so you still see evenly loaded processors in the profiling.
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Mon Mar 05, 2018 4:39 pm

Well, that can still happen when the task is single-threaded and limited by CPU.
The immediate performance is limited by the single CPU, but the actual CPU running the code is switched a few times per second, so you still see evenly loaded processors in the profiling.
If no single core goes over 60%, how exactly is it limited by CPU?
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Mon Mar 05, 2018 4:56 pm

The load figures you see in profiling are averages, and the CPU limits are instantaneous.
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Mon Mar 05, 2018 5:34 pm

The load figures you see in profiling are averages, and the CPU limits are instantaneous.
Describe the scenario where an average load of 60% would result in a throughput of 62% of max.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Mon Mar 05, 2018 5:38 pm

Apparently you have it sitting in front of you...
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Mon Mar 05, 2018 6:27 pm

I was hoping you could explain it. Going through 2 mangle rules and 2 queue checks (without actually queuing) and maxing out the CPU doesn't seem right.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Mon Mar 05, 2018 6:54 pm

There are two distinct things that you need to consider separately:
- how can a single-threaded process on a multi-core system be processor bound even when the system load appears to be low
- how can such a comparatively simple task take up so much resources on a supposedly powerful router

My connections are not fast enough for me to get into this territory on the CCRs that I manage.
But I can explain the first point in theory. Only a single processor can be active for the thread at any one time; when the thread is regularly scheduled on a different processor, the average load of the processors can be low even when the thread is CPU-bound.
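pe1chl's first point can be shown with a tiny simulation: a thread that is CPU-bound in every time slice, but rotated across the nine cores of a CCR1009, makes every core look mostly idle on average. The strict rotation and the slice count are simplifying assumptions; the sketch only demonstrates how averaging can hide a serial bottleneck.

```python
def per_core_average_load(total_slices: int, cores: int) -> list:
    """One thread that is busy in every time slice but runs on a different
    core each slice (simple rotation). Returns each core's average load."""
    busy = [0] * cores
    for t in range(total_slices):
        busy[t % cores] += 1  # during slice t only this one core runs the thread
    return [b / total_slices for b in busy]


if __name__ == "__main__":
    loads = per_core_average_load(total_slices=900, cores=9)
    print(" ".join(f"{x:.0%}" for x in loads))
    print(f"average over all cores: {sum(loads) / len(loads):.0%}")
```

Each core reports about 11%, and so does the overall average, even though the thread never had to wait for a CPU.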
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Mon Mar 05, 2018 7:13 pm

Only a single processor can be active for the thread at any one time; when the thread is regularly scheduled on a different processor, the average load of the processors can be low even when the thread is CPU-bound.
If that's what is actually happening, I would consider it a pretty big flaw in the scheduler. It shouldn't take the CPU away from a 100% CPU-bound process when there are plenty of other CPUs available.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Mon Mar 05, 2018 8:39 pm

That is of course not what is happening. The processing of each packet in your TCP session is a separate event, and it may well be handled by a different CPU every time, yet several CPUs cannot work on these events (from a single session) in parallel. That is unsurprising, because TCP has a sequence number, which of course has to be handled atomically.

When you have different TCP sessions in parallel, different CPUs can work on different sessions at the same time, and the resulting total throughput is higher. Given the measurements done by you and others, the 1 Gbps link can probably be saturated by just 2 sessions.
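The argument above can be written as a simple capacity model. This is an idealized sketch under stated assumptions: every session costs the same per packet, one session is only ever processed on one core at a time, the CCR1009's nine cores are otherwise free, and the 1000 Mbps cap is the raw link rate with protocol overhead ignored.

```python
def aggregate_mbps(sessions: int, per_session_mbps: float,
                   link_mbps: float = 1000.0, cores: int = 9) -> float:
    """Idealized model: each session's packets are processed one at a time, so a
    session never exceeds per_session_mbps; different sessions can run on
    different cores in parallel until the cores or the link run out."""
    parallel = min(sessions, cores)  # cores that can be busy at once
    return min(link_mbps, parallel * per_session_mbps)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "session(s):", aggregate_mbps(n, per_session_mbps=575), "Mbps")
```

Plugging in the 575 Mbps single-stream result from ToBeFrank's test 3, one session gives 575 Mbps and two sessions already reach the 1000 Mbps cap, consistent with the estimate that two sessions could saturate the link.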
 
ToBeFrank
newbie
Posts: 33
Joined: Mon Dec 18, 2017 7:31 pm

Mon Mar 05, 2018 10:49 pm

I don't agree that a single stream has to be handled in a single thread. I'll agree to disagree and leave it at that.
 
mkx
Forum Guru
Posts: 12310
Joined: Thu Mar 03, 2016 10:23 pm

Tue Mar 06, 2018 4:24 pm

Unsurprising, because TCP has a sequence number which of course has to be handled atomically.
I disagree. While it might be implemented like this in RB devices, intermediate devices (e.g. routers) are not required to process TCP packets in sequence. Every receiver's TCP stack has to implement an out-of-order delivery mechanism (incidentally, this mechanism is also used for retransmits).
I don't think even NAT should change this much, as NAT is done on L3 while TCP is a layer higher, so the sequence numbers of TCP packets shouldn't be affected by NAT.
Last edited by mkx on Tue Mar 06, 2018 4:29 pm, edited 1 time in total.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Tue Mar 06, 2018 4:27 pm

Unsurprising, because TCP has a sequence number which of course has to be handled atomically.
I disagree. While it might be implemented like this in RB devices, intermediate devices (e.g. routers) are not required to process TCP packets in sequence. Every receiver's TCP stack has to implement an out-of-order delivery mechanism (incidentally, this mechanism is also used for retransmits).
That is true, but connection tracking has to implement a sliding sequence window to be able to reject segments with a bad sequence number as "invalid".
So while the segments themselves may arrive out of sequence and should not be queued (some segments may not be seen at all, e.g. when load balancing is in use), the acknowledged sequence numbers still have to be tracked.
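To illustrate the distinction: a tracker can accept segments in any order as long as each one falls inside a window around the acknowledged sequence number, but that acknowledged number and window are shared per-connection state. The Python check below is a hypothetical, heavily simplified version (it is not the Linux conntrack algorithm); `acked` and `window` stand in for whatever state a real tracker keeps.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around modulo 2^32


def seq_offset(seq: int, ref: int) -> int:
    """Signed distance from ref to seq in wrap-around sequence space."""
    d = (seq - ref) % SEQ_SPACE
    return d - SEQ_SPACE if d >= SEQ_SPACE // 2 else d


def check_segment(seq: int, length: int, acked: int, window: int) -> str:
    """Hypothetical, simplified tracker check: arrival order does not matter,
    a segment only has to lie within one window either side of `acked`."""
    start = seq_offset(seq, acked)
    end = start + length
    if start >= window:
        return "invalid"  # beyond anything the receiver has offered to accept
    if end <= -window:
        return "invalid"  # older than a plausible retransmission
    return "ok"


if __name__ == "__main__":
    acked, win = 1_000_000, 65535
    print(check_segment(acked + 30000, 1460, acked, win))   # ahead of order, still ok
    print(check_segment(acked + 100000, 1460, acked, win))  # outside the window
```

The check itself doesn't care about arrival order; what has to stay consistent is the `acked`/`window` state, which a real tracker updates as ACKs pass and which every segment of the connection reads. That shared state is the part pe1chl argues must be handled atomically.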
 
mkx
Forum Guru
Posts: 12310
Joined: Thu Mar 03, 2016 10:23 pm

Tue Mar 06, 2018 4:33 pm

That is true, but connection tracking has to implement a sliding sequence window to be able to reject segments with a bad sequence number as "invalid".
So while the segments themselves may arrive out of sequence and should not be queued (some segments may not be seen at all, e.g. when load balancing is in use), the acknowledged sequence numbers still have to be tracked.
I can't think of a case where NAT would not sit on top of load balancing. E.g. how could NAT work if not all packets passed through a single NAT instance (the physical link on either side might be load-balanced)? Or am I misunderstanding what you wrote about packets not being seen when load balancing is in use?

[edit] Argh... I was so deep into thinking about NAT that I didn't notice you explicitly wrote about connection tracking. My bad.
 
pe1chl
Forum Guru
Posts: 10486
Joined: Mon Jun 08, 2015 12:09 pm

Tue Mar 06, 2018 4:50 pm

Actually, in NAT the TCP sequence numbers ARE affected!
This happens only in very special cases, where an IP address appears inside a data packet in string representation and has to be NAT'ted.
For example (and it is the only example I know), in FTP there is a PORT command sent over the control connection to initiate a transfer, like this:
PORT 10,0,0,1,0,20
This has to be translated to the external address like this:
PORT 123,123,123,123,0,20
See that it can be longer?
The NAT layer adjusts the sequence numbers to accommodate the extra bytes, and from then on all sequence numbers in the session are translated.
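A sketch of what such a NAT helper has to do, in Python. On the wire the PORT command is comma-separated per RFC 959 (`PORT h1,h2,h3,h4,p1,p2`, with the port encoded as p1*256 + p2), so address `10.0.0.1` with port 20 is sent as `PORT 10,0,0,1,0,20`. The function names are illustrative, and a real helper also has to remember where in the stream the rewrite happened so that only later segments are shifted.

```python
import re

# RFC 959 wire format: PORT h1,h2,h3,h4,p1,p2 (address bytes, then port = p1*256 + p2)
PORT_RE = re.compile(rb"PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\r\n")


def rewrite_port(payload: bytes, public_ip: str) -> tuple:
    """Replace the private address in an FTP PORT command with public_ip.
    Returns (new_payload, delta), where delta is the change in length that
    the NAT must carry forward into later sequence numbers."""
    m = PORT_RE.search(payload)
    if m is None:
        return payload, 0
    p1, p2 = m.group(5), m.group(6)  # keep the port bytes unchanged
    new_cmd = b"PORT " + public_ip.replace(".", ",").encode() + b"," + p1 + b"," + p2 + b"\r\n"
    new_payload = payload[:m.start()] + new_cmd + payload[m.end():]
    return new_payload, len(new_payload) - len(payload)


def shift_seq(seq: int, delta: int) -> int:
    """Later sequence numbers in the same direction move by delta (mod 2^32);
    acknowledgement numbers coming back must be moved by -delta."""
    return (seq + delta) % 2 ** 32


if __name__ == "__main__":
    new, delta = rewrite_port(b"PORT 10,0,0,1,0,20\r\n", "123.123.123.123")
    print(new, delta)
```

Here the rewritten command is 7 bytes longer, so every later sequence number from that side has to be shifted by 7 and the peer's acknowledgement numbers by -7.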
 
mkx
Forum Guru
Posts: 12310
Joined: Thu Mar 03, 2016 10:23 pm

Tue Mar 06, 2018 7:33 pm

@pe1chl, thanks a lot for the explanation. FTP with its control/data port kludge completely slipped my mind (I can't remember when I last used the FTP protocol... it used to be a really common protocol back in the previous millennium).
 
mducharme
Trainer
Posts: 1777
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Mon May 10, 2021 9:32 am

That's good to hear it is reproducible. I will contact Mikrotik support and hope for an explanation.
We are having a similar problem with queuing 1 Gbps of MPLS traffic. In our case it isn't single-stream performance that we are hitting but the total MPLS traffic across the interface, yet it exactly matches your 700 Mbps figure. I suspect that the router is treating the bulk MPLS traffic (whose packets are identified only by labels) much like a single TCP stream, resulting in similar bottlenecks.
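mducharme's suspicion can be illustrated with a toy model of hash-based packet steering, where each packet is assigned to a core by hashing a flow key. Whether and how RouterOS does this for MPLS is an assumption here, not something stated in the thread; the key format, the CRC32 hash and the core count (nine, as on a CCR1009) are illustrative.

```python
import zlib
from collections import Counter


def core_for(flow_key: bytes, cores: int = 9) -> int:
    """Hypothetical flow-hash steering: every packet with the same key is
    handed to the same core, which keeps a flow's packets in order."""
    return zlib.crc32(flow_key) % cores


if __name__ == "__main__":
    # Many distinct TCP 5-tuples spread out over the cores...
    tcp_keys = [f"10.0.0.{i}:5000->203.0.113.1:443/tcp".encode() for i in range(1, 200)]
    print("TCP flows per core:", sorted(Counter(core_for(k) for k in tcp_keys).items()))
    # ...but if the only key visible is one shared MPLS label, every packet
    # lands on the same core, just like one big TCP stream.
    mpls_keys = [b"label=100"] * 200
    print("MPLS packets per core:", sorted(Counter(core_for(k) for k in mpls_keys).items()))
```

Distinct 5-tuples spread over several cores, while the shared label key always maps to one core, so in this model all of that traffic would be serialized like a single stream.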
 
Maggiore81
Trainer
Posts: 570
Joined: Sun Apr 15, 2012 12:10 pm
Location: Italy

Sat Oct 15, 2022 8:11 am

Did you get any answer from MT?
I also opened a similar topic, which just gathered dust.
 
chechito
Forum Guru
Posts: 3074
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Sun Oct 16, 2022 5:26 am

I think a CCR2116 has a good chance of improving this kind of situation, because its 2 GHz CPU clock is higher than that of the CCR10xx series, which has only a 1.2 GHz CPU clock and lower performance.
 
R1CH
Forum Guru
Topic Author
Posts: 1107
Joined: Sun Oct 01, 2006 11:44 pm

Tue Oct 18, 2022 10:32 pm

I was never able to solve this; it was probably just low per-core speed causing the bottleneck. The hardware was replaced with a Xeon E-2388G-based router, which has no problem with 3+ Gbps single-connection TCP performance.
 
chechito
Forum Guru
Posts: 3074
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Wed Oct 19, 2022 1:55 am

The Xeon E-2388G has a very high base frequency of 3.20 GHz and uses state-of-the-art heavy out-of-order (OoO) x86 cores.

It is no surprise that it can beat a 1.2 GHz in-order light core like those in the CCR10xx product line.

The CCR2116 has OoO ARM cores at 2.0 GHz; I think it can put up a good fight in heavy processing scenarios like this.
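A naive way to put numbers on the clock-speed argument, in Python: if the per-stream path is serial and CPU-bound, throughput should scale roughly with clock speed times any per-clock (IPC) gain from out-of-order cores. The `ipc_ratio` value below is a made-up input, not a measured figure; caches, memory and RouterOS version differences are not modelled.

```python
def scaled_mbps(measured_mbps: float, old_ghz: float, new_ghz: float,
                ipc_ratio: float = 1.0, link_mbps: float = 1000.0) -> float:
    """Naive estimate for a serial, CPU-bound per-stream path: throughput grows
    with clock speed times the per-clock (IPC) improvement, capped by the link.
    ipc_ratio is a guess supplied by the caller, not a measured value."""
    return min(link_mbps, measured_mbps * (new_ghz / old_ghz) * ipc_ratio)


if __name__ == "__main__":
    # 575 Mbps: the single-stream result with mangle rules and queues enabled.
    print(f"{scaled_mbps(575, 1.2, 2.0):.0f} Mbps")                 # clock speed alone
    print(f"{scaled_mbps(575, 1.2, 2.0, ipc_ratio=1.5):.0f} Mbps")  # hypothetical IPC gain
```

Clock speed alone takes the 575 Mbps CCR1009 result to about 958 Mbps; any IPC improvement on top would push it to the 1 Gbps link cap in this model.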
 
clambert
Member Candidate
Posts: 154
Joined: Wed Jun 12, 2019 5:04 am

Sun Sep 10, 2023 10:16 pm

We are having a similar problem with queuing 1 Gbps of MPLS traffic. In our case it isn't single-stream performance that we are hitting but the total MPLS traffic across the interface, yet it exactly matches your 700 Mbps figure. I suspect that the router is treating the bulk MPLS traffic (whose packets are identified only by labels) much like a single TCP stream, resulting in similar bottlenecks.
Hi @mducharme, we have experienced the same behavior with ARM devices. On which devices have you seen this behavior? Have you found any way to solve it?
