QoS: worse throughput as latency increases

System: RB5009, with a desktop at 2.5 Gbps on ether1 and an XPON SFP module (at 2.5 Gbps) on the SFP+ interface.
Problem: to prevent dropped packets, I set a 949 Mbps egress rate limit on the 1 Gbps ports and the SFP+ interface in the switch menu. The local network works like a charm. Internet targets up to about 130 ms RTT work very well too. Targets with higher latency show severely degraded speed.

Now the weird part: the degradation ONLY happens if ether1 is running at 2.5 Gbps. If I set it to 1 Gbps, the problem goes away.
I have already tried limiting the speed in the switch menu, with simple queues, queue trees, and interface queues. All of them limit the speed - but the latency problem happens with all of them too. If I move from 2.5 Gbps Ethernet to 1 Gbps Ethernet, the problem goes away.

P.S. - I know the 949 Mbps limit is higher than my ISP speed. It is there just to protect the router's internal buffers. If I run this test from the 1 Gbps port without limits, the results are about the same as from the 1 Gbps port with the 949 Mbps limit.
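For context on why the 2.5 Gbps to ~1 Gbps step-down matters: a sender bursting at line rate into a shaped egress fills the router's buffer at the rate difference. A rough sketch of the timescale involved (the 512 KiB buffer size is an illustrative assumption, not the RB5009's actual value):

```python
# Rough sketch: how fast a burst at 2.5 Gbps fills a buffer drained at 949 Mbps.
# The 512 KiB buffer size is an illustrative assumption, not an RB5009 spec value.
ingress_bps = 2.5e9
egress_bps = 949e6
buffer_bytes = 512 * 1024

fill_rate_Bps = (ingress_bps - egress_bps) / 8   # net buffer growth, bytes/s
time_to_fill_ms = buffer_bytes / fill_rate_Bps * 1000
print(f"buffer fills in ~{time_to_fill_ms:.1f} ms of sustained burst")
```

A few milliseconds of line-rate burst is enough to exhaust a small buffer, which is why speed mismatches between ports are so unforgiving.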

Example with switch speed limiting, egress rate only, 949 Mbps (Ethernet at 2.5 Gbps)

   Speedtest by Ookla

      Server: IPA CyberLab 400G - Tokyo (id: 48463)
         ISP: Predlink Rede de Telecomuniccoes Ltda
Idle Latency:   269.61 ms   (jitter: 1.48ms, low: 269.18ms, high: 271.59ms)
    Download:   563.42 Mbps (data used: 851.0 MB)                                                   
                418.40 ms   (jitter: 82.31ms, low: 270.11ms, high: 907.15ms)
      Upload:   137.91 Mbps (data used: 160.1 MB)                                                   
                402.99 ms   (jitter: 81.18ms, low: 269.07ms, high: 537.17ms)
 Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/336482d1-a53c-4a2e-8fac-76717615307d

   Speedtest by Ookla

      Server: Gold Data - Miami, FL (id: 35678)
         ISP: Predlink Rede de Telecomuniccoes Ltda
Idle Latency:   108.17 ms   (jitter: 1.95ms, low: 107.36ms, high: 111.49ms)
    Download:   541.45 Mbps (data used: 965.5 MB)                                                   
                170.83 ms   (jitter: 55.21ms, low: 113.57ms, high: 227.38ms)
      Upload:   220.29 Mbps (data used: 238.2 MB)                                                   
                200.25 ms   (jitter: 62.02ms, low: 132.43ms, high: 268.18ms)
 Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/492aaf3f-df91-4170-8635-7a50a83a3b25

   Speedtest by Ookla

      Server: Arias Telecom - Sao Paulo (id: 37180)
         ISP: Predlink Rede de Telecomuniccoes Ltda
Idle Latency:    11.24 ms   (jitter: 1.10ms, low: 10.29ms, high: 11.94ms)
    Download:   525.85 Mbps (data used: 732.4 MB)                                                   
                 10.24 ms   (jitter: 1.25ms, low: 8.76ms, high: 14.26ms)
      Upload:   519.85 Mbps (data used: 757.0 MB)                                                   
                  9.76 ms   (jitter: 0.93ms, low: 8.72ms, high: 16.05ms)
 Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/51abf941-c492-42e8-a90d-e11388ebdd1b

Example with switch speed limiting, egress rate only, 949 Mbps (Ethernet at 1 Gbps)

   Speedtest by Ookla

      Server: IPA CyberLab 400G - Tokyo (id: 48463)
         ISP: Predlink Rede de Telecomuniccoes Ltda
Idle Latency:   264.13 ms   (jitter: 0.98ms, low: 263.52ms, high: 265.41ms)
    Download:   571.49 Mbps (data used: 790.6 MB)                                                   
                402.97 ms   (jitter: 81.19ms, low: 270.47ms, high: 536.47ms)
      Upload:   330.17 Mbps (data used: 517.9 MB)                                                   
                396.76 ms   (jitter: 80.90ms, low: 263.07ms, high: 527.82ms)
 Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/1223e82b-03b2-4238-a2ef-4d048e254cfd

   Speedtest by Ookla

      Server: Gold Data - Miami, FL (id: 35678)
         ISP: Predlink Rede de Telecomuniccoes Ltda
Idle Latency:   110.32 ms   (jitter: 0.81ms, low: 108.04ms, high: 110.52ms)
    Download:   526.92 Mbps (data used: 689.2 MB)                                                   
                162.18 ms   (jitter: 53.57ms, low: 108.92ms, high: 233.64ms)
      Upload:   531.85 Mbps (data used: 569.0 MB)                                                   
                161.91 ms   (jitter: 53.68ms, low: 108.28ms, high: 216.72ms)
 Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/d64aebc2-749e-409f-b4f4-40d542ed80d8

   Speedtest by Ookla

      Server: Arias Telecom - Sao Paulo (id: 37180)
         ISP: Predlink Rede de Telecomuniccoes Ltda
Idle Latency:     9.82 ms   (jitter: 1.13ms, low: 9.01ms, high: 11.40ms)
    Download:   535.20 Mbps (data used: 664.3 MB)                                                   
                 10.83 ms   (jitter: 2.40ms, low: 8.94ms, high: 51.69ms)
      Upload:   526.79 Mbps (data used: 726.0 MB)                                                   
                  9.53 ms   (jitter: 2.94ms, low: 8.82ms, high: 241.12ms)
 Packet Loss:     0.0%
  Result URL: https://www.speedtest.net/result/c/73894259-b30e-4f28-bab0-1043ff3e09b8

Any ideas?

@patemot,

Any rate limit will always produce latency; we just need to adapt or fine-tune it to an acceptable rate. Latency could happen anywhere along the path.

Now the weird part: the degradation ONLY happens if ether1 is running at 2.5 Gbps. If I set it to 1 Gbps, the problem goes away.

Have you disabled that 949 Mbps rule when you changed the port from 1 Gbps to 2.5 Gbps? A simple queue only works at the interface level (if the queue limit is set way too high, the interface simply does forward/backward congestion notification or sets the discard-eligible bit and drops packets).

Yes, it may affect the CPU and RAM if we don't carefully calculate the queue - but I'm sure whatever is in the product datasheet has been tested and verified.

I don't mind the (very small) induced latency. My problem is the reverse: when the connection has higher latency (I'm in Brazil; the Japan server is about 270 ms away from me), the uplink sees a significant speed reduction when using QoS, EVEN if the speed limit is higher than the link speed. And the worse the latency between the hosts, the more the speed degrades.

Take another look at the tests I posted: the latency doesn't change much between runs against the same host - but the upload throughput does. And the hosts with more latency (the ones farther from me) show worse speed degradation.
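The latency-vs-throughput pattern in these tests is exactly what a fixed TCP window produces, which is where the thread eventually lands. A minimal numeric sketch (the 1 MiB effective window is an assumed example, not a value measured on my machine):

```python
# A TCP sender can have at most one window of unacknowledged data in flight,
# so achievable throughput is capped at window / RTT, regardless of link speed.
def window_limited_mbps(window_bytes: int, rtt_s: float) -> float:
    return window_bytes * 8 / rtt_s / 1e6

window = 1 * 1024 * 1024  # assumed 1 MiB effective window (illustrative only)
for rtt_ms in (10, 110, 270):  # roughly Sao Paulo, Miami, Tokyo from the tests
    print(f"RTT {rtt_ms:3d} ms -> at most ~{window_limited_mbps(window, rtt_ms / 1000):.0f} Mbps")
```

With a fixed window, the throughput ceiling falls in direct proportion to RTT, matching the "farther hosts are worse" observation.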

the uplink sees a significant reduction in speed when using QoS, EVEN if the speed limit is bigger than the link speed.

I'm sorry, can you be more specific about the QoS type you have implemented? WRED? Have you classified your traffic?

You said uplink. Did you mean you shaped the traffic outgoing to the internet?

And the worse the latency between the hosts, the more the speed degrades.

Yes. On the LAN-facing interface, this might be a symptom of an internal CPU buffer problem, but it could also stem from a suboptimal design. What kind of link are your hosts connected to? PPP? Wireless?

I tried everything, to no avail.
Yes, I shaped the outgoing link - the download direction was fine. My ISP has close to zero bufferbloat; no need to mess with it.

But I think I made a mistake: I now believe my problem is the TCP window size on my desktop. I started debugging the problem as QoS (or lack thereof) because it appeared when I migrated to a 2.5 Gbps NIC on my desktop.
And, indeed, if I set it to 1 Gbps the problem goes away. This led me to think it was a problem of upload network buffers overflowing, not of TCP window starvation.

Sorry for taking your time. :frowning:
I'll mark this as "solved" and will update as needed. It may very well help someone in the future.

It's nice to hear you have solved your QoS problem. Great :+1:t2:

No offense, but I hardly see the relationship between your original question description and your solution.

TCP window size? Going from a smaller size to a bigger one reduces the latency impact. Great :+1:t2::smiley: And what happened to that ether1 port?

As I promised, here is what happened so far.

Router: RB5009
Internet: FTTH, XPON (DFP-34X-2C2 module)
Desktop: Ryzen 5600, Asus TUF B550M-Plus mainboard (2.5 Gbps Realtek 8125), plugged into RB5009 ether1
Second computer: Ryzen 5600, Asus TUF B450M-PRO II mainboard (1 Gbps Realtek 8111), plugged into RB5009 ether2

After changing the mainboard (the old one was gigabit), I noticed something weird: my upload to higher-latency sites (about 150 ms and up) was terrible. First I thought it was a simple matter of TCP window size. I had never messed with it, since I rarely connect anywhere above 90 ms and rarely send big files. So...

But after a full day messing with the parameters, the problem wasn't going away. And then I found something: if I inserted 250 ms of delay on my local network while synced at 2.5 Gbps, I got the same behavior! Set the NIC to 1 Gbps and the problem went away! Well, first thing solved: my TCP window parameters were now working - I can saturate the 1 Gbps local link even with 250 ms of inserted latency.
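For reference, the window needed to keep a link full is the bandwidth-delay product, and it nearly triples going from 1 Gbps to 2.5 Gbps. A quick sketch (these are generic calculations, not values taken from my actual NIC settings):

```python
# Bandwidth-delay product: bytes that must be in flight to keep the pipe full.
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    return rate_bps * rtt_s / 8

for rate_gbps in (1.0, 2.5):
    mb = bdp_bytes(rate_gbps * 1e9, 0.25) / 1e6
    print(f"{rate_gbps} Gbps at 250 ms needs ~{mb:.0f} MB in flight")
```

A window tuned for gigabit can therefore saturate the 1 Gbps link at 250 ms yet fall far short at 2.5 Gbps.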

And now I can do 540/400 Mbps from Brazil to Japan on a residential 500/500 link. Not bad.

That leaves me with the weird upload problem while synced at 2.5 Gbps with high latency.
OK, let's tackle this. I inserted the 250 ms on the local network, set the NIC to 2.5 Gbps and started an iperf UDP test at 30 Mbps. Good. 40 Mbps: good. 50 Mbps: 20% packet loss - ONLY on the upload; if I reverse direction, the download can saturate the gigabit network. 100 Mbps? 50% packet loss. 200 Mbps? 75% packet loss. But only with the high latency. Drop the 250 ms and I can saturate the local link both ways.
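Loss rates like these would also flatten any TCP flow at this RTT, which fits the terrible upload speeds seen earlier. As a rough illustration, the classic Mathis approximation for loss-limited TCP throughput (the 1460-byte MSS and the 1.22 constant are standard textbook values, not measurements from this setup):

```python
import math

# Mathis et al. approximation: TCP throughput <= (MSS / RTT) * (C / sqrt(p)),
# where p is the packet loss rate and C ~ 1.22 for Reno-style congestion control.
def mathis_mbps(mss_bytes: int, rtt_s: float, loss: float, c: float = 1.22) -> float:
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss)) / 1e6

# 20% loss at 250 ms RTT with a 1460-byte MSS: well under 1 Mbps.
print(f"~{mathis_mbps(1460, 0.25, 0.20):.2f} Mbps")
```

Even a fraction of a percent of loss at 250 ms caps TCP far below link speed, so the UDP loss numbers above are more than enough to explain the degraded uploads.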

And this is where I am. I don't think it's the RB5009's fault - 50 Mbps is TOO low a speed for a switch buffer problem. I still don't know what the problem is, but at least now I'm closer to it.