CAKE, FQ-CoDel, etc.: which ROS7 queue type is best in your testing?

With all the new and now somewhat more reliable ROS7 features, I am starting to think about using it again on some of my boxes.
One of the big steps forward, and hopefully useful to me, is the new set of available queue types: I would like to migrate a big
queue tree over to ROS7 and benefit from the improvements.

What is your experience with the new queue types, and which one do you prefer (and for what use case)?
To me they look good, and on paper they all seem much better than the “old” ROS6 queues.

Searching the net, CAKE seems to be the best these days?
Or maybe it’s all just good marketing?

Please share your advice; I think this can help a lot of folks in the same situation as me.

What marketing? How exactly is code produced through sponsored research and included in Linux kernels for years being “marketed”?

Searching where? The authors have a very complete website explaining the purpose of CAKE, how to use it and its limitations. Furthermore, there are many published papers comparing queue types for different networking scenarios.

Without knowing what you are trying to achieve or solve, it’s impossible to say whether CAKE is the right scheduler or not. The first thing to know about schedulers is that they require memory and CPU time, so the best choice for latency is always no scheduler at all, i.e. avoiding congestion in the first place. Assuming we are talking about a home router connected to a WAN, then maybe CAKE will work best for you… and since it isn’t vendor-proprietary code, you can try it for free with v7 or any router that supports OpenWrt.
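For what “trying it” can look like on v7, here is a minimal, hedged sketch: one cake queue type attached to a queue tree that shapes just below the line rate. The interface names and the assumed 100/20 Mbit line (shaped to 95M/18M) are examples, not a recommendation.

/queue type
add kind=cake cake-nat=yes name=cake-basic
/queue tree
# shape slightly below the real line rate so cake, not the ISP buffer, is the bottleneck
add name=wan-down parent=bridge packet-mark=no-mark queue=cake-basic max-limit=95M
add name=wan-up parent=ether1 packet-mark=no-mark queue=cake-basic max-limit=18M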

Hi Moba,

thanks for your comments.

I am looking for real-life experience from other MikroTik users running complete systems: many different clients,
each with various applications and bandwidth needs, sharing a very constrained WAN link with limited bandwidth
(this is where you really need queues to keep the user experience at an acceptable level).

Say 30+ clients: phones, tablets, laptops, TVs, Echos and various IoT devices.

I understand that all this needs CPU, but that is why I have a MikroTik router with enough CPU to do the job…

I could overwhelm my RB4011 easily with a 400M WAN link using VLANs and badly configured mangle rules/queues. CPU is relative.

Anyway, to answer your question clearly: there is no best queue type. It’s the word “best” that made me reply; hype in general annoys me. When someone has used two things in specific scenarios, it’s easy to pick a winner. It can also be very misleading, like a lot of advice and opinions given online. I’ll admit that if someone posts numbers I’ll look at them, but I won’t base any decisions on them without carefully testing everything myself (even when it’s a peer-reviewed research paper).

Old-school FIFO is still used for a reason (it is the simplest, or “worst”, discipline), and contrary to popular comments online, it’s not because hardware manufacturers are lazy, even though large FIFO buffers are the main cause of bufferbloat. Here are my results: on my network, bFIFO with small buffers under a parent SFQ works better than CAKE ever has, because I prefer to drop packets from the various iToys, console downloads, MS updates and torrents clogging up my network when I absolutely need stable low latency for work. I have zero critical packet loss and an A+ on bufferbloat tests.
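For illustration only, here is one possible reading of that setup as a hedged sketch. The queue-type names, packet marks, buffer size and 18M upload limit are all made up; the exact structure I use is not shown above.

/queue type
# small byte-based FIFO so bulk traffic drops early instead of building a deep queue
add kind=bfifo bfifo-limit=20000 name=tiny-bfifo
# plain SFQ with default settings for the latency-sensitive traffic
add kind=sfq name=sfq-fair
/queue tree
add name=wan-up parent=ether1 max-limit=18M
add name=up-work parent=wan-up packet-mark=work priority=1 queue=sfq-fair
add name=up-bulk parent=wan-up packet-mark=bulk priority=8 queue=tiny-bfifo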

Do I believe that SFQ is better than CAKE? Absolutely not, but it uses a lot less CPU time because it’s a “dumb” fair-queuing algorithm.

BTW, I have used FQ-CoDel at home on various routers for years, and CAKE since it was released on OpenWrt in 2017. I can’t imagine CAKE working much better on MikroTik despite all the nagging for it; it is what it is, i.e. a great tool to have for simple congestion management on home networks. Maybe RGB CAKE would solve it all for me…

Or maybe someone else has magic powers that I do not have and will post the holy grail of all QoS strategies… or you could grab a few cold ones, have fun on your next day off, and post your own numbers :wink:

You might find this thread relevant:

http://forum.mikrotik.com/t/ccr2004-high-cpu-usage-ros7/152163/1

Hi guys, I do not get customer complaints about slow internet (hundreds of clients) caused by one device downloading and hogging the whole allowed speed limit. That is because I use MikroTik v6.x customer (CPE) routers with a combination of QoS via mangle for VoIP and queueing that dynamically, millisecond by millisecond, distributes the total allowed customer bandwidth evenly. I then make sure there is licence-free, very good ~2 ms radio backhaul capacity (not MikroTik) to the tower access-point sectors. By doing the queueing in the customer router, no CPU load is put on the WISP’s main router, so no extra software such as CAKE or CoDel is needed.
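For illustration, one way to approximate that kind of CPE setup could look like the sketch below. The VoIP ports, packet marks, interface name and the 20M plan rate are hypothetical, and the even per-device split is done here with PCQ; the actual configuration is not shown in this thread.

/ip firewall mangle
# mark SIP/RTP so voice can be prioritised (port ranges are an assumption)
add chain=prerouting protocol=udp dst-port=5060,10000-20000 action=mark-packet new-packet-mark=voip passthrough=no comment="VoIP traffic"
/queue type
# PCQ splits the remaining bandwidth evenly per destination address (per device)
add kind=pcq pcq-classifier=dst-address pcq-rate=0 name=pcq-down
/queue tree
add name=cust-down parent=bridge-lan max-limit=20M
add name=down-voip parent=cust-down packet-mark=voip priority=1 queue=default
add name=down-rest parent=cust-down packet-mark=no-mark priority=8 queue=pcq-down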

@WeWiNet This subject is also interesting to me. I agree it would be nice to see more real-world examples and feedback on how queues work for people. I’m new to MikroTik, and it took some time reading a lot of forums and documentation, getting bits and pieces from various places, to understand queues better. Certainly not as easy as the OpenWrt guides (which I never used, though) that present a ready-made solution.

I also agree with Moba that it’s probably not as simple as saying “this one is the best, ditch the other ones”. The consensus seems to be that cake is better than fq-codel since it’s newer and more advanced. But most user tests online are done on OpenWrt, which is a different platform. For one, cake in RouterOS cannot be used without HTB despite cake having its own shaper. Whether that impacts the outcome or not is another question.

In all my tests with Waveform and Flent I could never get cake to show better numbers than fq-codel, so the latter is my preferred choice for now. Only upload bandwidth was slightly better, but latency under load was worse. There are dozens of ways one can configure queues, so the results might be different with a different configuration or scenario. Note that I’m using an interface HTB (as opposed to global), which allows me to leave fasttrack enabled. I found that the queues themselves are not using that much CPU (for my bandwidth); it’s fasttrack being enabled vs. disabled that makes a huge difference for CPU.

Attached are some graphs from the Flent tool, which also allows comparing them side by side. It would be great to see more results from other users. I found examples in other threads very helpful when trying this out.

Cake configuration

/queue type
add cake-ack-filter=filter cake-flowmode=dual-srchost cake-mpu=64 cake-nat=yes cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=cake-up
add cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=cake-down
/queue tree
add bucket-size=0.01 max-limit=118M name=download packet-mark=no-mark parent=bridge1 queue=cake-down
add bucket-size=0.01 max-limit=11M name=upload packet-mark=no-mark parent=ether1 queue=cake-up

Cake 2 on the graphs is the same, only flowmode is set to triple-isolate for both directions.

Fq-codel configuration

/queue type
add fq-codel-limit=1000 fq-codel-quantum=300 fq-codel-target=12ms kind=fq-codel name=fq-codel
/queue tree
add bucket-size=0.01 max-limit=118M name=download packet-mark=no-mark parent=bridge1 queue=fq-codel
add bucket-size=0.01 max-limit=11M name=upload packet-mark=no-mark parent=ether1 queue=fq-codel

Attachments: ping_box.png, ping_cdf.png, cake_2.png, fq_codel_1.png, icmp_cdf.png, box_upload.png, cake_1.png, box_download.png, box_totals.png

Very good post. However, the reason why you see worse UDP BK ping latency and a worse average latency on the rrul test CDF for cake is that the foreground TCP traffic is getting more priority than the background traffic, which is actually what you want. The rrul_be test, or using cake besteffort, would probably show equivalence. Cake’s integral shaper is better than HTB + fq_codel, but in this environment it cannot easily be used. The principal features of cake are per-host fairness, ack-filtering, the integral shaper, and diffserv support. If you don’t need those (and in many circumstances you don’t), use fq_codel instead.

Testing for per-host fairness is done with the flent rtt_fair_var tests: -H A -H B -H A -H A would show machine “B” getting roughly the same total bandwidth as machine “A”.

Ironically, the ack-filter mode is giving you, oh, about 15%? more upload bandwidth at this level of asymmetry, but this comes at a cost of measured latency on this test (as there are fewer small packets to interleave). Ack-filtering is increasingly important above 10x1, and absolutely necessary at 20x1 or higher down/up ratios.

fq_codel & cake share more or less the same basic FQ and AQM algorithms. Cobalt (cake’s derivative of codel) is mildly more accurate, and has some protections against unresponsive traffic that fq_codel doesn’t.

My first goal has generally been to replace all the FIFOs and REDs and SFQs of the world with fq_codel-derived algorithms, and I am perversely glad y’all think the battle is between fq_codel and cake. :slight_smile: I’d really like more people to try fq_codel or cake running natively on an interface without a bandwidth parameter or shaper… and on simpler tests than the rrul or rtt_fair tests.

The cake_2 plot above seems to show that something went wrong twice during the test: either there was other traffic, or something glitched somewhere. Doing a comparison plot of that test run will show lower bandwidth and higher latency, which is why we show the detailed results first before going on to produce the summary CDF or bar chart.

cake IS more CPU-intensive than fq_codel, so it is also entirely possible that it glitched! If it’s reliably glitching and fq_codel isn’t, well, that’s a good data point. Trying to debug or improve matters from where I sit has been difficult, and I am loving more folks doing more and more benchmarks with flent.

fq-codel-target=12ms should not be needed at speeds above 4 Mbit; 5 ms is the default. Does it make any difference to throughput for you, though?

Here are more tests. I will split them into several posts for better organization. The plots were generated on a 4K monitor with high DPI, so I’m not sure how they will look on lower-DPI screens. The font might be a bit small, but the resolution is better overall, so the plots can be zoomed in.

The 12 ms target for fq-codel came from a recommendation I read somewhere: my unloaded pings to the ISP default gateway are 9-11 ms, so targeting lower latency didn’t seem to make sense. At least that is the explanation I read; it could well be wrong. It doesn’t look like there is much of a difference.

Fq-codel settings for all tests below. The 5 ms target is as indicated on the plots.

/queue type
add fq-codel-limit=1000 fq-codel-quantum=300 fq-codel-target=12ms kind=fq-codel name=fq-codel
/queue tree
add bucket-size=0.01 max-limit=118M name=download packet-mark=no-mark parent=bridge1 queue=fq-codel
add bucket-size=0.01 max-limit=11M name=upload packet-mark=no-mark parent=ether1 queue=fq-codel
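For reference, only the 12 ms type is shown above; the 5 ms variant used in the comparison would presumably differ only in the target, something like this (hedged, with a hypothetical name):

/queue type
add fq-codel-limit=1000 fq-codel-quantum=300 fq-codel-target=5ms kind=fq-codel name=fq-codel-5ms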

Attachments: rrul_-_fq-codel-12ms_vs_5ms_1.jpg, _2.png, _3.png, _4.png, _5.png; rrul-2022-06-25T134216.270710.fq-codel-12ms.flent.gz (114 KB); rrul-2022-06-25T134535.470270.fq-codel-5ms.flent.gz (115 KB)

Comparison between fq-codel and cake. I didn’t use the default docsis overhead (as in my previous tests) since I read that it could be 22 instead of 18. Setting it higher than needed won’t hurt too much, but setting it lower can have an impact (at least according to what I read online), so I set it manually to 22.

Cake settings for all tests. Besteffort for downloads in both tests, diffserv3 or besteffort for uploads as indicated on the plots.

/queue type
add cake-ack-filter=filter cake-flowmode=dual-srchost cake-mpu=64 cake-nat=yes cake-overhead=22 kind=cake name=cake-up
add cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-mpu=64 cake-overhead=22 kind=cake name=cake-down
/queue tree
add bucket-size=0.01 max-limit=118M name=download packet-mark=no-mark parent=bridge1 queue=cake-down
add bucket-size=0.01 max-limit=11M name=upload packet-mark=no-mark parent=ether1 queue=cake-up

Attachments: rrul_-_fq-codel_vs_cake_1.jpg, _2.png, _3.png, _4.png, _5.png; rrul-2022-06-25T135148.045882.cake_up_diffserv3.flent.gz (118 KB); rrul-2022-06-25T135344.229643.cake_up_besteffort.flent.gz (118 KB)

Comparison using the rtt_fair_var test. I used the west and EU servers as you suggested: three -H switches for one server, one -H switch for the other.

I must have done something wrong, because the EU server gets the same bandwidth as each of the West ones; I would expect the three West flows in total to get the same amount as the one EU flow.
Attachments: rtt_fair_var_-_fq-codel_vs_cake_1.jpg, _2.png, _3.png, _4.png, _5.png; rtt_fair_var-2022-06-25T135839.408470.cake_be_fair_var.flent.gz (84.3 KB); rtt_fair_var-2022-06-25T140500.637023.fq-codel_fair_var.flent.gz (82.4 KB)

Queue trees are disabled, no bandwidth limits; direct interface queues are used. For upload: ether1. For download I tried to use the bridge (facing my LAN), but RouterOS didn’t allow it, giving the error “failure: non rate limit queues are useless on this interface”. So for this test I put the queue on ether2, where my test PC was connected (through an unmanaged switch). This probably wouldn’t work well in the real world, since all download flows would then be treated independently on each ether interface, so it’s more for academic purposes.

In this test cake was using significantly more CPU than fq-codel, 2.5-3x more, although overall utilization was still pretty low, around 10%.

I also added data from the fq-codel test for comparison. The no-limit configuration behaves as if no queuing is applied at all; latency goes through the roof.
Attachments: rrul_no_limits_1.jpg, _2.png, _3.png, _4.png, _5.png; rrul-2022-06-25T141256.677726.cake_no_limits.flent.gz (120 KB); rrul-2022-06-25T141018.542465.fq-codel_no_limits.flent.gz (120 KB)

--socket-stats will capture the TCP RTT directly for uploads and provide plot options to look at (for example) the difference between a 5 ms and a 12 ms target. While you might think you want a little more bandwidth than what you got, smaller queues lead to better behavior in the event of a hash collision.

The codel target is for queuing delay, not path delay. The codel interval is intended to be set to roughly 98% of your maximum observed path delay; the default of 100 ms has been shown to scale well to world-girdling RTTs (240 ms), but tends to fall off above ~280 ms.
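If I’m reading the RouterOS parameters right, both knobs are exposed on the fq-codel queue type; a hedged sketch of the defaults (which would normally be left alone unless path RTTs are very long):

/queue type
# 100ms interval / 5ms target are the codel defaults; the interval would only be
# raised for unusually long (roughly 250ms+) paths
add kind=fq-codel fq-codel-interval=100ms fq-codel-target=5ms name=fq-codel-defaults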

As I said, cake’s ack-filter on the upload will show more latency and more throughput there, given your 10x1 ratio.

The circumstance where I was hoping for a line-rate test was actually to/from a device running at line rate, not to your awful (yet typical) ISP link.

Thx for the comparison plots! Makes life a lot easier, huh? Go to a local coffee shop or hotel and see how bad that wifi gets…

As for not seeing per-host fairness: are you going through NAT on this router? If so, use the nat option.

Also, try triple-isolate.
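A hedged sketch of those two suggestions applied to the queue types posted earlier in the thread (same names; applying triple-isolate to both directions is my reading, not something stated above):

/queue type
set cake-up cake-nat=yes cake-flowmode=triple-isolate
set cake-down cake-nat=yes cake-flowmode=triple-isolate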

I loved seeing solid and better throughput on the down whilst doing the up, vs the overbuffered tests.

I think this is the correct way to use CAKE: just add CAKE on ether1 for outbound traffic and on ether2 for inbound traffic.
Attachments: qt.png, cake.png
I can’t set CAKE on the virtual bridge for downloading, but maybe adding a switch on physical ether2 would work for that.
It may work better than using HTB + fq_codel.
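A hedged sketch of that interface-queue placement, assuming cake queue types named like the ones earlier in the thread are already defined (RouterOS refuses non-rate-limiting queues on the bridge itself, as noted above):

/queue interface
set ether1 queue=cake-up
set ether2 queue=cake-down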

My bandwidth is 500M/250M (FTTH EPON PPPoE, MTU=1492); upload = cake, download = queue tree + fq_codel:
Attachment: st.png
I found that CAKE’s reaction time for getting latency back down at the start of a transfer was always faster than HTB + fq_codel.
Tested on an AC2 with fasttrack enabled.

Thanks for sharing; indeed, one could use different queue types for upload vs. download.

I concur. As I understand it, CAKE should be used on its own; it’s designed that way and has an integrated packet scheduler. I’ve got 2000 Mb down / 600 Mb up and I’m only using CAKE as an interface queue, and I’m seeing much better results (less bufferbloat) than without it. Also, less CPU usage: on a CCR2004-16G, an upload test (600 Mbit/s) consumes about 15% of CPU time, whereas using simple queues brings the CPU usage to about 19%.

Thanks for sharing. Ping was reduced; works fine.