dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 1:46 am

I was very, very happy with all your help testing and exploring fq_codel and cake on this enormous thread: viewtopic.php?p=937633
(which I don't want to add to!) and equally happy that they seem to have stabilized in current MikroTik releases. But I had questions that went unanswered then, which perhaps y'all, with some deployment under your belts, can answer now.

my update on bufferbloat

Since that thread, Ookla released a version of their Speedtest apps that tests for responsiveness under load: https://www.ookla.com/articles/introduc ... ed-latency
The hysterical thing is that I've seen test after test published since (notably on Starlink) where the reviewer completely missed the latency figures it now reports during a simultaneous up- or download. Do any of y'all have before/after cake data from Speedtest you can share?

Also since that thread, I went heads-down on fixing some bugs in the fq_codel-for-WiFi implementation in OpenWrt 22.03, and the fixes are now in 22.03-rc6. I hope that MikroTik has followed along with that work in their WiFi stack(?)

I recently helped a fairly large ISP analyze a MikroTik + LibreQoS implementation. They started with LibreQoS, which made such an enormous difference that they immediately started deploying cake directly on the MikroTik CPE they had. The sudden silence of speed-related calls to tech support was palpable. I wasn't able to get good stats from the middlebox before they switched over to MikroTik-native cake, but what truly amazed me was:

* The sheer number of packet drops (nearly 1% at the lowest "speed" tier) led to very perceptible improvements in speed.
* A goodly number of their subscribers were actually using ECN for congestion control; effectively or not, I can't tell.
* A significant amount of data was actually marked with a non-default DSCP marking, landing in the cake voice queue. I generally see zero drops and latency in this queue, so it seems like a win. I surmise this traffic is LTE voice over WiFi.

Another thing I found (using flent) was that a given "2 gbit" fiber peering arrangement actually peaked at 1.2 gbit (line card limit??), and had over 250ms of buffering in it if you hit it with 16 flows, which was invisible with only a single flow. Fixed with cake. I would encourage y'all to stress test and cake your peering links now that you can slap cake on your bigger links. (And I really wanted data on how high it could scale on the bigger MikroTik routers.)

Open Questions

* Anyone have results on replacing their default fifos and/or sfq with fq_codel?

* Has correct documentation for how to configure cake landed anywhere on the mikrotik site?

* Is there any way to get mark and drop stats out of MikroTik after applying these algos? SNMP?
... Really, this is the biggest thing that makes me nuts: not having that kind of data. I'm a data junkie.

* does cake work over mpls?

Open Feature requests

* I'd dearly like MikroTik to provide a way to tune down the TX ring on older MikroTik WiFi gear. It's easily 3x too big at MCS-12.

* It would be great if cake's bandwidth shaper could be used in both directions on the interface, rather than hfsc. For many users of this gear, that simplicity would make a big difference, and be lighter weight.

...

I'm going out for some grant money soon (or other funding) and if there's any features y'all want in libreqos/cake/fq_codel now is the time to ask...

@Larsa @BitHaulers @WeWiNet @gtj0 @DanielJB @mducharme @Amm0 @kevinb361 @jmszuch1 @blurrybird @ivicask
@eider @Tporlapt @WeWiNet @IPANetEngineer @jult @mke @skoenman @denisun @rooneybuk @kikikaka @ilium007 @Rfulton @felixka
@Trunkz @arm920t @Lodion @chechito @brotherdust @jbl42 @dalami @kaixinwang @syadnom @Techtress
 
chechito
Forum Guru
Posts: 2989
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia
Contact:

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 4:02 am

I think it would be useful to integrate some kind of strategy or mechanism to differentiate maybe 4 kinds/priorities of traffic:

1. High-priority traffic like VoIP and real-time gaming match traffic.
2. Videoconferencing and remote management protocols.
3. Light, bursty, but short-lived connections like interactive traffic and speed tests, plus the initial stage of downloads and streaming.
4. Heavy traffic like downloads, streaming, and p2p.
 
Zoxc
just joined
Posts: 17
Joined: Fri Aug 13, 2021 4:01 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 4:33 am

One thing I wonder is why there isn't a total queue byte limit, with per-packet overhead, instead of the very vague packet limit for codel, fq_codel and cake. For fixed-bandwidth links I'd like to configure them to constrain the worst-case buffering to, say, 30 ms. codel is quite slow to act, which can result in short-term bufferbloat; for example, a 40 Mbps queue with 10 concurrent LAN connections (tested with crusader):
10s ipv4.png

I also saw some odd latency spikes with fq_codel:
100s ipv6.png
These tests were done wired to a cAP ac running RouterOS 7.4.
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 6:06 pm

I think it would be useful to integrate some kind of strategy or mechanism to differentiate maybe 4 kinds/priorities of traffic:

1. High-priority traffic like VoIP and real-time gaming match traffic.
2. Videoconferencing and remote management protocols.
3. Light, bursty, but short-lived connections like interactive traffic and speed tests, plus the initial stage of downloads and streaming.
4. Heavy traffic like downloads, streaming, and p2p.
Both fq_codel and cake pretty much do all that already. The power of fair queuing is that items 1 and 3 automatically get priority. Things like the early stages of TCP connections also get it, as their arrival rate is usually much slower than the departure rate of all the other traffic. (paper: http://www.diva-portal.org/smash/get/di ... TEXT01.pdf )

Single big downloads also automatically get mixed in with all the other traffic and don't interfere all that much.

For those that want to mark or classify traffic, cake's additional support for diffserv marking is there. The default is diffserv3. We put the classic categories for voice, video, best effort and background into the diffserv4 model for cake, and by and large it's difficult to see a difference in day-to-day performance: A) because so little traffic is marked appropriately, B) because FQ generally works so well, and C) because the AQM makes TCP less disruptive.

I'm always looking for benchmarks that would actually show a difference, but, for example, we get roughly the same VoIP MOS scores with or without diffserv markings until you hit the heaviest (artificial) loads. Of all the diffserv markings, the one I have most use for is background, but given how problematic using CS1 for that was, we supported RFC 8622's LE codepoint instead, for which support went into cake in Linux 5.8 (I have requested that MikroTik add that 2-character patch to cake!). Certainly (IMHO) marking whatever traffic you do not care about as background has seemed like a good idea.
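If you want to play with the diffserv modes on RouterOS 7, the cake queue type exposes them as a cake-diffserv option. A minimal sketch (parameter names as I understand the RouterOS docs; the queue names are made up, so verify against `/queue type` on your version):

```routeros
/queue type
# honor DSCP markings, sorting traffic into cake's 4 tins:
add kind=cake cake-diffserv=diffserv4 name=cake-dscp-aware
# or ignore DSCP entirely and rely on FQ + AQM alone:
add kind=cake cake-diffserv=besteffort name=cake-besteffort
```

Given how little traffic is marked appropriately, besteffort is a perfectly reasonable starting point; diffserv3 is cake's upstream default.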

Where fq_codel breaks down is against many, many flows (such as BitTorrent or Steam downloads). This is where the per-host FQ in cake helps. Where per-host FQ then breaks down is the case of a gamer who is also streaming to Twitch, which mixes really sensitive gaming traffic with bulk uploads. Ironically, most of the "solutions" I've seen have involved finding ways to give the Twitch flow more bandwidth, rather than (by appropriately marking) giving the gaming packets more responsiveness. We've talked to Twitch folk a couple of times about making their stuff more responsive to packet loss and delay.

The big gaping hole in most of our knowledge about how well cake performs is against various videoconferencing protocols. None are marked by default. All have been designed with really enormous jitter buffers and have pretty absurd peak delays today, because they were aimed at FIFOs. Yet all (with one glaring exception) seem to work pretty darn well without classification (more data needed). Facetime, however, is *really oversensitive* to packet loss and, worse, reacts to it by quadrupling its rate and doing extensive FEC. Instead of addressing that, Apple is pushing the IETF to make a change to how RFC 3168 ECN works (rather than trying to use it as it exists today). How this will play out... no idea.

Most of our research into better videoconferencing is centered on the galene.org videoconferencing server, because it's easy to hack on.
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 6:30 pm

One thing I wonder is why there isn't a total queue byte limit, with per-packet overhead, instead of the very vague packet limit for codel, fq_codel and cake. For fixed-bandwidth links I'd like to configure them to constrain the worst-case buffering to, say, 30 ms. codel is quite slow to act, which can result in short-term bufferbloat; for example, a 40 Mbps queue with 10 concurrent LAN connections (tested with crusader):
10s ipv4.png


I also saw some odd latency spikes with fq_codel:
100s ipv6.png
These tests were done wired to a cAP ac running RouterOS 7.4.
Crusader is coming along pretty well! I hope more people use it! I'd really like the author(s) to add a staggered-start test, more like these: http://caia.swin.edu.au/reports/140630A ... 40630A.pdf, as starting up 10 big flows at exactly the same time is not representative of real traffic. What matters when a link is saturated is that a new flow can smoothly enter, get its fair share, and exit. Most real-world flows (such as web traffic and DNS) are very, very short.

Secondly, crusader is measuring the latency within a flow (which is GREAT; most of our other tools don't do that), not the latencies between sparser flows. Making TCP behave better is the province of other algorithms such as BBR (and BBR behaves really badly when all flows start at the same time) and much other congestion control research such as CDG, LEDBAT, etc. My perfect world would revert to a kinder, gentler version of TCP Reno, as today's default of CUBIC and its decay pattern (0.7 rather than half) have IMHO been thoroughly shown to really hurt other traffic on the internet.

Now to answer two of your questions -

"Why there isn't a total queue byte limit with per-packet overhead": in a sense, there is - the memlimit parameter. However, that parameter is in there to *protect the router*, and it actually accounts for the total number of bytes used by the allocation. Your typical device uses a 2k "slab" per packet, no matter the actual packet size.

People are also perpetually using a smaller packet limit than they should on fq_codel - I see 200 a lot - and a limit of 200 packets could mean 200 64-byte packets, or 200 64k super-packets (in the case of GRO).

Bytes are a rough proxy for time: at 10 Mbit, a 1500-byte packet takes ~1.2 ms. Measuring real packet size in bytes and queuing on that (as in a bfifo) is VASTLY to be preferred over a pfifo! But the other issue is that we wanted the AQM to be the dominant function here, and accepting short bursts is part of the nature of how the internet works. We also needed the algorithm to be no-knobs and on by default over the widest possible range of actual bandwidths (the 32MB memlimit starts becoming a problem at about 20 Gbit). Possibly over-opinionated on our part! If I could do nothing else but get the internet to switch from pfifos to bfifos I would, but FQ, AQM, etc., seemed to be a bigger win.

A way to reduce the size of the bursts is to use a shorter interval (the recommendation is that the interval be set to the 98th-percentile RTT you see in your network).

Using a shorter target (5-10% of the interval) is OK too, but all kinds of noise enter the system at target settings much below a millisecond. We do see 500us/5ms in use in some data centers, though.
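If you want to experiment with those knobs, RouterOS 7's fq-codel queue type exposes them directly. A hedged sketch for a short-RTT network (the values are purely illustrative, not recommendations; verify the parameter names against your ROS version):

```routeros
/queue type
# interval ~= 98th-percentile RTT; target ~= 5-10% of interval
add kind=fq-codel name=fq-codel-shortrtt \
    fq-codel-interval=20ms fq-codel-target=2ms \
    fq-codel-limit=10240 fq-codel-memlimit=32.0MiB
```

Note the limit is still in packets and the memlimit is the byte-accounting backstop described above.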
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 6:43 pm

Looking at your crusader data, a couple of notes made by inference (can you post your config?):

A) On inbound, shaping is less effective than outbound. It's the nature of the beast. You are going to overshoot by at least 200ms with codel in place on a test this extreme, and taking 4 seconds to get these flows under control at this short RTT is not out of line. Cake has a more abrupt response to that much slow start and might, IF you could use its shaper rather than TBF on inbound traffic, peak at only around 80-120ms rather than 200ms on this test.

This is one of those places where "developments" in the TCP world really overshot what we could do at the AQM level: the world switched to IW10, which made sense (at the time) for some web traffic -- https://datatracker.ietf.org/doc/id/dra ... ul-00.html -- and since then pacing has been added, which helps, but then folk started tacking even more packets onto the IW (I've seen IW70 from some CDNs). A bigger impact on codel's basic assumptions is that it is tuned to send traffic "around the world"; at the default 100ms interval it only starts degrading "bandwidth equality" at about 260ms RTT.

Your token bucket burst could be overlarge. I don't like token buckets... But a key thing to note about them is that when going from 0 to X they are bursty. If you are already at X, new flows should (and there's a flent test for this) enter without incurring anywhere near as much delay for themselves.

Still, at short RTTs codel takes a *really long time* to find the right drop or marking rate. It would help a lot to have a test showing this to be less of a problem in the real world - the staggered-start tests I linked to earlier. There is work (L4S) ongoing in the IETF to change ECN to have a more rapid signal than it does today. It's got tons of problems that I don't want to talk about today.

Cake has an RTT parameter.

You can indeed fiddle with the interval to something closer to your typical max RTT, but be aware that doing so hurts longer-distance connections.
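On RouterOS that parameter surfaces as cake-rtt, with cake-rtt-scheme presets as an alternative. A hedged sketch (queue names made up, values illustrative; check against your ROS version):

```routeros
/queue type
# explicit interval for a network whose worst-case RTT is ~60ms:
add kind=cake cake-rtt=60ms name=cake-rtt-60
# or pick one of cake's presets (regional is 30ms, internet is the 100ms default):
add kind=cake cake-rtt-scheme=regional name=cake-regional
```

As noted above, shortening this below your real path RTTs will penalize longer-distance connections.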
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 6:46 pm

Your last result is really puzzling, though. 30 seconds of oscillation like that... was this on ethernet or WiFi? Was this to the router, rather than through it? What if you tune the inbound shaper down a bit more (at least another 10% to start with)? I wish crusader would distinctly show the up vs. down latency on this last panel...

Nobody (until recently) was testing up + download behaviors enough.
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 6:53 pm

@zoxc I was just about to file a feature request for staggered start over here: https://github.com/Zoxc/crusader and then I realized you were the author!!!

GREAT WORK. Nice to meet ya! Thx for writing this tool.
 
chechito
Forum Guru
Posts: 2989
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia
Contact:

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 7:04 pm

Is there some simple guide on how to install and use crusader, for Linux-illiterate people like me?
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Aug 14, 2022 9:20 pm

Are the differences between your IPv6 and your IPv4 runs repeatable?
 
Zoxc
just joined
Posts: 17
Joined: Fri Aug 13, 2021 4:01 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Mon Aug 15, 2022 1:21 am

Secondly, crusader is measuring the latency within a flow (which is GREAT; most of our other tools don't do that), not the latencies between sparser flows.
You lost me a bit there. crusader uses a dedicated UDP connection to measure latency.

Looking at your crusader data, a couple of notes made by inference (can you post your config?):
The cAP ac is using the default config (acting as a typical firewall), and I added a simple queue with 40 Mbps up and down max limits with codel / fq_codel queues (both at their defaults). crusader is running on 2 PCs (with Fedora 36) wired into the LAN and WAN ports.

A) On inbound, shaping is less effective than outbound. It's the nature of the beast.
There isn't an ISP with bad queueing involved in my test setup, so I don't see why that would happen. I suspect one of the PCs may produce a more synchronized TCP startup than the other.

I don't get why token buckets would generate additional latency (for my test setup). There does seem to be extra latency when using HTB to shape vs. using cake to shape, however.

CAKE shaping (10 connections):
10s ipv4.png

HTB shaping with CAKE queue (10 connections):
10s ipv4.png

Are the differences between your IPv6 and your IPv4 runs repeatable?
The last fq_codel test was done with 100 connections on IPv6, so it's not really comparable to the first. It could perhaps be explained by hash collisions between the loading connections and the UDP latency-measurement connection, as the loading connections turn off and on. It happens on every IPv6 test so far and on some IPv4 tests. I've not seen it with cake. Here's a more typical IPv4 result:
100s ipv4.png

Is there some simple guide on how to install and use crusader, for Linux-illiterate people like me?
No more than the readme, I'm afraid. You can find binaries at https://github.com/Zoxc/crusader/tags however.
 
Zoxc
just joined
Posts: 17
Joined: Fri Aug 13, 2021 4:01 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Tue Aug 16, 2022 8:54 am

I managed to reproduce some fq_codel spikes with flent (100 connections on IPv6):
flent 100s ipv6 fq-codel.png

I also implemented measurement of up and down latencies in crusader and redid some fq_codel tests (100 connections on IPv6). 5 tests had spikes on the down latency and 1 on the up latency.

Down latency spike sample:
100s ipv6 fq-codel -5 gui.png

Up latency spike (behaves more like I'd expect from a hash collision):
100s ipv6 fq-codel -3 gui.png

I also redid the codel test with 10 IPv4 connections. You can see the latency oscillate between the up and down directions when both directions are loaded:
10s ipv4 codel gui.png
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Tue Aug 16, 2022 6:49 pm

I am on vacation and in general far from internet. Expect sparse replies if any for the next week or so.

1) I had thought crusader was sampling TCP_INFO, not using another measurement flow. My bad. So the "packet loss" you are reporting is actually measurement packet loss, not the loss within the TCP flows? When you get an FQ'd network into a state where sparse measurement packets are being lost, you know you are in trouble.

2) There is a deeply philosophical question about how queues and packets should behave on a zero-length path. Going back to the 10 Mbit example, ~1.2ms per 1500 bytes gives you enough room for 4 packets with ~5ms of queueing. Hit that with 100 flows instead. Where are they supposed to go? There's no "path" to store them in, in this case. The options:

a) grow the queue large enough to accommodate 1 packet to all the flows that wish to transit.

b) allow temporarily new flows to transit while hammering harder at the old ones

c) tail drop everything over some arbitrary limit all the time.

d) the perfect answer?
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Mon Aug 22, 2022 9:18 pm

I managed to reproduce some fq_codel spikes with flent (100 connections on IPv6):
flent 100s ipv6 fq-codel.png


I also implemented measurement of up and down latencies in crusader and redid some fq_codel tests (100 connections on IPv6). 5 tests had spikes on the down latency and 1 on the up latency.

Down latency spike sample:
100s ipv6 fq-codel -5 gui.png


Up latency spike (behaves more like I'd expect from a hash collision):
100s ipv6 fq-codel -3 gui.png


I also redid the codel test with 10 IPv4 connections. You can see the latency oscillate between the up and down directions when both directions are loaded:
10s ipv4 codel gui.png
To comment on your first graph: this is a case where at least some flows couldn't get a SYN/ACK pair through to start until much later in the test.

I have no idea what went south at t+14. It's "interesting". One of my missions in life is to convince folk that the anomalies are very real, and they shouldn't filter them out but investigate further. Taking that approach would lead to the discovery of the cosmic background bufferbloat radiation: https://theconversation.com/the-cmb-how ... erse-45126

In the other graphs - like trying to pound 100 flows through a 4-flow pipe - well, that's why the generic recommendation is to use as few flows as possible to get the job done and let congestion control fill the pipe. I always thought the IW10 standard was far, far too much for most networks, even when it was first proposed. https://datatracker.ietf.org/doc/id/dra ... ul-00.html

PS: thx for adding the staggered-start facility and working on an Android version of crusader!!!!!!
 
Zoxc
just joined
Posts: 17
Joined: Fri Aug 13, 2021 4:01 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Thu Aug 25, 2022 2:54 pm

The packet loss measured by crusader is indeed measurement packet loss. I'd probably instrument a QUIC implementation if I wanted some portable in-flow metrics. I'm aiming for crusader to be more iperf-like, with latency metrics, rather than something more flexible like flent, though.

I have no idea what went south at t+14. It's "interesting".
The latency spike on the flent graph seems similar to the spikes on the crusader graphs to me; what makes it different?
 
sirbryan
Member Candidate
Posts: 298
Joined: Fri May 29, 2020 6:40 pm
Location: Utah
Contact:

Re: Now that fq_codel and cake are stable... how are we doing?

Fri Aug 26, 2022 5:45 pm

I recently helped a fairly large ISP analyze a MikroTik + LibreQoS implementation. They started with LibreQoS, which made such an enormous difference that they immediately started deploying cake directly on the MikroTik CPE they had. The sudden silence of speed-related calls to tech support was palpable. I wasn't able to get good stats from the middlebox before they switched over to MikroTik-native cake, but what truly amazed me was:

* The sheer number of packet drops (nearly 1% at the lowest "speed" tier) led to very perceptible improvements in speed.
* A goodly number of their subscribers were actually using ECN for congestion control; effectively or not, I can't tell.
* A significant amount of data was actually marked with a non-default DSCP marking, landing in the cake voice queue. I generally see zero drops and latency in this queue, so it seems like a win. I surmise this traffic is LTE voice over WiFi.
Were the changes you noticed after LibreQoS but before cake on the CPE, or after both? Also, that bit about using cake on peering ports: was that toward the provider, toward the customers, or both?

I'm looking to deploy LibreQoS soon in my core, and could deploy RouterOS 7 on the CPE routers. The only issue is how to apply cake properly on the home router's interfaces without a central queue to combine the WiFi and four ethernet ports.
 
Keljian
just joined
Posts: 6
Joined: Tue Sep 29, 2015 5:24 am

Re: Now that fq_codel and cake are stable... how are we doing?

Thu Sep 29, 2022 2:49 pm

First of all, thank you for all the work you have done.

Second, you asked for requests: please work on optimizing cake so it is less processor-heavy. hEX devices and other marginal devices would greatly benefit from this. A few quick wins will pay off manyfold.

This would also allow some devices to be uprated to the gigabit speeds that are becoming common in some countries.
 
CTSsean
Frequent Visitor
Posts: 61
Joined: Fri Sep 15, 2017 12:56 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Sun Oct 09, 2022 3:55 am

Here's been my experience....

If I use a simple queue with the max limit set in the simple queue itself, cake / fq_codel seems to work well enough.
If I instead set the limits inside the cake queue type itself, then during testing, after the download queue has been maxed out, when I start testing the upload queue, IP communication on the router (RB5009) completely stops - on all interfaces and all VLANs (which of course kicks me out of WinBox).

I have to get back into the router via MAC WinBox. Then I have to disable the queue to restore IP connectivity.
Here is my config.

/queue type
add cake-bandwidth=550.0Mbps cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=CakeDefaultDownload
add kind=fq-codel name=FQ_Codel
add cake-bandwidth=22.0Mbps cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=CakeDefaultIUpload
add cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis kind=cake name=CakeDefault
/queue simple
add comment="WORKS" disabled=yes max-limit=22M/550M name=MainInternetQueue queue=CakeDefault/CakeDefault target=10.69.0.0/16
# CAKE type with bandwidth setting detected, configure traffic limits within queue itself
add comment="DOES NOT WORK" disabled=yes name=CakeBandwith queue=CakeDefault/CakeDefaultDownload target=10.69.0.0/16
 
dtaht
Member Candidate
Topic Author
Posts: 209
Joined: Sat Aug 03, 2013 5:46 am

Re: Now that fq_codel and cake are stable... how are we doing?

Tue Oct 11, 2022 6:28 am

You can't use simple queues and cake's bandwidth parameter at the same time.
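In other words, the working pattern is the one in CTSsean's "WORKS" config above: keep the rate limit on the simple queue's max-limit and leave cake-bandwidth unset on the queue type. A sketch using his names and values:

```routeros
/queue type
# no cake-bandwidth here; the simple queue's max-limit does the shaping
add kind=cake cake-mpu=64 cake-overhead=18 cake-overhead-scheme=docsis name=CakeDefault
/queue simple
add max-limit=22M/550M name=MainInternetQueue queue=CakeDefault/CakeDefault target=10.69.0.0/16
```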
 
Zoxc
just joined
Posts: 17
Joined: Fri Aug 13, 2021 4:01 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Sat Oct 15, 2022 6:40 am

I was able to reproduce an issue similar to what CTSsean saw on their router.

I reset my cAP ac (on RouterOS 7.5) to factory defaults. I created a new CAKE queue type with a 50M bandwidth limit, then a new simple queue with target 192.168.88.0/24 and that CAKE type as the download queue type. To reliably trigger the issue I have to log in to WinBox over IP.

The download queue's queued packet count keeps increasing, so it seems like CAKE's shaper has stopped pulling packets from the queue for some reason.
 
Jutolas
just joined
Posts: 3
Joined: Mon Oct 03, 2022 9:23 am

Re: Now that fq_codel and cake are stable... how are we doing?

Sat Oct 15, 2022 10:39 am

@dtaht Hi!
I have a home router running the CHR v7.5
PPPoE Client on ether1, and other interface is bridged.
With no queueing, the maximum bandwidth under **NAT** is 600+Mbps download and 70+Mbps upload.
I took a closer look at your replies in other threads and set up the CAKE queue tree as below:
/queue type
add cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-memlimit=32.0MiB cake-rtt=60ms cake-overhead-scheme=ethernet cake-nat=no kind=cake name=cake_rx
add cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-memlimit=32.0MiB cake-rtt=60ms cake-overhead-scheme=ethernet cake-nat=yes kind=cake cake-ack-filter=filter name=cake_tx

/queue tree
add comment="qosconf: download queue with cake" bucket-size=0.05 max-limit=500M name=cake_download packet-mark=no-mark parent=bridge1 queue=cake_rx
add comment="qosconf: upload queue with cake" bucket-size=0.03 max-limit=50M name=cake_upload packet-mark=no-mark parent=pppoe-out1 queue=cake_tx
Now I've got very good latency on upload, but download latency is unstable.

I'm confused about my setup right now:
1. With PPPoE, should I set the parent interface to ether1 or pppoe-out1?
2. With NAT, for the download queue, is besteffort & dual-dsthost better than diffserv4 & triple-isolate?
3. Should I enable the NAT option on the download queue when the parent interface is set to bridge1?
4. With fiber-to-the-home, cake-overhead-scheme is set to 'ethernet'; I've tested overhead 42/44 and mpu 84 but got even worse latency.

Seeking guidance!
 
SpiritedPear
just joined
Posts: 1
Joined: Fri Jun 02, 2023 8:08 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Fri Jun 02, 2023 8:19 pm

@dtaht Hi!
I have a home router running the CHR v7.5
PPPoE Client on ether1, and other interface is bridged.
With no queueing, the maximum bandwidth under **NAT** is 600+Mbps download and 70+Mbps upload.
I took a closer look at your replies in other threads and set up the CAKE queue tree as below:
/queue type
add cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-memlimit=32.0MiB cake-rtt=60ms cake-overhead-scheme=ethernet cake-nat=no kind=cake name=cake_rx
add cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-memlimit=32.0MiB cake-rtt=60ms cake-overhead-scheme=ethernet cake-nat=yes kind=cake cake-ack-filter=filter name=cake_tx

/queue tree
add comment="qosconf: download queue with cake" bucket-size=0.05 max-limit=500M name=cake_download packet-mark=no-mark parent=bridge1 queue=cake_rx
add comment="qosconf: upload queue with cake" bucket-size=0.03 max-limit=50M name=cake_upload packet-mark=no-mark parent=pppoe-out1 queue=cake_tx
Now I've got very good latency on upload, but download latency is unstable.

I'm confused about my setup right now:
1. With PPPoE, should I set the parent interface to ether1 or pppoe-out1?
2. With NAT, for the download queue, is besteffort & dual-dsthost better than diffserv4 & triple-isolate?
3. Should I enable the NAT option on the download queue when the parent interface is set to bridge1?
4. With fiber-to-the-home, cake-overhead-scheme is set to 'ethernet'; I've tested overhead 42/44 and mpu 84 but got even worse latency.

Seeking guidance!
Hi,
A little late for you, perhaps. I'm new to MikroTik and not an expert on this stuff either, so take this with a pinch of salt.

I applied my cake queue to ether1 (which pppoe-out1 uses). I set my overhead scheme to PPPoE (since I'm using PPPoE) and the RTT scheme to Internet (100ms). I believe the RTT scheme is the parameter that controls how quickly cake detects an issue that needs to be mitigated. Bandwidth is set to 110M and I disabled 'auto-rate ingress'.

I use diffserv4, and I have NAT and triple-isolate enabled, since I want it to treat each flow individually and to 'look behind the NAT', so to speak.

With this configuration, I am able to get an A on the bufferbloat test: https://www.waveform.com/tools/bufferbloat. My connection is 1 Gbps down, 115 Mbps up, and the loaded tests show +6ms added latency on the download and +0ms on the upload.

I have also applied a second cake queue, with bandwidth set to 1000M, on ether5, which is the router port to which I have connected the rest of my home network, including WiFi APs. On this second cake queue I have configured an RTT scheme of Metro (10ms). I am not sure if this has much impact on things, but it doesn't hurt.
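For reference, this is roughly what the WAN side of that looks like as config. This is a reconstruction from my description above, not my exact export, and the queue names are made up, so double-check the parameter names on your ROS version:

```routeros
/queue type
# cake's own shaper at 110M, PPPoE framing overhead, 100ms "internet" RTT preset,
# diffserv4 tins, per-flow + per-host isolation that looks behind the NAT:
add kind=cake cake-bandwidth=110.0Mbps cake-diffserv=diffserv4 \
    cake-flowmode=triple-isolate cake-nat=yes \
    cake-overhead-scheme=pppoe cake-rtt-scheme=internet \
    cake-autorate-ingress=no name=cake-wan-110M
/queue tree
add name=wan-upload parent=ether1 queue=cake-wan-110M
```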
 
evergreen
just joined
Posts: 12
Joined: Tue Mar 07, 2023 9:41 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Tue Nov 07, 2023 8:17 am

Hi, going to give an experience report here on setting up CAKE.

Thanks
First, thank you so much to @dtaht for all your hard work on this. Thanks for daring to land a patch on the entire internet.

My Hardware:
RouterOS 7.11.2 on RB5009UG+S+

ISP setup:
I have a 1G/1G GPON fiber subscription
It's delivered via a GPON box with a 1000Base-T copper handoff.
The service uses a VLAN, which the RB5009 supports with an L2MTU of 1514, so the VLAN MTU is 1500.
The service also uses PPPoE, so the effective MTU is 1492 (after 8 bytes of PPPoE overhead).
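Before the config, here's my reading of where the overhead numbers I settled on come from (my own arithmetic, treat as a sanity check rather than gospel):

# Per-packet on-the-wire overhead on gigabit ethernet:
#   14 (MAC header) + 4 (FCS) + 8 (preamble) + 12 (interframe gap) = 38 bytes
# PPPoE adds 8 bytes of header on top:
#   38 + 8 = 46  ->  cake-overhead=46
# Smallest ethernet frame is 64 bytes, plus the 20 bytes of preamble/IFG:
#   64 + 20 = 84 ->  cake-mpu=84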

Config:
# by RouterOS 7.11.2
# model = RB5009UG+S+
/queue type add cake-mpu=84 cake-overhead=46 cake-rtt-scheme=internet kind=cake name=cake-eth-pppoe-custom
/queue tree add comment="Try tree cake" max-limit=825M name=cake_rx packet-mark=pmark-cake parent=pppoe-out1 queue=cake-eth-pppoe-custom
/queue tree add comment="Try tree cake" max-limit=825M name=cake_tx packet-mark=pmark-cake parent=br-vlans queue=cake-eth-pppoe-custom
/ip firewall mangle add action=mark-connection chain=forward comment="Mark for CAKE" connection-mark=no-mark in-interface-list=wan-ports new-connection-mark=cake_conn
/ip firewall mangle add action=mark-connection chain=forward comment="Mark for CAKE" connection-mark=no-mark new-connection-mark=cake_conn out-interface-list=wan-ports
/ip firewall mangle add action=mark-packet chain=forward comment="Mark for CAKE" connection-mark=cake_conn new-packet-mark=pmark-cake
I also tried these, and they seemed not to work as well, but were probably all within the margin of error for the web tests:
/queue type add kind=cake name=cake-default
/queue type add cake-mpu=72 cake-overhead=8 cake-rtt-scheme=internet kind=cake name=cake-eth-pppoe
/queue type add cake-mpu=84 cake-overhead=38 cake-overhead-scheme=ethernet cake-rtt-scheme=internet kind=cake name=cake-eth-pppoe-conservative
/queue type add cake-mpu=84 cake-overhead=46 cake-rtt=30ms cake-rtt-scheme=regional kind=cake name=cake-eth-pppoe-custom-regional

Performance of my setup

I was able to get a pretty consistent A+ grade on the waveform web test with ~795 Mbps in either direction. That's better than the ~650 Mbps I was getting with simple queues.

Experience Report:
This is my second day using ROS queues, and I find the documentation neither cohesive nor complete. There's no shortage of words, but I just don't see how it all fits together. E.g., what does a simple queue's total do? Why does selecting `default-small` as a total queue for CAKE appear to disable CAKE when running the waveform.com speed test? It seems like I first need to understand the old ways of queue use before jumping into CAKE. I chalk this up to my inexperience with queues.
  • After finding simple queues confusing, I found trees easier to edit in the CLI, which made A/B testing easier.
  • Another funny aspect I noticed is that setting a `cake-overhead-scheme` is a sort of macro that locks in some of the other parameters.
  • This may be a UI bug, but WebFig seems to add targets to simple queues if you only have one; the CLI doesn't do this. Two of the same target is allowed, and it's not clear what happens with that configuration.
  • Nowhere did I see the docs explain how to monitor a queue in the CLI; I just stumbled on printing stats. Not bad docs, just a hard learning curve.
  • The max-limit in a tree queue seems to be the main parameter determining whether my setup bloats or not.
Cake related:
  • I was able to get CAKE working and see the effects at https://www.waveform.com/tools/bufferbloat
  • I could never tell if any overhead or RTT settings mattered. It could be my setup. If I select a PPPOE interface with `target=pppoe-out1`, does the cake code get to know that my MTU is 1500-8 for that interface? Is that even how it works?
  • I still have no idea what overhead settings to use with a PPPOE interface as a target.
  • Most cake config feels like a black box; I never really made it past black-box debugging and trial-and-error.
  • I just can't get enough information out of the ROS CLI to determine what is going on and what the settings are sending to the CAKE code. I feel fairly technical, and I get the sense that there needs to be a `monitor` feature for the cake queue so we can get some knowledge of the internals. My sense is that to work with this effectively I just need to understand CAKE itself well enough to infer how MikroTik has implemented it and is exposing the core features. I'm not at that point yet, and TBH I think it should be easier to verify that my config meets my intent and that I'm not botching overheads, RTTs, etc.
  • I think my issues are a seeing/debugging/information problem. Cake clearly works great. It's amazing!
  • Maybe if I knew how to diagnose CAKE with wireshark I'd be in business, but I was too lazy to get a clean capture setup using multiple machines.
  • I think this would be an exciting feature worth highlighting if MikroTik put a little more effort into the UI/UX of interacting with this kind of queue.
Again I'm a queue newbie and this is just my experience. In the end I got it working and I'll probably leave it on.
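For anyone else stuck on the monitoring point above: the stats printing I stumbled on looks like the following (queue names are from my own config; as far as I can tell there's no deeper per-tin monitor):

/queue tree print stats where name=cake_rx
/queue type print detail where name=cake-eth-pppoe-custom

The tree stats show bytes, packets, and drop counters, which was enough for my A/B testing even without visibility into cake's internals.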
 
Taverius
just joined
Posts: 2
Joined: Wed Jan 24, 2024 8:24 pm

Re: Now that fq_codel and cake are stable... how are we doing?

Wed Jan 24, 2024 11:23 pm

I'll echo @evergreen on the documentation front.

Coming from an Arista/Untangle setup on my previous VDSL2 where all I had to do was check a box and put in 2 numbers, this has been a fraught couple of days with many, many tabs open. Also sleep deprivation.

Anyway, this works well here, +4 down +1 up according to the waveform test; if anyone wants to take a look and see if there's anything egregious below, you have my thanks.

I'm just a simple end-user that streams a lot of video and plays online games with family members that love netflix and such.
# 2024-01-24 22:04:35 by RouterOS 7.13.2
# software id = 
#
/queue type
add cake-ack-filter=filter cake-diffserv=diffserv4 cake-mpu=84 cake-nat=yes \
    cake-overhead=42 cake-overhead-scheme=ethernet,ether-vlan cake-rtt-scheme=\
    internet kind=cake name=cake-up-simple
add cake-diffserv=diffserv4 cake-mpu=84 cake-nat=yes cake-overhead=42 \
    cake-overhead-scheme=ethernet,ether-vlan cake-rtt-scheme=internet kind=cake \
    name=cake-down-simple
/queue simple
add max-limit=1625M/525M name=cake-simple queue=cake-down-simple/cake-up-simple \
    target=pppoe_wan
A simple queue attached to the WAN, which I hear can have CPU overhead (?) compared to a tree. But frankly, this is CHR on an i7 Proxmox host (a Protectli 2.5Gx6-port box), and I have room to run a link ~4x as fast before the CPU starts getting stressed. I also don't really have the networking knowledge to mess around with custom packet marking, so I figure: why bother?

Is there another reason why I should switch to trees given that?
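If I ever do try trees, my understanding is the equivalent would be roughly the following, with no mangle marks needed when attaching directly to interfaces (untested on my end; bridge-lan is a placeholder for whatever the LAN-side interface is called):

/queue tree add name=cake-tree-up parent=pppoe_wan max-limit=525M \
    queue=cake-up-simple
/queue tree add name=cake-tree-down parent=bridge-lan max-limit=1625M \
    queue=cake-down-simple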

The only thing that bothers me is that I have to set the up queue in the down-queue field and vice versa, but my OCD will live.

Link is maximally 2.5G/1G, but even bare it chokes not far above what I'm getting here; if I add 25M to either side it spikes ping and plateaus speed. I never saw more than +10 ms on the upload ping though, so they're doing some decent SQM of their own on that side of the link. Download is a nightmare and can go +100 ms or more unmanaged.

Dimensione is my ISP in Italy; the ONT listens on VLAN 100, so I started with the ethernet overhead scheme and added ether-vlan until things got worse, which happened at two.
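For anyone copying this, my understanding of why 42 is the number that works here (my arithmetic, not from any doc):

# ethernet scheme: 38 bytes of hard overhead (MAC + FCS + preamble + IFG)
# ether-vlan: +4 bytes per VLAN tag
#   38 + 4 = 42  ->  cake-overhead=42 for one VLAN tag
# a second ether-vlan (46) made things worse, so one tag is evidently
# what's actually on the wire towards the ONT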

There may be a better ISP with a more local datacentre that would get me closer to link speed, but these guys were willing to bother the national fiber installer to hook me up to the cable laid down this summer in front of my house while nobody else showed me as eligible for FTTH. They offer static IPv4 and native IPv6 with a /48 prefix, and I'd still be rolling the dice on congestion and over-provisioning elsewhere, so eh?
