some quick comments on configuring cake

Posted: Sun Oct 10, 2021 3:29 pm
by dtaht
Hi, one of the contributors to cake here. I'm pleased y'all are finally shipping it, but I have a few comments:

* A modern version of cake has support for the new diffserv LE codepoint. I'd dearly like support for that in mikrotik given how problematic CS1 proved to be, and it's a teeny patch.

* One feature of cake is that it runs the same whether at line rate, or with the shaper enabled, so you can get per-host/per-flow fq, diffserv classification, etc. I'm very interested in learning of results when you try to run it or fq_codel at line rate, rather than shaped. fq_codel is the default on all interfaces, rather than pfifo_fast, in most linuxes today. I would really like it if people put it through a battery of flent rrul tests or heavy iperf, and took captures, and plotted rtts, particularly on the higher end mikrotik hw. It is most useful with working BQL in the device driver.

* Support for the gso-splitting option is missing. When using the shaper component below 1gbit, gro "super" packets are automatically split back up into packets (and then interleaved with other flows); when unshaped, or above 1gbit, they are not. If you've got the cpu, split up superpackets.

* If you are natting at the router, try the nat option. This does not work with some forms of offloaded nat.

* If you have major bandwidth asymmetry on a link (greater than 10:1), try the ack-filter option on the slower part of the link. It gets to be a hugely *necessary* idea at ratios higher than that, see:

* It would be nice if mikrotik had some way of polling and displaying statistics from fq_codel for backlog, reschedules, drops, and marks, and from cake for the same. Exposing these statistics to more users would drive understanding of the role of packet loss (and marking) in controlling network delay. tc supports json output, multiple tools can parse that. See the enthusiasm for collecting stats over in the starlink community... I would love to see at the very least, drop stats out of the mikrotik userbase.

* When shaping dsl especially, it's very important to get the link type "framing" right, but also useful on cablemodems to set the docsis parameter. You can get hard up against the actual configured cablemodem rate in particular in this way instead of wasting 5-15%, and in the dsl case it is *impossible* to get a consistent shaped rate unless you set it right, or at least, conservatively. I mean that. Impossible to get some forms of dsl right unless you compensate.

* If you aren't going to use diffserv, use cake besteffort, to save on memory and cpu. To save on cpu further, don't use the ack-filter or nat options.

* There are a bunch of per host/per flow fq options that are dependent on your use cases for regulating traffic between ip addresses or ports.

* Use wash on ingress when you don't trust the diffserv markings from upstream. This is a pretty heavy hammer, and it is preferred that y'all communicate with your customers about how you treat diffserv and let them optimize their own traffic, only remarking from 0 (best effort) to something else if you need to. There is a published guide to zoom traffic, among others. Wash on egress if you aren't following the relevant RFCs.

* Cake tries really hard to follow a bunch of mutually conflicting diffserv RFCs, and in an age where videoconferencing is very important the cake diffserv4 model is closer to how a wifi AP treats it. see: for this underused facility in webrtc.

* Despite saying all this about diffserv, it generally ranks dead last as an optimization technique versus better statistical multiplexing from FQ, and the short queues you get from an AQM.

I should stress that these are options, and are optional. Aside from getting shaped dsl compensation right, the cake defaults are pretty good.

Other notes:

* Telling your customers how they can have better wifi at home is useful also! In most cases the bufferbloat starts to shift to the home wifi at above 40mbit, and no matter all the contortions you've done here to manage your bandwidth to/from them better, everybody benefits from better home routers with sqm on the link and fq_codel on the wifi: ... y-3Nov.pdf

* The cake mailing list is the best place to ask questions or make feature requests: - see also the archives there or on the related "Bloat" mailing list. Cake is the most advanced smart queue management (SQM) system, we've been able to design, as yet: ... anagement/ and whilst we initially targeted it at cpe and home gateways it is certainly proving useful in the middle of an ISP's network. We are very interested in feedback as to how to make it, or something like it, better for ISPs. One example (that I have NO idea how to make work on mikrotik) is here:

* There are multiple academic papers on how fq_codel and cake actually work, the best summary of most of the things we did to beat bufferbloat in linux is in the online book; - but feel free to hit google scholar for "bufferbloat", and the cobalt AQM.

* I'm really big on explaining the why (in addition to the how, above), at various levels, including entertaining ones like this: ... -over-yet/

Re: some quick comments on configuring cake

Posted: Sun Oct 10, 2021 3:33 pm
by Larsa
Dave, thanks for very useful tips! Should be included as "best practice" in the ros documentation.

Re: some quick comments on configuring cake

Posted: Sun Oct 10, 2021 8:44 pm
by dtaht
To give an example of where I'd hoped to see fq_codel or cake make more of a dent in the mikrotik universe, consider a topology like this:

10Gbit -> 1GBit port A
-> 1Gbit port B
10 more ports

In ANY fast->slow rate transition fair queuing, and aqm, can soften the impact of that 10Gbit interface (or multiple 1Gbit interfaces) fed into 1Gbit Port A here, achieving near zero latency for sparse flows and ultimately 5ms or less for incoming traffic. It's a complete unknown how deep the buffers are on those 1Gbit ports throughout the world, (or in any stepdown) but I strongly suspect they are far deeper than 5ms, and few have anything other than a FIFO on them. This recent paper was good:

Some offload engines for switches have gained RED of late, but that's still finicky to configure. The bulk of the bufferbloat effort has been on fixing the last mile, but we are seeing signs of bloat deep within the ISP's network, also.

Re: some quick comments on configuring cake

Posted: Tue Oct 12, 2021 6:05 am
by BitHaulers
Do you have any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS' implementation, or a bug, or me not understanding things.

Re: some quick comments on configuring cake

Posted: Tue Oct 12, 2021 10:53 am
by WeWiNet
Hi dtaht,

thanks for posting all this useful information. I already asked in a separate post a bit in this direction, but you really provide massive data (half of which I don't yet fully understand).
But it shows that cake is a complex tool, and that it is worth learning more about how to use it.

Don't assume all Mikrotik aficionados are queue/cake experts. Please make it (if possible) simple so all can benefit a max from your

What do you mean exactly with :
When shaping dsl especially, it's very important to get the link type "framing" right
This is one of my use cases where queuing is really really important. Can you give a short example for, say, a link of 5M down and 800k up (or whatever you want to use)?

The other question from Bithaulers
any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours?

is also very valid; I have the same problem on my side. LTE (and soon 5G, even worse) is a medium where in 24h the "pipe" itself changes heavily.
In this situation it is really hard to define the pipe size, and queueing with fixed values becomes almost impossible.
What can CAKE do in this case?

Again thanks for the good data you provided.

Re: some quick comments on configuring cake

Posted: Tue Oct 12, 2021 5:59 pm
by gtj0
Thanks dtaht!
I wish we could upvote posts and threads. I'd do both.

Re: some quick comments on configuring cake

Posted: Wed Oct 13, 2021 7:08 pm
by dtaht
BitHaulers wrote: Do you have any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS' implementation, or a bug, or me not understanding things.

Don't use them? We get the "how can an end user make LTE generally usable and consistently low latency" question a lot. It's often worse than wifi. We've been after that entire industry for years now to do better queue management everywhere - the handsets are horrifically overbuffered, the enode-bs as well, the backhaul's both encrypted and underprovisioned...

And instead we get back all sorts of non-useful and actually extra-latency inducing things like "network slices", and other places where they've thoroughly shot themselves in the foot (like distributed cpus for the wireless connection) from a queue theory perspective and so on. One company is afraid to even look at the packet headers inside the encapsulation, so no fq or ecn is possible by their lawyer decreed policy. There's been a ton of good research published on how to make the queuing saner on 3/4/5g but I'm still not aware of any actual products. I am hoping that the next gen of cell phones from both apple and google get that more right (I just finished up a stint at apple, can't say more), but as for managing the downlink...

Cake's Auto-ingress is somewhat suitable for rates that fluctuate slightly, but many/most LTE/5g systems fluctuate too much. We made cake easily and transparently reconfigurable, so with adequate stats from the hardware, or passive measurement of flows passing through it, some answers for managing inbound are more possible... but the right (I'm trying to avoid cursing here) answer was to fq and aqm the enode-bs, improve the backhauls, and stop trying to create for-pay services that don't work.

Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognising that non-5g tech - like most of mikrotik's market - dominates in the field).

A possible avenue for improving LTE inbound is leveraging Kathie Nichols' new queue estimator that's now in bbr, and the ebpf "pping" tool we're working on... but ENOFUNDING. If they spent a little less on the marketing and a little more on the tech - or opened up more binary blobs - we could make progress, rapidly.

Re: some quick comments on configuring cake

Posted: Wed Oct 13, 2021 7:43 pm
by Larsa
dtaht wrote: Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognise that non-5g tech - like most of mikrotiks market - dominates in the field).

Hear, hear! Much like 3GPP trying to reinvent "Internet" and related tech stacks using their own acronyms. ; -)

cake (or fq_codel) vs sfq

Posted: Thu Oct 14, 2021 4:40 am
by dtaht
I thought I'd write a brief note about SFQ vs CAKE. I think highly of SFQ. If I could go back in time to 2002, when it first arrived in linux, I'd have tried to make it the default, instead of a FIFO, given what I know now. It was *the* fundamental component in wondershaper. Nearly any place you have a FIFO today, SFQ would be better, so long as you have it properly sized.

It is still very possible to get a good result with SFQ at higher rates, if you increase the packet limit, and if you have a good mix of flows, increase the number of flows. However therein lies the rub - if you increase the packet limit, you end up with 100s of ms of bufferbloat - if you don't increase the packet limit you won't be able to achieve full bandwidth at high rates - and setting a per packet limit is not as good as setting a byte limit in an age where a packet can range in size from 64 bytes to 64k bytes.
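To put rough numbers on the packet-limit problem, here is a small sketch of how long a full queue takes to drain. The packet limit of 1000 and the 100 Mbit rate are made-up values for illustration, not SFQ or RouterOS defaults.

```python
def drain_time_ms(packets, packet_bytes, rate_bps):
    """Milliseconds needed to transmit a completely full queue at rate_bps."""
    return packets * packet_bytes * 8 / rate_bps * 1000

PACKET_LIMIT = 1000   # hypothetical raised packet limit
RATE_BPS = 100e6      # hypothetical 100 Mbit link

# The same packet limit means very different delays depending on packet size.
for size in (64, 1500, 65536):
    ms = drain_time_ms(PACKET_LIMIT, size, RATE_BPS)
    print(f"{PACKET_LIMIT} packets of {size:>5} bytes: {ms:9.2f} ms of queue")
```

The spread between the 64 byte and 64k byte cases is the reason a byte limit (or a time-based AQM) behaves more predictably than a packet count.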

DRR is an option that can work better than SFQ. That said, if all you have is SFQ, USE IT. Anything that breaks up bursts is good.

So... 4 improvements that came from fq_codel over SFQ.

0) It does better FQ for "sparse" flows than SFQ
1) You don't need to set the queue length, the AQM attempts to hold latencies to 5ms
2) The default number of flows is 1024, which seems to be "enough"
3) fq_codel drops from the head, not the tail of the queue, signaling congestion earlier, and avoiding bursty tail loss
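Point 0 in the list above can be sketched in a few lines. This is a simplified illustration of the new-flows/old-flows idea only: it leaves out the codel AQM, hashing, and limits entirely, and the class and variable names are invented for this example rather than taken from the Linux source.

```python
from collections import deque

QUANTUM = 1514  # bytes of credit per round (value assumed for this sketch)

class Flow:
    def __init__(self, name):
        self.name = name
        self.packets = deque()  # queued packet sizes, in bytes
        self.deficit = 0
        self.queued = False     # True while on new_flows or old_flows

class SparseFirstScheduler:
    def __init__(self):
        self.flows = {}
        self.new_flows = deque()  # flows that just became active
        self.old_flows = deque()  # flows that have used up a quantum

    def enqueue(self, name, size):
        flow = self.flows.setdefault(name, Flow(name))
        flow.packets.append(size)
        if not flow.queued:       # a newly active flow goes on the new list
            flow.queued = True
            flow.deficit = QUANTUM
            self.new_flows.append(flow)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            from_new = bool(self.new_flows)
            active = self.new_flows if from_new else self.old_flows
            flow = active[0]
            if flow.deficit <= 0:         # credit spent: back of the old list
                flow.deficit += QUANTUM
                active.popleft()
                self.old_flows.append(flow)
                continue
            if not flow.packets:          # flow has gone idle
                active.popleft()
                if from_new and self.old_flows:
                    self.old_flows.append(flow)
                else:
                    flow.queued = False
                continue
            size = flow.packets.popleft()
            flow.deficit -= size
            return flow.name, size
        return None

sched = SparseFirstScheduler()
for name in ("bulk-a", "bulk-b", "bulk-c"):
    for _ in range(5):
        sched.enqueue(name, 1514)
order = [sched.dequeue()[0] for _ in range(3)]
sched.enqueue("dns", 100)               # a sparse flow arrives mid-stream
order.append(sched.dequeue()[0])
print(order)  # the dns packet is served ahead of the remaining bulk backlog
```

In a tail-insert scheduler the same packet would wait for a full round of the bulk flows first.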

What follows are two "rrul" plots, taken from the tool we use heavily in the bufferbloat project and highly recommend over, for example, web based benchmarking tools.

They test - simultaneously - 4 tcp upload streams, 4 tcp download streams, 4 measurement streams (both udp and icmp) for 1 minute, by default, and both of these are *good* results. This particular test was against a cablemodem provisioned for 100Mbit down, 10Mbit up. I'll show what a bad result looks like at the end.
[Attachment: sfq-spectrum-hapac2 (1).png]
Take a look at the third panel on the bottom on both these plots. That's fq_codel's DRR++ derived scheduler, taking the measurement flows and putting them in the front of the queue. SFQ and DRR put new flows at the tail of the FQ queues - so if you have 32 flows, a new flow's arrival will end up at the 33rd queue (which is still WAY better than a FIFO), and be served in turn. A variant of SFQ, called SQF, noticed that it was possible to take a new arrival and serve that first, so that newer flows of all sorts - not just voip, dns, tcp syn/syn ack - got a little boost and lower latency than fatter flows. The DRR based design of the fq_codel scheduler on that third panel shows that with the 12 flows going, at this rate, we are saving 10ms on sparser packets - packets that have an arrival rate less than the total time it takes to serve all the other queue-making flows.
[Attachment: cake-spectrum-hapac2 (1).png]
Now as to why the bandwidths seem a bit different - the tcp flows in the SFQ case are more jittery than in the cake one because they hit the end of the SFQ's fifo, have one or more tail drops, and then have to recover more data than with the codel AQM. It turns out we deliver slightly more data in both directions in this test case.

Lastly, what does a bad result look like? Well, this is the basic behavior of a typical (spectrum) cable modem today. The latencies under load grow so bad, that it chokes the upstream flows enormously, and your voice call, well... do you like shouting 600 ft across the room to be heard? Or clicking on a web page and waiting 2 seconds for the first byte?

Best practice for fq_codel: At shaped rates below 4Mbit, you need to scale the target to the time it takes for 1 MTU to egress. At 1500 MTU, 1Mbit, 15ms. It generally pays to use a quantum of 300 below 100Mbit.

Cake autoscales these two parameters.
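As a rough illustration of why the target has to grow at low rates, the sketch below computes the raw serialization time of one MTU. The helper names are made up for this example, and the "never go below 5ms" rule is an assumption based on the 5ms figure mentioned earlier in the thread, not a statement of what fq_codel or cake do internally.

```python
def mtu_egress_ms(mtu_bytes, rate_bps):
    """Milliseconds needed to put one MTU-sized packet on the wire."""
    return mtu_bytes * 8 / rate_bps * 1000

def suggested_target_ms(mtu_bytes, rate_bps, floor_ms=5.0):
    """Hypothetical helper: keep the target at or above one MTU time."""
    return max(floor_ms, mtu_egress_ms(mtu_bytes, rate_bps))

for rate_mbit in (0.5, 1, 2, 4, 10):
    t = mtu_egress_ms(1500, rate_mbit * 1e6)
    print(f"{rate_mbit:>4} Mbit: one 1500 byte MTU takes {t:5.1f} ms, "
          f"target >= {suggested_target_ms(1500, rate_mbit * 1e6):5.1f} ms")
```

Raw serialization of 1500 bytes at 1Mbit works out to 12ms, so the 15ms quoted above sits a little above the bare minimum.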

My thanks to Jordan Szuch for testing this release candidate of mikrotik on the hapac2 and providing these plots and comfort, that cake and fq_codel were actually working correctly here. SHIP IT.

(really looking forward to more testing and testers)

Re: some quick comments on configuring cake

Posted: Thu Oct 14, 2021 10:15 am
by moeller0
Just a few notes on configuring cake's overhead keywords (if in a hurry just read the bold snippets):

ADSL* and max rate <= 25/5 Mbps: "overhead 44 atm"
Note: actually anything using ATM/AAL5, which nowadays for access links should be only ADSL, ADSL2, ADSL2+. Theoretically VDSL2 also allows ATM/AAL5, but I have seen no evidence yet that this configuration exists in the real world. Note that 44 bytes is a realistic "bad case" encapsulation overhead seen in the wild; theoretically larger overhead seems possible, albeit very unlikely. To dig deeper into ADSL overhead curious minds can have a look at
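For readers wondering why the atm keyword matters so much, here is a sketch of the general ATM/AAL5 cell arithmetic: every packet (plus overhead) is carried in 48 byte payloads inside 53 byte cells, with the last cell padded. The function name is invented for this example and the 44 byte default simply mirrors the value suggested above.

```python
import math

ATM_CELL_BYTES = 53     # size of one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48  # payload carried per cell

def atm_wire_bytes(ip_packet_bytes, overhead=44):
    """Bytes actually occupied on an ATM link by one IP packet."""
    cells = math.ceil((ip_packet_bytes + overhead) / ATM_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES

for size in (40, 100, 576, 1500):
    wire = atm_wire_bytes(size)
    print(f"{size:>4} byte packet -> {wire:>4} bytes on the wire "
          f"({100 * (wire - size) / size:5.1f}% extra)")
```

The extra cost depends strongly on packet size, which is why a single fixed percentage margin cannot compensate for it.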

VDSL2**: "overhead 44 mpu 88"
Note: Actually PTM carrier instead of ATM/AAL5; this can actually be used on ADSL links as well, and as far as I know some ISPs actually use that.
Also note that PTM uses a 64/65 encapsulation, so if you deduce the shaper settings from the modem sync you need to derate the sync rates by 64/65 = 0.984615384615 (cake offers a ptm keyword to perform this derating automatically, but does so by adjusting the accounted packet size instead of simply adjusting the shaper gross rates). BUT for most users the sync will not be the relevant limit; it will be a shaper/policer at the ISP's end which enforces the contracted rates and which, if functional, will already have the 64/65 overhead accounted for.
VDSL2 likely has lower overhead than 44, but the bandwidth sacrifice of specifying a slightly larger per-packet-overhead is small compared to the latency-under-load-increase possible if the per-packet-overhead is too small.
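The two VDSL2 adjustments can be sketched as follows. Both helpers are invented names for illustration; the mpu handling shown (apply the overhead, then round small packets up to the minimum) is my understanding of the usual accounting and should be treated as an assumption.

```python
def ptm_derated_rate(sync_rate):
    """Usable rate after PTM's 64/65 encapsulation, in the same unit as sync_rate."""
    return sync_rate * 64 / 65

def accounted_bytes(packet_bytes, overhead=44, mpu=88):
    """Size charged to the shaper: packet plus overhead, never below mpu."""
    return max(packet_bytes + overhead, mpu)

print(f"50000 kbit sync -> {ptm_derated_rate(50000):.0f} kbit usable")
for size in (40, 64, 1500):
    print(f"{size:>4} byte packet is accounted as {accounted_bytes(size)} bytes")
```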

DOCSIS/cable**: "overhead 18 mpu 88"
Note: The real per-packet/per-slot overhead on a DOCSIS link is considerably higher, but the DOCSIS standard mandates that user access rates are shaped as if they had 18 bytes of per-packet overhead, so for us that is the relevant value.

Getting initial shaper setting: The quickest way to get reasonable starting values to configure the shaper is to simply run a few speedtests and try to get a feel for the reliably available speeds for down- and up-link and then use these net goodput values (mostly measured as TCP/IP goodput) as gross shaper values for cake. Say you measured 100 arbitrary units, the respective gross rate on a DOCSIS link would be larger or equal to :
100 / ((1500-20-20)/(1500+18)) = 103.97
This will give the shaper a 100-100*100/103.97 = 3.82% margin compared to the true bottleneck rate, which is an acceptable starting point*, which then should be confirmed by a few bufferbloat tests, either via the dslreports speedtest (for configuration see ... sting/2803) or waveform's new test under ... qm-details while tailored for OpenWrt's SQM version, contains a lot of background information and configuration advice for those willing to spend more time.

*) The recommended margin is 5-15% of the true bottleneck gross rate, typically a bit more for ingress/download and potentially a bit less for egress/upload, but 3.8% is close enough IFF one is willing to run a few tests to confirm that bufferbloat is sufficiently controlled, otherwise just take 95% of the speedtest result.
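The goodput-to-gross-rate arithmetic above is easy to reproduce. The function names below are made up for this sketch; the defaults (1500 byte MTU, 20+20 bytes of IPv4 and TCP headers, 18 bytes of DOCSIS overhead) are the ones used in the worked example.

```python
def gross_rate(goodput, mtu=1500, ip_tcp_headers=40, overhead=18):
    """Lower bound on the link's gross rate implied by a measured TCP goodput."""
    return goodput * (mtu + overhead) / (mtu - ip_tcp_headers)

def margin_percent(shaper_rate, gross):
    """How far below the gross rate the shaper sits, in percent."""
    return 100 - 100 * shaper_rate / gross

measured = 100  # arbitrary units, as in the example above
gross = gross_rate(measured)
print(f"gross rate >= {gross:.2f}")
print(f"margin when shaping at {measured}: {margin_percent(measured, gross):.2f}%")
```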

Cake's bandwidth parameter

Posted: Thu Oct 21, 2021 5:22 pm
by dtaht
We try to stress that the default options for cake (essentially just the bandwidth parameter) are good enough for most purposes.

That said, there are two important differences between how cake's bandwidth shaper works vis a vis htb that are useful to highlight.

Token bucket designs date back to the 70s as an easy to implement in hardware method of doing rate control. Linux HTB along the way (2006) gained the ability to compensate for dsl as cake does, but I don't know if it's configurable in mikrotik's api. Also, our thinking is flavored by the CPE -> perspective, rather than the ISP -> down, and my hope is in working with more active ISPs trying to shape their down more directly we'll find ideas worth implementing moving forward.

The more important difference between htb and cake's shaper is that a token bucket is naturally bursty. If a link has lain idle for a while, enough tokens accumulate (the htb quantum and burst parameters) that a line rate burst will pass through htb until the burst parameter is exceeded.

This means that that burst ends up accumulating in the device buffers and invokes jitter. The deficit based shaper in cake never bursts, but does need a cpu that can context switch rapidly enough to ensure a smoother delivery of packets. You can typically run cake hard up against a htb shaper, configured at the same rate, and have cake almost always win. And you can typically configure htb with a higher burst and quantum parameter to have it use less cpu and still more or less effectively shape the connection - but it too starts getting wildly variable as you tweak those parameters to save on cpu to be able to run at higher rates.
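The burst behaviour can be shown with a generic token bucket model. This is not htb's actual implementation, only the textbook mechanism, and the rate and burst numbers are hypothetical.

```python
def burst_after_idle(idle_s, rate_bps, burst_bytes):
    """Bytes a simple token bucket will pass back-to-back after sitting idle."""
    earned = idle_s * rate_bps / 8     # tokens accumulated while idle
    return min(burst_bytes, earned)    # the bucket depth caps them

def buffer_delay_ms(queued_bytes, link_bps):
    """Time those bytes then occupy a downstream buffer draining at link_bps."""
    return queued_bytes * 8 / link_bps * 1000

RATE_BPS = 10e6        # hypothetical 10 Mbit shaped rate
BURST_BYTES = 30000    # hypothetical burst setting

released = burst_after_idle(1.0, RATE_BPS, BURST_BYTES)
print(f"after 1 s idle: {released:.0f} bytes pass unshaped, "
      f"about {buffer_delay_ms(released, RATE_BPS):.0f} ms of queue at 10 Mbit")
```

A larger burst setting saves cpu but raises that figure proportionally, which is the trade-off described above.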

One thing that we've failed to call out enough is doing things like saying "if you have enough cpu". How we think about that over here is a bit different from how others think about it, in that what matters is not clock rate, or straight line instructions per second, but how fast the cpu can context switch. It's often the case that a heavily pipelined cpu cannot context switch as fast as one that isn't.

Running out of cpu when shaping using either method is a PITA. Per-cpu locking is also a problem. You might peg one cpu at 100% and leave the others idle. The linux community has worked very hard to remove a bunch of locks over the years, but at the moment the most progress is being made via ebpf assistance, as in libreqos and preseem. YMMV.

I see that a common means of testing mikrotik is with X tc filters (seemingly 25). Cake can work with those also, but the hope was that fewer tc rules would be needed with cake as a base, and some of the cpu lost to using cake, or even all of it, recovered that way. In general we try to encourage folk to drop all their preconceptions about shaping, multiple tiers of service, and so on, and delete everything they are doing special, try cake bandwidth X, and then measure their results. I'd like a look at an ISP's typical tc rule set to see how tc is being used today.

As for multiple tiers of service - a common configuration is three tiers of htb -> SFQ, SFQ, SFQ. I've seen 6, 9, even as many as 20, and the thing commonly missed by assembling the qdiscs this way is that every separate qdisc you add has a packet limit, and each! adds to your worst case delay. You can typically drop in htb + fq_codel in those configurations and keep your worst case delay bounded better via the aqm, or apply cake which has 3 or 4 tiers of service internally.
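A back-of-envelope sketch of how the per-qdisc limits stack up. The 127 packet limit per tier and the 20 Mbit rate are assumptions chosen for illustration, and the result is a rough upper bound (every tier full of full-size packets), not a measurement.

```python
def stacked_worst_case_ms(packet_limits, rate_bps, packet_bytes=1500):
    """Rough worst case: time to drain every tier's queue when all are full."""
    total_packets = sum(packet_limits)
    return total_packets * packet_bytes * 8 / rate_bps * 1000

RATE_BPS = 20e6  # hypothetical 20 Mbit shaped rate
for tiers in (3, 6, 9, 20):
    limits = [127] * tiers  # assumed per-tier packet limit
    print(f"{tiers:>2} tiers: up to "
          f"{stacked_worst_case_ms(limits, RATE_BPS):7.1f} ms of queue")
```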