dtaht
just joined
Topic Author
Posts: 17
Joined: Sat Aug 03, 2013 5:46 am

some quick comments on configuring cake

Sun Oct 10, 2021 3:29 pm

Hi, one of the contributors to cake here. I'm pleased y'all are finally shipping it, but I have a few comments:

* A modern version of cake has support for the new diffserv LE codepoint. I'd dearly like support for that in mikrotik given how problematic CS1 proved to be, and it's a teeny patch.

* One feature of cake is that it runs the same whether at line rate, or with the shaper enabled, so you can get per-host/per flow fq, diffserv classification, etc. I'm very interested in learning of results when you try to run it or fq_codel at line rate, rather than shaped. fq_codel is the default on all interfaces, rather than pfifo_fast, in most linuxes today. I would really like it if people put it through a battery of flent rrul tests or heavy iperf, and took captures, and plotted rtts, particularly on the higher end mikrotik hw. It is most useful with working BQL in the device driver.

* https://help.mikrotik.com/docs/display/ROS/Queues is missing support for the gso-splitting option. When using the shaper component, below 1gbit, gro "super"packets are automatically split up back into packets (and then interleaved with other flows), when unshaped, or above 1gbit, they are not. If you've got the cpu, split up superpackets.

* If you are natting at the router, try the nat option. This does not work with some forms of offloaded nat.

* If you have major bandwidth asymmetry on a link (greater than 10:1), try the ack-filter option on the slower direction of the link. It gets to be a hugely *necessary* idea at ratios higher than that, see: https://blog.cerowrt.org/post/ack_filtering/

* It would be nice if mikrotik had some way of polling and displaying statistics from fq_codel for backlog, reschedules, drops, and marks, and from cake for the same. Exposing these statistics to more users would drive understanding of the role of packet loss (and marking) in controlling network delay. tc supports json output, multiple tools can parse that. See the enthusiasm for collecting stats over in the starlink community... I would love to see at the very least, drop stats out of the mikrotik userbase.

* When shaping dsl especially, it's very important to get the link type "framing" right, but also useful on cablemodems to set the docsis parameter. You can get hard up against the actual configured cablemodem rate in particular in this way instead of wasting 5-15%, and in the dsl case it is *impossible* to get a consistent shaped rate unless you set it right, or at least, conservatively. I mean that. Impossible to get some forms of dsl right unless you compensate.

* If you aren't going to use diffserv, use cake besteffort, to save on memory and cpu. To save on cpu further, don't use the ack-filter or nat options.

* There are a bunch of per host/per flow fq options that are dependent on your use cases for regulating traffic between ip addresses or ports.

* Use wash on ingress when you don't trust the diffserv markings from upstream. This is a pretty heavy hammer, and it is preferred that y'all communicate with your customers about how you treat diffserv and let them optimize their own traffic, only remarking from 0 (best effort) to something else if you need to. There is a published guide to zoom traffic, among others. Wash on egress if you aren't following the relevant RFCs.

* Cake tries really hard to follow a bunch of mutually conflicting diffserv RFCs, and in an age where videoconferencing is very important the cake diffserv4 model is closer to how a wifi AP treats it. See https://www.w3.org/TR/webrtc-priority/ for this underused facility in webrtc.

* Despite saying all this about diffserv, it generally ranks dead last as an optimization technique versus better statistical multiplexing from FQ, and the short queues you get from an AQM.

I should stress that these are options and are optional. Aside from getting shaped dsl compensation right, the cake defaults are pretty good.
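As a rough illustration of the statistics point above: on a stock Linux box, `tc -s -j qdisc show` emits JSON that a few lines of Python can reduce to backlog, drops, and marks per qdisc. The sketch below is a minimal example under stated assumptions. The sample string is a hand-written, trimmed stand-in for real output, the `summarize` helper is hypothetical, and the exact set of fields can vary between iproute2 versions. It says nothing about what RouterOS itself exposes.

```python
import json

# Hand-written, trimmed stand-in for `tc -s -j qdisc show` output (assumption:
# real output carries many more fields, and names may differ by iproute2 version).
SAMPLE = """
[
  {"kind": "fq_codel", "dev": "eth0", "drops": 12, "backlog": 3028,
   "requeues": 1, "ecn_mark": 4},
  {"kind": "cake", "dev": "eth1", "drops": 250, "backlog": 0,
   "requeues": 0, "tins": [{"drops": 250, "ecn_mark": 7}]},
  {"kind": "noqueue", "dev": "lo", "drops": 0, "backlog": 0}
]
"""

def summarize(tc_json):
    """Return one dict of headline counters per fq_codel or cake qdisc."""
    rows = []
    for qdisc in json.loads(tc_json):
        if qdisc.get("kind") not in ("fq_codel", "cake"):
            continue
        # fq_codel reports marks at the top level; cake reports them per tin.
        marks = qdisc.get("ecn_mark", 0)
        marks += sum(tin.get("ecn_mark", 0) for tin in qdisc.get("tins", []))
        rows.append({
            "dev": qdisc.get("dev"),
            "kind": qdisc["kind"],
            "backlog": qdisc.get("backlog", 0),
            "requeues": qdisc.get("requeues", 0),
            "drops": qdisc.get("drops", 0),
            "marks": marks,
        })
    return rows

for row in summarize(SAMPLE):
    print(row)
```

Every lookup uses `.get` with a default so that a missing counter reads as zero instead of raising, which seems the safer choice when polling many devices.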

Other notes:

* Telling your customers how they can have better wifi at home is useful also! In most cases the bufferbloat starts to shift to the home wifi at above 40mbit, and no matter all the contortions you've done here to manage your bandwidth to/from them better, everybody benefits from better home routers with sqm on the link and fq_codel on the wifi: https://blog.linuxplumbersconf.org/2016 ... y-3Nov.pdf

* The cake mailing list is the best place to ask questions or make feature requests: https://lists.bufferbloat.net/listinfo/cake - see also the archives there or on the related "Bloat" mailing list. Cake is the most advanced smart queue management (SQM) system we've been able to design as yet: https://www.bufferbloat.net/projects/ce ... anagement/ and whilst we initially targeted it at cpe and home gateways it is certainly proving useful in the middle of an ISP's network. We are very interested in feedback as to how to make it, or something like it, better for ISPs. One example (that I have NO idea how to make work on mikrotik) is here: https://github.com/rchac/LibreQoS

* There are multiple academic papers on how fq_codel and cake actually work. The best summary of most of the things we did to beat bufferbloat in linux is in the online book: https://bufferbloat-and-beyond.net/ - but feel free to hit google scholar for "bufferbloat", and the cobalt AQM.

* I'm really big on explaining the why (in addition to the how, above), at various levels, including entertaining ones like this:

https://blog.apnic.net/2020/01/22/buffe ... -over-yet/
Last edited by dtaht on Sun Oct 10, 2021 8:57 pm, edited 8 times in total.
 
Larsa
Member Candidate
Posts: 260
Joined: Sat Aug 29, 2015 7:40 pm

Re: some quick comments on configuring cake

Sun Oct 10, 2021 3:33 pm

Dave, thanks for very useful tips! Should be included as "best practice" in the ros documentation.
 
dtaht
just joined
Topic Author
Posts: 17
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Oct 10, 2021 8:44 pm

To give an example of where I'd hoped to see fq_codel or cake make more of a dent in the mikrotik universe, consider a topology like this:

10Gbit -> 1GBit port A
-> 1Gbit port B
10 more ports

In ANY fast->slow rate transition, fair queuing and aqm can soften the impact of that 10Gbit interface (or multiple 1Gbit interfaces) fed into 1Gbit Port A here, achieving near zero latency for sparse flows and ultimately 5ms or less for incoming traffic. It's a complete unknown how deep the buffers are on those 1Gbit ports throughout the world (or in any stepdown), but I strongly suspect they are far deeper than 5ms, and few have anything other than a FIFO on them. This recent paper was good: https://arxiv.org/pdf/2109.11693.pdf

Some offload engines for switches have gained RED of late, but that's still finicky to configure. The bulk of the bufferbloat effort has been on fixing the last mile, but we are seeing signs of bloat deep within the ISP's network, also.
 
BitHaulers
newbie
Posts: 31
Joined: Thu Jun 21, 2018 11:23 am

Re: some quick comments on configuring cake

Tue Oct 12, 2021 6:05 am

Do you have any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS' implementation, or a bug, or me not understanding things.
 
WeWiNet
Long time Member
Posts: 560
Joined: Thu Sep 27, 2018 4:11 pm

Re: some quick comments on configuring cake

Tue Oct 12, 2021 10:53 am

Hi dtaht,

thanks for posting all this useful information. I already asked in a separate post a bit in this direction, but you really provide massive data (half of which I don't yet fully understand).
But it shows that cake is a complex tool, and that it is worth learning more about how to use it.

Don't assume all Mikrotik aficionados are queue/cake experts. Please make it (if possible) simple so all can benefit to the max from your experience.

What exactly do you mean by:
dtaht wrote: When shaping dsl especially, it's very important to get the link type "framing" right
This is one of my use cases where queuing is really, really important. Can you give a short example for, say, a link of 5M down and 800k up (or whatever you want to use)?

The other question, from BitHaulers:
BitHaulers wrote: any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours?

is also very valid, same problem again on my side. LTE (and soon 5G, even worse) is a medium where in 24h the "pipe" itself changes heavily.
In this situation it is really hard to define the pipe size, and queueing with fixed values becomes almost impossible.
What can CAKE do in this case?

Again thanks for the good data you provided.
WeWiNet

**
MTCNA
I like a new challenge, I migrate to ROS7... :? no way, finally I stay with 6.48! I am NOT crazy :lol: !!!
 
gtj0
just joined
Posts: 15
Joined: Wed Sep 23, 2020 8:08 pm

Re: some quick comments on configuring cake

Tue Oct 12, 2021 5:59 pm

Thanks dtaht!
I wish we could upvote posts and threads. I'd do both.
 
dtaht
just joined
Topic Author
Posts: 17
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Oct 13, 2021 7:08 pm

BitHaulers wrote: Do you have any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS' implementation, or a bug, or me not understanding things.

Don't use them? We get the "how can an end user make LTE generally usable and consistently low latency" question a lot. It's often worse than wifi. We've (bufferbloat.net) been after that entire industry for years now to do better queue management everywhere - the handsets are horrifically overbuffered, the enode-bs as well, the backhaul's both encrypted and underprovisioned...

And instead we get back all sorts of non-useful and actually extra-latency inducing things like "network slices", and other places where they've thoroughly shot themselves in the foot (like distributed cpus for the wireless connection) from a queue theory perspective and so on. One company is afraid to even look at the packet headers inside the encapsulation, so no fq or ecn is possible by their lawyer-decreed policy. There's been a ton of good research published on how to make the queuing saner on 3/4/5g but I'm still not aware of any actual products. I am hoping that the next gen of cell phones from both apple and google get that more right (I just finished up a stint at apple, can't say more), but as for managing the downlink...

Cake's auto-ingress is somewhat suitable for rates that fluctuate slightly, but many/most LTE/5g systems fluctuate too much. We made cake easily and transparently reconfigurable, so with adequate stats from the hardware, or passive measurement of flows passing through it, some answers for managing inbound are more possible... but the right answer (I'm trying to avoid cursing here) was to fq and aqm the enode-bs, improve the backhauls, and stop trying to create for-pay services that don't work.

Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognise that non-5g tech - like most of mikrotik's market - dominates in the field).

A possible avenue for improving LTE inbound is leveraging Kathie Nichols' new queue estimator that's now in bbr, and the ebpf "pping" tool we're working on... but ENOFUNDING. If they spent a little less on the marketing and a little more on the tech - or opened up more binary blobs - we could make progress, rapidly.
 
Larsa
Member Candidate
Posts: 260
Joined: Sat Aug 29, 2015 7:40 pm

Re: some quick comments on configuring cake

Wed Oct 13, 2021 7:43 pm

dtaht wrote: Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognise that non-5g tech - like most of mikrotik's market - dominates in the field).

Hear, hear! Much like 3GPP trying to reinvent "Internet" and related tech stacks using their own acronyms. ; -)
 
dtaht
just joined
Topic Author
Posts: 17
Joined: Sat Aug 03, 2013 5:46 am

cake (or fq_codel) vs sfq

Thu Oct 14, 2021 4:40 am

I thought I'd write a brief note about SFQ vs CAKE. I think highly of SFQ. If I could go back in time to 2002, when it first arrived in linux, I'd have tried to make it the default, instead of a FIFO, given what I know now. It was *the* fundamental component in wondershaper. Nearly any place you have a FIFO today, SFQ would be better, so long as you have it properly sized.

It is still very possible to get a good result with SFQ at higher rates, if you increase the packet limit, and if you have a good mix of flows, increase the number of flows. However therein lies the rub - if you increase the packet limit, you end up with 100s of ms of bufferbloat - if you don't increase the packet limit you won't be able to achieve full bandwidth at high rates - and setting a per packet limit is not as good as setting a byte limit in an age where a packet can range in size from 64 bytes to 64k bytes.
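The packet-limit tradeoff above is easy to put numbers on. The sketch below is a back-of-the-envelope calculation, not a model of SFQ itself: it assumes the whole queue is full of same-sized packets and drains at the shaped rate. The function name is hypothetical, and the 127-packet figure is used only as an example limit.

```python
def worst_case_queue_delay_ms(packet_limit, packet_bytes, rate_bps):
    """Time to drain a completely full packet-limited queue, in milliseconds."""
    return packet_limit * packet_bytes * 8 * 1000 / rate_bps

# Same packet limit, different link rates and packet sizes.
for limit, size, rate in [
    (127, 1500, 10_000_000),      # small limit, 10 Mbit
    (127, 1500, 100_000_000),     # small limit, 100 Mbit
    (1000, 1500, 100_000_000),    # limit raised to reach full rate
    (127, 65536, 100_000_000),    # same limit, but 64k superpackets
]:
    delay = worst_case_queue_delay_ms(limit, size, rate)
    print(f"limit={limit:5d} pkt={size:6d}B rate={rate / 1e6:6.0f}Mbit "
          f"-> {delay:8.1f} ms")
```

The last row illustrates the byte-limit point: with the packet count held constant, the queued delay scales with packet size, so a limit that looks harmless for 1500-byte packets can hold far more data when packets are 64k.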

DRR is an option that can work better than SFQ. That said, if all you have is SFQ, USE IT. Anything that breaks up bursts is good.

So... 4 improvements that came from fq_codel over SFQ.

0) It does better FQ for "sparse" flows than SFQ
1) You don't need to set the queue length, the AQM attempts to hold latencies to 5ms
2) The default number of flows is 1024, which seems to be "enough"
3) fq_codel drops from the head, not the tail of the queue, signaling congestion earlier, and avoiding bursty tail loss

What follows are two "rrul" plots, taken from the flent.org tool we use heavily in the bufferbloat project and highly recommend over, for example, web based benchmarking tools.

They test - simultaneously - 4 tcp upload streams, 4 tcp download streams, 4 measurement streams (both udp and icmp) for 1 minute, by default, and both of these are *good* results. This particular test was against a cablemodem provisioned for 100Mbit down, 10Mbit up. I'll show what a bad result looks like at the end.
[Attached plot: sfq-spectrum-hapac2 (1).png]
Take a look at the third panel on the bottom on both these plots. That's fq_codel's DRR++ derived scheduler, taking the measurement flows and putting them in the front of the queue. SFQ and DRR put new flows at the tail of the FQ queues - so if you have 32 flows, a new flow's arrival will end up at the 33rd queue (which is still WAY better than a FIFO), and be served in turn. A variant of SFQ, called SQF, noticed that it was possible to take a new arrival and serve that first, and thus newer flows of all sorts (not just voip, dns, tcp syn/syn ack) got a little boost and lower latency than fatter flows. The DRR based design of the fq_codel scheduler on that third panel shows that with the 12 flows going, at this rate, we are saving 10ms on sparser packets - packets whose inter-arrival time is longer than the total time it takes to serve all the other queue-making flows.
[Attached plot: cake-spectrum-hapac2 (1).png]
Now as to why the bandwidths seem a bit different - the tcp flows in the SFQ case are more jittery than the cake one because they hit the end of the SFQ's fifo, have one or more tail drops, and then have to recover more data than the codel AQM does. It turns out we deliver slightly more data in both directions in this test case.

Lastly, what does a bad result look like? Well, this is the basic behavior of a typical (spectrum) cable modem today. The latencies under load grow so bad, that it chokes the upstream flows enormously, and your voice call, well... do you like shouting 600 ft across the room to be heard? Or clicking on a web page and waiting 2 seconds for the first byte?
[Attached plot: baseline-spectrum-hapac2.png]
Best practice for fq_codel: At shaped rates below 4Mbit, you need to scale the target to the time it takes for one MTU to egress. At 1500 MTU and 1Mbit, that is 15ms. It generally pays to use a quantum of 300 below 100Mbit.

Cake autoscales these two parameters.
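For anyone tuning fq_codel by hand, the scaling rule above can be sketched as follows. This is a minimal sketch with hypothetical helper names. It computes the raw serialization time of one MTU, which comes to 12 ms at 1 Mbit. The 15 ms quoted above sits a little over that raw figure, so treat the output here as a floor and not as a recommendation. The 5 ms and 1514 values are fq_codel's Linux defaults for target and quantum.

```python
FQ_CODEL_DEFAULT_TARGET_MS = 5.0   # fq_codel's default target on Linux
FQ_CODEL_DEFAULT_QUANTUM = 1514    # fq_codel's default quantum on Linux

def mtu_serialization_ms(mtu_bytes, rate_bps):
    """Time for one MTU-sized packet to egress at the shaped rate."""
    return mtu_bytes * 8 * 1000 / rate_bps

def target_floor_ms(rate_bps, mtu_bytes=1500):
    """Never set the target below the time one MTU takes to leave the link."""
    return max(FQ_CODEL_DEFAULT_TARGET_MS,
               mtu_serialization_ms(mtu_bytes, rate_bps))

def suggested_quantum(rate_bps):
    """Quantum of 300 below 100 Mbit, per the rule of thumb above."""
    return 300 if rate_bps < 100_000_000 else FQ_CODEL_DEFAULT_QUANTUM

for rate in (500_000, 1_000_000, 4_000_000, 50_000_000, 1_000_000_000):
    print(f"{rate / 1e6:7.1f} Mbit: target >= {target_floor_ms(rate):5.1f} ms, "
          f"quantum {suggested_quantum(rate)}")
```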

My thanks to Jordan Szuch for testing this release candidate of mikrotik on the hapac2 and providing these plots, and the comfort that cake and fq_codel were actually working correctly here. SHIP IT.

(really looking forward to more testing and testers)
Last edited by dtaht on Thu Oct 14, 2021 5:38 pm, edited 4 times in total.
 
moeller0
just joined
Posts: 1
Joined: Thu Oct 14, 2021 9:43 am

Re: some quick comments on configuring cake

Thu Oct 14, 2021 10:15 am

Just a few notes on configuring cake overhead keywords (if in a hurry just read the bold snippets):

DSL:
ADSL* and max rate <= 25/5 Mbps: "overhead 44 atm"
Note: actually anything using ATM/AAL5, which nowadays for access links should be only ADSL, ADSL2, ADSL2+. Theoretically VDSL2 also allows ATM/AAL5, but I have seen no evidence yet that this configuration exists in the real world. Note that 44 bytes is a realistic "bad case" encapsulation overhead seen in the wild; theoretically larger overhead seems possible, albeit very unlikely. To dig deeper into ADSL overhead, curious minds can have a look at https://github.com/moeller0/ATM_overhead_detector.

VDSL2**: "overhead 44 mpu 88"
Note: Actually PTM carrier instead of ATM/AAL5; this can be used on ADSL links as well, and as far as I know some ISPs actually do that.
Also note that PTM uses a 64/65 encapsulation, so if you deduce the shaper settings from modem sync you need to derate the sync rates by 64/65 = 0.984615384615 (cake offers a ptm keyword to perform this derating automatically, but does so by adjusting the accounted packet size instead of simply adjusting the shaper gross rates. BUT for most users the sync will not be the relevant limit, but a shaper/policer at the ISP's end which enforces the contracted rates, which if functional will already have the 64/65 overhead accounted for.)
VDSL2 likely has lower overhead than 44, but the bandwidth sacrifice of specifying a slightly larger per-packet-overhead is small compared to the latency-under-load-increase possible if the per-packet-overhead is too small.
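To make the DSL framing arithmetic concrete, here is a minimal sketch of how per-packet overhead, `mpu`, ATM cell quantization, and the PTM 64/65 derating change what the shaper should account for. The function names are hypothetical and this is not cake's code. It assumes the packet length is counted at the IP layer, that `mpu` is applied after adding the overhead, and that ATM/AAL5 carries 48 payload bytes in every 53-byte cell.

```python
import math

ATM_CELL_PAYLOAD = 48   # payload bytes carried per ATM cell
ATM_CELL_SIZE = 53      # bytes on the wire per ATM cell

def accounted_bytes(ip_len, overhead, mpu=0, atm=False):
    """Bytes charged to the shaper for one packet under the given framing."""
    size = max(ip_len + overhead, mpu)
    if atm:
        # Every packet occupies a whole number of cells; the last is padded.
        size = math.ceil(size / ATM_CELL_PAYLOAD) * ATM_CELL_SIZE
    return size

def ptm_derated_rate(sync_rate):
    """Usable rate behind PTM's 64/65 encoding, given the modem sync rate."""
    return sync_rate * 64 / 65

print(accounted_bytes(1500, 44, atm=True))   # 1749: 33 cells for a full packet
print(accounted_bytes(40, 44, atm=True))     # 106: a bare ACK still costs 2 cells
print(accounted_bytes(40, 18, mpu=88))       # 88: small packet padded up to mpu
print(round(ptm_derated_rate(50_000), 1))    # 49230.8 from a 50000 kbit/s sync
```

The small-packet rows show why a flat percentage cannot stand in for correct framing on ATM links: a 40-byte ACK is charged 106 bytes while a 1500-byte packet is charged 1749, so the effective overhead depends heavily on the traffic mix.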

DOCSIS/cable**: "overhead 18 mpu 88"
Note: The real per-packet/per-slot overhead on a DOCSIS link is considerably higher, but the DOCSIS standard mandates that user access rates are shaped as if they had 18 bytes of per-packet overhead, so for us that is the relevant value.

Getting initial shaper settings: The quickest way to get reasonable starting values to configure the shaper is to simply run a few speedtests and try to get a feel for the reliably available speeds for down- and up-link, and then use these net goodput values (mostly measured as TCP/IP goodput) as gross shaper values for cake. Say you measured 100 arbitrary units; the respective gross rate on a DOCSIS link would be larger than or equal to:
100 / ((1500-20-20)/(1500+18)) = 103.97
This will give the shaper a 100-100*100/103.97 = 3.82% margin compared to the true bottleneck rate, which is an acceptable starting point*, which then should be confirmed by a few bufferbloat tests, either via the dslreports speedtest (for configuration see https://forum.openwrt.org/t/sqm-qos-rec ... sting/2803) or waveform's new test under https://www.waveform.com/tools/bufferbloat.
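The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same calculation with hypothetical names. It assumes full-size 1500-byte packets, 20 bytes each of IPv4 and TCP header without options, and the 18-byte DOCSIS per-packet overhead mentioned earlier.

```python
def gross_rate_from_goodput(goodput, mtu=1500, ip_tcp_headers=40, overhead=18):
    """Gross link rate implied by a measured TCP goodput on full-size packets."""
    payload = mtu - ip_tcp_headers   # TCP payload bytes per packet
    on_wire = mtu + overhead         # bytes the shaper accounts per packet
    return goodput * on_wire / payload

def margin_percent(shaper_rate, gross_rate):
    """How far below the estimated gross rate the shaper is set, in percent."""
    return 100.0 - 100.0 * shaper_rate / gross_rate

measured = 100.0                            # speedtest goodput, arbitrary units
gross = gross_rate_from_goodput(measured)
print(round(gross, 2))                      # 103.97
print(round(margin_percent(measured, gross), 2))   # 3.82
```

Plugging in other overhead values (for example 44 for the DSL cases) shows how the built-in margin changes with the link type when the measured goodput is used directly as the shaper rate.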

https://openwrt.org/docs/guide-user/net ... qm-details while tailored for OpenWrt's SQM version, contains a lot of background information and configuration advice for those willing to spend more time.


*) The recommended margin is 5-15% of the true bottleneck gross rate, typically a bit more for ingress/download and potentially a bit less for egress/upload, but 3.8% is close enough IFF one is willing to run a few tests to confirm that bufferbloat is sufficiently controlled; otherwise just take 95% of the speedtest result.
