 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

some quick comments on configuring cake

Sun Oct 10, 2021 3:29 pm

Hi, one of the contributors to cake here. I'm pleased y'all are finally shipping it, but I have a few comments:

* A modern version of cake has support for the new diffserv LE codepoint. I'd dearly like support for that in mikrotik given how problematic CS1 proved to be, and it's a teeny patch.

* One feature of cake is that it runs the same whether at line rate, or with the shaper enabled, so you can get per-host/per-flow fq, diffserv classification, etc. I'm very interested in learning of results when you try to run it or fq_codel at line rate, rather than shaped. fq_codel is the default on all interfaces, rather than pfifo_fast, in most linuxes today. I would really like it if people put it through a battery of flent rrul tests or heavy iperf, and took captures, and plotted rtts, particularly on the higher end mikrotik hw. It is most useful with working BQL in the device driver.

* https://help.mikrotik.com/docs/display/ROS/Queues is missing support for the gso-splitting option. When using the shaper component, below 1gbit, gro "super"packets are automatically split up back into packets (and then interleaved with other flows), when unshaped, or above 1gbit, they are not. If you've got the cpu, split up superpackets.

* If you are natting at the router, try the nat option. This does not work with some forms of offloaded nat.

* If you have major bandwidth asymmetry on a link (greater than 10:1), try the ack-filter option on the slower part of the link. It gets to be a hugely *necessary* idea at ratios higher than that, see: https://blog.cerowrt.org/post/ack_filtering/

* It would be nice if mikrotik had some way of polling and displaying statistics from fq_codel for backlog, reschedules, drops, and marks, and from cake for the same. Exposing these statistics to more users would drive understanding of the role of packet loss (and marking) in controlling network delay. tc supports json output, multiple tools can parse that. See the enthusiasm for collecting stats over in the starlink community... I would love to see at the very least, drop stats out of the mikrotik userbase.

* When shaping dsl especially, it's very important to get the link type "framing" right, but also useful on cablemodems to set the docsis parameter. You can get hard up against the actual configured cablemodem rate in particular in this way instead of wasting 5-15%, and in the dsl case it is *impossible* to get a consistent shaped rate unless you set it right, or at least, conservatively. I mean that. Impossible to get some forms of dsl right unless you compensate.

* If you aren't going to use diffserv, use cake besteffort, to save on memory and cpu. To save on cpu further, don't use the ack-filter or nat options.

* There are a bunch of per host/per flow fq options that are dependent on your use cases for regulating traffic between ip addresses or ports.

* Use wash on ingress when you don't trust the diffserv markings from upstream. This is a pretty heavy hammer, and it is preferred that y'all communicate with your customers about how you treat diffserv and let them optimize their own traffic, only remarking from 0 (best effort) to something else if you need to. There is a published guide to zoom traffic, among others. Wash on egress if you aren't following the relevant RFCs.

* Cake tries really hard to follow a bunch of mutually conflicting diffserv RFCs, and in an age where videoconferencing is very important the cake diffserv4 model is closer to how a wifi AP treats it. See https://www.w3.org/TR/webrtc-priority/ for this underused facility in webrtc.

* Despite saying all this about diffserv, it generally ranks dead last as an optimization technique versus better statistical multiplexing from FQ, and the short queues you get from an AQM.

I should stress that these are options and are optional, aside from getting shaped dsl compensation right, the cake defaults are pretty good.
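For readers who think in Linux tc terms rather than RouterOS menus, the options in the list above correspond to keywords in the tc-cake(8) man page. What follows is only a minimal sketch that assembles (and does not run) such a command line. The helper name cake_command and the interface name eth0 are illustrative assumptions, and RouterOS exposes the same knobs under its own property names.

```python
def cake_command(dev, bandwidth=None, diffserv="diffserv3", nat=False,
                 ack_filter=False, wash=False, split_gso=None):
    """Assemble (not execute) a `tc qdisc replace ... cake` command line.

    Hypothetical helper for illustration; keywords are from tc-cake(8).
    """
    parts = ["tc", "qdisc", "replace", "dev", dev, "root", "cake"]
    # Without a bandwidth, cake runs unshaped, at line rate.
    parts += ["bandwidth", bandwidth] if bandwidth else ["unlimited"]
    # "besteffort" skips diffserv tins and saves memory and cpu.
    parts.append(diffserv)
    if nat:
        parts.append("nat")          # look through NAT for per-host fairness
    if ack_filter:
        parts.append("ack-filter")   # for strongly asymmetric links
    if wash:
        parts.append("wash")         # clear diffserv markings
    if split_gso is not None:
        parts.append("split-gso" if split_gso else "no-split-gso")
    return " ".join(parts)


# Slow uplink of an asymmetric link, no diffserv in use:
print(cake_command("eth0", bandwidth="10Mbit",
                   diffserv="besteffort", ack_filter=True))
# tc qdisc replace dev eth0 root cake bandwidth 10Mbit besteffort ack-filter
```

Leaving every optional argument at its default yields a plain unshaped cake instance, in line with the point that the defaults are a reasonable starting place.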

Other notes:

* Telling your customers how they can have better wifi at home is useful also! In most cases the bufferbloat starts to shift to the home wifi at above 40mbit, and no matter all the contortions you've done here to manage your bandwidth to/from them better, everybody benefits from better home routers with sqm on the link and fq_codel on the wifi: https://blog.linuxplumbersconf.org/2016 ... y-3Nov.pdf

* The cake mailing list is the best place to ask questions or make feature requests: https://lists.bufferbloat.net/listinfo/cake - see also the archives there or on the related "Bloat" mailing list. Cake is the most advanced smart queue management (SQM) system we've been able to design as yet: https://www.bufferbloat.net/projects/ce ... anagement/ and whilst we initially targeted it at cpe and home gateways it is certainly proving useful in the middle of an ISP's network. We are very interested in feedback as to how to make it, or something like it, better for ISPs. One example (that I have NO idea how to make work on mikrotik) is here: https://github.com/rchac/LibreQoS

* There are multiple academic papers on how fq_codel and cake actually work. The best summary of most of the things we did to beat bufferbloat in linux is in the online book: https://bufferbloat-and-beyond.net/ - but feel free to hit google scholar for "bufferbloat", and the cobalt AQM.

* I'm really big on explaining the why (in addition to the how, above), at various levels, including entertaining ones like this:

https://blog.apnic.net/2020/01/22/buffe ... -over-yet/
Last edited by dtaht on Sun Oct 10, 2021 8:57 pm, edited 8 times in total.
 
Larsa
Member
Posts: 422
Joined: Sat Aug 29, 2015 7:40 pm

Re: some quick comments on configuring cake

Sun Oct 10, 2021 3:33 pm

Dave, thanks for very useful tips! Should be included as "best practice" in the ros documentation.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Oct 10, 2021 8:44 pm

To give an example of where I'd hoped to see fq_codel or cake make more of a dent in the mikrotik universe, consider a topology like this:

10Gbit -> 1GBit port A
-> 1Gbit port B
10 more ports

In ANY fast->slow rate transition fair queuing, and aqm, can soften the impact of that 10Gbit interface (or multiple 1Gbit interfaces) fed into 1Gbit Port A here, achieving near zero latency for sparse flows and ultimately 5ms or less for incoming traffic. It's a complete unknown how deep the buffers are on those 1Gbit ports throughout the world, (or in any stepdown) but I strongly suspect they are far deeper than 5ms, and few have anything other than a FIFO on them. This recent paper was good: https://arxiv.org/pdf/2109.11693.pdf

Some offload engines for switches have gained RED of late, but that's still finicky to configure. The bulk of the bufferbloat effort has been on fixing the last mile, but we are seeing signs of bloat deep within the ISP's network, also.
 
BitHaulers
newbie
Posts: 35
Joined: Thu Jun 21, 2018 11:23 am

Re: some quick comments on configuring cake

Tue Oct 12, 2021 6:05 am

Do you have any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS' implementation, or a bug, or me not understanding things.
 
WeWiNet
Long time Member
Posts: 586
Joined: Thu Sep 27, 2018 4:11 pm

Re: some quick comments on configuring cake

Tue Oct 12, 2021 10:53 am

Hi dtaht,

thanks for posting all this useful information. I already asked a bit in this direction in a separate post, but you really provide massive data (half of which I don't yet fully understand).
But it shows that cake is a complex tool, and it is worth learning more about how to use it.

Don't assume all Mikrotik aficionados are queue/cake experts. Please make it (if possible) simple so all can get the maximum benefit from your experience.

What exactly do you mean by:
> When shaping dsl especially, it's very important to get the link type "framing" right
This is one of my use cases where queuing is really, really important. Can you give a short example for, say, a link of 5M down and 800k up (or whatever you want to use)?

The other question, from BitHaulers:
> any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours?

is also very valid; I have the same problem on my side. LTE (and soon 5G, even worse) is a medium where the "pipe" itself changes heavily within 24h.
In this situation it is really hard to define the pipe size, and doing queueing with fixed values gets almost impossible.
What can CAKE do in this case?

Again thanks for the good data you provided.
 
gtj0
just joined
Posts: 15
Joined: Wed Sep 23, 2020 8:08 pm

Re: some quick comments on configuring cake

Tue Oct 12, 2021 5:59 pm

Thanks dtaht!
I wish we could upvote posts and threads. I'd do both.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Oct 13, 2021 7:08 pm

> Do you have any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours? The auto ingress doesn't always act as I'd expect it to, and I'm not sure if it's RouterOS' implementation, or a bug, or me not understanding things.
Don't use them? We get the "how can an end user make LTE generally usable and consistently low latency" question a lot. It's often worse than wifi. We've (bufferbloat.net) been after that entire industry for years now to do better queue management everywhere - the handsets are horrifically overbuffered, the enode-bs as well, the backhaul's both encrypted and underprovisioned...

And instead we get back all sorts of non-useful and actually extra-latency inducing things like "network slices", and other places where they've thoroughly shot themselves in the foot (like distributed cpus for the wireless connection) from a queue theory perspective and so on. One company is afraid to even look at the packet headers inside the encapsulation, so no fq or ecn is possible by their lawyer decreed policy. There's been a ton of good research published on how to make the queuing saner on 3/4/5g but I'm still not aware of any actual products. I am hoping that the next gen of cell phones from both apple and google get that more right (I just finished up a stint at apple, can't say more), but as for managing the downlink...

Cake's auto-ingress is somewhat suitable for rates that fluctuate slightly, but many/most LTE/5g systems fluctuate too much. We made cake easily and transparently reconfigurable, so with adequate stats from the hardware, or passive measurement of flows passing through it, some answers for managing inbound are more possible... but the right (I'm trying to avoid cursing here) answer was to fq and aqm the enode-bs, improve the backhauls, and stop trying to create for-pay services that don't work.

Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognise that non-5g tech - like most of mikrotiks market - dominates in the field).

A possible avenue for improving LTE inbound is leveraging Kathie Nichols' new queue estimator that's now in bbr, and the ebpf "pping" tool we're working on... but ENOFUNDING. If they spent a little less on the marketing and a little more on the tech - or opened up more binary blobs - we could make progress, rapidly.
 
Larsa
Member
Posts: 422
Joined: Sat Aug 29, 2015 7:40 pm

Re: some quick comments on configuring cake

Wed Oct 13, 2021 7:43 pm

> Various moral equivalents to FQ have long been in play in the rest of the "fixed wireless" market (which consists of a lot of telco folk talking to themselves, rather than recognise that non-5g tech - like most of mikrotiks market - dominates in the field).

Hear, hear! Much like 3GPP trying to reinvent "Internet" and related tech stacks using their own acronyms. ;-)
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

cake (or fq_codel) vs sfq

Thu Oct 14, 2021 4:40 am

I thought I'd write a brief note about SFQ vs CAKE. I think highly of SFQ. If I could go back in time to 2002, when it first arrived in linux, I'd have tried to make it the default, instead of a FIFO, given what I know now. It was *the* fundamental component in wondershaper. Nearly any place you have a FIFO today, SFQ would be better, so long as you have it properly sized.

It is still very possible to get a good result with SFQ at higher rates, if you increase the packet limit, and if you have a good mix of flows, increase the number of flows. However therein lies the rub - if you increase the packet limit, you end up with 100s of ms of bufferbloat - if you don't increase the packet limit you won't be able to achieve full bandwidth at high rates - and setting a per packet limit is not as good as setting a byte limit in an age where a packet can range in size from 64 bytes to 64k bytes.
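The packet-limit trade-off described above is easy to put numbers on. A rough sketch, assuming a simple FIFO drain model; the function name is made up for illustration, and the 127-packet figure is used only as an example limit (it is, as far as I recall, the Linux sfq default, so treat that as an assumption).

```python
def queue_delay_ms(packets, packet_bytes, rate_bps):
    """Worst-case time to drain `packets` queued packets at `rate_bps`."""
    return packets * packet_bytes * 8 * 1000.0 / rate_bps


RATE = 10e6  # 10 Mbit/s bottleneck, chosen for illustration

# A full queue of MTU-sized packets versus one of minimum-size packets:
print(round(queue_delay_ms(127, 1500, RATE), 1))   # 152.4 ms
print(round(queue_delay_ms(127, 64, RATE), 1))     # 6.5 ms

# Raising the packet limit tenfold to sustain a faster link:
print(round(queue_delay_ms(1270, 1500, RATE), 1))  # 1524.0 ms
```

The same packet count spans delays more than twenty times apart depending on packet size, which is why a byte limit (or, better, a time-based AQM target) describes the queue more honestly than a packet limit.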

DRR is an option that can work better than SFQ. That said, if all you have is SFQ, USE IT. Anything that breaks up bursts is good.

So... 4 improvements that came from fq_codel over SFQ.

0) It does better FQ for "sparse" flows than SFQ
1) You don't need to set the queue length, the AQM attempts to hold latencies to 5ms
2) The default number of flows is 1024, which seems to be "enough"
3) fq_codel drops from the head, not the tail of the queue, signaling congestion earlier, and avoiding bursty tail loss

What follows are two "rrul" plots, taken from the flent.org tool we use heavily in the bufferbloat project and highly recommend over, for example, web based benchmarking tools.

They test - simultaneously - 4 tcp upload streams, 4 tcp download streams, 4 measurement streams (both udp and icmp) for 1 minute, by default, and both of these are *good* results. This particular test was against a cablemodem provisioned for 100Mbit down, 10Mbit up. I'll show what a bad result looks like at the end.
[Attached plot: sfq-spectrum-hapac2 (1).png]
Take a look at the third panel on the bottom on both these plots. That's fq_codel's DRR++ derived scheduler, taking the measurement flows and putting them in the front of the queue. SFQ and DRR put new flows at the tail of the FQ queues - so if you have 32 flows, a new flow's arrival will end up at the 33rd queue (which is still WAY better than a FIFO), and be served in turn. A variant of SFQ, called SQF, noticed that it was possible to take a new arrival and serve that first, and thus newer flows - of all sorts, not just voip, dns, tcp syn/syn ack - got a little boost and lower latency than fatter flows. The DRR based design of the fq_codel scheduler on that third panel shows that with the 12 flows going, at this rate, we are saving 10ms on sparser packets - packets that have an arrival rate less than the total time it takes to serve all the other queue-making flows.
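The "new flows go first" behaviour can be sketched in a few lines. This is a simplified illustration of the two-list DRR++ scheduling as described in RFC 8290, not the kernel code: there is no CoDel, no hashing, and flows are keyed by a plain name. The class names and the quantum value are assumptions chosen for the example.

```python
from collections import deque

QUANTUM = 1514  # bytes of credit per round; assumed, one full Ethernet frame


class Flow:
    def __init__(self, name):
        self.name = name
        self.packets = deque()  # queued packet sizes in bytes
        self.deficit = 0
        self.listed = False     # currently on the new or old list?


class FqSketch:
    def __init__(self):
        self.flows = {}
        self.new_flows = deque()  # flows that just became active
        self.old_flows = deque()  # flows that have used up a quantum

    def enqueue(self, name, size):
        flow = self.flows.setdefault(name, Flow(name))
        flow.packets.append(size)
        if not flow.listed:
            # A flow with no backlog is "sparse": it joins the new list.
            flow.deficit = QUANTUM
            flow.listed = True
            self.new_flows.append(flow)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            from_new = bool(self.new_flows)
            lst = self.new_flows if from_new else self.old_flows
            flow = lst[0]
            if flow.deficit <= 0:
                # Credit spent: refill and go to the back of the old list.
                flow.deficit += QUANTUM
                lst.popleft()
                self.old_flows.append(flow)
                continue
            if not flow.packets:
                lst.popleft()
                if from_new and self.old_flows:
                    # Park an emptied new flow on the old list for one
                    # round so it cannot keep re-entering as "new".
                    self.old_flows.append(flow)
                else:
                    flow.listed = False
                continue
            size = flow.packets.popleft()
            flow.deficit -= size
            return flow.name, size
        return None


fq = FqSketch()
for _ in range(3):              # two bulk flows with a standing backlog
    fq.enqueue("A", 1514)
    fq.enqueue("B", 1514)
order = [fq.dequeue(), fq.dequeue()]
fq.enqueue("dns", 100)          # a sparse flow arrives late
order += [fq.dequeue(), fq.dequeue()]
print(order)
# [('A', 1514), ('B', 1514), ('dns', 100), ('A', 1514)]
```

The late, small "dns" packet is served ahead of the backlog that A and B still hold, which is the effect visible in the latency panel of the plots.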
[Attached plot: cake-spectrum-hapac2 (1).png]
Now as to why the bandwidths seem a bit different - the tcp flows in the SFQ case are more jittery than the cake one because they hit the end of the SFQ's fifo, have one or more tail drops, and then have to recover more data than they do with the codel AQM. It turns out we deliver slightly more data in both directions in this test case.

Lastly, what does a bad result look like? Well, this is the basic behavior of a typical (spectrum) cable modem today. The latencies under load grow so bad, that it chokes the upstream flows enormously, and your voice call, well... do you like shouting 600 ft across the room to be heard? Or clicking on a web page and waiting 2 seconds for the first byte?
[Attached plot: baseline-spectrum-hapac2.png]
Best practice for fq_codel: At shaped rates below 4Mbit, you need to scale the target to the time it takes for 1 MTU to egress. At 1500 MTU, 1Mbit, 15ms. It generally pays to use a quantum of 300 below 100Mbit.

Cake autoscales these two parameters.
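The scaling rule for the fq_codel target can be worked through as plain arithmetic. A minimal sketch, assuming the target should never drop below the usual 5 ms and should otherwise track the serialization time of one MTU; the function names are illustrative. Note that raw serialization of 1500 bytes at 1 Mbit is 12 ms, so the 15 ms quoted above presumably includes some headroom for framing overhead (that reading is my assumption, not something the post states).

```python
def mtu_egress_ms(mtu_bytes, rate_bps):
    """Serialization time of one MTU-sized packet, in milliseconds."""
    return mtu_bytes * 8 * 1000.0 / rate_bps


def suggested_target_ms(rate_bps, mtu_bytes=1500, default_ms=5.0):
    """Keep the default target unless a single MTU takes longer than it."""
    return max(default_ms, mtu_egress_ms(mtu_bytes, rate_bps))


for rate in (1e6, 2e6, 4e6, 100e6):
    print(int(rate), suggested_target_ms(rate))
# 1000000 12.0
# 2000000 6.0
# 4000000 5.0
# 100000000 5.0
```

Below a few Mbit/s a 5 ms target is shorter than a single packet's transmission time, so the AQM would drop on almost every packet unless the target is raised.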

My thanks to Jordan Szuch for testing this release candidate of mikrotik on the hapac2 and providing these plots and comfort, that cake and fq_codel were actually working correctly here. SHIP IT.

(really looking forward to more testing and testers)
Last edited by dtaht on Thu Oct 14, 2021 5:38 pm, edited 4 times in total.
 
moeller0
just joined
Posts: 1
Joined: Thu Oct 14, 2021 9:43 am

Re: some quick comments on configuring cake

Thu Oct 14, 2021 10:15 am

Just a few notes on configuring cake's overhead keywords (if in a hurry, just read the quoted snippets):

DSL:
ADSL* and max rate <= 25/5 Mbps: "overhead 44 atm"
Note: actually anything using ATM/AAL5, which nowadays for access links should be only ADSL, ADSL2, ADSL2+. Theoretically VDSL2 also allows ATM/AAL5, but I have seen no evidence yet that this configuration exists in the real world. Note 44 bytes is a realistic "bad case" encapsulation overhead seen in the wild; theoretically larger overhead seems possible, albeit very unlikely. To dig deeper into ADSL overhead, curious minds can have a look at https://github.com/moeller0/ATM_overhead_detector.
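Why ATM framing matters so much can be shown with the cell arithmetic: every packet, plus its encapsulation overhead, is carried in 48-byte cell payloads, each cell costing 53 bytes on the wire, and the last cell is padded. A small sketch of that accounting; the function name is illustrative and 44 is simply the overhead value suggested above.

```python
import math


def atm_wire_bytes(ip_packet_bytes, overhead=44):
    """Bytes on an ATM/AAL5 link for one packet: whole 53-byte cells,
    each carrying at most 48 bytes of (packet + overhead)."""
    cells = math.ceil((ip_packet_bytes + overhead) / 48)
    return cells * 53


print(atm_wire_bytes(1500))  # 1749 (33 cells)
print(atm_wire_bytes(40))    # 106 (2 cells, for a bare 40-byte TCP ACK)
```

A 40-byte ACK occupying 106 bytes on the wire is the case that makes a shaper without the atm compensation misjudge the link badly, especially with many small packets in flight.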

VDSL2**: "overhead 44 mpu 88"
Note: Actually PTM carrier instead of ATM/AAL5; this can actually be used on ADSL links as well, and as far as I know some ISPs actually use that.
Also note that PTM uses a 64/65 encapsulation, so if you deduce the shaper settings from modem sync you need to derate the sync rates by 64/65 = 0.984615384615 (cake offers a ptm keyword to perform this derating automatically, but does so by adjusting the accounted packet size instead of simply adjusting the shaper gross rates. BUT for most users the sync will not be the relevant limit, but a shaper/policer at the ISP's end which enforces the contracted rates which, if functional, will already have the 64/65 overhead accounted for.)
VDSL2 likely has lower overhead than 44, but the bandwidth sacrifice of specifying a slightly larger per-packet-overhead is small compared to the latency-under-load-increase possible if the per-packet-overhead is too small.
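The two VDSL2 adjustments above (derating a sync rate by 64/65, and the "overhead 44 mpu 88" accounting) can be sketched as follows. The function names and the 100000 kbit/s sync rate are made-up examples; the mpu handling follows the tc-cake(8) description of rounding a packet, including overhead, up to a minimum size.

```python
def ptm_derated_rate(sync_rate):
    """Usable gross rate on a PTM link after 64/65 encoding."""
    return sync_rate * 64 / 65


def accounted_bytes(ip_packet_bytes, overhead=44, mpu=88):
    """Size the shaper charges for a packet: payload plus per-packet
    overhead, but never less than the minimum packet unit."""
    return max(ip_packet_bytes + overhead, mpu)


print(round(ptm_derated_rate(100000), 1))  # 98461.5 (kbit/s)
print(accounted_bytes(1500))               # 1544
print(accounted_bytes(40))                 # 88 (40 + 44 = 84, raised to mpu)
```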

DOCSIS/cable**: "overhead 18 mpu 88"
Note: The real per-packet/per-slot overhead on a DOCSIS link is considerably higher, but the DOCSIS standard mandates that user access rates are shaped as if they had 18 bytes of per-packet overhead, so for us that is the relevant value.

Getting initial shaper settings: The quickest way to get reasonable starting values to configure the shaper is to simply run a few speedtests and try to get a feel for the reliably available speeds for down- and up-link and then use these net goodput values (mostly measured as TCP/IP goodput) as gross shaper values for cake. Say you measured 100 arbitrary units, the respective gross rate on a DOCSIS link would be greater than or equal to:
100 / ((1500-20-20)/(1500+18)) = 103.97
This will give the shaper a 100-100*100/103.97 = 3.82% margin compared to the true bottleneck rate, which is an acceptable starting point*, which then should be confirmed by a few bufferbloat tests, either via the dslreports speedtest (for configuration see https://forum.openwrt.org/t/sqm-qos-rec ... sting/2803) or waveform's new test under https://www.waveform.com/tools/bufferbloat.
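The calculation above generalizes to other MTUs and overheads. A short sketch that reproduces the two numbers in the post, assuming IPv4 and TCP headers of 20 bytes each without options; the function names are illustrative.

```python
def gross_rate_from_goodput(goodput, mtu=1500, ip_hdr=20, tcp_hdr=20,
                            overhead=18):
    """Lower bound on the gross link rate implied by a TCP goodput figure."""
    payload_fraction = (mtu - ip_hdr - tcp_hdr) / (mtu + overhead)
    return goodput / payload_fraction


def margin_percent(goodput, gross):
    """Headroom left when the goodput number is used as the shaper rate."""
    return 100 - 100 * goodput / gross


gross = gross_rate_from_goodput(100)
print(round(gross, 2))                       # 103.97
print(round(margin_percent(100, gross), 2))  # 3.82
```

Plugging in overhead=44 instead gives the corresponding figure for the DSL settings listed earlier.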

https://openwrt.org/docs/guide-user/net ... qm-details while tailored for OpenWrt's SQM version, contains a lot of background information and configuration advice for those willing to spend more time.


*) The recommended margin is 5-15% of the true bottleneck gross rate, typically a bit more for ingress/download and potentially a bit less for egress/upload, but 3.8% is close enough IFF one is willing to run a few tests to confirm that bufferbloat is sufficiently controlled; otherwise just take 95% of the speedtest result.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Cake's bandwidth parameter

Thu Oct 21, 2021 5:22 pm

We try to stress that the default options for cake (essentially just the bandwidth parameter) are good enough for most purposes.

That said, there are two important differences between how cake's bandwidth shaper works vis a vis htb that are useful to highlight.

Token bucket designs date back to the 70s as an easy to implement in hardware method of doing rate control. Linux HTB along the way (2006) gained the ability to compensate for dsl as cake does, but I don't know if it's configurable in mikrotik's api. Also, our thinking is flavored by the CPE -> perspective, rather than the ISP -> down, and my hope is in working with more active ISPs trying to shape their down more directly we'll find ideas worth implementing moving forward.

The more important difference between htb and cake's shaper is that a token bucket is naturally bursty. If a link has lain idle for a while, enough tokens accumulate (the htb quantum and burst parameters) that a line rate burst will pass through htb until the burst parameter is exceeded.

This means that that burst ends up accumulating in the device buffers and invokes jitter. The deficit based shaper in cake never bursts, but does need a cpu that can context switch rapidly enough to ensure a smoother delivery of packets. You can typically run cake hard up against a htb shaper, configured at the same rate, and have cake almost always win. And you can typically configure htb with a higher burst and quantum parameter to have it use less cpu and still more or less effectively shape the connection - but it too starts getting wildly variable as you tweak those parameters to save on cpu to be able to run at higher rates.
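The burstiness of a token bucket after an idle period can be illustrated with a toy model. This is a deliberately simplified sketch, not htb's actual accounting: tokens are counted in bytes, the burst is assumed to leave at line rate all at once, and the 64 KiB burst size and 10 Mbit/s rate are hypothetical values picked for the example.

```python
def tokens_after_idle(tokens, rate_bps, burst_bytes, idle_s):
    """Tokens (in bytes) available after the link sat idle for `idle_s`."""
    refill = rate_bps / 8 * idle_s
    return min(burst_bytes, tokens + refill)


def burst_queue_delay_ms(burst_bytes, bottleneck_bps):
    """Time the downstream bottleneck needs to drain a line-rate burst."""
    return burst_bytes * 8 * 1000.0 / bottleneck_bps


RATE = 10e6      # shaped rate, matching the real bottleneck
BURST = 65536    # hypothetical burst allowance in bytes

available = tokens_after_idle(0, RATE, BURST, idle_s=1.0)
print(available)                                        # 65536
print(round(burst_queue_delay_ms(available, RATE), 1))  # 52.4
```

One second of idleness is enough to refill the whole allowance, and releasing it at line rate parks roughly 52 ms worth of data in the device buffer past the shaper. A shaper that paces every packet at the configured rate has no such allowance to release.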

One thing that we've failed to call out enough is doing things like saying "if you have enough cpu". How we think about that over here is a bit different from how others think about it, in that what matters is not clock rate, or straight line instructions per second, but how fast the cpu can context switch. It's often the case that a heavily pipelined cpu cannot context switch as fast as one that isn't.

Running out of cpu when shaping using either method is a PITA. Per-cpu locking is also a problem. You might peg one cpu at 100% and leave the others idle. The linux community has worked very hard to remove a bunch of locks over the years, but at the moment the most progress is being made via ebpf assistance, as in libreqos and preseem. YMMV.

I see that a common means of testing mikrotik is with X tc filters (seemingly 25). Cake can work with those also, but the hope was that fewer tc rules would be needed with cake as a base, and some of the cpu lost to using cake, or even all of it, recovered that way. In general we try to encourage folk to drop all their preconceptions about shaping, multiple tiers of service, and so on, and delete everything they are doing special, try cake bandwidth X, and then measure their results. I'd like a look at an ISP's typical tc rule set to see how tc is being used today.

As for multiple tiers of service - a common configuration is three tiers of htb -> SFQ, SFQ, SFQ. I've seen 6, 9, even as many as 20, and the thing commonly missed by assembling the qdiscs this way is that every separate qdisc you add has a packet limit, and each one adds to your worst case delay. You can typically drop in htb + fq_codel in those configurations and keep your worst case delay bounded better via the aqm, or apply cake, which has 3 or 4 tiers of service internally.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Oct 23, 2021 10:49 pm

Some poetry and analysis from Jim Gettys: https://gettys.wordpress.com/2018/02/11 ... -elephant/
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Oct 31, 2021 12:12 pm

This patch makes cake work better with a locally terminated VPN: https://lists.bufferbloat.net/pipermail ... 05257.html
 
DanielJB
Frequent Visitor
Posts: 71
Joined: Mon May 27, 2013 3:05 pm

Re: some quick comments on configuring cake

Fri Nov 12, 2021 9:10 am

Hi Dave (dtaht),

Firstly, it's a testament to Mikrotik to see key developers such as yourself posting in the forums; secondly, your fine work on CAKE and related has made a global contribution to virtually everyone using the internet, so hats off!

> A modern version of cake has support for the new diffserv LE codepoint. I'd dearly like support for that in mikrotik given how problematic CS1 proved to be, and it's a teeny patch
+1! Would be great if you could submit a request at https://help.mikrotik.com so it is formalised.

> It would be nice if mikrotik had some way of polling and displaying statistics from fq_codel
I agree. As a first step, Mikrotik could fix the queue counters. For example, after enabling CAKE for all WiFi outbound queues on RouterOS 7 (/queue type set wireless-default cake-diffserv=diffserv4 cake-flowmode=dual-dsthost kind=cake), we always see:
/queue monitor
queued-packets: 0
queued-bytes: 0

As of RouterOS 7.1rc5, 'cake-bandwidth' is still a required parameter for LTE interfaces - do you agree there is still benefit using CAKE AQM without bandwidth limits? Mikrotik may be unaware of the opportunity.

Thanks,
Dan
- Daniel J Blueman
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Nov 13, 2021 12:34 am

Good catch. The bandwidth parameter should be optional for cake.

As for whether or not you can run an LTE interface at line rate wisely, the state of most of the linux drivers for that were terribly overbuffered, so the amount of backpressure you got was very late. I hope that something like AQL or BQL land for LTE interfaces, and there's some promising work towards actively sensing LTE bandwidth going on over here in the openwrt universe: https://forum.openwrt.org/t/cake-w-adap ... 108848/482

Similarly, the fq_codel for wifi stuff was only supported for 5 chipsets, any place where they can use that, instead of a shaper, is a win. ( https://lwn.net/Articles/705884/ ) . As for slamming a shaper like cake in front of it, my understanding of mikrotik's market is it's mostly ISPs, and that ISPs sell tiers of service, and in that case a cake would be good.

As for the noqueue, offloads tend to suck up all the packets so you don't see them. If their queue command could be improved to use the tc -s qdisc statistics - and show loss, backlog, and ecn statistics, that would be nice. The wifi statistics for same are buried under /sys/kernel/debug/iee*/phy*/aqm and a few other aqm files per station further below there.

Some of my personal backstory is that I was a WISP operator in Nicaragua, and I'd upgraded my backbone to wireless-n, only to have it fail (up to 30 seconds of latency) in rain, which in Nicaragua is 2+ months long. So my motivation early on in the bufferbloat effort was to fix fixed wireless, and then go back to my mountaintop there, surf and swim and so on, as soon as the fixes went into linux mainline. My attempts to retire keep being thwarted, the last time I tried to hang it up, Nicaragua had had a near-revolution, and it seemed simpler and safer to just go about fixing the whole internet for everyone... and to try and get on top of new deployments like this one to make sure they get it right...

I miss that mountain a lot, sometimes. Seeing comcast get it right was a high ( https://arxiv.org/abs/2107.13968 ), seeing starlink get it wrong ( https://www.youtube.com/watch?v=c9gLo6Xrwgw ), wasn't. And I can no longer afford to retire. But it looks like mikrotik is well on their way to getting it right, which is a high.
Last edited by dtaht on Sun Nov 14, 2021 8:03 pm, edited 1 time in total.
 
mducharme
Trainer
Posts: 1740
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: some quick comments on configuring cake

Sat Nov 13, 2021 12:40 am

I have a question about priority QoS in cake. Our customers' IP packets are encapsulated in PPPoE frames, which are then encapsulated in ethernet frames (VPLS tunnel), which then have two MPLS labels placed on them, which then have a VLAN header attached as the outermost layer. Is cake capable of reading the DSCP from the IP packet with all of those layers of encapsulation?

Currently the VLAN header on the outside has the proper VLAN priority set for the priority that we want the packet to be treated, but I don't think cake can read VLAN priority (PCP)?

Some background is, this is a WISP situation where the backhaul link is carrying about 90% VPLS traffic with MPLS labels (traffic for around 400 customers), and the VLAN tag has the priority set to the priority that I want the packet to be treated. We have eight different priority classes, depending on customer class (retail vs enterprise) and type of traffic, and use HTB to put the packet into the correct queue based on the VLAN priority over this backhaul. Currently we find only fifo and red useful for this on RouterOS v6, but with both we start to hit a limit at around 850Mbps of the 1Gbps backhaul link where it starts to drop retail packets even though it hasn't been maxed out. I'm hoping that maybe in RouterOS v7 one of codel or fq_codel or cake would work for this to achieve close to the maximum 1Gbps.

I tried using sfq on RouterOS v6 for this backhaul but performance substantially decreased, likely because the sfq handler got confused by the MPLS labels and put all MPLS traffic (90% of the current traffic) into the same stream.

I haven't found a lot of info online about people queuing packets with MPLS labels with codel/fq_codel/cake
 
Larsa
Member
Posts: 422
Joined: Sat Aug 29, 2015 7:40 pm

Re: some quick comments on configuring cake

Sun Nov 14, 2021 3:03 pm

> As for whether or not you can run an LTE interface at line rate wisely, the state of most of the linux drivers for that were terribly overbuffered, so the amount of backpressure you got was very late. I hope that something like AQL or BQL land for LTE interfaces, and there's some promising work towards actively sensing LTE bandwidth going on over here in the openwrt universe: https://forum.openwrt.org/t/cake-w-adap ... 108848/482

From what I gather they perform testing using shell scripts and icmp (ping). Is there a more robust method that has that logic built into the device driver itself? I'm very keen to make this work on all types of wireless technologies that suffer from a high degree of fluctuations in throughput, and in my particular case especially for LTE and its friends.

EDIT:
Both IEEE 802.11 and LTE (RAN) have plenty of real time performance indicators that should fit fine for tuning purposes.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Nov 14, 2021 7:45 pm

re - mpls. I have no idea if the linux flow dissector is good enough to get that far into the packet to do any good there. (I can look). It can cope with PPPoE. If it can't find "flows", you would end up with a single queue, no matter how varied your traffic was, which you could see (in fq_codel or sfq) with tc -s class show. Since there is seemingly no way to get at those statistics in mikrotik, you could instead try to hit it with a bunch of packets from different flows and see if they come out the other side in the same order.

In such a case where fq proves impossible, I might try the pie or codel AQM by themselves to keep queue sizes down. You can certainly use cake in this way as well (flowblind option), and possibly get some differentiation of service via diffserv, but it would be cheaper cpuwise to try htb + aqm.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Nov 14, 2021 7:50 pm

One of the things discussed on that openwrt thread was using a tcptrace-like tool, and elsewhere, deeply inspecting tcp rtt inflation with ebpf and one of Kathie Nichols' innovations, pping.

Some info here: https://lists.bufferbloat.net/pipermail ... 15772.html

However, MikroTik is far, far behind the curve on ebpf support. The tcptrace-like tool I call wtbb, but it's not under heavy development, lacking funding. I do regard lte's terrible, terrible queuing problems as a high priority, but apparently few in the 5g world claiming low latency are actually investigating or fixing queuing delay via any means. And I hate working for free, and would rather fix wifi.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Nov 14, 2021 7:55 pm

in direct answer to your question, I don't know of any linux mainline device drivers that do anything clever with lte, like bql or aql. Most of these drivers are out of tree, and I do hope somewhere in some OS, for android or for ios, there's intelligent life down there. One of these days someone will do a study of actual queuing latency in common cellphones.... apple has a new tool (the command line version is called networkQuality, the ios version is under developer settings).

What I have long done is measure the worst case latency under load for my lte connection on my boat, and shape cake to that. I don't care about bandwidth, I care that my videoconferences work well. I experimented with a string of podcasts, gradually decaying my cake parameters to see what it really looked like to end users, with predictably awful results in the last couple I did. The next string of podcasts will have something like that openwrt script on them.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Nov 14, 2021 7:58 pm

And re-re-reading this question (wow, did my eyes glaze over), cake pays no attention to vlan priorities. It can, with a tc rule, assuming it's a modern enough cake. Asking your question of the cake mailing list might get you somewhere...
 
Amm0
Long time Member
Posts: 611
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: some quick comments on configuring cake

Fri Nov 19, 2021 6:33 pm

I noticed one of the RTT schemes is "satellite"... We sometimes use high speed, but high latency (500ms) GEO point-to-point IP links (10-100Mb/s SCPC)... Historically "TCP acceleration" is the approach to deal with these reliable but high RTT links for normal "web traffic" (i.e. some variant of "split TCP" using pepsal/SCPS-TS, sometimes using Hybla[-like] CC). In our case, the sat link has a fixed RTT and fixed/known, non-shared bandwidth – which is why I think CAKE may be of some use. Since we typically route sat links into Mikrotik ROS, CAKE would be easy to apply in v7.1.

But I'm curious on your thoughts if "TCP acceleration" is even needed if a CAKE queue is used on either end of [a high RTT, high BW] bridged L2 satellite link?

Since the TCP CC algorithm/config employed by actual clients can dramatically affect TCP performance with high RTT, it's not that easy to simulate in a lab (e.g. Apple's TCP stack responds differently than Linux, same for Windows, etc., and then also differently across those OS versions since TCP CC flavors change). Thus I'm curious what your experience is with CAKE in satellite use cases.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Cake to GEO

Sat Nov 20, 2021 5:09 am

If you have a correct estimate of RTT across the satellite link, use rtt that_number + 60ms. Definitely do not use the default rtt estimate (100ms) here as it will not fill the link. "satellite" is a SWAG.
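
As a rough illustration of that rule of thumb, here is a small Python sketch. It is not part of cake or RouterOS: the function names are made up for this example, and the 500 ms RTT and 100 Mbit/s rate are simply the figures from the question above. It adds the 60 ms margin to a measured path RTT, prints it in the form of the cake-rtt setting used elsewhere in this thread, and shows how much data has to be in flight to fill such a link.

```python
def cake_rtt_ms(measured_path_rtt_ms: float, margin_ms: float = 60.0) -> float:
    """Return the rtt value to give cake: measured path RTT plus a margin.

    The 60 ms default margin is the rule of thumb from the post above.
    """
    if measured_path_rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    return measured_path_rtt_ms + margin_ms


def bdp_bytes(rate_bps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return rate_bps * (rtt_ms / 1000.0) / 8.0


# Hypothetical GEO link: fixed 500 ms RTT, 100 Mbit/s.
rtt = cake_rtt_ms(500)
print(f"cake-rtt={rtt:.0f}ms")
print(f"data in flight to fill the link: {bdp_bytes(100e6, 500) / 1e6:.2f} MB")
```

With several megabytes needed in flight, an AQM tuned for a 100 ms path would start signalling congestion long before a flow has ramped up, which is one way to read "it will not fill the link".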

cake supports RFC3168 - style ecn - if you enable that on your endpoints you can do congestion control losslessly. Win. The FQ portion will keep lower rate request/response and voip protocols separate from the AQM, and (nearly) never drop those.

https://www.bufferbloat.net/projects/ce ... nable_ECN/ [1]

There are a bunch of other ways to go with a "tcp accelerator" depending on your topology. If you are using a tcp proxy, enabling ecn on those endpoints will control the amount of data in flight. Using a delay sensitive tcp, also.

I would like very much a flent "rrul" test from an actual real-world satellite link, with and/or without a proxy. I have plenty from starlink, nothing from GEO, would love to emulate the other new constellations coming up. some packet captures too!

[1] Apple has made it more difficult to use ECN of late. The additional sysctl required to re-enable ecn negotiation always is

sudo sysctl -w net.inet.tcp.disable_tcp_heuristics=1

See also:
https://github.com/apple-opensource/xnu ... che.c#L164

This disables mptcp and tfo also.

Your core question "are proxies even needed", I didn't answer. Please go measure.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 10, 2021 6:46 am

Good evening, and thank you Dave for your many years of work along with the rest of your team combating bufferbloat! I have been following along for many years, and still feel like I know so little.

I am so glad to finally have fq_codel and cake in Mikrotik! Previously I had run an OpenBSD router at home for many years and it was great, but I have been running Mikrotik for a few years now. Anyhow, on to my testing..

## INFO
## Mikrotik CCR-1009
## RouterOS 7.1 Stable
## AT&T VDSL2 100/20
## San Antonio, Tx.
## Results of a ping to test server with unloaded pipe for reference:
--- dallas.starlink.taht.net ping statistics ---
27 packets transmitted, 27 received, 0% packet loss, time 26035ms
rtt min/avg/max/mdev = 28.288/28.699/29.860/0.314 ms




#### Test 1 - No queue
taht-ccr1009.png



#### Test 2 - CAKE defaults
name="cake-default" kind=cake cake-bandwidth=0bps cake-overhead=0 cake-overhead-scheme="" cake-rtt=100ms
cake-diffserv=diffserv3 cake-flowmode=triple-isolate cake-nat=no cake-wash=no cake-ack-filter=none
taht-ccr1009-2.png



#### Test 3 - Cake with NAT on download/upload and ACK filter on upload
name="cake-up" kind=cake cake-bandwidth=0bps cake-overhead=0 cake-overhead-scheme="" cake-rtt=100ms
cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=0bps cake-overhead=0 cake-overhead-scheme="" cake-rtt=100ms
cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=none
taht-ccr1009-3.png



#### Test 4 - Adding bridged ptm
name="cake-up" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-atm=ptm cake-overhead-scheme=bridged-ptm
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-atm=ptm cake-overhead-scheme=bridged-ptm
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=none
taht-ccr1009-4.png



#### Test 5 - Adding wash on download
name="cake-up" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-atm=ptm cake-overhead-scheme=bridged-ptm
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-atm=ptm cake-overhead-scheme=bridged-ptm
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=yes cake-ack-filter=none
taht-ccr1009-5.png



#### Test 6 - Remove bridged ptm, and set overhead to 22 (same as bridged ptm) and also add MPU 44 (would not let me save that with bridged ptm selected)
name="cake-up" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=yes cake-ack-filter=none
taht-ccr1009-6.png



Admittedly, AT&T is doing a pretty darn good job as of late. Bufferbloat used to be much worse with this same setup; I know they have pushed out several firmware updates over the years to this modem. It is especially head and shoulders better than my old cable modem with Spectrum. I was lucky to fight through that horror to finally learn that it had a Puma 6 chipset, which was known after a period of time to actually introduce latency to varying degrees at random! *GRR*

Anyhow, I can't leave well enough alone, and why leave my buckets up to them to control, so here I am! Also, I have noticed there has not been much testing that I could find, so I figured I would help! Next week, I am supposed to get my new 5009 router, so that will free up this one for more lab style testing. I have a CRS309 (10gb switch) and a CRS326 (1gb with 10gb uplinks) here, so even though it would be local, maybe I can help by doing some of the testing you mentioned, like 10gb -> 1gb, etc.

Let me know how I can help, and I look forward to your feedback on my results. P.S. - thank you in advance for letting me use your server ;) I had done some testing against mine in Dallas, but was worried that it didn't have enough CPU to generate the traffic needed?

One more edit... here are my results from waveform's test after the last config:
https://www.waveform.com/tools/bufferbl ... 99892a2b21
Last edited by kevinb361 on Fri Dec 10, 2021 7:13 am, edited 5 times in total.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Dec 11, 2021 9:33 pm

Thx so much for testing. I have a low standard right now... "does it crash?", so far, so good.

Your first result, sans cake, was really quite good, and indicates your AT&T link has only about 20ms of buffering in it, or so. Believe it or not, that's actually "underbuffered" by prior standards, and makes it harder for a single flow to sustain full rate. But: a little underbuffering is totally fine by me, and I don't care all that much if a single flow is unable to achieve full rate, I'd rather have low latency.

It's easier to determine the buffer depth via a single upload test like this:

flent -x --step-size=.05 --socket-stats -t the_options_you_are_testing --te=upload_streams=1 -H the_closest_server tcp_nup

Use the gui to print the "tcp_rtt" stats. If you use the -t option to name your different runs, you can also do comparison plots via "add other data files" in flent-gui.

There are servers in Atlanta and in Fremont, California, if either of those would be closer for you.
Last edited by dtaht on Sat Dec 11, 2021 10:05 pm, edited 2 times in total.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Dec 11, 2021 9:37 pm

OK, ok, I gave in, in order to do science, could you also try a tcp_nup with upload_streams=4? and =16?
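
To keep the runs comparable, the three stream counts can be scripted. The following Python sketch only assembles the command lines, using the flent flags already quoted in this thread; the server name is the same placeholder as above, and the naming scheme for the -t title is a hypothetical convention, not something flent requires.

```python
import shlex


def flent_tcp_nup_cmd(server: str, title: str, upload_streams: int) -> list[str]:
    """Build the argv for one tcp_nup run with the flags quoted in the thread."""
    return [
        "flent", "-x", "--step-size=.05", "--socket-stats",
        "-t", title,                              # label for later comparison plots
        f"--te=upload_streams={upload_streams}",  # number of parallel uploads
        "-H", server,
        "tcp_nup",
    ]


server = "the_closest_server"  # placeholder, as in the post above
for n in (1, 4, 16):
    title = f"no_queue_{n}up"  # hypothetical naming scheme for the -t option
    # Print the command; pass the list to subprocess.run() to execute it.
    print(shlex.join(flent_tcp_nup_cmd(server, title, n)))
```

Encoding the tested configuration in the title makes it easier to tell the runs apart when loading them into flent-gui.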

Test 1 *appears* to show an old issue rearing its head: tcp global synchronization. The amount of queue is so short that all the flows synchronize and drop simultaneously, as per panel 3 of your first plot, but in order to do "science" here, simplifying the test to just uploads would help.

Secondly it appears that something on the path is treating the CS1 codepoint as higher priority than the CS0 codepoint, when CS1 is supposed to be "background".
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Dec 11, 2021 9:43 pm

Does that VDSL device do hardware flow control? Or are you shaping via cake via htb? (I'm happy to hear the bandwidth=0 parameter seems to be working otherwise.) The only way I can think of you getting results this good is if the vdsl modem is exerting flow control....

Anyway, your last result is a clear win over what you had before, methinks. I'd like a tcp_nup test of that config too, when you find the time.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 4:51 am

No crashing, I have run the CCR1009 very heavy for several days without issue! Full transparency, tonight I am on the RB5009, it just showed up yesterday so I have been toying with it. So, I will be using it for my testing tonight. I can always swap around if you would like. Either way, they are both running 7.1 Stable.

I agree with your statement on under buffering and would also much prefer lower latency than a single stream achieving full rate.

Yes sir, I am in San Antonio so server is ~30ms from me. Here is the result with the test requested, sans queueing:
rrul_-_no_queue.png
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 5:22 am

HAH, I was hoping to pique your interest ;) Science incoming!

I just thought of something that is very annoying about this modem/"router" from ATT. I have it in 'bypass' mode so that it assigns the public IP to the router; however, it is still NAT'd traffic, for lack of better words. I am not sure how it actually works, but it still has its own state table, etc. The FIOS guys have figured out a way to bypass it because they also have an ONT, etc. But since this is DSL, I am stuck with whatever they are doing inside the black box. Maybe this is what is causing the codepoint funny business, as I am not doing anything with DSCP, etc.

On to the data!
tcp_nup_-_no_queue_4up_streams.png
tcp_nup_-_no_queue_4up_streams1.png
tcp_nup_-_no_queue_16up_streams.png
tcp_nup_-_no_queue_16up_streams1.png
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 5:38 am

I am not sure if the VDSL device does or not to be honest. It is an ATT branded box model BGW210. I have it in passthrough mode, but as stated above it is still some black magic NAT but 'passes' the public IP to my router.

The very first test I posted was without any queue in the Mikrotik router. Everything after that was with cake, and when using bandwidth=0 it definitely works well! Obviously, tweaking that helps, but it works well as a general out-of-the-box setup.

** NOTE ** I just realized I had made a mistake in my config. I was leaving bandwidth at 0 in the cake config, and was setting up the target max limit for upload under the simple queue general settings to 19M. However, no limiting on the download. I will need to try these tests again later setting that to unlimited and setting bandwidth within cake itself. Curious to see if that makes any difference.

Setting up that last config with tcp_nup results:
tcp_nup_-_cake_4up_streams.png
tcp_nup_-_cake_4up_streams1.png
tcp_nup_-_cake_16up_streams.png
tcp_nup_-_cake_16up_streams1.png
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 7:18 am

OK.
0) Still mostly very happy it doesn't crash.

1) Your dsl device's buffer is sized in packets, not bytes. The reason we only saw a 20ms RTT before on the rrul test, versus the much larger RTT on the tcp_nup test, is that the acks from the return flows on the path filled up the queue also. I leave it as an exercise for the reader to calculate the packet buffer length on this device...

2) I figured I was either looking at a shaper above cake, or at dsl flow control. (I like hw flow control, btw. I was perpetually showing off an ancient dsl modem with a 4 packet buffer and hw flow control + fq-codel in the early days, as FQ + the time-based AQM vs a fifo worked with that beautifully and cost 99% less cpu to do that way. Sadly most dsl modems moved to a switch and don't provide that backpressure anymore.) Not quite sure you just tested that without a shaper.

3) Can you verify you are not using BBR on your client? The 5ms simultaneous drops are still a mite puzzling.
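
For the exercise in point 1, a minimal Python sketch of the arithmetic follows. The numbers are hypothetical, not read from the plots in this thread: it assumes a 20 Mbit/s uplink, 60 ms of added delay in an upload-only test, 1500-byte data packets and 64-byte acks. The function names are invented for this example.

```python
def buffer_packets(added_delay_ms: float, link_rate_bps: float,
                   packet_bytes: int = 1500) -> int:
    """Estimate how many full-size packets fit in the observed queueing delay."""
    bits_queued = link_rate_bps * added_delay_ms / 1000.0
    return round(bits_queued / (packet_bytes * 8))


def drain_time_ms(packets: int, link_rate_bps: float,
                  avg_packet_bytes: float) -> float:
    """Time to drain a packet-count-limited buffer at a given average packet size."""
    return packets * avg_packet_bytes * 8 / link_rate_bps * 1000.0


# Hypothetical upload-only run: 60 ms of added delay at 20 Mbit/s.
n = buffer_packets(60, 20e6)
print(f"estimated buffer: {n} packets")

# Same buffer, half full of 64-byte acks and half of 1500-byte data packets.
mixed_avg = (1500 + 64) / 2
print(f"delay with acks mixed in: {drain_time_ms(n, 20e6, mixed_avg):.1f} ms")
```

Because the buffer is counted in packets, filling part of it with small acks shortens the time it takes to drain, which is consistent with the rrul test showing less added delay than the upload-only test.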
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 7:30 am

I do dream of hardware flow control, so no shaper, bandwidth=0 for cake as a tcp_nup test.

But I expect to be unlucky. Anyway, your fiddling with the frame parameters without a cake shaper active should have done nothing (I think), so that run was puzzling...
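
To make the frame parameters concrete, here is a Python sketch of the per-packet size a shaper would charge with overhead 22, mpu 44 and ptm compensation, the values used in the configs above. It assumes the accounting order described for cake (add the overhead, enforce the minimum packet unit, then add one byte per 64 bytes or part thereof for ptm); treat it as an approximation for reasoning, not as cake's actual code. The function name is made up for this example.

```python
def accounted_bytes(pkt_len: int, overhead: int = 22, mpu: int = 44,
                    ptm: bool = True) -> int:
    """Approximate on-wire size charged to the shaper for one packet."""
    size = pkt_len + overhead      # per-packet framing overhead
    size = max(size, mpu)          # minimum packet unit
    if ptm:
        size += (size + 63) // 64  # 64b/65b: one extra byte per 64 or part thereof
    return size


for length in (40, 576, 1500):
    charged = accounted_bytes(length)
    # Serialization time at a hypothetical 19 Mbit/s shaped rate.
    usec = charged * 8 / 19e6 * 1e6
    print(f"{length:5d} bytes -> charged {charged:5d} bytes, {usec:7.1f} us at 19 Mbit/s")
```

These numbers only influence behaviour when the shaper is running, since they adjust how much of the configured bandwidth each packet consumes.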

cake nat besteffort the_right_dsl_option bandwidth XMbit is easiest to reason about. Do you have visibility into the sync rate of the modem? Anyway, get that number right next, then try tcp_ndown... Note you cannot measure tcp rtt from this direction via flent directly, so we resort to inference or packet captures.

At some point I might ask you to stick your *.flent.gz files somewhere. Pleased to have so vastly improved tcp rtt.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 10:19 am

I will need to do much more studying to find the answer to question 1. =) I assume it will at least partially have to do with the RTT and bandwidth as part of the equation.

I think at this point, I need to start over somewhat, considering I was NOT using the bandwidth limit within cake, and was setting the bandwidth limit outside of it. Not thinking when I started, I was used to my old way of queueing in Mikrotik by setting up a simple queue with sfq.

It is interesting to me that you mention the hw flow control, and 4 packet buffer. Earlier I watched the youtube video for the first time of you explaining the 4 packet buffer with the people in the audience as packets. Also, I had watched another video where the gentleman had mentioned hardware. I will link both here for those interested.
https://www.youtube.com/watch?v=ZeCIbCzGY6k
https://www.youtube.com/watch?v=Q6SAcO-H6b0

BBR, good point I hadn't thought of that! I am using PopOS which is a derivative of Debian. Sure enough..
❯ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr2

I assume for our testing purposes, we would want to disable that correct?
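
A quick way to record which congestion control a Linux test client is using before each run is sketched below in Python. It reads the same value that the sysctl above reports, via its /proc path; the helper names are invented for this example.

```python
from pathlib import Path

# /proc view of the net.ipv4.tcp_congestion_control sysctl (Linux only).
CC_PATH = Path("/proc/sys/net/ipv4/tcp_congestion_control")


def current_congestion_control(path: Path = CC_PATH) -> str:
    """Return the name of the default TCP congestion control algorithm."""
    return path.read_text().strip()


def is_bbr(name: str) -> bool:
    """True for bbr and its variants such as bbr2."""
    return name.lower().startswith("bbr")


if CC_PATH.exists():
    cc = current_congestion_control()
    note = " (BBR variant: note this in the test title)" if is_bbr(cc) else ""
    print(f"client congestion control: {cc}{note}")
else:
    print("not a Linux host, or the sysctl is unavailable")
```

Switching the client for a test run is then a matter of `sudo sysctl -w net.ipv4.tcp_congestion_control=cubic`.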
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 11:00 am

OK, no bandwidth shaping, and the following cake config -- and tcp_nup tests..

name="cake-up" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=0bps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=diffserv4 cake-flowmode=triple-isolate cake-nat=yes cake-wash=yes cake-ack-filter=none
tcp_nup_-_cake_4up_bw0.png
tcp_nup_-_cake_16up_bw0.png
Ahh yes, it looks like your assumptions were correct! ;) On to the next test! Here is the info you requested from the modem's interface:
Screenshot from 2021-12-12 01-58-04.png
Screenshot from 2021-12-12 01-59-18.png
Here is the config for the following tcp_ndown tests:

name="cake-up" kind=cake cake-bandwidth=19.0Mbps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=besteffort cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=100.0Mbps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=besteffort cake-flowmode=triple-isolate cake-nat=yes cake-wash=yes cake-ack-filter=none
tcp_ndown_-_cake_4down.png
tcp_ndown_-_cake_4down1.png
tcp_ndown_-_cake_16down.png
tcp_ndown_-_cake_16down1.png
Now this has me interested looking at this data.. running a RRUL test as well, because why not ;)
rrul_-_cake_best_effort.png
Ask and you shall receive! Here are the files:
http://zylone.org/taht/tcp_nup-2021-12- ... 0.flent.gz
http://zylone.org/taht/tcp_nup-2021-12- ... 0.flent.gz
http://zylone.org/taht/tcp_ndown-2021-1 ... n.flent.gz
http://zylone.org/taht/tcp_ndown-2021-1 ... n.flent.gz
http://zylone.org/taht/rrul-2021-12-12T ... t.flent.gz
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 12:26 pm

You don't have hw flow control.

Nice to know (I guess) that BBR2 still struggles with itself. Try resetting that to cubic on the up, please, and shape to 19

add ack-filter to the up

I'm running cubic on that server for the down.

Your baseline rtt might drop in half without bonding OR if you can disable interleaving (yes, as well as your bandwidth).
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 4:59 pm

Is it possible to scrape that rate? cake supports dynamically changing its config *without* reloading the qdisc (tc qdisc change dev whatever cake bandwidth the_new_bandwidth), but I doubt mikrotik can do that with their api (?). You should be able to get really close to the actual uplink rate (22xxxkbps) with the right framing. Those little ping spikes are a bit puzzling (something out of band like PPPoE?). I note some dhcp and some ppp messages now exist in some implementations that actually do send the link and/or shaped rate and framing.
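
On a Linux box with tc (not RouterOS), the scrape-and-update idea could look roughly like the Python sketch below. Everything specific here is an assumption for illustration: the device name, the 22000 kbps sync rate, the 95% safety margin, and the `root` keyword, which presumes cake is attached as the root qdisc. How the sync rate is scraped from the modem is left out entirely.

```python
import shlex


def cake_change_cmd(dev: str, sync_rate_kbps: int, percent: int = 95) -> list[str]:
    """Build a 'tc qdisc change' command that re-rates cake without reloading it.

    percent is a hypothetical safety margin below the modem's sync rate.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    shaped_kbit = sync_rate_kbps * percent // 100  # integer math avoids float rounding
    return ["tc", "qdisc", "change", "dev", dev, "root",
            "cake", "bandwidth", f"{shaped_kbit}kbit"]


# Hypothetical values: uplink synced at 22000 kbps on device eth0.
cmd = cake_change_cmd("eth0", 22000)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to apply it
```

Because only the bandwidth parameter changes, the existing queues and flow state stay in place while the rate is adjusted.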

Your download was really pretty. But anyway, I'd like to solidify the upload using cubic at 19mbit first, ack-filter on (I worry about that option), then I'd love to see sfq (unshaped and shaped) to the same rate with both bbrv2 and cubic. We are kinda getting down to attempting rigorous science here, so perhaps scripting, and some packet captures are in order. On the other hand, if you can keep the tested options straight in the -t option we can easily compare things later.

I have a long standing hypothesis about SFQ. Since it was so popular in the wisp markets (ubnt uses it), and I long ago proved it was too short to sustain fat tcp flows, I think it was acting as an AQM also in this market, which is why the observed bufferbloat was only in the 80-100ms range, and why, as people started shaping to faster and faster rates and using 8+ multiflow speedtests, they didn't notice they were killing single flow tcp performance. ( https://www.bufferbloat.net/projects/bl ... _Must_Die/ ). The poor results I'd got then, however, predate the advent of the linux stack's pacing, and single flows have actually been scaling higher than 12mbit since against sfq's default 128 packet limit.

The reason why the rrul upload looks spotty is actually more related to sampling error, not an actual problem per se, and you are also zoomed way in. You can scale plots relative to each other as you wish, or combine them, via flent. I *like* to zoom in but try to stay cognizant of the scale, and there's a version of the plot that won't zoom on you, also.

Somewhat puzzled about the QoS stuff, but I'd rather get the bandwidth param right first. I note I'm not a huge fan of QoS in the first place due to all the differing interpretations, and there was also a bug in some version or another that wasn't reading the dscp field properly with some encapsulations. cake has a "wash" option if you are actually seeing mismarks on ingress, or are doing something special on egress that you don't want upstream to see. I do keep hoping we can "export" a standards compliant diffserv set in the hope that the ISP might respect it, and vice versa...

The rrul test is a *stress* test using greedy traffic and not indicative of the intent of QoS. Were it to be more representative, it would send voip-like isochronous traffic through the VI queue, videoconferencing 16ms frame-like traffic through the video queue, and something torrent-like through the background queue. It semi-intentionally, and semi as a mistake, only exercises 3 of the 4 cake diffserv4 or wifi hw queues; rrulv2 does this more right, but I haven't finished the spec yet.

Demonstrating the sad results of sending greedy traffic through a qos system that *thinks* its traffic was going to obey the rules was also on my mind at the time. You still see a lot of strict priority queues out there where if one user lucks into the right dscp marking, they can starve out everyone else. Cake's game theory here uses soft admission control so that that doesn't happen, and in general shows the benefits of short queues and 5 tuple fair queuing over any form of qos. It furthermore does per host fq, so the worst a user can do is do themselves in, not everybody else.

There are 110 other tests in the suite. I've got rather good at reading the rrul test over the years; it's the way to get a picture with the least amount of effort, then we do the tcp_nup and down tests. I might not have needed to suggest that had I not noticed that it looked like you were running BBR. The square wave tests are useful, as are the various _var versions which let you test different servers.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 6:41 pm

Roger that, I figured no hw control was the case.

I have changed net.ipv4.tcp_congestion_control=cubic on the client, and will use the same settings as the last test which have 19M for upload and ack=filter. Pasting config here for sanity.

name="cake-up" kind=cake cake-bandwidth=19.0Mbps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=besteffort cake-flowmode=triple-isolate cake-nat=yes cake-wash=no cake-ack-filter=filter

name="cake-down" kind=cake cake-bandwidth=100.0Mbps cake-overhead=22 cake-mpu=44 cake-atm=ptm cake-overhead-scheme=""
cake-rtt=100ms cake-diffserv=besteffort cake-flowmode=triple-isolate cake-nat=yes cake-wash=yes cake-ack-filter=none
tcp_nup_-_cake_4up_streams_client_cubic.png
tcp_nup_-_cake_16up_streams_client_cubic.png
I am going to upload data to my google drive. Hopefully that makes things a little easier to keep track of.
https://drive.google.com/drive/folders/ ... sp=sharing

This run will be in folder 1-Change to cubic on client

It appears that I cannot turn off interleaving in the modem. However, I can tell it to use line 1, line 2, or both. I will have to wait until next weekend to test this out. The ol lady is home and if she loses her internet.. well, we all know where that leads us ;)
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 7:09 pm

I think she'll be happy with your efforts so far.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 7:45 pm

Is it possible to scrape that rate? cake supports dynamically changing its config *without* reloading the qdisc, but I doubt mikrotik can do that with their api (?): tc qdisc change dev whatever cake bandwidth the_new_bandwidth. You should be able to get really close to the actual uplink rate (22xxxkps) with the right framing. Those little ping spikes are a bit puzzling (something out of band like PPPoE?). I note some dhcp and some ppp messages now exist in some implementations that actually do send the link and/or shaped rate and framing.
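As a sketch of what such a script could look like on a plain Linux box (not RouterOS, whose API may not expose this), the helper below only builds the tc command line; it does not run it. The function name is hypothetical, and attaching the qdisc at "root" is my assumption, since the command quoted above leaves the handle out.

```python
from typing import List


def cake_change_cmd(dev: str, kbit: int) -> List[str]:
    """Build a `tc qdisc change` command that updates cake's bandwidth in place.

    Assumption: cake is the root qdisc on `dev`. Nothing is executed here.
    """
    if kbit <= 0:
        raise ValueError("bandwidth must be positive")
    return ["tc", "qdisc", "change", "dev", dev, "root",
            "cake", "bandwidth", f"{kbit}kbit"]


if __name__ == "__main__":
    # Print the command rather than running it; running needs root privileges.
    print(" ".join(cake_change_cmd("eth0", 19000)))
```

The resulting list could be handed to subprocess.run() by whatever loop polls the modem for its sync rate.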

Your download was really pretty. But anyway, I'd like to solidify the upload using cubic at 19mbit first, ack-filter on (I worry about that option), then I'd love to see sfq (unshaped and shaped) to the same rate with both bbrv2 and cubic. We are kinda getting down to attempting rigorous science here, so perhaps scripting and some packet captures are in order. On the other hand, if you can keep the tested options straight in the -t option we can easily compare things later. I have a long standing hypothesis about SFQ: since it was so popular in the wisp markets (ubnt uses it), and I long ago proved its queue was too short to sustain fat tcp flows, it was also acting as an AQM in this market, which is why the observed bufferbloat was only in the 80-100ms range, and as people started shaping to faster and faster rates and using 8+ multiflow speedtests, they didn't notice they were killing single flow tcp performance ( https://www.bufferbloat.net/projects/bl ... _Must_Die/ ). The poor results I'd got then, however, predate the advent of the linux stack's pacing, and single flows have actually been scaling higher than 12mbit since then against sfq's default 128 packet limit.

The reason why the rrul upload looks spotty is actually more related to sampling error, not an actual problem per se, and you are also zoomed way in. You can scale plots relative to each other as you wish, or combine them, via flent. I *like* to zoom in but try to stay cognizant of the scale, and there's a version of the plot that won't zoom on you, also.

Somewhat puzzled about the QoS stuff, but I'd rather get the bandwidth param right first. I note I'm not a huge fan of QoS in the first place due to all the differing interpretations, and there was also a bug in some version or another that wasn't reading the dscp field properly with some encapsulations. cake has a "wash" option if you are actually seeing mismarks on ingress, or are doing something special on egress that you don't want upstream to see. I do keep hoping we can "export" a standards compliant diffserv set in the hope that the ISP might respect it, and vice versa...

The rrul test is a *stress* test using greedy traffic and not indicative of the intent of QoS. Were it to be more representative, it would send voip-like isochronous traffic through the VO queue, videoconferencing 16ms frame-like traffic through the video queue, and something torrent like through the background queue. Semi-intentionally, and semi by mistake, it only exercises 3 of the 4 cake diffserv4 or wifi hw queues. rrulv2 does this more right, but I haven't finished the spec yet.

Demonstrating the sad results of sending greedy traffic through a qos system that *thinks* its traffic was going to obey the rules was also on my mind at the time. You still see a lot of strict priority queues out there where if one user lucks into the right dscp marking, they can starve out everyone else. Cake's game theory here uses soft admission control so that doesn't happen, and in general shows the benefits of short queues and 5 tuple fair queuing over any form of qos. Furthermore it does per host fq, so the worst a user can do is do themselves in, not everybody else.

There are 110 other tests in the suite. I've got rather good at reading the rrul test over the years; it's the way to get a picture with the least amount of effort, then we do the tcp_nup and down tests. I might not have needed to suggest that had I not noticed that it looked like you were running BBR. The square wave tests are useful, as are the various _var versions which let you test different servers.
I am not sure what you mean by scrape the rate? Do you mean change the bandwidth limit real time during a test, or possibly using it as part of a script to help automate testing using the API?

Here are the test results:

SFQ shaped cubic
tcp_nup_-_SFQ_4up_shaped_cubic.png
tcp_nup_-_SFQ_16up_shaped_cubic.png
SFQ unshaped cubic
tcp_nup_-_SFQ_4up_unshaped_cubic.png
tcp_nup_-_SFQ_16up_unshaped_cubic.png
SFQ shaped bbr
tcp_nup_-_SFQ_4up_shaped_bbr.png
tcp_nup_-_SFQ_16up_shaped_bbr.png
SFQ unshaped bbr
tcp_nup_-_SFQ_4up_unshaped_bbr.png
tcp_nup_-_SFQ_16up_unshaped_bbr.png
All data can be found in the 2-SFQ Testing folder
https://drive.google.com/drive/folders/ ... sp=sharing

Just wanted to say, thank you for your analysis and education. I need to go back a few more times and re-read this thread to try and consume it all better. =) I am going to set the client back to cubic and fire cake back up and play around with it some more to see if I can get the framing tweaked to maybe get closer to the upload rate as you had stated.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 7:49 pm

I think she'll be happy with your efforts so far.
I have given up asking her how the internet is doing.. she is very binary. It either works or it doesn't. AHAHA!!!

Only thing I could do to make her happier is move the AP on that side of the house into her room so she has a better signal from her devices, she is right on the edge of GOOD 5g. I have done A LOT of testing/tweaking on the wifi here as well (all mikrotik). That will be the next round of testing after all of this. Go back and re-test/tweak the wifi and out of curiosity see how the flent tests fare over the air compared to over the wire.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 9:04 pm

OK, I believe I have the framing sync'd up pretty well.

To quote moeller0 above, he is spot on with the overhead of 44 and also spot on with the fact that VDSL2 likely has a lower overhead than 44.

Also, to note.. I have the bandwidth limit set to 22M, which is the bonded upstream rate of the modem. So, it is obvious to me that the ISP is capping me at 20mbps as per service agreement.

I am really splitting hairs at this point..
tcp_nup_-_CAKE_1up_ptm_overhead_40_mpu80.png

Here it is zoomed in
tcp_nup_-_CAKE_1up_ptm_overhead_40_mpu80-zoom.png
Here is a graph showing the difference between number of streams with overhead 40, and mpu 80
tcp_nup_-_CAKE_1up_ptm_overhead_40_mpu80-streams.png
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 9:32 pm

I am pretty sure you have the overhead right at this point. I'm also happy to see it not crash.

In the interest of science, however, if at some point you could also repeat the 4up test with htb + fq_codel, that would be interesting. Also if you were to enable ecn for a fq_codel vs cake comparison on your client.

While we definitely get more throughput and less FQ latency from cake, with more bounded results from that side
bothersome2.png
cake's "Cobalt" AQM tcp RTT is oscillating far more than I would have expected. SFQ's overly short buffers are winning pretty good here.
bothersome.png
Last edited by dtaht on Sun Dec 12, 2021 10:01 pm, edited 1 time in total.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 9:39 pm

your 16up result seems kind of anomalous.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 9:50 pm

by "scraping the rate" I meant rolling some sort of script to pull it off the modems sync rate, but since your isp is shaping you instead, stick to the 19.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 9:56 pm

your 16up result seems kind of anomalous.
Agreed, I just ran it again.. and got similar results as before. Almost identical, the speed is wonky, however latency is still great.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 9:57 pm

by "scraping the rate" I meant rolling some sort of script to pull it off the modems sync rate, but since your isp is shaping you instead, stick to the 19.
Ahh ok, gotcha! Well, that is the interesting thing: not sure you noticed, but last go around I had set the rate to 22 to match the rate in the modem, and it appears to be good. I assume you say keep it at 19 to give myself some headroom in case that rate were to drop in the future?
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 10:03 pm

no, I didn't notice. 19 makes my head hurt less for now? In general dsl tends to fluctuate in rain, over the course of a day, etc, so leaving yourself headroom is a good idea.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Dec 12, 2021 10:11 pm

as for the up16 anomaly, try htb + fq_codel...

And at some point, when your gf is not looking, reboot and try cake again at up16? I return to my initial objective, not crashing. This is 5.6.x? cpu arch?
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 12:13 am

no, I didn't notice. 19 makes my head hurt less for now? In general dsl tends to fluctuate in rain, over the course of a day, etc, so leaving yourself headroom is a good idea.
Good point, not to mention I thought about it afterwards.. even though the sync rate is 22, the ISP is obviously holding me at 20. So, as not to let them be the bottleneck, 19 makes sense in that case as well.
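The arithmetic behind that choice can be sketched as below: shape to a bit under whichever limit is lower, so the queue builds in cake and not upstream. The 5% headroom figure and the function name are assumptions picked for illustration, not a recommendation from this thread.

```python
def shaper_rate_mbit(sync_mbit: float, isp_cap_mbit: float,
                     headroom: float = 0.05) -> float:
    """Return a shaper rate just below the tighter of the two limits.

    `headroom` is the fraction held back (an assumed 5% by default).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    bottleneck = min(sync_mbit, isp_cap_mbit)
    return round(bottleneck * (1 - headroom), 2)


if __name__ == "__main__":
    # 22 Mbit modem sync rate, 20 Mbit ISP cap, as described above.
    print(shaper_rate_mbit(22, 20))
```

With these numbers the ISP cap, not the modem sync rate, is what sets the result.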
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 12:37 am

as for the up16 anomaly, try htb + fq_codel...

And at some point, when your gf is not looking, reboot and try cake again at up16? I return to my initial objective, not crashing. This is 5.6.x? cpu arch?
Hmm, well after a fresh reboot.. the results are the same for the up16 and cake. This is odd. Per the Mikrotik 7.1 release changelog, it is running 5.6.3, and this router has a quad core 350-1400 (auto) MHz arm64 chip. Looking up the model of the CPU, it appears to be a Marvell ARMADA 7040 https://www.marvell.com/content/dam/mar ... 017-12.pdf

As far as fq_codel, the results appear to be pretty similar to cake..
tcp_nup_-_fqcodel_16up_ptm_overhead_40_mpu80.png
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 1:15 am

I'm kinda hoping this is a bug in flent!!!
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 1:19 am

I'm kinda hoping this is a bug in flent!!!
Just for the heck of it.. I added some more data here.. I added 8 and 32. It looks like even with 8 it starts to drop.. and gets worse with more, however 16 and 32 are roughly the same.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 2:44 am

I'm off researching kernel versions. NOT relevant to this was the wireguard patch that went into 5.7.

https://github.com/dtaht/sch_cake/issue ... -984503893

If you have a mikrotik account (I am not a mikrotik customer), and can file a bug, I'm a bit concerned.

I wouldn't mind, however, returning to testing downloads. Your 16 flow download was perfect...
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 2:46 am

I'd wanted a tcp rtt plot for the 4up test also. You can recreate my cdf if you like comparing the sfq vs cake vs fq-codel.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 3:23 am

also, 8, 16, 32 with SFQ?
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 3:37 am

I hate bugs. :/ Anyway, a packet capture of the 16 flow test would be good at this point.

tcpdump -i your-interface -s 128 -w 16flowscake.cap

We'd never tested bonding until today... and I could imagine us having a lot of packet reordering in a variety of ways.

Assuming this is a bug that isn't in flent - it's one of those darn things that didn't show up in testing because we didn't stress things hard enough. The weird thing is I keep seeing artifacts in the latest release of all this stuff in newer kernels, that don't match the kinds of results we were getting when we first mainlined this code. https://forum.openwrt.org/t/validating- ... /111123/10
is one example.

Can't even rule out a bug in your host's tcp. We have a research group that can take a look at this, try to reproduce.

ANYWAY. At least it doesn't crash and you have consistently low latency, and probably rarely stress out a box this hard. thx so much for being all over this with me!
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 5:44 am

I am pretty sure you have the overhead right at this point. I'm also happy to see it not crash.

In the interest of science, however, if at some point you could also repeat the 4up test with htb + fq_codel, that would be interesting. Also if you were to enable ecn for a fq_codel vs cake comparison on your client.

While we definitely get more throughput and less FQ latency from cake, with more bounded results from that side

bothersome2.png

cake's "Cobalt" AQM tcp RTT is oscillating far more than I would have expected. SFQ's overly short buffers are winning pretty good here.

bothersome.png

I totally missed this post earlier. My brain went into shutdown for a bit apparently, went and vegged out for a bit.

Testing of fq_codel with ECN, vs cake 4up..
tcp_nup_-_cake_4up.png
tcp_nup_-_cake_4up1.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 5:49 am

I'm off researching kernel versions. NOT relevant to this was the wireguard patch that went into 5.7.

https://github.com/dtaht/sch_cake/issue ... -984503893

If you have a mikrotik account (I am not a mikrotik customer), and can file a bug, I'm a bit concerned.

I wouldn't mind, however, returning to testing downloads. Your 16 flow download was perfect...
Very cool! Hopefully this week, I will be getting my brother's new Mikrotik router installed and testing wireguard between his house and mine.

I will definitely file a bug with Mikrotik.

Right on, what kind of download tests would ya like to run?
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 5:55 am

I'd wanted a tcp rtt plot for the 4up test also. You can recreate my cdf if you like comparing the sfq vs cake vs fq-codel.
Yes sir, here ya go.. The plot thickens! =P
tcp_nup_-_sfq_4up2.png
tcp_nup_-_sfq_4up3.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 6:08 am

also, 8, 16, 32 with SFQ?
Here ya go!
tcp_nup_-_sfq_32up.png
tcp_nup_-_sfq_32up1.png
tcp_nup_-_sfq_32up2.png
tcp_nup_-_sfq_32up3.png
tcp_nup_-_sfq_32up4.png
tcp_nup_-_sfq_32up5.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 6:28 am

I hate bugs. :/ Anyway, a packet capture of the 16 flow test would be good at this point.

tcpdump -i your-interface -s 128 -w 16flowscake.cap

We'd never tested bonding until today... and I could imagine us having a lot of packet reordering in a variety of ways.

Assuming this is a bug that isn't in flent - it's one of those darn things that didn't show up in testing because we didn't stress things hard enough. The weird thing is I keep seeing artifacts in the latest release of all this stuff in newer kernels, that don't match the kinds of results we were getting when we first mainlined this code. https://forum.openwrt.org/t/validating- ... /111123/10
is one example.

Can't even rule out a bug in your host's tcp. We have a research group that can take a look at this, try to reproduce.

ANYWAY. At least it doesn't crash and you have consistently low latency, and probably rarely stress out a box this hard. thx so much for being all over this with me!
Yeah, bugs are no fun! The possible packet reordering makes sense because of the interleaving.

I wonder if you are seeing these artifacts due to a lack of a large enough testing pool? I understand the frustrations, I used to do software QA for several years and the developers hated me. ;)

That is a good point, I was thinking about it earlier.. I should spin up a few different VM's and test from them to see if I get the same results. Just to see if it is something with my host computer/tcp stack/kernel, etc.. I am running a custom kernel on this box which is what got me thinking about it.. 'xanmod' kernel.

Definitely no crashing going on, that is great and since it is just me and the ol lady.. yeah I doubt we hit the router hard ever honestly.. there are MAYBE 20 devices total that have access to the internet. Maybe throw 10-15 VM's at any one time on top of that but even still..

None the less, I for whatever reason really dig this stuff. In my mind, why have all this fancy computer stuff if your network is not optimized? ;) You are very welcome, and thank you for all your work and insight!

Anyhow, here is a link to my google drive with a fresh cake 16up flent test, and packet capture!
https://drive.google.com/drive/folders/ ... sp=sharing
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 1:27 pm

Could you delete the --step-size portion of your flent command line? Really hoping this is flent and sampling error...

In fq-codel, we have what is now the second largest queue management system in the world, from a standing start of me and eric dumazet at 4AM PDT in may of 2012, admittedly a distant second to FIFO. iOS, OSX are billions, the linux cloud is 100s of millions, container instances, billions, and router and iot instances on ethernet in the 10s of millions, wifi, somewhere between 10s to 100s of millions. Implementations done by others include those apple versions, the ns3 versions, freebsd and openbsd, and ghu knows where else.

This algorithm is designed to be unobtrusive, and on by default, and although we collect good statistics, it's rare anyone posts them, and for example, mikrotik has no way to collect them. And there are of course hundreds of other network algorithms in play, all evolving, at the same time the hardware is morphing out from underneath it. I remember giving a lecture at the university at modena ( https://www.bufferbloat.net/projects/ce ... ember-2012 ) where someone said that I must be "so proud", and I said, "no, I'm terrified".

For all that, there's maybe, oh, 200? people in the world that understand network congestion control well, and most of those are retired or nearly so. I certainly wish there were more, because in the end we're kind of responsible for keeping the entire internet from collapsing. The job doesn't pay well, either.

Most of the ongoing validation of correctness has come from thousands of users on dozens of platforms happy with the results from waveform or dslreports, which is kind of limited compared to the larger flent test suite, and certainly I have no budget for hardware, a full tilt lab, or extensive automation, and the problem space has a few hundred dimensions in the end.

Recently I got a small grant, in part, to validate we and the other implementations still got it right: https://nlnet.nl/project/CeroWRT-II/
Last edited by dtaht on Mon Dec 13, 2021 4:47 pm, edited 1 time in total.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 1:32 pm

your sfq results, which also degrade as flows are added, are perversely cheering me up. flent bug. tcp bug. me, not mentally conceiving how a 19mbit bonded uplink "should work". the packet caps will tell. but it's 3am here, going back to bed, thx for testing sfq. Also I would consider the xanmod kernel highly experimental, and if you have a more common distro kernel to test with on another vm or on bare metal that would help rule out the host. bbrv2 is *highly* experimental and also modifies the stack in some subtle ways.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 4:04 pm

I'm glad you are digging it, and I can feed off your energy somewhat.

for analyzing packet captures I use wireshark a lot, especially looking for retransmits, reorders, and the various plots....

I often use tcptrace and xplot.org - apt-get install tcptrace xplot.org

Example of use

tcptrace -G thecapture.cap

look for a big one in the format xtoy or ytox. xplot that tsg or rtt .xpl file.

Given the rtt variance of the fq-codel ecn test (please do a cap), which should NOT have been that bad... I'm concerned that there might even be something as low level as a crc error here. Your capture wouldn't show that (just losing packets), but IF ecn is enabled, and working, you should be able to see an ECE in the ack data (from an upload), or a CE on a download.

I have meetings much of today, might not get on it. Edit: i did. no crc errors.. but..
Last edited by dtaht on Mon Dec 13, 2021 5:54 pm, edited 1 time in total.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 13, 2021 4:29 pm

wow. You don't have enough loss on that link, only a couple retransmits to speak of, and I'm leaning towards an issue with your host tcp. At one level, it's great, but extremely, extremely weird. are you using the "fq" qdisc on your host, also? And sure you are using cubic?
throughput.png
These RTTs strongly imply something other than the window is regulating the flow. Note that wireshark is far more accurate than flent (which can only sample), so it missed the really big RTTs.
rtt.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 1:15 am

Whew, it has been a long day. First thing this morning, I figured.. to heck with that kernel.. well I blew the box up, wouldn't boot into X.. so anyhow, decided to just blow it away and start with a clean install instead of futzing with Linux. I needed to get to work ;) Other than that, whew I haven't been able to sit down and do any testing until now.

With that said, I realized when I changed bbr to cubic the other day, I didn't restart the machine. So I was probably still testing with bbr. *slaps forehead* So I am not sure if it was that, or that dang kernel. None the less, I am on a fresh install with the following:

5.15.5-76051505-generic
net.ipv4.tcp_congestion_control = cubic
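To take the guesswork out of which settings are live before a run, something like the sketch below could be dropped into a test script. It reads the values straight from /proc/sys on Linux, which is what newly opened sockets will use by default. The helper names are hypothetical.

```python
from pathlib import Path, PurePosixPath


def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return str(PurePosixPath(root).joinpath(*name.split(".")))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl as a stripped string."""
    return Path(sysctl_path(name, root)).read_text().strip()


if __name__ == "__main__":
    for key in ("net.ipv4.tcp_congestion_control", "net.ipv4.tcp_ecn"):
        try:
            print(f"{key} = {read_sysctl(key)}")
        except OSError as exc:
            # Not on Linux, or the sysctl does not exist on this kernel.
            print(f"{key}: unavailable ({exc})")
```

Logging this output next to each flent result would make it easier to tell afterwards which congestion control a given plot was really using.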

I went ahead and ran a test with cake, here are some results.. I think we are in a MUCH better starting position now. Or so I hope ;) I will attach a capture of the traffic as well..
tcp_nup_-_cake_4up_1_stream.png
tcp_nup_-_cake_4up_4_stream.png
tcp_nup_-_cake_4up_8_stream.png
tcp_nup_-_cake_4up_16_stream.png
tcp_nup_-_cake_4up_32_stream.png
Screenshot from 2021-12-13 16-58-27.png
Screenshot from 2021-12-13 17-00-43.png

Also, just for the sake of data.. here is the info on the modem not showing any CRC errors.
Screenshot from 2021-12-13 17-12-41.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 1:25 am

My brain is fried.. forgot to add the packet capture..

https://drive.google.com/drive/folders/ ... sp=sharing
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 1:29 am

That's MUCH more correct looking, thank you!

Next, to see if ecn is working properly, (e.g. the mikrotik marking it correctly, the path not stomping on it) you can run the exact same test series, but with:

sudo sysctl -w net.ipv4.tcp_ecn=1

I use ecn primarily as an AQM debugging tool (given how rarely it's turned on in the field) and for all I know (without seeing the capture) you had it on just now.

EDIT: You didn't.
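For reference, a small sketch of what the three values of net.ipv4.tcp_ecn mean on Linux, as I understand the kernel's ip-sysctl documentation. The function names are made up for illustration.

```python
TCP_ECN_MODES = {
    0: "disabled: neither request nor accept ECN",
    1: "accept ECN on incoming connections and request it on outgoing ones",
    2: "accept ECN on incoming connections only (the usual default)",
}


def describe_tcp_ecn(value: int) -> str:
    """Return a human-readable meaning for a net.ipv4.tcp_ecn value."""
    return TCP_ECN_MODES.get(value, "unknown value")


def requests_ecn_on_outgoing(value: int) -> bool:
    """True only when the host itself asks for ECN on connections it opens."""
    return value == 1


if __name__ == "__main__":
    for v in (0, 1, 2):
        print(v, "->", describe_tcp_ecn(v))
```

Since flent's upload tests are connections opened by the client, only a value of 1 on the client would make it ask for ECN.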
Last edited by dtaht on Tue Dec 14, 2021 2:05 am, edited 2 times in total.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 1:49 am

Thank you for the packet capture. You can, btw, filter out all your other traffic by specifying "host dallas.starlink.taht.net"
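A sketch of wrapping that capture in a script, combining the tcpdump options given earlier in the thread with the host filter. The function name and the example interface are placeholders; the command is only built, not executed.

```python
from typing import List, Optional


def tcpdump_cmd(iface: str, outfile: str, host: Optional[str] = None,
                snaplen: int = 128) -> List[str]:
    """Build a tcpdump command line, optionally limited to one remote host."""
    cmd = ["tcpdump", "-i", iface, "-s", str(snaplen), "-w", outfile]
    if host:
        cmd += ["host", host]   # pcap filter: traffic to or from this host only
    return cmd


if __name__ == "__main__":
    print(" ".join(tcpdump_cmd("eth0", "16flowscake.cap",
                               host="dallas.starlink.taht.net")))
```

A 128 byte snap length keeps the headers and drops most of the payload, which keeps the capture files small.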

This is the correct sort of carnage that cubic does, there's retransmits, dup acks, out of order stuff - strangely comforting after puzzling over that last capture all night!
correct_cubic_carnage.png
The xplot equivalent of this plot is prettier (IMHO), and in either case, if you zoom in, you can see the "sack" blocks in the bottom line, showing loss and recovery.

With ECN enabled you won't see sacks (except when there is actual loss); you'll see CEs and CWRs instead.

PS I'm not too concerned about the performance dropoff at 32 flows in that we are pounding the link flat and loss going up geometrically, however if it returns or gets worse... my concern was some sort of memory leak hurting the mikrotik box, well I had a lot of concerns! Thx so much for the exhaustive testing, again.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 1:59 am

That's MUCH more correct looking, thank you!

Next, to see if ecn is working properly, you can run the exact same test series, but with:

sudo sysctl -w net.ipv4.tcp_ecn=1

I use ecn primarily as an AQM debugging tool (given how rarely it's turned on in the field) and for all I know (without seeing the capture) you had it on just now.
WOOHOO! =) Glad we are moving in the right direction now, and you are a mind reader, I sure did.. it was set to 2.

I have just set it to 1, and here are the results..
tcp_nup_-_cake_4up.png
tcp_nup_-_cake_8up.png
tcp_nup_-_cake_16up.png
tcp_nup_-_cake_32up.png
tcp_nup_-_cake_32up1.png
Screenshot from 2021-12-13 17-54-58.png
Screenshot from 2021-12-13 17-55-10.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:04 am

Once again, I forgot the capture for that last round with ECN = 1 HAH! Here ya go

https://drive.google.com/drive/folders/ ... sp=sharing
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:11 am

Thank you for the packet capture. You can, btw, filter out all your other traffic by specifying "host dallas.starlink.taht.net"

This is the correct sort of carnage that cubic does, there's retransmits, dup acks, out of order stuff - strangely comforting after puzzling over that last capture all night!

correct_cubic_carnage.png

The xplot equivalent of this plot is prettier (IMHO), and in either case, if you zoom in, you can see the "sack" blocks in the bottom line, showing loss and recovery.

With ECN enabled you won't see sacks (except when there is actual loss); you'll see CEs and CWRs instead.

PS I'm not too concerned about the performance dropoff at 32 flows in that we are pounding the link flat and loss going up geometrically, however if it returns or gets worse... my concern was some sort of memory leak hurting the mikrotik box, well I had a lot of concerns! Thx so much for the exhaustive testing, again.
Ahh yes, filtering the other flows is a good idea! I have used wireshark at an elementary level.. first time using xplot, I must use it and learn it better! Well, both tools for that matter. That is pretty awesome being able to see the sack blocks. So that first trace was with ECN=2 which was out of the box on this install.. and this last run was with ECN=1. I agree with ya on the 32, I just figured I would throw it in there on these runs to see what happens. It is not crushing as bad as before though, so that is cool too.

So you mentioned the cubic carnage, please excuse my ignorance, but I assume that is the best out of any we could use? I take it at least that it is better than BBR? Or maybe better said.. 'better' in this particular environment. I know some tools are better than others depending on the use case.

You are very welcome, glad I can be of some help with data at the least since you are educating me in the process!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:18 am

Nope. We failed to negotiate ecn. (In the packet capture the syn had the ecn flags, ece and cwr, set; the syn/ack didn't. Could be the modem, could be failure to read the dscp field properly on the mikrotik, could be my server. Will check the server as soon as I remember the password.)
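The check itself is simple enough to sketch, following the handshake rules in RFC 3168: the SYN must carry both ECE and CWR, and the SYN/ACK must answer with ECE set and CWR clear. The function name is hypothetical; the inputs are the raw TCP flag bytes as wireshark shows them.

```python
# TCP flag bits as they appear in the TCP header's flags byte
SYN, ACK, ECE, CWR = 0x02, 0x10, 0x40, 0x80


def ecn_negotiated(syn_flags: int, synack_flags: int) -> bool:
    """Return True if a SYN / SYN-ACK pair completed ECN setup (RFC 3168)."""
    syn_ok = (bool(syn_flags & SYN) and not syn_flags & ACK
              and bool(syn_flags & ECE) and bool(syn_flags & CWR))
    synack_ok = (bool(synack_flags & SYN) and bool(synack_flags & ACK)
                 and bool(synack_flags & ECE) and not synack_flags & CWR)
    return syn_ok and synack_ok


if __name__ == "__main__":
    # The case described above: ECN-setup SYN, plain SYN/ACK in reply.
    print(ecn_negotiated(SYN | ECE | CWR, SYN | ACK))
```

A failed negotiation like this one cannot say by itself which hop is at fault; it only confirms that the flows ran without ECN.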

But comforting that the result was essentially the same. Leave ecn on (since it fails to negotiate anyway)... Now that we have a consistent setup, could you repeat the SFQ test and the fq-codel test, same scenarios, if you aren't too wiped out? I'll go check to see some stuff on the server.

tcpdump with the host

Anyway, with that as a baseline, and if you have energy, the download tests would also be good (but let's shy away from anything involving qos dscp for now, so if you really go nuts, rrul_be doesn't test that). I'm going to make some dinner...
Last edited by dtaht on Tue Dec 14, 2021 2:32 am, edited 1 time in total.
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:31 am

In order for me to look at that machine (ecn neg might be disabled) I will need to shut it down and put a new password on it. Anyway, if yer still testing, let me know when done.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:34 am

Go ahead and do what you need to do. I need to go eat dinner myself. I will check back here before any further testing!
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:49 am

To summarize a few things. Yesterday we ended up in a state where a bunch of flows weren't even going through the host at the right rate, so we weren't stress testing the qdisc, and thus not seeing any difference in latency between the three different qdiscs under test. It was seeing SFQ act the same as all the other ones as we added load that made me scratch my head - and you blow your machine away entirely! thx. It felt pretty good to me, too. :) Anyway... I'm sitting here overfocused on making sure the mikrotik is working right, and whilst I am VERY interested in captures and BBRv1 and BBRv2 behavior, in the context of this thread I just want to make sure mikrotik has got these new qdiscs working exactly right. My long term goal is that fq-codel in particular go on by default on all interfaces in this or some future mikrotik release...

"So you mentioned the cubic carnage, please excuse my ignorance, but I assume that is the best out of any we could use? I take it at least that it is better than BBR? Or maybe better said.. 'better' in this particular environment. I know some tools are better than others depending on the use case."

I'm enjoying very much sharing my tcp knowledge with you whilst you test. I might end up giving some reading assignments though...

tcp is designed, in the end, to be able to reliably carry packets via any means or combination of circumstances possible, as per rfc2549, which is a good read.

So when I said "cubic carnage" I was mostly being alliterative. I've seen MUCH MUCH worse, and was actually expecting significant episodes of reordering from the bonded link, but didn't see any. Anyway, by eyeball that was the correct behavior of cake and cubic together.

As you pound more and more flows through a link (or you have a shorter and shorter buffer), we start hitting another phase of tcp (slow start and congestion avoidance are what I usually talk about, but I do allude to this in my apnic talk): we lose so many packets that we trigger tail loss and a 250ms RTO ("hello, are you still there?"), which is an even more extreme form of congestion control (it completely resets the tcp window also). This is probably the cause of the ever-growing "long tail" above the 99th percentile of the cdf plot as you add more and more flows. Add 64 flows, 128 flows, eventually flows won't even be able to get started...
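
To make the RTO part concrete, here is a rough sketch of the standard retransmission timer arithmetic from RFC 6298 (smoothed RTT plus four times the RTT variance, doubled on every timeout). The class name is invented, and the 250 ms floor is simply the figure quoted above passed in as a default; real stacks pick their own minimum, so treat it as an assumption.

```python
class RtoEstimator:
    """RFC 6298 style retransmission timeout estimator (illustrative only)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance

    def __init__(self, min_rto=0.25, max_rto=60.0, granularity=0.001):
        self.min_rto = min_rto          # assumption: the 250 ms quoted in the post
        self.max_rto = max_rto
        self.granularity = granularity  # clock granularity G
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                  # RFC 6298 initial RTO before any sample

    def on_rtt_sample(self, rtt):
        """Feed one RTT measurement (seconds) and recompute the RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, 4 * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)

    def on_timeout(self):
        """Exponential backoff: every expiry doubles the timer."""
        self.rto = min(self.rto * 2, self.max_rto)


if __name__ == "__main__":
    est = RtoEstimator()
    est.on_rtt_sample(0.050)  # a 50 ms path
    print(est.rto)            # the floor applies on a short path
    est.on_timeout()
    print(est.rto)            # doubled after one tail loss
```

On a short path the floor dominates, which is why a flow that loses its last packets sits idle for far longer than one RTT before it tries again.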

This was pretty good: https://blog.apnic.net/2018/03/19/strik ... cillation/
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 2:56 am

I have a decent understanding of TCP, and I understand what you are saying about the tail loss, especially from the video I found the other day with you using the people as packets. I just pulled up the RFC and the other link. Awesome! I have some homework tonight =)
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 3:22 am

OK, it's back up. ECN neg is enabled (but the bits could be getting washed out on the path, OR I'd disabled it on the previous boot).

To go to your BBR vs cubic question. :lecture mode:

TCP reno was the "internet standard" for a long time. It had a "sawtooth", and an initial window of 2, and couldn't scale past some X mbits

Circa 2006-2008 a bunch of things happened - the linux txqueuelen went from 100 to 1000, TSO (up to 42 packets in a single offload) appeared, window "scaling" started to deploy, linux switched to tcp cubic, and wifi added packet aggregation...
The first was just... dumb; the second, a desperate attempt to make tcp saturate a wire better against weak cpus at the time (which it did); window scaling, to make TCP scale to gbits and beyond; and cubic looked and was faster to grab bandwidth while seemingly doing no harm, because problems 1, 2, 3 were not well understood yet, and wifi aggregation not at all.
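
A small sketch of why reno "couldn't scale" and why cubic grabs bandwidth faster, assuming the textbook models: reno halves its window on loss and adds one segment per RTT, while cubic follows the window curve from RFC 8312 with the constants given there (C = 0.4, beta = 0.7). This is not the kernel's code, and the helper names are made up.

```python
def reno_rtts_to_recover(w_max):
    """Reno halves cwnd on loss and adds one segment per RTT,
    so it needs w_max / 2 round trips to get back to w_max."""
    return w_max / 2


def cubic_k(w_max, c=0.4, beta=0.7):
    """Seconds CUBIC takes to return to w_max after a loss (RFC 8312)."""
    return (w_max * (1 - beta) / c) ** (1 / 3)


def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC window (in segments) t seconds after the last loss."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max


if __name__ == "__main__":
    w_max = 1000.0  # segments in flight when the loss happened
    print("reno: round trips to recover:", reno_rtts_to_recover(w_max))
    print("cubic: seconds to recover:", cubic_k(w_max))
```

The point of the comparison is that reno's recovery time grows linearly with the window and scales with the RTT, while cubic's grows with the cube root of the window and is measured in wall-clock time.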

To compound things further, Linux went to IW10 to make the web server folk happy in 2010 ( https://tools.ietf.org/id/draft-gettys- ... ul-00.html ) ... everyone added more buffering to the modems ... failed to understand what bittorrent's real problem was...

And then we started noticing that classic voip and videoconferencing apps like skype were not working anymore. Enter jim gettys, having his kids yell at him for transferring files to mit: https://gettys.wordpress.com/category/bufferbloat/ and me, in nicaragua, scratching my head as to why my internet radio, which had worked for years, had stopped working: http://the-edge.taht.net/post/Did_Buffe ... Net_Radio/

Anyway there's a lot of ranting between 2011 and 2021 I'll elide. BBR emerged from youtube's struggle to find a way to deliver data reliably whilst not overbuffering overmuch (fast forward and reverse this helps with), and it's a *perfect* transport for streaming a single tcp session of recorded video like netflix (except they thus far haven't made BBR work well on bsd). BBR is better in many respects than cubic, especially if it is FQed (where it mostly lives in its delay based regime), but it has some unpleasant modes where it dukes it out with cubic (to win), and it has trouble competing with itself (ideally, where we use a sharded website today with 110+ different connections, we'd switch to *one* BBR connection back to the "mainframe"). It doesn't presently respect RFC3168 ECN, or gentle packet loss, and it has its own model of the network that is, by god, superior to yours; your efforts to police or limit traffic so that applications you like, like games, work well are not relevant to that world picture of endless advertisements poured past your eyeballs.

https://queue.acm.org/detail.cfm?id=3022184

Despite my cynicism, I don't like cubic either - reno, what was so wrong with reno, and IW2? I ask. Apple just went IW10 too, with offloads, and those are not the right things for clients, either, IMHO... the internet is a communications network, not a tv... I'm really happy that the pending HTTP3 standard specifies reno, as it is jumping onto udp, where all our request/response and voip/videoconferencing protocols reside. Even then, most days I think inbound shaping with fq-codel is the only way to keep all the applications besides web traffic working for everyone.

Anyway, BBRv2 is better than BBRv1, and I'm delighted to see new people trying the shiny stuff in circumstances where the designers didn't think about much, like on a 19Mbit fq-codeled link. And finding bugs. They like packet captures too and sometimes listen to reason. And I'm grouchy and it's time for dinner. I look forward to the rest of the tests!
Last edited by dtaht on Tue Dec 14, 2021 3:39 am, edited 1 time in total.
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 3:39 am

WOW, thank you for all of that! I have many tabs in my browser to read now =) That is interesting that these other mechanisms work great for big fat single flows.. and not others. As if the devs never talked to anyone else to see what the end user might do other than just watch videos all day! ;)

Reminds me of being a QA guy.. hence why the devs always hated me.. because I would find the most ridiculous bugs, but after regression and happy path testing, I would hit it like an end user would. What do you mean this input field that SHOULD only be alphanumeric completely crashes everything when you put @#%&@#% in it?!

My brain buffer is bloated HAH! Like my brain is an old dialup BBS and your posts are like a 1gbit link! OK, enough of the dad jokes I suppose.. back to science! Off to run my tests, will report back in a bit!
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 3:52 am

OK, I am going to post SFQ and FQ_Codel separately so it doesn't get confusing..

Link to packet capture of 16 flows SFQ with ECN: https://drive.google.com/drive/folders/ ... sp=sharing

FQ test coming up in a few
tcp_nup_-_sfq_ecn_total.png
tcp_nup_-_sfq_ecn_4up.png
tcp_nup_-_sfq_ecn_8up.png
tcp_nup_-_sfq_ecn_16up.png
tcp_nup_-_sfq_ecn_32up.png
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 4:01 am

And here are the FQ_Codel with ECN tests and packet capture:

https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_fq_codel_ecn_totals.png
tcp_nup_-_fq_codel_ecn_4up.png
tcp_nup_-_fq_codel_ecn_8up.png
tcp_nup_-_fq_codel_ecn_16up.png
tcp_nup_-_fq_codel_ecn_32up.png
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 4:41 am

Random thought.. I didn't answer your question earlier.. I do have qdiscs on this machine.. fq_codel in fact.. here is what is there by default..

qdisc mq 0: dev enp11s0 root
qdisc fq_codel 0: dev enp11s0 parent :c limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :b limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :a limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :9 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :8 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :7 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :6 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :5 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp11s0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
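
If you want to compare settings like these across machines, a small parser helps. This is only a sketch that handles the plain-text lines shown above; the set of value-less flag words is my assumption based on this output alone, and other qdiscs print keywords it does not know about. (The first post in this thread notes that tc also supports json output, which is the sturdier route when it is available.)

```python
# Keywords that appear without a value in the output above (assumption:
# only what is visible in this thread; other qdiscs have more).
FLAG_WORDS = {"root", "ecn"}


def parse_qdisc_line(line):
    """Turn one 'qdisc ...' text line into a dict of its settings."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "qdisc":
        raise ValueError(f"not a qdisc line: {line!r}")
    info = {"kind": tokens[1], "handle": tokens[2]}
    rest = iter(tokens[3:])
    for word in rest:
        if word in FLAG_WORDS:
            info[word] = True
        else:
            # Everything else is treated as a 'key value' pair.
            info[word] = next(rest, None)
    return info


if __name__ == "__main__":
    sample = ("qdisc fq_codel 0: dev enp11s0 parent :1 limit 10240p flows 1024 "
              "quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64")
    for key, value in parse_qdisc_line(sample).items():
        print(key, value)
```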
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 4:51 am

really large string of wtf moments, there. Can you return to cake? or turn off ecn? or both?

your mq - fq-codel might explain some other things, but not this.
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 5:05 am

really large string of wtf moments, there. Can you return to cake? or turn off ecn? or both?

your mq - fq-codel might explain some other things, but not this.
Roger that, back to cake
net.ipv4.tcp_ecn = 0
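
For reference, a minimal sketch of what the three values of that sysctl mean, paraphrased from the Linux kernel's ip-sysctl documentation as I understand it. The helper names are made up, and the /proc path only exists on a Linux box.

```python
TCP_ECN_MODES = {
    0: "disabled: never request ECN and never accept it",
    1: "request ECN on outgoing connections and accept it on incoming ones",
    2: "accept ECN when the peer requests it, but do not request it (usual default)",
}


def describe_tcp_ecn(value):
    """Human-readable meaning of a net.ipv4.tcp_ecn value."""
    return TCP_ECN_MODES.get(value, f"unknown value {value}")


def read_tcp_ecn(path="/proc/sys/net/ipv4/tcp_ecn"):
    """Read the current setting; raises OSError where the file is absent."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        current = read_tcp_ecn()
        print(current, "->", describe_tcp_ecn(current))
    except OSError:
        print("tcp_ecn sysctl not available on this system")
```

This matches what came up earlier in the thread: 2 was the out-of-the-box value, 1 makes the client ask for ECN, and 0 takes ECN out of the picture for the test.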

Data:
https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_cake_no_ecn_totals.png
tcp_nup_-_cake_no_ecn_4up.png
tcp_nup_-_cake_no_ecn_8up.png
tcp_nup_-_cake_no_ecn_16up.png
tcp_nup_-_cake_no_ecn_32up.png
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 5:08 am

I just had an oh crap moment.. before this thread, I had been toying around with other stuff, and just remembered I had put this in place.. I would imagine this has had some sort of effect in between client and router....

Firewall mangle rule in mikrotik..

;;; Set priority for WMM
chain=postrouting action=set-priority new-priority=from-dscp-high-3-bits
passthrough=yes log=no log-prefix=""
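
For context, a sketch of the bit arithmetic that the option name suggests. This is my reading of "from-dscp-high-3-bits" and not a statement about how RouterOS implements it: the old TOS byte holds the 6-bit DSCP in its upper bits and the 2-bit ECN field in its lower bits, and the priority would be the top three DSCP bits. Function names are invented.

```python
def split_tos(tos):
    """Split the IPv4 TOS / IPv6 traffic class byte into (dscp, ecn)."""
    return tos >> 2, tos & 0x3


def priority_from_dscp_high_3_bits(dscp):
    """Top three bits of the 6-bit DSCP, i.e. the old IP precedence (0-7)."""
    return (dscp >> 3) & 0x7


if __name__ == "__main__":
    # 0xB8 is EF (DSCP 46) with the ECN field clear.
    dscp, ecn = split_tos(0xB8)
    print(dscp, ecn, priority_from_dscp_high_3_bits(dscp))
```

Since DSCP and ECN share one byte, anything that reads or rewrites that byte with the wrong mask or shift can end up touching the ECN bits as well.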
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 5:21 am

Well, don't do that then. :O ip is big-endian....

but a good test of fq-codel with ecn disabled would comfort me, first. There should be differences in the overall distribution particularly in the 32 flows test... but throughput should stay flat, not that horrible thing that just happened....
 
mducharme
Trainer
Trainer
Posts: 1740
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: some quick comments on configuring cake

Tue Dec 14, 2021 5:33 am

Sorry to enter this thread with a question that is somewhat less relevant to the discussion and testing that is happening currently (which is all very interesting, even though I understand very little of it).

Is there any effort to have hardware ASIC implementations of codel, fq_codel, or cake? Most of the ASIC queues that are hardware offloaded use something called wred, which is weighted-red. I don't know much about it, but I would strongly suspect it is only a slight improvement over regular red. However, it is hardware offloaded by the ASIC, which is a major advantage at the ISP level if you are concerned about CPU utilization and scalability. A lot of major vendors use wred for all queuing. For me, what would be a killer app for these AQM solutions like cake would be if they could be offloaded to an ASIC and to use them with no regular CPU cost instead of being stuck with only wred. Are there any efforts in this area?
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 5:46 am

Well, don't do that then. :O ip is big-endian....

but a good test of fq-codel with ecn disabled would comfort me, first. There should be differences in the overall distribution particularly in the 32 flows test... but throughput should stay flat, not that horrible thing that just happened....
HAH, 20 lashes! *banging head on wall*

OK, here is a CLEAN test!

CAKE with NO ECN and no dumb mangle rule!

https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_cake_no_ecn_32up_totals.png
tcp_nup_-_cake_no_ecn_4up.png
tcp_nup_-_cake_no_ecn_8up.png
tcp_nup_-_cake_no_ecn_16up.png
tcp_nup_-_cake_no_ecn_32up.png
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:03 am

@mducharme thx for tagging along through this enormous thread. I do hope we prove the 7.1 implementation of these algorithms is solid... mikrotik is very late to this party but can benefit from - for example - all the progress made since docsis-pie was standardized ( https://blog.apnic.net/2021/12/02/worki ... -frontier/ ) and a new improved, worldwide focus on improving network latency in the covid era.

Anyway:

htb + fq-codel has been offloaded into one nic we are publicly aware of: https://forum.openwrt.org/t/validating- ... /111123/24 - fq-codel is also in qcomm's proprietary wifi firmware. I am aware of other efforts but can't talk. I can still vent my opinions, though, like: Offloads are fragile. cpus evolve. http://www.taht.net/~d/broadcom_aug9_2018.pdf - get a bunch of cores with decent cache and no proprietary offloads and see how you do....

It's htb shaping to a non-default line rate that is the expensive part. So I'd like that be the last thing folk were doing rather than the first. :/

fq-codel is *cheap*, and in combination with "bql" backpressure it is the default across most of linux now, including openwrt, and with "aql" on the wifi. Any place you have a fast to slow transition (like 10gige to 1gige), I'd like to see it on ( https://datatracker.ietf.org/doc/rfc7567/ )

My recommendation has generally been to do programmable bql pressure (some intel and mellanox nics do this), or to shape the way cake does, as htb designs get increasingly bursty at modern rates.

There's a p4 version that might make it into a bigfoot derived card and already works on switches. There's the ebpf stuff preseem does and this middlebox https://github.com/rchac/LibreQoS

I do hope for more hardware, line cards, that can programmatically do this right and meet an isp's needs, but as much as I think quality queue management should be top priority, it's still early days from invention to this much deployment. It would be great if it became a killer app, and whoever did it threw the bufferbloat project some stock, in exchange for QA....

Lastly, I do not know how much wred is deployed anymore. 5 tuple FQ - all by itself - seems to be gaining traction.
Last edited by dtaht on Tue Dec 14, 2021 6:17 am, edited 2 times in total.
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:05 am

ok, so if you could don a fire retardant suit, re-enable ecn, and retry cake, and if that looks substantially similar, retry fq-codel?
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:08 am

p4 codel: https://arxiv.org/pdf/2010.04528.pdf

everyone else working on hardware implementations, kind of went dark earlier this year, and stopped returning my emails, I like to think that's a good sign.
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:24 am

ok, so if you could don a fire retardant suit, re-enable ecn, and retry cake, and if that looks substantially similar, retry fq-codel?
AHAHAH roger that.. here we go! Umm.. well, the results are different.. o.O

CAKE with ECN=1

Data: https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_cake_ecn_totals.png
tcp_nup_-_cake_ecn_4up.png
tcp_nup_-_cake_ecn_8up.png
tcp_nup_-_cake_ecn_16up.png
tcp_nup_-_cake_ecn_32up.png
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:30 am

@kevinb361 I was up very late yesterday and will sleep soon. I can live with not knowing ecn works before I wake.:) thx again for going to town on this and making such "interesting" mistakes. It's all data to me, and I think the bug you had on the xanwhatever itwas kernel was rather interesting, as well as the damage seemingly caused by using that iptables rule.
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:34 am

@kevinb361

the ecn result is very disturbing. But it could be mikrotik (a checksum failure or parsing the wrong bits on this encapsulation, which was a bug that I can't remember when we fixed in some release of linux and cake), the modem, the path, something at linode, where my server is. Anyway, fq-codel without ecn would be a good comparison to validate fq-codel is also implemented correctly, and a repeat of the SFQ test without that iptables rule would hopefully also be sane.
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:36 am

@kevinb361 I was up very late yesterday and will sleep soon. I can live with not knowing ecn works before I wake.:) thx again for going to town on this and making such "interesting" mistakes. It's all data to me, and I think the bug you had on the xanwhatever itwas kernel was rather interesting, as well as the damage seemingly caused by using that iptables rule.
Right on, I am about to pass out myself. I just uploaded the CAKE+ECN data in the post before this one.. look at it in the morning, so you can get some real sleep! =P HAH, I am just glad today is over with and all those weird bugs I introduced on my own are gone! HAHA

Chat with ya later!
 
mducharme

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:45 am

Lastly, I do not know how much wred is deployed anymore. 5 tuple FQ - all by itself - seems to be gaining traction.
From my interactions with engineers working for much bigger ISPs than the one I work for (where queuing in software is possible), wred is still the gold standard for most large providers. My understanding is that everything Cisco and Juniper is wred. It can handle huge bandwidth amounts due to offloading to the ASIC, but is almost certainly much worse than any of the newer AQM solutions. I believe those running Cisco and Juniper have no ability to even consider codel or fq_codel or cake on the service provider side.

I suppose the difference is how much these AQM technologies are really designed to be used on the client side vs the service provider side. If I was daring enough to upgrade our core routers to 7.1 (I am not), what would make the difference - should we do cake queues for each customer at the ISP end, or should we deploy local cake queues to their routers, or should we do it on both ends? And if we should do at both ends, then why?

I ask this because most of the documentation I have seen about cake and fq_codel is about what the client should do rather than what the ISP should do. There is a lot of useful information geared towards the regular end user, but very little geared towards ISPs who want to improve services for their users.
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:49 am

@kevinb361

the ecn result is very disturbing. But it could be mikrotik (a checksum failure or parsing the wrong bits on this encapsulation, which was a bug that I can't remember when we fixed in some release of linux and cake), the modem, the path, something at linode, where my server is. Anyway, fq-codel without ecn would be a good comparison to validate fq-codel is also implemented correctly, and a repeat of the SFQ test without that iptables rule would hopefully also be sane.
No rest for the wicked! HAHA

OK here is fq_codel without ECN

Data: https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_fqcodel_no_ecn_totals.png
tcp_nup_-_fqcodel_no_ecn_4up.png
tcp_nup_-_fqcodel_no_ecn_8up.png
tcp_nup_-_fqcodel_no_ecn_16up.png
tcp_nup_-_fqcodel_no_ecn_32up.png
 
kevinb361

Re: some quick comments on configuring cake

Tue Dec 14, 2021 7:00 am

@kevinb361

the ecn result is very disturbing. But it could be mikrotik (a checksum failure or parsing the wrong bits on this encapsulation, which was a bug that I can't remember when we fixed in some release of linux and cake), the modem, the path, something at linode, where my server is. Anyway, fq-codel without ecn would be a good comparison to validate fq-codel is also implemented correctly, and a repeat of the SFQ test without that iptables rule would hopefully also be sane.
Sometimes I wonder if I am a masochist.. HAH ok, last test and then I am gonna go count packets until I pass out! =P

this is sfq without ECN

Data: https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_sfq_no_ecn_totals.png
tcp_nup_-_sfq_no_ecn_4up.png
tcp_nup_-_sfq_no_ecn_8up.png
tcp_nup_-_sfq_no_ecn_16up.png
tcp_nup_-_sfq_no_ecn_32up.png
 
dtaht

Re: some quick comments on configuring cake

Tue Dec 14, 2021 8:04 am

Since your eye is now "trained" for a fairly short rtt, try fremont.starlink.taht.net, or the london, singapore, or sydney .starlink.taht.net servers.

we also have tests for these competing against each other, as in the usual case we are not sending flows to a single server.

SFQ will start to underperform at these longer rtts, and I don't honestly know which of cake or fq-codel will win. SFQ is doing really, really well so far, but i suspect it will go to hell on the rrul_be tests, even on the short rtt to dallas, and the way to test multiple sites is via the -H serverA -H serverB -H serverC -H serverD rtt_fair_var , which is also "interesting" on a fifo.
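
A sketch of how that multi-site run could be scripted, assuming flent's usual command line options (-H for each host, -l for length in seconds, -t for a title). The helper name is made up, and the london, singapore, and sydney hostnames are spelled out from the pattern above, so double check them before relying on this.

```python
import shlex


def flent_command(test, hosts, length=60, title=None):
    """Build the argv list for a flent run against one or more servers."""
    argv = ["flent", test, "-l", str(length)]
    for host in hosts:
        argv += ["-H", host]
    if title:
        argv += ["-t", title]
    return argv


if __name__ == "__main__":
    # Assumption: all four servers follow the <city>.starlink.taht.net pattern.
    servers = [f"{city}.starlink.taht.net"
               for city in ("fremont", "london", "singapore", "sydney")]
    cmd = flent_command("rtt_fair_var", servers, length=60, title="cake no ecn")
    # Print instead of running, so the command can be inspected first.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command rather than executing it keeps the sketch harmless; pass the list to subprocess.run once it looks right.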

And with that, I really am calling it quits for the day. Very reassuring to see non-ecn work. Some backstory - ECN is not an enabled option for any but a few OSX things, or really advanced linux users, so making it misbehave, on this kernel release (and modem! can't rrul that out) coherently and consistently, rules out a ton of mild "background noise" I've had for about a year now.
 
dtaht

the sparse optimization in fq-codel, cake and fq-pie

Tue Dec 14, 2021 4:54 pm

The FQ component of fq_codel, cake, and fq-pie has what we call the "sparse flow optimization". Request/response packets (DNS, syn, syn/ack), the first packet of any new flow, acks, voip and gaming packets usually "fly through" without observing any queuing at all. In this example we have 32 fat flows, and SFQ would have put the thin flow at the end of that queue (which is still a LOT better than FIFO, and I'd like to use one of those runs on future plots). So in this example, at this 19mbit rate and number of flows, we're consistently saving 3ms of latency and jitter.
consistently_ll.png
While that might seem like a small number, your typical web page might issue 100 dns queries, and 100 syns, and the queuing cost for those, vanishes. Some of that gets amortized by how web pages interleave requests, but not all of it, by far.

Also, because these qdiscs judge "sparseness" by bytes (DRR-like, rather than SFQ-like), not packets, and because the uplink acks are pretty small and sparse also, the queuing cost for much of a web page load time (usually the first 10 round trips per flow) also vanishes. We used to do a demo back in 2013 or so, showing a basic upload workload and how much better web pages behaved with fq-codel in place. (setting up a long saturating workload in flent -l 300 rrul_be - and then a web page benchmarker, demo'd to dan york of the internet society here:

https://circleid.com/posts/20130418_buf ... s_can_be/
To be clear, however, a great deal of the benefit in that particular demo, was in also effectively applying AQM in shortening the queues, and not having that giant fifo. Enormous single queued FIFOs must die I thought then, and now, and the benefits of rfc8290 so obvious that we'd be done in a year.
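For readers who want to see the sparse flow optimization in motion, here is a rough Python sketch of the new-flows/old-flows scheduling that RFC 8290 describes. It is a toy, not the kernel code: the class and method names are made up for illustration, there is no CoDel, no hashing, and no packet or memory limit, and the 1514 byte quantum is just an assumed value.

```python
from collections import deque


class Flow:
    """One queue per flow, with a byte deficit as in DRR."""

    def __init__(self, name):
        self.name = name
        self.packets = deque()   # packet sizes in bytes
        self.deficit = 0
        self.scheduled = False   # True while on the new or old list


class SparseFlowScheduler:
    """Toy byte-based DRR with a 'new flows' list served before 'old flows'."""

    def __init__(self, quantum=1514):  # assumed quantum, in bytes
        self.quantum = quantum
        self.flows = {}
        self.new_flows = deque()
        self.old_flows = deque()

    def enqueue(self, name, size):
        flow = self.flows.setdefault(name, Flow(name))
        flow.packets.append(size)
        if not flow.scheduled:
            # A flow that was idle counts as "sparse": it joins the new list
            # with a fresh quantum and is served ahead of every old flow.
            flow.scheduled = True
            flow.deficit = self.quantum
            self.new_flows.append(flow)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            from_new = bool(self.new_flows)
            active = self.new_flows if from_new else self.old_flows
            flow = active[0]
            if flow.deficit <= 0:
                # Used up its bytes for this round: refill, go to the back
                # of the old list.
                flow.deficit += self.quantum
                active.popleft()
                self.old_flows.append(flow)
                continue
            if not flow.packets:
                active.popleft()
                if from_new and self.old_flows:
                    # An emptied new flow takes one pass through the old list
                    # so it cannot starve the fat flows.
                    self.old_flows.append(flow)
                else:
                    flow.scheduled = False
                continue
            size = flow.packets.popleft()
            flow.deficit -= size
            return flow.name, size
        return None


sched = SparseFlowScheduler()
for fat in ("a", "b", "c"):
    for _ in range(10):
        sched.enqueue(fat, 1500)
for _ in range(6):
    sched.dequeue()              # the fat flows use up their first quantum
sched.enqueue("dns", 80)         # a thin flow shows up late
print(sched.dequeue())
```

In this toy run the late 80 byte "dns" packet is dequeued ahead of the 24 bulk packets still waiting, which is the "fly through" behavior described above. Because the deficit is counted in bytes, a flow of small packets stays "sparse" much longer than it would if packets were counted.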
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:07 pm

OK, going to try and break this down in chunks as best as possible.

This round is rtt_fair_var on cake with dallas, fremont, london, singapore, and sydney

Data: https://drive.google.com/drive/folders/ ... sp=sharing
rtt_fair_var_-_cake_fair_totals.png
rtt_fair_var_-_cake_fair_total_cdf.png
rtt_fair_var_-_cake_fair_4.png
rtt_fair_var_-_cake_fair_8.png
rtt_fair_var_-_cake_fair_16.png
rtt_fair_var_-_cake_fair_32.png
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: the sparse optimization in fq-codel, cake and fq-pie

Tue Dec 14, 2021 6:22 pm

This reminds me of many years ago when I first got into messing with this stuff. I would have set up queues using RED, I believe? I don't remember anymore.. but anyhow, I would give ACK and DNS top priority with a guaranteed bucket size of whatever. I wish I had kept those configs so I could look back and see how I used to do it. Thank you for the knowledge, the pieces are starting to make more sense now. I have a ridiculous number of tabs open now, and am slowly going through them reading ;)

That is a great video, and funny, no lie.. the other day I took a screencap with OBS of me just opening tabs and going to different sites after clearing my browser cache and DNS cache, to show my son and brother how 'snappy' my internet is.

NOTE here for others, look into pihole for your DNS. Local DNS caching is a huge plus, especially, like Dave said, with typical websites nowadays resolving so many domains per page. Plus, it is an excellent network wide ad blocker! I currently run two VMs with pihole, as well as a separate unbound recursive DNS server that they point to.

I have since set up whatever ubiquiti's default simple queue is on their USG.. I believe it is fq_codel? He is amazed, and his facetime video is super clear. He is on a 25/5 cable modem, so that made an enormous improvement for them. As he said, "I can stream netflix and game at the same time now!" hehe

My brother, on the other hand, we just put his new router in yesterday.. I am still not totally sure what his bandwidth is provisioned at. We only had a few minutes to play with cake but I believe I got him in a decent ballpark to start. He is around 400/40. His speeds fluctuate greatly, even when setting bandwidth. Now granted, that was all testing with web tools.. so I asked him to spin me up a VM on his server over there so I can run flent..
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 6:34 pm

Here is the next one.. be back in a bit to do fq_codel, gotta jump on a call for a few.

SFQ rtt_fair

Data: https://drive.google.com/drive/folders/ ... sp=sharing
rtt_fair_var_-_sfq_fair_totals.png
rtt_fair_var_-_sfq_fair_cdf_total.png
rtt_fair_var_-_sfq_fair_4.png
rtt_fair_var_-_sfq_fair_8.png
rtt_fair_var_-_sfq_fair_16.png
rtt_fair_var_-_sfq_fair_32.png
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Tue Dec 14, 2021 8:59 pm

To restore your eyeball to what the current "real world" looks like for everyone else, try that rtt_fair test with all this fancy schmancy stuff off, just the default fifo on the modem. Your situation is different from that 2013 demo in that you have a vastly shorter queue than the 250+ms queue of the cable modems of the time, and the linux tcp stack has also improved greatly (with packet pacing)....

Another visual trick is putting those sites in your hosts file so you can just say -H sydney -H singapore etc on the command line instead of sydney.starlink.taht.net, so it's more readable.
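If you end up scripting these runs, something like the following Python sketch keeps the command lines consistent. The helper name is made up; it only assembles an argument list using flent's -H (host), -l (length in seconds) and -t (extra title text) options, and it assumes the short names below already resolve through your hosts file.

```python
import shlex


def flent_command(test, hosts, length_s=60, title=None):
    """Build a flent argument list with one -H per server (hypothetical helper)."""
    argv = ["flent", test, "-l", str(length_s)]
    for host in hosts:
        argv += ["-H", host]
    if title:
        argv += ["-t", title]
    return argv


# Short names as they would appear in a hosts file (assumed to be set up).
sites = ["dallas", "fremont", "london", "singapore", "sydney"]
cmd = flent_command("rtt_fair_var", sites, length_s=300, title="cake")
print(shlex.join(cmd))
```

Pass the list to subprocess.run(cmd) to actually launch the test; printing it first makes it easy to check that each qdisc gets exactly the same set of servers.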

I should also note that the "starlink" subdomain is just the name of the linux 5.11 kernel cloud I'd created to test starlink stuff, and has nothing to do with starlink (with whom I have a non-relationship presently - amusing story of my encounter with them here: https://www.youtube.com/watch?v=c9gLo6Xrwgw starlink data here: https://docs.google.com/document/d/1puR ... QKblM/edit ). I hope they fix the dishy at some point, and their router...

I have an older cloud named "apple", and an even older one, named "comcast", and I keep them running primarily so I can verify changes in host device drivers and tcp stacks over time.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 12:32 am

OK, finally at it.. been another busy day but so far have made some great progress with cake on my brother's cable modem! 400/40 is what its real speed appears to be. Also got wireguard set up between us so that I can SSH into a VM on his end to do my testing and rsync the data here to analyze. Will do more later, now that the wife and kids are there screwing up my data with all their streaming ;)

Anyhow... here is a test with no queue, whatever the modem is doing..

Data: https://drive.google.com/drive/folders/ ... sp=sharing
rtt_fair_var_-_no_queue_rtt_fair_cdftotals.png
rtt_fair_var_-_no_queue_rtt_fair_totals.png
rtt_fair_var_-_no_queue_rtt_fair_4.png
rtt_fair_var_-_no_queue_rtt_fair_8.png
rtt_fair_var_-_no_queue_rtt_fair_16.png
rtt_fair_var_-_no_queue_rtt_fair_32.png
Last edited by kevinb361 on Wed Dec 15, 2021 12:48 am, edited 1 time in total.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 12:35 am

@mducharme I forked your question over here, so it doesn't get lost. viewtopic.php?t=181289

I'm a little busy today, I'll try to get back on it tonight or tomorrow.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 12:46 am

here is the final rtt_fair_var with fq_codel

Data: https://drive.google.com/drive/folders/ ... sp=sharing
rtt_fair_var_-_fqcodel_rtt_fair_totals.png
rtt_fair_var_-_fqcodel_rtt_fair_cdftotal.png
rtt_fair_var_-_fqcodel_rtt_fair_4.png
rtt_fair_var_-_fqcodel_rtt_fair_8.png
rtt_fair_var_-_fqcodel_rtt_fair_16.png
rtt_fair_var_-_fqcodel_rtt_fair_32.png
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: the sparse optimization in fq-codel, cake and fq-pie

Wed Dec 15, 2021 12:54 am

I ported fq_codel to the edgerouters over a weekend ( https://gettys.wordpress.com/2017/02/02 ... fferbloat/ ). Their userbase leapt all over it, wrote the backend configuration language, the gui, and a wizard, and then ubnt ultimately adopted it in the next version of their OS, calling it "smart queues", in reference to the "smart queue management" spec. (It's since been renamed to esq.)

One nice fq-codel thing is that you can run multiple netflix flows at the same time and have them hold at roughly the same rate and with consistent quality in competition with other traffic.

Yes, I don't trust web tests very far. Thx for adopting flent.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 12:57 am

To verify - presently you have cake 100mbit on the download, and were varying the upload qdisc?

And when you tested "the bare modem" both were off?
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: the sparse optimization in fq-codel, cake and fq-pie

Wed Dec 15, 2021 1:26 am

Oh wow! That is awesome! I have an old edgerouter around here somewhere. I think the SD card or whatever in it is corrupt. I found out how I can revive it.. but just haven't had the need to. I need to put that on my todo list. I remember seeing somewhere that someone was able to get OpenBSD + pf running on it. That would be pretty neat. I love pf.

He ran the edgerouter for a while until it went tits up, then he replaced it with a USG, but they were not beefy enough to get full speed out of it when using queues. Dad wanted to see the big numbers, ugh.. but now he is just using the factory whatever router from his new fiber ISP. I have not touched it.. but I gave the USG to my son, and he is on such a small pipe that the queue doesn't lower the speed, and it is running GREAT now!

I don't trust the web sites either.. but trying to explain web tests versus flent to the family is like pulling teeth. My brother is seeing the difference since he finally gave me a VM to use to test. Now if I can just talk my dad into letting me set up a new router ;)
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 1:29 am

cake 100 on download, and NOT varying the upload, it was set at 19M

both were off when testing the bare modem
Screenshot from 2021-12-14 17-28-39.png
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 1:34 am

The download component of your test looks a touch odd to me; I asked above what it was set to.

Also, the --te=upload_streams parameter has no function on the rtt_fair tests; they generate one stream per -H server option.

Here's where fq-codel begins to pull ahead of SFQ in a couple respects. Your baseline RTT is about 28ms to dallas, and over 250ms to the furthest server on the list. A design goal of TCP was to have it be ultimately (after running for a while) "fair" to flows of vastly different distances, so that you could transfer data from dallas to fremont, and from dallas to sydney, simultaneously and be sure that you'd have at least some throughput at the longer RTTs. This goal was actually inherent in why IP took over from novell's IPX, because the IPX folk hadn't thought about this hard enough.

It is still "just a goal" that is never fully met, but it tends to degrade fairly gracefully; every TCP paper you read tries to express how flows might converge more or less fairly, over time, at different round trips.

Nowadays, more and more data is moving to the datacenter closest to you, and in the cable case, perhaps you'd be 12ms away from my server, and in the fiber case, 2ms. With a naive design for TCP/IP the odds are good that that "local-ish" traffic would completely starve out longer distances, and indeed it can be quite unfair to more distant flows. 7 or 8x differences in throughput at 10x RTT differences are fairly common.

But! Sydney is quite possibly still a really needed destination for your traffic, so... what do you do? I'm pretty old fashioned in terms of my aims for low latency and equal throughput... and at every point, although we optimized for RTT relentlessly in the design of fq-codel, we also aimed to improve on that "some" bandwidth that more distant flows could get. With codel, maybe better than 1/7x, we didn't know...

Now, with really short fifo queues, and with sfq's really short queues, tcp generally cannot get enough runway to send a BDP's worth of traffic to more distant coasts, so you see the short RTT getting 10mbits of uplink bandwidth here:
rtt_fair_var_-_sfq_dl.png
fq-codel, on the other hand, strives to give "enough" buffering for more distant sites to get a much more nearly fair share of the bandwidth.
rtt_fair_var_-_fqcodel_dl.png
The relentless drive to move CDN resources closer and closer to you is a good thing (shorter RTTs make for more responsive web traffic in particular), but my design goal for fq-codel was to be able to connect equally to all people, near and far, and their services of all sorts, be it email, or chat, or web or voip, regardless of how distant they were.

And we didn't get 1/7th the bandwidth at 10x the RTT! We knocked it out of the park, with nearly equal throughput no matter how near, or how far. (TCPs improved also.)
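To put rough numbers on the "runway" a distant flow needs, here is a small Python sketch of the bandwidth-delay product, using the 19Mbit uplink and the approximate 28ms and 250ms RTTs mentioned in this thread. The function names are illustrative, and it counts whole 1500 byte packets with no framing overhead, so treat the output as a ballpark.

```python
def bdp_bytes(rate_bps, rtt_ms):
    """Bandwidth-delay product in bytes (integer math, rate in bits/s)."""
    return rate_bps * rtt_ms // 8000


def full_size_packets(nbytes, packet_bytes=1500):
    """Number of full-size packets needed to hold nbytes, rounded up."""
    return -(-nbytes // packet_bytes)


UPLINK_BPS = 19_000_000  # uplink shaper rate quoted in the thread

for site, rtt_ms in (("dallas", 28), ("furthest server", 250)):
    bdp = bdp_bytes(UPLINK_BPS, rtt_ms)
    print(f"{site}: {bdp} bytes in flight, about {full_size_packets(bdp)} packets")
```

A single flow to the far server has to keep roughly nine times as much data in flight as the dallas flow to fill the same pipe, and a very short queue that drops during its ramp-up makes that hard to reach. That is one way to read the difference between the two plots above.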
Last edited by dtaht on Wed Dec 15, 2021 4:28 am, edited 1 time in total.
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 1:50 am

Outstanding! It is obvious that fq_codel is working as designed! I can agree with you on the 'old fashioned' way of thinking. For example, I am a linux sysadmin by trade and I administer a few hundred servers spread across the US. I could see where this would be a relevant argument: OK, across town I could get 10ms RTT, but to Washington it could be, say, 200ms RTT... and I would not be happy if I am dropping ssh traffic to Washington because I am pushing a lot of data to a server across town.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 4:25 am

cake on the edgerouter: https://community.ui.com/questions/Cake ... c755cae8a2
cake on the udm pro: https://github.com/fabianishere/udm-kernel

The whole bufferbloat project is full of hackers desperate to have low latency bandwidth and willing to go to extraordinary lengths to get better queue management running. If routerOS had had a devkit available.... :/

Since your brother is up and running, could you try the upload string of tests there with fq_codel on and ecn enabled? That would rule out parts of that path, and my server, at least.

I think the device he has is not capable of much more than 200Mbit inbound shaping, but I could be wrong. The udm pro can do about 700. Also, usually I just reflash most ubnt gear to openwrt. The edgerouter X's are nice little boxes in particular, and ubnt seems to have mostly abandoned edgeOS. VyOS is still alive and has long had smart queues in it. I have reflashed much mikrotik gear as well, but I actually rather like routerOS, and have merely been wishing for 6+ years that they'd get the 300 lines of code that fq_codel is into it, and on by default.
 
dtaht
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 5:11 am

Let me tackle the download portion of the test. :rant: *nobody*, for some reason, tests uploads, downloads, and ping simultaneously, as if people just sat there, did an upload, waited, then did a download, and then did a ping. It's a really bothersome aspect of almost all the web tests today. Real traffic, from multiple people and their devices in a household or business, is in both directions, all the time. Your network should degrade gracefully when there is traffic up, down, or both at the same time. While the rrul test series is patterned on bittorrent, which once upon a time ruled the world, we still didn't test networks for what torrent was really doing to them, in the light of some future world that had way more devices on it, more or less behaving as badly or worse than torrent did. :End of rant: See bofh for more...

Anyway, your provider's network represents a pretty good compromise of packet, not byte limits, on both sides. If you must have a FIFO, byte fifos are better because acks eat 1/15th the space data does; with packet limits, a ton of acks in one direction or another crowd out the data packets. Bytes are a rough proxy for time, as it takes roughly the same amount of time to transmit 15 64 byte acks as a 1500 byte data packet. You had about, I don't remember now, 80ms worth of buffering for big packets on the down. And yes, so long as cake's ack-filter is off, I can do the math from the rrul test results for the packet limit that actually represents, pretty accurately, but I'll leave that as an exercise for the reader. But anyway, on the down, this time, you have a ton of acks from the up clogging up that queue, and your download is now rate limited to 50Mbits by the upload. (If these packet limits were oversized, your upload would be limited by the download.)
noqueue_dl.png
SFQ is pretty similar here, but a bit more biased towards the shorter RTT.
sfq_dn.png
(I'm assuming above you used sfq or noqueue in the inbound shaper.)

Please note that both of these behaviors are actually a pretty good thing, in that the user perceptible *latency* is gone, because bytes=time: your download slowed down gracefully, and your up is underbuffered. So... win, right?
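For anyone who wants to try the "bytes=time" arithmetic, here is a hedged Python sketch. The 100Mbit rate is only an assumption for illustration (the bare modem's real rate isn't stated here), the function names are made up, and per-packet framing overhead is ignored, so the exact ack-to-data ratio on a real link will differ.

```python
def packet_limit_for_delay(delay_ms, rate_bps, packet_bytes=1500):
    """How many packets of packet_bytes add up to delay_ms of queue at rate_bps."""
    return delay_ms * rate_bps // (8 * 1000 * packet_bytes)


def queue_delay_ms(packets, packet_bytes, rate_bps):
    """Time in ms to drain `packets` packets of packet_bytes at rate_bps."""
    return packets * packet_bytes * 8 * 1000 / rate_bps


RATE_BPS = 100_000_000   # assumed downlink rate, for illustration only
limit = packet_limit_for_delay(80, RATE_BPS)

print("packet limit for ~80ms of 1500 byte packets:", limit)
print("drain time if full of data packets (ms):", queue_delay_ms(limit, 1500, RATE_BPS))
print("drain time if full of 64 byte acks (ms):", queue_delay_ms(limit, 64, RATE_BPS))
```

A packet-counted fifo that holds about 80ms of full-size packets holds only a few ms of acks, yet a burst of acks can still use up every slot. A byte-counted limit keeps the worst-case delay about the same whatever the mix of packet sizes.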

Or... you could have a network capable of running at 100Mbit down, 19Mbit up, all the time, with no latency, either:
fqcodel_dl.png
Despite this being better, it appears to my eye that you were running out of queue on the down, judging by the synchronized drops, which could be hitting a limit at the provider or... is there a 1000 packet limit or memory limit? Cake scales this correctly for you on the down, or should. In fq-codel we should have ripped out the packet limit long ago....

While this is a good result... 2x better than the default, without that sync'd drop it too would have ultimately converged nearer to equal bandwidth for all. Cubic is still too aggressive, so it would take a while....
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 5:57 am

Luckily my brother's connection is the one with the Mikrotik RB5009, the same router I am currently running here. The USG is at the boy's house, where I don't have anything set up yet to connect remotely. Hopefully soon.

Some context on this test.. it is a Mikrotik RB5009 on a cable modem which appears to be ~400/40 from my limited testing so far. I am running these tests in an Ubuntu 20.04 VM. It has 4GB of RAM and 4 cores, so it should not have any resource issues. I am ssh'd into it through a wireguard tunnel, which appears to be using very little bandwidth in total.

Currently I have the upload bandwidth set to 42M (lowest RTT times were at this setting earlier) and download is not limited.

Here are the results of fq_codel with ecn enabled doing a tcp_nup test

Data: https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_fqcodel_ecn_totals.png
tcp_nup_-_fqcodel_ecn_4up.png
tcp_nup_-_fqcodel_ecn_8up.png
tcp_nup_-_fqcodel_ecn_16up.png
tcp_nup_-_fqcodel_ecn_32up.png
 
kevinb361
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 6:09 am

Let me tackle the download portion of the test. :rant: *nobody* for some reason, tests up and downloads and ping simultaneously, as if people just sat there, did an upload, waited, then did a download, and then did a ping. It's a really bothersome aspect of almost all the web tests today. Real traffic, from multiple people and their devices in a household or business is in both directions, all the time. Your network should degrade gracefully when there is traffic up, down, or both at the same time. While the rrul test series is patterned on bittorrent, which once upon a time ruled the world, we stilll didn't test networks for what torrent was really doing to them, in the light of some future world that had way more devices on it, more or less behaving as badly or worse than torrent did. :End of rant: See bofh for more...

Anyway, your provider's network represents a pretty good compromise of packet, not byte limits, on both sides. If you must have a FIFO, byte FIFOs are better because acks eat 1/15th the space data does, and so if you have a ton of acks in one direction or another, they crowd out the data packets. Bytes are a rough proxy for time, as it takes about the same amount of time to transmit 15 64-byte acks as a 1500-byte data packet. You had about, I don't remember now, 80ms worth of buffering for big packets on the down, and yes, I can do the math for the right packet limit that actually represents with the rrul test results, so long as cake's ack-filter is off, pretty accurately, but I'll try to leave that as an exercise for the reader. But anyway, on the down, this time, you have a ton of acks from the up clogging up that queue, and your download is now rate limited to 50Mbits by the upload. (If these packet limits were oversized, your upload would be limited by the download.)
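To put rough numbers on "bytes are a proxy for time", here is a small Python sketch. The 40 Mbit/s rate and the 38 bytes of per-packet framing overhead are assumptions chosen for illustration (they are not measurements from this link), and the function names are made up. It compares how long an ack and a full-size packet occupy the wire, and how much delay a 1000-packet FIFO can represent depending on what fills it.

```python
# Sketch: how long different packets occupy a link, i.e. "bytes = time".
# RATE_BPS and WIRE_OVERHEAD_BYTES are assumptions for illustration only.

RATE_BPS = 40e6            # assumed ~40 Mbit/s link
WIRE_OVERHEAD_BYTES = 38   # assumed Ethernet-style framing overhead per packet


def tx_time_ms(size_bytes: float, rate_bps: float = RATE_BPS) -> float:
    """Serialization time of one packet, in milliseconds."""
    return size_bytes * 8 / rate_bps * 1000.0


def fifo_delay_ms(n_packets: int, size_bytes: float,
                  rate_bps: float = RATE_BPS) -> float:
    """Time to drain a FIFO holding n_packets packets of equal size."""
    return n_packets * tx_time_ms(size_bytes, rate_bps)


# Ratio of data-packet time to ack time, without and with framing overhead.
raw_ratio = tx_time_ms(1500) / tx_time_ms(64)
wire_ratio = (tx_time_ms(1500 + WIRE_OVERHEAD_BYTES)
              / tx_time_ms(64 + WIRE_OVERHEAD_BYTES))

print(f"acks per data packet, payload only:  {raw_ratio:.1f}")
print(f"acks per data packet, with overhead: {wire_ratio:.1f}")

# The same 1000-packet limit means very different amounts of delay.
print(f"1000 acks queued:         {fifo_delay_ms(1000, 64):.1f} ms")
print(f"1000 data packets queued: {fifo_delay_ms(1000, 1500):.1f} ms")
```

The exact ratio depends on how much per-packet overhead you count, which is why a packet-count limit is only a loose stand-in for a time limit.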

noqueue_dl.png

SFQ is pretty similar here, but a bit more biased towards the shorter RTT.

sfq_dn.png

(I'm assuming above you used sfq or noqueue in the inbound shaper)

Please note that both these behaviors, in either case, are actually a pretty good thing, in that the user perceptible *latency* is gone, because bytes=time and your download slowed down gracefully, and your up is underbuffered. So... win, right?

Or... you could have a network capable of running at 100Mbit down, 19Mbit up, all the time, with no latency, either:

fqcodel_dl.png

despite this being better, it appears to my eye that you were running out of queue on the down due to the synchronized drops - which could be hitting a limit at the provider or... is there a 1000 packet limit or memory limit? Cake scales this correctly for you on the down, or should. fq-codel we should have ripped out the packet limit long ago....

While this is a good result... 2x better than the default, without that sync'd drop, it too would have ultimately converged nearer to equal bandwidth for all. Cubic is still too aggressive, so it would take a while....
Ahh bofh! I love it! ;) Heading over to The Register, haven't been over there in years, always good for a laugh!

I am not aware of a packet limit, but I am sure there is.. but it could be a function of this:
Screenshot from 2021-12-14 22-00-48.png
There is a NAT table limit in the modem.. again, not truly passthrough mode..

This is definitely a hard limit by the way.. I have hit it a few times years ago before I was running my own recursive DNS servers.. I would run this tool to find the lowest latency public DNS servers. Anyone wanting to generate a ton of traffic.. just run the DNS Benchmark! https://www.grc.com/dns/benchmark.htm had to login to the modem and clear out all the sessions
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 6:59 am

shouldn't be nat related issue.

In wireshark, to verify if ecn was exerted on an upload, filter on

tcp.flags.ecn == 1

yes, the flag is getting set. but

My wireshark does not appear to show ECN properly on the tcptrace tool. That is not looking particularly healthy on my xplot either.

sacks, resets, cwrs, no ces, none of which show up in the wireshark thing, perhaps my arm box's build of xplot is busted rather than the packets? I'd much rather blame my tools than the router.... grump. i want to go to bed.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 7:00 am

turn off ecn on your brother's link?

I assume you have 2 or more hardware queues on the vm?
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 7:23 am

turn off ecn on your brother's link?

I assume you have 2 or more hardware queues on the vm?
There is only one software queue on that VM, fq_codel.. but who knows what proxmox might be doing.. I know uplink from that server is 10gbit

I have been running some tests now that the link is quiet on their end.. I have changed the bandwidth limit from 42 down to 35. Seems to be about the best RTT I can get there.. A little side note.. for whatever reason the results seem to be much happier with fq_codel than with cake! I wonder if it is the settings. I am using the same settings as on my DSL, other than the framing. Docsis seems to be the nicest, but still not near as clean as with fq_codel. I might have been wrong, I think the down link is way more than 400 after all.. Maybe I just need to go sleep lol

Here it is with ECN turned off

Data: https://drive.google.com/drive/folders/ ... sp=sharing
tcp_nup_-_fqcodel_no_ecn_totals.png
tcp_nup_-_fqcodel_no_ecn_4up.png
tcp_nup_-_fqcodel_no_ecn_8up.png
tcp_nup_-_fqcodel_no_ecn_16up.png
tcp_nup_-_fqcodel_no_ecn_32up.png
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 7:24 am

How much memory does this router have?

And if there's a way to, say, double the packet and memory limits on the fq_codel rtt_fair test on your home machine maybe those sync'd drops would go away. I didn't see those options in the gui... a lot of people patch down the 10000 packet limit and 32MB limit in fq_codel to something that seems saner (and is, on memory limited routers!), so I don't know what the default is for mikrotik.

How cake autoconfigures here in this scenario may also be wrong if that too shows the sync'd drops on that test. If the gui allows upping the memlimit for that, try 8M in the inbound shaper. (cake has no packet limit) Our reasoning for how we did the defaults for the memlimit option was kind of obtuse and based more on fear of running a router out of memory than getting it exactly correct for inbound.

On outbound, a packet is allocated from an appropriately sized slab, so an ack is 64 bytes + 256 bytes overhead, a data packet rounds up to 2k.

On inbound, they are allocated from a fixed size 2k per packet ring, no matter if it's an ack or not, so you waste quite a lot of memory. We do gso-splitting, which will reallocate a gso packet from up to 42 packets all in a bunch back to the "right" size, but only if gro actually gets packets to split. Openwrt also had a hack that would start re-slabbing packets when it had memory pressure. So, on a heavy inbound ack workload we might end up using 7x more memory each than ideal, or compensated for correctly by the cake autoconfig for the memlimit.
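As a rough illustration of the memory arithmetic above, the following Python sketch uses only the figures quoted in this post (64-byte ack, 256 bytes of overhead, 2k ring slots). The constant and function names are made up, and this is not how the kernel or cake actually accounts for memory; it just shows where a figure in the neighborhood of 7x comes from and how many acks fit under a given memlimit.

```python
# Sketch of the per-packet memory footprint described above.
# All constants are the figures quoted in the post; names are hypothetical.

RING_SLOT_BYTES = 2048     # fixed-size inbound buffer, ack or not
ACK_BYTES = 64             # size of an ack
OVERHEAD_BYTES = 256       # per-packet overhead on the outbound path


def outbound_footprint(packet_bytes: int) -> int:
    """Memory used by a right-sized outbound allocation."""
    return packet_bytes + OVERHEAD_BYTES


def inbound_footprint(packet_bytes: int) -> int:
    """Memory used inbound: always one full ring slot."""
    return RING_SLOT_BYTES


def packets_that_fit(memlimit_bytes: int, per_packet_bytes: int) -> int:
    """How many packets of a given footprint fit under a memory limit."""
    return memlimit_bytes // per_packet_bytes


waste_ratio = inbound_footprint(ACK_BYTES) / outbound_footprint(ACK_BYTES)
print(f"inbound ack uses {waste_ratio:.1f}x the memory of an outbound ack")

memlimit = 8 * 1024 * 1024  # the 8M suggested for the inbound shaper
print("acks that fit inbound: ",
      packets_that_fit(memlimit, inbound_footprint(ACK_BYTES)))
print("acks that fit outbound:",
      packets_that_fit(memlimit, outbound_footprint(ACK_BYTES)))
```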

The ecn problem disturbs me more and more.

I've had a long day, going to bed. Very nice hacking with you these past few days.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 7:31 am

Well, that grouped bifurcation shouldn't be happening in that way. fq-codel suffers from the birthday problem, where hash collisions become likely at around sqrt(1024) flows, so at 32 flows it's likely you'd see 2 flows colliding and getting different behavior from the rest. Cake uses an 8-way set-associative hash so you don't see that. I am going to go back to a theory that we are not seeing the right offsets into the packet header, thus the hash function is weird, the dscp handling is weird, and the ack-filter is wonky.
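The birthday-problem estimate can be checked with a few lines of Python. This sketch assumes each flow hashes uniformly and independently into one of 1024 buckets, which is an idealization; the function name is made up for illustration.

```python
# Sketch: probability that at least two of `flows` flows share a bucket
# when each flow is hashed uniformly at random into `buckets` buckets.


def collision_probability(flows: int, buckets: int = 1024) -> float:
    """Classic birthday-problem calculation."""
    p_no_collision = 1.0
    for i in range(flows):
        # The (i+1)-th flow must avoid the i buckets already in use.
        p_no_collision *= (buckets - i) / buckets
    return 1.0 - p_no_collision


for n in (4, 8, 16, 32, 64):
    print(f"{n:3d} flows: {collision_probability(n):.3f}")
```

Under these assumptions the chance of at least one collision at 32 flows comes out a bit under 40%, and it climbs quickly beyond that.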

Among the many other things that have changed since I last looked at this code, linux switched to siphash from a jenkins hash, but I'm more inclined to suspect an offload, sending stuff from one cpu to another, or something we haven't thunk of yet.

Remember how we started? At least it doesn't crash. And even being OCD in this way, it performs better than what you had before. I have not had a deep dive into this stuff since, oh, 2017, really. I'm very interested that it hits the field, working the right way, obviously! but I've had it for the day. Have a great one!
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 7:40 am

How much memory does this router have?

And if there's a way to, say, double the packet and memory limits on the fq_codel rtt_fair test on your home machine maybe those sync'd drops would go away. I didn't see those options in the gui... a lot of people patch down the 10000 packet limit and 32MB limit in fq_codel to something that seems saner (and is, on memory limited routers!), so I don't know what the default is for mikrotik.

How cake autoconfigures here in this scenario may also be wrong if that too shows the sync'd drops on that test. If the gui allows upping the memlimit for that, try 8M in the inbound shaper. (cake has no packet limit) Our reasoning for how we did the defaults for the memlimit option was kind of obtuse and based more on fear of running a router out of memory than getting it exactly correct for inbound.

On outbound, a packet is allocated from an appropriately sized slab, so an ack is 64 bytes + 256 bytes overhead, a data packet rounds up to 2k.

On inbound, they are allocated from a fixed size 2k per packet ring, no matter if it's an ack or not, so you waste quite a lot of memory. We do gso-splitting, which will reallocate a gso packet from up to 42 packets all in a bunch back to the "right" size, but only if gro actually gets packets to split. Openwrt also had a hack that would start re-slabbing packets when it had memory pressure. So, on a heavy inbound ack workload we might end up using 7x more memory each than ideal, or compensated for correctly by the cake autoconfig for the memlimit.

The ecn problem disturbs me more and more.

I've had a long day, going to bed. Very nice hacking with you these past few days.
This router has 1GB of RAM.. I do not see how you can see memory usage either, only CPU usage. There are no memory limits for fq_codel, only for cake. It would be nice if Mikrotik would give access to all available options, as I saw you state at the beginning of this post they do not have the option for gso-splitting either.

Not sure if I even make sense at this point.. going to sleep as well! Thank you again for the education!!
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 7:43 am

Well, that grouped bifurcation shouldn't be happening in that way. fq-codel suffers from the birthday problem, where hash collisions become likely at around sqrt(1024) flows, so at 32 flows it's likely you'd see 2 flows colliding and getting different behavior from the rest. Cake uses an 8-way set-associative hash so you don't see that. I am going to go back to a theory that we are not seeing the right offsets into the packet header, thus the hash function is weird, the dscp handling is weird, and the ack-filter is wonky.

Among the many other things that have changed since I last looked at this code, linux switched to siphash from a jenkins hash, but I'm more inclined to suspect an offload, sending stuff from one cpu to another, or something we haven't thunk of yet.

Remember how we started? At least it doesn't crash. And even being OCD in this way, it performs better than what you had before. I have not had a deep dive into this stuff since, oh, 2017, really. I'm very interested that it hits the field, working the right way, obviously! but I've had it for the day. Have a great one!
Yep, no crashing for sure and honestly for the use case we are splitting hairs at this point. =) The seat of the pants feeling on my internet as well as my brothers is GREAT and SNAPPY!!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 8:30 pm

thank you so much for sharing your raw flent.gz files and packet captures. So many things in this world cannot be captured by a single number, a summary plot, and while a cdf might hint at a problem, looking at a system's evolution, over time, is always helpful. The explanation for why we saw this bifurcation:
cdfequiv.png
was that there were two *really major* interruptions in service where only that flow kept going.
cdfscanbemisleading.png
Now, as to what the heck could have caused this, I don't know. I flipped through a couple others, it seems likely this doesn't happen all the time... The packet capture is really messy and I'm no longer sure which cap I'm looking at and I have meetings most of today.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 15, 2021 9:20 pm

thank you so much for sharing your raw flent.gz files and packet captures. So many things in this world cannot be captured by a single number, a summary plot, and while a cdf might hint at a problem, looking at a system's evolution, over time, is always helpful. The explanation for why we saw this bifurcation:

cdfequiv.png

was that there were two *really major* interruptions in service where only that flow kept going.

cdfscanbemisleading.png

Now, as to what the heck could have caused this, I don't know. I flipped through a couple others, it seems likely this doesn't happen all the time... The packet capture is really messy and I'm no longer sure which cap I'm looking at and I have meetings most of today.
Yeah, I noticed those interruptions or whatever is causing it as well. I am having a heck of a time with this link. I had my brother call spectrum today and verify that his modem is JUST a gateway.. else make sure it is in bridge mode. It is in fact just a gateway. It is a DOCSIS 3.1 modem with a 2.5gb port.. actually sync'd at 2.5 to the router. I forgot the router actually has a 2.5g port.

Anyhow, back on topic.. doing a test on it with no queue at all, it gets ~600 down and ~40 up. What really strikes me as odd is if I use fq_codel or cake, I can only get it to around ~350 tops. I can leave the downstream unlimited, and still never gets past 350. Really odd.

So, with that said, I had him plug his computer in straight to the modem to do a speed test. Granted, it is web based but upload is still ~40 and download is from 720-830 easily twice of what is going through the router.

CPU utilization never goes above 25% which now that I think about it, it is a quad core.. so that would mean it is stressing a single core. HMMM.. maybe that is the limiting factor?
 
jmszuch1
just joined
Posts: 10
Joined: Fri Oct 19, 2018 10:21 pm

Re: some quick comments on configuring cake

Thu Dec 16, 2021 1:11 am


Yeah, I noticed those interruptions or whatever is causing it as well. I am having a heck of a time with this link. I had my brother call spectrum today and verify that his modem is JUST a gateway.. else make sure it is in bridge mode. It is in fact just a gateway. It is a DOCSIS 3.1 modem with a 2.5gb port.. actually sync'd at 2.5 to the router. I forgot the router actually has a 2.5g port.

Anyhow, back on topic.. doing a test on it with no queue at all, it gets ~600 down and ~40 up. What really strikes me as odd is if I use fq_codel or cake, I can only get it to around ~350 tops. I can leave the downstream unlimited, and still never gets past 350. Really odd.

So, with that said, I had him plug his computer in straight to the modem to do a speed test. Granted, it is web based but upload is still ~40 and download is from 720-830 easily twice of what is going through the router.

CPU utilization never goes above 25% which now that I think about it, it is a quad core.. so that would mean it is stressing a single core. HMMM.. maybe that is the limiting factor?
Been following along and don't have too much to add other than it's been very interesting and informative to read this discussion!

Regarding your speed issue, I wonder if it's this issue that was mentioned by another user in a different topic? They have a 5009 and a 2.5Gb modem as well it looks like: viewtopic.php?t=179145#p895221
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 5:02 am


Yeah, I noticed those interruptions or whatever is causing it as well. I am having a heck of a time with this link. I had my brother call spectrum today and verify that his modem is JUST a gateway.. else make sure it is in bridge mode. It is in fact just a gateway. It is a DOCSIS 3.1 modem with a 2.5gb port.. actually sync'd at 2.5 to the router. I forgot the router actually has a 2.5g port.

Anyhow, back on topic.. doing a test on it with no queue at all, it gets ~600 down and ~40 up. What really strikes me as odd is if I use fq_codel or cake, I can only get it to around ~350 tops. I can leave the downstream unlimited, and still never gets past 350. Really odd.

So, with that said, I had him plug his computer in straight to the modem to do a speed test. Granted, it is web based but upload is still ~40 and download is from 720-830 easily twice of what is going through the router.

CPU utilization never goes above 25% which now that I think about it, it is a quad core.. so that would mean it is stressing a single core. HMMM.. maybe that is the limiting factor?
Been following along and don't have too much to add other than it's been very interesting and informative to read this discussion!

Regarding your speed issue, I wonder if it's this issue that was mentioned by another user in a different topic? They have a 5009 and a 2.5Gb modem as well it looks like: viewtopic.php?t=179145#p895221
Thank you for linking that! I was literally just thinking about going to the other computer to login to his router and force it to 1gb. It is interesting that he was using fasttrack as I tried re-enabling that without any change. OK, getting out of the recliner now to go test that out! ;)
 
mducharme
Trainer
Trainer
Posts: 1740
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: some quick comments on configuring cake

Thu Dec 16, 2021 5:11 am

It is unfortunately probably quite difficult to debug cake on RouterOS v7 if other bugs are getting in the way. RouterOS v7 works pretty well in the default config for most home users, but there are still lots of bugs that need to be ironed out.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 5:57 pm

Without ack filtering it is extremely difficult to achieve full download speeds at a 15x1 ratio of down to up or worse.
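A back-of-the-envelope Python sketch of why the ratio matters. It assumes one 64-byte ack for every two 1500-byte data packets (a typical delayed-ack pattern) and ignores link-layer overhead and any ack thinning; those are assumptions for illustration, not measurements from this link, and the function name is made up.

```python
# Sketch: upstream bandwidth consumed by acks for a given download rate.
# Packet sizes and the ack ratio are assumed, not measured.


def ack_rate_bps(download_bps: float,
                 data_bytes: int = 1500,
                 ack_bytes: int = 64,
                 segments_per_ack: int = 2) -> float:
    """Ack traffic (bits/s) generated by a download at download_bps."""
    data_packets_per_s = download_bps / (data_bytes * 8)
    acks_per_s = data_packets_per_s / segments_per_ack
    return acks_per_s * ack_bytes * 8


down_bps = 600e6   # roughly the unshaped download seen in this thread
up_bps = 40e6      # roughly the upload seen in this thread

acks = ack_rate_bps(down_bps)
print(f"ack traffic: {acks / 1e6:.1f} Mbit/s")
print(f"share of upstream: {acks / up_bps:.0%}")
```

Under these assumptions, a 15:1 link spends about a third of its upstream on acks before any actual upload traffic is sent.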

Also rx rings need to be properly sized, as docsis is bursty. A rx ring of 256 is too small. Don't know if you can change that.

i wish more folk were taking packet captures of their network behaviors, using test tools like flent, or at least iperf, rather than web traffic. I also wish I still had my lab setup, and a budget to test this stuff. It's not so much debugging "cake" as suspecting there are other problems in the stack, on this model.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 7:39 pm

fqcodel_dl.png

despite this being better, it appears to my eye that you were running out of queue on the down due to the synchronized drops - which could be hitting a limit at the provider or... is there a 1000 packet limit or memory limit? Cake scales this correctly for you on the down, or should. fq-codel we should have ripped out the packet limit long ago....

While this is a good result... 2x better than the default, without that sync'd drop, it too would have ultimately converged nearer to equal bandwidth for all.
Anyway, as we stagger forward on the less-buggy fronts, a repeat of the rtt_fair test in this scenario would be nice, on handling the down better until the sync'd drops go away. (It still might be having that overall weird interruption of service, too, need more data on that...) fq_codel with increased packet limits and memlimit as one thought, cake besteffort with a memlimit 8M perhaps. Finding a way to increase the size of the rx ring, as another. Reducing the shaped bandwidth from 100Mbit down to something less....
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 8:20 pm



Been following along and don't have too much to add other than it's been very interesting and informative to read this discussion!

Regarding your speed issue, I wonder if it's this issue that was mentioned by another user in a different topic? They have a 5009 and a 2.5Gb modem as well it looks like: viewtopic.php?t=179145#p895221
Thank you for linking that! I was literally just thinking about going to the other computer to login to his router and force it to 1gb. It is interesting that he was using fasttrack as I tried re-enabling that without any change. OK, getting out of the recliner now to go test that out! ;)
OK, well.. after changing the port speed last night it wouldn't come back up. So I had to wait for my brother to reset it this morning. Long story short, it didn't work right away, I had to bounce the interface a few times but finally it was showing ~800mbit raw through the router. After a lot of testing.. I am starting to wonder if fq_codel and cake are actually crashing with such high speeds and I just don't see it on my slower connection.

Watching the router closer today on his end, again which is 1gb down cable.. two out of the four CPU cores would get about 50% loaded under test, and when reaching above 600mbit everything seemed to drop and then come back. Again, I am testing through a VPN on a VM on his side. This was when I was running a bandwidth limit on up and down.

To keep it from causing this behaviour, I had to stick with a bandwidth limit with fq_codel ONLY for it to shape at all without this 'drop out'. If I set a bandwidth limit AT ALL even full 1gig, it would happen. If I even used cake at all even without a download bandwidth limit, it would act the same.

I am not convinced that this is a problem with fq_codel or cake. I have a feeling there is something funky with RouterOS.. but this is just a hunch. Hopefully in the future they will add in the availability to at least see the queue stats for these. It is humming along now with a moderate bandwidth limit on the upload with fq_codel. It is helping keep the latency under control under load at least!
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 8:21 pm

Without ack filtering it is extremely difficult to achieve full download speeds at a 15x1 ratio of down to up or worse.

Also rx rings need to be properly sized, as docsis is bursty. A rx ring of 256 is too small. Don't know if you can change that.

i wish more folk were taking packet captures of their network behaviors, using test tools like flent, or at least iperf, rather than web traffic. I also wish I still had my lab setup, and a budget to test this stuff. It's not so much debugging "cake" as suspecting there are other problems in the stack, on this model.
Watching the port bandwidth graph this morning while testing my brother's 1gb cable.. you can most definitely see the bursts!! I remember now why I hated my old cable modem and love my DSL now. Way less bandwidth, but so much 'cleaner'
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 8:38 pm

Well, I feel pretty confident with my DSL config now. As a recap, it is a 100/20 DSL which is synced at 110/22 per the DSL modem.

After some playing around this morning, the best compromise, to my eyes, seems to be having the upstream set at the sync rate, 22mb. I can lower that, but only gain 0.5ms less latency. I can live with that *shrug* Now, this is after A LOT of testing in previous days to get the framing right. As dtaht has stated before, getting the framing right on DSL is absolutely essential. It gets wild real fast if that isn't right! With that said, the download bandwidth is set at 94.5% of the sync rate.. so 104mbit
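For anyone repeating this on their own line, the arithmetic is just a percentage of the modem's sync rate. A tiny Python sketch using the numbers from this post; the function name is made up, and 94.5% is what worked on this particular line, not a general recommendation.

```python
# Sketch: derive shaper rates from the modem's reported sync rates.


def shaped_rate_mbit(sync_mbit: float, fraction: float) -> float:
    """Shaper rate as a fraction of the sync rate."""
    return sync_mbit * fraction


down = shaped_rate_mbit(110, 0.945)   # download shaped below sync
up = shaped_rate_mbit(22, 1.0)        # upload left at the sync rate

print(f"download shaper: {down:.2f} Mbit/s (rounded: {round(down)})")
print(f"upload shaper:   {up:.2f} Mbit/s")
```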

So, as a recap, here is a before and after, if you will.. before is FIFO and after is cake:
before.png
after.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 8:47 pm

fqcodel_dl.png

despite this being better, it appears to my eye that you were running out of queue on the down due to the synchronized drops - which could be hitting a limit at the provider or... is there a 1000 packet limit or memory limit? Cake scales this correctly for you on the down, or should. fq-codel we should have ripped out the packet limit long ago....

While this is a good result... 2x better than the default, without that sync'd drop, it too would have ultimately converged nearer to equal bandwidth for all.
Anyway, as we stagger forward on the less-buggy fronts, a repeat of the rtt_fair test in this scenario would be nice, on handling the down better until the sync'd drops go away. (It still might be having that overall weird interruption of service, too, need more data on that...) fq_codel with increased packet limits and memlimit as one thought, cake besteffort with a memlimit 8M perhaps. Finding a way to increase the size of the rx ring, as another. Reducing the shaped bandwidth from 100Mbit down to something less....
Alright, I gotta get back to work for a bit.. and then come back and figure out where we left off ;) More tests coming up in a bit!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 8:47 pm

looks like perfection to me.

rtt_fair? :P :P :P
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 9:21 pm

looks like perfection to me.

rtt_fair? :P :P :P
OK, work is not bad! WOO! =)

This is with the same settings as last post, with cake.. running rtt_fair_var -- It seems to my untrained eyes it is pretty fair on the upload, but going hog wild on download to dallas!
rtt_fair_var_-_cake_rtt_fair.png
rtt_fair_var_-_cake_rtt_fair-upload.png
rtt_fair_var_-_cake_rtt_fair-download.png
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 9:40 pm

see how the drops are sync'd on the down? Shouldn't happen. Up the memlimit? or it's the rx-ring. Or gamma radiation from Mars.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:01 pm

see how the drops are sync'd on the down? Shouldn't happen. Up the memlimit? or it's the rx-ring. Or gamma radiation from Mars.
BWHAH gamma radiation! Ahh, I did not notice them sync'd. Interesting!

OK, so looking at the config the Memory Limit is 0 aka default. Reading the documentation, it states:
	
Limit the memory consumed by Cake to LIMIT bytes. By default, the limit is calculated based on the bandwidth and RTT settings.
So if my math is correct (and my math skills are not good), I am going to assume we need to base this on the largest RTT, which from that test was 250ms for Singapore.

Here are the results.. it looks like the syncing on download is still there a bit, but not near as bad. Should I increase it more? Is there a limit other than hardware that you would not want to increase it past a certain point?
rtt_fair_var_-_cake_rtt_fair.png
rtt_fair_var_-_cake_rtt_fair-upload.png
rtt_fair_var_-_cake_rtt_fair-download.png

250 x 100 = 25,000 bytes
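One common way to turn a bandwidth and an RTT into a buffer size is the bandwidth-delay product, where the units need some care (Mbit/s times ms does not give bytes directly). The Python sketch below assumes the 100 Mbit/s shaped download and the 250ms Singapore RTT mentioned in this thread; it is not necessarily how cake computes its own default, and the function name is made up.

```python
# Sketch: bandwidth-delay product (BDP) in bytes.
# Inputs are in bits per second and seconds to keep the units explicit.


def bdp_bytes(rate_bps: float, rtt_s: float) -> int:
    """Bytes in flight needed to fill a path of this rate and RTT."""
    return int(rate_bps * rtt_s / 8)


rate_bps = 100e6    # assumed 100 Mbit/s shaped download
rtt_s = 0.250       # 250 ms, the longest RTT in the test

bdp = bdp_bytes(rate_bps, rtt_s)
print(f"BDP: {bdp} bytes (~{bdp / 1e6:.2f} MB)")
```

With these inputs the result is on the order of a few megabytes, so a memlimit in the megabyte range is the right scale to be experimenting with.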
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:10 pm

Moah! Moah! 8x more! You have the memory to burn. (when we developed cake, *32MB* of ram in the router was a lot)

I tried to explain the "default" calculation had some overheads in it that didn't make as much sense on inbound shaping as out. I can try to explain that better....

The default of 100ms RTT for fq_codel and cake was shown to scale well to about 280ms in early testing. If you were in a situation where the majority of your RTT was longer than 100ms (say a geosync satellite, or a tropical island; one of our core testers is based on the island of Mauritius), then you should change that parameter. No need to change it for living in Texas. The symptom of the synchronized drop is due to running out of memlimit queue space, not the codel (cobalt) algorithm, as your fremont link is now approaching parity with the dallas link in that last test, *most likely*. We won't get this perfect, but avoiding the synchronized drop has always been a goal ( https://en.wikipedia.org/wiki/TCP_globa ... ronization ).

I hope in the end we'll patch cake to autoscale a bit more correctly here. (But I also worry about undersized rx rings in a lot of new products, not tested against bursty macs like cable, 802.11ac).
Last edited by dtaht on Thu Dec 16, 2021 10:24 pm, edited 1 time in total.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:22 pm

Moah! Moah! 8x more! You have the memory to burn. (when we developed cake, *32MB* of ram in the router was a lot)

I tried to explain the "default" calculation had some overheads in it that didn't make as much sense on inbound shaping as out. I can try to explain that better....
Ahhh, so correct me if I am wrong.. I am a little slow on the uptake sometimes.. post lunch time sleepy.. haha so generically speaking, upping the memory limit is like increasing the ring buffer?

OK, and for some clarity.. that last run, I only set the memory on the ingress, not on egress. This time, I set it to 200M on both. But from your last reply, it seems to matter more on the egress, correct?
rtt_fair_var_-_cake_rtt_fair.png
rtt_fair_var_-_cake_rtt_fair-download.png
rtt_fair_var_-_cake_rtt_fair-upload.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:22 pm

Also, another note.. I see the tails on the CDF plot have also made a huge improvement!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:27 pm

so this last one had 200MB on ingress? Dang. I gotta point at available queue space at the provider, or a limited rx ring, (or that bug with bursty failures) to explain a failure to improve here.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:39 pm

so this last one had 200MB on ingress? Dang. I gotta point at available queue space at the provider, or a limited rx ring, (or that bug with bursty failures) to explain a failure to improve here.
Yes sir, 200 on both. Well that is a bummer! I was getting excited! haha but honestly.. the internet is so snappy now.. everything just instantly appears.. almost before I click the button on the mouse! ;)
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:42 pm

The behavior of multiple queues in series is kind of complex. Theorists like very much to think about things in terms of a fountain of water, but the real world is batchy in so many respects.

Take packets hitting the rx ring. A batch arrives and the ring was nearly full in the first place. A whole bunch of packets (from all sources) get dropped. The cpu arrives to "clean" the rx ring, never sees that, and then tosses the result into the aqm which then tries to fair queue and intelligently drop if it too is overloaded, hopefully desynchronized drops that "fill in" the spaces within the other competing sawtooths. But they end up pretty synchronized when the rx ring overflows and thus the closest hop retains the most bandwidth, as tcp's defined response to multiple drops within a single RTT is to drop the rate, once. (We are now in TCP/IP 401 classes, rather than my usual 101)

The Cubic tcp algorithm only drops the rate by 30% ( which I've long disagreed with ) and then works towards recovering using a cubic function (which is clever), tcp reno uses 50% and climbs back additively, which means other flows can grab more bandwidth faster, but a reno flow gets less bandwidth. (I think you can tell flent to use another algo via --te=cc_algo=reno,reno or --te=CC=reno,reno but I'd have to re-read the codebase). BBR's methods are very different, as you saw. I don't think I have BBR enabled on all the servers under test, I'd have to check.

This scenario is even worse than that in that the ISP has a buffer at their end, the modem, also, and either one of those unable to absorb a burst will drop packets.

Over the last 20 years, the internet got redesigned for speedtest.net, with everyone testing X flows at a time, up, then, down, then ping, all to the same server.
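To put numbers on the Cubic versus Reno backoff described above, here is a minimal sketch. It only models the single multiplicative decrease after a loss (30% for Cubic, 50% for Reno, as quoted in the post) and Reno's additive climb of one segment per RTT; the function names and the 100-segment window are made up for illustration, and Cubic's actual recovery curve is not modelled.

```python
# Multiplicative-decrease factors as quoted in the post:
# Cubic sheds about 30% of its window on loss, Reno sheds 50%.
CUBIC_BETA = 0.7
RENO_BETA = 0.5


def window_after_loss(cwnd: float, beta: float) -> float:
    """Congestion window after one loss event (one backoff per RTT)."""
    return cwnd * beta


def reno_rtts_to_recover(cwnd: float, beta: float = RENO_BETA) -> int:
    """RTTs for Reno-style additive increase (+1 segment per RTT) to regain cwnd."""
    reduced = window_after_loss(cwnd, beta)
    rtts = 0
    while reduced < cwnd:
        reduced += 1.0
        rtts += 1
    return rtts


if __name__ == "__main__":
    cwnd = 100.0  # segments, an arbitrary illustrative value
    print("cubic window after loss:", window_after_loss(cwnd, CUBIC_BETA))
    print("reno window after loss:", window_after_loss(cwnd, RENO_BETA))
    print("reno RTTs to climb back:", reno_rtts_to_recover(cwnd))
```

The point of the toy model is only that a Reno flow gives up more of its window per loss and needs many RTTs to climb back, which is worse the longer the RTT.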
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:43 pm

I imagine routerOS has no way to see or increase the rx ring? Linux uses "ethtool" to see that.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 10:58 pm

squinting, in both cases it did get better for all but the furthest distance (which is kind of expected) If you started the dallas flow last, they'd converge quicker, or if you ran the test longer (-l 300).

So anyway, I'm pretty sure how we calculate the default for inbound shaping to be wrong, some value larger than the default helps, and will try to come up with something better in the future. Thx for exploring that.

For long distances especially, having ECN on helps (I've said elsewhere on this thread that, given its experimental nature, I use it as a debugging tool: any drops on the test are coming from somewhere else on the path), except we seemed to have a problem there, both on your brother's setup and yours. You got a different mikrotik product?
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 11:16 pm

I imagine routerOS has no way to see or increase the rx ring? Linux uses "ethtool" to see that.
OK, it does not appear from digging through the interface or through the documentation that I can change the ring buffer. Now, if I was running a CHR I could.. since it is RouterOS running on top of linux. This might be something for me to check out in the future. I have a spare SFP+ port on my server, I should be able to pass that through to a VM and run CHR.

Anyhow, back on topic.. the closest I could find was here in the docs: https://wiki.mikrotik.com/wiki/Manual:Queue

Specifically this:

only-hardware-queue leaves interface with only hw transmit descriptor ring buffer which acts as a queue in itself. Usually at least 100 packets can be queued for transmit in transmit descriptor ring buffer. Transmit descriptor ring buffer size and the amount of packets that can be queued in it varies for different types of ethernet MACs.

Having no software queue is especially beneficial on SMP systems because it removes the requirement to synchronize access to it from different cpus/cores which is expensive.


multi-queue-ethernet-default can be beneficial on SMP systems with ethernet interfaces that have support for multiple transmit queues and have a linux driver support for multiple transmit queues. By having one software queue for each hardware queue there might be less time spent for synchronizing access to them.

So here is the test again with multi-queue-ethernet selected for ether1, going to the modem
rtt_fair_var_-_cake_rtt_fair-ethernet_default_hw_queue.png
rtt_fair_var_-_cake_rtt_fair-ethernet_default_hw_queue-download.png
rtt_fair_var_-_cake_rtt_fair-ethernet_default_hw_queue-upload.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 11:24 pm

The behavior of multiple queues in series is kind of complex. Theorists like very much to think about things in terms of a fountain of water, but the real world is batchy in so many respects.

Take packets hitting the rx ring. A batch arrives and the ring was nearly full in the first place. A whole bunch of packets (from all sources) get dropped. The cpu arrives to "clean" the rx ring, never sees that, and then tosses the result into the aqm which then tries to fair queue and intelligently drop if it too is overloaded, hopefully desynchronized drops that "fill in" the spaces within the other competing sawtooths. But they end up pretty synchronized when the rx ring overflows and thus the closest hop retains the most bandwidth, as tcp's defined response to multiple drops within a single RTT is to drop the rate, once. (We are now in TCP/IP 401 classes, rather than my usual 101)

The Cubic tcp algorithm only drops the rate by 30% ( which I've long disagreed with ) and then works towards recovering using a cubic function (which is clever), tcp reno uses 50% and climbs back additively, which means other flows can grab more bandwidth faster, but a reno flow gets less bandwidth. (I think you can tell flent to use another algo via --te=cc_algo=reno,reno or --te=CC=reno,reno but I'd have to re-read the codebase). BBR's methods are very different, as you saw. I don't think I have BBR enabled on all the servers under test, I'd have to check.

This scenario is even worse than that in that the ISP has a buffer at their end, the modem, also, and either one of those unable to absorb a burst will drop packets.

Over the last 20 years, the internet got redesigned for speedtest.net, with everyone testing X flows at a time, up, then, down, then ping, all to the same server.
Coming back to this post.. I never knew cubic drops 30%, heck I never looked into it.. I always assumed it was 50% I guess when I learned about it at the time, I must have been using reno and reading about that?

I thought I had seen something in the docs about the algos as flags.. but maybe it was documents for another tool.. I just looked through the man page and don't see anything
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 11:31 pm

OK, I forgot to mention.. this morning I had done some testing with ECN again.. with it set to 1 on the host, the upload was a big sync'd wave.. crazy looking.. going from 2-4mb in one big wave on the RRUL test

However, with it set to 2, it seemed normal. Again, at least on the RRUL test. I just set it back to 2 again and also set dallas and fremont last.. here are the results..
rtt_fair_var_-_cake_rtt_fair-ethernet_default_hw_queue.png
rtt_fair_var_-_cake_rtt_fair-ethernet_default_hw_queue-download.png
rtt_fair_var_-_cake_rtt_fair-ethernet_default_hw_queue-upload.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Thu Dec 16, 2021 11:34 pm

Oh, and I do have another router.. it is a CCR1009. The one I was running at the very beginning before we started doing science. It is a 9 core 1.2GHz TILE processor with 2GB RAM.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 12:55 am

QUICK NOTE!! I went back and added memory to my brother's router.. and it runs cake without crashing or whatever it was doing.. WITH bandwidth on ingress AND egress!

My gut feeling now is that the default memory needs to be increased.. I just didn't see it crashing on my end.. maybe because I am at 100 down and his is 1g down?
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 1:17 am

Another note I have noticed throughout my testing on my own DSL, and my brother's cable.. but definitely more so on his.. I also see this 'crash' or whatever it is when you set the bandwidth limit TOO LOW on the egress!!
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 1:23 am

Another note.. not sure the interface queue change made any difference for the ring queue, BUT.. it APPEARS that it does in fact allow for better threading across the cores!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 1:29 am

don't celebrate too soon. Luck counts, and there still may be an obscure bug... :) And you mean cake memlimit or physical memory?

Is the ack-filter on on brother's egress? Again, given my still held doubts on having the offsets right for dscp, ecn, and that, having it on may do bad things, but it's very useful on asymmetric connections if working. https://blog.cerowrt.org/post/ack_filtering/

Lastly you posted nice plots saying "default" when I think you meant the hw multi-queue? It was good to see it converge at t+40. Yes you really do want to spread more load across cores if possible, so the rings are drained or filled in smaller bursts more often.

I don't know if the doc is out of date or not, but ideally there's a new subsystem called BQL in play now moderating the tx ring: https://lwn.net/Articles/469652/ - BQL was the core tech that made it possible to run fq-codel at line rate with very minimal overhead (compared to shaping), and it works well across cores.
Last edited by dtaht on Fri Dec 17, 2021 1:36 am, edited 1 time in total.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 1:36 am

don't celebrate too soon. :) And you mean cake memlimit or physical memory?

Is the ack-filter on on egress? Again, given my still held doubts on having the offsets right for dscp, ecn, and that, having it on may do bad things, but it's very useful on asymmetric connections if working. https://blog.cerowrt.org/post/ack_filtering/

Lastly you posted nice plots saying "default" when I think you meant the hw multi-queue? It was good to see it converge at t+40. Yes you really do want to spread more load across cores if possible.
Oh, definitely not celebrating yet! Just happy to start getting some fairly consistent results as I move knobs one way or the other! Yes, I need to slow down, I meant to say cake memlimit!

Ack filter set to filter on egress. Ack, the hardware-multi-queue as in the following set as the physical 'interface queue'

multi-queue-ethernet-default can be beneficial on SMP systems with ethernet interfaces that have support for multiple transmit queues and have a linux driver support for multiple transmit queues. By having one software queue for each hardware queue there might be less time spent for synchronizing access to them.

I need to look at core usage on his while I test, but I noticed on mine after enabling that, it went from ~50% on two cores to almost three cores at 50% and a little on the fourth
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 6:34 am

ecn = 0 = do not accept or initiate ecn negotiation
ecn = 1 = accept and initiate ecn neg
ecn = 2 = accept ecn neg, but do not initiate

The default for much of the internet is "2" (except google, which wants to change the definition of ecn entirely).
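For anyone checking their own test host, here is a small sketch that reads the setting and prints which of the three meanings above applies. It assumes the host runs Linux and that the value being discussed is the `net.ipv4.tcp_ecn` sysctl; the helper names are made up for illustration.

```python
from pathlib import Path

# Meanings of the three values, as listed in the post above.
ECN_MODES = {
    0: "do not accept or initiate ECN negotiation",
    1: "accept and initiate ECN negotiation",
    2: "accept ECN negotiation, but do not initiate it",
}


def describe_ecn(value: int) -> str:
    """Human-readable meaning of a tcp_ecn value."""
    return ECN_MODES.get(value, f"unrecognised value {value}")


def read_tcp_ecn(path: str = "/proc/sys/net/ipv4/tcp_ecn"):
    """Return the current setting, or None if the file is absent (non-Linux host)."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


if __name__ == "__main__":
    current = read_tcp_ecn()
    if current is None:
        print("tcp_ecn sysctl not found on this host")
    else:
        print(f"tcp_ecn = {current}: {describe_ecn(current)}")
```

This only reports the setting on the machine running the test; it says nothing about what the router or the far-end server does with ECN.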
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 5:31 pm

I'd appreciate another capture from your brother's box, of rtt_fair, blowing up, with ecn enabled.

also it's easier to look at this stuff in tcptrace/xplot if you just capture those flows.

tcpdump -i the_interface -s 128 -w the_capture host dallas or host sydney or host ...

thx!
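If the list of test servers changes between runs, the capture command above can be assembled programmatically. This is only a sketch: the interface name, output file and host names below are placeholders, and the command is printed rather than executed.

```python
import shlex


def capture_filter(hosts):
    """Build a 'host a or host b or ...' pcap filter expression."""
    return " or ".join(f"host {h}" for h in hosts)


def tcpdump_command(interface, outfile, hosts, snaplen=128):
    """Argument list mirroring: tcpdump -i IFACE -s 128 -w FILE host a or host b"""
    return [
        "tcpdump",
        "-i", interface,
        "-s", str(snaplen),   # headers only; keeps the capture small
        "-w", outfile,
        capture_filter(hosts),
    ]


if __name__ == "__main__":
    # Placeholder names; substitute the real interface and flent servers.
    cmd = tcpdump_command("eth0", "rtt_fair.pcap",
                          ["dallas.example.net", "sydney.example.net"])
    print(shlex.join(cmd))
```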
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 6:21 pm

I'd appreciate another capture from your brothers box, of rtt_fair, blowing up, with ecn enabled.

also it's easier to look at this stuff in tcptrace/xplot if you just capture those flows.

tcpdump -i the_interface -s 128 -w the_capture host dallas or host sydney or host ...

thx!
Can do! Do you want me to use fq_codel or cake?
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 6:24 pm

cake. also follow with ecn off without resetting the qdisc, to make sure it's not permanently driven wonky? If it's permanently driven wonky, that's almost a CVE, and I've had enough of those this week.
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 7:09 pm

OK, I had to redo that test because I messed up. Here is the test

rtt_fair_var with cake and ECN on and then off

https://drive.google.com/drive/folders/ ... sp=sharing
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 7:38 pm

Thx again for helping. Trying to decide on fleeing to mexico or not. Ok, please reload the qdisc(s), leave ecn off, and try again?

Were you seeing these lumps before?
lumps.png
You got no throughput from dallas with ecn.
nodnfromdallas.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 8:00 pm

Thx again for helping. Trying to decide on fleeing to mexico or not. Ok, please reload the qdisc(s), leave ecn off, and try again?

Were you seeing these lumps before?

lumps.png

You got no throughput from dallas with ecn.

nodnfromdallas.png
No problem! Mexico would be nice this time of year! I have some friends that live along the border, I should go visit for a BBQ and a few cervezas!

To be honest, the results vary but I would hope that is due to the network not being totally quiet. He also works from home so I do see some lumps here and there and assumed that was why. I might have to run these tests at night after the network is clear of local traffic. I noticed that with dallas. That is really odd! Maybe something between him and dallas and not the others because of different routes?

Data: https://drive.google.com/drive/folders/ ... sp=sharing
rtt_fair_var_-_cake_no_ecn.png
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 8:12 pm

Well, you shouldn't see that long term growth pattern either. This is after you tuned up the multipath tx/rx thing? What happens with the bandwidth turned down to less than 20Mbit?

Anyway, thx again. I'm packing up for a trip south, (not to mexico! trying to get closer to the spacex launch), and can't look at this harder today.

When the network is more idle, try a reboot, then runs with --step-size=0.05 -l 300 and ecn off, with cake and with fq_codel. But ya know, feel free to stop fixing the internet with me, and spend time with family, or shopping?
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 8:36 pm

Well, you shouldn't see that long term growth pattern either. This is after you tuned up the multipath tx/rx thing? What happens with bandwidth down less 20Mbit?

Anyway, thx again. I'm packing up for a trip south, (not to mexico! trying to get closer to the spacex launch), and can't look at this harder today.

When the network is more idle a reboot, putting in --step-size=0.05 -l 300 with ecn off, with cake, with fq_codel, but ya know, feel free to stop fixing the internet with me, and spend time with family, or shopping?
AHHA!! I see what ya did there. It didn't grow this time. I set it to 15mbit on upload.. I had been assuming all along that since wide open it gets 40mbit upload on his link, I was working around that range. My assumption is that I am used to the feel of my DSL setup.. and cable does things different? No science behind that statement but that looks way better than anything at 40mbit or even a high percentage of that. Now I have more testing to figure out where the happy place is on the upload bandwidth!

Oh sweet, spacex launch! I have always thought it would be cool to go out with like a 600mm lens, probably even needing a tele-converter along with that.. and try to get some cool pictures! (I used to do a lot of drag racing photography years ago) No worries, go relax and decompress!

I will hopefully get some good testing in tonight. The ol lady has left this morning to the casino for the weekend so it's just me and the dogs! If I am really lucky, my brother will be out of town this weekend too, I will have to ask ;) Don't worry about me, I am just a nerd and love this stuff. Something new to learn! It gets me away from the normal grind, and I will be out riding the motorcycle later today and tomorrow with the brothers.. so that will be my break from the internet! ;)

Anyhow.. to the data!

Data: https://drive.google.com/drive/folders/ ... sp=sharing
rtt_fair_var_-_cake_no_ecn.png
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 17, 2021 8:48 pm

Oh yeah, and yes this is with the multi-queue-ethernet setting on the interface
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sat Dec 18, 2021 12:37 am

In an effort to take the human error out of my testing, and to automate it.. I am making some ansible playbooks, and then will work on making flent batch files that use these playbooks to make the appropriate config changes before/after each test as needed.

Dunno if I am gonna have time to get it all done for testing tonight.. but I hope so.. it would be nice to be able to kick off a whole round of tests in the middle of the night while sleeping ;)
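As a rough idea of what an unattended overnight run could look like, here is a sketch that builds one flent command line per qdisc, using the rtt_fair_var test and the --step-size=0.05 -l 300 settings suggested earlier in the thread. It is not the ansible/flent batch setup described above: the host names and titles are placeholders, switching the router between cake and fq_codel is left out entirely, and the commands are only printed, not run.

```python
import shlex


def flent_command(test, hosts, title, length=300, step_size=0.05):
    """Argument list for one flent run against one or more servers."""
    cmd = ["flent", test, "-l", str(length), f"--step-size={step_size}", "-t", title]
    for host in hosts:
        cmd += ["-H", host]
    return cmd


def overnight_plan(hosts, qdiscs=("cake", "fq_codel")):
    """One run per qdisc; the router must be reconfigured between runs."""
    return [flent_command("rtt_fair_var", hosts, f"{q}_no_ecn") for q in qdiscs]


if __name__ == "__main__":
    # Placeholder server names; substitute the real flent servers.
    servers = ["dallas.example.net", "fremont.example.net"]
    for cmd in overnight_plan(servers):
        print(shlex.join(cmd))
```

Putting the qdisc name in the title makes the resulting plots easier to tell apart when comparing them later.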
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Sat Dec 18, 2021 1:29 am

Woohoo! I got it all working as expected! =)

Now I just need to set it up on my brother's end, and just add a cronjob to kick off the test in the middle of the night. But, I gotta run out for a few hours.. so will set that all up later.. hah I guess I will be the cron job at that point! =P
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Dec 20, 2021 4:14 pm

it looks like mikrotik has lost some data.
 
User avatar
Larsa
Member
Member
Posts: 422
Joined: Sat Aug 29, 2015 7:40 pm

Re: some quick comments on configuring cake

Mon Dec 20, 2021 4:32 pm

Yeah, if you meant this forum it went down the other night. I saw some other threads where they complained about the same thing so I guess Mikrotik lacks a complete backup...
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Mon Dec 20, 2021 6:39 pm

it looks like mikrotik has lost some data.
Yep, they went down for most of the day yesterday.. it appears they restored a backup of the forum =(
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 21, 2021 7:04 pm

Just saw this in the changelog for the newly released 7.2rc1:

*) queue - improved system stability when processing traffic;
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Tue Dec 21, 2021 9:23 pm

Well, interestingly.. we never upgraded the switch at his place.. and there was a ton of updates in the changelogs.. it was v6.47 or something like that.. anyhow.. all the way up to 7.2rc1 on switch and router.

I dunno.. but not too shabby for a quick midday test.. didn't have much time to get testing in during lunch while he was away. Download looks like a flat table when I clamp the bandwidth down.

This image is setting bandwidth to 40M on upload and uncapped on download.. which is what he is supposed to be getting.. was quite surprised actually! Not to mention.. are my eyes lying to me, it has better ping under load??
rrul_-_40M.png
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Wed Dec 22, 2021 5:11 am

Looks real to me. There are many possible interactions with the cmts, powersave and downstream shaper.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Dec 23, 2021 8:21 pm

Anyway, since we lost data, and I don't remember what it was, It would be good to post a summary of what the actual mikrotik configurations ended up being. The journey was educational for us, but I imagine to the outside observer, kind of frightening.

thx, and merry christmas! I may not be online a whole lot in the coming days.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Wed Dec 29, 2021 4:25 am

Would those with a working CAKE configuration please share their configurations within the mikrotik itself? This thread is amazing, but there are so many dials to tune and so much in-depth discussion going on that someone coming across it for the first time would need to spend hours reading everything in series to get context.

Specifically, I have a 100/40 VDSL2+ connection (Australian NBN - FTTN) that I'm looking to tune. I am comfortable running Flent but haven't gone on-site yet. I'd love to see a starting config or one from a similar user's use case. Just run:
/queue export compact
At the moment, I'm just running this without much thought:
# dec/29/2021 13:24:14 by RouterOS 7.1.1
# ...
# model = RBD52G-5HacD2HnD
/queue type
add kind=fq-codel name=fq_codel
/queue simple
add bucket-size=0.005/0.005 max-limit=100M/40M name=internal_qos queue=fq_codel/fq_codel target=ether1 total-queue=fq_codel
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Wed Dec 29, 2021 6:19 am

Apologies for the late reply.. I am still tinkering with my brother's setup. Without any queue.. I can get a gig download.. but with any kind of queue, it drops in half.. I thought it was the 2.5g port per other people's responses.. on which in fact, even without a queue, I cannot get over roughly 500mbit down.. however I can on a 1gb port. Weird. Anyhow.. I will post my working config for my DSL..

I have 100/20 VDSL2.. and this setup has been working like a dream! (The speeds are set to the sync rate in the modem 104/22.. YMMV)
/queue type
add cake-atm=ptm cake-diffserv=besteffort cake-mpu=88 cake-overhead=40 kind=cake name=cake-default
add cake-ack-filter=filter cake-atm=ptm cake-bandwidth=22.0Mbps cake-diffserv=besteffort cake-mpu=88 cake-nat=yes cake-overhead=40 kind=cake name=cake-up
add cake-atm=ptm cake-bandwidth=104.0Mbps cake-diffserv=besteffort cake-mpu=88 cake-nat=yes cake-overhead=40 cake-wash=yes kind=cake name=cake-down

/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1-WAN total-queue=cake-default
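To see what the overhead=40, mpu=88 and ptm settings in this config do to the size the shaper charges for each packet, here is a sketch of the per-packet accounting as I understand it from the Linux sch_cake code: add the overhead, round up to the MPU, then add one byte per 64 bytes (or part thereof) for PTM. Whether RouterOS applies exactly the same arithmetic is an assumption, and the function name is made up.

```python
def shaped_length(packet_len, overhead=40, mpu=88, ptm=True):
    """Bytes charged against the shaper for one packet (sketch of cake's accounting)."""
    length = packet_len + overhead
    if length < mpu:
        length = mpu                      # minimum packet unit on the wire
    if ptm:
        length += (length + 63) // 64     # PTM 64b/65b framing, rounded up
    return length


if __name__ == "__main__":
    for size in (40, 576, 1500):
        charged = shaped_length(size)
        print(f"{size:5d} byte packet charged as {charged} bytes "
              f"({size / charged:.1%} efficiency)")
```

Small packets such as bare ACKs are charged well over double their IP size, which is why getting the encapsulation options right matters most on a slow uplink.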
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Wed Dec 29, 2021 8:30 am

I have 100/20 VDSL2.. and this setup has been working like a dream! (The speeds are set to the sync rate in the modem 104/22.. YMMV)
Thanks mate - not a slow reply at all! Mine is syncing 106/41 so I'll throw that in there for now, go for a few days, then see how it is.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Fri Dec 31, 2021 3:38 am

RB5009 arrived. Here's some brief testing of cake.
ISP: Aussie Broadband
Technology: Fibre To The Premise (FTTP)
Down/Up: 1000M/50M
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Waveform Results:
Before: https://www.waveform.com/tools/bufferbl ... 7575df8878
After: https://www.waveform.com/tools/bufferbl ... db297e528f
 
ivicask
Member
Member
Posts: 344
Joined: Tue Jul 07, 2015 2:40 pm
Location: Croatia, Zagreb

Re: some quick comments on configuring cake

Fri Dec 31, 2021 8:31 am

RB5009 arrived. Here's some brief testing of cake.
ISP: Aussie Broadband
Technology: Fibre To The Premise (FTTP)
Down/Up: 1000M/50M
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Waveform Results:
Before: https://www.waveform.com/tools/bufferbl ... 7575df8878
After: https://www.waveform.com/tools/bufferbl ... db297e528f
Funny thing is you will get same results with any queue type, try sfq for example instead cake..
 
kevinb361
Frequent Visitor
Frequent Visitor
Posts: 82
Joined: Wed Jul 01, 2020 5:02 am

Re: some quick comments on configuring cake

Fri Dec 31, 2021 11:35 pm

RB5009 arrived. Here's some brief testing of cake.
ISP: Aussie Broadband
Technology: Fibre To The Premise (FTTP)
Down/Up: 1000M/50M
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Waveform Results:
Before: https://www.waveform.com/tools/bufferbl ... 7575df8878
After: https://www.waveform.com/tools/bufferbl ... db297e528f
Nice! Now run some flent tests! ;)

To see the differences between SFQ and Cake, flent will show you.. ;)
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Sat Jan 01, 2022 8:42 am


Nice! Now run some flent tests! ;)
Thanks for forcing me to do this - perhaps I'm going back to the drawing board. Cake seems to make my upload nice and consistent, but download and latency is still all over the shop.

Taking suggestions on where to go from here!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Jan 01, 2022 5:28 pm

My guess is you are thoroughly out of CPU on the download, not being able to crack 400Mbit. So I would suggest applying cake with the ack-filter - to the upload only, at say, 40Mbit, to start with. Cake with the right encapsulation options can get very close to the rated rate (say, 48) on the uplink, but not on the downlink.

I encourage folk to start with a number that is 85% of the rated bandwidth first, not something that is hard up against the ISP claimed rate. The goal is to take away the ISP's control of the queue. I would also expect any provider doing gbit/50mbit to also have an ack-filter in place themselves. It is nearly impossible to get a 20x1 ratio like that and even half your rated up on an asymmetric link, due to acks filling up the uplink.
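As a worked example of the 85% starting point, here is a trivial sketch. The function name is made up, and the 1000/50 figures are the rated rates of the link under discussion; the result is only a first guess to be refined by testing.

```python
def starting_rate_mbit(rated_mbit, fraction=0.85):
    """Initial shaper rate: a fraction of the ISP's rated bandwidth."""
    return round(rated_mbit * fraction, 1)


if __name__ == "__main__":
    rated_down, rated_up = 1000, 50   # Mbit/s, the gbit/50mbit plan above
    print("start download shaper at", starting_rate_mbit(rated_down), "Mbit/s")
    print("start upload shaper at", starting_rate_mbit(rated_up), "Mbit/s")
```

From there the rate can be nudged up or down while watching latency under load in flent.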

Some of the variability you see even on the baseline test could be due to a failure to keep up. Regrettably very few vendors actually test full rate up and downloads at the same time. If you post the flent.gz files I can poke harder.

The cake result on the rrul test, where you are getting more on the up is due to having not got full bandwidth on the down, leaving more room for data rather than acks.
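A back-of-envelope estimate shows why acks eat so much of a 50Mbit uplink when the downlink is a gigabit. Every number here is an assumption chosen for illustration: 1448-byte TCP segments, one ack per two segments (delayed acks, no ack-filter), and 84 bytes on the wire per ack (a minimum Ethernet frame plus preamble and inter-frame gap). Real links and stacks will differ.

```python
def ack_load_mbit(down_mbit, mss_bytes=1448, segs_per_ack=2, ack_wire_bytes=84):
    """Approximate uplink bandwidth consumed by acks for a saturated download."""
    segments_per_sec = down_mbit * 1e6 / (mss_bytes * 8)
    acks_per_sec = segments_per_sec / segs_per_ack
    return acks_per_sec * ack_wire_bytes * 8 / 1e6


if __name__ == "__main__":
    down, up = 1000, 50   # Mbit/s, the gbit/50mbit plan above
    acks = ack_load_mbit(down)
    print(f"acks for a {down} Mbit/s download: about {acks:.1f} Mbit/s")
    print(f"share of a {up} Mbit/s uplink: {acks / up:.0%}")
```

Under these assumptions a saturated gigabit download needs roughly 29Mbit/s of acks, which is more than half of the 50Mbit uplink, and less download throughput leaves correspondingly more room for upload data.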
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Jan 01, 2022 5:37 pm

@blurrybird, also I thought you were going to test a 100Mbit link, not a gbit one?

I am pretty sure cake has a role in a gbit/50mbit scenario on just the uplink, but it has historically required good x86 hardware to inbound shape the down at a gbit.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Sun Jan 02, 2022 2:07 am

My guess is you are thoroughly out of CPU on the download, not being able to crack 400Mbit.
What is strange is that the resource monitor in the router would suggest it's perfectly fine doing this (40-60% util on all cores), but the numbers don't lie. You're correct I was going to test on a 100/40 link (my in-laws'), but my new RB5009 arrived at my own house and so I wanted to see what it could do on 1000/50 as well. The experience I gain from this exercise will give me the ability to set it up on the in-laws' later on.

Per your advice I have changed it to the following config with these new results (note I was already doing ack-filter on the upload).

I've attached the flent.gz files for both the original baseline/cake runs from yesterday, as well as this morning's set. Really appreciate the help!

I'm worried that I'm configuring this incorrectly within the Mikrotik UI (some say to use interface queues?) - but even that is probably worthwhile feedback for the next person if it ends up being the case.
# Enable fasttrack-connection only on inbound = WAN to exclude download from SQM
/ip firewall filter
add action=fasttrack-connection chain=forward comment="defconf: fasttrack" connection-state=established,related hw-offload=yes in-interface-list=WAN
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=40.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=1000.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Jan 02, 2022 3:31 am

right now I do not trust mikrotik's treatment of the diffserv bits. could you kill the wash option and use rrul_be? No way should that download have been able to run away like that.

I guess I should get a mikrotik box myself and experiment? I haven't used it in years. I'm very interested in the many core - octeon - versions.

It's not clear to me you actually managed to disable cake on the down.

I have no idea what this does - what does the bucket-size thing do?

add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default

Your baseline really is pretty good. Like another gig/50mbit test it strikes me you have pretty short (and packet fifo) buffers there already. Other simpler tests like the tcp_nup test and tcp_ndown test would

As much as I LOVE the rrul (and hate speedtest for single number summaries), and it's easy to determine what a good rrul result looks like, when things go south - like so far me not being able to trust the diffserv OR ecn handling as yet - it's pretty hard to debug without packet captures and actual statistics from cake. Also the server I have in sydney may well be struggling itself at these speeds. I can setup another.
 
eider
newbie
Posts: 32
Joined: Thu Nov 30, 2017 10:14 pm

Re: some quick comments on configuring cake

Sun Jan 02, 2022 4:08 am

Have you tried doing test while setting cake-bandwidth to unlimited? I believe that RouterOS sets all qdiscs with CAKE as default egress, which doesn't work so well for ingress at these speeds.

Specifically, I believe that what it is doing is:
tc qdisc add dev ether1 root cake bandwidth 1000Mbit besteffort nat
While it should be doing:
tc qdisc add dev ether1 root cake bandwidth 1000Mbit besteffort nat ingress
Hence, my proposal to verify it is to set bandwidth to unlimited and see how it behaves then.

---
I have no idea what this does - what does the bucket-size thing do?

add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
https://wiki.mikrotik.com/wiki/Manual:H ... _Algorithm
In short, he is trying to make sure that no packets go through the queue unrestricted.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Sun Jan 02, 2022 4:20 am

New config with wash disabled created almost an identical graph.

I don't know much about the bucket size flag but perhaps this could be my problem? It looks like the mikrotik 'simple queue' incorporates HTB + whatever algo you pick?
See this thread I found on the topic: viewtopic.php?t=108292

I think this comes back to me probably mis-understanding how to actually implement cake inside the router.

In my experimentation, I can only ever observe the rate limiting occurring correctly when cake is set as an interface queue but there's no way to set an asymmetric bandwidth limit in that configuration. Someone else said the tik implementation hardcodes 'egress' in the cake parameters and that's in line with what I see - any limit I set only applies to the upload direction, and only when configured as an interface queue rather than as part of a simple queue.

Would love to hear from someone at mikrotik about this!
 
eider
newbie
Posts: 32
Joined: Thu Nov 30, 2017 10:14 pm

Re: some quick comments on configuring cake

Sun Jan 02, 2022 4:32 am

In my experimentation, I can only ever observe the rate limiting occurring correctly when cake is set as an interface queue but there's no way to set an asymmetric bandwidth limit in that configuration
I see no such issue, I can set my CAKE queue to cake-bandwidth=50M, set that queue type as download on simple queue and it works properly, limiting speed to 50M.

See this for my configuration if you need reference:
/queue type
add cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-mpu=64 cake-nat=yes cake-overhead=22 cake-overhead-scheme=ether-vlan,via-ethernet,docsis cake-rtt-scheme=internet kind=cake name=cake-docsis@download,unlimited
add cake-bandwidth=50.0Mbps cake-diffserv=besteffort cake-flowmode=dual-dsthost cake-mpu=64 cake-nat=yes cake-overhead=22 cake-overhead-scheme=ether-vlan,via-ethernet,docsis cake-rtt-scheme=internet kind=cake name=cake-docsis@download,50M
add cake-ack-filter=filter cake-bandwidth=40.0Mbps cake-diffserv=besteffort cake-flowmode=dual-srchost cake-mpu=64 cake-nat=yes cake-overhead=22 cake-overhead-scheme=ether-vlan,via-ethernet,docsis cake-rtt-scheme=internet kind=cake name=cake-docsis@upload,40M

/queue simple
add bucket-size=0.1/0.2 dst=ether1_wan max-limit=40M/700M name=wan queue=cake-docsis@upload,40M/cake-docsis@50M,unlimited target="" total-queue=default
add bucket-size=0.005/0.005 name=priority packet-marks=icmp,dns,syn,http-init,sip parent=wan priority=1/1 target=""
add bucket-size=0.05/0.1 name=untracked packet-marks=no-mark parent=wan queue=cake-docsis@upload,40M/cake-docsis@download,50M target="" total-queue=default
Usually I use the cake-docsis@download,unlimited queue, but for tests I have changed it to the new cake-docsis@download,50M one and I can see it working properly. For reference, cake-docsis@upload,40M also works properly and will reduce speed if modified to a lower value.

Note that I have created simple queue with target 0.0.0.0/0 and destination set to WAN interface, and then used that queue as parent for other queues (omitted for brevity, only priority and untracked shown) however in your case it is not necessary to create any child queues so you can ignore that part.

With CAKE, you should also be able to ignore the max-limit on the simple queue itself; however, note that if you do that and then create additional children under it, you won't be able to use limit-at to guarantee a minimum speed for those children.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Sun Jan 02, 2022 10:57 am

Did a bit more testing, you're right. It's working as intended but it's probably still being hardcoded as 'egress' so it's not ideal.

Hm, could be related to the test server? I ran two speedtest.net tests at the same time as running a flent test. Code config and attached result below.

While individually they showed bad bandwidth metrics, the mikrotik interface showed the ether1 connection being saturated in both directions as intended.
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=aggressive cake-bandwidth=40.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=900.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Jan 02, 2022 7:17 pm

Really good example to show the effects of a simultaneous speedtest against a rrul test! But I wouldn't call it "bad bandwidth metrics", but perhaps, "sanely reduced?". What that test shows is the side effect of that traffic on the rrul workload through cake. Given that the latencies held are so low, the speedtest enters the path at T+38, the rrul download flows yield (as a function of the number of total flows), and we use up some bandwidth for the acks on the upload, in both cases quickly, the download phase ends (and the rrul flows regain the bandwidth, quickly), then the upload phase starts, and we quickly yield and regain the bandwidth requested.

That was how the internet was supposed to work!!!

In terms of seeing cake outperform fq_codel, if you were to do a speedtest from another IP address, instead, the speedtest would grab roughly half of the overall bandwidth, again yielding and restoring very quickly, which shows the benefit of per host/per flow FQ.
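A toy illustration of that last point, comparing pure per-flow fairness with per-host fairness on the download side. The flow counts are hypothetical (4 rrul download flows on one host, 8 speedtest flows on another), and this is an idealized sketch of the sharing rule, not cake's actual scheduler.

```python
def flow_fair_share(flows_per_host):
    """Per-flow fairness only: a host's share is proportional to its flow count."""
    total = sum(flows_per_host.values())
    return {host: n / total for host, n in flows_per_host.items()}


def host_fair_share(flows_per_host):
    """Per-host fairness first: every active host gets an equal share,
    however many flows it opens."""
    active = [host for host, n in flows_per_host.items() if n > 0]
    return {host: 1 / len(active) for host in active}


# Hypothetical flow counts, for illustration only.
flows = {"rrul_host": 4, "speedtest_host": 8}

for name, share in (("per-flow", flow_fair_share(flows)),
                    ("per-host", host_fair_share(flows))):
    pretty = ", ".join(f"{host}: {frac:.0%}" for host, frac in share.items())
    print(f"{name:8s} -> {pretty}")
```

With flow fairness alone the host that opens more flows wins more of the link; with host fairness on top, each host ends up with roughly half regardless of flow count.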

And in all cases the observed latency for all flows from everywhere would stay relatively flat, packet loss would be minimal to non existent for flows of a lower rate than these.

Another very good test is running rrul and doing a web page PLT (page load time) benchmark, and observing the side effects here too. Usually web pages are almost entirely bound by RTT at bandwidths above 20Mbit. I know I've linked to a lot of papers over the course of this testing cycle, but this one was really really crucial and if only more had read it in 2010... https://www.belshe.com/2010/05/24/more- ... tter-much/ (please click through to the paper!!!!)

Seeing how sloppily a link with a FIFO might perform with a rrul going, and the side effects on the speedtest itself (much slower bandwidth growth), is really useful for grokking the importance of rtt.

And lastly... speedtest optimizes a network for... speedtest. It has no resemblance to any other form of realistic network traffic at all, be it web, voip, videoconferencing, a typical upload or download pattern, netflix, or a family of four. I try not to rant more than once a week about how much "speedtest" has cursed the internet's design and optimization. rrul attempted to capture and understand the side effects of torrent-like traffic on interactive traffic, but it was always my intent that it be used to create a steady background load that other traffic could be measured against. I talked about those use cases in my talks at MIT and Stanford back in the early days.
 
mducharme
Trainer
Trainer
Posts: 1740
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: some quick comments on configuring cake

Sun Jan 02, 2022 10:55 pm

That bucket size is extremely small. My experience with such tiny bucket sizes is that it is often impossible to reach higher rates. I would suggest increasing the bucket size for testing (default is 0.1).
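For a feel of how small that is, here is a rough sketch. It assumes my reading of the linked manual is right, namely that bucket capacity is bucket-size multiplied by max-limit (so bucket-size is effectively "seconds' worth of traffic at max-limit"). The simple queues posted in this thread do not set max-limit, so the rates below are purely illustrative.

```python
def bucket_capacity_bytes(bucket_size, max_limit_bps):
    """Assumed reading of the manual: capacity = bucket-size * max-limit,
    i.e. bucket-size seconds' worth of traffic at max-limit, in bytes."""
    return bucket_size * max_limit_bps / 8


for bucket_size in (0.001, 0.1):
    for limit_mbit in (40, 900):
        cap = bucket_capacity_bytes(bucket_size, limit_mbit * 1e6)
        print(f"bucket-size={bucket_size} max-limit={limit_mbit}M: "
              f"{cap:,.0f} bytes (~{cap / 1500:.1f} full-size packets)")
```

If that reading holds, 0.001 at 40M is only a handful of full-size packets' worth of tokens, which would fit with the difficulty reaching higher rates.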
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Jan 03, 2022 12:43 am

that's good to know! Small htb quantums suck, also, and it's totally necessary to scale it in the sqm-scripts.

All the same, NO bucket size and no htb instance should be required for cake, or else a really big bucket size should be set.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Jan 03, 2022 12:44 am

very good update on mike belshe's paper here: https://arxiv.org/pdf/1906.04753.pdf
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Mon Jan 03, 2022 2:31 am

Thank you all for the inputs. I read the original article, and skimmed the updated paper.

I've always tried to preach to others that bandwidth doesn't matter; latency does. It's nice to have some solid research on hand to back those claims up.

Through incremental (10Mbps) bumps in the bandwidth limits I've settled on a configuration that works well. The dips are when I executed a speedtest in parallel from a second device. While both tests were running, I saw the adapter pushing ~850Mbps of traffic at peak (via Winbox). A speedtest.net run by itself achieves 900/43 (for those who care about sharing those numbers).
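For anyone wanting to repeat the incremental approach, a minimal sketch of the loop follows. The stopping rule (added latency under load above a threshold) and the `measure_latency_ms` callback are my own assumptions for illustration; in practice the measurement would come from something like a flent run at each setting.

```python
def tune_bandwidth(start_mbit, step_mbit, ceiling_mbit, measure_latency_ms,
                   max_added_latency_ms=10.0):
    """Raise the shaper rate in fixed steps and return the highest rate whose
    loaded latency stays within max_added_latency_ms of the first measurement.

    measure_latency_ms(rate) is a hypothetical callback that configures the
    shaper at `rate` and returns the latency observed under load.
    """
    baseline = measure_latency_ms(start_mbit)
    best = start_mbit
    rate = start_mbit + step_mbit
    while rate <= ceiling_mbit:
        if measure_latency_ms(rate) - baseline > max_added_latency_ms:
            break  # the shaper is no longer the bottleneck; back off
        best = rate
        rate += step_mbit
    return best


# Fake measurement for demonstration only: latency jumps once the shaper
# rate exceeds what the link can really deliver (950 Mbit in this toy).
def fake_measure(rate_mbit):
    return 15.0 if rate_mbit <= 950 else 80.0


print(tune_bandwidth(900, 10, 1000, fake_measure))
```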

To re-cap (and in case any fellow Australians find this), this is an Aussie Broadband NBN FTTP 1000/50 connection running on an RB5009UG+S+. Fasttrack firewall rules are disabled.
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=aggressive cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=945.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-down
/queue simple
add bucket-size=0/0 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
I'll call that a success for now. 🎉 Now to go and tackle the 100/40 connection down the road.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Jan 03, 2022 4:33 pm

With the above result, yes, I think y'all have compelling reasons to run out and deploy fq_codel and cake everywhere you can, ASAP. :)
 
Tporlapt
just joined
Posts: 3
Joined: Sat Jan 01, 2022 10:40 am

Re: some quick comments on configuring cake

Mon Jan 03, 2022 4:42 pm

With the above result, yes, I think y'all have compelling reasons to run out and deploy fq_codel and cake everywhere you can, ASAP. :)
I feel compelled :D

…but worth reminding that Simple Queues as used in some of the examples in this thread appear to break IPv6 under ROS 7.1.1 (ref viewtopic.php?t=181705) :(
Working with networked devices since the 1980s. That makes me old, not an expert.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Tue Jan 04, 2022 7:31 am

I feel compelled :D

…but worth reminding that Simple Queues as used in some of the examples in this thread appear to break IPv6 under ROS 7.1.1 (ref viewtopic.php?t=181705) :(
Heh, that's my thread too. Yes it's upsetting, but the Mikrotik Support team said they have replicated the problem based on my logs and look forward to a fix in a future version.

🤞🏻 that version is soon. For now I'm running Cake with IPv6 disabled.
 
WeWiNet
Long time Member
Long time Member
Posts: 586
Joined: Thu Sep 27, 2018 4:11 pm

Re: some quick comments on configuring cake

Tue Jan 04, 2022 11:23 am

/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=aggressive cake-bandwidth=45.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=945.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-down
/queue simple
add bucket-size=0/0 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
I'll call that a success for now. 🎉 Now to go and tackle the 100/40 connection down the road.
I tried your setup (just copied the above config into my Chateau 5G terminal), with IPv6 disabled, WAN interface LTE1, and the values changed to 30Mbps DL and 5Mbps UL to adapt to my access speed.
If I do speedtest it goes to 45Mbps and more for DL and 10Mbps for UL. So basically exceeds the speeds set in the config.
Is this expected behavior ... ?
I also notice if you check the simple queue, tab "advanced", it shows "cake-up" under Downlink and cake-down under Uplink? Again, is this expected?
**
MTCNA
Chateau 5G: high speed :D meets ROS7 :shock: , the perfect match... :lol:.
Having an Audience? Use wifiwave2!!! (the more people complain, the faster it gets fixed 8) )
 
Amm0
Long time Member
Long time Member
Posts: 611
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: some quick comments on configuring cake

Wed Jan 05, 2022 5:15 am


The other question from Bithaulers
any tips for LTE connections? Especially ones that go from ~5Mbps to 70Mbps in a few hours?

is also very valid, same problem again on my side. LTE (and soon 5G even worse) is a medium where in 24h the "pipe" itself changes heavily.
In this situation it is really hard to define the pipe size and do queueing with fixed values gets almost impossible.
What can CAKE do in this case?
Do have a followup on this one...

The CQI provided by Mikrotik tells you a fair amount about the expected speed (e.g. it tells you the modcode, thus at least the max speed). More data (e.g. RSRP/RSRQ + EARFCN/MHz) gives you more to approximate at least some temporary max speed. This data is readily available in a scheduler script, and a basic heuristic tied to CQI (ranges from 1 to 15, higher is better) is pretty easy to write (obviously a more sophisticated script could calculate the max POTENTIAL even better). This part is just some math and ROS script - to at least know the MAX you could ever see - then adapted downward proportionally to set the queue Mb/s limits/etc.

While LTE speeds typically vary wildly by TOD, those dramatic swings in speed typically happen over, say, a few minutes, as the RF/backhaul situation worsens. Network traffic also causes some "stickiness" to a speed, so that also stabilizes speeds somewhat. Anyway, LTE isn't some light switch that goes from hero to 0, typically. You more often see it vacillate between a couple of different speed profiles (e.g. say 20up/10down OR 40up/20down in a typical 4G network).

The question is, if you arrived at some expected speed based on some cell data, how often would updating the queue from a script be appropriate? In other words, do you have to "shape the queue adjustments"? Obviously slamming cake every few seconds with a new calculated value for an LTE speed that varied wildly would likely not be much use.

Basically curious if there is a sense of whether "dynamically updating" the queue config has unintended side effects... I'm thinking every 15s to 2m would be a happy medium - assuming nothing weird happens upon changing the queue, like dropping connections or other things I haven't considered... thus the question.
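To make the idea concrete, here is a rough sketch of the two pieces: a CQI-based estimate and a damper that stops the queue being rewritten too often. The linear CQI mapping is only a placeholder heuristic, and the 15s minimum interval, the 10% change threshold, and all names are assumptions for illustration. It is written in Python for readability; a real version would live in a ROS scheduler script.

```python
def rate_from_cqi(cqi, max_mbit):
    """Placeholder heuristic: scale the best-case rate linearly with CQI (1..15)."""
    cqi = max(1, min(15, cqi))
    return max_mbit * cqi / 15


class QueueUpdater:
    """Only push a new shaper rate when it differs enough from the applied one
    and enough time has passed since the last change."""

    def __init__(self, min_interval_s=15.0, min_change=0.10):
        self.min_interval_s = min_interval_s
        self.min_change = min_change
        self.applied_mbit = None
        self.last_change_s = None

    def propose(self, now_s, target_mbit):
        """Return the rate to apply now, or None to leave the queue alone."""
        if self.applied_mbit is not None:
            too_soon = now_s - self.last_change_s < self.min_interval_s
            change = abs(target_mbit - self.applied_mbit) / self.applied_mbit
            if too_soon or change < self.min_change:
                return None
        self.applied_mbit = target_mbit
        self.last_change_s = now_s
        return target_mbit


# Hypothetical samples: (time in seconds, reported CQI), 70 Mbit best case.
updater = QueueUpdater()
for now_s, cqi in [(0, 12), (5, 7), (20, 7), (40, 7)]:
    rate = updater.propose(now_s, rate_from_cqi(cqi, max_mbit=70))
    print(now_s, cqi, "no change" if rate is None else f"set {rate:.1f} Mbit")
```

The damper is the part that addresses the "shape the queue adjustments" question: a CQI dip at 5s is ignored because it arrives too soon after the previous change, and a repeated identical estimate does not touch the queue at all.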
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Thu Jan 06, 2022 4:06 am

0) The right way to manage LTE outbound is with backpressure from the radio. :various cusswords elided: as to why we still don't have a good way to do that.

1) Same goes for managing the queues on the enode-bs and other bottlenecks on the path.

We designed fq-codel to be lightweight, with backpressure mechanisms like BQL and AQL, but many device drivers and offloads eat too much data down below where we can stick the AQM. Here are APIs to do that more right: http://www.taht.net/~d/broadcom_aug9_2018.pdf

So... sigh.

2) There is some *very* good research for dynamically estimating the LTE bandwidth in both directions being done on this openwrt thread here: https://forum.openwrt.org/t/cake-w-adap ... dth/108848

While the script is in lua, the math so far seems to be improving, and is thus implementable in anything (there's also a shell script). As to how to apply that to mikrotik, I have no idea, I hope there is a way?

We designed cake in particular to let you feed in cross layer - or other sorts - of bandwidth statistics without needing to be reset. Still, see items 0 and 1. Backpressure is tons simpler and more accurate to implement on the devices themselves. Ideally someone making 5g gear has been paying attention.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Fri Jan 14, 2022 3:35 am

I am going to take a break from this thread for a while. The bitag report I'd been working on was released yesterday and I hope it's even more incentive for y'all to focus on reducing "working latency" in your networks. Please reshare; https://www.bitag.org/latency-explained.php

1) Can someone confirm that 7.2 perhaps has a working ipv6?
2) I'd love to know if the hashing algos in fq_codel and cake can actually get into an mpls header and FQ on this encapsulation.
3) Dying to hear of higher end mikrotik boxes being tested with cake, esp with multiple customers over a 10Gbit link.

I have talks scheduled with nznog and with ICSI later this month and have to go heads down on writing those. Also I'd like to reuse some of the plots generated here as examples for the nznog talk, notably the rtt_fair and cake vs speedtest ones, if that's ok?

Happy debloating!
 
IPANetEngineer
Trainer
Trainer
Posts: 1571
Joined: Fri Aug 10, 2012 6:46 am
Location: Denver, CO USA
Contact:

Re: some quick comments on configuring cake

Sat Feb 05, 2022 2:49 pm

1) Can someone confirm that 7.2 perhaps has a working ipv6?

I can confirm that IPv6 and Cake are working on 7.2rc3. This is a basic test to validate IPv6 connectivity and shaping. Will setup more complex testing now that I have a baseline for a lab.


Global - MikroTik Support & Consulting - English | Español | Serbian | Danish +1 855-645-7684
https://iparchitechs.com/ecosystem/mikr ... consulting mikrotiksupport@iparchitechs.com
 
mducharme
Trainer
Trainer
Posts: 1740
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: some quick comments on configuring cake

Mon Feb 07, 2022 5:54 am

I can confirm that IPv6 and Cake are working on 7.2rc3.
The issue reported was not about IPv6 and cake specifically. It was about IPv6 not working when there was a simple queue (of any type) used with an interface as the "target". Cake works fine with IPv6 with queue trees and interface queues even on 7.1.1, but not with simple queues. My understanding is that 7.2rc3 is no different from 7.1.1 in this way, but I haven't tried it myself to confirm. Your post doesn't make clear whether you tried this with a simple queue that used an interface as the target - if you haven't, then you haven't actually verified whether or not this specific problem is resolved.
 
IPANetEngineer
Trainer
Trainer
Posts: 1571
Joined: Fri Aug 10, 2012 6:46 am
Location: Denver, CO USA
Contact:

Re: some quick comments on configuring cake

Mon Feb 07, 2022 7:07 pm

The issue reported was not about IPv6 and cake specifically. It was about IPv6 not working when there was a simple queue (of any type) used with an interface as the "target". Cake works fine with IPv6 with queue trees and interface queues even on 7.1.1, but not with simple queues. My understanding is that 7.2rc3 is no different from 7.1.1 in this way, but I haven't tried it myself to confirm. Your post doesn't make clear whether you tried this with a simple queue that used an interface as the target - if you haven't, then you haven't actually verified whether or not this specific problem is resolved.

That was the test I performed. IPv6 + simple queue using the interface as a target works on 7.2rc3 and CCR2116

/queue type
add cake-bandwidth=1700.0Mbps kind=cake name=aqm-cake
/queue simple
add name=queue1 queue=aqm-cake/aqm-cake target=vlan3200

Global - MikroTik Support & Consulting - English | Español | Serbian | Danish +1 855-645-7684
https://iparchitechs.com/ecosystem/mikr ... consulting mikrotiksupport@iparchitechs.com
 
mducharme
Trainer
Trainer
Posts: 1740
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: some quick comments on configuring cake

Mon Feb 07, 2022 7:11 pm

That was the test I performed. IPv6 + simple queue using the interface as a target works on 7.2rc3 and CCR2116
It might be fixed then - is that with connection tracking? i.e. do you have an IPv6 allow established,related firewall rule that is working correctly with that queue in place?
 
jult
newbie
Posts: 36
Joined: Sat Dec 26, 2020 1:16 am

Re: some quick comments on configuring cake

Tue Feb 08, 2022 3:41 pm

* Cake tries really hard to follow a bunch of mutually conflicting diffserv RFCs, and in an age where videoconferencing is very important the cake diffserv4 model is closer to how a wifi AP treats it. see: https://www.w3.org/TR/webrtc-priority/ for this underused facility in webrtc.
As an end-user here, I have some cake-related questions for you;

LTE 5G
I run a network with a maximum of 12 users on it at one time (not counting all the servers/smart/domotica devices). We have an LTE 5G (I think it's still some type of double 4G now) HUAWEI CPE modem as our port to the internet. I run it in Bridge-mode, and then our Mikrotik RB4011iGS+5HacQ2HnD gets the CGNAT IP on its WAN side. When I run speedtests through the Huawei modem only, I reach a maximum of 321 Mbps down, and 98 Mbps upstream, with a lowest ping-pong of 12 ms. Pretty nice results for over LTE, I'd say. but, like many others have stated; LTE is highly unstable, although we're pretty close and in line-of-sight of the cell-tower it's using, it rarely drops further down than to around 48Mbps down and 40Mbps up, which is still more than sufficient for all activities on this network.
LAN
- The wired network runs mostly 1Gbe ports, but also some 10Gbe and a bunch of bound ports that make 2Gbe connections to LAN servers. The main MT router has 3 wireless APs connected to it, that is to say; 2 are built-in (a 5Ghz and a 2.4GHz), and another 2.4GHz is external (a Mikrotik Metal 52 ac device).
fq_codel and cake
- I have switched queue-type for the wireless interfaces to fq_codel now. Any other config tips for wifi regarding fq_codel (or even cake) are welcome.
- I have set the WAN-port (ether1 interface on mikrotik's router), where the cable to the huawei 5g-modem goes, to use cake, with NAT on (since it does NAT), and Wash on as well, because I don't think any QoS is happening on that IO, or is there? If so, that huawei doesn't do anything with it.
RB4011iGS+5HacQ2HnD / RBMetalG-52SHPacn / RB850Gx2 / CSS106-1G-4P-1S
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun Feb 20, 2022 3:39 pm

I am not huge on present-day speedtests and have shown off flent through this thread. I really wish we could get good stats out of mikrotik to see the effectiveness of things. Another thing you can be doing is a packet capture of your speedtests and plotting rtt in wireshark.

Historically, lte drivers' interfaces were very overbuffered, and piling fq_anything on top of it did not accomplish much. It's my hope at least some drivers have corrected this. If not you can choose the minimum observed bw for cake, or attempt to control it dynamically ( https://forum.openwrt.org/t/cakes-autor ... ss/108848/ )
 
mke
just joined
Posts: 13
Joined: Wed Sep 27, 2017 3:37 am

Re: some quick comments on configuring cake

Sun Feb 27, 2022 4:35 am

Hi,

Thanks for this thread! Lots of useful info in here. I have a few questions...

Is setting an overhead value only useful if you are trying to run cake close to sync speeds? Ie if you have things running well at a reasonable percentage of sync up/down is there any reason to consider setting overhead besides being able to push things higher? FWIW I am a soho user on a FTTN connection, VDSL2 Ipoe, setup is bridged VDSL2 modem connected to HEX via ethernet cable. Trying to figure out correct overhead has been a very confusing topic.

Secondly, how is MPU determined and once again is there any benefit to setting this, either alongside or in isolation from overhead?

Finally, considering the setup of a simple queue with DL and UL queue types set to "dual-dsthost" and "dual-srchost", what Flow Mode setting makes sense for the Total Queue queue type? This would default to "triple-isolate", and I am assuming would very rarely kick in unless you were maxing downloads and uploads simultaneously.
Last edited by mke on Sun Feb 27, 2022 6:37 am, edited 1 time in total.
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Sun Feb 27, 2022 5:13 am

The issue reported was not about IPv6 and cake specifically. It was about IPv6 not working when there was a simple queue (of any type) used with an interface as the "target". Cake works fine with IPv6 with queue trees and interface queues even on 7.1.1, but not with simple queues. My understanding is that 7.2rc3 is no different from 7.1.1 in this way, but I haven't tried it myself to confirm. Your post doesn't make clear whether you tried this with a simple queue that used an interface as the target - if you haven't, then you haven't actually verified whether or not this specific problem is resolved.

That was the test I performed. IPv6 + simple queue using the interface as a target works on 7.2rc3 and CCR2116

/queue type
add cake-bandwidth=1700.0Mbps kind=cake name=aqm-cake
/queue simple
add name=queue1 queue=aqm-cake/aqm-cake target=vlan3200

It's not fixed. RB5009 upgraded from 7.1.3 to 7.2rc4 after reading this message.

Enabling a simple queue causes new connections to be marked (and dropped) as invalid.

Existing connections created before the simple queue was enabled persist since they're already established.
 
mke
just joined
Posts: 13
Joined: Wed Sep 27, 2017 3:37 am

Re: some quick comments on configuring cake

Tue Mar 01, 2022 1:36 am

To answer my own question above re overhead, the best explanation I have found is on this thread here: https://forum.openwrt.org/t/sqm-cake-li ... n/32578/15

"Most people can just put 44 and it will work for you regardless of what your underlying technology is. For some people with fiber or cable connections, this may waste up to around 1-2% of your bandwidth, but it will bias you towards having lower bufferbloat rather than higher which is usually a good thing. The primary use case for more precise tuning is when your internet speed is relatively low (less than 5Mbps) and / or you have more than 20% of your internet speed in either direction taken up by small-packet traffic such as VOIP or gaming. If you have more than 5Mbps and/or you are not running a call center, further adjustment is probably not worth the effort and you should spend more time trying to better measure your reliable level of internet speed itself."

"I guess what I'm arguing is that exactly right is not needed, within 10% of the right value is probably just fine for all but a VOIP call center running hundreds of calls on a tight 10-15Mbps symmetric line. The reason it's needed is to calculate the true packet size. If packet payload size is 1500 bytes then +-45 bytes makes only ~3% error. If packet size is 150 bytes like in a VoIP call, then 45 bytes is 33% error! So having the overhead included is important, but the difference between overhead say 44 and overhead 48 is 4 bytes and 4 bytes on 150 bytes is now back to 3% error so, do you know the capacity of your line to within 3% error? If not then even if you're a VoIP call center the error in your overhead if you just say 44 bytes is offset by the fact you aren't sure if you should put 10000 kbps or 9700kbps... If you're not a call center it's even an order of magnitude less important..."
Either way setting these values seems valuable for VOIP.
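The arithmetic in the quote is easy to reproduce. A minimal sketch, using the packet sizes and byte counts from the quoted post (the function name is just for illustration):

```python
def size_error(payload_bytes, overhead_error_bytes):
    """Relative error in the accounted packet size when the overhead
    estimate is off by overhead_error_bytes."""
    return overhead_error_bytes / payload_bytes


for payload, err in [(1500, 45), (150, 45), (150, 4)]:
    print(f"{payload}-byte packet, overhead off by {err} bytes: "
          f"{size_error(payload, err):.1%} error")
```

The same 45-byte mistake that is about 3% on a full-size packet is roughly a third of a VoIP-sized one, while the difference between overhead 44 and 48 is back under 3% even for small packets.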

Like blurrybird I am on Aussie Broadband in OZ, which is an IPoE VDSL2 connection with no VLAN tagging. My modem is connected to my router via ethernet. This translates to "cake-overhead-scheme=bridged-ptm,via-ethernet", which also sets "cake-atm=ptm cake-overhead=22" on save. My assumption is the "PTM" setting accounts for the 64/65 encapsulation (ie you can set shaping slightly closer to sync rate if you want), and MPU is calculated in the background based on overhead scheme settings.
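For a sense of scale on the 64/65 encapsulation: PTM carries 64 payload bytes in every 65 bytes on the wire, so the usable rate is the sync rate times 64/65. A small sketch, using a hypothetical 107Mbit sync rate; as I understand it, the ptm setting makes cake do this accounting itself, so this is only to show the size of the effect.

```python
def ptm_payload_rate(sync_mbit):
    """64/65 PTM encoding: 64 payload bytes for every 65 bytes on the wire."""
    return sync_mbit * 64 / 65


sync = 107.0  # hypothetical VDSL2 sync rate in Mbit, for illustration only
usable = ptm_payload_rate(sync)
print(f"sync {sync:.1f} Mbit -> about {usable:.2f} Mbit usable "
      f"({1 - usable / sync:.1%} lost to PTM framing)")
```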
 
skoenman
newbie
Posts: 31
Joined: Fri Nov 07, 2008 11:42 am

Re: some quick comments on configuring cake

Sat Mar 05, 2022 9:03 pm

Hello guys, are any of you running cake on a breakout RB5009? The last time I tried it, the moment one puts in a total limit above 100Mbps, the router reboots randomly after running for a minute or two...
 
blurrybird
newbie
Posts: 35
Joined: Sun Jan 19, 2020 12:25 pm

Re: some quick comments on configuring cake

Fri Mar 25, 2022 12:25 am

CAKE is now happily working with IPv6 again on 7.2rc5 (if anybody was waiting for that).
Like blurrybird I am on Aussie Broadband in OZ
Mke I'd love to get the following info off you to compare configs:

- NBN technology (FTTN or FTTC)
- Your speed tier (100/40, 50/20, etc)
- Your full queue config export
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Mon Apr 04, 2022 9:41 pm

I'm glad to hear the ipv6 problem appears to be fixed. I would love to see some flent benchmarks of ipv6 traffic to prove that though....

Also, on the overhead parameter. Certain forms of DSL are very inefficient at small packets, with 60% overhead. So while one benchmark might look ok (using large packets) another one - like in the rrul test, might get things really wrong due to the presence of the 66 byte acks. Try to get the dsl overhead right.
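To see why small packets suffer on some DSL links, here is a sketch that assumes ATM framing (53-byte cells carrying 48 bytes each, with the last cell padded). The 44-byte per-packet overhead is just a placeholder value; the right number depends on the encapsulation in use.

```python
import math

ATM_CELL_BYTES = 53    # bytes on the wire per ATM cell
ATM_CELL_PAYLOAD = 48  # bytes of packet data carried per cell


def atm_wire_bytes(packet_bytes: int, overhead_bytes: int) -> int:
    """Bytes actually sent once a packet plus its per-packet overhead
    is chopped into ATM cells and the last cell is padded."""
    cells = math.ceil((packet_bytes + overhead_bytes) / ATM_CELL_PAYLOAD)
    return cells * ATM_CELL_BYTES


OVERHEAD = 44  # assumed per-packet overhead, for illustration only
for size in (66, 1500):
    wire = atm_wire_bytes(size, OVERHEAD)
    print(f"{size:>4}-byte packet -> {wire} bytes on the wire "
          f"({wire / size:.2f}x)")
```

A shaper that counts only the packet size will badly underestimate what a stream of acks costs on a link like this, which is why a large-packet benchmark can look fine while rrul does not.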

Still looking for what happens on mpls traffic....
 
denisun
Frequent Visitor
Frequent Visitor
Posts: 74
Joined: Wed Jul 16, 2014 6:38 pm
Location: Greece

Re: some quick comments on configuring cake

Wed Apr 06, 2022 9:03 pm

Is it better to have a cake queue without mangle rules, or sfq queues with mangle rules?
This is for home use on a fiber connection with VoIP.
"What one programmer can do in one month, two programmers can do in two months."
Fred Brooks...
 
jmszuch1
just joined
Posts: 10
Joined: Fri Oct 19, 2018 10:21 pm

Re: some quick comments on configuring cake

Sun Apr 10, 2022 4:08 am

Hey all, with the release of 7.2 to stable I figured I could provide some fresh test results and also an example configuration for anyone with a pretty simple setup like mine. This might be a pretty long post! I'm using an RB5009 on a Charter Spectrum cable connection. The speed is rated for 100Mbps download and 10Mbps upload. A speed test from my Ubuntu 20.04 VM backs up that rating (actually slightly more):
 Speedtest by Ookla

     Server: Winn Telecom - Mount Pleasant, MI (id = 1062)
        ISP: Spectrum
    Latency:    26.63 ms   (10.78 ms jitter)
   Download:   114.17 Mbps (data used: 137.9 MB )
     Upload:    10.93 Mbps (data used: 5.5 MB )
Packet Loss:     0.0%
I created a simple queue and assigned Cake as the queue type. The target was set to be the network range that any device on my network would have, including the IPv6 addresses being received from Spectrum. Here's the configuration export for that:
/queue/export
# apr/09/2022 11:31:30 by RouterOS 7.2
# software id = V13P-7JPC
#
# model = RB5009UG+S+
# serial number = EC1A0E402D35
/queue type
add cake-bandwidth=10.0Mbps cake-diffserv=diffserv4 cake-memlimit=32.0MiB \
    cake-mpu=64 cake-nat=yes cake-overhead=18 cake-overhead-scheme=docsis kind=\
    cake name=cake-up
add cake-bandwidth=105.0Mbps cake-diffserv=diffserv4 cake-memlimit=32.0MiB \
    cake-mpu=64 cake-nat=yes cake-overhead=18 cake-overhead-scheme=docsis kind=\
    cake name=cake-down
/queue simple
add name=Spectrum queue=cake-up/cake-down target=\
    192.168.88.0/24,2600:6c4a:5a00:56a::/64,192.168.5.0/24
I have the bandwidth limits set within each cake queue. One is assigned to the upload and the other to the download. Cake has been left pretty much alone, main changes would probably be that I did enable NAT (since the router is handling it) and also set the overhead to DOCSIS.

I then went about performing some tests. First here's the regular speed test result now with cake:
Speedtest by Ookla

     Server: CMS Internet - Mount Pleasant, MI (id = 735)
        ISP: Spectrum
    Latency:    27.12 ms   (6.73 ms jitter)
   Download:    91.94 Mbps (data used: 120.2 MB )
     Upload:     9.45 Mbps (data used: 8.2 MB )
Packet Loss:     0.0%
Now some simpler tests such as the DSL Reports and Waveform speed tests
Screenshot 2022-04-09 203728.png
Screenshot 2022-04-09 203913.png
Then, using all the wonderful information in this thread, I installed Flent on my Ubuntu VM and began performing tests to add to the pile here. For reference, this is what the qdisc on the Ubuntu VM is set to:
tc qdisc show dev eth0
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Here are the various tests I ran with Flent. For each one I'll post the summary graph and the command I used to generate it. I'll also include a OneDrive link to the full Flent data sets if anyone is really interested in looking through them. Here we go!
cake-spectrum-rb5009-rrul-300.png
flent rrul -p all_scaled -l 300 -H dallas.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-rrul-300 -o cake-spectrum-rb5009-rrul-300.png
cake-spectrum-rb5009-rttfair-300.png
flent rtt_fair -p all_scaled -l 300 -H dallas.starlink.taht.net -H fremont.starlink.taht.net -H london.starlink.taht.net -H singapore.starlink.taht.net -H sydney.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-rttfair-300 -o cake-spectrum-rb5009-rttfair-300.png
cake-spectrum-rb5009-rttfairvar-300.png
flent rtt_fair_var -p all_scaled -l 300 -H dallas.starlink.taht.net -H fremont.starlink.taht.net -H london.starlink.taht.net -H singapore.starlink.taht.net -H sydney.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-rttfairvar-300 -o cake-spectrum-rb5009-rttfairvar-300.png
cake-spectrum-rb5009-tcpndown-300.png
flent tcp_ndown -p ping -l 300 -H dallas.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-tcpndown-300 -o cake-spectrum-rb5009-tcpndown-300.png --te=download_streams=4 --te=ping_hosts=8.8.8.8
cake-spectrum-rb5009-tcpnup-300.png
flent tcp_nup -p ping -l 300 -H dallas.starlink.taht.net --step-size=.05 -t cake-spectrum-rb5009-tcpnup-300 -o cake-spectrum-rb5009-tcpnup-300.png --te=upload_streams=4 --te=ping_hosts=8.8.8.8

Now that we're through with those, I also ran the same tests with no queueing enabled. That way people can see how the connection operated under load before enabling cake.
Screenshot 2022-04-09 205358.png
none-spectrum-rb5009-rrul-300.png
none-spectrum-rb5009-rttfair-300.png
none-spectrum-rb5009-rttfairvar-300.png
none-spectrum-rb5009-tcpndown-300.png
none-spectrum-rb5009-tcpnup-300.png
So hopefully this proves interesting to people such as Dave and anyone else who sees this thread! If there are any questions (or if I messed up gathering this information) then just let me know :D

Oh, and here's the OneDrive link I mentioned earlier as well: https://1drv.ms/u/s!Ap4u4Rte63FqjqlUGuc ... A?e=2BHzSY
 
fragtion
Member Candidate
Member Candidate
Posts: 172
Joined: Fri Nov 13, 2009 10:08 pm
Location: Johannesburg, South Africa

Re: some quick comments on configuring cake

Sun Apr 10, 2022 2:08 pm

There seems to be a routeros stability issue when using cake, even on v7.2. Seemingly affecting most, if not all platforms.
Anyone else run into this issue and know any workarounds (or which specific configuration triggers the issue to occur) for this?

Update: Turns out, there's a workaround, too! check linked thread ^ for details
Last edited by fragtion on Thu Apr 14, 2022 10:53 pm, edited 2 times in total.
 
jmszuch1
just joined
Posts: 10
Joined: Fri Oct 19, 2018 10:21 pm

Re: some quick comments on configuring cake

Sun Apr 10, 2022 3:59 pm

I didn't run into anything like that during my testing, although it's possible that I wasn't putting enough load on the router to cause a problem like that to occur. My router is also a different model with a different architecture than the ones mentioned there. I'll chime in on the thread you linked to keep this one a little cleaner, thanks!
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat Apr 30, 2022 8:49 pm

So happy to see a result like this, in spanish: https://www.adslzone.net/foro/mikrotik. ... do.584568/
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sun May 01, 2022 1:35 am

I didn't run into anything like that during my testing, although it's possible that I wasn't putting enough load on the router to cause a problem like that to occur. My router is also a different model with a different architecture than the ones mentioned there. I'll chime in on the thread you linked to keep this one a little cleaner, thanks!
You can benefit from the ack-filter on the upload to some extent, but I'm pretty sure you would prefer the sqm'd result to the non-sqm'd.

btw, I have also been benchmarking BBR behavior. On your VM you can try:

modprobe tcp_bbr

then add this to the rtt_fair command line:

--test-parameter cc_algos=bbr,bbr,bbr,bbr
 
rooneybuk
newbie
Posts: 40
Joined: Fri Feb 20, 2015 12:09 pm

Re: some quick comments on configuring cake

Wed May 04, 2022 12:15 pm

I'm using the below on a symmetrical 1Gb/1Gb connection, but it reduces the overall upload and download to 500Mb. I've checked CPU usage, which is around 58% on an RB4011. Any ideas what I'm doing wrong?
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default1
add cake-ack-filter=filter cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
 
fragtion
Member Candidate
Member Candidate
Posts: 172
Joined: Fri Nov 13, 2009 10:08 pm
Location: Johannesburg, South Africa

Re: some quick comments on configuring cake

Wed May 04, 2022 1:19 pm

I'm using the below on a symmetrical 1Gb/1Gb connection, but it reduces the overall upload and download to 500Mb. I've checked CPU usage, which is around 58% on an RB4011. Any ideas what I'm doing wrong?
I experience the same phenomenon on an RB5009 regardless of whether I use cake or fq_codel. Bandwidth on a Gigabit WAN link is roughly cut in half. I'm guessing it's a CPU constraint?

I've also picked up another issue with queues where some incoming traffic seems to exceed the limit of the queue quite significantly, effectively breaking cake's ability to keep latency under control. In some cases I need to reduce the "bandwidth limit" to as low as 50% of the line's rated capacity to stabilize ping and prevent throughput saturation, resulting in a glaring discrepancy between traffic shown on the queue vs traffic shown on the actual interface. If anyone has any clues on this please do shout?
 
kikikaka
just joined
Posts: 9
Joined: Sun Jul 03, 2011 9:50 am

Re: some quick comments on configuring cake

Fri May 06, 2022 3:40 pm

I'm using the below on a symmetrical 1Gb/1Gb connection, but it reduces the overall upload and download to 500Mb. I've checked CPU usage, which is around 58% on an RB4011. Any ideas what I'm doing wrong?
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default1
add cake-ack-filter=filter cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=950.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Same experience. Download speed is OK, but the upload speed is cut almost in half.
Just keep the default setting "total-queue=default-small" and everything goes fine. I get 9xx / 9xx Mbps.
Any queue type other than default-small causes a serious upload drop.
I don't actually know why. I found this by trial and error.
 
ilium007
Member Candidate
Member Candidate
Posts: 187
Joined: Sun Jan 31, 2010 9:58 am
Location: Newcastle, Australia

Re: some quick comments on configuring cake

Tue May 10, 2022 4:15 am

Australian NBN FTTP 50/20 user here. I am using the following on the NBN connection:
/queue type
add cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-default
add cake-ack-filter=filter cake-bandwidth=18.0Mbps cake-diffserv=besteffort cake-nat=yes kind=cake name=cake-up
add cake-bandwidth=47.0Mbps cake-diffserv=besteffort cake-nat=yes cake-wash=yes kind=cake name=cake-down
/queue simple
add bucket-size=0.001/0.001 name=cake queue=cake-down/cake-up target=ether1 total-queue=cake-default
Do I need to include an Overhead Scheme such as via ethernet?

I have been trying to find info about the Australian NBN 'upload policer' and how I limit the upload egress traffic to avoid the NBN Policer dropping packets. I also need to configure uploads to combat bufferbloat. Will one upload queue satisfy the other?
 
Rfulton
Frequent Visitor
Frequent Visitor
Posts: 74
Joined: Tue Aug 08, 2017 2:17 am

Re: some quick comments on configuring cake

Sat May 14, 2022 1:42 am

!) queue - do not allow using CAKE type in simple and tree setups (already configured queues will be disabled);

Cake confirmed dead for Mikrotik.

I assume Mikrotik has never tried to contact the developer?
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat May 14, 2022 6:45 am

What are you talking about? It's been working over here just fine?

A developer is me. Things were looking good.
 
ilium007
Member Candidate
Member Candidate
Posts: 187
Joined: Sun Jan 31, 2010 9:58 am
Location: Newcastle, Australia

Re: some quick comments on configuring cake

Sat May 14, 2022 6:46 am

cake working fine in simple queue for me routerOS 7.2.3
 
Rfulton
Frequent Visitor
Frequent Visitor
Posts: 74
Joined: Tue Aug 08, 2017 2:17 am

Re: some quick comments on configuring cake

Sat May 14, 2022 1:42 pm

What are you talking about? It's been working over here just fine?

A developer is me. Things were looking good.
Then maybe you should read the latest patch notes?
 
felixka
newbie
Posts: 41
Joined: Mon Oct 19, 2020 4:12 am

Re: some quick comments on configuring cake

Sat May 14, 2022 10:54 pm

What are you talking about? It's been working over here just fine?

A developer is me. Things were looking good.
They are talking about Mikrotik's decision to limit cake to interface queues only in the latest 7.3beta40 release. Release notes buried here: viewtopic.php?t=185066#p932950
Apparently this breaks some use cases people had for cake in other scenarios, and there is a little bit of grumbling because of that.
 
Trunkz
just joined
Posts: 5
Joined: Mon Dec 02, 2019 5:44 pm

Re: some quick comments on configuring cake

Mon May 16, 2022 11:09 pm

Hi!

Can somebody take a look at my CAKE config and tell me if there's anything I can do to get as close to line-speed as possible. Some background:

Zen VDSL2 connection (80/20) into Vigor 130 modem (VLAN 101); PPPoE connection established via RB4011 on ether1. Jumbo packets enabled (ether1 MTU 1508; PPPoE MTU 1500)

add bucket-size=0.001/0.001 max-limit=72M/18M name="Cake - Smaller Bucket" queue=default-cake/default-cake target=pppoe-out1 total-queue=default-cake
add cake-atm=ptm cake-diffserv=diffserv4 cake-memlimit=32.0MiB cake-nat=yes cake-overhead=30 cake-overhead-scheme=pppoe-ptm kind=cake name=default-cake

Line speed according to the Vigor130 is 78M/20M, so the more I can eke out towards this speed the better. Otherwise, my question is whether the correct overhead and mpu are set (PPPoE connection; VLAN 101, though this is currently set modem-side, with jumbo packets at the router end).

Thanks :-)
 
Rfulton
Frequent Visitor
Frequent Visitor
Posts: 74
Joined: Tue Aug 08, 2017 2:17 am

Re: some quick comments on configuring cake

Tue May 17, 2022 4:41 am

Hi!

Can somebody take a look at my CAKE config and tell me if there's anything I can do to get as close to line-speed as possible. Some background:

Zen VDSL2 connection (80/20) into Vigor 130 modem (VLAN 101); PPPoE connection established via RB4011 on ether1. Jumbo packets enabled (ether1 MTU 1508; PPPoE MTU 1500)

add bucket-size=0.001/0.001 max-limit=72M/18M name="Cake - Smaller Bucket" queue=default-cake/default-cake target=pppoe-out1 total-queue=default-cake
add cake-atm=ptm cake-diffserv=diffserv4 cake-memlimit=32.0MiB cake-nat=yes cake-overhead=30 cake-overhead-scheme=pppoe-ptm kind=cake name=default-cake

Line speed according to the Vigor130 is 78M/20M, so the more I can eke out towards this speed the better. Otherwise, my question is whether the correct overhead and mpu are set (PPPoE connection; VLAN 101, though this is currently set modem-side, with jumbo packets at the router end).

Thanks :-)
Cake is interface queue only.
 
Trunkz
just joined
Posts: 5
Joined: Mon Dec 02, 2019 5:44 pm

Re: some quick comments on configuring cake

Tue May 17, 2022 10:11 am

Cake is interface queue only.
Do you mean that the interface needs to be set to eth1 (as opposed to the pppoe-interface) or rather the upcoming change in ROS 7.3 that does not allow cake as a simple queue type?
 
arm920t
newbie
Posts: 37
Joined: Sat Aug 03, 2019 8:02 am

Re: some quick comments on configuring cake

Tue May 17, 2022 1:30 pm

Cake is interface queue only.
Do you mean that the interface needs to be set to eth1 (as opposed to the pppoe-interface) or rather the upcoming change in ROS 7.3 that does not allow cake as a simple queue type?
Mikrotik couldn't fix the bug so they cut the feature. Now cake in 7.3beta40 is useless.
 
Rfulton
Frequent Visitor
Frequent Visitor
Posts: 74
Joined: Tue Aug 08, 2017 2:17 am

Re: some quick comments on configuring cake

Wed May 18, 2022 2:02 pm

Looks like it's back to TP-Link
 
Lodion
just joined
Posts: 5
Joined: Fri Nov 30, 2018 7:05 am

Re: some quick comments on configuring cake

Thu May 19, 2022 8:03 am

Cake is some sort of voodoo magic! Uploads are faster, downloads are more consistent, latency is lower under load??

With no queueing:
Image

With CAKE:
Image
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

less latency under load

Sat May 21, 2022 4:35 pm

Nice result! And yes, under load on some technologies it's possible to get less latency, as remarkable as it is.

Powersave is often a problem. A device will go to sleep until there are more packets to transmit. This is a somewhat foolish behavior network-wise, in that - for example - a tcp syn then syn/ack packet outstanding needs all the boost it can get to get more packets in flight once the flow gets going.

One string of cable modems would sleep stupidly this way. Many of our devices will also buffer up small numbers of packets over a small interval and only release them after a ms or 4, to save on cpu context switches.

In other cases you can get inside the request/grant loop that some gpon and some cable have. The underlying hw makes a request for a slot ahead of time, based on an estimate from the previous cycle of what it will need in the next, thus overlapping requests. Cable has a 2-6ms request/grant cycle.

In wifi, the rate controller stabilizes the more you use it.

the fq-codel derived packet schedulers try hard to give the sparsest flows a boost towards the head of the queue, so they more rapidly can come into balance with the others.
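A toy sketch of that sparse-flow boost, for the curious. It mimics the new-flows/old-flows lists that fq_codel uses on top of deficit round robin, but leaves out CoDel, flow hashing, and the rule that parks an emptied new flow on the old list for a round. The class and flow names are made up for illustration; this is not the kernel code.

```python
from collections import deque


class SparseFlowScheduler:
    """Toy DRR scheduler with fq_codel-style new/old flow lists."""

    def __init__(self, quantum=1514):
        self.quantum = quantum
        self.queues = {}          # flow id -> deque of packet sizes
        self.deficit = {}         # flow id -> remaining byte credit
        self.new_flows = deque()  # flows that just became active
        self.old_flows = deque()  # flows that have used up their credit

    def enqueue(self, flow, size):
        queue = self.queues.setdefault(flow, deque())
        if flow not in self.new_flows and flow not in self.old_flows:
            # A flow with nothing scheduled counts as sparse: fresh credit,
            # and it joins the list that is always served first.
            self.deficit[flow] = self.quantum
            self.new_flows.append(flow)
        queue.append(size)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            active = self.new_flows if self.new_flows else self.old_flows
            flow = active[0]
            if self.deficit[flow] <= 0:
                # Credit exhausted: refill it and go to the back of old_flows.
                self.deficit[flow] += self.quantum
                active.popleft()
                self.old_flows.append(flow)
                continue
            if not self.queues[flow]:
                # Nothing left to send: stop scheduling this flow.
                active.popleft()
                continue
            size = self.queues[flow].popleft()
            self.deficit[flow] -= size
            return flow, size
        return None


sched = SparseFlowScheduler()
for _ in range(5):
    sched.enqueue("bulk-a", 1500)
    sched.enqueue("bulk-b", 1500)
for _ in range(4):
    sched.dequeue()            # both bulk flows burn through their credit
sched.enqueue("voip", 200)     # a sparse flow shows up late
print(sched.dequeue())         # served ahead of the six queued bulk packets
```

The point is that the late-arriving small flow does not wait behind the backlog of the bulk flows; it is served as soon as the bulk flows have used their current credit.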

Lastly, the reason your upload is so good and responsive is the low latency and shorter queues tcp is seeing in that direction. So few people test upload and download at the same time, and your first result is what so many see: an upload goes to hell with big queues.
 
dtaht
Member Candidate
Member Candidate
Topic Author
Posts: 136
Joined: Sat Aug 03, 2013 5:46 am

Re: some quick comments on configuring cake

Sat May 21, 2022 4:36 pm


Do you mean that the interface needs to be set to eth1 (as opposed to the pppoe-interface) or rather the upcoming change in ROS 7.3 that does not allow cake as a simple queue type?
Mikrotik couldn't fix the bug so they cut the feature. Now cake in 7.3beta40 is useless.
They did reach out to me, and Toke and I both replied, but they haven't got back to us.
 
chechito
Forum Guru
Forum Guru
Posts: 1985
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: some quick comments on configuring cake

Sat May 21, 2022 5:15 pm

Cake is some sort of voodoo magic! Uploads are faster, downloads are more consistent, latency is lower under load??

With no queueing:
Image

With CAKE:
Image

How can I run that kind of bandwidth/latency test and get result graphs like those?
 
jmszuch1
just joined
Posts: 10
Joined: Fri Oct 19, 2018 10:21 pm

Re: some quick comments on configuring cake

Sat May 21, 2022 5:51 pm

Cake is some sort of voodoo magic! Uploads are faster, downloads are more consistent, latency is lower under load??

With no queueing:
Image

With CAKE:
Image

How can I run that kind of bandwidth/latency test and get result graphs like those?
Check out Flent! If you're running Windows then you'll need to set up a Linux machine in order to use it. If it's a newer Windows computer then I'd consider setting up the Windows Subsystem for Linux to get you going quickly.
