So my office is going to revamp its networking equipment. We now have a 500/500 Mbps line and around 100 devices connected. We are currently using a hEX, which can deliver the full 500 Mbps without any queue. I tried enabling fq-codel and the throughput dropped to around 100 Mbps. Without any queue the bufferbloat is quite annoying, so I would like to upgrade and also leave some room for a bandwidth upgrade. Here are my needs:
I don’t really want to limit per-user speeds. I’d like users to have all the bandwidth and share it when others need it. Do you have any recommendations on which MikroTik device I should get that will reach its full throughput with a queue enabled? FYI, the rest of the system is UniFi based (switch and AP).
x86 might be an option. Running RouterOS on bare metal on a J4125, it takes nearly 80% CPU on a 1000M/50M link when downloading with qBittorrent. Packet rate is 89 kpps/80 kpps.
CPU usage is 44% with CAKE disabled. Packet rate is 89 kpps/54 kpps.
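To put those two measurements on a comparable footing, here is a rough sketch that normalizes CPU usage by total packet rate. It assumes the 80% figure was taken with CAKE enabled (inferred from the contrast with the second measurement) and that CPU cost scales with packets per second, which is only an approximation. The function name is made up for illustration.

```python
def cpu_per_kpps(cpu_percent, kpps_a, kpps_b):
    """CPU percent spent per thousand packets per second, both directions summed."""
    return cpu_percent / (kpps_a + kpps_b)

# Figures as reported in the post; the "with CAKE" label is an assumption.
with_cake = cpu_per_kpps(80, 89, 80)
without_cake = cpu_per_kpps(44, 89, 54)
ratio = with_cake / without_cake

print(f"with CAKE:    {with_cake:.2f} % CPU per kpps")
print(f"without CAKE: {without_cake:.2f} % CPU per kpps")
print(f"relative per-packet cost: {ratio:.2f}x")
```

By this crude measure, CAKE costs roughly one and a half times as much CPU per packet on that box, not just the raw 80% versus 44%.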
Do you only recommend x86? I prefer to use ready-made hardware from MikroTik, as that would take less effort for me to set up. Are you implying that even their arm64 offerings like the CCR2004-16G-2S+ might not do gigabit + cake?
The problem with official test results is that router config is highly optimized for those tests … which includes fasttrack. For queues to be effective, fasttrack needs to be disabled (at least for traffic subject to queues) which drops performance very considerably even before queue processing overhead kicks in.
Personally I’ve no experience with running huge throughputs through queues, so I’ve no idea what I’m talking about right now. However, I’d take the lowest number present in the official test results and reduce it to, say, one third. This would be my assessment of device performance when queues are enabled.
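As a back-of-the-envelope sketch of that rule of thumb: take the worst official figure and cut it to a third. The numbers below are placeholders, not real MikroTik test results, and the one-third factor is just the guess from the post above, not a measured value.

```python
def estimate_queued_throughput(official_results_mbps, derate=1 / 3):
    """Lowest official result scaled by a pessimistic derating factor."""
    if not official_results_mbps:
        raise ValueError("need at least one official test result")
    return min(official_results_mbps) * derate

# Hypothetical figures for illustration only; substitute the values from
# the product's official test results table.
hypothetical_results_mbps = [900.0, 1500.0, 2400.0]
estimate = estimate_queued_throughput(hypothetical_results_mbps)
print(f"rough estimate with queues enabled: {estimate:.0f} Mbps")
```

If the estimate lands below your line rate, the device is probably a risky choice for shaping at full speed.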
Hopefully somebody with real experience will come by and give us some better approximations of performance.
I have CCR2004s, RB4011s, and RB5009s handling 2Gbps of traffic, but without queues. All of them have quad-core 1+GHz CPUs. Each 1Gbps of traffic (without queues) uses about 10% of the CPU.
Supposedly Cake is less CPU intensive than fq-codel, but I haven’t enabled either one on customer-facing routers that serve more than one household.
For my home office (and border routers) I opted for the CCR2116. With 2-3Gbps of traffic (no queues) on the SFP+ ports, they were hitting 15-20% utilization. With L3HW Offloading, their load dropped to 5% (for the world-facing routers doing full BGP) and 0% (for the internal one). They’ve got a 16-core CPU with a 40Gbps connection to the ethernet switch. All ports can be switched or routed, in hardware or software.
I would imagine that you might easily max out one of the quad-core routers trying to do a full gig while shaping dozens of users simultaneously. But doing a speed test with one person to get to 1Gbps likely won’t be an issue.
The RB4011 and RB5009 are cheap enough that it wouldn’t hurt to get one in and just try it. No point in going with the CCR2004 unless you want in-office multi-gigabit connectivity (2-7Gbps bridged generally works well on that box). The 2116 is much more expensive, but would certainly get the job done (and then some).
Thank you guys, your input is much appreciated. I think I’ll go with the CCR2116, since there are many doubts about whether the quad-core options could do 1G + queue. Now let’s pray that I can convince the finance team to approve this purchase…
Cake co-author here. I feel a need to clarify a few things.
cake unshaped - running at the native line rate of the interface - should be able to do its job on most of the hardware discussed. So if your service is gbit, and that’s the line rate of the interface, you are in business on outbound.
However, if your ISP downlink is bloated, you might want to shape on inbound.
cake shaped, for example at 500Mbit, is 9x more cpu intensive than unshaped. I don’t have a lot of good data on cake’s performance on mikrotik hardware as yet (I look forward to yours). At gbit+ I primarily use it on x86, and depending on the x86 product it can go up to about 4gbit/sec.
Cake is not multicore in the shaped scenario, so no amount of cores can speed it up if you run out of oomph.
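One practical consequence, sketched below: if the router reports CPU load as an average across all cores (an assumption here, check how your device reports it), a single-threaded shaper can be saturated while the overall figure still looks low. The helper name is hypothetical.

```python
def busiest_core_bound(avg_cpu_percent, cores):
    """Worst-case load on one core if all of the averaged load sat on that core."""
    return min(100.0, float(avg_cpu_percent) * cores)

# On a quad-core box, an average of 25% can already mean one core is pegged.
print(busiest_core_bound(25, 4))
# On a 16-core box, the same thing can happen at an average of only 6.25%.
print(busiest_core_bound(6.25, 16))
```

So when testing a shaped cake queue, look at per-core load rather than the averaged number.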
OK, this is news to me. I thought cake was multicore. Is this also the case with fq-codel?
Do you have any recommendations for x86? Something like “the more ghz the better” or “intel’s 10th gen onward is good” or “anything with xxx instruction set should do”.
Looking at how the discussion is going, I think I’m now leaning towards the CCR2004, and doing cake on a separate x86 box.
Unshaped, what do we get? Improved queueing behavior?
I have a number of arm/arm64 routers running RouterOS 7 in production that I could experiment with. I also have 2116s and 2004s that could be tested without impacting customer traffic.
(I’m guessing on devices with L3HW offload, however, that most of that traffic would bypass the CPU and therefore any Cake queuing on the interface, line rate or not.)
@iqbalaydrus, what configuration did you try for fq_codel that produced 100 Mbps? If you are not using a virtual interface for WAN (like PPPoE), there is a way to use an interface queue while keeping FastTrack enabled. FastTrack is the key to this approach, as it saves a lot of CPU, leaving you resources for queuing. It works really well for me to eliminate bufferbloat.
Take a look at my old post http://forum.mikrotik.com/t/using-routeros-to-qos-your-network-2020-edition/66683/252 In my very simple tests I could achieve ~900Mbps with fq_codel on a hAP ac2. I would suggest you try this approach on hEX just to see how far you get. Granted, hEX is a slower device, but if you can get decent real-world numbers with FastTrack + interface queues, it will give you an idea about routers you can use for gigabit connection. I’m pretty sure RB5009 would do nicely, although I don’t own it, so can’t say for sure.
FYI I implemented this on my RB5009 with cake, cpu load is less than a third of what it was. Guessing 1gbit should be possible. - Spoke too soon, cake is unstable when used like that
Cake can’t be used inside of an HTB queue (Queue Trees). Instead, you set it up directly on the interface facing the devices you want to shape.
I tested this on my CCR2116. I put a 500M Cake shaper on both of the ethernet ports that connect to my WiFi equipment. It limited throughput to 460ish Mbps, and the latency stayed around 15ms.
I would likewise have put it on the SFP+ facing the Internet, but I have a 10Gbps connection (I am the Internet provider), so that wouldn’t do me any good. The bottleneck is the WiFi AP, not the router’s Internet connection.