Hello,
I’ve read a lot of threads here and on Reddit about RB5009 2.5GbE port issues, but most of them were resolved around ROS 7.9, and I haven’t found anyone with a setup similar to mine.
I have a dumb 2.5GbE switch connected to the RB5009’s 2.5GbE port. Besides the RB5009, a NAS and a PC are connected to this switch, which lets me transfer files between the PC and NAS quickly and also utilize the 2Gbps downlink from my ISP, delivered via an SFP module. That portion of the network works flawlessly: I get full internet speed on both the NAS and the PC, and file transfers between them run at >1Gbps (the HDDs in the NAS are the limiting factor).
The problem starts with two Xiaomi AX3600s, which are flashed with OpenWrt and wired directly to the RB5009’s 1GbE ports. And this one is really strange. If I connect anything to an AX3600 with an Ethernet cable, there are no problems: I see 1Gbps throughput from both the NAS and the internet downlink. But as soon as I switch to WiFi, that is no longer the case. I can still run a speedtest at 1Gbps (the AX3600s are capable of 1.3-1.7Gbps wireless, verified by copying from two sources to overcome the 1GbE link limit), but transfers from the NAS are limited to ~400Mbps. As soon as I remove the 2.5GbE advertisement on the RB5009’s eth1 port, transfer speed goes up to 1Gbps, which can be observed in this screen recording: https://youtu.be/5qoR7-OQsM0
The only clue I have is that 400Mbps is the value I get when running iperf3 between my wirelessly connected laptop and the NAS with a single stream. With 1GbE enforced, increasing the stream count gradually raises throughput to 1Gbps, but with 2.5GbE enabled it does close to nothing for transfer speed. Note that I have SMB Multichannel enabled on my NAS, which also uses multiple streams to copy files. It’s as if enabling 2.5GbE on eth1 somehow breaks multi-stream transfers, and that would explain why only WiFi is affected: over cable I can saturate the 1GbE / 2.5GbE links with just one stream.
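For reference, the single-stream vs. multi-stream comparison described above can be reproduced with iperf3 roughly like this (the address 192.168.88.10 is only a placeholder for the NAS; substitute your own):

```
# On the NAS: start an iperf3 server
iperf3 -s

# On the wireless laptop: single TCP stream for 30 seconds
iperf3 -c 192.168.88.10 -t 30

# Same test with 4 parallel streams (-P 4) to check whether aggregate
# throughput scales past the single-stream ~400Mbps ceiling
iperf3 -c 192.168.88.10 -t 30 -P 4
```

If the parallel run scales to line rate only with the port forced to 1GbE, that matches the behavior in the recording.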
To me it looks like I’ve discovered a bug in ROS, but I cannot figure out how to troubleshoot it further. I tried two 2.5GbE switches based on different platforms (Realtek and Broadcom): a TP-Link TL-SH1005 and a QNAP QSW-1105T-5T. I have an RB5009UPR+S+IN running the latest stable ROS, 7.11.2. I’ve also attached a simple diagram to help visualize my network setup. If any configuration files are needed, I will happily provide them.
I have observed similar issues with Unifi U6 Pros, which have 1GbE uplinks. My NAS is wired to a CRS326 via a 10G DAC, and the Unifi APs are connected to the CRS326 via CAT6 cables. Any wired 1GbE device in the LAN gets 1Gbps, but wireless devices behind a Unifi U6 Pro only get ~400Mbps download from my 10G NAS. Running iperf directly on the U6 Pro via SSH, everything was fine, so I guess there is something wrong with the Qualcomm IPQ5018 WLAN.
There’s always a potential problem when traffic crosses between segments running at different speeds, in particular when it passes from the faster towards the slower side. The device that spans the two segments has to buffer data, and different devices do this with varying success; MT seems to be among the worst, and device type does not appear to be the biggest factor. A working flow control implementation should help a lot, but the MT default seems to be “off” (check /interface/ethernet; the rx-flow-control and tx-flow-control settings are per port). It needs to be enabled on both link partners, and in particular on the faster side.
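As an example, flow control can be inspected and enabled per port in the RouterOS CLI like this (ether1 here is only a placeholder for the 2.5GbE port name):

```
# Show the current flow control settings for the port
/interface/ethernet print detail where name=ether1

# Enable RX and TX flow control; the link partner (switch or NAS)
# must also have flow control enabled for pause frames to take effect
/interface/ethernet set ether1 rx-flow-control=on tx-flow-control=on
```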
Buffering towards slower interfaces only helps for short bursts. For constant streams like iperf or large file transfers, even large buffers only can help for a short period. If there is more data arriving than possible to output on the egress port, buffers will overflow.
Only flow control can help, as it allows to build back pressure towards the sending device which will stop sending data until there is space in the buffer again.
The question is why MT disables flow control by default, especially on devices like the RB5009 that have 1, 2.5 and 10G ports on the same switch chip. It does not hurt if it is activated while not required, but it does hurt the other way round, as you have experienced.
I can’t talk for Mikrotik … but I seem to remember forum threads where users were complaining about subtle problems which went away after disabling flow control. Draw your own conclusions …
Hello,
Thank you for your replies. I opened a support ticket a few days after posting here, but after a first response, which didn’t help, support went silent - hence my bump here. Unfortunately, enabling flow control on every port of my RB5009 only made it worse: now switching back to 1Gbps mode no longer brings the transfer speed back. I think I will have to live with a 1Gbps downlink until support for the RTL8156 is added; maybe the issue will go away once I can use a USB NIC connected directly to the RB5009.
Flow control for backpressure is not recommended these days; it’s better to let packets drop so that the upper-layer protocols know to slow down. It’s especially bad when you have an uplink port transmitting to an end device: backpressure from the end device will cause pause frames to be sent to the uplink port, pausing transmissions to every other device!
Maybe I’m missing something, but flow control does not change the fact that packets are dropped; it just changes where they are dropped. Without flow control, packets are dropped at the receiving end because the RX buffer overruns. With flow control, in the case of backpressure, packets are dropped at the sending device because the TX buffer overruns.
A local TX buffer overrun is easier for the sending device to detect than packets silently dropped at the receiving end.
A TCP stack can reduce its window size on a TX buffer overrun of the underlying Ethernet adapter without having to wait for missing ACKs caused by packets being dropped at the receiving end.
I confirmed that the issue occurs even without the switch. With the NAS connected directly to the RB5009, download speeds on the laptop are halved as soon as I enable the 2.5Gbit link. And the problem is visible only in one direction: when I upload files from the laptop to the NAS, I get full gigabit speeds.
The lack of any response from MikroTik support is disappointing. I reported this issue half a year ago; they said they had reproduced it, but have been silent since. If it’s a hardware issue, then just admit it. Otherwise it looks like devices are being sold without any customer service. I have received better support for some cheap Chinese products than for the RB5009…
The trouble is in hardware: the Marvell 88E6393X is a weak chip with only a 2Mbit packet buffer. If you do the math, you need at least 8Mbit. The CRS305-1G-4S+IN uses the 98DX3236 with a 24Mbit buffer.
MikroTik could try to emulate a larger buffer in software, but that is a bad way to do it.
When traffic crosses between ports running at different speeds, the switch chip does not have large enough buffers to smooth out the flow, since one port is faster than the other - hence the problems you are facing. This is a common issue with most switch chips in general.
There are some things you can do to alleviate the issue:
Enable flow control on both devices. This works, but it can cause problems for low-latency applications, because when triggered it pauses ALL traffic on the port.
Disable hardware offloading for the high-speed port, essentially forcing packets involving that port through the CPU. This is fine if you have FastPath enabled; otherwise your CPU will get saturated really fast. In this case it is also recommended to set the multi-queue-ethernet-default queue type on all interfaces, even those that remain offloaded, as any communication with the high-speed port can cause TX drops on that port, and this interface queue type helps avoid them. This method is the most preferable, provided the CPU allows it.
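As a sketch, the second option looks roughly like this in the RouterOS CLI (ether1 stands in for the 2.5GbE port; adjust names to your own setup):

```
# Disable L2 hardware offloading on the bridge port for the 2.5GbE
# interface, forcing its traffic through the CPU:
/interface/bridge/port set [find interface=ether1] hw=no

# Set the multi-queue-ethernet-default queue type on all interfaces
# to reduce TX drops when they exchange traffic with the fast port:
/queue/interface set [find] queue=multi-queue-ethernet-default

# Verify the changes:
/interface/bridge/port print
/queue/interface print
```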
Wow! Thank you @CoMMyz! Disabling HW offload for this one port completely resolved the issue. I can saturate my 2Gbit connection from the ISP while maintaining a ~1Gbit transfer rate over wireless!
The attached screenshot might be confusing because it shows two Command Prompt windows from different machines (thanks to RDP). The one on top skips the RB5009 altogether, since that is traffic between the PC and NAS connected via the switch, but as you can see my internet connection remains at 2Gbit and - most importantly - my wireless still manages to push almost 1Gbit.
I can’t tell any difference in CPU usage on the RB5009. Both before and after disabling HW offload it sits at ~30% when transferring between WAN and LAN at 2Gbit.
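For anyone who wants to compare, CPU load on the RB5009 can be watched during a transfer from the RouterOS CLI:

```
# Overall CPU load and free memory, updated live:
/system/resource/monitor

# Per-core breakdown, useful to spot a single saturated core:
/system/resource/cpu print
```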
Thank you, I struggled with this for almost a year and MikroTik support was not able to provide a solution.
That’s because the vast majority of CPU resources are used for firewalling, some for routing, and only a minor portion for interface handling (bridging).
But I wonder whether your case falls into the same category as the others in this thread. The others are using the RB5009 as a switch between its 2.5GbE and 1GbE ports, where both ports are members of the same bridge and HW offload can kick in. If the 2.5GbE port is used as WAN, it is usually off-bridge and L2 HW offload doesn’t apply anyway.