Traffic from 10Gbit machines to other 10Gbit machines is fine - gets ~9Gbit.
Traffic from 1Gbit machines to other 1Gbit machines is fine - gets ~950mbit.
Traffic from 1Gbit machines to 10Gbit machines is fine - gets ~950mbit.
But…
Traffic from 10Gbit machines to 1Gbit machines has a very high retry count and sits at about 150mbit.
Currently I’ve got all the 1Gbit machines on a separate subnet, with the CRS317 routing between them and the 10G machines - and this is faster, at ~950mbit.
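For reference, the routed workaround looks roughly like this in RouterOS (the 1G subnet 172.16.2.0/24, the bridge names and the port names below are just examples, not my exact config):

# 10G hosts stay on one bridge/subnet, 1G hosts go on their own,
# so the CRS317 routes 10G <-> 1G instead of L2-switching it on a single bridge
/interface bridge add name=bridge-10g
/interface bridge add name=bridge-1g
/interface bridge port add bridge=bridge-10g interface=sfp-sfpplus1
/interface bridge port add bridge=bridge-1g interface=sfp-sfpplus2
/ip address add address=172.16.1.1/24 interface=bridge-10g
/ip address add address=172.16.2.1/24 interface=bridge-1g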
This happens on both RouterOS and SwOS - I’ve tried the latest RouterOS (6.41rc66) and 6.40.5, and both have the issue. SwOS shows it too - I can’t get the 2.3p it ships with to work at all, but 2.6 has the same problem. I’ve also tried with the 1G side on an SFP-RJ45 converter, on 1000BaseLX, and on the front-panel gigabit port. All still do the same thing.
I’ve looked around and I see a similar thread here with issues on the CSS326.
In the changelogs I also see:
*) crs326 - fixed packet processing speed on switch chip if individual port link speed differs;
*) crs326 - improved transmit performance from SFP+ to Ethernet ports;
I’m wondering if the CRS317 has a related issue, because that sounds like what I see…
An iperf3 example is below - 172.16.1.20 is a 10Gbit server, 172.16.1.19 is a 1Gbit machine.
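(If anyone wants to reproduce it, the test is just plain iperf3 - roughly along these lines, with the 10Gbit box as the server; exact flags from memory:)

iperf3 -s                        # on 172.16.1.20, the 10Gbit server
iperf3 -c 172.16.1.20 -t 30      # run from 172.16.1.19: 1Gbit -> 10Gbit direction, ~950mbit
iperf3 -c 172.16.1.20 -t 30 -R   # reverse mode, 10Gbit -> 1Gbit: high retransmit count, ~150mbit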
RouterOS 6.41rc66 and the upcoming SwOS 2.7 do not fix it on my CRS326/CSS326.
You can downgrade to SwOS 2.3 or 2.4. This will restore performance but might put your router into an endless reboot loop if SFP+ is inserted at boot.
The CRS317 gives slow download speeds to my customers on 1G SFP, so I had no choice but to run a 10G trunk into a CRS212 (10S 1S+).
The CRS212 needs a reboot every few weeks, and the CRS317 does not give good download speeds on 1G SFP (download speed is very slow for the customers).
So basically I have 9 customers on 1 Gig fiber and no reliable way of serving them: the CRS212 always needs reboots (I’ve replaced it 3 times, same issue), and the CRS317 has an issue with traffic going from SFP+ to 1G SFP (slow).
If this doesn’t get fixed, I’m going to have to start looking for alternative solutions other than MikroTik.
Hi, I have 2 S+RJ10 modules on a CSS326, connected to a NAS and a workstation. These 2 modules seem extremely unstable: they disconnect all the time, on average once a day (I found it in my workstation log), and sometimes the NIC says it’s connected but no data is sent/received. The cable is CAT6A, and I have put a fan beside the 2 modules inside the switch.
I own a CRS326 and 2 S+RJ10 10GBase-T modules; this switch suffers from the same 10G → 1G issue. I tried your suggested approach (roughly as sketched below):
enable IP filtering on the bridge
create a mangle rule to mark 10G → 1G packets
create a RED/PCQ/… queue with a max limit of 1000M
It speeds up the data transfer from 200M to 300M, but CPU usage hits 80%.
Then I noticed that just enabling IP filtering and disabling hardware offload on the ports achieves the same result.
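For reference, this is roughly what I ended up with in RouterOS (the addresses, interface name and queue type below are just examples from my setup - adjust to yours):

# 1) push bridged traffic through the IP firewall (this alone takes it off the fast path)
/interface bridge settings set use-ip-firewall=yes
# 2) mark packets going from the 10G host towards the 1G host (example addresses)
/ip firewall mangle add chain=forward src-address=192.168.88.10 dst-address=192.168.88.20 action=mark-packet new-packet-mark=10g-to-1g passthrough=no
# 3) queue the marked packets at roughly the 1G line rate (RED here, PCQ also works)
/queue type add name=red-1g kind=red
/queue tree add name=limit-10g-to-1g parent=global packet-mark=10g-to-1g max-limit=1000M queue=red-1g
# the simpler variant that gave me the same speeds: keep use-ip-firewall=yes
# and just disable hardware offload on the relevant bridge port (ether1 is an example)
/interface bridge port set [find interface=ether1] hw=no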
I want to inform you that we are aware of the 10G to 1G performance problem on CRS3xx devices and are currently working on new software fixes for better buffer allocation.
What is the switch buffer size? It might be good enough for LAN use, but still too small for really bursty traffic from the Internet. The buffer should be sized approximately to hold the amount of data corresponding to the average RTT (round-trip time), and in a switch this is limited by hardware (high-speed RAM inside the switch chip, not the much larger RAM available to RouterOS). Some switches specify the size (TP-Link T1700G-28TQ packet buffer memory: 1.5 MB) and it’s fairly small (12 ms worth of data at 1 Gbps - small compared to the average RTT over the Internet), so it also matters how it is allocated dynamically between ports.
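To put rough numbers on that (taking ~100 ms as an illustrative average Internet RTT):

1 Gbps ≈ 125 MB/s
1.5 MB / 125 MB/s ≈ 12 ms of line-rate data
100 ms × 125 MB/s ≈ 12.5 MB

so a shared 1.5 MB packet buffer is roughly an order of magnitude below what the RTT rule of thumb would suggest for even a single 1 Gbps port.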
The buffer size (RTT times speed) you’re mentioning is too large (and is usually referred to as bufferbloat). The buffer needed in a switch is typically smaller, a few frames (jumbo if needed) per port. If there’s congestion on a single port, flow control needs to kick in.
The buffer size you’re mentioning is the typical TCP window size… which is allocated and managed by the IP stack on the TCP connection endpoints.
The buffer needs to cope with bursty traffic from not one, but many simultaneous TCP connections - I know this from my own experience as a small local ISP (a few hundred customers), who had trouble with different upstream ISPs on three separate occasions over a few years. Buffer sizes were left at factory defaults in their switch (1Gb port -> 100Mb port) or licensed radio links (1Gb port -> 150Mb radio, 1Gb port -> 300Mb radio), and a larger size (once the issue was discovered and I convinced them to make the change) helped a lot. Otherwise there was packet loss (and my customers complaining about poor speedtest results, on a new, larger and supposedly better upstream connection) even when the link capacity was only half-utilized. For example, the NEC iPasolink 200/400 has a default queue size of 64 KB; the maximum is 1 MB. My complaints to the upstreams were initially rejected because they couldn’t see the problem when testing with their network tester device (which generated packets at regular intervals, not as bursty as real traffic from the Internet).
Flow control has its own issues: the other device needs enough buffer for it to work, since you effectively move the queue there, and it can bring down a large part of a network if done wrong (a bad device continuously flooding a port with pause frames). My point is that the buffer size should be specified and tunable (NEC got it right in their radio, but the operator must also read the docs to make use of this setting). Not every port needs so much, but for real traffic from the Internet it’s a must, as it’s really bursty. While I could shape upload traffic at my end to reduce bursts, I had no such control over the other (upstream ISP’s) end for download traffic.