We noticed a performance problem with CRS3xx switches. With this configuration we are unable to reach more than 350 Mbps per session from S3 to S4, or more than 150 Mbps from the laptop to S4.
Our 2 datacenters are linked by a 10 Gbps amplified 320 km fiber with a constant RTT of 2.9 ms.
All our switches are configured for jumbo frames (9000 / 10218).
The servers run ESXi and iperf3, with HP 530SFP+ adapters and Mellanox ConnectX-3 40 Gbps QSFP cards.
We have now set up a test lab in parallel with our production system to locate where the performance problem comes from.
As far as we can tell, it’s not the servers, not the amplified fiber, and not the negotiation (no FCS errors or dropped packets).
We started iperf3 on the ESXi servers with a new vmkernel on a dedicated trunk VLAN between the datacenters, and this is a summary of what we measured with iperf3:
S3 → S4 2.03 Gbps down 8.07 Gbps up 0 retries
S3 → S5 325 Mbps down 987 Mbps up 0 retries
S2 → S4 217 Mbps down 687 Mbps up 0 retries
Laptop → S4 130 Mbps down/up 0 retries
As soon as the link latency is at 3 ms, we drop to 325/350 Mbps through 2 switches and see a 2 Gbps / 8 Gbps differential on the directly connected switch.
The amplified fiber was first a Layer 2 service with the same RTT on the CRS317, with the same result.
Not sure if I am in a bad mood today, but when someone comes here and asks for help, and then puts a comment like the one above in their post, I am loath to even attempt to assist.
Thanks for your comment. I’m not a native English speaker, as you can tell.
If you mean the remark about brand, I’m asking whether it’s a good idea to run a test with another switch brand, to be sure whether the problem comes from the configuration or from the CRS cascade.
Are the data centers in production? How much traffic is normally on the link while you are testing?
The CRS326 most likely has 12 MB of packet buffer memory (based on other Marvell Prestera DX3236 board specs), while the CRS317’s packet buffer size is unknown. As it is primarily a 10 gig switch, I’d assume the CRS317 has more memory; hopefully MikroTik can comment on the specifications.
I would try the datacenter interconnect between both CRS317s and then between both CRS326s and see if that changes the performance numbers.
The link was on the CRS317 before, with another layer 2 provider and the same ping time due to the distance. We had the same result and blamed the provider for limiting individual sessions.
Then we ordered a second link, layer 1 this time, which was plugged into the CRS317: same results. That’s why we are now trying with the CRS326-2Q+.
The environment is in production, of course. S3, S4, the two CRS326-2Q+ and the new layer 1 fiber carry no production load and are available for testing.
The old link is still in production on both CRS317s, and the VMs have been migrated to S2, S5 and another server.
Most of the time we have at most 500 Mbps of peak traffic between the datacenters. During the tests it was around 100 Mbps.
We run the tests with the default window size, 64 KB. We will try this evening with a 128 KB window…
ping 192.168.70.15
PING 192.168.70.15 (192.168.70.15): 56 data bytes
64 bytes from 192.168.70.15: icmp_seq=0 ttl=64 time=3.095 ms
64 bytes from 192.168.70.15: icmp_seq=1 ttl=64 time=3.072 ms
64 bytes from 192.168.70.15: icmp_seq=2 ttl=64 time=3.085 ms
You can’t do much about it. Almost the whole RTT is physics (the speed of light inside optical fibre is slightly more than 200,000 km/s; with a round-trip length of 640 km in your case, that gives around 3 ms RTT), and the bandwidth-delay product is built into the TCP protocol.
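To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a propagation speed of 200,000 km/s and that a single TCP session can keep at most one window of data in flight per round trip; the function names are made up for illustration, and real stacks with window scaling and autotuning will behave differently.

```python
def propagation_rtt_ms(one_way_km, speed_km_per_s=200_000):
    """Round-trip propagation delay over fiber, in milliseconds."""
    return 2 * one_way_km / speed_km_per_s * 1000


def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP session: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


# 320 km of fiber between the datacenters (from the first post).
print(f"propagation RTT: {propagation_rtt_ms(320):.1f} ms")

# Ceiling per session for a few window sizes at a 3 ms RTT.
for window_kb in (64, 128, 256):
    rate = max_throughput_mbps(window_kb * 1024, 3.0)
    print(f"{window_kb} KB window at 3 ms RTT: {rate:.0f} Mbps")
```

With these assumptions a 64 KB window tops out at roughly 175 Mbps per session at 3 ms, whatever the link speed is.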
The only way I’ve ever been able to deal with this is by having a WAN optimization box between the DCs like Riverbed or WanOS. We had a DC migration between DCs with 50ms latency and had to use a Riverbed Steelhead to change the TCP window to avoid changing it on thousands of hosts due to the bandwidth delay product.
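For a sense of scale, the bandwidth-delay product can be sketched as below: it is the amount of data a single session must keep in flight to fill the link. The 10 Gbps figure for the 50 ms case is only an assumption for comparison (the link speed of that migration is not stated), and `bdp_bytes` is a hypothetical helper name.

```python
def bdp_bytes(link_bps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return link_bps * (rtt_ms / 1000) / 8


cases = (
    ("10 Gbps at 2.9 ms", 10e9, 2.9),   # the link described in this thread
    ("10 Gbps at 50 ms", 10e9, 50.0),   # assumed speed, RTT from the migration
)
for label, bps, rtt_ms in cases:
    window = bdp_bytes(bps, rtt_ms)
    print(f"{label}: about {window / 1e6:.2f} MB per session")
```

At 2.9 ms that is roughly 3.6 MB of window for one session to saturate 10 Gbps, far above a 64 KB default.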
I don’t believe it is possible to do this with RouterOS.
It’s routed, but the link has always been on the CRS317 and CRS326 attached to the 1072, so this window problem impacts the routed traffic through them. We created the cross-datacenter VLAN for the tests.
We will try connecting the link directly to the CCR1072 and see whether routed traffic is subject to the same performance problem or whether it disappears.
WanOS only has gigabit solutions, which seems too small…
We connected the fiber link directly to the 1072, with a server directly attached on each side: same result, around 619 Mbps.