Hello everyone!
I’m facing an issue with TCP throughput between two CHRs.
I have 2 CHR with RouterOS 7.1beta6: one in Russia and one in Kazakhstan, both have a gigabit Internet connection.
Both routers are connected via a WireGuard VPN tunnel. To measure speed I use the Bandwidth Test tool in RouterOS.
If I use the UDP test, the results are pretty awesome: ~900 Mbit/s for both receive and send.
But if I use TCP test:
RU->KZ ~900Mbit
RU<-KZ ~30Mbit
I’ve done a lot of tests, but cannot figure out where the issue is.
The ISPs are fine and L2 has no problems; without the MikroTik CHRs I can get the full speed with no issues.
One side effect: I can get up to 140 Mbit/s in the TCP test if I use 60+ connections.
Could someone please give me any ideas on how to find the source of my problem?
Thank you.
Hi there,
Can you let me know how to test this properly?
I have the same setup here in Canada but still on beta5, between two BELL fiber connections about 5 miles apart.
I run Ookla speed tests from a client PC to the server’s ISP connection via browser and consistently get around 75 Mbps up and 75 Mbps down.
That’s clearly well below the typical Ookla results from our PC straight out the standard ISP connection, about 900 up and 900 down, which is also the case for the other party’s standard speed tests (without WireGuard, using their own ISP connection).
Here I was thinking 70 Mbps up or down was pretty good for a VPN connection?
So how should I test to see if I can replicate your numbers?
Key point: Known Limitations
WireGuard is a protocol that, like all protocols, makes necessary trade-offs. This page summarizes known limitations due to these trade-offs.
TCP Mode
WireGuard explicitly does not support tunneling over TCP, due to the classically terrible network performance of tunneling TCP-over-TCP. Rather, transforming WireGuard’s UDP packets into TCP is the job of an upper layer of obfuscation (see previous point), and can be accomplished by projects like udptunnel and udp2raw.
Just guessing… what happens if you swap the roles of the routers in the bandwidth test? Is it always the server->client direction (or always the client->server one) that is slow, or is it always the KZ->RU one?
The manual says you should not run the bandwidth test on the router whose throughput you are testing, so what does /tool profile say while you’re running the TCP bandwidth test?
Better results with more connections suggest some round-trip delay issue, but if that were the case, both directions should be affected, and they are not.
I tried from another router as well. Same problem, same direction issue.
My guess is that if UDP can fully saturate my WAN link, TCP should be able to do almost the same.
My WAN link is a 1 Gigabit connection on both sides, certified data centers, ~60 ms delay between them.
Both CHRs have 2 cores, 512 MB RAM, and a high-priority pool on ESXi; this doesn’t seem to be the issue, since at least one direction works well.
All configurations are pretty default, MTU is the same, no Fasttrack.
There are Tx drops and Tx errors on the WG interfaces, but only on Tx on both routers, not on Rx.
Still an open question: why is only one direction laggy? Some sort of misconfiguration, maybe.
P.S. Submitted support ticket.
I got some updates.
Today there was a massive blackout in our DC and our hypervisor was rebooted.
I still have the same problem, but now in the opposite direction!
RU->KZ ~6Mbit
RU<-KZ ~860Mbit
Just a general observation: TCP doesn’t work great with large delays out of the box. That’s where the TCP window comes into play: with 40 ms delay, the TCP window size should be larger than around 5 MB to reach 1 Gbps throughput. The default window size of 64 kB is only enough for around 13 Mbps at that kind of RTT. Most TCP stacks can indeed scale the window size (until it reaches the maximum window size, or until a retransmission occurs and the window shrinks back to some small value, possibly the default), but that only allows higher speeds in the medium to long term. A lossy link hurts TCP window scaling very much, and if you experience any kind of packet loss, you won’t be able to reach full wire speed using WG over TCP. Beware that since TCP is essentially bi-directional (at least the ACKs travel in the opposite direction), saturating the link in either direction increases RTT and will thus throttle throughput down (which is intended behaviour) …
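The window-size numbers above follow from the bandwidth-delay product; a quick sketch of the arithmetic (the 40 ms RTT and 64 kB default window are the illustrative figures from the paragraph, not measurements from this link):

```python
# Bandwidth-delay product math behind the window-size figures above.

def required_window_bytes(bandwidth_bps, rtt_s):
    """Minimum TCP window needed to keep the pipe full: BDP = bandwidth * RTT."""
    return bandwidth_bps * rtt_s / 8

def max_throughput_bps(window_bytes, rtt_s):
    """Ceiling imposed by a fixed window: at most one window per round trip."""
    return window_bytes * 8 / rtt_s

# 1 Gbit/s at 40 ms RTT needs a ~5 MB window:
print(required_window_bytes(1e9, 0.040) / 1e6)      # → 5.0  (MB)

# A default 64 kB window at 40 ms RTT caps out around 13 Mbit/s:
print(max_throughput_bps(64 * 1024, 0.040) / 1e6)   # → 13.1072  (Mbit/s)
```

At the ~60 ms RTT mentioned earlier in the thread the required window is correspondingly larger (~7.5 MB for 1 Gbit/s).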
With WG over TCP, the window size on the clients is only of secondary importance; the bottleneck can be the window size on the routers terminating the WG connection. I’m not sure how this can be changed for WG on ROS …
I use standard ROS WireGuard with UDP, 1400 MTU.
My general expectation is that if I test two different L4 protocols (UDP and TCP) inside the same tunnel, the results should be almost the same (apart from TCP overhead).
Also, I see a lot of Tx drops on my WG interface (3 493 517) over the last 24 h.
The TCP window settings on both client and server still apply even if WG runs over UDP (which explains why UDP tests can saturate the link). I’m not sure how that’s affected by WG properties. But those Tx drops indicate that the WG link is not perfect, and that will definitely affect the performance of TCP connections through the WG tunnel.
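To get a feel for how strongly loss throttles a single TCP flow, the classic Mathis et al. approximation (throughput ≈ MSS/RTT · C/√p, C = √(3/2)) can be sketched; the MSS, RTT and loss rate below are plausible guesses for this link, not measured values:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation for steady-state TCP throughput on a
    lossy path: rate ≈ (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2)."""
    C = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(loss_rate))

# Hypothetical numbers: ~1360-byte MSS inside a 1400-byte-MTU tunnel,
# 60 ms RTT, and 0.1% packet loss (which the Tx drops could plausibly cause):
print(mathis_throughput_bps(1360, 0.060, 0.001) / 1e6)  # roughly 7 Mbit/s
```

Even a fraction of a percent of loss at this RTT caps a single flow in the low tens of Mbit/s, which would be consistent with the 30 Mbit/s figure and with multiple parallel connections helping.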
What happens if you limit throughput to slightly less than the link speed on the Tx side of the testing suite?