SSTP MikroTik-to-MikroTik performance much lower than Windows-to-MikroTik

I have four MikroTik routers, all running RouterOS 7.23; three of them each individually maintain a pegged-up SSTP VPN to the fourth. Windows clients may sit behind any of the four, and reach devices across the three pegged-up VPNs. Windows clients may also connect directly to the SSTP VPN server on any of the four MikroTik routers.

[at location 4] Windows iperf client -> WiFi5 -> MikroTik hAP ax S (so, more RAM, bigger faster CPU cores) -> pegged-up SSTP VPN client tunnel -> Gb Ethernet -> Internet router -> Internet -> [at location 1] -> Internet router -> Gb Ethernet -> MikroTik E50UG (so, also, not very small) SSTP VPN server tunnel -> Gb Ethernet -> Windows iperf server:

Performance, max about 3Mb/s

Same thing, except now run the Windows SSTP VPN client on the Windows computer at location 4, connecting to the same SSTP VPN server on the location 1 MikroTik router -- everything else is the same:

Performance, max about 10Mb/s

(The location 4 Internet connection is a 100Mb/s down, 40Mb/s up, DSL connection, which reliably delivers those speeds. The location 1 Internet connection is a 1Gb symmetric fibre connection).

I've read that SSTP is a very inefficient VPN, but it doesn't seem to be as simple as that.

I had read about SSTP and MTU, and I noticed that the Windows SSTP VPN client MTU is 1400, and the MikroTik-to-MikroTik MTU was 1500, so I reduced the MikroTik-to-MikroTik MTU to 1400 - it did not make any difference. (Thinking about it, that may have increased the need to fragment packets coming from the Windows client before they could go through the MikroTik-to-MikroTik SSTP VPN). I restored the MikroTik-to-MikroTik SSTP connection to its default 1500 MTU.

I used ping -f -l {packetsize} on the Windows client to find the largest packet that crosses the path without fragmentation; 1460 did not require fragmentation. I set the Windows Wi-Fi interface MTU to 1400.
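
For anyone repeating this probe: the size given to ping -l is the ICMP payload, not the packet size, so for IPv4 without options the MTU being tested is the payload plus 28 bytes (20 IP + 8 ICMP). A minimal sketch of the search, assuming a hypothetical probe(size) callable that returns True when a don't-fragment ping of that payload size gets a reply. The fake_probe below is a stand-in for illustration and simulates a path MTU of 1464; it is not a real measurement.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def largest_unfragmented_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    probe is a hypothetical callable wrapping a don't-fragment ping.
    Assumes that if a size passes, every smaller size passes too.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid fits, look for something larger
        else:
            high = mid - 1   # mid is too big
    return low


def path_mtu_from_payload(payload):
    """Convert an ICMP payload size into the IPv4 packet size it produces."""
    return payload + IPV4_HEADER + ICMP_HEADER


def fake_probe(size):
    # Stand-in for a real ping: pretends the path MTU is 1464.
    return path_mtu_from_payload(size) <= 1464


best = largest_unfragmented_payload(fake_probe)
print("largest payload:", best, "-> path MTU:", path_mtu_from_payload(best))
```

With a real probe, the printed MTU is what the tunnel and MSS calculations should start from.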

Speed did not increase.

So, it doesn't seem to be fragmentation, either.

I'm trying to think of what's different, between letting the location 4 MikroTik route across the pegged-up VPN, vs the Windows client connecting directly to the location 1 MikroTik SSTP VPN server:

On the client side MikroTik, withOUT connecting the Windows SSTP VPN client directly to the server-end MikroTik SSTP VPN, packets go through an additional routing step, and an additional NAT step. But on an hAP ax S, the CPU should have no trouble at all doing that at 10Mb/s. So I don't suppose that's what's causing it.

I'm left scratching my head. What would be causing this much lower SSTP VPN performance when going over the pegged-up MikroTik-to-MikroTik SSTP VPN vs the Windows client to remote MikroTik server SSTP VPN?

thanks.

Wow. I started setting up WireGuard VPNs as possible replacements for the SSTP VPNs, and initial performance tests demonstrate that SSTP is really, really low performance. I know, I know, that's commonly known, and often repeated. I'm just .. wow. The degree to which SSTP is slow blows my mind.

PC -> MikroTik -> SSTP -> MikroTik -> PC, iperf, ~3Mb to 10Mb/s

PC -> MikroTik -> WireGuard -> MikroTik -> PC, iperf, >95% of the end-to-end path maximum speed...

So, yeah, the answer is, just don't use SSTP.

Okay, but, still: WireGuard Windows client connecting to WireGuard MikroTik server, bandwidth is approximately equal to the network link (in this case, 100Mb/s /40Mb/s ADSL, iperf between Windows hosts on either side of the VPN gets 90+Mb/s in the one direction, over 30Mb/s in the other).

However, with MikroTik WireGuard to MikroTik WireGuard (over the same connection), the same iperf pair still gets over 90Mb/s in the one direction, but only about 10Mb/s in the other direction.

Both MikroTiks are newer models, with dual-core 950MHz CPUs; monitoring resource utilization on the two MikroTiks during the iperf tests shows CPU utilization never above 25%. I also checked - no NAT occurs in either test path, nor mangling (which in any case, if it was the cause for the slower performance, should have been reflected in CPU utilization).

It's the same physical network path, all of which is far higher bandwidth than the ADSL Internet connection at one end which is the limit for this test, so it shouldn't be that Windows to MikroTik1, through WireGuard to MikroTik2 travels a slower physical path than Windows to MikroTik2 through WireGuard (still via MikroTik1).

The two Internet routers (one at the 100/40 ADSL end, and the other at the 1Gb/1Gb fibre end) should not see any difference between the two test paths, as from the Internet routers' perspectives, both test paths are "lots of UDP packets on the same ports".

I thought it might have to do with packet fragmentation, because the WireGuard interfaces have 1420 MTUs, whereas the Wi-Fi interface by which the Windows client at the ADSL end of the connection talks with its MikroTik (for the MikroTik-to-MikroTik WireGuard VPN) has an MTU of 1500, but setting that Wi-Fi interface to MTU 1300 made no performance difference.

I note that iperf uses TCP by default; choosing UDP drops the bandwidth to about 1Mb/s, even with the MTU set to 1300 for the test.

Here are sample outputs, first Windows -> Wi-Fi -> MikroTik (into WireGuard tunnel) -> ADSL -> Internet -> fibre -> MikroTik (out of WireGuard tunnel) -> Windows, second Windows (into WireGuard tunnel) -> Wi-Fi -> MikroTik -> ADSL -> Internet -> fibre -> MikroTik (out of WireGuard tunnel) -> Windows

c:>iperf -c 192.168.255.37 -e -r

Server listening on TCP port 5001 with pid 175
Read buffer size: 1.44KByte
TCP window size: 64.0 KByte (default)

Client connecting to 192.168.255.37, TCP port 5001 with pid 175
Write buffer size: 128 KByte
TCP window size: 64.0 KByte (default)

[ 4] local 192.168.251.139 port 53848 connected with 192.168.255.37 port 5001 (ct=52.01 ms)
[ ID] Interval Transfer Bandwidth Write/Err
[ 4] 0.00-10.02 sec 11.0 MBytes 9.20 Mbits/sec 88/0
[ 4] local 192.168.251.139 port 5001 connected with 192.168.255.37 port 62728
[ 4] 0.00-10.12 sec 116 MBytes 96.4 Mbits/sec 87915 182:336:699:1518:2692:3636:848:78004

c:>iperf -c 192.168.255.37 -e -r

Server listening on TCP port 5001 with pid 1011
Read buffer size: 1.44KByte
TCP window size: 64.0 KByte (default)

Client connecting to 192.168.255.37, TCP port 5001 with pid 1011
Write buffer size: 128 KByte
TCP window size: 64.0 KByte (default)

[ 4] local 192.168.255.151 port 54751 connected with 192.168.255.37 port 5001 (ct=53.68 ms)
[ ID] Interval Transfer Bandwidth Write/Err
[ 4] 0.00-10.00 sec 38.4 MBytes 32.2 Mbits/sec 307/0
[ 4] local 192.168.255.151 port 5001 connected with 192.168.255.37 port 62734
[ 4] 0.00-10.12 sec 114 MBytes 94.8 Mbits/sec 86283 249:314:522:1091:2409:4146:1211:76341

Any ideas on why having a MikroTik at both ends of the WireGuard tunnel limits performance in one direction by a factor of 3x or more?
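
One sanity check the outputs above allow, offered as a rough sketch rather than a diagnosis: a TCP sender can have at most one window of data in flight per round trip, so a fixed window caps throughput at window / RTT. The numbers below are taken from the iperf output (64.0 KByte window, ct=52.01 ms). Treating the connect time as the round-trip time is an assumption, and so is treating the reported default window as fixed, since Windows may auto-tune it.

```python
def window_limited_mbps(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


window = 64 * 1024   # "TCP window size: 64.0 KByte (default)" from the output
rtt = 52.01e-3       # "ct=52.01 ms", used here as an approximation of RTT

print(f"ceiling: {window_limited_mbps(window, rtt):.1f} Mbit/s")
```

The result lands close to the 9.20 Mbit/s measured in the slow direction. That does not settle the question by itself, because the test with the Windows WireGuard endpoint reports the same default window and still reaches 32.2 Mbit/s, but it suggests that trying a larger window with iperf's -w option would be a cheap experiment.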

thanks,

Make sure that when you do site-to-site WG tunneling using two routers between your two Windows clients, you have also put mangle rules on both routers to clamp the TCP MSS values of the connections transiting through the tunnel.

On both routers, add these mangle rules. Assuming that the MTU of the WireGuard interface is set to the standard value of 1420 and that the WireGuard interface is named wg1:

/ip firewall mangle
add action=change-mss chain=forward comment="reduce MSS for WG" \
    new-mss=1380 out-interface=wg1 protocol=tcp tcp-flags=syn tcp-mss=1381-65535
add action=change-mss chain=forward comment="reduce MSS for WG" \
    new-mss=1380 in-interface=wg1 protocol=tcp tcp-flags=syn tcp-mss=1381-65535

/ipv6 firewall mangle
add action=change-mss chain=forward comment="reduce MSS for WG" \
    new-mss=1360 out-interface=wg1 protocol=tcp tcp-flags=syn tcp-mss=1361-65535
add action=change-mss chain=forward comment="reduce MSS for WG" \
    new-mss=1360 in-interface=wg1 protocol=tcp tcp-flags=syn tcp-mss=1361-65535

If the MTU of the WireGuard interface is not 1420, then adjust the 1380, 1381, 1360, 1361 values accordingly. 1380 is 1420 - 40 (20 bytes IPv4 header + 20 bytes TCP header) and 1360 is 1420 - 60 (40 bytes IPv6 header + 20 bytes TCP header).
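
The arithmetic behind those four numbers can be sketched as follows. The MSS is the tunnel MTU minus the inner IP header (20 bytes for IPv4, 40 for IPv6, assuming no options or extension headers) minus the 20-byte TCP header, and the tcp-mss match range starts one above the clamp value. The function name clamp_values is just for illustration.

```python
TCP_HEADER = 20    # bytes, without TCP options
IPV4_HEADER = 20   # bytes, without IP options
IPV6_HEADER = 40   # bytes, without extension headers


def clamp_values(tunnel_mtu):
    """Return new-mss and the tcp-mss match range for a given tunnel MTU."""
    result = {}
    for family, ip_header in (("ipv4", IPV4_HEADER), ("ipv6", IPV6_HEADER)):
        mss = tunnel_mtu - ip_header - TCP_HEADER
        result[family] = {
            "new_mss": mss,
            # Only rewrite SYNs that advertise more than the clamp value.
            "tcp_mss_match": f"{mss + 1}-65535",
        }
    return result


for mtu in (1420, 1404):
    print(mtu, clamp_values(mtu))
```

For the default 1420 this reproduces the 1380/1381 and 1360/1361 values used in the rules above.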

In this setup, neither Windows client knows about the reduced MTU in the middle, so both will try to send packets up to the normal MTU of 1500. That will cause fragmentation: the tunnel will have to split the TCP packets across multiple UDP packets.

It doesn't happen when you run WG directly on Windows, because in that case the iperf client on that Windows installation knows about the 1420 MTU limitation and will not send 1500-byte packets.

Adding the mangle rules did not change the iperf TCP results. (I also had tried setting the Windows client's MTU to 1300, so that anything it sent toward the MikroTik before going into the MikroTik-to-MikroTik WireGuard tunnel would already be smaller than the WireGuard tunnel MTU, but that also made no difference).

[ I also tried iperf over UDP, explicitly setting the iperf packet size to 1240, but at any packet size, and with either the Windows WireGuard endpoint or Windows - LAN - MikroTik - WireGuard endpoint, iperf over UDP returns only about 1Mb/s in either direction, so there's something else different about iperf with UDP. ]

So, it doesn't seem to be MTU / fragmentation.
Is there a way to see whether (lots of) packets are being fragmented by RouterOS? I did a quick search and didn't find it.

thanks,

Is there Wi-Fi in between in any of these tests? That can affect all the results, since there are occasional latency differences if the link is noisy.

Also, IDK, since you seem to keep futzing with MTU. That makes sense, but unless you actually "do the math" on the entire path, it's hard to know what's right, since that requires ICMP tests to confirm. There is a "right answer", though: you generally want the MTU to be the lowest of the physical limits of all the paths involved. Some arbitrary number is just going to introduce side effects that then affect the math involved in the tests.

Keep in mind there is PMTUD, which should figure out the lowest MTU, but it depends on ICMP/"ping" being allowed on all paths. So if a firewall is blocking ping, that's a problem. And the result can be cached... so if you make a change and test, the calculated MTU may take some time (or a link down) to be recomputed.

Just note that the "benefit" of SSTP is that it requires no hole punching, and being HTTPS-based it should pass through even more restrictive firewalls, since it looks like web traffic. It also has a native client on Windows, so no install of WG is required, which is another benefit. (So if you wanted to avoid a "VPN client install" on Windows, your only other choice is IPSec, which Windows also supports natively and which is likely on par with WG on speed.)

But it's true that SSTP translation comes at a cost in speed. So if you control the firewalls, it's often not needed. But for critical remote access over something like LTE or CGNAT, SSTP still has a use.

There is a short Wi-Fi hop - the Windows test client at that end is sitting 3m from the WiFi 5GHz 802.11ax access point (which is built-in to the MikroTik router). speedtest.net consistently shows that the ADSL connection (100Mb/s / 40Mb/s) upstream of this MikroTik is the speed limit on the whole route being used for these tests, and that this ADSL connection consistently delivers its full promised speed; so the much faster, short Wi-Fi hop is not introducing problems.

I just ran a Path MTU detection; it came up with 1392. So my arbitrary - but definitely lower than anything on this route should be, and lower than 1392 - tests should have taken care of any MTU / fragmentation problems, if that was where the performance bottleneck is being introduced.

As for the remaining reason to sometimes use SSTP, yes; but in almost all of my use cases, it's not necessary, and the performance penalty is severe.

thanks,

  1. With everything set to 1500 and 1420 on WG? Any existing MTU settings lower than 1500 will affect the results.

  2. And are you sure ping works on all paths?

Otherwise IDK. And it can be easy to make things worse than just leaving the default MTUs alone without careful testing.

iperf client specific setting:
-b, --bandwidth n[KM] Set target bandwidth to n bits/sec (default 1 Mbit/sec for UDP, unlimited for TCP).

I realized that I had only run the Path MTU detection over the WireGuard tunnels themselves. Running Path MTU detection over the Internet produces the higher 1464. So, the lower MTU that I had earlier discovered, and used in my tests, should have avoided fragmentation. (Assuming that my Windows client/ iperf on my Windows client respected the lower MTU that I temporarily set on the Wi-Fi adapter for the tests).

In UDP mode, using iperf's -b 30M flag, even over the MikroTik to MikroTik WireGuard tunnel, it gets the full requested 30Mb/s, so apparently there's something about iperf's TCP test and the Wi-Fi first hop which results in the lower speed. (Which, per my last sentence in the paragraph above, suggests that iperf did not know/respect the lower temporary MTU on the Windows client's Wi-Fi adapter...)
It's still worth debugging since many connections which ride over this path will use TCP.

Back to MTU: running iperf (in default TCP mode) with -l 1200, which should limit the packets to well below any MTU on the path, the results now show that it can absorb the full bandwidth of the ADSL link.

So, it is an MTU issue. grr

Since I have added @CGGXANNX 's suggested mangle rules to set MSS for all new TCP connections (on the MikroTiks at both ends of the MikroTik to MikroTik WireGuard VPN) and it was still necessary to explicitly tell iperf to use a smaller packet size, is this Just The Way It Is(TM), or, is there still something else that I could do to get a Windows client one hop away from this end of the MikroTik to MikroTik WireGuard tunnel to understand the path MTU limit imposed by the WireGuard tunnel, which is smaller than the Windows machine's local e.g. Wi-Fi interface MTU? (I found discussion of a Windows Registry setting to enable Path MTU discovery by default - HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters DWORD value EnablePMTUDiscovery, but I think that's on by default, and anyway after adding that parameter and setting it to 1, and rebooting, the iperf tests continue to produce the same results).

Hm, if the MTU that your internet connection can achieve is only 1464, then you should reduce the MTU of the WireGuard interface to 1404 (from the default 1420) if the tunnel runs over IPv4, or to 1384 if it runs over IPv6. (WG has 60 bytes of overhead over IPv4 and 80 bytes over IPv6; the default of 1420 is chosen to accommodate IPv6 transport over a link with MTU 1500.)

If the WG interface MTU is 1404, then in the MSS mangle rules, reduce those 1380/1381/1360/1361 values down to 1364/1365/1344/1345 accordingly.
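
The tunnel MTU figures above follow from WireGuard's per-packet overhead: the outer IP header (20 bytes for IPv4, 40 for IPv6), the 8-byte UDP header, and 32 bytes of WireGuard data-message framing (16-byte header plus 16-byte authentication tag). A small sketch of that calculation; the function name wireguard_mtu is made up for illustration.

```python
UDP_HEADER = 8
WIREGUARD_FRAMING = 32   # 16-byte data-message header + 16-byte auth tag
OUTER_IPV4_HEADER = 20
OUTER_IPV6_HEADER = 40


def wireguard_mtu(path_mtu, outer_ipv6=False):
    """Largest inner packet that fits in one outer packet of size path_mtu."""
    outer_ip = OUTER_IPV6_HEADER if outer_ipv6 else OUTER_IPV4_HEADER
    return path_mtu - outer_ip - UDP_HEADER - WIREGUARD_FRAMING


print("IPv4 transport, path MTU 1464:", wireguard_mtu(1464))
print("IPv6 transport, path MTU 1464:", wireguard_mtu(1464, outer_ipv6=True))
print("IPv6 transport, path MTU 1500:", wireguard_mtu(1500, outer_ipv6=True))
```

The last line shows where the default of 1420 comes from.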

I updated the MSS values in the mangle rules and the MTU on the WireGuard interfaces. iperf still returns the same result.

I then also reduced the Windows client's Wi-Fi MTU to 1200. iperf still returns the same result.

That is, iperf -c remote.host.ip.address gets around 10Mb/s, whereas iperf ... -l 1000 gets over 30Mb/s.

Is the problem just that iperf defaults to a buffer size that is larger than the real MTU of the path? But then, I suppose so would most programs, so I'd still have this problem unless I could tell every program to choose a lower MTU? I thought that TCP clients were supposed to adapt to their closest IP interface MTU, at worst, even if they didn't try to get the Path MTU?

Well, iperf has (or at least used to have) all sorts of warnings about its use on Windows, specifically related to these sorts of things.

So I wouldn't rule out the possibility that normal programs would work normally.

Fair point. I tested with netcat (nc), and it looks like that pushes around 20Mb/s up (iperf maxed at 30Mb/s; the link's limit is 40Mb/s; so that's not too bad).

My main use case for which maximizing performance matters is large file transfer (from Windows to Windows).

COPY, obviously, sucks.

XCOPY and ROBOCOPY max out at around 10Mb/s.

netcat, which is a lot less convenient, got me around 20Mb/s.

The winner, for the moment, remains what it has long been: don't use the VPN; just 7-zip encrypt and write the output onto OneDrive, let OneDrive sync the result; and decrypt and extract it.

But it would be nice to have a Windows file transfer option that would benefit from the WireGuard VPN and avoid the need to perform intermediate steps like copying to cloud storage and back from cloud storage, or dd if=file | nc ...., etc.

Obviously, this is now getting far away from MikroTik; any suggestion for Windows programs that are native Windows authentication/ networking aware which behave well over WireGuard links?