VPN performance

On recent versions of Windows, you can enable BBR2 by running these commands from an elevated (UAC) prompt:

netsh int tcp set supplemental Template=Internet CongestionProvider=bbr2
netsh int tcp set supplemental Template=InternetCustom CongestionProvider=bbr2
netsh int ipv4 set global LoopbackLargeMTU=disabled

The third command is needed to avoid problems with some programs and services that connect to loopback, among them Steam and the Hyper-V Console.

Verify the active congestion control provider with:

netsh int tcp show supplemental

In my setup (RB5009 + GPON SFP stick), this significantly improves upload throughput when latency is high (though it is still not as good as Linux with BBR enabled).
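One way to see why latency matters so much here is the bandwidth-delay product: the amount of data that has to be in flight to keep a link full. The sketch below is only an illustration; the function names and the example figures (1000 Mbit/s, 100 ms, a 64 KiB window) are assumptions, not measurements from this setup.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to fill a link of the given speed and RTT."""
    bits_in_flight = bandwidth_mbps * 1_000_000 * (rtt_ms / 1000.0)
    return bits_in_flight / 8


def max_throughput_mbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1_000_000


if __name__ == "__main__":
    # Assumed example values, chosen only to show the scaling with RTT.
    for rtt in (10, 50, 150):
        needed = bdp_bytes(1000, rtt)
        print(f"RTT {rtt:3d} ms: {needed / 1e6:.2f} MB in flight needed for 1000 Mbit/s")
    print(f"64 KiB window at 100 ms: {max_throughput_mbps(65536, 100):.2f} Mbit/s")
```

The longer the round trip, the larger the window the sender has to sustain, so a congestion controller that shrinks the window aggressively costs more throughput on high-latency paths.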

To undo, rerun the commands above with CongestionProvider=cubic instead of bbr2, and LoopbackLargeMTU=enabled instead of disabled.

Regarding possible WireGuard HW acceleration, newer ARMv8.6-A specs bring the "FEAT_CHACHA20" option for hardware acceleration of ChaCha20, but you need a Linux kernel v6.2 or newer and of course chip support.

That is a good idea.

WireGuard uses UDP, so unfortunately that won’t work: BBR applies only to TCP endpoints.

I spent some more time configuring IPSec and Wireguard and did some simple bandwidth checks with the built-in bandwidth tool.

Wireguard indeed is a very fast solution on MikroTik routers! My setup is the following:

CCR2216 <---> RB5009 (Internet) <--- PPPoE + IPSec with GRE tunnel/Wireguard ---> RB5009

All physical connections were 10Gbit links.

Wireguard reaches about 850MBit/sec in this setup. IPSec reaches only about 450MBit/sec. I expected the performance impact of the GRE tunnel in the case of my IPSec setup to be more noticeable but you would need to do specific measurements to even notice it. I also expected IPSec to be at least 50% faster, especially as it is listed with up to 1400MBit/sec on the MikroTik site.

IPSec seems to scale better on the CCR2216 because load is about 50% lower than with Wireguard.

I didn't manage to push Wireguard over a total of 950 MBit in my setup, even with multiple routers. But that might be related to bottlenecks caused by PPPoE. I did not investigate this any further.

MikroTik seems to use kernel 5.6.3, so I guess that's quite some time away.

Throughput is likely bottlenecked by a misconfiguration if you're capping at 450 Mbps.

Good to know. What can I do to find that bottleneck?

I believe this could have been a misconfigured MTU.

I looked into this some more and this is what I got: about 720MBit/sec with UDP and about 600MBit/sec over TCP.

Throughput is a little higher if traffic is not routed over a GRE/EoIP tunnel, which is expected because the tunnel lowers the MSS. PPPoE also makes only a small difference in throughput but has a noticeable CPU penalty on the RB5009.
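The MSS effect can be estimated with some simple arithmetic. This is a rough sketch assuming IPv4 outer headers, a 1500-byte link MTU, and TCP without options; the table and helper names are made up for illustration. IPsec ESP overhead is left out because it depends on the cipher and mode in use.

```python
# Per-packet overhead in bytes for each encapsulation layer (IPv4 outer headers).
OVERHEAD = {
    "pppoe": 8,       # 6-byte PPPoE header + 2-byte PPP protocol field
    "gre": 24,        # 20-byte outer IPv4 header + 4-byte basic GRE header
    "wireguard": 60,  # 20 IPv4 + 8 UDP + 16 WireGuard data header + 16 auth tag
}

IPV4_HEADER = 20
TCP_HEADER = 20  # without TCP options


def inner_mtu(link_mtu: int, layers: list[str]) -> int:
    """MTU left for the inner packet after subtracting each layer's overhead."""
    return link_mtu - sum(OVERHEAD[layer] for layer in layers)


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per packet for a given MTU (IPv4, no TCP options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for layers in ([], ["pppoe"], ["pppoe", "gre"], ["pppoe", "wireguard"]):
        mtu = inner_mtu(1500, layers)
        name = " + ".join(layers) or "plain Ethernet"
        print(f"{name:20s} MTU {mtu}  MSS {tcp_mss(mtu)}")
```

If the configured tunnel MTU is larger than what this arithmetic allows, packets get fragmented or dropped, which is one way a misconfigured MTU can cap throughput.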

The left part is UDP traffic, the right part is TCP traffic.

As this was never meant to be a benchmarking exercise, I did not look further into the impact of firewall rules or into generating traffic with iperf3 instead of the bandwidth tool. The CPU was not maxed out, so I believe traffic generation was not the limiting factor.

As a golden rule of thumb for reliable throughput test results: always use external load-generating tools (on both sides!). Make sure the remote device has equal or greater capacity than the device under test. An RB5009 should be able to push well over 1 Gbps with minimal CPU impact.

I went back to the setup to try with iperf.

Internet over DHCP:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-60.00  sec  3.60 GBytes   515 Mbits/sec                  sender
[  5]   0.00-60.01  sec  3.60 GBytes   515 Mbits/sec                  receiver

Internet over PPPoE:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-60.00  sec  3.79 GBytes   542 Mbits/sec                  sender
[  5]   0.00-60.01  sec  3.79 GBytes   542 Mbits/sec                  receiver
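As a sanity check on how to read these summaries: the numbers are consistent with Transfer being reported in binary units (1 GByte = 1024^3 bytes) and Bitrate in decimal Mbits/sec. The helper below is a small sketch with a made-up name that recomputes the bitrate from the Transfer and Interval columns above.

```python
GIB = 1024 ** 3  # assumes the Transfer column uses binary gigabytes


def bitrate_mbps(transfer_gbytes: float, seconds: float) -> float:
    """Recompute the average bitrate in decimal Mbit/s from transfer size and duration."""
    return transfer_gbytes * GIB * 8 / seconds / 1_000_000


if __name__ == "__main__":
    # Values taken from the two runs above.
    print(f"DHCP:  {bitrate_mbps(3.60, 60.0):.1f} Mbit/s")
    print(f"PPPoE: {bitrate_mbps(3.79, 60.0):.1f} Mbit/s")
```

The recomputed values land within rounding error of the reported 515 and 542 Mbits/sec, whereas a decimal gigabyte would give only 480 Mbit/s for the first run.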

A direct connection to the target server maxes out the 1GBit network port on my laptop.

@Larsa any ideas what I should look into? This is far below what you suggested should be achievable.