Wireguard Bandwidth Between Two Router OS Instances

Recently, I've been looking at what speeds can be obtained between two RouterOS instances over a Wireguard connection. The genesis of this was that I had set up a Cloud Hosted Router (CHR) instance to use as my own VPN endpoint. My VDS is on dedicated hardware sitting on a 10 Gbit network link. My local router (5009) is on a 2.5 Gbps internet connection.

I was only getting speeds of maybe 300 to 400 Mbps at best between my 5009 and the CHR. I assumed it was an issue with my provider's network connection and that I was not really sitting on a 10 Gbit link.

I had a lot of back and forth with a couple others here on the forums about this on another thread.

I got the idea in my head to set up my own test locally: a CHR instance on my LAN where I could test LAN speed between the two instances as well as Wireguard speed. The results are pretty shocking.

Hardware:

  • 5009 with a 10 Gbit connection to my LAN on the SFP port
  • RouterOS running virtualized on a QNAP TS-873A with a 10 Gbit fiber connection

So 10 Gbps between the 5009 and the virtual router instance on the QNAP. The last tests shown below utilized 6 CPU cores and 4 GB of memory on the virtual router. The number of cores really didn't make much difference beyond one core.

Results:

Testing over a "LAN" connection (no Wireguard) showed speeds of roughly 3 Gbps:

The 5009 IP address is 192.168.1.1. The "WAN" interface of the virtual router was 192.168.1.242. I also had another ethernet interface added on the LAN side, but speeds were unchanged. So there's no speed drop going through the router interface.

Now for the Wireguard connection:

A significant difference! 192.168.2.1 is the WG endpoint on the virtual router. No matter what I tried, I could only get 300 to 400 Mbps at best out of a WG connection between two Mikrotik router instances. This really dovetails with what I am seeing on my remote CHR. Very similar performance.

I don't think there's anything in the firewall or router settings that would alter this.

There is a bug in Winbox though. In the images shown, the TX and RX colors are reversed.

Wondering if anyone else has seen similar results. Is there a different VPN protocol that is faster between two Mikrotik instances?

Check the profile tool on the RB5009 while performing your test.
I suspect one core will be at 100% and then you're done.
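If it helps, the check could look something like this on the 5009 (a sketch; the cpu=all argument shows per-core numbers rather than an aggregate):

/tool profile cpu=all

If one core sits at 100% while the rest idle, that single core is the ceiling, no matter what the overall CPU graph says.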

Wireguard is single core, as far as I know.
But it remains the faster option in most cases (unless you can use HW-offloaded IPsec on a beast device at both ends).

If possible, try a test from one CHR to another CHR.
What happens then?

Per the wireguard website:

“WireGuard and IPsec have both gotten faster, with WireGuard still edging out IPsec in some cases due to its multi-threading”

So I don’t think that’s an issue unless Mikrotik’s implementation isn’t multi-threaded.

I knew I saw it before.
You should be able to get a lot more for that RB5009.
Someone reports 1.3 Gbps there.

See here.

Maybe post your config for inspection?

As far as I know ROS implementation is single core.
Unless that changed and I missed it ?

So I've tested between my CHR and home over the internet and on my LAN between my 5009 and a CHR running virtualized on my QNAP.

Here you go. Doesn't seem like WG is single core...

5009:

Virtual Router:

Here you go. This is the current setup. Please, everyone, no comments or questions about the VLANs and why certain things are set up on the 5009 the way they are. It's for a specific purpose....

5009 config:
5009-1-6-25.txt (23.9 KB)

CHR config:
CHR-1-6-25-2.txt (4.3 KB)

The CHR config is the one running right now on the VDS but the local one on my QNAP is basically the same thing.

I should state that when using my ProtonVPN connections, I get over a gigabit on those WG connections. So a 5009 is capable of much higher performance; it's just the link between two Mikrotik routers that seems to be the issue.

Here's my take. Wireguard is multi-threaded from the start in the in-kernel implementation, and it has been so on Mikrotik as well from the initial release.

There are two possible problems. The first is that although the bandwidth test was made multi-threaded some time ago, it still runs in user space. For a normal (forwarded) connection, the packets never leave the kernel, so a user-space test should be expected to take a huge hit on performance.

The other is the MTU/MSS. The values of 1420 and 1380 are for the case where the outer (underlay) connection is IPv4 with a full 1500 MTU. If VLAN tags have to be added, that pushes the frame size up, and many VPS environments don't play nice with this.

So... try lower MSS values. Also, your change-mss rules are only in the forward chain, which means they don't affect input/output traffic (e.g. your bandwidth test) at all.
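For illustration, a clamp that also covers router-originated traffic (like the built-in bandwidth test) might look something like this; the interface name wireguard1 and the MSS value 1360 are assumptions, so match them to your own setup:

/ip firewall mangle
add chain=output out-interface=wireguard1 protocol=tcp tcp-flags=syn action=change-mss new-mss=1360 passthrough=yes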

These together will likely explain the difference with the ProtonVPN result. Your measurements there are likely "through" the router (in the forward path), and I would be very surprised if ProtonVPN didn't clamp MSS aggressively on their end.

I can also confirm that the RB5009 reliably does somewhere above 1 Gbps. Not much above, and the exact number will vary with even small changes in configuration.

There's no VLAN tags going over the VPN.

The mangle rules were added at the suggestion of @CGGXANNX as I had been trying out SurfShark and SurfShark definitely needs those. They make no difference in the speeds between the 5009 and the CHR if they are enabled or not.

I will try reducing the mss values in the mangle rules and report back.

Well, yeah, I see my 5009 doing over a Gb when connected on Proton. But two Mikrotik routers seem to have the issue.

Changing the MTU of the WG connection did nothing. I dropped it to 1000 and things got slower. I raised it to 1500 and speed was back to where it was at around 1380.

With an MTU around 1350, I see peaks at 600 but nothing sustained.

I see nothing that will double the speed of the connection.

Hi,

I'm in the same boat: very bad performance with WG site-to-site.
I've tried lots and lots of things... but still bad performance.

That's when I'm routing through the tunnel with iperf between servers on each side.
Funnily enough, I get good performance when using the integrated BW tester on each RB5009.

Would be nice to have a second RB5009 to try as opposed to a virtualized router. But still, it should not make that much of a difference.

What I don't understand is that I get over a gigabit on Wireguard over my ProtonVPN connection. So the 5009 is certainly capable of unwrapping the encrypted packets quickly enough. So perhaps the speed issue is due to the fact that it can't wrap the unencrypted packets fast enough?

But then why isn't the CPU usage higher? If there are spare CPU cycles, why is the router not using those to do the encryption...

Hi, if you're in France, I have a spare RB5009 and would be able to lend it.

I have 2 sites with the same fiber connection, 5 Gbps down and 1 Gbps up.
WG has been configured and running for a few years; it's not new for me, I'm not setting this up for the first time...

This is my performance with the integrated BW tester running at the same time on both sides, so about 350 Mbit/s "RAW":

[admmikrotik@router50a] > /tool/bandwidth-test address=192.168.250.70 protocol=tcp direction=transmit 
                status: running  
              duration: 1m25s    
            tx-current: 335.9Mbps
  tx-10-second-average: 371.3Mbps
      tx-total-average: 386.3Mbps
           random-data: no       
             direction: transmit 
      connection-count: 20       
        local-cpu-load: 67%      
       remote-cpu-load: 68%


[admmikrotik@router70a] > /tool/bandwidth-test address=192.168.250.50 protocol=tcp direction=transmit 
                status: running  
              duration: 1m41s    
            tx-current: 255.9Mbps
  tx-10-second-average: 404.2Mbps
      tx-total-average: 430.7Mbps
           random-data: no       
             direction: transmit 
      connection-count: 20       
        local-cpu-load: 61%      
       remote-cpu-load: 71%    

And when using servers and routing through the WG tunnel, it's asymmetric. I've already tried iperf3 directly over the internet with PAT and it performs well. In most cases, the RB5009 CPU is below 50-60%.


root@pve50a:/home/dginhoux# iperf3 -c 192.168.67.221 -l 1500  -t 3600 
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.69 MBytes  22.5 Mbits/sec   21   60.1 KBytes       
[  5]   1.00-2.00   sec  5.58 MBytes  46.8 Mbits/sec    0    108 KBytes       
[  5]   2.00-3.00   sec  8.70 MBytes  73.0 Mbits/sec    0    154 KBytes       
[  5]   3.00-4.00   sec  11.6 MBytes  97.1 Mbits/sec    0    200 KBytes       
[  5]   4.00-5.00   sec  14.2 MBytes   119 Mbits/sec    0    247 KBytes       
[  5]   5.00-6.00   sec  17.8 MBytes   149 Mbits/sec    0    293 KBytes       
[  5]   6.00-7.00   sec  20.0 MBytes   168 Mbits/sec    0    338 KBytes       
[  5]   7.00-8.00   sec  22.7 MBytes   191 Mbits/sec    0    385 KBytes       
[  5]   8.00-8.59   sec  14.5 MBytes   207 Mbits/sec    0    411 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-8.59   sec   118 MBytes   115 Mbits/sec   21            sender
[  5]   0.00-8.59   sec  0.00 Bytes  0.00 bits/sec                  receiver


root@pve50a:/home/dginhoux# iperf3 -c 192.168.67.221 -l 1500  -t 3600 -R
Connecting to host 192.168.67.221, port 5201
Reverse mode, remote host 192.168.67.221 is sending
[  5] local 192.168.47.221 port 34468 connected to 192.168.67.221 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  50.7 MBytes   425 Mbits/sec                  
[  5]   1.00-2.00   sec  52.5 MBytes   440 Mbits/sec                  
[  5]   2.00-3.00   sec  54.7 MBytes   460 Mbits/sec                  
[  5]   3.00-4.00   sec  54.0 MBytes   453 Mbits/sec                  
[  5]   4.00-5.00   sec  59.4 MBytes   498 Mbits/sec                  
[  5]   5.00-6.00   sec  61.8 MBytes   518 Mbits/sec                  
[  5]   6.00-7.00   sec  63.7 MBytes   535 Mbits/sec                  
[  5]   7.00-7.19   sec  12.6 MBytes   556 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-7.19   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-7.19   sec   409 MBytes   478 Mbits/sec                  receiver

You've misunderstood me a bit.

VLAN tags don't go over Wireguard, ever. But whether the MTU is constricted inside or outside the tunnel ultimately doesn't matter. Virtual machines (virtual network drivers, vswitches, etc.) are notorious for this. There usually is some solution, you just have to find it. Worse: lots of cloud hosts use some sort of overlay networking for private networks, which compounds the problem.

Your change-mss rules DO NOT affect the built-in bandwidth test. For them to affect it, you have to place the same rule in the "output" chain as well. Changing the values on rules that have no effect... has no effect.

When you decrease either mtu or the clamp mss value, you have to do it on both sides.
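As a minimal sketch (the interface name wireguard1 is an assumption), setting the MTU would be:

/interface wireguard set [find name=wireguard1] mtu=1380

Run it with the same value on both peers; a mismatch just moves the fragmentation problem to one direction.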

Again, the built-in bandwidth test is not the best. Part of the reason is that it loads the device unnecessarily. While it probably doesn't halve the performance, it does have a dramatic effect. I also suspect that it has other secrets regarding TCP tuning, so if you use it at all, use UDP and tune the target bandwidth manually. (This way, by the way, you can also adjust packet sizes easily instead of relying on MTU/MSS.)
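For example, a UDP run with a manually chosen target speed and packet size might look like this (the address, speeds, and sizes are placeholders to adapt):

/tool bandwidth-test address=192.168.2.1 protocol=udp direction=both local-tx-speed=800M remote-tx-speed=800M local-udp-tx-size=1400 remote-udp-tx-size=1400

Stepping the tx-speed up until loss appears gives a cleaner ceiling than letting TCP hunt for it.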

Just an additional idea: recently there was quite a bit of development on veth and container stuff, some of which led to performance problems on bridges to which a veth interface was attached. (As far as I know, this affected traffic that didn't touch the veth interface at all and only passed through the same bridge.) Just to rule things out, I'd do a test while the veth interfaces are detached.

Agreed a VM is not the best solution for testing but it pretty much confirms the same thing I am seeing when trying to test data between my 5009 and the remote CHR.

Yep. And I did that. Really made no difference. Change was made to both sides.

Actually, the built-in bandwidth test performs better than things like an Ookla speed test. I know this from my efforts with the CHR.

I can do that but I doubt it will make a difference. The veth interfaces were introduced because I wanted to see what sort of performance I was getting from the CHR outward and take the Wireguard link out of the equation. So I installed a container to utilize the Ookla cli speedtest. I set it up on my 5009 first and then on the CHR. I have no container set up on the local virtual router.

Here's results from Speedtest.

First testing from my 5009 out through the CHR, I got:

Speedtest by Ookla

      Server: Kansas Research and Education Network - Wichita, KS (id: 20531)
         ISP: Massivegrid
Idle Latency:    70.27 ms   (jitter: 6.35ms, low: 62.84ms, high: 72.92ms)
    Download:   107.04 Mbps (data used: 140.3 MB)                                                   
                115.66 ms   (jitter: 36.93ms, low: 69.62ms, high: 474.37ms)

Then from the CHR itself I got:

Speedtest by Ookla

      Server: Kansas Research and Education Network - Wichita, KS (id: 20531)
         ISP: Massivegrid
Idle Latency:    36.16 ms   (jitter: 0.15ms, low: 36.07ms, high: 36.47ms)
    Download:  3098.95 Mbps (data used: 3.8 GB)

So big difference.

I don't understand why there is such a huge difference between testing using the built in bandwidth test and what I am seeing when using Ookla. But it's a major difference.

I have not tried these tests over the virtualized instance here.

My theory is that RouterOS has an issue with encoding the packets for uplink. Why do I say this? Well, when connected to ProtonVPN and doing Ookla speed tests, I can get speeds over a gigabit. However, my uplink speed is limited to roughly 350 Mbps. This is in line with the maximum speed that I see over a WG link.

When I am connecting remotely to my home from one of my travel routers or my cell phone, I am either limited by the speed of the connection or by my upload speed since the maximum speed I can send from home is 350 Mbps. If I had a faster uplink speed, I would be able to confirm this but I don't. So the weakness of RouterOS is buried due to my uplink.

Downlink does not seem to really be a problem. But I can't tell if the speed I am seeing on downlink is limited by the VPN server or by RouterOS or both. Probably some of both given what I see here.

I am happy to talk about tweaks and adjustments. @lurker888 you have helped me a ton with my setup over the last year and I really appreciate that. Big shout out to you. I would be more than willing to sit and learn more from a Jedi master, so I'm not trying to minimize your suggestions or what you are saying. I just have not found any tweaks so far that result in any sort of noticeable improvement. If I could get from say 300 Mbps to say 800 Mbps - OK, that's still not great, but it's a step in the right direction. So far nothing.

I don't understand, though, why things are being limited as much as they are, as the CPU loading on both sides of the connection is not near maximum. I know there's a difference between CPU usage percentage and CPU load (i.e. the number of threads in queue), so maybe there are too many threads to process and things get stuck there. But you would then expect that adding CPUs would help. It did not, either on the remote CHR (same performance whether I use one dedicated Xeon core or 4) or locally on the VM CHR.

Again, open to looking at anything....

Well, I am near Chicago so that is a little out of your area! :smiley:

But thank you. If you were closer, I would take you up on it!


I'm surprised... after upgrading to 7.20.7 (from 7.20.6), I have good bandwidth.

Are you supposed to run bandwidth-test with UDP instead of TCP? Usually with UDP, you probably get close to the bandwidth your ISP provides.

I always use TCP; most of the protocols used and passing through the tunnel are TCP.

# test 1 : at the same time, site50 to site70 and site70 to site50 in tcp transmit
[admmikrotik@router50a] > /tool/bandwidth-test address=192.168.250.70 protocol=tcp direction=transmit 
                status: running  
              duration: 9m32s    
            tx-current: 375.7Mbps
  tx-10-second-average: 355.5Mbps
      tx-total-average: 380.0Mbps
           random-data: no       
             direction: transmit 
      connection-count: 20       
        local-cpu-load: 70%      
       remote-cpu-load: 72%     
[admmikrotik@router70a] > /tool/bandwidth-test address=192.168.250.50 protocol=tcp direction=transmit 
                status: running  
              duration: 9m25s    
            tx-current: 394.8Mbps
  tx-10-second-average: 432.4Mbps
      tx-total-average: 410.5Mbps
           random-data: no       
             direction: transmit 
      connection-count: 20       
        local-cpu-load: 70%      
       remote-cpu-load: 70%      


# test 2 : site50 to site70
root@pve50a:/home/dginhoux# iperf3 -c 192.168.67.221 -l 1500  -t 600
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-600.00 sec  26.5 GBytes   379 Mbits/sec  4191            sender
[  5]   0.00-600.01 sec  26.5 GBytes   379 Mbits/sec                  receiver


# test 3 : site70 to site50
root@pve50a:/home/dginhoux# iperf3 -c 192.168.67.221 -l 1500  -t 600 -R
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-600.01 sec  26.2 GBytes   375 Mbits/sec  9702            sender
[  5]   0.00-600.00 sec  26.2 GBytes   375 Mbits/sec                  receiver