Wireguard tunnel - speed problem

Guys,

I’ve been trying to get full speed between two sites over a WireGuard tunnel. Ping between them is 5-7 ms. They are on two different operators, but both have a link to the local IX, so the path is very short.

Both routers are RB5009s. One site, which hosts the server (let’s call it A), has a fully symmetric 1 Gbit/s link; the second one, the client (call it B), has 2 Gbit/s download.

A: static IP, MTU 1500 (1000/1000); the key server is located there
B: dynamic external IP + PPPoE, MTU 1492 (2000/600)

The problem is that I can’t get anything over 300-350 Mbit/s from A to B. Samba, FTP, everything runs at the same speed. I have tried everything:

- Changing MTUs
- Clamping to PMTU
- Removing FastTrack

Nothing helped. It stays at 30-35 MB/s, with some spikes to 42 MB/s.

During this 250-350 Mbit/s transfer both routers show 35% CPU load, but if I upload something from B to A I can reach ~500 Mbit/s upload, seeing speeds of about 50 MB/s.

Do you have any clues how to squeeze a bit more out of this setup?

I guess you’re hitting the CPU ceiling here. While running tests, run the CPU profiler; likely one of the CPU cores will be at 100%. And I can imagine that WireGuard handling might be tied to a single CPU core for a few good reasons.

Both the WireGuard and the PPPoE overhead probably explain why you “only” get 300-350 Mbit/s.
The CPU profiler will give you insight.
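
For reference, a minimal sketch of how that check might look from the RouterOS terminal while the transfer is running. Both commands exist in RouterOS v7, but the output columns differ between versions, so treat this as a starting point rather than a recipe.

```routeros
# Per-core breakdown of what is consuming CPU time (run it during the transfer)
/tool profile cpu=all

# Per-core load snapshot; one core near 100% while the others idle
# would point at a single-core bottleneck even if the average looks low
/system resource cpu print
```

An average of 35% across four cores is still compatible with one core being saturated, which is why the per-core view matters here.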

If you have a “spare” RB5009 you could perform a back2back test with a piece of ethernet-wire in between to see what the max is you can reach.
You’ll save some CPU-cycles on the PPPoE for sure.

I would say 300-350 Mbit/s is a pretty decent WireGuard speed; I would not be complaining.

So even if I change to a CCR2116 (for example), I won’t get any better speeds here? The RB5009 is only at 40% CPU utilization (all cores equally), and in most cases it runs at the 700 MHz clock, not the full 1400 MHz one. That’s not an RB5009 limitation for sure.

RB5009 to RB5009 should be way higher than that, I would think?
From an AX Lite to an RB5009 over 1 Gb Ethernet I can reach close to 400-405 Mbps UDP.
TCP was around 211 Mbps.
For TCP the CPU was hitting 100% on the AX Lite, so that was a hard limit (the RB was still doing “nothing” there :laughing: ).
For UDP I had to cap it at around 400 Mbps because of way too many lost packets. CPU was around 20%, however.

@OP:
Can you provide a small drawing of your test setup (paper is OK, just scan it)?
Also, how do you test? Add those devices to the drawing.
And then config on both devices.
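
In case it helps, a minimal sketch of how the config could be exported for posting. The file name `routerA` is just a placeholder. As far as I know, RouterOS v7 leaves sensitive values out of an export by default, but check the file for keys and public IPs before posting anyway.

```routeros
# Write the configuration to routerA.rsc in the router's file list
# ("routerA" is a hypothetical name; use whatever you like)
/export file=routerA
```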

PS if you can get them together in the same lab-environment, try to use 2.5Gb port to connect both.
Just to see how far you can get.

Hmm, way better than my results on a 1 Gbit to 1 Gbit connection, but that was using an RB4011 at one end and an RB450Gx4 at the other.

I can’t get more than 350 Mbit/s on TCP, but the server can receive 450 Mbit/s. I don’t get it.

This is a speed test from router “A” (1 Gbit/s symmetric, MTU=1500, without PPPoE) to “B” (2000/600, PPPoE, MTU=1492) over the WireGuard tunnel.

But let’s check only TX from “A” on TCP and UDP - I can get 800Mbit/s

The following are some performance specs … look at WireGuard …. This is a TP-Link ER8411
https://www.tp-link.com/ca/business-networking/vpn-router/er8411/#specifications

Impressive IMO … YEP, I am now considering switching from my CCR1009 to this TP-Link router ….

I’ve had enough of those MikroTiks, unfortunately, but I need 2.5GbE Ethernet. I’m slowly starting to think about getting rid of the MikroTiks.

I’m thinking about the QNAP QHORA-322, but it doesn’t have SFP+, and I need both 2.5GbE and SFP+.

You’ve encountered the ultimate traffic generator opportunity.
Do not run the traffic generator on the device under test, the measurements will be erroneous.
You can estimate the maximum traffic generation capability of the device by running a TCP test to 127.0.0.1

Hi,
Could you graph the Rx only (to A) as well, please?
(Also, just have a look at the queues, if any, at both ends; maybe disable them briefly.)
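
A quick sketch of how the queues could be checked from the terminal. These are standard RouterOS menus; the disable line only matters if the first command actually lists something.

```routeros
# List any simple queues and queue trees
/queue simple print
/queue tree print

# Show which queue type each interface uses
/queue interface print

# Temporarily disable all simple queues for a test (re-enable with "enable")
/queue simple disable [find]
```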

Don’t run bandwidth test on the devices under test.

And we still like to see the config of both devices.
Everything else is speculating and guessing.

I just tested FTP connection without WireGuard, and I managed to get full 1Gbit/s.

There are no queues, and the config is really basic, with just the UDP ports opened for the WireGuard server. There’s seriously nothing extraordinary in the config. Totally default, zero queues.

From “B” to “A” only RX → almost full 1Gbit/s UDP

And TCP

Well the speeds all look pretty good, from one wireguard endpoint to the other endpoint, via wireguard.
(This is what the graphs are isn’t it?)

So perhaps it is the Lan interfaces that are causing the issues.
My thoughts are perhaps MTU and MSS.
One end wg MTU will likely (should) be at 1420, while the other should be 1412?
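
A sketch of what that could look like, assuming the usual 80-byte WireGuard allowance (1500 - 80 = 1420, 1492 - 80 = 1412) and that the interface is called `wireguard1` on both routers. The address `10.10.10.1` is a made-up tunnel address for the far end; substitute your own.

```routeros
# On A (WAN MTU 1500)
/interface wireguard set [find name=wireguard1] mtu=1420

# On B (PPPoE, WAN MTU 1492)
/interface wireguard set [find name=wireguard1] mtu=1412

# Check what actually fits through the tunnel without fragmentation
# (10.10.10.1 is a hypothetical tunnel address of the other end)
/ping 10.10.10.1 size=1412 do-not-fragment count=3
```

Since the path is limited by the smaller of the two, it may be simpler to run 1412 on both ends. If the ping at 1412 fails but a smaller size passes, the real path MTU is lower than assumed.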

I am not sure if you need to do MSS clamping on wireguard, but it might be worth trying.
1420-40=1380
1412-40=1372

perhaps:
/ip firewall mangle
add out-interface=wireguard1 protocol=tcp tcp-flags=syn action=change-mss new-mss=1380 chain=forward tcp-mss=1381-65535

and at the other end.

/ip firewall mangle
add out-interface=wireguard1 protocol=tcp tcp-flags=syn action=change-mss new-mss=1372 chain=forward tcp-mss=1373-65535

(or perhaps new-mss=clamp-to-pmtu, and remove the tcp-mss parameter)
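
Written out, that variant might look like the rule below. It is the same rule as above with the fixed value swapped for `clamp-to-pmtu`, again assuming the interface is named `wireguard1`.

```routeros
/ip firewall mangle
# Clamp the MSS of forwarded TCP SYNs to whatever fits the outgoing interface
add chain=forward out-interface=wireguard1 protocol=tcp tcp-flags=syn action=change-mss new-mss=clamp-to-pmtu passthrough=yes

# Check the counters afterwards to see whether the rule matches any traffic
/ip firewall mangle print stats
```

If the counters stay at zero during a transfer, the rule is not in the path of that traffic, for example because the interface name differs or FastTrack is bypassing the mangle chain.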

What kind of CPEs do you have at both sites? I’m guessing from the speeds that at least the 2000/600 side is GPON.

Some very cheap modules (such as the Realtek-based SFP ones) have 32 MB of on-SoC RAM that is split between the OS and the Ethernet/PON buffers.

Depending on the load characteristics (such as packet interspacing due to 10/40/100 Gbit/s uplink ports on the ISP side), those cheap CPEs can get buffer-starved and start dropping packets.

Another thought,
It is very likely the bandwidth test was running with multiple (20?) connections.

This probably helps by allowing more cores onto the task, and also reduces the impact of latency, and the different connections can overall fill the link.

This possibly means that with a few ftp transfers running at the same time you could get an overall substantially higher throughput.

Not quite sure how to work around this for a single connection, somehow reduce the latency maybe.
Apparently ftp has a segmented download option that allows you to download multiple segments in parallel.
(Needs support at both ends)
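
One way to check this theory would be to repeat the bandwidth test with a single TCP connection and compare it with the default. A sketch, assuming the test runs between two separate MikroTik devices behind the routers (not on the RB5009s themselves) and that `192.168.88.10` is the hypothetical address of the bandwidth-test server on the far side. Authentication parameters may be needed depending on the server settings.

```routeros
# Single TCP stream, closer to what one FTP or SMB transfer looks like
/tool bandwidth-test address=192.168.88.10 protocol=tcp direction=transmit tcp-connection-count=1 duration=30s

# 20 parallel streams for comparison
/tool bandwidth-test address=192.168.88.10 protocol=tcp direction=transmit tcp-connection-count=20 duration=30s
```

If the single-stream result lands near 300-350 Mbit/s and the 20-stream result is much higher, the limit is per connection rather than the tunnel as a whole.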

You are right. I launched 3 FTP sessions over WireGuard and managed to reach 80 MB/s. But when I launched 3 SMB transfers, they totalled 35 MB/s, split at 10-12 MB/s each.

Both are GPONs and each one has ONT + LAN to 2.5GbE port of Mikrotik RB5009.

I was trying with those Clamp-to-PMTU too with no difference at all, but will test those settings you provided.

EDIT1 : Tested with clamp-to-PMTU and those 1380, 1381-65535 & 1372, 1373-65535 → zero difference.

Hmm,

A couple of (possibly poorly conceived) thoughts.

If you haven’t tried it, try copying using robocopy with /MT flag, (is this different to what you have already tried with SMB?)

If you are running large frame sizes 4k+ in both networks, perhaps you could make the MTU on wireguard this large size + 80 or so for the wireguard overhead, with an appropriate MSS. They will get fragmented over the wan link, but this processing might be fairly efficient with big frame sizes.

I don’t like this option much, but anyway:
Perhaps some sort of IPsec (IKEv2) link instead of WireGuard. Its encryption will be done in hardware. And its processing is mostly inline, rather than being an endpoint and sort of going through each router twice.

Yeah, I’m thinking about moving back to IPsec (IKEv2) to test it…

GUYS YOU WON’T BELIEVE!

It’s running at full speed. What did I do? I forced both MikroTik RB5009s to run at the max CPU frequency = 1400 MHz, without the “auto” power saving!

And here are the results on the WireGuard tunnel: from 30-35 MB/s to 60-80 MB/s.
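
For anyone who wants to try the same thing from the terminal, this is roughly the equivalent of the setting under System > RouterBOARD > Settings. A sketch only: the accepted frequency values depend on the model, and the change may need a reboot to take effect.

```routeros
# Show the current setting ("auto" by default on this board, as far as I can tell)
/system routerboard settings print

# Pin the CPU clock instead of letting it scale
/system routerboard settings set cpu-frequency=1400MHz

# Confirm the frequency the CPU is running at
/system resource print
```

To go back, set `cpu-frequency=auto` again.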