I’ve been trying to get full speed between two sites over a WireGuard tunnel. Ping between them is 5-7 ms. They are on two different operators, but both peer at a local IX, so the path is very short.
Both routers are RB5009s. The site hosting the server (let’s call it A) has a fully symmetric 1 Gbit/s link; the second site (call it B) is the client and has 2 Gbit/s download.
A : static IP and 1500 MTU (1000/1000); the WireGuard server (the listening side) is located there
B : dynamic external IP + PPPoE and 1492 MTU (2000/600)
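The tunnel itself is bog-standard; roughly like this (interface names, keys, port and tunnel addresses here are placeholders, not my real config):

# on A (static IP, listens for the connection)
/interface wireguard add name=wg0 listen-port=13231 mtu=1420
/interface wireguard peers add interface=wg0 public-key="<B-public-key>" allowed-address=10.255.0.2/32
/ip address add address=10.255.0.1/30 interface=wg0

# on B (dynamic IP behind PPPoE, so 1492 - 80 = 1412)
/interface wireguard add name=wg0 mtu=1412
/interface wireguard peers add interface=wg0 public-key="<A-public-key>" endpoint-address=<A-static-IP> endpoint-port=13231 allowed-address=10.255.0.1/32 persistent-keepalive=25s
/ip address add address=10.255.0.2/30 interface=wg0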
The problem is that I can’t get anything over 300-350 Mbit/s from A to B. Samba, FTP, everything gets the same speed. I have tried everything:
- Changing MTUs
- Clamping MSS to PMTU
- Removing fasttrack
Nothing helped. It’s stuck at 30-35 MB/s with some spikes to 42 MB/s.
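For clarity, by clamping I mean the usual pair of mangle rules, something like this (wg0 is just a placeholder for my tunnel interface name):

/ip firewall mangle add chain=forward out-interface=wg0 protocol=tcp tcp-flags=syn action=change-mss new-mss=clamp-to-pmtu passthrough=yes
/ip firewall mangle add chain=forward in-interface=wg0 protocol=tcp tcp-flags=syn action=change-mss new-mss=clamp-to-pmtu passthrough=yes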
During this 250-350 Mbit/s transfer both routers show about 35% CPU load, but if I upload something from B to A I can push ~500 Mbit/s, getting speeds of 50 MB/s.
Do you have any clues on how to squeeze a bit more out of this setup?
I guess you’re hitting a CPU ceiling here. While running the tests, run the CPU profiler; most likely one of the CPU cores will be at 100%. And I can imagine that WireGuard handling might be tied to a single CPU core for a few good reasons.
The WireGuard overhead AND the PPPoE overhead together probably explain why you “only” get 300-350 Mbit/s.
The CPU profiler will give you insight.
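Something like this should show where the time goes (per-process, plus the per-core view):

/tool profile cpu=all
/system resource cpu print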
If you have a “spare” RB5009 you could perform a back-to-back test with a piece of Ethernet wire in between, to see what the maximum is that you can reach.
You’ll save some CPU cycles on the PPPoE for sure.
So even if I change to a CCR2116 (for example) I won’t get any better speeds here? The RB5009 is only at 40% CPU utilization (all cores equally), and in most cases it stays at the 700 MHz clock, not the full 1400 MHz. That’s not an RB5009 limitation for sure.
RB5009 to RB5009 should be way higher than that, I would think?
From an AX Lite to an RB5009 using 1Gb Ethernet I can reach close to 400-405 Mbps UDP.
TCP was around 211 Mbps.
For TCP, the CPU was hitting 100% on the AX Lite, so that was a hard limit (the RB was still doing “nothing” there).
For UDP I had to cap it at around 400 Mbps because of way too many lost packets; CPU was only around 20%, however.
@OP:
Can you provide a small drawing of your test setup (paper is OK, just scan it)?
Also, how do you test? Add those devices to the drawing.
And then the config of both devices.
PS: if you can get them together in the same lab environment, try using the 2.5Gb ports to connect them,
just to see how far you can get.
You’ve encountered the ultimate traffic generator opportunity.
Do not run the traffic generator on the device under test; the measurements will be erroneous.
You can estimate the maximum traffic generation capability of the device by running a TCP test to 127.0.0.1.
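For example (the bandwidth-test server has to be enabled first; 127.0.0.1 keeps everything on-box):

/tool bandwidth-server set enabled=yes authenticate=no
/tool bandwidth-test address=127.0.0.1 protocol=tcp direction=both duration=30s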
I just tested an FTP connection without WireGuard, and I managed to get the full 1 Gbit/s.
There are no queues, and the config is really basic, with just the UDP port opened for the WireGuard server. There’s seriously nothing extraordinary in the config. Totally default, zero queues.
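Literally just the one accept rule for the tunnel port, something like this (port number is a placeholder), placed above the default drop rules:

/ip firewall filter add chain=input protocol=udp dst-port=13231 action=accept comment="allow WireGuard"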
Well, the speeds all look pretty good from one WireGuard endpoint to the other, via WireGuard.
(This is what the graphs show, isn’t it?)
So perhaps it is the LAN interfaces that are causing the issues.
My thoughts are perhaps MTU and MSS.
One end’s wg MTU will likely (should) be 1420, while the other should be 1412 (the 1492 PPPoE MTU minus the same 80-byte allowance WireGuard reserves by default)?
I am not sure if you need to do MSS clamping on WireGuard, but it might be worth trying. Subtracting 40 bytes for the IPv4 and TCP headers:
1420 - 40 = 1380
1412 - 40 = 1372
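If you want to try it, the rules would look something like this on each end (wg0 is a placeholder for your tunnel interface; the tcp-mss match makes sure you only shrink segments that are too big):

# on the 1420-MTU end
/ip firewall mangle add chain=forward out-interface=wg0 protocol=tcp tcp-flags=syn tcp-mss=1381-65535 action=change-mss new-mss=1380 passthrough=yes
# on the 1412-MTU end
/ip firewall mangle add chain=forward out-interface=wg0 protocol=tcp tcp-flags=syn tcp-mss=1373-65535 action=change-mss new-mss=1372 passthrough=yes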
What kind of CPEs do you have at the two sites? I’m guessing from the speeds that at least the 2000/600 side is GPON.
Some very cheap modules (such as the Realtek-based SFP ONTs) have 32 MB of on-SoC RAM that is split between the OS and the Ethernet/PON buffers.
Depending on the load characteristics (such as packet interspacing due to 10/40/100Gb uplink ports on the ISP side), those cheap CPEs can get buffer-starved and start dropping packets.
Another thought:
It is very likely the bandwidth test was running with multiple (20?) connections.
This probably helps by allowing more cores onto the task, and it also reduces the impact of latency; between them, the connections can fill the link.
This possibly means that with a few FTP transfers running at the same time you could get a substantially higher overall throughput.
I’m not quite sure how to work around this for a single connection; somehow reduce the latency, maybe.
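For example, compare a single stream against a multi-stream run (address is a placeholder for the far wg endpoint; I believe connection-count is the relevant knob):

/tool bandwidth-test address=10.255.0.1 protocol=tcp direction=receive connection-count=1
/tool bandwidth-test address=10.255.0.1 protocol=tcp direction=receive connection-count=20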
Apparently FTP has a segmented download option that allows you to download multiple segments in parallel.
(It needs support at both ends.)
You are right. I launched 3 FTP sessions over WireGuard and managed to reach 80 MB/s. But when I launched 3 SMB transfers they totalled 35 MB/s, split across them at 10-12 MB/s each.
Both sites are GPON, and each one has an ONT connected by Ethernet to the 2.5GbE port of the MikroTik RB5009.
I had already tried clamp-to-PMTU with no difference at all, but I will test the settings you provided.
EDIT1 : Tested with clamp-to-PMTU and with the fixed values (1380 matching 1381-65535, and 1372 matching 1373-65535) → zero difference.
If you haven’t tried it, try copying using robocopy with the /MT flag (is this different from what you have already tried with SMB?).
If you are running large frame sizes (4k+) on both LANs, perhaps you could set the MTU on WireGuard to this large size plus 80 or so for the WireGuard overhead, with an appropriate MSS. The packets will get fragmented over the WAN link, but that processing might be fairly efficient with big frame sizes.
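As a very rough, untested sketch of that idea, assuming a 4096-byte LAN MTU (values are illustrative only):

/interface wireguard set wg0 mtu=4176
/ip firewall mangle add chain=forward out-interface=wg0 protocol=tcp tcp-flags=syn action=change-mss new-mss=4136 passthrough=yes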
I don’t like this option much, but anyway:
Perhaps some sort of IPsec (IKEv2) link instead of WireGuard. Its encryption will be done in hardware, and its processing is mostly inline rather than the router terminating a tunnel interface, with traffic sort of going through each router twice.
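A minimal policy-based IKEv2 sketch (pre-shared key, placeholder addresses and subnets; with B’s dynamic IP you would really make A a passive responder, which I’ve left out):

# on A
/ip ipsec profile add name=ike2 enc-algorithm=aes-128 hash-algorithm=sha256 dh-group=modp2048
/ip ipsec proposal add name=ike2 enc-algorithms=aes-128-gcm pfs-group=none
/ip ipsec peer add name=site-b address=<B-IP> exchange-mode=ike2 profile=ike2
/ip ipsec identity add peer=site-b auth-method=pre-shared-key secret="<psk>"
/ip ipsec policy add peer=site-b tunnel=yes src-address=192.168.1.0/24 dst-address=192.168.2.0/24 proposal=ike2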