So to take things out of the 7.19rc thread and keep the discussion going, here are some facts that I can back up with evidence.
Hardware: CCR2216
- WireGuard on version 7.18.2 gives stable bi-directional bandwidth of roughly 1.5Gb/s for a single-threaded iperf3 test from a client on the local LAN, through the router's WG tunnel, to another client on a different VLAN.
- The same setup as in the first point, on version 7.19rc, gives asymmetric bandwidth. Flows from the WG client to the client on the local LAN work normally at roughly 1.5Gb/s, but in the reverse direction the bandwidth varies wildly between 400Mb/s and 800Mb/s, never hitting the 1.5Gb/s it reached before.
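If anyone wants to reproduce the two directions in a comparable way, here is a rough Python sketch of how I would script it. It is not the exact procedure I used. It assumes iperf3 is installed on the client, that an iperf3 server is already listening on the far side of the tunnel, and that the test is TCP; the helper names and the address 192.0.2.10 are placeholders. Which side counts as "reverse" depends on where you run the server.

```python
import json
import subprocess


def iperf3_cmd(server, streams=1, reverse=False, seconds=10):
    """Build an iperf3 client command line that emits JSON output."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds),
           "-P", str(streams), "--json"]
    if reverse:
        cmd.append("-R")  # reverse mode: server sends, client receives
    return cmd


def received_gbps(result):
    """Receiver-side throughput in Gb/s from parsed iperf3 JSON (TCP test)."""
    return result["end"]["sum_received"]["bits_per_second"] / 1e9


def measure(server, **kwargs):
    """Run one test and return the receiver-side throughput in Gb/s."""
    out = subprocess.run(iperf3_cmd(server, **kwargs),
                         capture_output=True, text=True, check=True)
    return received_gbps(json.loads(out.stdout))


def compare_directions(server, streams=1):
    """Measure both directions with the same stream count."""
    forward = measure(server, streams=streams)
    reverse = measure(server, streams=streams, reverse=True)
    return forward, reverse
```

Calling `compare_directions("192.0.2.10")` from the client returns the two figures in Gb/s, and `streams=4` repeats the test with parallel streams.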
Other Notes:
- Max performance I can extract is roughly 2Gb/s with multiple threads (roughly 3-4). Going higher than this doesn’t result in more performance on a single WG interface.
- Separate WG interfaces run on separate threads (this is the same on any arch), which allows you to do ECMP across them for more bandwidth. The highest I was able to get was a stable 3.1Gb/s across 3 WG tunnels; beyond that there isn’t enough CPU.
- If you are running 10-15Mb/s of traffic you will likely never see the slowdowns. However, I am currently seeing another symptom: the Rx Error counter for the WG interface keeps incrementing, both at very low bandwidth loads (800Kb/s) and at high throughput.
Feel free to post your own results on different setups. I’m going to try running WireGuard in a container to see if this is a “RouterOS implementation” issue.