TL;DR - WireGuard encryption performance, general information about hardware acceleration, and WireGuard throughput testing.
WireGuard encryption
The ChaCha20 cipher used in WireGuard currently lacks hardware acceleration on most platforms and architectures, so it runs entirely in software, including on ROS v7. ChaCha20 is efficient, but it is still a CPU-heavy algorithm. When the CPU on either end of a WireGuard tunnel reaches its limit, that CPU becomes the throughput bottleneck. This applies even to high-end devices like the CCR2216, which can only handle a limited number of WireGuard tunnels before its CPU maxes out. With IPsec and AES hardware offload, by contrast, the same device can handle hundreds of tunnels with virtually no CPU load at all.
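To illustrate why software ChaCha20 is CPU-bound, here is a minimal sketch of its core operation, the quarter round from RFC 8439, section 2.1. It uses nothing but 32-bit additions, XORs and rotations, and a 64-byte keystream block needs 20 rounds, i.e. 80 quarter rounds. The `ops_per_packet` helper is a hypothetical back-of-the-envelope counter, not part of any real implementation; it ignores the final state addition and the Poly1305 authenticator, so the real cost per packet is higher.

```python
# Sketch of the ChaCha20 quarter round (RFC 8439, section 2.1).
# Pure software ARX: 32-bit additions, XORs and rotations only.

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """One ChaCha quarter round: 4 additions, 4 XORs, 4 rotations."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# 20 rounds = 10 double rounds, each double round = 8 quarter rounds.
QUARTER_ROUNDS_PER_BLOCK = 10 * 8
OPS_PER_QUARTER_ROUND = 4 + 4 + 4  # additions + XORs + rotations
BLOCK_BYTES = 64


def ops_per_packet(payload_bytes: int) -> int:
    """Hypothetical rough count of ARX operations for one packet's keystream.

    Ignores the final per-block state addition and Poly1305.
    """
    blocks = -(-payload_bytes // BLOCK_BYTES)  # ceiling division
    return blocks * QUARTER_ROUNDS_PER_BLOCK * OPS_PER_QUARTER_ROUND


if __name__ == "__main__":
    # Input from the RFC 8439, section 2.1.1 test vector.
    print([hex(w) for w in quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)])
    print(ops_per_packet(1420))  # 1420 bytes is just an example size
```

Every one of those operations runs on a general-purpose core for every packet, in both directions, which is why the CPU ceiling described above is reached long before the link is saturated on many devices.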
Future hardware acceleration for ChaCha20
Some ARM chip manufacturers, such as Ampere and Qualcomm, have implemented dedicated ChaCha20 instructions in certain models as part of the ARMv8.6-A architecture to enable hardware acceleration. To benefit from this, you need Linux kernel 6.2 or newer and ARM hardware with FEAT_CHACHA20 support. Since ROS v7 uses Linux kernel 5.6.3, it cannot support FEAT_CHACHA20, so it can't take advantage of ChaCha20 acceleration even if the hardware supports it. Important note: both ends of the tunnel must support it for the acceleration to be fully utilized, because an end that still encrypts in software remains the bottleneck.
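The requirements above can be expressed as a small decision model. This is only a sketch of the stated rules (kernel 6.2 or newer, FEAT_CHACHA20 hardware, both ends qualifying); it does not probe real hardware, and the `has_feat_chacha20` flag is an assumed input you would fill in from vendor documentation. The class and function names are hypothetical.

```python
# Hypothetical model of the stated requirements; it does not detect hardware.
from dataclasses import dataclass

MIN_KERNEL = (6, 2)


def parse_kernel(version: str) -> tuple[int, ...]:
    """Turn '5.6.3' into (5, 6, 3), ignoring suffixes like '-45-generic'."""
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split("."))


@dataclass
class TunnelEnd:
    name: str
    kernel: str
    has_feat_chacha20: bool  # assumed input, e.g. taken from a datasheet

    def can_accelerate(self) -> bool:
        # Needs both the CPU feature and a new enough kernel.
        return self.has_feat_chacha20 and parse_kernel(self.kernel) >= MIN_KERNEL


def tunnel_fully_accelerated(a: TunnelEnd, b: TunnelEnd) -> bool:
    # Both ends must qualify; otherwise the software end stays the bottleneck.
    return a.can_accelerate() and b.can_accelerate()


# Even with capable hardware, a ROS v7 end fails the kernel check.
ros = TunnelEnd("RouterOS v7", "5.6.3", has_feat_chacha20=True)
linux = TunnelEnd("Linux peer", "6.8.0-45-generic", has_feat_chacha20=True)
print(tunnel_fully_accelerated(ros, linux))  # False
```

Tuple comparison handles versions like `6.2` versus `6.2.1` naturally, which is why the version string is parsed into integers rather than compared as text.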
IPsec AES hardware acceleration
IPsec typically uses AES encryption, which can be hardware accelerated. Most modern Intel CPUs (via AES-NI) and many ARM chips used in networking gear support AES acceleration, but it's often missing in cheaper devices.
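On a generic Linux machine (for example a test host on one end of a tunnel, not RouterOS itself), you can check whether the CPU advertises AES instructions by looking for the `aes` flag in `/proc/cpuinfo`: x86 CPUs list it under `flags`, arm64 CPUs under `Features`. This is a sketch under that assumption. Note that a dedicated crypto engine on an SoC will not appear here, and seeing the flag does not prove that a given IPsec stack actually uses it.

```python
# Sketch: does a Linux host advertise AES CPU instructions?
# Assumes the usual /proc/cpuinfo layout: "flags" (x86) or "Features" (arm64).
from pathlib import Path


def cpu_has_aes(cpuinfo_text: str) -> bool:
    """Return True if any flags/Features line lists the 'aes' token."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features") and "aes" in value.split():
            return True
    return False


def host_has_aes(path: str = "/proc/cpuinfo") -> bool:
    try:
        return cpu_has_aes(Path(path).read_text())
    except OSError:
        return False  # not Linux, or the file is not readable


if __name__ == "__main__":
    print(host_has_aes())
```

Matching whole tokens with `value.split()` avoids false positives from unrelated strings that merely contain the letters "aes".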
Important note: both ends of the tunnel must use AES hardware offload for the acceleration to be fully utilized. If AES hardware acceleration isn't available, WireGuard is often the better choice, since ChaCha20 in software is lighter on the CPU than software AES and typically performs better, especially on low-end devices.
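The "both ends" rule is essentially a minimum: whichever endpoint (or the link itself) is slowest sets the ceiling for the whole tunnel. A simplified sketch with hypothetical figures; real throughput also depends on packet size, number of streams, MTU and other load on each device.

```python
# Simplified bottleneck model; all figures are hypothetical, in Mbit/s.


def tunnel_ceiling(end_a_mbps: float, end_b_mbps: float, link_mbps: float) -> float:
    """The slowest of the two endpoints and the link limits the tunnel."""
    return min(end_a_mbps, end_b_mbps, link_mbps)


# End A offloads AES in hardware, end B has to encrypt in software.
print(tunnel_ceiling(end_a_mbps=5000, end_b_mbps=400, link_mbps=1000))  # 400
```

Upgrading only end A in this example changes nothing; the tunnel stays at end B's software limit.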
Testing WireGuard throughput
If you want meaningful test results that clearly show a change in WireGuard performance, for example after a RouterOS update, every test must be performed in exactly the same way: the same configuration, the same test environment and the same method, including the number of streams (single or multiple), the same background CPU load, and so on. Otherwise, any difference you measure may have nothing to do with ChaCha20 at all.
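One way to enforce this discipline is to record each run's conditions alongside its results and refuse to compare runs whose conditions differ. The sketch below assumes a hypothetical field set (tool, streams, direction, protocol, MTU); extend it with whatever else you vary. Only the RouterOS version is allowed to change, and peak CPU on both ends is recorded so you can see whether a run was CPU-bound. The version strings, device name and numbers in the usage lines are placeholders, not measurements.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WgTestRun:
    # Test conditions: these must be identical between runs.
    routeros_version: str
    device: str
    tool: str        # whichever tool you used, e.g. "iperf3"
    streams: int     # 1 for single stream, >1 for multiple
    direction: str   # e.g. "send", "receive", "both"
    protocol: str    # "tcp" or "udp"
    mtu: int
    # Measurements: recorded, but not required to match.
    throughput_mbps: float
    cpu_peak_a_pct: float
    cpu_peak_b_pct: float


MEASURED = {"throughput_mbps", "cpu_peak_a_pct", "cpu_peak_b_pct"}
UNDER_TEST = {"routeros_version"}  # the one thing allowed to change


def mismatches(a: WgTestRun, b: WgTestRun) -> list[str]:
    """Names of test conditions that differ between two runs."""
    skip = MEASURED | UNDER_TEST
    return [
        f.name
        for f in fields(WgTestRun)
        if f.name not in skip and getattr(a, f.name) != getattr(b, f.name)
    ]


def compare(a: WgTestRun, b: WgTestRun) -> str:
    diff = mismatches(a, b)
    if diff:
        return "not comparable, differs in: " + ", ".join(diff)
    if a.throughput_mbps <= 0:
        raise ValueError("baseline throughput must be positive")
    change = (b.throughput_mbps - a.throughput_mbps) / a.throughput_mbps * 100
    return f"{a.routeros_version} -> {b.routeros_version}: {change:+.1f}% throughput"


# Placeholder runs: the stream count changed, so the comparison is rejected.
old = WgTestRun("7.15", "RB5009", "iperf3", 1, "send", "tcp", 1420, 900.0, 100.0, 40.0)
new = WgTestRun("7.16", "RB5009", "iperf3", 4, "send", "tcp", 1420, 1100.0, 100.0, 55.0)
print(compare(old, new))  # not comparable, differs in: streams
```

A run where either end sits at 100% CPU is measuring that end's CPU limit, so comparing peak CPU between runs also helps tell a genuine encryption speedup from a change in where the bottleneck is.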