So... first of all: OP wants to keep 400 tunnels alive and idle, so is not really interested in the maximum throughput that can be achieved. A quick calculation tells us that such a router must handle 2-3 handshakes per second and (depending on the keepalive timer) about 30 packets of encryption/decryption per second. Even the cheapest Mikrotik can do thousands of handshakes and tens of thousands of packet crypto operations per second.
So OP's question is settled.
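For anyone who wants to redo the quick calculation, here is a rough sketch. The timer values are my assumptions, not something OP stated: a rekey roughly every 120 seconds per tunnel, a 25-second keepalive, and keepalives flowing in both directions. It lands in the same ballpark as the figures above.

```python
# Back-of-the-envelope idle load for a WireGuard concentrator.
# All timer values below are assumptions for illustration only.

TUNNELS = 400
REKEY_INTERVAL_S = 120   # assumed: one handshake per tunnel about every 2 minutes
KEEPALIVE_S = 25         # assumed: a commonly used keepalive interval
DIRECTIONS = 2           # assumed: router both sends and receives keepalives


def idle_load(tunnels, rekey_s, keepalive_s, directions=DIRECTIONS):
    """Return (handshakes per second, crypto packets per second)."""
    handshakes_per_s = tunnels / rekey_s
    # One keepalive per tunnel per interval, per direction; each one costs
    # the router a single encryption or decryption.
    packets_per_s = directions * tunnels / keepalive_s
    return handshakes_per_s, packets_per_s


handshakes, packets = idle_load(TUNNELS, REKEY_INTERVAL_S, KEEPALIVE_S)
print(f"{handshakes:.1f} handshakes/s, {packets:.0f} packets/s")
```

Change the timers and the numbers move a bit, but they stay orders of magnitude below what the hardware can do.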
Mikrotik uses the Linux in-kernel implementation of WireGuard. It's true that a newer version was recently backported, and it does contain some limited performance improvements. They're nice to have, but ultimately not really significant.
As the Linux in-kernel Wireguard module is multithreaded, so is Mikrotik's.
WireGuard uses ChaCha20 as its cipher, which is indeed a stream cipher. This has no bearing on parallelism. Each packet is encrypted as a new "stream" with a new nonce, therefore, yes, the encryption of a single packet is nearly impossible to parallelize, but multiple packets can be handled in parallel just fine. BTW it's much the same with AES-CBC, where because of the chaining (CBC = cipher block chaining), at least when encrypting, each block operation can only start once those preceding it are complete and their results are available.
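To make the "one stream per packet" point concrete, here is a minimal pure-Python sketch of ChaCha20 (following RFC 8439) with a nonce derived from a per-packet counter. It is a simplification and not the kernel code: the Poly1305 authentication that WireGuard also applies is left out, the helper names (`wg_nonce`, `encrypt_packet`) are made up for illustration, and Python threads only demonstrate that the packets are independent, not a real speedup.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF


def _rotl(v, n):
    return ((v << n) & MASK) | (v >> (32 - n))


def _quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """One 64-byte keystream block (32-byte key, 12-byte nonce)."""
    init = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    init += struct.unpack("<8I", key)
    init += [counter]
    init += struct.unpack("<3I", nonce)
    s = list(init)
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12)
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(s, init)))


def chacha20_xor(key, nonce, data):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for i in range(0, len(data), 64):
        ks = chacha20_block(key, i // 64, nonce)
        out += bytes(a ^ b for a, b in zip(data[i:i + 64], ks))
    return bytes(out)


def wg_nonce(packet_counter):
    # Hypothetical helper: 4 zero bytes + 64-bit little-endian packet counter.
    return b"\x00" * 4 + struct.pack("<Q", packet_counter)


def encrypt_packet(job):
    key, packet_counter, payload = job
    return chacha20_xor(key, wg_nonce(packet_counter), payload)


key = bytes(range(32))
packets = [f"packet {i} ".encode() * 20 for i in range(8)]
jobs = [(key, i, p) for i, p in enumerate(packets)]

# No packet needs the result of any other packet, so the order and the
# worker that handles each one do not matter.
serial = [encrypt_packet(j) for j in jobs]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(encrypt_packet, jobs))
print(parallel == serial)
```

Each packet depends only on the key and its own counter, which is exactly what lets the work be spread over several cores.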
The standard wireguard module works like this:
- packets are received and some networking operations are done
- in the wireguard module, certain preliminary operations are performed
- the packet is distributed to different cores for the cryptographic operations
- encryption/decryption happens in parallel
- the packets are collected and their original sequence is restored
- some further networking is needed to emit them
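The steps above can be sketched roughly like this. It is a toy model and not the kernel's actual data structures: the `crypt` function is a placeholder XOR standing in for the real cipher, and the names are invented for illustration. The point is only the shape: fan out to workers, then emit strictly in the original order.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed


def crypt(seq, payload):
    # Placeholder for the per-packet crypto step (NOT real encryption).
    return seq, bytes(b ^ 0x5A for b in payload)


def pipeline(packets, workers=4):
    """Process packets in parallel, emit them in their original order."""
    emitted = []   # what would be handed back to the network stack
    pending = []   # min-heap of finished packets waiting for their turn
    next_seq = 0   # sequence number that must be emitted next
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Single-threaded part: tag each packet and hand it to a worker.
        futures = [pool.submit(crypt, seq, p) for seq, p in enumerate(packets)]
        # Workers may finish in any order ...
        for fut in as_completed(futures):
            heapq.heappush(pending, fut.result())
            # ... but a packet is only emitted once all earlier ones are out.
            while pending and pending[0][0] == next_seq:
                emitted.append(heapq.heappop(pending)[1])
                next_seq += 1
    return emitted


packets = [f"payload-{i}".encode() for i in range(20)]
out = pipeline(packets)
print(out == [crypt(i, p)[1] for i, p in enumerate(packets)])
```

Note that the tagging and the in-order emit loop run on a single thread here, which mirrors the serial part of the path.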
What leads people to believe that it's single-threaded is this: even Mikrotik's "most powerful" CPUs are in fact objectively weak (you can look it up on cpubenchmark), so the part where the packets are received, distributed, collected and sent, i.e. the single-threaded part, can easily become the throughput bottleneck. The other reason is that Mikrotiks contain NICs that distribute received packets based on a packet hash, which is calculated over the packet header of the encapsulating WireGuard packet, so every packet of a given tunnel gets the same hash and is steered to the same place on receive.
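A small sketch of the hash-based distribution, to show why a single tunnel looks "single-threaded" on the receive side. Assumptions: real NICs use their own hash function, so `zlib.crc32` is only a stand-in, and the queue count, addresses and ports are made-up example values.

```python
import zlib
from collections import Counter

NUM_QUEUES = 4  # hypothetical number of receive queues / cores


def rx_queue(outer, inner_flow=None, num_queues=NUM_QUEUES):
    """Pick a receive queue from the OUTER (encapsulating) header only.

    inner_flow is deliberately ignored: it is encrypted inside the
    tunnel packet, so the NIC cannot see it.
    """
    src_ip, src_port, dst_ip, dst_port = outer
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


# One tunnel carrying 1000 different inner flows: always the same queue.
tunnel = ("198.51.100.7", 51820, "203.0.113.1", 13231)
one_tunnel = {rx_queue(tunnel, inner_flow=i) for i in range(1000)}

# 400 tunnels from different peers: spread over the available queues.
many_tunnels = Counter(
    rx_queue((f"198.51.100.{i % 250 + 1}", 40000 + i, "203.0.113.1", 13231))
    for i in range(400)
)

print(len(one_tunnel), dict(many_tunnels))
```

So a benchmark over one tunnel hits one queue no matter how many flows run inside it, while many tunnels spread out naturally.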
Yes, Mikrotik's hardware devices have very nice integrated NICs, but they absolutely do not come with beefy CPUs. But that's basically the whole story.