I have seen posters report success using VXLAN. Apparently it runs on multiple cores
and supports fast path.
If the above works, here is an idea for later experimentation:
You could try to have the encrypted traffic in the internal network(s) use packets larger than 1500 bytes (say 2800+)
and let them be fragmented over the WAN, with a relatively small loss of efficiency.
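
To put a rough number on that loss, here is a back-of-the-envelope sketch in Python. It assumes plain IPv4 with a 20-byte header and no options, a 1500-byte WAN MTU, and it ignores tunnel and encryption overhead entirely. The function name ipv4_fragments is made up for illustration.

```python
import math


def ipv4_fragments(packet_len, mtu=1500, ip_header=20):
    """Estimate how an IPv4 packet is split when it exceeds the MTU.

    packet_len is the total IP packet length in bytes, header included.
    Returns (fragment_count, total_bytes_on_wire).
    Assumption: 20-byte IPv4 header, no options, no tunnel overhead.
    """
    if packet_len <= ip_header:
        raise ValueError("packet_len must be larger than the IP header")
    payload = packet_len - ip_header
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    count = math.ceil(payload / per_fragment)
    # Each fragment carries its own copy of the IP header.
    wire_bytes = payload + count * ip_header
    return count, wire_bytes


count, wire_bytes = ipv4_fragments(2800)
extra = wire_bytes - 2800
print(count, wire_bytes, extra)
```

For a 2800-byte packet this gives 2 fragments and 2820 bytes on the WAN, so one extra 20-byte IP header, which is under 1% in bytes. It says nothing about per-packet CPU cost or what happens when a single fragment is lost, so that part would have to be measured.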