Well, that suggestion was relevant in the context of one CPU thread being loaded at 100 % while the others idle, as you’ve stated here, not the whole machine running at 100 %, as you’ve stated in the 6.45 beta topic. If we are talking about 100 % on all CPUs, organizing a 2nd pass through the firewall makes no sense, of course. If it’s still true that you have some idling CPU threads and you want them to do the job of the other router, the instructions for setting up the IPIP “loopback” tunnel are here, in the chapter Implementation Details.
But the key question is whether the IPIP transport packets are handed over via some RAM queue, so that their “transmission” and “reception” can be handled by different threads, or whether the same thread always handles both ends of the tunnel. So it may happen that you build the tunnel and find that all the load is still concentrated in the same CPU thread.
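To judge the result after building the tunnel, you can compare the per-core load figures (for example as shown by `/system resource cpu print`) before and after. Below is a minimal sketch in Python of that comparison. The function name, the thresholds and the sample numbers are my own assumptions for illustration, not anything taken from RouterOS, so adjust them to what you consider "busy" and "idle".

```python
from typing import Sequence


def load_is_concentrated(per_core_load: Sequence[float],
                         busy_threshold: float = 90.0,
                         idle_threshold: float = 30.0) -> bool:
    """Return True if exactly one core is near saturation while all
    the other cores are mostly idle.

    per_core_load: load of each core in percent, copied by hand from
    the router. The thresholds are arbitrary illustrative values.
    """
    if len(per_core_load) < 2:
        # With a single core there is nothing to spread the load over.
        return False
    busy = [x for x in per_core_load if x >= busy_threshold]
    others = [x for x in per_core_load if x < busy_threshold]
    return len(busy) == 1 and all(x <= idle_threshold for x in others)


# Hypothetical readings, not measurements from a real router.
before_tunnel = [100.0, 3.0, 5.0, 2.0]
after_tunnel = [55.0, 48.0, 4.0, 3.0]

print(load_is_concentrated(before_tunnel))
print(load_is_concentrated(after_tunnel))
```

If the second reading still shows a single core pinned near 100 % with the rest idle, the same thread is most likely handling both ends of the tunnel and the extra pass has gained nothing.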