The whole point of a VTI is that you can use regular routing rather than matching traffic by selectors, which quickly turns into a nightmare once you have more than one subnet at each end of a link. The price is that VTI violates the security concept of IPsec: the two peers of the VTI tunnel must negotiate a traffic selector matching all traffic (0.0.0.0/0 <-> 0.0.0.0/0), so traffic matching an existing traffic selector must be accepted even if it arrives through some path other than the IPsec SA bound to that selector.
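To put a number on the "nightmare": without a VTI, every combination of a local and a remote subnet needs its own traffic selector, so the count grows as the product of the two lists. A minimal Python sketch of that arithmetic follows; the function name and the subnets are made up for illustration and are not taken from any real configuration.

```python
import ipaddress
from itertools import product

def selector_pairs(local_subnets, remote_subnets):
    """Return every (local, remote) pair that would need its own traffic selector."""
    return [
        (ipaddress.ip_network(local), ipaddress.ip_network(remote))
        for local, remote in product(local_subnets, remote_subnets)
    ]

# Hypothetical example subnets, for illustration only.
local = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"]
remote = ["10.2.0.0/24", "10.2.1.0/24", "10.2.2.0/24", "10.2.3.0/24"]

pairs = selector_pairs(local, remote)
print(len(pairs), "selectors for", len(local), "local and", len(remote), "remote subnets")
```

With routing over a VTI, the same setup needs only one route per remote subnet at each end, so the configuration grows with the sum of the two lists rather than their product.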
The difference between L2 and L3 interfaces is not in what you can route through them but in the fact that the L3 ones don't transport the L2 headers (MAC addresses, Ethertype). Hence no ARP (it has no purpose there), no STP of any flavor, and no possibility of becoming part of a bridge. Examples: IPIP (ipencap), GRE/IP (i.e. not GRETAP), various flavors of PPP (L2TP, PPTP, SSTP) without BCP...
You can use IPsec to encrypt an EoIP tunnel or an L2TP/BCP tunnel. Both are L2 tunnels, so their interfaces can be made member ports of bridges, but EoIP is Mikrotik's proprietary misuse of GRE, and L2TP/BCP, although a standard, only seems to be supported by Mikrotik, so either way you need a Mikrotik device at both ends. And L2 tunneling has a lot of drawbacks (all the broadcast traffic gets transported, and there is more overhead per byte of IP payload) and, as far as I can see, no advantages for your use case.
Instead of writing novels, post the output of /export hide-sensitive. Use find&replace in your favourite text editor to systematically replace all occurrences of each public IP address that could identify you with a distinctive pattern such as my.public.ip.1.
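If you would rather script the replacement than do it by hand, something like the following Python sketch could work. It is only an illustration: the function name anonymize_export is made up, it handles IPv4 only, and it relies on Python's own idea of a "global" address to decide what to replace, so check the result by eye before posting.

```python
import ipaddress
import re

# Candidate dotted quads; validity is checked by the ipaddress module below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def anonymize_export(text):
    """Replace each distinct public IPv4 address with my.public.ip.N.

    Returns the rewritten text and the address-to-placeholder mapping.
    """
    mapping = {}

    def substitute(match):
        raw = match.group(0)
        try:
            addr = ipaddress.ip_address(raw)
        except ValueError:
            return raw  # not a valid IPv4 address, leave untouched
        if not addr.is_global:
            return raw  # keep private, loopback, link-local etc. as they are
        if raw not in mapping:
            mapping[raw] = f"my.public.ip.{len(mapping) + 1}"
        return mapping[raw]

    return IPV4_RE.sub(substitute, text), mapping

# Hypothetical export fragment, for illustration only.
sample = (
    "add address=8.8.8.8/32 interface=ether1\n"
    "add dst-address=192.168.88.0/24 gateway=8.8.8.8\n"
    "add address=1.1.1.1\n"
)
cleaned, used = anonymize_export(sample)
print(cleaned)
```

The same public address always maps to the same placeholder, so whoever reads the export can still tell which addresses are identical, while the private subnets stay readable.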