Reading your scenario, I get why you’re investigating how to do it. I mostly try to discourage people because they look at the features/documentation, see that it’s possible, and then end up stitching LANs together for “convenience” rather than for a solid reason. If you’re essentially providing an Ethernet virtual circuit service to a customer, then that’s what you have to deliver, and one of these is likely going to be the answer.
IPIP is out because, by design, it can only carry IP packets as its payload, so it can’t transport Ethernet frames. All of the others are still on the table, and it sounds like you currently have EoIP (basically Ethernet over GRE) in place. Frankly, if I were just doing this for a few customers, I’d do it exactly the way you’re already doing it with EoIP.
Even though the terminology can be imprecise, it’s also important to establish that what you’re really doing is a virtual circuit service that happens to use Ethernet framing. Fully emulating Ethernet over a provider network means contending with the reality that “native” switched Ethernet is a multipoint protocol. Frames addressed to broadcast or multicast MAC addresses are replicated by the native transport hardware (usually a switch) and flooded out all of the relevant ports:
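To make the flooding behaviour concrete, here is a minimal Python sketch of a learning bridge. It is only an illustration: the class name, port names, and MAC addresses are made up, and it ignores VLANs, table aging, and spanning tree. It also floods unknown unicast destinations, which is how switches normally behave.

```python
def is_group_address(mac: str) -> bool:
    """True for broadcast and multicast MACs (lowest bit of the first octet set)."""
    return int(mac.split(":")[0], 16) & 1 == 1


class LearningBridge:
    """Toy learning bridge; port names are hypothetical labels."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # source MAC -> port it was last seen on

    def forward(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src] = in_port  # learn where the sender lives
        if not is_group_address(dst) and dst in self.mac_table:
            out_port = self.mac_table[dst]
            # Known unicast: one port, or nothing if it would go back out the ingress port.
            return [] if out_port == in_port else [out_port]
        # Broadcast, multicast, or unknown unicast: flood to every other port.
        return [p for p in self.ports if p != in_port]


bridge = LearningBridge(["site-a", "site-b", "site-c"])
flooded = bridge.forward("site-a", "02:00:00:00:00:0a", "ff:ff:ff:ff:ff:ff")
unicast = bridge.forward("site-b", "02:00:00:00:00:0b", "02:00:00:00:00:0a")
print(flooded)  # every port except the one the frame arrived on
print(unicast)  # only the port where the destination was learned
```

With two sites there is only ever one "other port", which is why a point-to-point tunnel gets away without any of this. From the third site onward, something in the path has to do the replication.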

If your client asks for three or more branches to share this same Ethernet segment, that’s when things become more complicated. This is where my point about GRE not being a multipoint protocol comes in. If you are trying to fully emulate Ethernet with GRE, then each branch office needs its own GRE tunnel to one central router that acts as a bridge. If the sites are really far apart, this can be a real pain, because traffic between any two branches has to hairpin through that central router:
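As a rough way to put numbers on the hub-and-spoke cost, here is a small Python sketch. The branch names and one-way delays are invented for illustration, and it assumes the bridging router is a separate device rather than one of the branches.

```python
def hub_tunnel_count(branches: int) -> int:
    """One GRE tunnel per branch, all terminating on the central bridging router."""
    return branches


def hairpin_delay_ms(delay_to_hub_ms: dict, src: str, dst: str) -> int:
    """One-way delay between two branches when every frame transits the hub."""
    return delay_to_hub_ms[src] + delay_to_hub_ms[dst]


# Hypothetical one-way delays (ms) from each branch to the hub.
delays = {"branch-a": 5, "branch-b": 40, "branch-c": 42}

print(hub_tunnel_count(len(delays)))                     # tunnels the hub terminates
print(hairpin_delay_ms(delays, "branch-b", "branch-c"))  # delay via the hub, in ms
```

In this made-up example, two branches that might sit close to each other still pay the sum of both legs to the hub for every frame, broadcast or not.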

If you use something like VXLAN, then either you have to support multicast routing in the provider network or your VTEP hardware has to support head-end replication. RouterOS does not appear to support head-end replication, so multicast would be a requirement. On top of that, a multicast RP is really still just the root of a tree, so you’d have to pick a router in the middle that is an acceptable location for the replication to happen. That may or may not be any better than the GRE hub and spoke. Alternatively, you could use routers that support an anycast RP, but that also doesn’t appear to be supported in RouterOS. That would look something like this:
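To put rough numbers on the replication trade-off between head-end replication and multicast, here is a simplified Python sketch. The function names and traffic figures are hypothetical, and the model is idealized: it only counts copies leaving the ingress VTEP and ignores how the multicast tree is actually built.

```python
def ingress_copies(remote_vteps: int, mode: str) -> int:
    """Copies of one broadcast/multicast frame that the ingress VTEP transmits."""
    if mode == "head-end":
        # The ingress VTEP unicasts a separate copy to every remote VTEP.
        return remote_vteps
    if mode == "multicast":
        # One copy goes to the group address; the network replicates it downstream.
        return 1 if remote_vteps > 0 else 0
    raise ValueError(f"unknown mode: {mode}")


def ingress_uplink_bps(flood_bps: int, remote_vteps: int, mode: str) -> int:
    """Uplink bandwidth the ingress VTEP spends on flooded traffic."""
    return flood_bps * ingress_copies(remote_vteps, mode)


# Hypothetical: 2 Mbit/s of flooded traffic at one site, 4 remote sites.
print(ingress_uplink_bps(2_000_000, 4, "head-end"))
print(ingress_uplink_bps(2_000_000, 4, "multicast"))
```

Head-end replication puts the cost on the sending site's uplink, while multicast moves it into the provider network, which is where the question of where the tree is rooted starts to matter.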

It gets complicated fast. I would hold the line with the client, use EoIP like you’re already doing, and only provide point-to-point virtual circuit service if at all possible.
The rest of the situation with trying to bond multiple connections is a completely different wrinkle in this. Can you produce a quick sketch of the routers and links in the entire path between these two branch offices? I’m not really sure how much this is going to factor into what tunneling mechanism you choose, though.