Yes, this is the case. I used the default IPsec settings and thought this would work. Is it better to always use exchange-mode=ike2?
Well, for site-to-site configurations most likely yes, as IKEv2 should be a better choice than IKE(v1). However, at least older client implementations on Windows do not attempt to establish IKEv2 sessions for L2TP transport, and if you want to use IKEv2, you have to configure the IPsec layer manually, which requires more understanding than letting RouterOS do the job for you.
So I mentioned IKEv2 only as one of the possible explanations for why you encountered a service breakdown without a configuration change, as that particular bug was relevant to IKEv2 alone.
Sorry, I looked again and found one policy for each connected device.
Look once more: at the central site, there are two dynamically created policies, one per client (user), exactly as it should be. As the two differ from each other in sa-dst-address, there is no conflict, so your issue doesn't come from such a conflict.
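To compare the two dynamic policies side by side at the central site, a command along these lines can be used. It is only a sketch; the exact set of properties shown depends on the RouterOS version.

```
# list only the dynamically created IPsec policies with all their properties
/ip ipsec policy print detail where dynamic
```

Each of the two entries should show a different sa-dst-address, which is what rules out a conflict between them.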
Site3 has a problem with the ISP device: it can't be set to bridge mode. I am still talking with the ISP, trying to get the public IP on the router. Anyway, so far the VPN is still working at both client sites.
The fact that an L2TP/IPsec client is behind a NAT only causes trouble if two clients of the same server are behind the same NAT IP. So it doesn't break anything in your case.
All in all, I cannot see anything in the configuration that would explain why the connections should not re-establish after recovery from a disruption on the internet path between the client and the server. It may take some time before the peers notice that they cannot see each other after the disruption: the dynamically created IPsec peer has dpd-interval=2m and dpd-maximum-failures=5, so it takes up to 12 minutes before the connection is considered dead at the IPsec level. I don't know how many unanswered L2TP keepalive messages are tolerated, but these are sent every minute at two different levels.
For the GRE you used before, by default the keepalive messages are sent every 10 seconds and 10 must fail for the tunnel to be considered down.
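The worst-case detection times follow from simple arithmetic. For GRE, 10 missed keepalives at 10 seconds each give about 100 seconds. For the dynamic IPsec peer, 5 failed DPD probes at 2 minutes each give 10 minutes, and presumably up to one more interval passes before the first probe is sent, which would account for the 12 minutes. The values actually in effect can be inspected roughly as follows. Here gre-tunnel1 is a hypothetical interface name, and on newer RouterOS versions the DPD settings may be found under /ip ipsec profile rather than on the peer itself.

```
# GRE: keepalive is shown as <interval>,<retries> (10s,10 by default)
/interface gre print detail where name="gre-tunnel1"
# IPsec: look for dpd-interval and dpd-maximum-failures on the dynamic peer
/ip ipsec peer print detail where dynamic
```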
Unlike Windows and Android, RouterOS doesn't need to be kicked to attempt re-connection if an IPsec or L2TP or GRE connection breaks down - during and after a network failure, it keeps on trying to re-establish the tunnel until you disable/unconfigure the client.
Are you able to replicate the issue somehow? I.e. is there a known sequence of steps you have to take to make it happen again? If so, do the following on both machines:
/system logging add topics=ipsec,!packet
/system logging add topics=l2tp
/global printjob [execute {log print follow-only file=log-from-site-X where topics~"l2tp|ipsec"}]
Then replicate the issue and let things run their course for 35 minutes. If the tunnel does not re-establish even after that time, do whatever you have to do to regain access to both devices (preferably nothing), and then issue /system script job remove $printjob. After that, do whatever is necessary to re-establish the tunnel, download the two log files from the devices, and see what the logs tell you about what happened. If they tell you nothing, obfuscate the public IPs and possibly the L2TP usernames, and publish the logs.
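Before replicating the issue, it may be worth confirming that the logging rules and the background print job are actually in place. A minimal check, assuming the log-from-site-X file name prefix used in the commands above:

```
# the two added rules should be listed with topics ipsec,!packet and l2tp
/system logging print
# the background job started by execute should appear here
/system script job print
# the output file should exist and grow as new messages are logged
/file print where name~"log-from-site"
```

If the job or the file is missing, the log would not be captured during the test, so it is better to find that out before breaking the tunnel.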