I basically have a public static IP on one WAN interface and a NAT-ed IP on another.
So the typical SOHO scenario: two ISPs and no AS number of your own, i.e. the public addresses from which you connect to servers on the internet differ between the uplinks, and thus a failover to the secondary uplink breaks all existing sessions.
The problem is when I can reach the gateway and it is pingable, but there is an ISP problem and traffic doesn't get out past the router; then this kind of failover doesn't work. That's why I'm trying to find something better for me.
That's surprising - this setup (recursive routing where the "canary" (path transparency check) IP addresses are routed via the actual gateways and everything else is routed via the "canary" IPs) deals exactly with the issue you describe, i.e. that the actual gateway stays up but the network behind it loses connection to the rest of the internet. If that happens, the check-gateway ping stops getting responses from the canary IP (virtual gateway) and the route thus becomes inactive.
So if this "doesn't work" for you, something else must be broken (I can e.g. imagine that ping keeps getting through an uplink but other traffic doesn't), or "doesn't work" must mean something other than what I understand it to mean.
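For reference, a minimal sketch of the recursive setup I mean, in RouterOS 6 style syntax. Everything specific in it is a placeholder of mine, not taken from your configuration: 192.0.2.1 and 198.51.100.1 stand in for the real gateways of WAN A and WAN B, 8.8.8.8 and 1.1.1.1 are just example canary addresses, and the comments and distance values are arbitrary. As far as I recall, target-scope=11 is what the RouterOS 7 documentation uses for recursive routes; on RouterOS 6 the default target-scope may already be sufficient.

```
/ip route
# Host routes to the canary addresses via the real gateways (placeholders).
# scope=10 lets the recursive default routes below resolve through them.
add dst-address=8.8.8.8/32 gateway=192.0.2.1 scope=10 comment="canary via WAN A"
add dst-address=1.1.1.1/32 gateway=198.51.100.1 scope=10 comment="canary via WAN B"
# Default routes via the canaries; check-gateway=ping pings the canary,
# so a route goes inactive when the path beyond the real gateway fails.
add dst-address=0.0.0.0/0 gateway=8.8.8.8 target-scope=11 check-gateway=ping distance=1 comment="default via WAN A"
add dst-address=0.0.0.0/0 gateway=1.1.1.1 target-scope=11 check-gateway=ping distance=2 comment="default via WAN B"
```

Keep in mind that with this layout, traffic to the canary addresses themselves always leaves through the WAN they are bound to, regardless of which default route is active.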
Maybe add some netwatch?
...
Ah, the /ip firewall connection remove [find] in your netwatch script maybe gives a hint on what "doesn't work" means? Whereas TCP sessions time out eventually once the remote server stops responding, UDP sessions (IPsec, L2TP, SIP, ...) that get refreshed from the LAN side more frequently than once in 3 minutes stay stuck with the same reply-dst-address, i.e. they stay pinned to the failed WAN. If this is indeed the issue you need to address, concentrate on that: use a scheduled script to remove these connections whenever the route through their respective WAN becomes inactive.
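A rough sketch of such a script, under assumptions of mine: the default route through WAN A carries the comment "default via WAN A", and 192.0.2.2 is the router's own address on WAN A (the address that connections src-nat-ed out through WAN A show as reply-dst-address). Both are placeholders, and the exact form of the active filter may differ between RouterOS versions, so test it from the command line first.

```
# Placeholder: comment identifying WAN A's default route.
:local wanRouteComment "default via WAN A"

# No active route with that comment means check-gateway has declared WAN A dead.
:if ([:len [/ip route find where comment=$wanRouteComment active=yes]] = 0) do={
    # Remove only the connections pinned to WAN A's address (placeholder
    # 192.0.2.2). The pattern matches the address with a port ("192.0.2.2:5060")
    # or without one (protocols that have no ports).
    /ip firewall connection remove [find where reply-dst-address~"^192\\.0\\.2\\.2(:|\$)"]
}
```

Unlike a bare remove [find], this leaves the connections running through the other WAN untouched.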
There is also a follow-up question - what to do when the WAN becomes available again. The answer to this one depends on the usage strategy of the WANs. If the strategy is load distribution, nothing needs to be done - connections that migrated to WAN B due to failure of WAN A may be left running via WAN B even after WAN A recovers. If the strategy is pure backup because WAN B is more expensive and/or offers less bandwidth than WAN A, the script has to remove connections from WAN B once WAN A recovers, but maybe after some guard time rather than immediately.
Compared to netwatch, a scheduled script gives you more flexibility in what it tracks. For example, it can inspect the current state of all the WANs at each run, compare it with the state detected during the previous run, and execute the actions best matching the particular state change detected (there may be more than two WANs and more than one usage strategy).
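To illustrate the idea, here is a simplified sketch for two WANs and the pure backup strategy. The route comment, the two addresses (192.0.2.2 as the router's address on WAN A, 198.51.100.2 as its address on WAN B) and the variable names are all my assumptions, and the guard time after recovery is left out for brevity. Treat it as a starting point to adapt, not as a tested script.

```
# The state seen by the previous run survives in a global variable.
:global wanAWasUp
:local wanAIsUp ([:len [/ip route find where comment="default via WAN A" active=yes]] > 0)

# First run after boot: nothing to compare with yet, just remember the state.
:if ([:typeof $wanAWasUp] = "nothing") do={ :set wanAWasUp $wanAIsUp }

:if ($wanAWasUp && !$wanAIsUp) do={
    # WAN A has just failed: drop the connections still pinned to its address.
    /ip firewall connection remove [find where reply-dst-address~"^192\\.0\\.2\\.2(:|\$)"]
}
:if (!$wanAWasUp && $wanAIsUp) do={
    # WAN A has just recovered (pure backup strategy only): move connections
    # off WAN B by dropping those pinned to WAN B's address.
    /ip firewall connection remove [find where reply-dst-address~"^198\\.51\\.100\\.2(:|\$)"]
}
:set wanAWasUp $wanAIsUp
```

Saved as a script named e.g. wan-watch (a hypothetical name), it could be run periodically with something like /system scheduler add name=wan-watch interval=10s on-event=wan-watch. With the load distribution strategy, the recovery branch would simply be dropped.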