Ping problem on ROS 3.7 on x86 PC

One PC, "PC-Gateway", 2 x WAN, 1 x LAN

The 1st WAN connects directly to an ADSL modem; the 2nd WAN connects through 3 MT routers and a wireless link to another ADSL modem.
For failover with load balancing I have scripts testing the route through each WAN connection to a dedicated Google server IP, a different IP for each route.

A script sets the local variables and starts 2 "test gw" scripts that each send a ping towards a Google IP. The mangle rule, the src-nat rule and a dedicated default route force that ping to take its designated route. Depending on the ping result, the local variable is set.
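To make the setup easier to picture, the rules for one test route look roughly like the sketch below. This is not a copy of my exact config: the gateway address 192.0.2.1, the interface name ether1 and the mark name "test-gw1" are placeholders, and only the Google IP is the real one.

```
# Placeholders: 192.0.2.1 (WAN1 gateway), ether1 (WAN1 interface), "test-gw1" (mark name)
# Mark the router's own test ping so that it uses a dedicated routing table
/ip firewall mangle add chain=output protocol=icmp dst-address=66.249.91.103 action=mark-routing new-routing-mark=test-gw1
# Dedicated default route that is only used by packets carrying that mark
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.1 routing-mark=test-gw1
# Dedicated NAT rule for the marked test traffic
/ip firewall nat add chain=srcnat routing-mark=test-gw1 out-interface=ether1 action=masquerade
```

The second test route uses the same three rules with its own Google IP, gateway, interface and mark.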
The next script checks the local variables and, depending on their values, disables or enables the src-nat rules and default gateways for the traffic flow.
The designated routes for the specific "test" ICMP traffic stay in place. Only the traffic flow of the failing route gets rerouted to the other, still alive default route. Because the ping routes stay the same, the "check gw ping" keeps working.
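A minimal sketch of that decision step for one WAN is shown below. It assumes that gw1 = 1 means the test ping failed and gw1 = 0 means it succeeded (which is how the check script further down sets it), and it assumes the traffic rules are tagged with the comments "wan1-default" and "wan1-nat". Those comment names are placeholders for illustration only.

```
# Assumption: gw1 = 1 -> test ping failed, gw1 = 0 -> test ping OK
# "wan1-default" and "wan1-nat" are placeholder comments on the traffic rules
:global gw1 $gw1
:if ($gw1 = 1) do={
    # Route is considered down: take its default route and NAT rule out of service
    /ip route disable [/ip route find comment="wan1-default"]
    /ip firewall nat disable [/ip firewall nat find comment="wan1-nat"]
} else={
    # Route is considered up: put them back
    /ip route enable [/ip route find comment="wan1-default"]
    /ip firewall nat enable [/ip firewall nat find comment="wan1-nat"]
}
```

The rules for the test ping itself are never touched by this step, so the test can keep running while the traffic flow is moved.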

All worked fine (with some adjustments in between during ROS upgrades) until the recent upgrade to ROS 3.7.

What happens is that one of the ping commands is not recognized by its dedicated NAT rule anymore???
One of the pings (and only that one!) runs fine for a while after a reboot, but then stops being shown in the src-nat rule.
It is still being marked by the mangle rules, but the src-nat rule hardly "sees" it anymore; it only counts one of the four pings, or at times nothing at all!

My failover system has now become a failing system taking over my life!

Test: after a reboot, everything is fine. I disabled that specific route and everything behaved as intended: the ping gets no reply, the local variables are adjusted, the script puts the other src-nat rule and default route in place to replace the default ones for that route, and traffic starts to flow over the other route.

I enabled that specific route again: nothing happens, although the route is "alive" again (I can ping its designated Google server IP from all kinds of devices through this route, but still NOT from the main gateway router!).
The local variables are not adjusted back, and that default route stays disabled for my traffic flow.

I manually ran a script that enables that default route again (with its src-nat rule) and suddenly the ping gets replies!
Good news? No! The ping stops getting replies after a while. The src-nat rule stops showing (counting) that command after some 20-40 secs! Neither script-issued pings nor manual pings from the router are counted. The mangle rule "sees" the packets, but the src-nat rule sees nothing, or only one out of 4 (and once in a while 2 out of 4).

But that Google server is still pingable from any other source through this router, or from any closer source (the ADSL modem) on my network. I can even run a traceroute from my main router to that server, but NO ping!
Also, the traffic now flows over this route, while it should not. According to the ping results the scripts should disable this route, but that doesn't happen?
Why? The script that sets the local variable is not setting it accordingly. Why? Shoot me, I don't know! It worked in ROS 3.6, and its sister script for the other route in this same router (exactly the same apart from the IPs) works perfectly!

When I now just reboot the router, everything works fine again. All servers are pingable, and the scripts work fine.
But once every 2 or 3 days, the ping to that specific server stops completely, with the effects described above. ("Torch" also does not see the ping anymore after the failure.)

It looks like the packets get lost somewhere after the mangle stage?
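One way I could try to narrow this down is with temporary logging rules for the test ping only, as sketched below. This is only an idea under assumptions, not something from my running config: the log prefixes are placeholder names, and the rules would have to be removed again after testing.

```
# Temporary diagnostic rules; "gw1-out" and "gw1-post" are placeholder prefixes
# Log the test ping as it leaves the router's own output chain
/ip firewall mangle add chain=output protocol=icmp dst-address=66.249.91.103 action=log log-prefix="gw1-out"
# Log it again in postrouting, i.e. after the routing decision and just before src-nat
/ip firewall mangle add chain=postrouting protocol=icmp dst-address=66.249.91.103 action=log log-prefix="gw1-post"
```

If "gw1-out" entries keep appearing in the log while "gw1-post" entries stop, the packets would be disappearing between the two chains, which should at least show whether the routing step or the NAT step is where they vanish.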

Here is the "check-gateway-one" script, which is the subject of the problem:
:global gw1 $gw1; :if ([/ping 66.249.91.103 count=4]>2) do={:set gw1 0} else={:set gw1 1}
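
For debugging, a variant of the same script that also writes the reply count to the log might look like the sketch below. The variable name "replies" and the log text are my own additions for illustration, and I am assuming the string concatenation behaves the same way in ROS 3.7; the test and the values written to gw1 are unchanged.

```
# Same test as above, but the number of replies is also logged
:global gw1 $gw1
:local replies [/ping 66.249.91.103 count=4]
:log info ("check-gateway-one: " . $replies . " of 4 replies")
:if ($replies > 2) do={:set gw1 0} else={:set gw1 1}
```

That way the log shows whether the script still runs and what the ping command actually returns at the moment the src-nat counter stops moving.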

Any suggestions on what to look for? I have racked my brain over it but don't see any logical solution.

I will get an RB1000 in shortly and will replace this x86 system with it, with exactly the same config, to see if the problem is still there. Maybe it is an x86 bug again after all?


rgds.