Netwatch/Ping Problem with Recursive Route

I have a puzzling problem. I’m using a recursive route for WAN failover, and on a second RouterBOARD (rb2) I am using Netwatch to ping the host that validates this recursive route, so that a specific port is disabled when the primary WAN fails and re-enabled when the primary WAN recovers.

Topology:

<WAN1>--<ether1|rb1|ether2>--<ether1|rb2|wlan1>--<WAN2>

On rb1, recursive route using Google DNS for validation:

/ip route
add check-gateway=ping comment="primary route" distance=1 gateway=8.8.8.8
add comment="secondary route" distance=2 gateway=172.16.44.2
add comment="validate primary route" distance=1 dst-address=8.8.8.8/32 gateway=47.223.56.1 scope=10

IP 172.16.44.2 is rb2. On rb2:


/ip route
add distance=3 gateway=172.16.44.1
add distance=1 dst-address=8.8.8.8/32 gateway=172.16.44.1

Also on rb2 is a dynamic default route with distance=2 that is automatically added when WAN2 is manually activated during failure.

And the netwatch is pretty standard:

/tool netwatch
add down-script="/interface disable ether2;" host=8.8.8.8 interval=2s up-script="/interface enable ether2;"

So here is the problem:

  • In the regular state with WAN1 working, rb2 pings to 8.8.8.8 are successful and netwatch works as expected (up)
  • In the failure state with WAN1 failing, rb2 pings to 8.8.8.8 are unsuccessful and netwatch works as expected (down)
  • In the recovery state with WAN1 working again, rb2 pings to 8.8.8.8 are still unsuccessful and netwatch is still down; traceroute is successful but ping is not!
  • Only a reboot of rb2 will fix the ping/netwatch problem

I just cannot understand why traceroute is successful on recovery, but ping is not!

[admin@rb2] > ping 8.8.8.8
  SEQ HOST                                     SIZE TTL TIME  STATUS             
    0 8.8.8.8                                                 timeout            
    1 8.8.8.8                                                 timeout            
    2 8.8.8.8                                                 timeout            
    sent=3 received=0 packet-loss=100% 

[admin@rb2] > /tool traceroute 8.8.8.8
 # ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST
 1 172.16.44.1                        0%    2   0.3ms     0.4     0.3     0.4
 2                                  100%    2 timeout
 3 173.219.227.28                     0%    2   7.5ms     8.8     7.5      10
 4 173.219.152.234                    0%    2  15.2ms    13.3    11.3    15.2
 5 173.219.196.47                     0%    2  18.8ms    18.8    18.8    18.8
 6 209.85.245.179                     0%    1  21.3ms    21.3    21.3    21.3
 7 72.14.234.61                       0%    1  11.7ms    11.7    11.7    11.7
 8 8.8.8.8                            0%    1  11.8ms    11.8    11.8    11.8

Any ideas of more things to check?

I observed the same issue on my dual-WAN router, but I don’t know what the cause is.

@boberov

I am not really sure, but it seems like OP is missing proper scope/target-scope in routes on rb1.

For what it's worth, you can try the following (adapted to your actual configuration) and see what happens:

/ip route
add check-gateway=ping comment="primary route" distance=1 gateway=8.8.8.8 scope=10 target-scope=12
add comment="validate primary route" distance=1 dst-address=8.8.8.8/32 gateway=47.223.56.1 scope=10 target-scope=11
add comment="secondary route" distance=2 gateway=172.16.44.2
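
To confirm that the primary route actually resolves recursively after the change, something like the following could be run on rb1. This is only a sketch: the comment strings are the ones from the routes above, and the field that shows the resolved next hop differs between RouterOS versions (gateway-status in v6, immediate-gw in v7, as far as I know).

```
# rb1: show how the primary default route and its validation route resolve
/ip route print detail where comment="primary route"
/ip route print detail where comment="validate primary route"
```

If resolution works, the primary route should be active and show 8.8.8.8 as reachable via 47.223.56.1 rather than as unreachable.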

That said, I don't really understand the setup of rb2 in the OP, and I doubt that yours is exactly the same.

When the main route (via 47.223.56.1) on rb1 fails, the one going through 172.16.44.2 comes into effect, OK.
Then on rb2 there is a route with distance 1 for 8.8.8.8/32 via 172.16.44.1 and a general one (dst-address omitted, so 0.0.0.0/0) with distance 3, also via 172.16.44.1.
What is the route that is dynamically added with distance 2?
And what is it needed for?
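
The output of something like this on rb2, taken once in the regular state and once while WAN2 is active, would answer that. It is a sketch that assumes the dynamic route is a default route (dst-address 0.0.0.0/0), as the OP describes.

```
# rb2: list all default routes, including dynamic ones, with distance and gateway
/ip route print detail where dst-address=0.0.0.0/0
# rb2: list the host route used for the Netwatch target
/ip route print detail where dst-address=8.8.8.8/32
```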

In any case, you should post your configuration (anonymized) for review; follow the steps here:
http://forum.mikrotik.com/t/forum-rules/173010/1
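
One more thing that may be worth checking while collecting that, and this is only a guess on my part: with interval=2s the Netwatch probes may keep a connection-tracking entry for the ICMP flow alive across the failover, so the old forwarding/NAT decision sticks after WAN1 recovers, while traceroute starts a fresh flow. A sketch for inspecting and clearing such entries, assuming connection tracking is enabled on the router in question (try rb1 first, then rb2):

```
# show tracked ICMP flows towards the Netwatch target
/ip firewall connection print detail where protocol=icmp dst-address~"8.8.8.8"
# remove them so the next probe is evaluated from scratch
/ip firewall connection remove [find protocol=icmp dst-address~"8.8.8.8"]
```

If ping starts working right after the remove, without a reboot, that points to a stale entry rather than to the routes themselves.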

Correct, there are interrelated moving parts, and it's unfair to ask for definitive, specific answers to vague questions without the required context and information.