Dual WAN failover - check internet

Tested again with suppress-hw-offload=no set on the testwan vrf route. It made no difference, except that I now seem to be able to traceroute via the vrf; ping still fails.

In order below:

  • can’t ping 1.1.1.1 via testwan vrf
  • can traceroute 1.1.1.1 via testwan vrf
  • can ping 1.1.1.1 via main routing table
  • can now ping 1.1.1.1 via testwan vrf (straight after the main-table ping)


[xxxx@RB5009] > /ip/route/export where dst-address=0.0.0.0/0
/ip route
add comment=primary_route disabled=no distance=1 dst-address=0.0.0.0/0 gateway=124.1.1.2 routing-table=main scope=30 suppress-hw-offload=no target-scope=\
    10
add comment=secondary_route disabled=no distance=2 dst-address=0.0.0.0/0 gateway=10.31.0.2 routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.31.0.2 routing-table=testwan scope=30 suppress-hw-offload=no target-scope=10
[xxxx@RB5009] >
[xxxx@RB5009] > ping 1.1.1.1 vrf=testwan count=3
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 1.1.1.1                                                      timeout
    1 1.1.1.1                                                      timeout
    2 1.1.1.1                                                      timeout
    sent=3 received=0 packet-loss=100%

[xxxx@RB5009] > /tool/traceroute 1.1.1.1 vrf=testwan
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
10.31.0.2                          0%    4   0.6ms     0.6     0.5     0.6     0.1
                                 100%    4 timeout
10.4.78.211                        0%    3  66.6ms    59.5    29.8      82    21.9
                                 100%    3 timeout
10.5.86.97                         0%    3  29.3ms      33    29.3    36.3     2.9
10.5.86.98                         0%    3  29.9ms    28.5    25.7    29.9       2
10.5.86.105                        0%    3  29.6ms    31.7    29.6    35.7     2.8
203.50.61.96                     33..    3  32.4ms    34.2    32.4    35.9     1.8
203.50.12.133                      0%    3  27.8ms    39.5    27.8    56.3    12.2
138.217.254.98                     0%    3  30.9ms    30.9    29.9    31.8     0.8
108.162.250.7                      0%    3  29.9ms      30    29.9      30       0
1.1.1.1                            0%    3    30ms    29.5    28.6      30     0.6

[xxxx@RB5009] > ping 1.1.1.1 count=3
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 1.1.1.1                                    56  55 6ms714us
    1 1.1.1.1                                    56  55 6ms785us
    2 1.1.1.1                                    56  55 6ms720us
    sent=3 received=3 packet-loss=0% min-rtt=6ms714us avg-rtt=6ms739us max-rtt=6ms785us

[xxxx@RB5009] > ping 1.1.1.1 vrf=testwan count=3
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 1.1.1.1                                    56  53 65ms59us
    1 1.1.1.1                                    56  53 47ms673us
    2 1.1.1.1                                    56  53 39ms381us
    sent=3 received=3 packet-loss=0% min-rtt=39ms381us avg-rtt=50ms704us max-rtt=65ms59us

traceroute somehow tries every which way it can to reach the destination.

ping does not, BUT if you first ping the destination successfully (via the main interface), then ping via the vrf also works. I posted about this with some references in your other thread:
http://forum.mikrotik.com/t/dual-wan-failover-script-feedback-pls/183423/1

In this case traceroute is going out via the vrf first hop, 10.31.0.2, which is not the main default gateway. If it were somehow getting a packet out some other way, that is certainly not reflected in the first hop of my traceroute output. The ICMP packet should be going via 10.31.0.2, as per the traceroute output.

I have seen your response, and the vrf with no interfaces is exactly what I have implemented here. Performing a ping before the vrf ping is not possible if I am using the Netwatch ICMP probe type. I have also tried the narrow /32 route plus a second blackhole route, but it means that during a failover, which could last hours, the target IP is not available to the network because it is being forced out the down interface. The VRF option would be much nicer… if only it worked.
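
For reference, the /32-plus-blackhole arrangement mentioned above looks roughly like the sketch below. It assumes the probe target is 1.1.1.1 and that it is pinned to the secondary gateway 10.31.0.2 from the export; the distances and comments are illustrative only, not the exact routes that were tested.

```
/ip route
# pin the probe target to one WAN gateway (assumed here: the secondary, 10.31.0.2)
add dst-address=1.1.1.1/32 gateway=10.31.0.2 distance=1 comment=probe_via_secondary
# if that gateway becomes unreachable, drop the probe instead of letting it follow the default route
add dst-address=1.1.1.1/32 blackhole distance=2 comment=probe_blackhole
```

The drawback described above follows directly from this: while the pinned WAN is down, every client on the network loses 1.1.1.1 as well, because the blackhole route wins over the default route.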

Why not?
The reported (and admittedly “ugly”) workaround suggested here:
http://forum.mikrotik.com/t/how-to-use-ping-with-multiple-routing-marks-in-ros-version-7/175887/3
is about running a ping through the scheduler at an interval shorter than whatever it is that resets the ping capability, which is completely independent from Netwatch.
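
A minimal sketch of that idea, assuming the target is 1.1.1.1 and that a 5 second interval is short enough (the actual reset time is not known, so the interval and the entry name are guesses to be tuned):

```
/system scheduler
# periodically ping the target via the main table so that the later vrf ping keeps working
add name=prime-ping interval=5s on-event="/ping 1.1.1.1 count=1"
```

Netwatch can then keep its ICMP probe unchanged, since the priming ping runs on its own schedule.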

The way ping currently works (rectius fails to work) is IMHO a bug, and having it fixed and/or the return of the routing-table parameter would be very welcome, but for now this is all we have.