ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Dual WAN failover - check internet

Fri Apr 25, 2025 10:35 am

I am setting up dual WAN failover using netwatch and scripts to manipulate two 0.0.0.0/0 route distances and need to check if my main ISP is back up before bringing up the main ISP default route.
Screenshot 2025-04-25 at 17.26.11.png
During testing (with Main ISP still up) I can't ping via ether4 to 8.8.8.8.
[xxxx@RB5009] > ping 8.8.8.8 interface=ether4
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                          
    0 8.8.8.8                                                      timeout                                                                         
    1 8.8.8.8                                                      timeout                                                                         
    2 8.8.8.8                                                      timeout                                                                         
    3 10.31.0.1                                  84  64 120ms974us host unreachable                                                                
    sent=4 received=0 packet-loss=100% 
The RUT950 has internet access via an LTE connection and from that device I can ping 8.8.8.8.

Pinging via ether2 works as expected:
[xxxx@RB5009] > ping 8.8.8.8 interface=ether2
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                          
    0 8.8.8.8                                    56 118 6ms676us  
    1 8.8.8.8                                    56 118 6ms529us  
    2 8.8.8.8                                    56 118 6ms519us  
    sent=3 received=3 packet-loss=0% min-rtt=6ms519us avg-rtt=6ms574us max-rtt=6ms676us 

The default routes:
/ip/route/export where dst-address=0.0.0.0/0
/ip route
add comment=primary_route disabled=no distance=1 dst-address=0.0.0.0/0 gateway=124.1.2.2 routing-table=main scope=30 suppress-hw-offload=no \
    target-scope=10
add comment=secondary_route disabled=no distance=2 dst-address=0.0.0.0/0 gateway=10.31.0.2 routing-table=main scope=30 suppress-hw-offload=no \
    target-scope=10
Both ether2 and ether4 interfaces have NAT configured and I can ping 10.31.0.2 from the RB5009.

Why can't I ping 8.8.8.8 via ether4?
 
A9691
Frequent Visitor
Frequent Visitor
Posts: 62
Joined: Sat May 14, 2016 10:58 am

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 1:11 pm

[xxxx@RB5009] > ping 8.8.8.8 interface=ether2
  SEQ HOST                                     SIZE TTL TIME       STATUS                                                                          
    0 8.8.8.8                                    56 118 6ms676us  
    1 8.8.8.8                                    56 118 6ms529us  
    2 8.8.8.8                                    56 118 6ms519us  
    sent=3 received=3 packet-loss=0% min-rtt=6ms519us avg-rtt=6ms574us max-rtt=6ms676us 
I used /ping <ip address> interface=<interface name> in my testing environment and it worked, but not on the real gateway: it worked for one interface and not for the other. Both interfaces were Ethernet with fixed IPs.
I'm using this right now:
/ip vrf add name=Test1 interfaces=none
/ip route add gateway=<next hop on ether1> routing-table=Test1
With this setup /ping 8.8.8.8 vrf=Test1 works. If the ping fails for an interface, I just increase the distance of the corresponding route. A mistake here is not a game changer in my case.

I couldn't find more info about how a VRF with interfaces=none works, and I can't say whether this is the right way.
(RoS v 7.16.2)
 
jaclaz
Forum Guru
Forum Guru
Posts: 2900
Joined: Tue Oct 03, 2023 4:21 pm

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 2:52 pm

It seems normal to me.

Run
/ip route print
what do you get as output?

Likely the route via 124.1.2.2 will be AS (Active Static) whilst the one via 10.31.0.2 will be only S (Static) (due to the bigger distance).
A route that is not active is like it doesn't exist.
You will have also a DAc (Dynamic Active connect) route for 10.31.0.0 on ether4 and one (still DAc) on ether2 for 124.1.2.0, these are automatically created from the IP addresses you assigned to the interfaces.

So what happens should be:
1) you ask for 8.8.8.8 on ether4
2) there is a route on ether4, it is for 10.31.0.0, and clearly 8.8.8.8 is not part of that range
3) there is no active route for 8.8.8.8 (as contained in 0.0.0.0/0) using ether4

When you use the other interface:
1) you ask for 8.8.8.8 on ether2
2) there is a route on ether2, it is for 124.1.2.0, and clearly 8.8.8.8 is not part of that range
3) there is an Active route for 0.0.0.0 (that contains 8.8.8.8, via ether2)
4) this route is taken and via the gateway 124.1.2.2 the 8.8.8.8 is reached
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 3:06 pm

/ip/route/print
Flags: D - DYNAMIC; A - ACTIVE; c - CONNECT, s - STATIC
Columns: DST-ADDRESS, GATEWAY, DISTANCE
 #     DST-ADDRESS        GATEWAY         DISTANCE
;;; primary_route
 0  As 0.0.0.0/0          124.1.2.2          1
;;; secondary_route
 1   s 0.0.0.0/0          10.31.0.2              2
   DAc 10.31.0.0/29       ether4                 0
My failover script changes the distance of the route via ether2 to '3', which activates the default route via ether4. I was hoping I could use `ping 8.8.8.8 interface=ether2` to periodically check whether the primary WAN was back up, but my testing (I am currently testing the other way around, that is, trying to ping out via ether4) shows that this will not work.
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 3:27 pm


I used /ping <ip address> interface=<interface name> in my testing environment and it worked, but not on the real gateway: it worked for one interface and not for the other. Both interfaces were Ethernet with fixed IPs.
I'm using this right now:
/ip vrf add name=Test1 interfaces=none
/ip route add gateway=<next hop on ether1> routing-table=Test1
With this setup /ping 8.8.8.8 vrf=Test1 works. If the ping fails for an interface, I just increase the distance of the corresponding route. A mistake here is not a game changer in my case.

I couldn't find more info about how a VRF with interfaces=none works, and I can't say whether this is the right way.
(RoS v 7.16.2)

This didn't work for me:

/ip/vrf/add name=testwan interfaces=none
/ip route add gateway=10.31.0.2 routing-table=testwan

ping 8.8.8.8 vrf=testwan
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                                      timeout
    1 8.8.8.8                                                      timeout
    2 8.8.8.8                                                      timeout
    3 8.8.8.8                                                      timeout
    sent=4 received=0 packet-loss=100%

ping 8.8.8.8 vrf=main
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                    56 118 6ms580us
    1 8.8.8.8                                    56 118 6ms538us
    2 8.8.8.8                                    56 118 6ms554us
    sent=3 received=3 packet-loss=0% min-rtt=6ms538us avg-rtt=6ms557us max-rtt=6ms580us

Strange though... traceroute via the testwan VRF works and ping does not:
/tool/traceroute 8.8.8.8 vrf=testwan
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
10.31.0.2                          0%   18   0.5ms     0.5     0.4     0.5       0
                                 100%   18 timeout
10.4.78.163                        0%   17  51.2ms    51.5    34.6    65.8     8.1
                                 100%   17 timeout
10.5.86.97                         0%   17  51.1ms    46.6    28.9    57.5     8.8
10.5.86.98                         0%   17  29.9ms    30.4    28.6    39.9     2.4
10.5.86.105                        0%   17  29.6ms    29.7    25.9      31       1
203.50.61.96                       0%   17  30.2ms    31.7    29.9    39.9     2.2
203.50.11.177                      0%   17  30.9ms    29.9      28      34     1.3
58.163.91.194                      0%   17  29.9ms    31.2    28.7    38.7     2.9
192.178.97.87                      0%   17  30.9ms    31.2    30.4    32.9     0.7
142.250.234.211                    0%   17  29.9ms    31.5      29    39.9     3.3
8.8.8.8                            0%   17    30ms    29.7      28    38.9     2.4

And via primary gateway:

/tool/traceroute 8.8.8.8
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
124.1.2.2                          0%    4   5.3ms     5.3     5.2     5.4     0.1
                                 100%    4 timeout
59.154.142.250                     0%    3   5.9ms     5.9     5.9     5.9       0
74.125.147.174                     0%    3   6.3ms     6.3     6.3     6.4       0
192.178.97.225                     0%    3   6.5ms     6.4     6.4     6.5       0
142.251.64.179                     0%    3   6.5ms     6.5     6.5     6.5       0
8.8.8.8                            0%    3   6.4ms     6.4     6.4     6.5       0
Last edited by ilium007 on Fri Apr 25, 2025 3:32 pm, edited 1 time in total.
 
jaclaz
Forum Guru
Forum Guru
Posts: 2900
Joined: Tue Oct 03, 2023 4:21 pm

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 3:32 pm

/ip/route/print
Flags: D - DYNAMIC; A - ACTIVE; c - CONNECT, s - STATIC
Columns: DST-ADDRESS, GATEWAY, DISTANCE
 #     DST-ADDRESS        GATEWAY         DISTANCE
;;; primary_route
 0  As 0.0.0.0/0          124.1.2.2          1
;;; secondary_route
 1   s 0.0.0.0/0          10.31.0.2              2
   DAc 10.31.0.0/29       ether4                 0
My failover script changes the distance of the route via ether2 to '3', which activates the default route via ether4. I was hoping I could use `ping 8.8.8.8 interface=ether2` to periodically check whether the primary WAN was back up, but my testing (I am currently testing the other way around, that is, trying to ping out via ether4) shows that this will not work.
But nothing prevents you from adding a "narrow" /32 route via ether4 for the chosen address (since 8.8.8.8 is more widely used, better IMHO 8.8.4.4 for this use or another DNS server).

Check this other (simpler) approach:
viewtopic.php?t=198999
viewtopic.php?t=198999#p1102129
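
A minimal sketch of the narrow /32 route mentioned above, assuming the secondary gateway 10.31.0.2 from the first post and 8.8.4.4 as the probe address. The comment string is a made-up label. Because a /32 is more specific than 0.0.0.0/0, it is used for this one address even while the default route via ether4 is inactive.

```routeros
/ip route
# Hypothetical probe route: send traffic for 8.8.4.4 only via the ether4 gateway,
# independently of which 0.0.0.0/0 route is currently active.
add dst-address=8.8.4.4/32 gateway=10.31.0.2 distance=1 comment=probe_via_ether4
```

With this in place, a plain `ping 8.8.4.4` from the router should leave via ether4. The side effect is that every client behind the router also reaches 8.8.4.4 only through that link.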
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 3:47 pm

I don't understand why trace route works and ping doesn't:
[xxxx@RB5009] > /ip/vrf/print
Flags: X - disabled; * - builtin
 0    name="testwan" interfaces=none

 1  * name="main" interfaces=all

[xxxx@RB5009] > /ip/route/print detail where dst-address=0.0.0.0/0
Flags: D - dynamic; X - disabled, I - inactive, A - active;
c - connect, s - static, r - rip, b - bgp, o - ospf, i - is-is, d - dhcp, v - vpn, m - modem, y - bgp-mpls-vpn; H - hw-offloaded; + - ecmp
 0  As   ;;; primary_route
         dst-address=0.0.0.0/0 routing-table=main gateway=124.1.2.2 immediate-gw=124.1.2.2%ether2 distance=1 scope=30 target-scope=10
         suppress-hw-offload=no

 1   s   ;;; secondary_route
         dst-address=0.0.0.0/0 routing-table=main gateway=10.31.0.2 immediate-gw=10.31.0.2%ether4 distance=2 scope=30 target-scope=10 suppress-hw-offload=no

17  As   dst-address=0.0.0.0/0 routing-table=testwan gateway=10.31.0.2 immediate-gw=10.31.0.2%ether4 distance=1 scope=30 target-scope=10
[xxxx@RB5009] >

[xxxx@RB5009] > ping 8.8.8.8 vrf=testwan
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                                      timeout
    1 8.8.8.8                                                      timeout
    2 8.8.8.8                                                      timeout
    3 8.8.8.8                                                      timeout
    4 8.8.8.8                                                      timeout
    5 8.8.8.8                                                      timeout
    6 8.8.8.8                                                      timeout
    7 8.8.8.8                                                      timeout
    8 8.8.8.8                                                      timeout
    sent=9 received=0 packet-loss=100%

[xxxx@RB5009] > /tool/traceroute 8.8.8.8 vrf=testwan
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
10.31.0.2                          0%    4   0.4ms     0.5     0.4     0.7     0.1
                                 100%    4 timeout
10.4.78.163                        0%    3  51.4ms    43.3    29.2    51.4      10
                                 100%    3 timeout
10.5.86.97                         0%    3  49.6ms      47    31.2    60.2      12
10.5.86.98                         0%    3  29.8ms    27.1    21.9    29.8     3.7
10.5.86.105                        0%    3    30ms    30.3    29.9    30.9     0.4
203.50.61.96                       0%    3  30.8ms    30.9    30.8      31     0.1
203.50.11.177                      0%    3  28.9ms    28.9    27.9    29.8     0.8
58.163.91.194                      0%    3  30.9ms    30.6    29.9    30.9     0.5
192.178.97.87                      0%    3    31ms    30.9    30.7      31     0.1
142.250.234.211                    0%    3  29.8ms      30    29.8    30.2     0.2
8.8.8.8
The ping via testwan vrf should go via 10.31.0.2%ether4. Is it a return path issue?
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 3:58 pm

This seems buggy... I went into WinBox and tried to use the GUI ping tool to ping 8.8.8.8 via the testwan VRF, and it failed. I ticked ARP ping, it failed; I deselected it, and it then pinged via the testwan VRF.

It now works in the CLI as well:

[xxxx@RB5009] > ping 8.8.8.8 vrf=testwan
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                    56 113 121ms12us
    1 8.8.8.8                                    56 113 29ms654us
    2 8.8.8.8                                    56 113 40ms294us
    3 8.8.8.8                                    56 113 34ms595us
    sent=4 received=4 packet-loss=0% min-rtt=29ms654us avg-rtt=56ms388us max-rtt=121ms12us

[xxxx@RB5009] > ping 8.8.8.8
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                    56 118 6ms688us
    1 8.8.8.8                                    56 118 6ms646us
    2 8.8.8.8                                    56 118 6ms645us
    3 8.8.8.8                                    56 118 6ms653us
    sent=4 received=4 packet-loss=0% min-rtt=6ms645us avg-rtt=6ms658us max-rtt=6ms688us
 
anav
Forum Guru
Forum Guru
Posts: 23623
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 6:44 pm

Netwatch will leak out of any WAN to find a connection, so you need to blackhole the Netwatch probe address: follow its route with a second route in the same table, distance plus one, set as a blackhole.
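
A sketch of that blackhole companion route, assuming RouterOS v7 syntax (where `blackhole` is a flag on the route) and an existing probe route for 8.8.4.4/32 via 10.31.0.2 at distance 1. The address and comment strings are illustrative.

```routeros
/ip route
# Assumed to exist already (shown as a comment for context only):
# add dst-address=8.8.4.4/32 gateway=10.31.0.2 distance=1 comment=probe_via_ether4
#
# Companion route: same destination, same table, distance one higher.
# If the probe route goes inactive, packets for 8.8.4.4 are dropped here
# instead of following the active default route out of the other WAN.
add dst-address=8.8.4.4/32 blackhole distance=2 comment=probe_blackhole
```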
 
jaclaz
Forum Guru
Forum Guru
Posts: 2900
Joined: Tue Oct 03, 2023 4:21 pm

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 6:55 pm

Netwatch will leak out of any WAN to find a connection, so you need to blackhole the Netwatch probe address: follow its route with a second route in the same table, distance plus one, set as a blackhole.
Which is essentially point #2 in the given:
viewtopic.php?t=198999#p1102129
 
anav
Forum Guru
Forum Guru
Posts: 23623
Joined: Sun Feb 18, 2018 11:28 pm
Location: Nova Scotia, Canada

Re: Dual WAN failover - check internet

Fri Apr 25, 2025 7:05 pm

Sweet!!
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Sat Apr 26, 2025 2:57 am

I was trying to do this without using the 8.8.8.8/32 narrow route hence trying to ping via the down (primary) interface. At the moment the second VRF / route table method is working.

Edit: I spoke too soon. This morning the VRF method no longer allows a ping to 8.8.8.8 but again traceroute works:

[xxxx@RB5009] > ping 8.8.8.8 vrf=testwan
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                                      timeout
    1 8.8.8.8                                                      timeout
    2 8.8.8.8                                                      timeout
    3 8.8.8.8                                                      timeout
    4 8.8.8.8                                                      timeout
    sent=5 received=0 packet-loss=100%

[xxxx@RB5009] > /tool/traceroute 8.8.8.8 vrf=testwan
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
10.31.0.2                          0%    3   1.3ms     0.8     0.5     1.3     0.3
                                 100%    3 timeout
10.4.78.163                        0%    2  61.7ms    49.3    36.9    61.7    12.4
                                 100%    2 timeout
10.5.86.97                         0%    2    58ms    43.6    29.2      58    14.4
10.5.86.98                         0%    2  29.7ms    29.8    29.7    29.9     0.1
10.5.86.105                        0%    2    30ms    29.9    29.8      30     0.1
203.50.61.96                       0%    2  30.9ms    30.9    30.9    30.9       0
203.50.11.177                      0%    2  29.9ms    29.8    29.7    29.9     0.1
74.125.49.138                      0%    2    30ms    30.1      30    30.1     0.1
192.178.97.219                     0%    2  29.9ms    29.9    29.9    29.9       0
142.251.64.179                     0%    2  30.1ms      30    29.9    30.1     0.1
8.8.8.8                            0%    2  30.6ms    30.3    29.9    30.6     0.4
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Sat Apr 26, 2025 4:37 am

Decided to use the suggested /32 route but using 4.2.2.2 so no DNS is interrupted during failover. Using ICMP probe type in netwatch. This seems to be working.
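
Roughly, such a setup could look like the sketch below. It assumes a /32 route for 4.2.2.2 via the primary gateway (124.1.2.2) is already in place, and that failover is done by changing the distance of the route commented primary_route, as described earlier in the thread. The interval, the distance values and the Netwatch comment are assumptions, not the poster's actual configuration.

```routeros
/tool netwatch
# Hypothetical probe: ICMP to 4.2.2.2, which the /32 route pins to the primary WAN.
# down: push the primary default route behind the secondary (distance 2).
# up:   restore the primary default route to distance 1.
add comment=primary_wan_probe type=icmp host=4.2.2.2 interval=10s \
    down-script="/ip route set [find comment=primary_route] distance=3" \
    up-script="/ip route set [find comment=primary_route] distance=1"
```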
 
A9691
Frequent Visitor
Frequent Visitor
Posts: 62
Joined: Sat May 14, 2016 10:58 am

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 10:28 am

I don't understand why trace route works and ping doesn't:
[xxxx@RB5009] > /ip/vrf/print
Flags: X - disabled; * - builtin
 0    name="testwan" interfaces=none

 1  * name="main" interfaces=all

[xxxx@RB5009] > /ip/route/print detail where dst-address=0.0.0.0/0
Flags: D - dynamic; X - disabled, I - inactive, A - active;
c - connect, s - static, r - rip, b - bgp, o - ospf, i - is-is, d - dhcp, v - vpn, m - modem, y - bgp-mpls-vpn; H - hw-offloaded; + - ecmp
 0  As   ;;; primary_route
         dst-address=0.0.0.0/0 routing-table=main gateway=124.1.2.2 immediate-gw=124.1.2.2%ether2 distance=1 scope=30 target-scope=10
         suppress-hw-offload=no

 1   s   ;;; secondary_route
         dst-address=0.0.0.0/0 routing-table=main gateway=10.31.0.2 immediate-gw=10.31.0.2%ether4 distance=2 scope=30 target-scope=10 suppress-hw-offload=no

17  As   dst-address=0.0.0.0/0 routing-table=testwan gateway=10.31.0.2 immediate-gw=10.31.0.2%ether4 distance=1 scope=30 target-scope=10
[xxxx@RB5009] >

[xxxx@RB5009] > ping 8.8.8.8 vrf=testwan
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 8.8.8.8                                                      timeout
    1 8.8.8.8                                                      timeout
    2 8.8.8.8                                                      timeout
    3 8.8.8.8                                                      timeout
    4 8.8.8.8                                                      timeout
    5 8.8.8.8                                                      timeout
    6 8.8.8.8                                                      timeout
    7 8.8.8.8                                                      timeout
    8 8.8.8.8                                                      timeout
    sent=9 received=0 packet-loss=100%

[xxxx@RB5009] > /tool/traceroute 8.8.8.8 vrf=testwan
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
10.31.0.2                          0%    4   0.4ms     0.5     0.4     0.7     0.1
                                 100%    4 timeout
10.4.78.163                        0%    3  51.4ms    43.3    29.2    51.4      10
                                 100%    3 timeout
10.5.86.97                         0%    3  49.6ms      47    31.2    60.2      12
10.5.86.98                         0%    3  29.8ms    27.1    21.9    29.8     3.7
10.5.86.105                        0%    3    30ms    30.3    29.9    30.9     0.4
203.50.61.96                       0%    3  30.8ms    30.9    30.8      31     0.1
203.50.11.177                      0%    3  28.9ms    28.9    27.9    29.8     0.8
58.163.91.194                      0%    3  30.9ms    30.6    29.9    30.9     0.5
192.178.97.87                      0%    3    31ms    30.9    30.7      31     0.1
142.250.234.211                    0%    3  29.8ms      30    29.8    30.2     0.2
8.8.8.8
The ping via testwan vrf should go via 10.31.0.2%ether4. Is it a return path issue?
The only difference I see in the routing through 10.31.0.2 is that in vrf there is no suppress-hw-offload=no. I would like to know if it makes any difference?
 
robmaltsystems
Forum Veteran
Forum Veteran
Posts: 756
Joined: Fri Jun 21, 2019 12:04 pm

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 11:48 am

Aside, but what's the current best practise around WAN failover to LTE? When I last did this, we were still on RouterOS v6. Is the method/support different in RouterOS 7? From memory, it was mainly PING tests plus scripting.
 
jaclaz
Forum Guru
Forum Guru
Posts: 2900
Joined: Tue Oct 03, 2023 4:21 pm

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 1:01 pm

Aside, but what's the current best practise around WAN failover to LTE? When I last did this, we were still on RouterOS v6. Is the method/support different in RouterOS 7? From memory, it was mainly PING tests plus scripting.
I don't think there is a consensus on the matter. Basically it is recursive vs. netwatch; each one has slightly different features, but if properly implemented they both work just fine in most setups.

Then some people believe that the one (or the other) can be bettered or fine-tuned by adding sophisticated scripts, and again, as long as they work, they are just fine.

Only as a side-side note, there is also a newish approach (not yet tested or reported on, AFAIK) hinted at in MikroTik's help page for Netwatch ICMP testing. It does not rely on a reply from the canary address, but on a low TTL expiring at an intermediate hop:
https://help.mikrotik.com/docs/spaces/R ... 8/Netwatch
accept-icmp-time-exceeded=yes can be used together with a manually set low ttl value to monitor Internet connectivity, without relying on a specific endpoint.

For example, you can monitor a public IP address, but that address can filter your ICMP request, or just become unreachable itself, if the Netwatch probe is using this address to monitor Internet connectivity this would cause a false alarm.

To make sure you can reach the Internet, it's generally enough to make sure you can reach a device a few routing hops away. Low time to live value will expire in transit to the specified host you want to monitor - each router passing the ICMP packet will subtract "1" from TTL value, upon TTL reaching 0, ICMP "time exceeded" packet will be generated, and sent back to the Netwatch probe. If all other fail thresholds are not broken, this response will be considered a success.
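
A minimal sketch of such a probe, assuming the ICMP probe options are named `ttl` and `accept-icmp-time-exceeded` as on the help page quoted above. Check the names against your RouterOS version. The host, interval, TTL value and comment are placeholders, not tested settings.

```routeros
/tool netwatch
# Hypothetical low-TTL probe: packets towards 4.2.2.2 expire after 3 hops,
# and the resulting ICMP "time exceeded" reply is counted as a success.
add comment=ttl_probe type=icmp host=4.2.2.2 interval=10s ttl=3 \
    accept-icmp-time-exceeded=yes
```

The probe still follows the routing table, so the host has to be routed out of the WAN under test for the result to mean anything about that link.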
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 1:36 pm

Only as a side-side note, there is also a (not yet tested/reported about AFAIK) newish approach hinted about in Mikrotik's help page for Netwatch ICMP testing, making use not of the reply of the canary address, but leveraging on a low TTL exhausting on an intermediate hop :
https://help.mikrotik.com/docs/spaces/R ... 8/Netwatch
I did read this note the other day and didn't get time to test it further. I have looked at it tonight and it is actually a clever method of determining if a link is up or down.

I would like to use Netwatch, but there is simply not enough documentation on how it decides that a connection is back up. In my testing with both simple and ICMP probes, the link seemed to be deemed back up after a single successful ICMP response. I need better odds on a link-up transition.
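
One way to get better odds, sketched here as a script body that could be used as the Netwatch up-script, is to run a short burst of pings and only restore the primary route if most of them succeed. This assumes 4.2.2.2 is routed via the primary WAN and that the primary default route carries the comment primary_route. The 4-out-of-5 threshold is an arbitrary illustration.

```routeros
# Hypothetical confirmation step before switching back to the primary WAN.
# In a script, /ping returns the number of replies received.
:local replies [/ping 4.2.2.2 count=5]
:if ($replies >= 4) do={
    /ip route set [find comment=primary_route] distance=1
    :log info "primary WAN confirmed up, default route restored"
} else={
    :log warning "primary WAN still unstable, staying on secondary"
}
```

A limitation of this sketch: if the burst fails, Netwatch has already marked the host as up, so the script will not run again until the next down/up transition.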
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 1:54 pm

I noted that the Netwatch docs mention that the ICMP probe accepts a vrf option, so I thought: great, I can set up an interfaces=none VRF and then a static route to 1.1.1.1 via this VRF, so that during a failover (which could last for hours) network devices would still have access to 1.1.1.1 DNS. Unfortunately, Netwatch seems to suffer from the same issue I documented here: viewtopic.php?t=216455#p1140252. When pinging via a VRF, the device requires one successful ping via the main VRF before it will send ICMP packets via the specified VRF:

Screenshot 2025-04-28 at 20.45.12.png
Screenshot 2025-04-28 at 20.45.52.png
Screenshot 2025-04-28 at 20.47.31.png
 
robmaltsystems
Forum Veteran
Forum Veteran
Posts: 756
Joined: Fri Jun 21, 2019 12:04 pm

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 2:04 pm

I might make a new post on this as I don't want to take over the original post. Are there moderator controls here to split posts?
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 2:15 pm

I might make a new post on this as don't want to take over the original post. Are there moderator controls here to split posts?
Do you have a fix for this?
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 2:32 pm

The only difference I see in the routing through 10.31.0.2 is that in vrf there is no suppress-hw-offload=no. I would like to know if it makes any difference?

Tested again with suppress-hw-offload=no set on the testwan VRF route. It made no difference: I can still traceroute via the VRF, but still cannot ping.

In order below:
- can't ping 1.1.1.1 via testwan vrf
- can traceroute 1.1.1.1 via testwan vrf
- ping 1.1.1.1 via main routing table
- can ping 1.1.1.1 via testwan vrf

[xxxx@RB5009] > /ip/route/export where dst-address=0.0.0.0/0
/ip route
add comment=primary_route disabled=no distance=1 dst-address=0.0.0.0/0 gateway=124.1.2.2 routing-table=main scope=30 suppress-hw-offload=no target-scope=\
    10
add comment=secondary_route disabled=no distance=2 dst-address=0.0.0.0/0 gateway=10.31.0.2 routing-table=main scope=30 suppress-hw-offload=no target-scope=10
add disabled=no distance=1 dst-address=0.0.0.0/0 gateway=10.31.0.2 routing-table=testwan scope=30 suppress-hw-offload=no target-scope=10
[xxxx@RB5009] >
[xxxx@RB5009] > ping 1.1.1.1 vrf=testwan count=3
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 1.1.1.1                                                      timeout
    1 1.1.1.1                                                      timeout
    2 1.1.1.1                                                      timeout
    sent=3 received=0 packet-loss=100%

[xxxx@RB5009] > /tool/traceroute 1.1.1.1 vrf=testwan
ADDRESS                          LOSS SENT    LAST     AVG    BEST   WORST STD-DEV STATUS
10.31.0.2                          0%    4   0.6ms     0.6     0.5     0.6     0.1
                                 100%    4 timeout
10.4.78.211                        0%    3  66.6ms    59.5    29.8      82    21.9
                                 100%    3 timeout
10.5.86.97                         0%    3  29.3ms      33    29.3    36.3     2.9
10.5.86.98                         0%    3  29.9ms    28.5    25.7    29.9       2
10.5.86.105                        0%    3  29.6ms    31.7    29.6    35.7     2.8
203.50.61.96                     33..    3  32.4ms    34.2    32.4    35.9     1.8
203.50.12.133                      0%    3  27.8ms    39.5    27.8    56.3    12.2
138.217.254.98                     0%    3  30.9ms    30.9    29.9    31.8     0.8
108.162.250.7                      0%    3  29.9ms      30    29.9      30       0
1.1.1.1                            0%    3    30ms    29.5    28.6      30     0.6

[xxxx@RB5009] > ping 1.1.1.1 count=3
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 1.1.1.1                                    56  55 6ms714us
    1 1.1.1.1                                    56  55 6ms785us
    2 1.1.1.1                                    56  55 6ms720us
    sent=3 received=3 packet-loss=0% min-rtt=6ms714us avg-rtt=6ms739us max-rtt=6ms785us

[xxxx@RB5009] > ping 1.1.1.1 vrf=testwan count=3
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 1.1.1.1                                    56  53 65ms59us
    1 1.1.1.1                                    56  53 47ms673us
    2 1.1.1.1                                    56  53 39ms381us
    sent=3 received=3 packet-loss=0% min-rtt=39ms381us avg-rtt=50ms704us max-rtt=65ms59us
 
jaclaz
Forum Guru
Forum Guru
Posts: 2900
Joined: Tue Oct 03, 2023 4:21 pm

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 2:49 pm

traceroute *somehow* tries every which way it can to reach the destination.

ping does not, BUT if you ping the destination successfully (via the main interface), then ping also works, I posted about this with some references in your other thread:
viewtopic.php?t=216455#p1140249
 
ilium007
Member Candidate
Member Candidate
Topic Author
Posts: 281
Joined: Sun Jan 31, 2010 9:58 am
Location: Brisbane, Australia

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 2:52 pm

traceroute *somehow* tries every which way it can to reach the destination.

ping does not, BUT if you ping the destination successfully (via the main interface), then ping also works, I posted about this with some references in your other thread:
viewtopic.php?t=216455#p1140249

In this case traceroute *is* going out via the vrf first hop, 10.31.0.2, which is *not* the main default gateway. If it truly was *somehow* getting a packet out somewhere it certainly does not reflect in the first hop in my traceroute output. The ICMP packet should be going via 10.31.0.2 as per the traceroute output.

I have seen your response and the vrf with no interfaces is *exactly* what I have implemented here. Performing a ping *before* the vrf ping is not possible if I am using the Netwatch ICMP probe type. I have also tried the narrow /32 route and second blackhole route but it means in a failover, that could last hours, the target IP is not available to the network as it's being forced out the down interface. The VRF option would be much nicer...... if only it worked.
 
jaclaz
Forum Guru
Forum Guru
Posts: 2900
Joined: Tue Oct 03, 2023 4:21 pm

Re: Dual WAN failover - check internet

Mon Apr 28, 2025 4:29 pm

I have seen your response and the vrf with no interfaces is *exactly* what I have implemented here. Performing a ping *before* the vrf ping is not possible if I am using the Netwatch ICMP probe type.
Why not?
The reported (and admittedly "ugly") workaround suggested here:
viewtopic.php?p=1074419#p1082265
is about running a ping from the scheduler at an interval shorter than *whatever* it is that resets the ping capability; that is completely independent of Netwatch.
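
As a rough sketch, that workaround could be a scheduler entry like the one below. The entry name and the 30s interval are assumptions: the thread does not establish how long the "warm" state lasts, so the interval would have to be found by trial.

```routeros
/system scheduler
# Hypothetical keep-warm ping via the main routing table, so that later
# pings to the same address via the VRF are not silently dropped.
add name=vrf-ping-warmup interval=30s on-event="/ping 1.1.1.1 count=1"
```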

The way ping currently works (or rather, fails to work) is IMHO a bug, and having it fixed and/or the return of the routing-table parameter would be very welcome, but for now this is all we have.