Random success with next hop gateway

I have two internet gateways. I track the connection state of each one by pinging an outside address, using recursive routing and check-gateway.

/ip route
add check-gateway=ping comment="" disabled=no distance=1 dst-address=0.0.0.0/0 gateway=4.2.2.1 routing-mark=\
    to_ether1 scope=30 target-scope=10
add check-gateway=ping comment="" disabled=no distance=1 dst-address=0.0.0.0/0 gateway=4.2.2.2 routing-mark=\
    to_ether2 scope=30 target-scope=10
add disabled=no distance=1 dst-address=4.2.2.1/32 gateway=192.168.1.1 scope=10 target-scope=10
add disabled=no distance=20 dst-address=4.2.2.1/32 type=blackhole
add disabled=no distance=1 dst-address=4.2.2.2/32 gateway=192.168.2.1 scope=10 target-scope=10
add disabled=no distance=20 dst-address=4.2.2.2/32 type=blackhole

The problem I am having is that, seemingly at random after a reboot, routes 1 & 2 will not select 3 & 4 as their next hop, so the default routes never come up. If I reboot the router a few times it eventually works. Am I doing something wrong? Has anyone else experienced this?

Thanks,
Nick

What version?

Show your “/ip route print detail” and “/ip route nexthop print detail” output from when the problem appears.

I've had issues with 4.16, 4.17, and 5.0rc11 on an x86 system. It seems that it’s not the reboot itself that fixes it, but rather restoring from a backed-up config. I can reboot and have it fail, then restore the backup that I saved just before rebooting, and it works again.


Here is the output you requested, with the first 3 octets of my public IPs changed to 192.168.1 and 192.168.2. You should get the idea.

[admin@firewall] > /ip route print detail
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit
0 A S dst-address=0.0.0.0/0 gateway=4.2.2.1 gateway-status=4.2.2.1 recursive via 192.168.1.193 ether1-T1 check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=to_ether1
1 A S dst-address=0.0.0.0/0 gateway=4.2.2.2 gateway-status=4.2.2.2 recursive via 192.168.2.1 ether2-DSL check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=to_ether2
2 A S dst-address=4.2.2.1/32 gateway=192.168.1.193 gateway-status=192.168.1.193 reachable ether1-T1 distance=1 scope=10 target-scope=10
3 SB dst-address=4.2.2.1/32 type=blackhole distance=20
4 A S dst-address=4.2.2.2/32 gateway=192.168.2.1 gateway-status=192.168.2.1 reachable ether2-DSL distance=1 scope=10 target-scope=10
5 SB dst-address=4.2.2.2/32 type=blackhole distance=20
6 A S dst-address=10.0.0.0/8 gateway=172.16.10.2 gateway-status=172.16.10.2 reachable ether6-LAN distance=1 scope=30 target-scope=10
7 ADC dst-address=192.168.2.0/24 pref-src=192.168.2.2 gateway=ether2-DSL gateway-status=ether2-DSL reachable distance=0 scope=10
8 ADC dst-address=192.168.1.192/27 pref-src=192.168.1.194 gateway=ether1-T1 gateway-status=ether1-T1 reachable distance=0 scope=10
9 A S dst-address=172.16.0.0/12 gateway=172.16.10.2 gateway-status=172.16.10.2 reachable ether6-LAN distance=1 scope=30 target-scope=10
10 ADC dst-address=172.16.10.0/24 pref-src=172.16.10.1 gateway=ether6-LAN gateway-status=ether6-LAN reachable distance=0 scope=10

hmmm… seems good…

  1. gateway=4.2.2.1 gateway-status=4.2.2.1 recursive via 192.168.1.193
  2. gateway=4.2.2.2 gateway-status=4.2.2.2 recursive via 192.168.2.1

and what should be there during normal operation?..

What I pasted was from a good and working period.

Here is from a failed session.

[admin@firewall] > /ip route print detail
Flags: X - disabled, A - active, D - dynamic, C - connect, S - static, r - rip, b - bgp, o - ospf, m - mme, B - blackhole, U - unreachable, P - prohibit 
 0   S  dst-address=0.0.0.0/0 gateway=4.2.2.1 gateway-status=4.2.2.1 inactive check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=to_ether1 
 1   S  dst-address=0.0.0.0/0 gateway=4.2.2.2 gateway-status=4.2.2.2 inactive check-gateway=ping distance=1 scope=30 target-scope=10 routing-mark=to_ether2 
 2 A S  dst-address=4.2.2.1/32 gateway=192.168.1.193 gateway-status=192.168.1.193 reachable ether1-T1 distance=1 scope=10 target-scope=10 
 3   SB dst-address=4.2.2.1/32 type=blackhole distance=20 
 4 A S  dst-address=4.2.2.2/32 gateway=192.168.2.1 gateway-status=192.168.2.1 reachable ether2-DSL distance=1 scope=10 target-scope=10 
 5   SB dst-address=4.2.2.2/32 type=blackhole distance=20 
 6 A S  dst-address=10.0.0.0/8 gateway=172.16.10.2 gateway-status=172.16.10.2 reachable ether6-LAN distance=1 scope=30 target-scope=10 
 7 ADC  dst-address=192.168.2.0/24 pref-src=192.168.2.2 gateway=ether2-DSL gateway-status=ether2-DSL reachable distance=0 scope=10 
 8 ADC  dst-address=192.168.1.192/27 pref-src=192.168.1.194 gateway=ether1-T1 gateway-status=ether1-T1 reachable distance=0 scope=10 
 9 A S  dst-address=172.16.0.0/12 gateway=172.16.10.2 gateway-status=172.16.10.2 reachable ether6-LAN distance=1 scope=30 target-scope=10 
10 ADC  dst-address=172.16.10.0/24 pref-src=172.16.10.1 gateway=ether6-LAN gateway-status=ether6-LAN reachable distance=0 scope=10 
11 A S  dst-address=192.168.0.0/16 pref-src=172.16.10.1 gateway=172.16.10.2 gateway-status=172.16.10.2 reachable ether6-LAN distance=1 scope=30 target-scope=10

failed nexthop output

[admin@firewall] /ip route> nexthop print detail 
 0 address=192.168.2.1 gw-state=reachable scope=10 check-gateway=none 
 1 address=192.168.1.193 gw-state=reachable scope=10 check-gateway=none 
 2 address=172.16.10.2 gw-state=reachable scope=10 check-gateway=none

All I had to do to make it fail was reboot. To make it work again all I have to do is restore the backup config I captured just before rebooting.
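
For anyone trying to reproduce this, the save/restore cycle described above might look roughly like the following. This is only a sketch: the names "pre-reboot" and "routes-before" are hypothetical, and the text export is an extra step (not something mentioned above) that could help compare the routing config before and after a reboot.

```
# Before rebooting: save a binary backup (the name "pre-reboot" is hypothetical)
/system backup save name=pre-reboot

# Optional: also export the routes as text so they can be compared later
/ip route export file=routes-before

# After a reboot that leaves the default routes inactive:
# load the backup again (the router reboots as part of the restore)
/system backup load name=pre-reboot
```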

It’s really not a big deal since the router hardly ever gets rebooted, but I wasted quite a few hours on this anomaly. The only reason it was rebooted was that I wanted to update to 4.17 after hours. It ended up being a fairly late night before I gave up, went back to 4.16, and restored from a known-good config. Since then I’ve been able to test this issue with 4.16, 4.17, and 5.0rc11 to try to narrow it down.

Did you try disabling and re-enabling routes 1 & 2? Or disabling the ‘check-gateway’ option?
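
A minimal sketch of the disable/enable test, assuming the two recursive default routes are the only ones carrying the routing marks to_ether1 and to_ether2 (as in the config at the top of the thread). Selecting by routing mark avoids depending on item numbers, which can shift between prints. Clearing check-gateway is left out here, since the way to unset a property may differ between RouterOS versions.

```
# Turn the two recursive default routes off...
/ip route disable [find routing-mark=to_ether1]
/ip route disable [find routing-mark=to_ether2]

# ...and back on, so the router has to resolve their next hops again
/ip route enable [find routing-mark=to_ether1]
/ip route enable [find routing-mark=to_ether2]

# Check whether gateway-status now shows "recursive via ..." instead of "inactive"
/ip route print detail where routing-mark=to_ether1
/ip route print detail where routing-mark=to_ether2
/ip route nexthop print detail
```

If the routes become active after this, the failure is more likely in next-hop resolution at boot than in the route definitions themselves.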