In our setup we always configure a loopback IP address on each router, use that IP as the OSPF router-id, and distribute it via OSPF (redistribute connected as type 1). We cooked up the script below to detect this situation and restart OSPF (we currently have neither the time nor the skill to dig into the RouterOS code and fix the bug itself).
On the routers we've tested so far it has worked. There is one use-case we haven't hit yet: a multi-path configuration where the peer's loopback is still reachable via another path. In that case we'll probably also have to check, in the decision path, that the gateway ($nexthop inside the do={} block of the ip route check command) matches the address of the neighbor, i.e. a neighbor only counts as up if its router-id is reachable via a nexthop matching that neighbor.
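As a rough, untested sketch of that extra check, the loop could look something like the following. It assumes the neighbor entry exposes an address property and that $nexthop compares cleanly against it (a type conversion may be needed); neighboraddr is just a name picked for illustration. It only prints what it finds and does not restart anything.
Code:
:foreach n in=[/routing ospf neighbor find where state="Full"] do={
    :local loopback [/routing ospf neighbor get $n router-id]
    # Assumption: "address" holds the neighbor's interface address.
    :local neighboraddr [/routing ospf neighbor get $n address]
    /ip route check $loopback once do={
        # Only treat the neighbor as up if its router-id is reachable
        # AND the route actually goes via that neighbor.
        :if ($status != "failed" && $nexthop = $neighboraddr) do={
            :put "$loopback is reachable via $neighboraddr"
        } else={
            :put "$loopback is NOT reachable via $neighboraddr"
        }
    }
}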
Store the following in a file called check_ospf.rsc and scp it to the routerboard.
Code:
# Script to check if OSPF is functioning. It relies on the fact that each
# adjacent neighbor will be reachable via its router-id, and that if we're
# unable to exchange routes for whatever reason, the router-id will be
# unreachable. By default a router that comes up will not permit itself to be
# rebooted until OSPF has come up at least once. In the cases we've seen where it
# fails, the failing side reports neighbor state Full, but the working side says
# Exchange. So this check should prevent rebooting core routers, but alas, I'm
# not 100% confident of that.
#
# The logic here reboots if we're in a bad state with at least one peer.
# Possibly this should be negated to only reboot if no peers are in a good
# state and at least one is in a bad state (more complex though).
# Reboot possibility will get enabled once at least one peer has managed to
# come up successfully.
#
# In case of multiple OSPF instances, if any one of them is functioning we move
# towards the mayreboot state, but we will only restart non-functioning
# instances.
:if ([/file find name=ospfstatus.txt] = "") do={
    :put "ospf status file doesn't exist - creating."
    /file print file=ospfstatus
    /file set [/file find name="ospfstatus.txt"] contents="no"
    :put "Done, please re-run the script."
    # Continuing with the rest of the script is pointless as our view of /file is
    # a snapshot in spite of /file set ... which above sets it to some arbitrary
    # content (looks like a file list).
} else={
    :local mayreboot [/file get ospfstatus.txt contents]
    # After initial set the file content is completely bogus ...
    :if ($mayreboot != "yes" && $mayreboot != "no") do={
        :set mayreboot "no"
    }
    :put "Checking OSPF (mayreboot=$mayreboot) ..."
    :foreach n in=[/routing ospf neighbor find where state="Full"] do={
        :local loopback [/routing ospf neighbor get $n router-id]
        # Assumes the neighbor entry exposes an "address" property; it is only
        # used for the informational message below.
        :local remoteaddress [/routing ospf neighbor get $n address]
        :put "Remote OSPF $loopback @ $remoteaddress"
        /ip route check $loopback once do={
            :if ($status != "failed") do={
                :put "OSPF is functioning correctly."
                :if ($mayreboot != "yes") do={
                    :log info "OSPF restored - restoring"
                    /file set ospfstatus.txt contents="yes"
                }
            } else={
                :put "OSPF is not functioning correctly."
                :if ($mayreboot = "yes") do={
                    :local instancename [/routing ospf neighbor get $n instance]
                    /file set ospfstatus.txt contents="no"
                    :put "Restarting OSPF"
                    :log error "OSPF malfunctioned - restarting."
                    /routing ospf instance set [/routing ospf instance find name=$instancename] disabled=yes
                    /routing ospf instance set [/routing ospf instance find name=$instancename] disabled=no
                }
            }
        }
    }
}
Then schedule the script to run every 30 seconds:
Code:
/system scheduler add interval=30s name=check_ospf on-event="/import check_ospf.rsc"
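To sanity-check the install, something like the following can be run by hand from the terminal. The first import only creates ospfstatus.txt and asks for a re-run, so the import is done twice here; the last two commands merely confirm that the scheduler entry exists and show the stored state. Treat it as a sketch rather than a tested procedure.
Code:
# First run creates ospfstatus.txt, second run performs the actual check.
/import check_ospf.rsc
/import check_ospf.rsc
# Confirm the scheduler entry was added.
/system scheduler print where name=check_ospf
# Show the stored state ("yes" once at least one peer has come up).
:put [/file get ospfstatus.txt contents]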