Here is an attempt at a simple failover rescue script.
Partition 0 is treated as “primary” and partition 1 as “safe-config”. HW is a CCR-2116.
I want to switch to part1 if the router has booted but the network appears to be down, i.e. if an upgrade borks things so that it boots ok but an interface is lost or corrupted, which happened to me in a 7.2 upgrade.
Any thoughts on this concept of operation? Any best practices for deciding the current partition is healthy? Trying to keep it simple.
{
    # Nothing to do if part1 is already the active partition
    :if ([/partitions get part1 active]) do={
        :log info "Safeboot Watchdog: Already on part1, exiting"
        /quit
    }
    # Give interfaces and routing time to come up after boot
    :log info "Safeboot Watchdog: Waiting 180s"
    :delay 180s
    # Maybe we probe a web page instead of ping
    # :local fetchresult [/tool/fetch url="https://www.google.com" mode=https check-certificate=yes as-value output=user]
    # :if($fetchresult->"status" = "finished") do={
    # Up to 6 rounds of 10 pings each, 30s apart
    :for i from=0 to=5 do={
        :log info "Safeboot Watchdog: Pinging 8.8.8.8"
        :if ([/tool/ping address=8.8.8.8 count=10] = 0) do={
            :log error "Safeboot Watchdog: Could not ping 8.8.8.8"
            :delay 30s
        } else={
            :log info "Safeboot Watchdog: Ping ok"
            /quit
        }
    }
    # Maybe we probe another host, if the first one is down
    :for i from=0 to=5 do={
        :log info "Safeboot Watchdog: Pinging 8.8.4.4"
        :if ([/tool/ping address=8.8.4.4 count=10] = 0) do={
            :log error "Safeboot Watchdog: Could not ping 8.8.4.4"
            :delay 30s
        } else={
            :log info "Safeboot Watchdog: Ping ok"
            /quit
        }
    }
    # Neither host answered: fall back to the safe-config partition
    :log error "Safeboot Watchdog: Network seems to be down!"
    :log info "Safeboot Watchdog: Activating part1"
    /partitions {
        activate part1
    }
    :log info "Safeboot Watchdog: Rebooting"
    /system reboot
}
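The commented-out web page probe in the script could be fleshed out roughly as below. This is only a sketch, not tested on the CCR: probeOk is a made-up variable name, and it assumes that /tool/fetch raises a script error when the request fails, which is why the call is wrapped in an on-error guard. As far as I know, check-certificate=yes also needs a matching CA certificate imported on the router, otherwise the probe may fail even when the network is fine.

```
# Sketch only: HTTPS probe as an alternative to the ping loops.
# probeOk is a hypothetical flag; it stays false unless the fetch completes.
:local probeOk false
:do {
    :local fetchresult [/tool/fetch url="https://www.google.com" mode=https check-certificate=yes as-value output=user]
    :if (($fetchresult->"status") = "finished") do={
        :set probeOk true
    }
} on-error={
    # Assumption: a failed fetch lands here instead of aborting the script
    :log error "Safeboot Watchdog: HTTPS probe failed"
}
:if ($probeOk) do={
    :log info "Safeboot Watchdog: HTTPS probe ok"
}
```

Keeping the result in a flag means the decision to activate part1 can stay in one place at the end of the script, whichever probe is used.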
I love this idea and will watch this thread closely. I manage several MikroTik devices remotely, most of them vulnerable as a “single point of failure” due to tight budgets. 7.2 bricked a number of the devices I upgraded (at least 1 in every 10), although all of them would still boot fine, only to a corrupted config… I never thought of using partitions as a way to recover from that, but as long as these “partition watchdog scripts” themselves don’t get wiped by the corruption, this could be a real lifesaver (albeit tedious to set up, and maybe prone to false positives). Of course the best solution would be a more reliable software/firmware update procedure from MikroTik themselves, but some of us don’t have the time or luxury of waiting indefinitely for things that may never happen. Every update seems to introduce a myriad of new issues, and the upgrade process itself clearly isn’t immune to them.
I’m new to partitions on MikroTik though. Any pointers to a crash course for dummies? I can’t find any video content on YouTube, at least.
Huh. Seems like the ping command is different in 7.2 vs 6.48.
In 6.48, I get a 0 or a 1 as a result from ping, so the if statement works.
[admin@RouterOS] > :global val [/ping address=8.8.8.8 count=1]
SEQ HOST SIZE TTL TIME STATUS
0 8.8.8.8 56 59 8ms
sent=1 received=1 packet-loss=0% min-rtt=8ms avg-rtt=8ms max-rtt=8ms
[admin@RouterOS] > :put $val
1
In 7.2, I don’t seem to get any result from ping, so the comparison with 0 is never true and my if statement always takes the else path, i.e. the script logs “Ping ok” regardless.
[admin@MT-RB4011] > :global val [/ping address=8.8.8.8 count=1]
Columns: SEQ, HOST, SIZE, TTL, TIME
SEQ HOST SIZE TTL TIME
0 8.8.8.8 56 60 9ms450us
[admin@MT-RB4011] > :put $val
[admin@MT-RB4011] >
It seems like ping behaves differently in v7.2 when its result is stored in a variable? The stats line “sent=1 received=1 packet-loss=0% min-rtt=8ms avg-rtt=8ms max-rtt=8ms” is also missing.
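One way to narrow this down might be to look at the type of the stored value on each version instead of just printing it. A small sketch, reusing the val global from the sessions above. I have not confirmed what 7.2 reports here.

```
:global val [/ping address=8.8.8.8 count=1]
# :typeof prints the data type of the stored result;
# a numeric reply count, as seen on 6.48, should show up as num
:put [:typeof $val]
```

If the type differs between the two versions, that would explain why the comparison with 0 in the watchdog script never matches on 7.2.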