rounin
just joined
Topic Author
Posts: 21
Joined: Thu Mar 24, 2022 6:03 am

Failover to part1 script

Sat Apr 09, 2022 11:54 am

Here is an attempt at a simple failover rescue script.

Partition 0 is considered "primary" and partition 1 "safe-config". The hardware is a CCR-2116.

I want to switch to part1 if the router has booted but the network appears to be down, i.e. if an upgrade goes wrong so that it boots OK but an interface is lost or corrupted, which happened to me in a 7.2 upgrade.

Any thoughts on this concept of operation? Any best practices for deciding whether the current partition is healthy? I'm trying to keep it simple.
{
	:if( [/partitions get part1 active] ) do={
		:log info "Safeboot Watchdog: Already on part1, exiting"
		/quit
	}	

	:log info "Safeboot Watchdog: Waiting 180s"
	:delay 180s

	# Maybe we probe a web page instead of ping
	# :local fetchresult [/tool/fetch url="https://www.google.com" mode=https check-certificate=yes as-value output=user]
	# :if($fetchresult->"status" = "finished") do={

	:for i from=0 to=5 do={
		:log info "Safeboot Watchdog: Pinging 8.8.8.8"

		:if( [/tool/ping address=8.8.8.8 count=10] = 0) do={
			:log error "Safeboot Watchdog: Could not ping 8.8.8.8"
			:delay 30s
		} else={
			:log info "Safeboot Watchdog: Ping ok"
			/quit
		}
	}

	# Maybe we probe another host, if the first one is down
	:for i from=0 to=5 do={
		:log info "Safeboot Watchdog: Pinging 8.8.4.4"

		:if( [/tool/ping address=8.8.4.4 count=10] = 0) do={
			:log error "Safeboot Watchdog: Could not ping 8.8.4.4"
			:delay 30s
		} else={
			:log info "Safeboot Watchdog: Ping ok"
			/quit
		}
	}

	:log error "Safeboot Watchdog: Network seems to be down!"

	:log info "Safeboot Watchdog: Activating part1"
	/partitions {
		activate part1
	}

	:log info "Safeboot Watchdog: Rebooting"
	/system reboot
}
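
The thread does not say how the script is launched. One way to run it on every boot, sketched under the assumption that it is saved in /system script under the hypothetical name safeboot-watchdog, is a scheduler entry with start-time=startup:

```
# Assumption: the watchdog above is stored as a script named "safeboot-watchdog"
# (hypothetical name, not from the thread)
/system scheduler add name=safeboot-watchdog start-time=startup \
    on-event="/system script run safeboot-watchdog"
```

The scheduler entry and the script both need sufficient policies for what the script does (for example reboot), so check those on your own device.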


 
fragtion
Member Candidate
Posts: 257
Joined: Fri Nov 13, 2009 10:08 pm
Location: Johannesburg, South Africa

Re: Failover to part1 script

Sat Apr 09, 2022 12:33 pm

I love this idea and will watch this thread closely. I manage several MikroTik devices remotely, most of them vulnerable as a "single point of failure" due to tight budgets. 7.2 bricked a number of the devices I upgraded (at least 1 in every 10), although all of them would still boot fine, only to a corrupted config... I never thought of using partitions as a way to recover from that, but as long as these "partition watchdog scripts" themselves don't get wiped by the corruption, this could be a real lifesaver (albeit tedious to set up, and maybe prone to false positives).

Of course the best solution would be a more reliable software/firmware update procedure from MikroTik themselves, but some of us don't have the time or luxury of waiting indefinitely for things that may never happen ^_^ Every update seems to introduce a myriad of new issues, and the upgrade process itself clearly isn't immune to them.

I'm new to partitions on MikroTik, though. Any pointers to a crash course for dummies? I can't find any video content on YouTube, at least.
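
For readers who are also new to partitions, here is a minimal sketch of preparing part1 as a known-good copy. The command names are as I understand them from MikroTik's Partitions documentation, and the part0/part1 names are the ones used in this thread; verify both against your RouterOS version before relying on this.

```
# Show the existing partitions and which one is active/running
/partitions print

# Split storage into two partitions (one-time step; this reboots the router)
/partitions repartition partitions=2

# After the reboot: copy the running RouterOS and its configuration to part1
/partitions copy-to part1

# Later, to refresh only the configuration stored on part1
/partitions save-config-to part1

# Optional: let the bootloader fall back to part1 if part0 fails to boot
/partitions set part0 fallback-to=part1
```

Note that fallback-to only covers a partition that fails to boot. The case discussed in this thread, where the router boots but the network is down, is not caught by it, which is why a watchdog script is being considered.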
 
rextended
Forum Guru
Posts: 11967
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Failover to part1 script

Sat Apr 09, 2022 12:39 pm

On production devices, do not install 7.x; stay on 6.48.6 long-term. Use 7.2 only if 7.x is the only version compatible with that device...

As for partitioning, it does not work on every device, or at least you need enough storage space, and partitioning 16 MB is a bad idea because it leaves too little working space...


As for the script, I have nothing to add; it is functional.
 
rounin
just joined
Topic Author
Posts: 21
Joined: Thu Mar 24, 2022 6:03 am

Re: Failover to part1 script

Sun Apr 10, 2022 11:49 am

Huh. Seems like ping command is different in 7.2 vs 6.48.

In 6.48, I get a 0 or a 1 as the result from ping, so the if statement works.
[admin@RouterOS] > :global val [/ping address=8.8.8.8 count=1]     
  SEQ HOST                                     SIZE TTL TIME  STATUS           
    0 8.8.8.8                                    56  59 8ms  
    sent=1 received=1 packet-loss=0% min-rtt=8ms avg-rtt=8ms max-rtt=8ms 

[admin@RouterOS] > :put $val                                  
1
In 7.2, I don't seem to get any result from ping, so my if statement always takes the else path.
[admin@MT-RB4011] > :global val [/ping address=8.8.8.8 count=1]
Columns: SEQ, HOST, SIZE, TTL, TIME
SEQ  HOST     SIZE  TTL  TIME    
  0  8.8.8.8    56   60  9ms450us

[admin@MT-RB4011] > :put $val

[admin@MT-RB4011] > 

It seems like ping behaves differently with storage to variable in v7.2? The stats line "sent=1 received=1 packet-loss=0% min-rtt=8ms avg-rtt=8ms max-rtt=8ms" is also missing.
 
rounin
just joined
Topic Author
Posts: 21
Joined: Thu Mar 24, 2022 6:03 am

Re: Failover to part1 script

Sun Apr 10, 2022 11:55 am

:global val [/tool ping address=8.8.8.8 count=1 as-value]
returns something sensible in v7. I'll update the script to use that.
 
rounin
just joined
Topic Author
Posts: 21
Joined: Thu Mar 24, 2022 6:03 am

Re: Failover to part1 script

Sun Apr 10, 2022 12:20 pm

Somewhat annoyingly, the output of ping with as-value is not very consistent.

A successful ping could return
.id=*0;host=8.8.8.8;seq=0;size=56;time=00:00:00.008848;ttl=60
and a failed ping could return
.id=*0;host=8.8.8.9;seq=0;status=timeout
i.e. there is no status field on success.

It would be nice if the status field were always there. For now I treat a missing status as success, by checking :typeof ($pingres->"status") = "nothing":
{
	:if ( [/partitions get part1 running] ) do={
		:log info "Safeboot Watchdog: Already on part1, exiting"
		/quit
	}

	:log info "Safeboot Watchdog: Waiting 180s"
	:delay 180s

	# Maybe we probe a web page instead of ping
	# :local fetchresult [/tool/fetch url="https://www.google.com" mode=https check-certificate=yes as-value output=user]
	# :if ($fetchresult->"status" = "finished") do={

	:for i from=0 to=9 do={
		:log info "Safeboot Watchdog: Pinging 8.8.8.8"

		:do {
			:local pingres [/tool/ping address=8.8.8.8 count=1 interval=2 as-value]
			:if ( [:typeof ($pingres->"status")] = "nothing" ) do={
				:log info "Safeboot Watchdog: Ping 8.8.8.8 ok"
				/quit
			} else={
				:log warning "Safeboot Watchdog: Could not ping 8.8.8.8"
				:delay 30s
			}
		} on-error={
			:log warning "Safeboot Watchdog: Ping error 8.8.8.8"
			:delay 30s
		}
	}

	:log error "Safeboot Watchdog: Network seems to be down!"

	:log info "Safeboot Watchdog: Activating part1"
	/partitions activate part1

	:log info "Safeboot Watchdog: Rebooting"
	/system reboot
}
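
The commented-out fetch probe in the script could be fleshed out roughly as follows. This is only a sketch built from the parameters already shown in the comment. It assumes that fetch raises a script error on failure (hence the :do ... on-error wrapper) and that check-certificate=yes can succeed on the device, which normally requires a trusted CA certificate to be installed and the clock to be correct after boot.

```
# Sketch: HTTPS probe as an alternative health check (same URL as in the comment above)
:local webok false
:do {
	:local fetchresult [/tool/fetch url="https://www.google.com" mode=https check-certificate=yes as-value output=user]
	:if (($fetchresult->"status") = "finished") do={
		:set webok true
	}
} on-error={
	# Assumed to be reached on fetch failures such as DNS, TLS or timeout errors
	:log warning "Safeboot Watchdog: Fetch failed"
}
:if ($webok) do={
	:log info "Safeboot Watchdog: Web probe ok"
}
```

Compared with a ping, this also exercises DNS and TLS, so it can fail for reasons unrelated to a broken upgrade. That makes it more prone to the false positives mentioned earlier in the thread.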

 
kirstenmw
just joined
Posts: 1
Joined: Thu Sep 03, 2020 11:23 am

Re: Failover to part1 script

Mon May 30, 2022 1:09 pm

rounin wrote:
Huh. Seems like ping command is different in 7.2 vs 6.48.

It seems like ping behaves differently with storage to variable in v7.2? The stats line "sent=1 received=1 packet-loss=0% min-rtt=8ms avg-rtt=8ms max-rtt=8ms" is also missing.

Only with count=1 or count=2. With count=3 (or more) you get the number of received packets as the return value, as with 6.48.
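
If that observation holds on your version, the plain (non as-value) check could be sketched like this inside a script. The :typeof guard is an added precaution for versions where ping returns nothing; it is not something reported in this thread.

```
# Assumes a count of 3 or more makes ping return the number of replies,
# as described above; the typeof guard covers the case where nothing is returned
:local received [/ping address=8.8.8.8 count=3]
:if (([:typeof $received] = "num") && ($received > 0)) do={
	:log info "Safeboot Watchdog: Ping ok, replies: $received"
} else={
	:log warning "Safeboot Watchdog: Could not ping 8.8.8.8"
}
```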
 
armyofonegh
just joined
Posts: 1
Joined: Mon May 30, 2022 5:06 pm

Re: Failover to part1 script

Mon May 30, 2022 5:31 pm

Did the script above work?
 
rextended
Forum Guru
Posts: 11967
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: Failover to part1 script

Mon May 30, 2022 6:54 pm

Is that a question?
