Restarting an interface when it is disconnected

hello
i am looking to restart an LTE interface if it is disconnected (no internet) for 10 seconds

is my script feasible please?

/system script
add name=CheckInternetLTE10s policy=read,write,test source="
:local targetIP \"8.8.8.8\"
:local interfaceName \"lte1\"
:local failCount 0

:for i from=1 to=10 do={
    :if ([/ping \$targetIP count=1 timeout=1000] = 0) do={
        :set failCount (\$failCount + 1)
    }
    :delay 1s
}

:if (\$failCount = 10) do={
    /log warning (\"[LTE Watchdog] No internet for 10s. Rebooting interface \$interfaceName\")
    /interface lte disable \$interfaceName
    :delay 5s
    /interface lte enable \$interfaceName
}
"

I would add interface= to the /ping command, just to be sure that the traffic originates on that interface:

 :if ([/ping \$targetIP interface=yourLTEInterfaceNameHere count=1 timeout=1000] = 0) do={

Why do you need a loop? I think it’s a lot of unneeded code, and it could be reduced to the version below. Note that the logic is reversed: the ping command returns the number of successful replies, so 0 means every attempt failed. I also included @BartoszP’s suggestion.

:local targetIP "8.8.8.8"
:local interfaceName "lte1"

:if ([/ping $targetIP interface=$interfaceName count=10 timeout=1s] = 0) do={
    /log warning ("[LTE Watchdog] No internet for 10s. Rebooting interface $interfaceName")
    /interface lte disable $interfaceName
    :delay 5s
    /interface lte enable $interfaceName
}

Also, I personally prefer to put the working part of a script into a main infinite loop, so it runs on its own. The main reason is that the script run count value is written to flash, so this avoids excessive writes. To make sure it doesn’t fail, I use a scheduler that checks whether the script is running, so it acts as a kind of watchdog.


What do you mean by “prefer to put a working part of a script into main infinite loop, so it will work on its own”?

Also, do I need a scheduler for your script? I think it is cleaner than mine.

First of all, let’s fix one issue in the code. I was inattentive and just copied the timeout parameter from your code. The ping command has no such parameter; it should be replaced by interval.

You need a scheduler anyway, whether you use your code or mine. Both of them run only once, when you execute them, so a scheduler is required to launch the script at some interval. Because the script execution time varies (a little more than 10 seconds when everything is fine and the interface is not restarted, or more than 15 seconds when the interface is restarted), you need an interval of about 20 seconds for your scheduler task. You could reduce it, but in that case the scheduler’s code has to check whether the script is already running before executing it, to avoid multiple instances.

And one more very important thing. Because you are using an LTE interface, its startup time can be quite long. I don’t know how fast yours is, but my LTE interface takes about 15 seconds to start. In that case, if your script restarts the LTE interface after an unsuccessful ping, the next run will ping through an interface that is not yet running. Even if the startup time is shorter, the script can still start pinging before the interface is up. This leads to an endless restart loop, and your interface will never come up!

To solve this, the script should check, if the lte interface is running prior to pinging anything.

So, you need to add the following to your code:


:local state [/interface/lte/get $interfaceName running]
:if ($state = true) do={
    #ping command here
}

And your code will look like this:

:local targetIP "8.8.8.8"
:local interfaceName "lte1"

:local state [/interface/lte/get $interfaceName running]
:if ($state = true) do={
    :if ([/ping $targetIP interface=$interfaceName count=10 interval=1s] = 0) do={
        /log warning ("[LTE Watchdog] No internet for 10s. Rebooting interface $interfaceName")
        /interface lte disable $interfaceName
        :delay 5s
        /interface lte enable $interfaceName
    }
}

So, I would suggest 2 options for you:

  1. In addition to the code above, you need to add a scheduler task, that will periodically execute a script.

The code for scheduler will be like that:

:local sName "YourScriptName"

:local isRunning [/system script job print count-only where script=$sName]
:if ($isRunning = 0) do={
	/system script run $sName
}

With this code you can set any scheduler interval between 10 and 20 seconds; it won’t allow more than one instance of the script to run. I would recommend 15 seconds. And of course, set the start-time parameter of your scheduler task to startup.
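As a rough sketch of how such a scheduler task could be created, assuming the script keeps the name CheckInternetLTE10s from the first post (the scheduler name lte-watchdog-launcher is just a placeholder I made up):

```
# Sketch only: "lte-watchdog-launcher" is a hypothetical name,
# "CheckInternetLTE10s" is the script name used earlier in this thread.
/system scheduler
add name=lte-watchdog-launcher start-time=startup interval=15s \
    on-event=":if ([/system script job print count-only where script=\"CheckInternetLTE10s\"] = 0) do={ /system script run CheckInternetLTE10s }"
```

The on-event body is the same check as above, just written inline, so the task only launches the script when no instance of it is already listed under /system script job.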


  2. Regarding the main infinite loop, I mean this:

:local targetIP "8.8.8.8"
:local interfaceName "lte1"

:do {
    :local state [/interface/lte/get $interfaceName running]
    :if ($state = true) do={
        :if ([/ping $targetIP interface=$interfaceName count=10 interval=1s] = 0) do={
            /log warning ("[LTE Watchdog] No internet for 10s. Rebooting interface $interfaceName")
            /interface lte disable $interfaceName
            :delay 5s
            /interface lte enable $interfaceName
        }
    }
    :delay 100ms
} while=(true)

Here I’ve placed the working part of the code into an infinite loop, so the script runs on its own and doesn’t need to be executed by the scheduler every 15 seconds. It should be executed by the scheduler only once, at router startup. You can use the same scheduler code as in option 1 for that, with the start-time parameter set to startup and interval set to 0.

But for reliability I do recommend using the scheduler as a watchdog, so it periodically checks whether the script is running. Even if your script is ideal, it could fail for many reasons, including RouterOS bugs. So just set the scheduler task interval to, let’s say, 30 seconds or 1 minute, or some other value you like. This is the interval for checking whether the script is running; if for some reason it is not, it will simply be started again.

Also, I usually add something like that to the very beginning of the script:

:while ([/system/resource/get uptime] < 1m) do={
	:delay 10s
}

This postpones script execution for the first minute after router startup. There is no point in checking the interface immediately after startup, when the interfaces are not yet up. You may adjust this value for your needs.


The better question here is why not just use netwatch, and put the LTE disable/enable there. With the “icmp” check you can define intervals, timeouts, and so on (almost too many options). And with netwatch, the on-down script becomes simple: just the disable/enable.


Then I would only need a down script like the one below, correct?

:log warning "Interface disabled"
/interface lte disable lte1
:delay 10s
:log warning "Interface enabled"
/interface lte enable lte1

Yep.
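For reference, a netwatch entry that wires a down script like that to an ICMP probe might look roughly like this. It is only a sketch: the host 8.8.8.8 and interface lte1 come from earlier in the thread, while the 1m interval and the comment text are placeholder values, and the loss/latency thresholds are deliberately left at their defaults.

```
# Sketch only: interval and comment are placeholder values.
/tool netwatch
add host=8.8.8.8 type=icmp interval=1m comment="LTE watchdog sketch" \
    down-script="/log warning \"[LTE Watchdog] probe down, restarting lte1\"; /interface lte disable lte1; :delay 10s; /interface lte enable lte1"
```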

The tricky part (if you use not the “simple” netwatch, but rather the newer, more powerful ICMP probe) is finding the right settings to trigger (or not trigger) the “down” script.

See:

@amm0
If I get this stuff right, the netwatch has a distinct advantage over “normal” scheduling, the “down” script is only run when the status changes from “up” to “down”, whilst a normally scheduled script would need to first check if the connection is not already down, otherwise if the disable/reenable doesn’t work (or if a connection is down for any other reason) the interface would be cyclically disabled and enabled every 20 seconds or so until the connection is restored (IF it is restored).

I know this goes against the usual advice to use the built-in tools when they are available, but for these sorts of things I usually use a custom script. That way you can tune how to detect failure (how many hosts pinged, how many times, how often, when should the link be considered failed) and what action to take, how long to wait after startup or restarting the link for it to come up, etc.

Once you have a script that’s adequately debugged it’s really not much of a hassle.

Correct, a netwatch script only runs when the state changes. So when going from up to down, the script is called. If the next test fails and the previous one also failed, the script is not called.

Also correct. And that’s kind of the difference: in a /system/schedule script you have to determine the state, take action, and track what actions were taken. With netwatch, you can focus on the action and have a GUI (perhaps with complex variables) to define your “rules”.

The only side point on up-script={} and down-script={} netwatch scripts is that you may get “flapping”, especially with the icmp check, since “up” and “down” may transition based on latency or other “partially down” situations. These flaps may just mean netwatch is configured incorrectly or too aggressively. But assuming the internet is just poor (the test is right) and you want to do something other than reboot, using netwatch gets trickier with just “up” and “down” from a single test.

To that point, there is also the test-script={} option. It acts exactly like /system/scheduler, except that it runs at the netwatch interval, and it provides the results of the last test, so you’re starting with some data (vs. a “pure” scheduled script, where you’d have to work out up/down yourself).

And if you have multiple conditions you’re looking for before taking action, that gets a little trickier. In the scheduler, you can add more :ifs and :global lastValueXxxx things. In netwatch you’d have to use one netwatch entry as the “master”, with a test-script= whose code reads the values of the other active netwatch monitors, to achieve a netwatch “parlay” (i.e. multiple values must be true or false for the action).

so what is the best advice ? just use a script to detect downtime and reboot interface or use a down script in netwatch ?

i just want to restart the interface if internet is down for like 5 minutes

It depends, you can use a custom script as the “down” script in netwatch or call from the “down” script your custom script.

The Netwatch is only (or can be only) the trigger mechanism, with - as explained before - some advantage over the alternative, which is the “normal” scheduler.

The custom script - independently from the mechanism that triggers it - can do further checks before resorting to disabling and enabling the interface (avoiding possible false positives) and - if used through the scheduler - should have provisions to avoid disabling/enabling loops.
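As an illustration of such provisions, here is a minimal sketch of a scheduled script that only restarts the interface after several consecutive failed checks and then starts counting again. It assumes the script is run by the scheduler once per minute; the global lteFailCount and the threshold maxFails are names invented for this example, not anything built into RouterOS.

```
# Sketch only: lteFailCount and maxFails are hypothetical names.
# Assumes the scheduler runs this script once per minute.
:global lteFailCount
:local targetIP "8.8.8.8"
:local interfaceName "lte1"
:local maxFails 5

# First run after boot: the global exists but has no value yet
:if ([:typeof $lteFailCount] = "nothing") do={ :set lteFailCount 0 }

# Only test while the interface reports running, so a slow LTE
# startup is not counted as an internet failure
:if ([/interface/lte/get $interfaceName running] = true) do={
    :if ([/ping $targetIP interface=$interfaceName count=3 interval=1s] = 0) do={
        :set lteFailCount ($lteFailCount + 1)
    } else={
        :set lteFailCount 0
    }
    :if ($lteFailCount >= $maxFails) do={
        /log warning ("[LTE Watchdog] " . $maxFails . " failed checks in a row. Restarting " . $interfaceName)
        # Reset the counter so the next restart needs another full run of failures
        :set lteFailCount 0
        /interface lte disable $interfaceName
        :delay 5s
        /interface lte enable $interfaceName
    }
}
```

With a one-minute schedule and maxFails set to 5, this roughly matches “down for about 5 minutes”, and resetting the counter after a restart spaces any further restarts by the same amount instead of cycling every run.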


If you want simple, you’d use netwatch with the ICMP check. For 5G/LTE, monitoring ICMP latency is helpful, since congested towers or other issues show up as higher latency, and that comes “for free” in netwatch (i.e. you just configure the params that indicate a failure, or partial failure). And in the “simple” scheme, you just reboot if LTE is the only device for internet.

You could instead have some “tiered” scheme, where a failure first tries to disable/enable the LTE interface to “fix” the problem before resorting to a reboot. That’s also a valid thing to do, but it comes with complexity, since you have to be sure you re-enable the interface in all cases.

IMO, while there are bugs in LTE/5G, they’re often fixed in future versions. So getting too creative looking for some condition can make things worse, since you don’t know whether future LTE/5G bugs would show the same condition. That is why I recommend just doing a /system/reboot in the netwatch down script, using an interval of 5m in netwatch (see @jaclaz’s spreadsheet for help in setting the ICMP values).

And I’m not saying a /system/schedule or “doing more” is a bad approach. It’s just that you need more custom scripting for that, while with netwatch there are only some complex parameters to figure out.


In netwatch, when the ping fails, the down script is run. What happens when the ping is working again?

You rebooted, so hopefully the problem was fixed. If it really keeps rebooting, and disable/enable fixes it, that’s something worth reporting to MikroTik, and the device rebooting all the time is what tells you so.

You do have to configure the ICMP watch carefully, though, as MikroTik’s defaults are a bit “aggressive” at detecting failure with LTE.
