Safe Config-as-Code deployment to MikroTik with rollback on failure

mabels · November 29, 2025, 1:27pm

This was historically correct, but today I use an AI to reduce the time and risk between test and prod. And in home lab environments, or with emergency security updates, this lead time can shrink to minutes. So I want automation that does everything possible to reduce the risk of operational failures.

That’s why the schema could be a useful tool, as could using set [find ...] instead of remove/add.
A recoverable configuration change, like “netplan try”, is the bare minimum of a safety net for quick infrastructure changes.
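The “netplan try” idea is a confirm-or-revert loop: apply the change, wait for a positive confirmation, and roll back automatically if none arrives in time. Below is a minimal Python sketch of that pattern as an external tool might run it. The function name try_with_rollback and the apply/confirm/revert callables are hypothetical; in practice they might push a change, check that the router is still reachable, and restore a saved export, but none of that is a RouterOS feature.

```python
import time
from typing import Callable


def try_with_rollback(
    apply: Callable[[], None],
    confirm: Callable[[], bool],
    revert: Callable[[], None],
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Keep a change only if confirm() returns True before timeout_s expires.

    The default outcome is rollback: without a positive confirmation the
    change is reverted, mirroring the "netplan try" behaviour.
    """
    try:
        apply()
    except Exception:
        revert()  # a partially applied change is rolled back as well
        raise
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            if confirm():
                return True  # confirmed in time: keep the change
        except Exception:
            pass  # an erroring health check counts as "not confirmed yet"
        sleep(poll_s)
    revert()  # no confirmation before the deadline: restore the old state
    return False
```

Injecting sleep and clock keeps the sketch testable without real waiting; a real deployment would use the defaults.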

pe1chl · November 29, 2025, 3:01pm

The problem with @rextended is that he is not the guy to have a discussion about concepts with. When you give an example like that, he does not realize that it is just an example, and he will come back with something like “but advertise=yes is already the default! you stupid guy!”.

Rest assured, it has happened to me as well. He just does not get that.

About not being able to retrieve default values through some explicit action (for example, typing value= and then pressing TAB to expand it to the default value): that is sad, but it is the same on many other manufacturers’ devices. Aruba and Cisco, for example, behave the same way. If you happen to set the default value, it disappears from the export; but once you have set something different and want to get back to the default, you have to consult the documentation or use trial and error.

Amm0 · November 29, 2025, 3:58pm

In fairness, there is a forum "process" problem with hijacking a thread... so it might be better to start a new one.

The underlying issue is that RouterOS does not have true transactions. Yet most configuration is a series of related commands that all work in concert: even adding an operational VLAN takes at least two commands (add the interface, add the IP address, likely more) and presupposes that a bridge exists.
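Without transactions, an external tool can at best emulate one with compensating actions: run the related commands in order and, if one fails, undo the ones that already succeeded in reverse order. A minimal Python sketch, assuming each step's hypothetical do/undo callables send a command and its inverse (for example, add and remove of the same VLAN interface). This is a general "compensation" technique swapped in for illustration, not something RouterOS provides, and an undo can itself fail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One configuration command plus the command that reverses it (hypothetical)."""
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]


def apply_all_or_undo(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps newest-first and re-raise."""
    done: List[Step] = []
    for step in steps:
        try:
            step.do()
        except Exception:
            # Compensate for everything that already succeeded, in reverse order.
            # If an undo raises, that exception propagates instead.
            for finished in reversed(done):
                finished.undo()
            raise
        done.append(step)
    return [s.name for s in done]
```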

@pe1chl has some good tips on how to write RouterOS script that applies a change. I'd add that :if ([:len [find ...]] = 0) do={add ...} else={set ...} constructs are a typical approach for tools like Terraform/etc. But basically it's about "defensive scripting", which helps work around the underlying limitation of RouterOS's config system.
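A minimal sketch of how an external tool might generate that add-or-set construct for one entry, so re-running the same deployment converges instead of failing on a duplicate. The helper names routeros_quote and upsert_command are hypothetical. It assumes the chosen key (here name) uniquely identifies the entry, and it quotes every value for simplicity, escaping backslash, double quote and $ as RouterOS strings require. It only adds or updates; it never removes entries that should no longer exist.

```python
def routeros_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes, quotes and dollar signs."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return f'"{escaped}"'


def upsert_command(path: str, key: str, key_value: str, attrs: dict) -> str:
    """Build an idempotent ':if ... do={add} else={set}' snippet for one entry."""
    selector = f"{key}={routeros_quote(key_value)}"
    props = " ".join(f"{k}={routeros_quote(v)}" for k, v in attrs.items())
    add_cmd = f"{path} add {selector} {props}".rstrip()
    set_cmd = f"{path} set [find {selector}] {props}".rstrip()
    # Add the entry if nothing matches the key, otherwise update it in place.
    return f":if ([:len [{path} find {selector}]] = 0) do={{ {add_cmd} }} else={{ {set_cmd} }}"
```

For example, upsert_command("/interface vlan", "name", "vlan10", {"interface": "bridge", "vlan-id": "10"}) produces a single line that either adds vlan10 on the bridge or updates the existing vlan10 entry.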

IMO, the "safest" way to deploy changes programmatically is just applying the entire configuration using /system/reset-configuration run-after-reset= .... This gets you a higher-level "transaction", in that the new configuration is either applied or not, so you don't get partial failures, and you know multiple devices have an identical base configuration. The downside is that a reboot is required, and since you're providing the entire configuration, there is a higher chance that one item fails... so the deployed config needs to be tested.

In terms of /console/inspect, request=highlight could be useful if you're updating the backend "plugin" for something like Terraform. What it can do is take some "pushable" config and indirectly "check" that it's correct. So if you want to validate "/bad add name=evil", you can use:

/console/inspect request=highlight input="/bad add name=evil"

which will return an array of values. Normally these are used internally by the CLI for syntax coloring... but since the CLI highlights errors in red, you can search the array for error or obj-invalid. If either of those values appears in the array, the command can be rejected before Terraform/etc. sends it. If you run it on the same router you're planning to deploy to, the check is perfectly matched to that version (i.e. if some attribute name changed between versions, this approach would catch it).
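Once the highlight result has been fetched from the router, the check itself is a simple scan. A minimal Python sketch, assuming your tool has already retrieved the result and turned it into either a list of highlight class names or one comma-separated string (how you fetch and parse it depends on the API or CLI transport you use and is not shown). Only the class names error and obj-invalid come from the post; the helper names are hypothetical.

```python
from typing import Iterable, Union

# Highlight classes the post suggests treating as "reject before pushing".
BLOCKING_CLASSES = frozenset({"error", "obj-invalid"})

Highlight = Union[str, Iterable[str]]


def highlight_rejects(highlight: Highlight) -> bool:
    """True if a request=highlight result contains a blocking class."""
    tokens = highlight.split(",") if isinstance(highlight, str) else highlight
    return any(token.strip() in BLOCKING_CLASSES for token in tokens)


def rejected_commands(results: dict) -> list:
    """Given {command: highlight_result}, list the commands to reject."""
    return [cmd for cmd, hl in results.items() if highlight_rejects(hl)]
```

Matching whole class names (rather than substrings) avoids false positives from unrelated classes that merely contain the text "error".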

This is the approach my LSP takes to validate RouterOS script. See

and there is some TypeScript code inside the linked GitHub project that shows the checking, and the forum discussion documents much of /console/inspect.

Finally, there is the CLI "safe mode". I'm not sure there has been much investigation into how it could be used for Terraform/etc., since it's pretty new. It might be worth some testing if this is an area you're interested in, but since I have not tested it much... there may be corner cases where some "CLI safe mode" scheme fails.

mabels · November 29, 2025, 11:15pm

You are right about starting a new thread, but now it’s too late.

Thanks for your ideas; they are slowly helping me understand that MikroTik's strength is also its biggest weakness: the easy interactive interaction, which is the worst enemy of automation. And I’m unsure whether it’s a good idea to try harder to work around these implicit weaknesses.
So there are several possible solutions:

- switch OS, but then lose the very good support for small hardware
- abandon the scripting approach and move to automated interaction with the router by using AI. That was a pipe dream in the past, but why not today? It means accepting the risk of failures, just as humans do, but it should let us get around the limitations that ROS itself places on automation.

Left thinking. Thanks a lot for the inspiration.

Amm0 · November 29, 2025, 11:26pm

In the spirit of not fighting RouterOS's configuration scheme... RouterOS is really more Code-as-Config (rather than the title's Config-as-Code), since there is no JSON/YAML/XML representation of the configuration (import/export are more a script runner/generator than a typical import/export of a "configuration file").

If the need is to control VLANs across routers/switches, RouterOS recently added support for MVRP, which allows VLAN "distribution". It's not Terraform/etc., but with the right MVRP configuration you achieve some of the same goals without much modification of each router/switch.

The same goes for things like CAPsMAN and any routing protocol: you can always use these standard networking mechanisms to "configure" things like routes, and then Terraform/etc. is left merely to enable the desired protocol.

pe1chl · November 30, 2025, 10:05am

Whether this is a problem in practice largely depends on the kind of setup and its scale.

Sure, when you have a large network of many routers, all configured similarly but not identically, it can look like a good idea to automate their configuration and changes. But for a single site with a large number of access points, it may not be necessary to go down that path; you can use CAPsMAN as a centralized configuration solution instead.

There have been attempts to bring the matter to MikroTik's attention, but it seems they do not consider it really important, and it would probably also require a large change in their code. They are also already fighting the problem of code size relative to flash size in many models. Another topic in that discussion is “how do I migrate the configuration of my existing old MikroTik router to my exciting new device”. There really is no click-and-go solution for that, while some other manufacturers do have one.

roemer · January 3, 2026, 2:35pm

An idea might be to use the mode button, or the reset button in case a specific model has no mode button.

/export compact file=latest
/system script add name=on-mode-button source={:log info message=("mode-button reverting latest"); /system reset-configuration keep-users=yes no-defaults=yes caps-mode=no skip-backup=yes run-after-reset=latest.rsc; }
/system routerboard mode-button set on-event=on-mode-button enabled=yes

pe1chl · January 3, 2026, 2:41pm

Unfortunately that has a high risk of failure.

no-defaults=yes will not give you an entirely empty config; e.g. RouterOS stubbornly adds a DHCP client.

When you have decided to keep that client, it will be in your export, and the import of the exported config will fail because the DHCP client already exists. And, stupidly, the import stops there instead of ignoring this minor error and continuing.

That has been brought to the attention of MikroTik many times, but they just ignore that.
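One defensive-scripting workaround is to post-process the export so that each command runs inside RouterOS's :do { ... } on-error={ ... } construct; a failing command (such as the duplicate dhcp-client add) is then logged and skipped rather than aborting the whole import. A minimal Python sketch, assuming the usual /export layout: menu headers like /ip dhcp-client on their own line, relative add/set commands below them, and long lines wrapped with a trailing backslash plus an indented continuation whose indentation is not significant. The helper name make_import_tolerant is hypothetical, and silently skipping errors can hide real problems, so check the log after importing.

```python
from typing import List, Optional


def make_import_tolerant(export_text: str) -> str:
    """Wrap every command of an /export script in :do { } on-error={ }."""
    out: List[str] = []
    menu = ""  # current menu path, e.g. "/ip dhcp-client"
    pending: Optional[str] = None  # first part of a backslash-wrapped line
    for raw in export_text.splitlines():
        line = raw.rstrip("\r")
        if pending is not None:
            # Join the continuation line; its leading indentation is dropped.
            line = pending + line.lstrip()
            pending = None
        if line.endswith("\\"):
            pending = line[:-1]
            continue
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep blank lines and comments unchanged
        elif stripped.startswith("/"):
            menu = stripped  # menu header: remember it and keep it as-is
            out.append(stripped)
        else:
            # Use the full path inside the block so it does not rely on context.
            out.append(
                f":do {{ {menu} {stripped} }} "
                f'on-error={{ :log warning "import: skipped a failing command in {menu}" }}'
            )
    if pending is not None:
        out.append(pending)  # dangling continuation at end of input; left as-is
    return "\n".join(out)
```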

roemer · January 3, 2026, 3:58pm

I don’t have any dhcp-client configured, and it has worked well so far.

But if reset-configuration fails with a dhcp-client configured, /system backup save and /system backup load might work instead.