Safe Config-as-Code deployment to MikroTik with rollback on failure

In fairness, there is a forum "process" problem with hijacking this thread... so it might be better to start a new one.

The underlying issue is that RouterOS does not have true transactions. Yet most configuration is a series of related commands that work in concert. Even something like adding an operational VLAN takes at least 2 commands (add interface, add IP address, likely more) and is predicated on a bridge existing.

@pe1chl has some good tips on how to write a RouterOS script that applies a change. I'd add that :if ([:len [find ...]] = 0) do={add ...} else={set ...} constructs are a typical approach for things like Terraform/etc. Basically it's about "defensive scripting", which does help work around the underlying limitation of RouterOS's config system.
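As a rough sketch of that "add if missing, else set" pattern from the tooling side, here is how a deployment tool might generate such a snippet. The upsertScript helper, the /interface/vlan menu, and the vlan100/bridge1 values are all made up for illustration, so adjust them to whatever you are actually pushing:

```typescript
// Hypothetical helper (not part of any real library): builds a RouterOS
// snippet that adds an item when none matches, and updates it otherwise.
// `menu`  - RouterOS menu path, e.g. "/interface/vlan"
// `match` - property used to look the item up, e.g. name="vlan100"
// `props` - remaining properties, already formatted for RouterOS
function upsertScript(menu: string, match: string, props: string): string {
  const find = `[${menu}/find ${match}]`;
  return [
    `:if ([:len ${find}] = 0) do={`,
    `  ${menu}/add ${match} ${props}`,
    `} else={`,
    `  ${menu}/set ${find} ${props}`,
    `}`,
  ].join("\n");
}

// Example values are assumptions, not taken from a real config.
const vlanScript = upsertScript(
  "/interface/vlan",
  'name="vlan100"',
  "vlan-id=100 interface=bridge1",
);
console.log(vlanScript);
```

Running the same generated script twice should leave the router in the same state, which is the point of the defensive style. It still does not make a multi-command change atomic.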

IMO, the "safest" way to deploy changes programmatically is to apply the entire configuration using /system/reset-configuration run-after-reset= .... This gets you a higher-level "transaction" in that the new configuration is either applied or not, so you don't get partial failures and you know multiple devices have an identical base configuration. The downside is that a reboot is required, and since you're providing the entire configuration there is a higher chance that one item may fail... so the deployed config needs to be tested.

In terms of /console/inspect, request=highlight could be useful if you're updating the backend "plugin" for something like Terraform. It can take some "push-able" config and indirectly "check" that it's correct. So if you want to validate "/bad add name=evil" you can use:

/console/inspect request=highlight input="/bad add name=evil" 

which will return an array of values. Normally these are used internally by the CLI for colorizing input. But since the CLI highlights errors in red, you can search the array for error or obj-invalid. If either of those appears in the array, the command can be rejected before Terraform/etc. sends it. If you run it on the same router you're planning to deploy to, the check is matched exactly to that version (i.e. if some attribute name changed between versions, this approach will find that).
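A minimal sketch of that check, in case it helps. Only the error and obj-invalid token names come from the description above. The function names, the sample token lists, and the idea that the result may arrive either as an array or as one comma-separated string are my assumptions, so verify the actual shape against your router's output:

```typescript
// Token names that indicate a problem in the highlighted input.
const BAD_TOKENS = new Set(["error", "obj-invalid"]);

// Returns the positions of problem tokens in a highlight result.
// Assumption: the result is either an array of token names or a single
// comma-separated string; both are normalised to an array here.
function findHighlightErrors(highlight: string | string[]): number[] {
  const tokens = Array.isArray(highlight) ? highlight : highlight.split(",");
  const bad: number[] = [];
  tokens.forEach((token, index) => {
    if (BAD_TOKENS.has(token.trim())) bad.push(index);
  });
  return bad;
}

// True when no problem tokens were found, i.e. the command looks safe to send.
function isPushable(highlight: string | string[]): boolean {
  return findHighlightErrors(highlight).length === 0;
}

// Made-up sample data, NOT captured from a real router.
const sampleBad = ["dir", "error", "error", "none"];
const sampleOk = "dir,dir,none,cmd";

console.log(isPushable(sampleBad)); // false
console.log(isPushable(sampleOk)); // true
```

Returning the positions rather than just a boolean lets a tool point at the part of the command that was flagged, which is roughly what the red highlighting does in the CLI.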

This is the approach my LSP takes to validate RouterOS script. See

and there is some TypeScript code inside the linked GitHub project that shows the checking, and the forum discussion documents much of /console/inspect.

Finally, there is the CLI "safe mode". I'm not sure there has been much investigation into how it could be used for Terraform/etc., since it's pretty new. It might be worth some testing if this is an area you're interested in, but since I have not tested it much, I'm not sure there aren't corner cases where some "CLI safe mode" scheme might fail.