Effective Backup Strategy for a MikroTik Router in Disaster Recovery Scenarios

I aim to develop a robust backup and restore strategy for a critical MikroTik router in my network that has a relatively complex configuration. Recently, I had to replace the router (fortunately without an actual disaster) with a different unit of the same model and realized that my planned recovery approach was inadequate, especially if executed under pressure and/or by a colleague with limited knowledge of the network’s specific details.
I would appreciate any advice from those who have faced similar challenges in the past.

The Scenario

Imagine I have a MikroTik CCR2004-16G-2S+ router handling inter-VLAN traffic for numerous VLANs and queuing and load balancing across three ISPs for outbound internet traffic, while also running containers on an external USB SSD and implementing relatively complex firewall rules, bonding/bridging, and additional configuration.

Each night, I create a backup using an appropriate method, which is yet to be defined.

The Challenge: Proper Backup and Restoration on a Different Unit

Now, a real disaster strikes: the hardware fails, and someone with little or no knowledge of the intricacies of my network needs to restore it as quickly as possible on another CCR2004-16G-2S+, which we managed to find. Since our entire network relies on this router, recovery is urgent.

Issues Encountered During Restoration

In my testing, I encountered the following issues:

  • Restoring a binary backup (.backup) on a different unit does not seem to work properly. Inter-VLAN traffic is not allowed, firewall rules cause numerous dropped packets, containers are restored but do not start, and Ethernet interfaces have different names, to mention just a few of the issues I observed. Strangely, access to the router via ZeroTier still works.
  • Restoring from an exported configuration also fails to fully recover the system. Users and groups are missing, containers are not restored, inter-VLAN traffic does not work, and further issues may arise, such as problems with ZeroTier identities or container settings.

On the simulation day, after trying both approaches, my access to the router was mostly limited to the console port, restricting my ability to diagnose the situation thoroughly.

Seeking a Reliable Recovery Method

While I can invest more time in debugging the restoration process now, I recognize the immense pressure of a real-life disaster recovery scenario. I hope a more straightforward, click-and-restore backup method is available.

For backups and config version history, Unimus has covered me for quite some time, but it mainly covers configurations: no users or certificates (those are only stored automatically in the binary backups, which are, sadly, device-specific).

Another thing Export doesn’t catch…

MAC addresses seem to be an important issue when replacing routers.

Windows computers assume they are on a different network when the gateway MAC address changes; this can interfere with Windows Firewall and file sharing.

Upstream ISPs may also filter by MAC address in their switch or whatever emulated layer-2 system they use, even though we know it provides no real security.

You’d have to set the MAC addresses manually on the spare router before swapping it in. They can be gathered with snmpwalk from the old router and/or documented manually, depending on how many routers you look after.
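A minimal Python sketch of that step, assuming the old MACs have already been recorded (by snmpwalk or by hand) as a table of factory port name to address. It only prints RouterOS lines of the form "set [ find default-name=... ] mac-address=..." under /interface ethernet; the port names and addresses are placeholders. Bridges with auto-mac=no carry a separate admin-mac that would need the same treatment.

```python
import re

# Six colon-separated hex pairs, e.g. AA:BB:CC:00:00:01
MAC_RE = re.compile(r"^[0-9A-F]{2}(:[0-9A-F]{2}){5}$")


def mac_commands(macs: dict[str, str]) -> list[str]:
    """Build RouterOS lines giving each port the MAC recorded from the old router.

    macs maps the factory port name (default-name) to the documented MAC.
    """
    lines = ["/interface ethernet"]
    for default_name, mac in sorted(macs.items()):
        mac = mac.upper()
        if not MAC_RE.match(mac):
            raise ValueError(f"not a MAC address: {mac!r} ({default_name})")
        # default-name does not change when a port is renamed, so it is a stable key
        lines.append(f"set [ find default-name={default_name} ] mac-address={mac}")
    return lines


if __name__ == "__main__":
    documented = {  # hypothetical values recorded from the old unit
        "ether1": "aa:bb:cc:00:00:01",
        "ether2": "aa:bb:cc:00:00:02",
    }
    print("\n".join(mac_commands(documented)))
```

The output can be pasted into the spare unit's terminal before it replaces the failed one, so upstream MAC filters and Windows clients see the same gateway.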

We’ve also experienced the same pitfalls with config export and re-import on replacement hardware. We cannot simply copy/paste config like you can with other vendors…

We use Unimus for config backups. We found that when deploying replacement hardware, we have to copy and insert what we need line by line:

  • Copy in interface settings [excluding MAC address]
  • Copy in firewall rules
  • Copy in other misc config settings [hostname, DDNS, DNS, static entries, disabled service ports, etc.]
  • Copy in bridge settings [clear out admin-mac]; expect a temporary loss of connectivity if bridge filtering is enabled…
  • Re-create users
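Part of that line-by-line copying can be scripted. The Python sketch below is a hypothetical helper, not a MikroTik tool: it reads an /export file, keeps only the sections you name (for example /ip firewall filter or /interface bridge), joins backslash-continued lines, drops comment lines, and strips the mac-address=, admin-mac= and auto-mac= parameters so the new unit keeps its own MACs. Dropping auto-mac= assumes the bridge should fall back to its default automatic MAC; check that against your RouterOS version.

```python
import re

# Whole key=value tokens tied to the old hardware. The leading \s keeps
# src-mac-address= / dst-mac-address= in firewall rules untouched.
# auto-mac= is dropped too, assuming the bridge should use its default MAC.
DROP_PARAMS = re.compile(r"\s(?:admin-mac|auto-mac|mac-address)=\S+")
# A "set [ find ... ]" line left with no properties after stripping
EMPTY_SET = re.compile(r"^set \[ find [^\]]*\]$")


def logical_lines(text: str) -> list[str]:
    """Join lines that end in a backslash with their continuation lines."""
    out, buf = [], ""
    for raw in text.splitlines():
        if buf:
            raw = raw.lstrip()  # continuation lines are indented in exports
        if raw.endswith("\\"):
            buf += raw[:-1]
            continue
        out.append(buf + raw)
        buf = ""
    if buf:
        out.append(buf)
    return out


def extract(text: str, wanted: set[str]) -> str:
    """Keep only the wanted sections, without comments or MAC parameters."""
    keep, section = [], None
    for line in logical_lines(text):
        if not line.strip() or line.startswith("#"):
            continue
        if line.startswith("/"):  # section header such as "/ip firewall filter"
            section = line.strip()
            if section in wanted:
                keep.append(section)
            continue
        if section in wanted:
            line = DROP_PARAMS.sub("", line)
            if not EMPTY_SET.match(line):
                keep.append(line)
    return "\n".join(keep)
```

For example, extract(open("export.rsc").read(), {"/ip firewall filter", "/interface bridge"}) returns just those two sections, ready to review and paste. It does not solve the connectivity drop when bridge filtering is enabled; that still has to be planned for.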

There are still issues with certificates and other items. We haven’t used containers yet.

If the need is “cold standby”, having an identical unit and restoring the .backup file to it is pretty straightforward. Obviously only ONE router can be online at a time when using a .backup file, and the disaster scenario may require a cable swap (or VLAN reassignment, etc.).

Alternatively, you can script the config export to use variables and make sure things like users/certificates are added by the modified config… then always use your custom script to bring up ANY router by changing a few variables in the config export (which is still just a RouterOS script, so you can use variables or if/else logic).
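The post describes doing this inside RouterOS script itself. As an alternative done outside the router, here is a Python sketch that uses string.Template to render one parameterised script per replacement unit. The template lines, placeholder names and values are all hypothetical, and the certificate line assumes the certificate file has already been uploaded to the router.

```python
from string import Template

# Hypothetical template: every $name is filled in per replacement router.
TEMPLATE = Template("""\
/system identity
set name=$identity
/interface ethernet
set [ find default-name=$wan1_port ] comment=WAN1
/user group
add name=$ops_group policy=read,write,test,winbox,ssh
/user
add group=$ops_group name=$ops_user password="$ops_password"
# the certificate file must already be uploaded to the router
/certificate
import file-name=$cert_file passphrase="$cert_pass"
""")


def render(site: dict[str, str]) -> str:
    """Fill in every placeholder; substitute() raises KeyError if one is missing."""
    return TEMPLATE.substitute(site)


if __name__ == "__main__":
    print(render({  # hypothetical values for one site
        "identity": "core-rtr",
        "wan1_port": "ether1",
        "ops_group": "ops",
        "ops_user": "oncall",
        "ops_password": "change-me",
        "cert_file": "core-rtr.p12",
        "cert_pass": "change-me-too",
    }))
```

Using substitute() rather than safe_substitute() is deliberate: a forgotten variable stops the render instead of producing a script with a literal $name in it.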

If “warm standby” is needed, the approach would be to add BGP and VRRP to the topology, but this is obviously far more involved.

@Amm0, great point about cold standby. This is common with other vendors [for the price point of the given hardware].

However, even with other vendors [Cisco, etc.], when moving to a new hardware platform or a different switch model, we still copy/paste interface config manually. At times the interface mapping is not 1:1 [same goes for MikroTik].

The real pain with MikroTik is the bridge settings [admin-mac], bridge-vlan and vlan-filtering, which have to be recreated or carried over. Those, along with the other items you mentioned, are more of a chore.
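When the port mapping is not 1:1, the /interface bridge port and /interface bridge vlan lines are the ones that must be rewritten. A minimal Python sketch, assuming backslash continuations are already joined and interface lists are unquoted and comma-separated; the old-to-new mapping is hypothetical.

```python
import re

# interface=, tagged= and untagged= hold port names in bridge port / bridge vlan
# lines; the look-behind skips in-interface=, out-interface= and similar keys.
PORT_PARAMS = re.compile(r"(?<=\s)(interface|tagged|untagged)=(\S+)")


def remap(line: str, mapping: dict[str, str]) -> str:
    """Rename every listed port; names missing from mapping are left as-is."""

    def fix(m: re.Match) -> str:
        names = [mapping.get(n, n) for n in m.group(2).split(",")]
        return f"{m.group(1)}={','.join(names)}"

    return PORT_PARAMS.sub(fix, line)


if __name__ == "__main__":
    old_to_new = {"ether1": "ether5", "ether2": "sfp-sfpplus1"}  # hypothetical
    for line in (
        "add bridge=bridge interface=ether2 pvid=10",
        "add bridge=bridge tagged=bridge,ether1 untagged=ether2 vlan-ids=10",
    ):
        print(remap(line, old_to_new))
```

Keeping the mapping in one small table means the same table can be reused for VLAN interfaces, interface lists or firewall rules that reference ports.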

I’d love to see a better approach from MikroTik. I feel their days may be numbered, or at least their approaches feel “old”, given that other vendors with cloud controllers or cloud management do this much better. Even Cisco/Meraki figured out how to do hardware replacement [we’ve done the procedure many times].

We’ve talked about a cloud/controller, but the discussion there has primarily been around provisioning and managing access points. DR for the router makes sense as well, though deploying to different hardware, especially for the router, would be a challenge.

I tried with .backup, but it doesn’t work: ethernet interfaces get different names (ether17… instead of ether1…), interface list membership and other things are messed up, the firewall drops everything, and containers are recreated but refuse to start, to name just what I noticed from console/serial access.

In the past few days, I attempted to import the exported configuration into the new “identical” router running the same ROS version. However, I encountered several issues that caused the import to stop:

  • The old CCR2004 apparently had two serial ports, while the new CCR2004 has only one. I had to remove the configuration lines referring to the second /port.
  • The old CCR2004 included references to paths like /usb/ros-data/logs/diskFirewall, but the new CCR2004 no longer accepts /usb/… and instead requires usb/…
  • ZeroTier instance names are not restored automatically and must be manually changed before import.
  • Handling users and groups is extremely challenging. My initial idea was to restore the .backup, reset the configuration while preserving users, and then reimport the config. However, this process requires extensive manual adjustments both before and after the import.
  • Containers cannot be recreated from the configuration import as-is, as the container images are not found.
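Two of those fixes are mechanical enough to script before each import attempt. A Python sketch, assuming continuation lines are already joined, that the surplus serial port shows up in the /port section either by its number or as name=... (check /port print on both units; how it appears in your export is an assumption here), and that paths only need the leading slash dropped from /usb/… as described above. ZeroTier names, users and container images still need the manual steps.

```python
import re

# '=/usb/...' or '="/usb/...' -> drop the leading slash, as the new unit requires
USB_PATH = re.compile(r'(?<=[="])/usb/')


def preflight(lines: list[str], missing_ports: set[str]) -> tuple[list[str], list[str]]:
    """Return (fixed lines, notes describing every change made).

    missing_ports holds the port number or name as it appears in the export.
    """
    fixed, notes, section = [], [], None
    for line in lines:
        if line.startswith("/"):  # section header
            section = line.strip()
            fixed.append(line)
            continue
        tokens = line.split()
        # Drop /port entries for serial ports the replacement unit does not have
        if section == "/port" and any(p in tokens or f"name={p}" in tokens for p in missing_ports):
            notes.append(f"dropped: {line}")
            continue
        new = USB_PATH.sub("usb/", line)
        if new != line:
            notes.append(f"path fixed: {new}")
        fixed.append(new)
    return fixed, notes


if __name__ == "__main__":
    sample = [  # hypothetical export fragment
        "/port",
        "set 0 name=serial0",
        "set 1 name=serial1",
        "/system logging action",
        "add disk-file-name=/usb/ros-data/logs/diskFirewall name=diskFirewall target=disk",
    ]
    fixed, notes = preflight(sample, {"serial1"})
    print("\n".join(notes))
```

Printing the notes rather than silently rewriting keeps a record of exactly what differed between the two units, which is useful when the next replacement comes around.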

After multiple attempts and several hours of troubleshooting spread over a few days, I find myself stuck at this point and in need of a solution.

Fortunately, the old router is still partially operational, and the business is temporarily closed. However, I can only imagine the chaos of having to deal with this during a real failure at peak business hours.

Is this a common challenge with all router brands? How do small business owners with MikroTik routers handle disaster recovery?

As a matter of course, I maintain an Excel workbook that tracks every configuration change. This comes out of our change control procedures for all systems. If I had to commission a new router, I’d work through the configuration document. Not ideal, but it would probably get a new router back online in 30 minutes.
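If the workbook is saved as CSV, the change log can be replayed in order instead of being read by hand. A Python sketch with hypothetical column names (date, section, command, note); rows with an empty command are treated as documentation only. Each generated command is still worth reviewing before it is pasted into a new unit.

```python
import csv
import io


def replay_script(csv_text: str) -> str:
    """Order changes by date and group consecutive commands under their section."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as text; sort is stable
    out, current = [], None
    for row in rows:
        command = (row.get("command") or "").strip()
        if not command:
            continue  # documentation-only change, nothing to replay
        if row["section"] != current:
            current = row["section"]
            out.append(current)  # section header, e.g. "/ip dns"
        note = (row.get("note") or "").strip()
        out.append(f"# {row['date']} {note}".rstrip())  # keeps the audit trail
        out.append(command)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (  # hypothetical CSV export of the workbook
        "date,section,command,note\n"
        "2024-03-02,/ip dns,set servers=9.9.9.9,switch DNS\n"
        "2024-01-15,/system identity,set name=core,\n"
    )
    print(replay_script(sample))
```

As noted later in the thread, a log built up across a v6 to v7 upgrade will contain commands that no longer apply as written, so older rows need checking against the current RouterOS version.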

Unfortunately, there is no single MikroTik tool to make a backup on one router and then restore that backup on a replacement router ;(

There are multiple discussions about this issue and its workarounds in the MikroTik forum.

For starters, I suggest always having two backups:

  • 1) A MikroTik binary backup (WinBox → Files → Backup)
  • 2) In the CLI, at the root level, an export to a file
  • 3) Copy the backup and the export to a safe place where you want to keep them.
    ** That way you have backups you can use if/when needed.
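That two-backup routine can be given matching, dated names so each pair is easy to find later, with old copies pruned in the safe place. A Python sketch: the prefix, password and retention count are assumptions, the two commands use RouterOS v7 syntax (/system backup save and /export show-sensitive), and how they reach the router (SSH, scheduler) is left out.

```python
from datetime import date
from pathlib import Path


def backup_commands(prefix: str, day: date, password: str) -> list[str]:
    """RouterOS v7 commands for the dated binary backup and the matching export."""
    stem = f"{prefix}-{day.isoformat()}"
    return [
        f'/system backup save name={stem} password="{password}"',  # device-specific .backup
        f"/export show-sensitive file={stem}",  # readable .rsc script
    ]


def prune(folder: Path, prefix: str, keep: int) -> list[Path]:
    """Delete all but the newest `keep` dated pairs in the safe place; return removed paths."""
    stems = sorted(
        {p.stem for p in folder.glob(f"{prefix}-*") if p.suffix in (".backup", ".rsc")}
    )  # ISO dates in the name make lexical order chronological
    old = stems[:-keep] if keep > 0 else stems
    removed = []
    for stem in old:
        for suffix in (".backup", ".rsc"):
            path = folder / f"{stem}{suffix}"
            if path.exists():
                path.unlink()
                removed.append(path)
    return removed


if __name__ == "__main__":
    for cmd in backup_commands("core-rtr", date.today(), "change-me"):
        print(cmd)
```

Pruning by dated pair, rather than by file age, avoids keeping a .backup whose matching export was already deleted.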

Restoring either backup may run into issues, so read the forums for a procedure.

North Idaho Tom Jones

Yes, in addition to keeping a configuration change document, I usually back up and export after any configuration change. This is a fragment of my home router configuration. It contains enough detail that even somebody not very familiar with RouterOS could follow it:

There is one big flaw with the above, though. Consider a router that was later upgraded from v6 to v7: many of those instructions won’t work as written anymore, especially around Wi-Fi and CAPsMAN. My home router has always run v7, but this topic has prompted me to suggest reconfiguring a couple of client routers from scratch if they’ll allow me the downtime.

http://forum.mikrotik.com/t/router-crashes-are-wiping-the-config/149189/7

At the time there were no containers, but basically you need to back up and sync:

  • files on the device
  • eSIM
  • containers
  • the Dude database
  • the Dude files
  • user-manager database
  • user-manager files
  • certificates (a CA used to generate new certificates no longer works on another router, as expected)
  • SSH host key
  • license
  • configuration export (with sensitive data!!!; it does not contain the user config)
  • user export (even with sensitive data, passwords are still not exported, obviously)
  • binary backup (containing everything except the license, containers and files)

Not tested: whether the eSIM config is included in the backup and/or the export.
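A quick way to make sure nothing on that list is forgotten is to check each nightly bundle against it. A Python sketch; the folder layout and file names are a hypothetical convention, so map each item to whatever your own job actually writes, and add entries for the Dude, user-manager and eSIM if you use them.

```python
import sys
from pathlib import Path

# Hypothetical bundle layout: one glob pattern per checklist item.
REQUIRED = {
    "binary backup": "router.backup",
    "configuration export": "config.rsc",
    "user export": "users.rsc",
    "certificates": "certs/*",
    "ssh host-key": "ssh/*",
    "files on device": "files/*",
    "containers": "containers/*.tar",  # e.g. saved image tarballs (assumption)
    "license": "license/*",
}


def missing_items(bundle: Path) -> list[str]:
    """List checklist items with no non-empty matching file in the bundle."""
    return [
        item
        for item, pattern in REQUIRED.items()
        if not any(p.is_file() and p.stat().st_size > 0 for p in bundle.glob(pattern))
    ]


if __name__ == "__main__":
    gaps = missing_items(Path(sys.argv[1] if len(sys.argv) > 1 else "."))
    print("bundle complete" if not gaps else "missing: " + ", ".join(gaps))
```

Treating empty files as missing catches the common failure where a nightly job runs but writes nothing.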