Best practices for keeping RouterOS updated on production networks?

Hi everyone,

I'm curious about how others handle RouterOS updates on devices that are actively being used in production environments.

Do you typically install stable updates as soon as they're released, or do you wait a certain amount of time to see if any issues are reported by the community first?

I'm especially interested in hearing about update strategies for small business and home-office setups where downtime needs to be minimized. Do you have a testing process, backup routine, or rollback plan before upgrading?

Looking forward to hearing how others approach this.

Thanks!

Hi,

There's no grand unified theory of how to update, but I think most agree with these points.

Preparation:

  • always be prepared for something to go wrong, up to and including a netinstall
  • along the same lines, I always create both a binary backup (files->backup) and a text export (/export show-sensitive file=whatever.rsc) and copy them off the router. I also always put the exact RouterOS version they were made on in the filename: if a restore becomes necessary, it's most likely to work on that exact same version
  • back up any keys and certificates that may be important to you
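A minimal Python sketch of the "version in the filename" idea, assuming you want the device identity, exact RouterOS version and date in each name (that naming scheme is just one hypothetical convention, not anything MikroTik requires). It only builds the RouterOS v7 command strings for the binary backup and the text export; you still run them on the router and copy the files off it yourself.

```python
from datetime import date


def backup_basename(identity: str, version: str, day: date) -> str:
    """Build a file basename recording device, exact RouterOS version and date.

    The naming scheme is a hypothetical convention, not a MikroTik requirement.
    """
    # Spaces and slashes are awkward in file names, so replace them.
    safe_identity = identity.replace(" ", "-").replace("/", "-")
    return f"{safe_identity}_v{version}_{day.isoformat()}"


def backup_commands(identity: str, version: str, day: date) -> list[str]:
    """Return RouterOS v7 commands for a binary backup and a full text export."""
    name = backup_basename(identity, version, day)
    return [
        # Binary backup; RouterOS adds the ".backup" extension to the name.
        f"/system backup save name={name}",
        # Text export including secrets; RouterOS adds the ".rsc" extension.
        f"/export show-sensitive file={name}",
    ]


if __name__ == "__main__":
    for cmd in backup_commands("office gw", "7.19.1", date(2025, 6, 1)):
        print(cmd)
```

Using the same basename for both files makes it obvious which export belongs to which binary backup, and which version to netinstall before restoring.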

This sounds dire, but in my experience, and especially for simpler setups, updates rarely go very wrong.

On version selection:

  • don't install beta/rc in production unless absolutely necessary
  • I would suggest the long-term branch: it doesn't include the latest features, but it's guaranteed to receive security-related updates
  • regardless of whether you choose stable or long-term, it's best to wait a few days before installing a new release and to at least skim its release topic on the forum for any issues that come to light
  • some people wait for at least the ".1" version to install a stable version (so they don't install 7.19, but wait for 7.19.1)
  • there is absolutely no rule that says you must run the latest version; if one works for you, there may be no reason to update
  • especially if you're not on the latest version of your chosen release channel, keep an eye on CVEs, but don't be scared: vulnerabilities usually affect a single, often rarely used feature that you may not even have enabled, and mitigations are often available
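The version-selection rules of thumb above can be sketched as a small Python check. This is a minimal sketch under stated assumptions: the version-string pattern ("7.19", "7.19.1", "7.20beta2", "7.20rc1") is assumed rather than an official grammar, the 7-day default stands in for "a few days", and waiting for the ".1" release is optional because only some people do it.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Assumed shape of version strings: "7.19", "7.19.1", "7.20beta2", "7.20rc1".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(beta|rc)(\d+))?$")


def parse_version(text: str) -> tuple[int, int, int, Optional[str]]:
    """Return (major, minor, patch, prerelease_tag); tag is None for releases."""
    m = VERSION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch, tag, _ = m.groups()
    return int(major), int(minor), int(patch or 0), tag


def ok_for_production(version: str, released: date, today: date,
                      min_age_days: int = 7,
                      require_patch: bool = False) -> bool:
    """Apply the thread's rules of thumb to one candidate release."""
    _, _, patch, tag = parse_version(version)
    if tag is not None:
        return False  # no beta/rc in production
    if require_patch and patch == 0:
        return False  # optionally wait for the ".1" release
    # Give the community a few days to report problems first.
    return today - released >= timedelta(days=min_age_days)
```

For example, `ok_for_production("7.19.1", date(2025, 5, 20), date(2025, 6, 1))` passes (12 days old), while any beta/rc never does. On the router itself, the channel is picked with `/system package update set channel=long-term` (or `stable`).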

Additionally:

  • don't forget to upgrade the RouterBOARD firmware as well (system->routerboard); the new firmware is only used after a reboot
  • if you're administering more than one device, try to keep them on the same version (of course, this doesn't mean staggered updates are a bad idea)
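A minimal Python sketch of both checks above, assuming you paste in the text printed by `/system routerboard print` (which shows `current-firmware` and `upgrade-firmware`) and keep your own device-to-version inventory (the inventory format here is hypothetical). It only tells you which commands are still needed; note that `/system routerboard upgrade` asks for confirmation at the console.

```python
def parse_routerboard(text: str) -> dict[str, str]:
    """Parse the 'key: value' lines printed by `/system routerboard print`."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def firmware_steps(text: str) -> list[str]:
    """Return the commands still needed to bring RouterBOARD firmware up to date."""
    info = parse_routerboard(text)
    current = info.get("current-firmware")
    upgrade = info.get("upgrade-firmware")
    if current is None or upgrade is None:
        raise ValueError("current-firmware / upgrade-firmware not found in output")
    if current == upgrade:
        return []  # firmware already matches what the installed RouterOS offers
    # The new firmware is only used after the next reboot.
    return ["/system routerboard upgrade", "/system reboot"]


def versions_in_fleet(fleet: dict[str, str]) -> dict[str, list[str]]:
    """Group device names by RouterOS version; more than one key means a mixed fleet."""
    groups: dict[str, list[str]] = {}
    for device, version in sorted(fleet.items()):
        groups.setdefault(version, []).append(device)
    return groups
```

A single key in the result of `versions_in_fleet` means everything runs the same version; more than one key is fine during a staggered rollout, but shouldn't become permanent.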

Oh, and let's not forget:

  • if your device has enough flash (at least 128 MB), you can use partitions; they're the best way to roll back quickly

And don't forget: on boards with LTE modems, you can upgrade the modem firmware as well.

In addition, if downtime absolutely has to be minimized, have two routers. Update one of them first and make sure everything is working properly before updating the second. That way you can switch over to the backup (working) router while you get the failed one working again.
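The two-router approach boils down to "upgrade one, verify, only then continue". A minimal Python sketch of that ordering, where `upgrade` and `healthy` are hypothetical hooks you would supply yourself (doing it by hand in WinBox, a ping/traffic test, etc.); nothing here talks to a router.

```python
from typing import Callable


def staged_upgrade(routers: list[str],
                   upgrade: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> list[str]:
    """Upgrade routers one at a time and stop at the first failed health check.

    `upgrade` and `healthy` are hypothetical hooks supplied by the caller.
    Returns the routers that were upgraded and passed the check.
    """
    done: list[str] = []
    for name in routers:
        upgrade(name)
        if not healthy(name):
            # Leave the remaining routers untouched so traffic can stay on them.
            break
        done.append(name)
    return done
```

For example, if the first router fails its check, the loop stops before touching the second, which keeps running the old, known-good version.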

This might not be a common arrangement, but sometimes backup hardware is the way to go.

  1. If it works, don't fix it (i.e. update only for a real security reason or a critical bug in a feature you actually use)
  2. Updating production automatically, en masse, as soon as a release comes out is one of the stupidest things you can do.
  3. Try to have the same version everywhere... That way you fight the same bugs everywhere, and everything behaves uniformly.
  4. If you update the CPEs... update the closest ones first, then wait a month, and if everything stays stable, widen the radius until you've gradually updated them all.
  5. "Upgrade" non-CPE devices by physically replacing them with devices that are already installed and configured...
    You always have spare devices... right?
  6. Test for at least a full month before putting it into production, only on non-critical machines and with the replacement left in place...
  7. Stop by the forums: there's always some idiot who immediately pushes the latest builds into production, and you can tell they haven't tested them for even 5 minutes... They run one device for 1 minute, then push the update to 40... and 5 minutes later every device has crashed... (recent events)...
  8. Don't trust users who write bulleted lists
  9. Don't trust anyone who tells you not to trust someone.
  10. Wait for 7.23.5 (long-term)
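Rule 4 (closest CPEs first, wait a month, then widen the radius) can be sketched as a wave planner. This is a minimal Python sketch assuming distance is a number per CPE (km here, but hops work the same way) and that "a month" means a 30-day soak; the ring width and CPE names are hypothetical. Each wave's date is only the earliest start, and assumes all previous waves stayed stable.

```python
from datetime import date, timedelta


def rollout_waves(cpe_distance: dict[str, float], ring: float) -> list[list[str]]:
    """Group CPEs into rings by distance; the first wave is the closest ring."""
    buckets: dict[int, list[str]] = {}
    for name, dist in cpe_distance.items():
        buckets.setdefault(int(dist // ring), []).append(name)
    # Empty rings are skipped, so waves stay contiguous.
    return [sorted(buckets[k]) for k in sorted(buckets)]


def schedule(waves: list[list[str]], start: date,
             soak_days: int = 30) -> list[tuple[date, list[str]]]:
    """Earliest start date per wave, assuming every earlier wave stayed stable."""
    return [(start + timedelta(days=i * soak_days), wave)
            for i, wave in enumerate(waves)]
```

With `ring=1.0`, CPEs at 0.5, 0.8, 1.2 and 3.1 km end up in three waves, 30 days apart. If a wave misbehaves, you stop there and the remaining waves keep their old version.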

LOL!
:rofl:

You could surely write a "how to write (MikroTik scripts) well" guide, like Umberto Eco did for Italian:
https://www.aessecommunication.it/regole-umberto-eco-scrivere-bene/
(here is an English version):
https://gioclairval.blogspot.com/2010/02/umberto-ecos-rules-for-writing-well.html

I would generally agree with this if MT were fully transparent in its changelogs, i.e. if they stated when a version contains a security fix; they wouldn't need to say which one, or whether it was already public (0-day, CVE, etc.). Instead, we usually get ambiguous changelog entries like "improved/fixed service stability". Meanwhile, MT staff say on the forum things like "Keep ROS up to date to have the latest security patches", referring to security patches never mentioned in the changelogs. By that logic, every new version contains a security fix...

rule 34

  1. Do not indulge in archaic forms, hapax legomena and other unused lexemes, nor in deep rhizomatic structures which, however appealing to you as epiphanies of the grammatological differance (sic), inviting to a deconstructive tangent – but, even worse it would be if they appeared to be debatable under the scrutiny of anyone who would read them with ecdotic acridity – would go beyond the recipient's cognitive competencies.

causes "System Error, shutting down, please restart"
:slight_smile:

- Keep RouterOS updated to the latest version (v7) :stuck_out_tongue: