v7.18.2 [stable] is released!

I believe that a (better) system should require a “commit” action to save changes. When making many configuration changes - such as adjusting firewall rules extensively - it is the final result that matters, not the many steps taken to get there. If the end result is just three changes to existing rules, the system should not save every intermediate step automatically.
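
To make the idea concrete, here is a minimal Python sketch of a candidate/commit model. It is only a toy, not how RouterOS or any other vendor works internally; the class `StagedConfig` and the rule names are hypothetical. Edits go to a working copy, and only the net difference is recorded when `commit()` is called.

```python
class StagedConfig:
    """Toy model: edits go to a candidate copy; commit() applies the net result."""

    def __init__(self, running):
        self.running = dict(running)      # active configuration
        self.candidate = dict(running)    # working copy being edited
        self.history = []                 # one entry per commit, not per edit

    def set(self, key, value):
        # Intermediate edits only touch the candidate copy.
        self.candidate[key] = value

    def pending(self):
        """Return only the keys whose final value differs from the running config."""
        keys = set(self.running) | set(self.candidate)
        return {k: (self.running.get(k), self.candidate.get(k))
                for k in keys
                if self.running.get(k) != self.candidate.get(k)}

    def commit(self):
        # Record the net change once, then make the candidate the running config.
        changes = self.pending()
        if changes:
            self.history.append(changes)
            self.running = dict(self.candidate)
        return changes

    def rollback(self):
        # Discard all uncommitted edits.
        self.candidate = dict(self.running)


# Hypothetical rules: several intermediate edits, one history entry at the end.
cfg = StagedConfig({"rule1": "established", "rule2": "established"})
cfg.set("rule1", "new")
cfg.set("rule1", "related")   # intermediate step, later overwritten
cfg.set("rule1", "new")
cfg.set("rule2", "new")
net_changes = cfg.commit()
```

Four edits were made, but the history holds a single entry describing two net changes, so one rollback would revert the whole batch.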

When I updated four mangle rules today using “/ip/firewall/mangle/set connection-state=new [find where action=mark-connection]”, it created four entries in the history. So far so good. Afterwards, I executed “/undo” four times, but only three changes were reverted. The first of the four mangle rules remained in its last state, although “/system/history/print” reported all four as reverted. In situations like this, I find these “auto-commits” quite frustrating.

Hi guys,

You can use safe mode here.
You can undo all changes in one step.

I was about to say that but it has a limit of 100 commands.

It would be enough if the safe mode was automatic(*) and asked for confirmation of changes before exiting…

(*) Increased from 100 to ad libitum

I have to say, something similar happened about one or two years ago while I was in safe mode. I enabled safe mode, made at most 10 changes or so, and then left safe mode. It undid most, but not all, of the changes I had made while in safe mode. I guess that is because safe mode uses the undo/history system to accomplish the task. This is why I do not use safe mode. I was so baffled by this experience that I tried to find a step-by-step way to reproduce the bug and report it to Mikrotik support, but I was not able to find a simple command sequence that triggers it.

There is no way to compare commit, commit trial, commit dry-run, commit full, rollback… with that thing called “safe-mode”.

I have used safe mode without any issues, but it has its limitations, for example bridge changes that lead to short disconnects. However, if I have to do something that is really … dicey, I make a full binary backup and then schedule a restore via the scheduler far enough in advance. Another way for very important systems is to use partitions: keep a fallback partition and schedule a reboot task far enough in advance to leave time for the changes. ALWAYS back up before any significant change.

Yes, safe mode can be seen as a kind of quasi-commit system, but it is not equivalent to one.
Fortinet-like configuration integrity enforcement obviously creates some overhead, and with our package sizes constrained by the ingenious “16 MB should be enough for everything” strategy, that can be a problem. A typical Fortigate install image can be up to 100 MB in size, but then again a Fortigate has many functions that a basic router like a Mikrotik does not have…

A crucial difference between a transaction-like system and both “Safe mode” and “apply changes only in RAM and require explicit save” is that with a transaction-like system you can enter a series of configuration changes and do an APPLY at the end of it.
It is possible in RouterOS cmdline mode, by typing a { and then entering multiple commands, close with } and then they are applied.
But in some other systems the same functionality is available in the GUI.
You change a couple of things, and there is a TEST button, when you click that all open changes are applied, and when you do not hit a KEEP button within 3 minutes they are all rolled back.
(so when you make a change that locks you out, they are discarded)
Better than the “Safe mode” of RouterOS, because that immediately rolls back changes when your session disconnects, which may be expected to happen e.g. when you change something in the VPN that you are using to access the device.
In the mentioned method you can re-connect and when that succeeds (within 3 minutes) you can save the changes.
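
The TEST/KEEP behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class `ConfirmedApply`, its method names and the injected clock are all hypothetical, and the 180 seconds stand in for the 3 minutes mentioned. The point is that the rollback is tied to a timer rather than to the management session, so a dropped and re-established connection does not discard the changes.

```python
import time


class ConfirmedApply:
    """Toy model of TEST/KEEP: applied changes roll back unless confirmed in time."""

    def __init__(self, running, timeout=180.0, clock=time.monotonic):
        self.running = dict(running)
        self.timeout = timeout
        self.clock = clock       # injectable so the example can fake time
        self._saved = None       # snapshot taken when TEST is pressed
        self._deadline = None

    def test(self, changes):
        """Apply changes now and start the confirmation window."""
        self._saved = dict(self.running)
        self.running.update(changes)
        self._deadline = self.clock() + self.timeout

    def tick(self):
        """Called periodically; rolls back once the deadline has passed."""
        if self._deadline is not None and self.clock() >= self._deadline:
            self.running = self._saved
            self._saved = None
            self._deadline = None

    def keep(self):
        """Confirm the changes; False if the window expired or nothing is pending."""
        self.tick()
        if self._deadline is None:
            return False
        self._saved = None
        self._deadline = None
        return True


# Fake clock: the session drops, the operator reconnects after one minute.
now = [0.0]
box = ConfirmedApply({"vpn": "old"}, timeout=180.0, clock=lambda: now[0])
box.test({"vpn": "new"})
now[0] = 60.0
kept = box.keep()   # still inside the window, so the change is kept
```

If `keep()` is not called before the deadline, the next `tick()` restores the snapshot taken at `test()` time.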

I was already giving the same advice a few years ago, it’s how I work…

  1. If it’s a bug like the one reported above(¹), you’re fu–ed anyway…

  2. Make a binary backup

  3. Make one export

  4. Schedule the reload on A) REBOOT and on B) after xx minutes
    (It depends on how long you think it will take to make the changes, usually 10 minutes is enough)
    NOTE: This does not cause a reboot loop when the backup is reloaded, because the scheduled reload at reboot is not present in the restored binary backup.

  5. Do the work

  6. Make again one export

  7. Compare the previous and next exports to review the differences

  8. If all is OK, remove the scheduled reload from point 4
    (¹) https://forum.mikrotik.com/viewtopic.php?p=1131405#p1131111
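
For the export comparison step, any diff tool will do. As a minimal sketch, assuming both exports have been copied off the router as text, Python's standard `difflib` can show only the lines that changed. The file names and the two sample export snippets below are made up for illustration.

```python
import difflib


def export_diff(before, after):
    """Return unified-diff lines between two configuration exports."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before.rsc", tofile="after.rsc",  # hypothetical file names
        lineterm="", n=0))                          # n=0: no context lines


# Made-up sample exports, before and after the work.
before = """/ip firewall mangle
add action=mark-connection chain=prerouting connection-state=established
"""
after = """/ip firewall mangle
add action=mark-connection chain=prerouting connection-state=new
"""

# Keep only removed/added lines, dropping the "---"/"+++" file headers.
changed = [line for line in export_diff(before, after)
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]

for line in changed:
    print(line)
```

An empty `changed` list after the work would mean the two exports are identical.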

I have added a Linux VM with BIND9 as a resolver and offer it as a second DNS resolver to DHCP clients.
What I see on this machine is that while Windows machines usually use UDP queries, at some point a Windows machine decides to switch to TCP and sets up several (3 in most cases) TCP sessions to port 53 to do its queries. After a while they are disconnected again.
I guess that Windows does that when it receives no reply to a UDP query in time, which can happen in DNS for some domains (slow server).
Anyway, the resolver in the router has to be prepared to service a large spike in TCP queries, certainly after reboot, but it could occur at any time.
Please remove any restriction related to “possible SYN flooding on tcp port 53”. I would say “on local networks”, but of course that is difficult to determine.

This sounds as horrible as the old technique for operating old Cisco IOS without commit: you would schedule a reboot, and if by the time of the reboot you had neither cancelled it nor done a "copy running-config startup-config", the scheduled reboot might save you.
I also used TCL scripts for situations where I needed to change interface IPs and would lose access while the commands ran... A huge GAMBIARRA (kludge).

I think it's worth considering the possibility of having 2 options...

  • The first being "safe-mode" centric, as most current MikroTik users are used to.
  • The second being "commit/rollback" centric, as most users of more robust equipment are used to.

Just to give you an example, Arista's EOS gives you 2 options!

  • configure terminal, where the commands take effect immediately after being applied.
  • configure session, where the commands only take effect at the time of commit.

Why can't MikroTik consider a way to deliver both possibilities and leave it up to the operator?

FYI for those using RSTP. There appears to be a bug in the switch code for the QCA8337 switch used in the RB750Gr2. Since at least v7.16.2 it has had a habit of passing received BPDU packets through the switch to other connected devices. For v7.17 a workaround is to disable hardware switching and use the software bridge and Fast Forwarding instead. This workaround appears to break in v7.18.1, along with other related quirks.
I’m discussing this at length with MikroTik support in SUP-179002, but thought the community should know.

Update: see https://forum.mikrotik.com/search.php?author_id=45699&sr=posts

That switch is used on:
RB750Gr2 (hEX)
RB962UiGS-5HacT2HnT (hAP ac)
RB960PGS (hEX PoE)
RB960PGS-PB (PowerBox Pro)
RB3011 (all series)
RB OmniTik ac (all series)

I have not made any attempt to see if it’s an oversight exclusive to the RB750Gr2 or if the flaw is present on all QCA8337-based devices. But if you are using RSTP and having problems with flapping ports that are connected to any of the devices rextended mentioned above, you may be suffering from the same problem I’m seeing. In my case, it was an RB2011 that was flapping the connection to an RB750Gr2. Wireshark proved that BPDU packets from the other side of the RB750Gr2 were reaching the RB2011 along with the proper BPDU packets originating from the RB750Gr2. When the RB2011 was running v7.13.5, this did not bother the RB2011. But once it was upgraded to v7.16.2 or higher, the mix of BPDU packets caused the port on the RB2011 to fail to decide on an RSTP role.

Could this be related? http://forum.mikrotik.com/t/connection-to-capsman-suddenly-interrupted/182482/1

I have a bunch of LTAP mini units with the EU200A modem, which 7.18 and 7.18.1 broke.
I have a couple of setups, and some of them use PAP authentication to a special APN.
When I use the network default APN, or no authentication, the connection is fine.
However, when I try to use LTE + APN + authentication, the device cannot obtain an IP address on 7.18.x.
The last version I consider stable is 7.16.2. So far at least 20 devices in my stock show the same issue with 7.18.x, while after downgrading to 7.16.2 everything works as expected.

Can you explain why you dropped the AGE property from /interface/bridge/host print in ROS v7.7?

Your D53G-5HacD2HnD logs look a lot like the kind of mess I was seeing in my logs on the RB2011. Note that the RB2011 was not directly to blame. By upgrading it to v7.16.2 or higher, it became intolerant of the problem that was actually in another box. Look at the hardware connected to ether2 on your D53G-5HacD2HnD.
More in your topic.

I only experienced it in 7.18.1 using RB5009UG+S+IN, CCR2004-1G-12S+2XS and CCR2216-1G-12XS-2XQ.