v7.18.2 [stable] is released!

philipwillemse · February 27, 2025, 2:49pm

Thank you for the feedback. So you had similar issues, and when you downgraded to 7.15.3 the issues went away permanently?

pe1chl · February 27, 2025, 2:57pm

I have all kinds of BGP issues that were introduced in 7.16; I reported them, but they are not fixed yet.
In version 7.15.x it worked much better. But I cannot downgrade because I require other fixes.

philipwillemse · February 27, 2025, 3:03pm

Thank you for the feedback. I am going to wait for MikroTik to respond before updating, as updating might introduce other issues, and downgrading might not fix the issues I am having, where pings to a PPPoE client stop working, my frontend goes blank, etc.

oreggin · February 27, 2025, 3:34pm

And there is a good reason for this. I use BTRFS only where there is no need for FS-level RAID, or at most I use RAID0 for data and RAID1 for metadata if data loss is acceptable. If I read correctly, the NVMe disks are connected directly to the PCIe fabric, so there is no HW RAID controller in this pizza box. I don’t know the ROSE storage solution: what kind of options do we have if we want RAID-like operation? BTRFS RAID is unfortunately a dead end, although RAID0 or RAID1 is usable with some constraints, but if I need RAID5 or RAID6 for some reason, how could I do that without ZFS?

For comparison, I have a NAS at home with four 8TB disks in a RAID5 setup. I clung to BTRFS for a long time, until I started a scrub on it and it told me it would take more than a week. I copied all data off that volume, recreated it with ZFS RAIDZ and copied all data back. Now, if I start a scrub, it takes 9 hours on the same HW. Not to mention that ZFS is much more mature.

Paternot · February 27, 2025, 4:39pm

I don’t KNOW what ROSE uses, but I do know that the Linux kernel has had support for software RAID (mdraid) for quite some time already. It’s stable, it’s mature and it’s dependable. Can’t say if they use it, but it’s (it would be?) an easy “turn-key” solution.

pe1chl · February 27, 2025, 5:03pm

Disadvantage of kernel RAID: when a single block error occurs, the entire device is removed from the array and no longer updated. So when you have two disks in RAID-1, each with a block error at a different location, you lose all your data. BTRFS raid1 does not have that problem: it keeps both disks online and combines the good blocks from each of them, without losing any data.
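A toy Python model of the two behaviours described above, whole-device ejection versus per-block fallback. It encodes the behaviour exactly as stated in the post, which is an assumption: real md and btrfs error handling is more nuanced, and none of these functions correspond to kernel code.

```python
# Toy model only: a "disk" is a list of blocks, None marks an unreadable block.


def whole_device_mirror(disks):
    """Ejection model: any member with a read error is dropped from the array,
    and only fully healthy members can serve data."""
    healthy = [d for d in disks if None not in d]
    if not healthy:
        return None  # every member was ejected: the array's data is gone
    return list(healthy[0])


def per_block_mirror(disks):
    """Per-block model: each block is read from whichever copy is still good."""
    result = []
    for copies in zip(*disks):
        good = [b for b in copies if b is not None]
        result.append(good[0] if good else None)
    return result


# Two mirrored disks, each with one bad block at a different position.
disk_a = ["b0", None, "b2", "b3"]
disk_b = ["b0", "b1", "b2", None]

print(whole_device_mirror([disk_a, disk_b]))  # None
print(per_block_mirror([disk_a, disk_b]))     # ['b0', 'b1', 'b2', 'b3']
```

A block is only lost in the per-block model when every copy of that same block is bad, which is why two errors at different locations are survivable there.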

Advantage of kernel RAID: you can remove a disk from a RAID-1 and replace it, or even put the same disk back, and it will automatically copy the data without much hassle. In BTRFS that is extremely difficult: you cannot remove a disk from a 2-disk RAID-1; you first need to downgrade it to SINGLE, then you can add or re-add the disk and convert it to RAID-1 again (which will copy the data). But when the system cannot read a block from the disk while copying over, you are in trouble, even though the block likely still exists on your “broken” disk.

In the scenario where a disk goes “soft” offline (which has happened to me twice now with one of the M.2 WD_BLACK SN850X devices in my home system), i.e. it becomes inaccessible but is not broken, an extremely dangerous situation occurs: when the system is rebooted and the disk comes back online, the filesystem is totally toast, because one of the members has not been kept up to date for a while and there is NO WAY to tell BTRFS that you want to re-sync that device, or to stop using it for a while. The only thing that is possible is: physically remove the M.2 that was down, boot the system from USB, mount the filesystem(s) with “-o degraded”, convert the RAID1 on the remaining device to SINGLE, shut down and put the M.2 back in, boot the system from USB again, clear the partition, then add the partition to the filesystem again and convert back from SINGLE to RAID1.
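The recovery sequence above, written out as a dry-run Python sketch for a generic Linux system with btrfs-progs (this was a home system, not RouterOS). It only builds and prints command strings; nothing is executed. The device paths and mount point are hypothetical placeholders, and the `btrfs device remove missing` step is an assumption that is not part of the description above.

```python
# Dry-run sketch: builds shell commands, never runs them.
# Device paths and the mount point are hypothetical placeholders.


def recovery_plan(survivor, returning, mnt="/mnt"):
    """Return (before_reinsert, after_reinsert) command lists."""
    before_reinsert = [
        # The M.2 that went offline has already been physically removed.
        f"mount -o degraded {survivor} {mnt}",
        # RAID1 -> SINGLE on the remaining device; -f is needed because
        # metadata redundancy is being reduced.
        f"btrfs balance start -f -dconvert=single -mconvert=single {mnt}",
        # Assumption: the absent member may also have to be dropped explicitly.
        f"btrfs device remove missing {mnt}",
        f"umount {mnt}",
    ]
    after_reinsert = [
        # "Clear the partition": wipe the stale btrfs signature on the returning disk.
        f"wipefs -a {returning}",
        f"mount {survivor} {mnt}",
        f"btrfs device add {returning} {mnt}",
        # SINGLE -> RAID1 again; the balance copies the data onto the re-added disk.
        f"btrfs balance start -dconvert=raid1 -mconvert=raid1 {mnt}",
    ]
    return before_reinsert, after_reinsert


if __name__ == "__main__":
    first, second = recovery_plan("/dev/nvme0n1p2", "/dev/nvme1n1p2")
    print("# after removing the offline M.2, booted from USB:")
    print("\n".join(first))
    print("# after shutting down, reinserting it and booting from USB again:")
    print("\n".join(second))
```

Keeping it as a printed plan rather than executing anything makes it easy to review each step against the actual devices first.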

It is totally broken. There should be a sequence number on each device: when one of the devices in a RAID1 becomes inaccessible, the number should be incremented on the other one, so that when the system boots with both available again, it knows which one is more up to date and re-syncs the other. I think kernel RAID does that, but BTRFS RAID certainly does not. That makes it unusable without a watchful eye. It is also insane that it does not allow removing an (inaccessible) disk from a configured RAID-1, effectively bringing it back to SINGLE. At least kernel RAID does allow that.
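The sequence-number proposal above, as a toy Python model. `Member`, `Mirror`, `drop` and `assemble` are hypothetical names for illustration only; this is not how md or btrfs implement anything.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    generation: int = 0  # the hypothetical per-device sequence number
    online: bool = True


class Mirror:
    """Toy RAID1 illustrating the proposal; not md or btrfs code."""

    def __init__(self, *names):
        self.members = [Member(n) for n in names]

    def drop(self, name):
        """A member becomes inaccessible: bump the counter on the survivors."""
        for m in self.members:
            if m.name == name:
                m.online = False
        for m in self.members:
            if m.online:
                m.generation += 1

    def assemble(self):
        """At boot with all members present again: the highest counter wins and
        every member with a lower counter is re-synced from it.
        Returns the names of the members that needed a re-sync."""
        newest = max(m.generation for m in self.members)
        stale = [m.name for m in self.members if m.generation < newest]
        for m in self.members:
            m.online = True
            m.generation = newest  # stands in for copying the data over
        return stale


mirror = Mirror("nvme0", "nvme1")
mirror.drop("nvme1")      # nvme1 goes "soft" offline
print(mirror.assemble())  # ['nvme1'] -> nvme1 is the one to re-sync
```

With the counter stored on each device, the decision at boot needs no operator input: the stale member is identified by its lower number instead of being trusted as an equal copy.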

From what I understand the U.2 slots in the ROSE storage device are hot-swappable. That is a disaster waiting to happen with BTRFS.

sirbryan · February 27, 2025, 5:34pm

@pe1chl, I have an idea. How many peers do your routers have?

I think some of the BGP bugs in 7.15.x and 7.16.x have to do with the number of peers, along with the proclivity for routes to come and go.

For example, my borders peer with five different upstream peers and three downstream peers, as well as with each other. Routes are coming and going as network conditions change, both internally and across the world. Since all of them announce everything to each other, that’s a lot of updates across 8-10 peers, and with 7.16.x I found many routes in the RIB didn’t match what was actually happening (“stuck routes”). I’d eventually have to reboot the router because traffic would loop or stall, or the router itself began to slow down.

The internal routers, on 7.15.3, were running into memory issues and randomly rebooting every week or so. Upgrading them to 7.16.2 solved that problem. Routes don’t seem to get stuck, despite internal network changes (60GHz links going up and down, other random wireless drops). But then, each of those routers only has two peers: the route reflectors.

My route reflectors were also showing odd behavior on 7.16.x; since I moved them to 7.15.3, they’ve been pretty solid, and their memory utilization is holding at 280MB (they’re CHRs with 1GB of RAM).

Bottom line is that 7.16.x with a low number of BGP peers seems to work fine, and 7.15.3 with lots of peers seems to work fine, but uses more memory than 7.16.x on small-memory routers/switches.

oreggin · February 27, 2025, 5:41pm

As far as I remember, I have replaced a faulty disk in BTRFS only once: there is a replace function, so there is no need to convert anything.
https://wiki.tnonline.net/w/Btrfs/Replacing_a_disk

pe1chl · February 27, 2025, 5:45pm

But you cannot replace a disk with the same disk, that is the problem.

toxicfusion · February 27, 2025, 5:49pm

v7.18 “stable” is published, and with it:
The ROSE Data Server Confluence page was published 1 hour ago:
https://help.mikrotik.com/docs/spaces/UM/pages/298975330/ROSE+Data+Server
The device supports RouterOS software with version v7.18 or above.

I stand validated regarding the point I made about the rushed release of v7.18 to support their RDS hardware announcement.

Also, BTRFS is dead, a PITA for managing groups of disks. MikroTik, if you’re F(&*& listening… implement OpenZFS instead of BTRFS… It will be better received, with more support and more benefits.

MikroTik will also need to create a way, within the Winbox UI or otherwise, to do DISK MANAGEMENT: functions for adding disks to RAID or, if they go with ZFS instead, for creating vdevs and zpools. Without proper disk management, this is DOA. … And what about blinking a disk’s LED to locate it? If you want to be a NAS, you need the features. DOA DOA

RouterOS == stick to routing.
StorageOS [ROSE] == focus on storage.

The idea of combining solutions is cool… but you need to friggen up your game on software quality; this is bullshit, MikroTik. Trying to do what Synology and QNAP have done on their NAS for YEARS [DHCP server, AD, etc.].

What about AD/LDAP join for user management? Enterprise my ass.

Also, I concur with the question of how they will prioritize routing processing over storage; renice -19 is an idea. Or dedicate CPU resources to specific items, similar to what Palo Alto does…

nipfel · February 27, 2025, 5:55pm

I managed to reproduce the lockout issue.
There was the default rule:
chain=input action=drop in-interface-list=!LAN log=no log-prefix=""
I was not able to log in with Winbox through the input chain until I added another, exactly identical rule right after that default one.
Adding supout here. I believe that there is a hidden issue with defaults.
supout.rif.zip (278 KB)
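Why the duplicate rule should make no difference can be shown with a toy first-match model in Python. It is an assumption-laden sketch, not RouterOS code: only the single quoted rule is modelled, the interface names `bridge` and `ether1` are hypothetical, and a packet matching no rule is treated as accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    chain: str
    action: str             # "accept" or "drop"
    in_interface_list: str  # list name; a "!" prefix negates it


def matches(rule, chain, iface, lists):
    if rule.chain != chain:
        return False
    negated = rule.in_interface_list.startswith("!")
    members = lists.get(rule.in_interface_list.lstrip("!"), set())
    return (iface in members) != negated


def evaluate(rules, chain, iface, lists):
    """First matching rule decides; if nothing matches, the packet is accepted."""
    for rule in rules:
        if matches(rule, chain, iface, lists):
            return rule.action
    return "accept"


# Hypothetical interface names; only the rule itself comes from the report.
LISTS = {"LAN": {"bridge"}}
DEFAULT = [Rule("input", "drop", "!LAN")]
DUPLICATED = DEFAULT + [Rule("input", "drop", "!LAN")]

for rules in (DEFAULT, DUPLICATED):
    print(evaluate(rules, "input", "bridge", LISTS),
          evaluate(rules, "input", "ether1", LISTS))
# accept drop  (for both rule sets)
```

Under first-match semantics a second identical drop rule can never be reached by a packet the first one did not already drop, so a changed outcome after adding it points at something outside this simple model, which fits the suspicion of a hidden issue with the defaults.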

oreggin · February 27, 2025, 6:10pm

Yes, it needs hotplug, but I dunno if it is supported in RDS. Enterprise storage has hotplug support, so the crew can replace faulty disks on the fly.

Larsa · February 27, 2025, 6:14pm

@oreggin: the wiki describes a planned disk replacement, but @pe1chl ran into an unexpected failure. Since Btrfs RAID1 doesn’t resync automatically, it might be unreliable when disks go offline (kernel RAID handles resyncing differently).

Beyond what @pe1chl mentioned, there are plenty of other risks that haven’t even been brought up in the forum. With no recovery or repair kits available and a lack of documentation, one might wonder whether MT did any real risk assessment at all before choosing Btrfs and releasing the RDS2216 (prematurely, IMO).

Two essential questions MT must be able to answer:

What happens if the Btrfs file system fails?
How do you perform a full system backup?

Check this out: http://forum.mikrotik.com/t/newsletter-123-february-2025/182258/23

sinisa · February 27, 2025, 6:34pm

I think that you understood perfectly what I wanted to say.
Yes, I know that every device that works with files has to have a file system, but I don’t want to worry about that, or even think about it, on a ROUTER (btw: what FS is used for the ROS rootfs?). And by removing unnecessary things like FS modules, MT could free precious space on 16MB devices (and there are plenty of them still being sold). I don’t know how much space they consume on ARM(64), but on my x86_64 Linux the RAID, btrfs, ext4 and SMB server modules take more than 2MB zst-compressed, and 2MB is a lot on a 16MB device. There are probably more things that could be removed or made optional on a ROUTER without hurting its function.

I also don’t like having BTH, ZeroTier, or any other “cloud” service that I don’t fully control on my router (but I understand that not everyone knows how to set up their own VPN). Now, ZT is optional, so that’s fine; BTH is not configured, so hopefully not active…

leonardogyn · February 27, 2025, 6:37pm

After finding out about the VPN problems with CBC encryption, I updated my VPN network (I had been wanting to adjust that anyway) to use AES-256-GCM instead of CBC, and SHA256 instead of SHA1 for the hash.

Everything works fine now with the v7.18 release, the problem really seems to have been CBC related only.

However, on some RB4011s only (it does not happen with the other MK models I have), my VPN server is showing lots of “AEAD Decrypt error: cipher final failed” errors. Example:

Feb 27 15:12:28 unifi openvpn[1746]: cliente-client1-01/177.X.X.X:34159 AEAD Decrypt error: cipher final failed
Feb 27 15:12:40 unifi openvpn[1746]: cliente-client2-01/186.X.X.X:55423 AEAD Decrypt error: cipher final failed
Feb 27 15:12:45 unifi openvpn[1746]: cliente-client3-01/201.X.X.X:53728 AEAD Decrypt error: cipher final failed

And, at first glance, this is happening only for RB4011 VPN clients…
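To check the “only RB4011 clients” observation over a longer log, the errors can be tallied per OpenVPN client name and then compared against which device model each client runs. A minimal Python sketch, assuming the line format shown above; the peer address is masked in the post, so it is matched loosely rather than as an IP.

```python
import re
from collections import Counter

# Client name, then peer address (masked as 177.X.X.X in the post, hence \S+),
# then port, as in the quoted OpenVPN log lines.
AEAD_ERROR = re.compile(
    r"openvpn\[\d+\]: (?P<client>[^/\s]+)/(?P<peer>\S+):(?P<port>\d+) "
    r"AEAD Decrypt error: (?P<reason>.+)$"
)


def count_aead_errors(lines):
    """Count AEAD decrypt errors per OpenVPN client name."""
    counts = Counter()
    for line in lines:
        match = AEAD_ERROR.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts


sample = """\
Feb 27 15:12:28 unifi openvpn[1746]: cliente-client1-01/177.X.X.X:34159 AEAD Decrypt error: cipher final failed
Feb 27 15:12:40 unifi openvpn[1746]: cliente-client2-01/186.X.X.X:55423 AEAD Decrypt error: cipher final failed
Feb 27 15:12:45 unifi openvpn[1746]: cliente-client3-01/201.X.X.X:53728 AEAD Decrypt error: cipher final failed
"""

for client, n in count_aead_errors(sample.splitlines()).most_common():
    print(client, n)
```

Fed the full server log instead of the three sample lines, the per-client counts make it easy to see whether any non-RB4011 client shows up at all.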

Searching the OpenVPN forums suggests it is related to fragmentation. But if that were the case, I believe all MK models would show it, not only RB4011s. And while OpenVPN does have some parameters to handle fragmentation better, they are not available in the RouterOS implementation.

I’m doing some further investigation, but it seems to be a minor problem indeed, specific to RB4011s.

infabo · February 27, 2025, 6:57pm

btrfs is part of the rose package. Don’t worry. But there is ext4, FAT32 and exFAT support in the routeros package, AFAIK. Some devices have USB ports, and some people want to use USB sticks: if not for SMB, then e.g. for container storage and/or log storage.

And btw, I think this controversial discussion should be continued over here: http://forum.mikrotik.com/t/new-exciting-features-for-storage/181774/1
Because once 7.19 shows up, this topic is going to be locked. All discussion lost.

antiqued4 · February 27, 2025, 7:46pm

On the BGP subject just above: on a CCR2216 with several peers, I have noticed a behavior in versions above 7.15.3 (currently on 7.17.2).
It turns out that, depending on what you change (an update of routes, something like that), you can see in the LOG that the OSPF sessions all drop and come back, then the BGP sessions drop and come back, but the internet stops working; basically, all the routes appear to be “stuck”. After a reboot everything goes back to normal. CPU is more or less at 36%.

I will check with this version in the near future.

KiwiBloke · February 27, 2025, 7:49pm

I could be wrong, but I think you need to upgrade to 7.13 first, then to 7.18. There were some major changes to the way packages were handled requiring the step to 7.13 first. Maybe someone could correct me if I’m wrong.

elmemis · February 27, 2025, 9:36pm

If you use

set/append bgp-communities no-export/no-advertise/no-peer

in filters, it does not work, and the filter stops processing the prefix.
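For anyone trying to reproduce this, a small Python reference for what the three names stand for: the well-known communities from RFC 1997 (no-export, no-advertise) and RFC 3765 (no-peer). The `set_communities`/`append_communities` helpers are only a toy model of the usual “replace” versus “add” semantics, an assumption rather than RouterOS’s routing filter engine, and the reported bug itself is not modelled.

```python
# Well-known BGP communities as 32-bit values: NO_EXPORT, NO_ADVERTISE and
# NO_EXPORT_SUBCONFED from RFC 1997, NOPEER from RFC 3765.
WELL_KNOWN = {
    "no-export": 0xFFFFFF01,
    "no-advertise": 0xFFFFFF02,
    "no-export-subconfed": 0xFFFFFF03,
    "no-peer": 0xFFFFFF04,
}


def as_pair(value):
    """Render a 32-bit community in the usual high16:low16 notation."""
    return f"{value >> 16}:{value & 0xFFFF}"


def set_communities(names):
    """Toy 'set': the route ends up with exactly these communities."""
    return [WELL_KNOWN[n] for n in names]


def append_communities(current, names):
    """Toy 'append': existing communities are kept, new ones added after them."""
    return list(current) + [WELL_KNOWN[n] for n in names]


for name in ("no-export", "no-advertise", "no-peer"):
    print(name, as_pair(WELL_KNOWN[name]))
# no-export 65535:65281
# no-advertise 65535:65282
# no-peer 65535:65284
```

If the named form misbehaves, comparing it against a filter that uses the numeric 65535:xxxxx form might help narrow down whether the problem is in name parsing or in the set/append action itself.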