v7.16.2 [stable] is released!

I talked to MikroTik support; they told me to try affinity "alone" on the sessions that receive or send full routing, and input.affinity=main output.affinity=input on the other sessions.
I haven’t tested it yet. I have disabled all full-routing sessions and so far there are no drops; I have also made some adjustments to OSPF. I will reactivate full routing and test whether the sessions still crash.
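For reference, a minimal CLI sketch of what that suggestion could look like (the connection names in the find expressions are made up):

# full-routing sessions: run input and output in their own dedicated processes
/routing bgp connection set [find name="transit-full"] input.affinity=alone output.affinity=alone
# all other sessions: input on the main process, output follows the input process
/routing bgp connection set [find name~"^ibgp"] input.affinity=main output.affinity=input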

My routers do not receive internet routing tables, only a couple of local networks (company networks routed over VPN), but the problem is the same. It is definitely not related to large routing tables.
I have support ticket SUP-159987 open for it since 23/Jul/24 but there is no real progress…

Netwatch bug: in scripts, $ip provides the wrong IP.
For example, if www.example.com is used, instead of 93.184.215.14 the value 249018461 is returned, which is 0E D7 B8 5D = 14.215.184.93, i.e. the IP with its bytes in reversed order…

/tool netwatch
add disabled=no host=www.example.com type=dns \
    up-script=":log info \"\$ip ==>> \$(0.0.0.0 + \$ip)\"" \
    down-script=":log info \"\$ip ==>> \$(0.0.0.0 + \$ip)\""

I did not test that yet, but is it really so that in the $ip variable a 32-bit numeric value is returned instead of a dotted quad string?
And you could convert that to a string by adding 0.0.0.0 to it?
I would expect a function like inet_ntoa() to be required for that conversion.
The IP address will normally be in “network byte order”, while the CPU architecture can use the reverse byte order, causing such bugs (which may then not be present on a different architecture).
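If it really is just the byte order, a rough script sketch like this (untested; the number is hard-coded from the example above) should swap the bytes back before applying the 0.0.0.0 + n trick:

# example value as reported by netwatch for 93.184.215.14
:local n 249018461
:local b0 ($n & 255)
:local b1 (($n >> 8) & 255)
:local b2 (($n >> 16) & 255)
:local b3 (($n >> 24) & 255)
# reassemble in reversed byte order and convert to an IP by adding 0.0.0.0
:put (0.0.0.0 + (($b0 << 24) | ($b1 << 16) | ($b2 << 8) | $b3))
# expected to print 93.184.215.14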

For the netwatch bug I have already opened SUP-168169.

I have always done it that way; ":toip" does not convert a number to an IP…

Well, it is not surprising (to me) that it does not work. Maybe there has been a strange workaround in the network code that made it work before and is now removed?
It is completely normal in Linux that the byte order is “wrong” when you look at the bare 32-bit address that way, at least when you have a little-endian machine like x86/amd64.
(of course one reason to prefer big-endian processors in routers is that this problem does not occur there)

It would be interesting to see what

interface/ethernet/monitor numbers=0 duration=1

says; maybe it is a WinBox issue…

There is still a bug in the MSTI status data. If we change the MSTP bridge priority from 0x7000 to 0x6000, for example:

interface/bridge/print proplist=name,protocol-mode,priority 
Flags: X - disabled, R - running 
 0 R name="bridge" protocol-mode=mstp priority=0x6000

the MSTI status data for dynamic instance 0 still displays the old value:

interface/bridge/msti/print where identifier=0
Flags: D - DYNAMIC
Columns: BRIDGE, IDENTIFIER, PRIORITY, VLAN-MAPPING
#   BRIDGE  IDENTIFIER  PRIORITY  VLAN-MAPPING
0 D bridge           0  0x7000    1-6

Although the bridge does advertise the correct value, as can be seen on peer switches:

interface/bridge/monitor numbers=0
                     ;;; defconf
                  state: enabled
    current-mac-address: 48:A9:8A:7C:63:30
            root-bridge: no
         root-bridge-id: 0x6000...

The only way to refresh the displayed value is to reboot the switch, which is not optimal since it obviously causes traffic disruption.
This bug has existed since ROS 7.14 as far as I can tell.

@bratislav: interface/ethernet/monitor numbers=0 duration=1
                        name: ether1
                      status: link-ok
            auto-negotiation: done
                        rate: 2.5Gbps
                 full-duplex: yes
             tx-flow-control: no
             rx-flow-control: no
                   supported: 10M-baseT-half,10M-baseT-full,100M-baseT-half,
                              100M-baseT-full,1G-baseT-half,1G-baseT-full,
                              2.5G-baseT
                 advertising: 10M-baseT-half,10M-baseT-full,100M-baseT-half,
                              100M-baseT-full,1G-baseT-half,1G-baseT-full,
                              2.5G-baseT
    link-partner-advertising: 10M-baseT-half,10M-baseT-full,100M-baseT-half,
                              100M-baseT-full,1G-baseT-full

The problem is not in WinBox)

I too had my VRF setup break completely moving from 7.15.2 to 7.16.1

Glad these posts were here to give me an idea of where to look.

Sadly, I’m not sure where mine is broken, since I already had scripts to build the routes for my DHCP “add default route” problems.

I reviewed all the documentation at https://help.mikrotik.com/docs/spaces/ROS/pages/328206/Virtual+Routing+and+Forwarding+-+VRF#VirtualRoutingandForwardingVRF-VRFinterfacesinfirewall and compared it with the information at https://wiki.mikrotik.com/Manual:PCC, on which I originally based my PCC/VRF setup for the two DHCP ISP WANs. I don’t see where I went wrong in my config.

Looks like this might be the first time I ever downgrade. Very disappointing.

I checked a couple of mine too, and some do not show anything for the link partner on SFP+ ports, but they nevertheless connect at 10Gbps…

                               name: sfp-sfpplus1
                             status: link-ok
                   auto-negotiation: done
                               rate: 10Gbps
                        full-duplex: yes
                    tx-flow-control: no
                    rx-flow-control: no
                          supported: 10M-baseT-half,10M-baseT-full,100M-baseT-half,100M-baseT-full,1G-baseT-half,1G-baseT-full,1G-baseX,2.5G-baseT,2.5G-baseX,
                                     5G-baseT,10G-baseT,10G-baseSR-LR,10G-baseCR
                      sfp-supported: 1G-baseT-full,1G-baseX,2.5G-baseT,2.5G-baseX,5G-baseT,10G-baseCR,25G-baseCR
                        advertising: 1G-baseT-full,1G-baseX,2.5G-baseT,2.5G-baseX,5G-baseT,10G-baseCR
           link-partner-advertising:

So did you change everything to input=main, output=input?

After upgrading my border and core routers (five 2116s in total) to 7.16, I get random problems with BGP sessions disconnecting and reconnecting. Eventually the router no longer prints routes and I have to reboot. This happened yesterday after shutting down some external BGP sessions; unfortunately I forgot to get a supout file.

I also had one case where the router showed a route in the routing table as going via one interface, but traffic was continuing to traverse the old interface. I tried shutting down the old interface, but instead of rerouting traffic (as expected), traffic stopped flowing and the router started acting really sluggish.

The majority of my BGP sessions use “alone” affinity (IPv4/IPv6 full-table peers and internal “most-table” peers); the rest, which weren’t explicitly assigned, were at the defaults.

Yes, there certainly are terrible issues with BGP in the current releases, but they are difficult to debug.
This week I had a case where a BGP peer connects and routes are received OK (the remote is RouterOS v6, so none of that stupid “routes are sent only after one keepalive timer” rubbish), then after “holdtime” the session disconnects with “HoldTimer expired” and reconnects, in an endless loop.
After disabling/re-enabling the BGP connection it works again as normal.
Sent a supout to support, they cannot see anything wrong.

Also, I regularly see things in the routing table that are obviously wrong: wrong route selected, wrong AS path, that kind of thing.
But that seems related to the WinBox connection. When I close WinBox and re-open it, the routing table looks different than before, and often more correct. This is especially apparent after a change in routes has occurred, e.g. when I disable a BGP connection but routes related to it are still shown as active in the routing table.

Then I have problems with the limit on the number of alternative routes stored, but that seems to have always been the case; it only became apparent after topology changes in my network. Sometimes the “prefix count” of a BGP session remains at zero even though prefixes are being received, only because there are other paths over which the same prefixes are received. However, those paths can fail, and then there is no route available, although it should have been received and stored before.
There appears to be a hard-coded limit on the number of alternative routes stored; I would like to be able to increase it.

Yes, BGP loaded with large route tables can make the whole router get stuck (freeze); even a supout file fails to be created, so it is hard to explain this situation to MikroTik. This happens in 7.16.1; a ticket has been created and there is still no answer.

After updating to 7.16.1 I am observing strange behavior regarding the default route coming from a DHCP lease. I at least think it could have something to do with ROS 7.16.1, as there is no indication that anything else in my environment changed.
I updated on October 22. This first happened on October 25 and again on October 28, so in both cases it was three days later. So far I have only tried disabling and re-enabling the /ip/dhcp-client entry, which immediately solved the problem: the default route was back in the output of the routing table.

# default route is missing at this point
/routing/route/print where dst-address="0.0.0.0/0"
# toggle the DHCP client
/ip/dhcp-client/disable numbers=0
/ip/dhcp-client/enable numbers=0
# default route is back
/routing/route/print where dst-address="0.0.0.0/0"

I have not yet tried a release or renew of the lease.
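For the record, that would presumably be the following (untested here, and it is unclear whether it also restores the route):

/ip/dhcp-client/release numbers=0
/ip/dhcp-client/renew numbers=0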

When searching the forum I couldn’t find any related post about this. Is it nevertheless something known, or something that could be investigated?

I had a similar situation here with IPv6. It was the first time since the update that the PPPoE WAN connection reconnected, and after this the DHCPv6 client had no prefix. Release/renew commands did not help, but disabling and re-enabling the DHCPv6 client brought the prefix back immediately.

To be clear: my issues have nothing to do with large route tables. They do occur in a private network with only 22 prefixes and a partial mesh between 8 routers.

On an RB5009 (RouterOS 7.16.1) I am seeing “out of memory condition was detected” errors, after which the router reboots. It looks like something slowly leaks memory, about 300 MByte per 24h.
I have a CAP AX (cAPGi-5HaxD2HaxD) with the same software and it does not happen there (memory use stays constant at about 350M).

How can I check the process tree to see what is taking up memory?
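As far as I know RouterOS does not expose per-process memory use; /tool profile only shows per-process CPU. As a crude sketch (scheduler name and interval are arbitrary), you could at least log free memory over time to see how fast it leaks and whether it correlates with anything:

# log free memory every 10 minutes
/system scheduler add name=mem-log interval=10m on-event=":log info (\"free-memory: \" . [/system resource get free-memory])"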