Intermittent Link Drops on Windows Clients – MLAG Stack with CRS354/CRS328

We have a customer who recently started experiencing a network issue that we haven’t been able to pinpoint yet. The customer runs an MLAG stack with two CRS354-48P units as core switches, with three or four CRS328-24P switches connected to them and deployed throughout the factory. This setup has been running without issues for several years.

Earlier this spring, we upgraded to the latest RouterOS version available at the time (7.16). The next day, we encountered severe packet loss issues within the MLAG stack and with 802.3ad bonding towards a server. We rolled back to a previous version, which resolved that particular issue.

Around the same time (or possibly coincidentally), some users/clients began to intermittently lose “internet” connectivity once or twice a day. All clients are Windows machines. What happens is that Windows drops the link entirely and indicates no network access. After 1–2 minutes, connectivity is restored without any user intervention.

We’ve tried moving affected clients to different switches, but the problem persists regardless of switch port or switch model.

Does anyone have an idea what could be causing this? Could it be client-related or something in the switching infrastructure?

Have you tried the latest 7.19beta8 to see whether the error remains?

For the devices that use a LAG towards your MLAG MikroTiks, you can also try disconnecting one of the cables to find out whether the error is physical or logical.

By logical I mean something being incorrectly blocked by your MLAG MikroTiks, in which case traffic should work when there is only a single physical path to choose from.
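
To judge whether the drops still happen during that single-cable test, it may help to log them with timestamps on an affected client and compare against the switch logs. Below is a rough Python sketch, not a tested tool: it assumes Python is installed on the client, the function names (ping_once, find_outages, monitor) are my own, and 192.0.2.1 is only a placeholder for your gateway address.

```python
import platform
import subprocess
import time
from datetime import datetime, timedelta


def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping; True if it exits with code 0.

    Caveat: Windows ping can exit 0 on a "Destination host unreachable"
    reply, so treat the result as a hint rather than proof.
    """
    if platform.system() == "Windows":
        # -n = count, -w = timeout in milliseconds
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux ping: -c = count, -W = timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def find_outages(samples):
    """Collapse a list of (timestamp, ok) samples into (start, end) outages.

    An outage starts at the first failed sample and ends at the next
    successful one. If the list ends while still down, the outage is
    closed at the last sample's timestamp.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
    if start is not None:
        outages.append((start, samples[-1][0]))
    return outages


def monitor(host, interval_s=1.0):
    """Ping host forever and print a line on each DOWN/UP transition."""
    down_since = None
    while True:
        ok = ping_once(host)
        now = datetime.now()
        if not ok and down_since is None:
            down_since = now
            print(f"{now:%Y-%m-%d %H:%M:%S} DOWN {host}", flush=True)
        elif ok and down_since is not None:
            secs = (now - down_since).total_seconds()
            print(f"{now:%Y-%m-%d %H:%M:%S} UP   {host} after {secs:.0f}s", flush=True)
            down_since = None
        time.sleep(interval_s)
```

Calling monitor("192.0.2.1") (with your real gateway instead of the placeholder) and redirecting the output to a file should give you DOWN/UP timestamps to line up against the bonding and MLAG log entries on the switches. find_outages is only there in case you would rather collect samples first and summarize them afterwards.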

A dump of the current configs would also be helpful, for example the output of “/export terse”.