We recently ran into a situation where four 1 Gb NICs in an 802.3ad bond to a Cisco backend, with VLANs defined on top of the bonded interface, show 3-4% packet loss.
The setup is a single LACP trunk with our VLANs defined on the Mikrotik (x86, 6.38.3).
We see this loss to local networks, and we have not yet had any guidance from support as to what the cause could be.
Our ISP has assured us everything is healthy from the Cisco side and they see NO errors on any of the interfaces (and neither do I).
I have tried removing the interfaces one by one and swopping them around, yet the result is always the same. At this point I'm suspecting issues with the MT OS itself.
ALL NICs used in the bonded setup are Broadcom NX2 Ethernet based.
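For the interface-by-interface tests above, it may help to know which bond member a given host's traffic is likely to leave on, so that lossy hosts can be checked against a single port. Below is a rough Python sketch of a simplified layer-2 transmit hash (XOR of the last MAC octets, modulo the number of slaves). It rests on assumptions: I do not know which transmit hash policy this bond is configured with, the real implementation also mixes in other fields, and the function names and MAC addresses are made up for illustration.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def layer2_slave_index(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Simplified layer-2 hash (an assumption, not the exact driver logic):
    XOR the last octet of each MAC, then take it modulo the slave count."""
    if slave_count <= 0:
        raise ValueError("slave_count must be positive")
    src = parse_mac(src_mac)
    dst = parse_mac(dst_mac)
    return (src[5] ^ dst[5]) % slave_count


# Hypothetical addresses, only to show the call; not taken from this setup.
index = layer2_slave_index("00:11:22:33:44:01", "00:11:22:33:44:06", 4)
print(f"traffic for this MAC pair would use slave index {index}")
```

If the hosts with the worst loss all map to the same index, that would point at one port or card rather than at the bond as a whole. Treat it as a hint only, since the real hash may differ.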
How could a 6.38 release candidate from 2016 fix his problem on 6.38.3?
We have noticed packet loss at aggregate traffic levels above 1Gbps on x86 and Tile since 6.38. Mikrotik support has been given supout files and other information with ZERO feedback, so we’ve reverted to 6.37.4 (bugfix channel).
We always run VLANs on bond interfaces, some bonds are 802.3ad (Mikrotik should really rename this to IEEE 802.1AX) but most are active/backup bonds with predictable primaries to reduce unnecessary switch backplane overhead between redundant switching stacks (especially on 10Gbps).
I tried to test the theory with these cards and also tried an HP NX3031 card. Unfortunately it is not supported: it shows up under the resources/pci section, but apparently has no driver, as the ports are not available…
I think we have finally managed to locate the source of the issue.
In this machine we had 4x dual port cards, which were detected as Intel 82571EB cards, and these were the cards in use.
We finally found out that the loss also got progressively worse as more traffic flowed through the system: 0-1% with minimal traffic (I’d say up to the 40-50M mark), rising to 3-4% for anything above that threshold, with some hosts going as high as 11%.
What I also noticed with the dual port cards is that if I disabled a card's ports and re-enabled them, it seems one port would start flapping, going up and down repeatedly until the entire system was rebooted. (I tested this with the other cards to exclude a faulty card and the results were the same.)
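For anyone wanting to compare loss figures like these across hosts and traffic levels, here is a minimal Python sketch that turns sent/received ping counters into a loss percentage. The function names are my own, and the counters in the example are made up for illustration. They are not measurements from this system.

```python
def loss_percent(sent: int, received: int) -> float:
    """Return packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


def summarize(samples):
    """Map each (host, sent, received) sample to its loss percentage."""
    return {host: loss_percent(sent, received) for host, sent, received in samples}


# Made-up counters for illustration only, not real measurements.
example = [("host-a", 1000, 965), ("host-b", 1000, 890)]
for host, pct in summarize(example).items():
    print(f"{host}: {pct:.1f}% loss")
```

Running the same counters at a few different traffic levels makes it easier to see where the loss starts climbing.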
This is a sample during one of the tests:
This led me to try bonding only the two onboard cards, namely the BCM5708 cards, and all the issues went away!!! It seems the problem is indeed, as you suspected, the drivers used for the Intel 82571EB cards!!!
Here is a snip of the cards in the system at this point in time. (I have swopped one dual port 82571EB card for a single port 82574L to see if it's maybe an issue with certain revisions, but have yet to test.)
I also obtained another 4 port card, but it would seem this one is not compatible?
Could you perhaps let me know if there are known issues like this with certain revisions of these cards, and which ones I should avoid?
You mentioned that many issues will hopefully be resolved with version 7 of the OS, but after much research it seems everyone has been waiting desperately for it for years?