Hello Folks,
TL;DR:
- I have unexplained 1-5% packet loss on a CRS326 on both the WAN and server IP addresses.
- All WAN and server interfaces are ports on the HW layer 2 bridge.
- All WAN and server interfaces are 802.3ad bonds.
Network Topology / Configuration
- This is all installed in a data center environment.
- The router is a CRS326-24G-2S+.
- The WAN uplink to the colo provider is via an 802.3ad bond on the two SFP+ ports.
- There are 5 servers connected via RJ45 to the router. Each server has a single physical connection. The link is an 802.3ad bond.
- The second physical link and side of the 802.3ad bond is connected to a second CRS326 WHICH IS CURRENTLY POWERED OFF. In the future the intention is to configure this with MLAG to provide redundancy for the server connections.
- There is one layer 2 bridge on the device; all the bonds are ports on that bridge with a PVID of 3016.
- There is a layer 3 VLAN interface with VLANID 3016 and the public IP address of the router assigned.
- Servers get their public IP addresses from a DHCP server running on the layer 3 VLAN interface.
The problem
I am getting 1-5% packet loss when I ping the server and router public IP addresses. I have excluded the upstream provider as the cause of the issue.
Please help. I am a 15-year MikroTik user and advocate, and I am under increasing pressure to move to Arista, which I refuse to do. If more information would be helpful, please ask.
Thank you
What bonding mode are you using? (802.3ad / LACP, balance-xor, balance-rr, active-backup, broadcast, balance-tlb)
I myself prefer active-backup bonding (one link active, the second as fallback).
Some of the other modes can be CPU intensive (a CRS326 is not a CPU powerhouse; it’s mostly a switch with a 32-bit ARM 800 MHz CPU).
Are you seeing Ethernet interface errors?
Are your links over 10-Gig fiber or something else?
Hello,
Thank you for your response.
See attached screenshot. It’s just 802.3ad; this was specified by the upstream colo provider. They are using Arista gear and call it “active/active”.
Links are over 10G fiber using FS transceivers https://www.fs.com/au/products/230697.html
I’m not seeing any Ethernet errors, no.


On your MikroTik CRS, using Winbox:
Click on Interfaces, then click on the SFP interface, then click on the SFP tab.
Look at your Tx power and Rx power (verify that both sides of your links have a receive Rx power that is within spec for your fiber SFP+ module).
Also:
Click on Interfaces, then click on the SFP interface, then click Rx-Stats, then Tx-Stats. Do you see any errors there? (Check both sides of the fiber links.)
Look for anything else on that page that might not be normal.
Clear your counters and see which counters are growing.
Check the MTU on both sides of your links.
Check your logs.
Edit: if you shut down one of your interfaces, bring it back up, and later shut down the other interface, do the errors go away while either interface is down?
1- What RouterOS version are you running?
2- Where are the packet drops occurring?
- inbound at the LAG interface (rx-drop)
- outbound to your network (tx-drop)
This could be buffer exhaustion at the CRS326; see this recent discussion:
http://forum.mikrotik.com/t/crs3-5-packet-buffer-size/181738/16
and an older discussion:
http://forum.mikrotik.com/t/crs317-and-tx-drops-maybe-a-workaround/157420/8
CRS3xx and CRS5xx switches have hardware offloading for some bonding modes (802.3ad among them), see:
https://help.mikrotik.com/docs/spaces/ROS/pages/30474317/CRS3xx+CRS5xx+CCR2116+CCR2216+switch+chip+features#CRS3xx%2CCRS5xx%2CCCR2116%2CCCR2216switchchipfeatures-Bonding
@TomjNorthIdaho
I reviewed all your suggestions and I cannot see any issue. I have attached screenshots to confirm.
Thank you


@guipoletto, thank you for your input.
I am running RouterOS v7.18.1
The packets are being lost inbound at the moment. That is all I have tested. I will now test outbound.
Is your suggestion that the CRS326 is not appropriate for this use case and I would have a better experience with a fully fledged router?
@guipoletto, thank you for your suggestion.
I have run some mtr tests, and after 25,000 packets the results are as follows:
- Outbound to 1.1.1.1 = 0% packet loss
- Inbound to the router’s public IP address = 0.3% packet loss
- Inbound to one of the servers’ public IP addresses = 3.8% packet loss
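For a sense of scale, those percentages can be turned into approximate packet counts. This is only a rough sketch in Python: it assumes each target received the full 25,000 probes (the raw mtr output is not shown here, so the per-target probe count is an assumption), and the dictionary labels are just descriptive names.

```python
# Convert the reported mtr loss percentages into approximate packet counts.
# Assumption: every target was sent 25,000 probes.
PROBES_SENT = 25_000

reported_loss_percent = {
    "outbound to 1.1.1.1": 0.0,
    "inbound to router public IP": 0.3,
    "inbound to one server public IP": 3.8,
}


def lost_packets(sent: int, loss_percent: float) -> int:
    """Approximate number of lost probes for a given loss percentage."""
    return round(sent * loss_percent / 100)


lost = {
    target: lost_packets(PROBES_SENT, percent)
    for target, percent in reported_loss_percent.items()
}

for target, count in lost.items():
    print(f"{target}: ~{count} of {PROBES_SENT} probes lost")
```

So the server path is losing roughly 950 probes against roughly 75 for the router's own address, which is a large enough gap to be worth explaining on its own.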
I meant the Ethernet interface counters at the switch.
From the CLI:
/interface/ethernet/print stats
or, for a more compact view:
/interface/ethernet/print stats proplist=rx-too-short,rx-too-long,rx-pause,rx-error-events,rx-fcs-error,rx-fragment,rx-overflow,rx-jabber,tx-pause,tx-underrun,tx-collision,tx-late-collision,tx-drop
You can add “default-name” after tx-drop to show the interface names, but this might make the output harder to read: with 26+ interfaces, the table may not fit on a single line.
Check for very high or increasing tx-drop or rx-drop counters (relative to throughput / total packets).
Some drops are normal, but if you see a huge increase in drops during your test, then a lack of available buffers might be the cause.
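If comparing the tables by eye is tedious, the "which counters are increasing" check can be scripted off-box. Below is a minimal Python sketch, not a MikroTik tool: it assumes you paste two saved copies of the print stats output taken some time apart, that each line looks like "counter-name: v1 v2 ..." with one column per interface, and that the values are plain integers with no digit grouping. The helper names (parse_stats, growing_counters) and the sample numbers are made up for illustration.

```python
def parse_stats(text):
    """Parse 'counter-name: v1 v2 ...' lines into {counter: [per-port values]}.

    Assumption: values are plain integers, one column per interface.
    Lines that do not fit that shape are skipped.
    """
    counters = {}
    for line in text.splitlines():
        name, sep, values = line.partition(":")
        if not sep or not values.strip():
            continue
        try:
            counters[name.strip()] = [int(v) for v in values.split()]
        except ValueError:
            continue  # not a plain integer row, ignore it
    return counters


def growing_counters(before, after):
    """Return {counter: [(port_index, delta), ...]} for counters that increased."""
    growth = {}
    for name, new_values in after.items():
        old_values = before.get(name, [0] * len(new_values))
        deltas = [
            (index, new - old)
            for index, (old, new) in enumerate(zip(old_values, new_values))
            if new > old
        ]
        if deltas:
            growth[name] = deltas
    return growth


# Made-up sample snapshots with three ports, purely for illustration.
before = parse_stats("rx-fcs-error: 0 0 0\ntx-drop: 10 0 5")
after = parse_stats("rx-fcs-error: 0 0 0\ntx-drop: 10 0 42")
print(growing_counters(before, after))  # {'tx-drop': [(2, 37)]}
```

The port index is zero-based and follows the column order of the printed table, so it has to be mapped back to interface names by hand.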
Hello and thank you.
@guipoletto, I ran the command you suggested, this is the output:
rx-too-short: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-too-long: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-pause: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-error-events: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-fcs-error: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-fragment: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-overflow: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rx-jabber: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tx-pause: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tx-underrun: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tx-collision: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tx-late-collision: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tx-drop: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The device uptime is 1d17h and I have been running the mtr test for the last 8 hours.