CRS317 and TX-drops (maybe a workaround?)

I’ve been dealing with TX-Drops and some level of packet loss ever since I first deployed CRS317s on my network.

This happens when there is a transition from 10G to 1G, and it affects the whole switch (as I’m led to believe, due to the way the switch chip’s packet buffer is organized).
So far, I have tried either not mixing 1G and 10G ports on the same switch (not always possible) or concentrating the 1G ports on a single switch.
The second option yields weird results, as the uplink ports of the “mixed” switch still show TX-Drops (when viewed from the uplink interface on a non-mixed switch).

There may be other factors affecting this scenario, but this is what I have observed so far (this is a production network, not a controlled lab scenario).

Recently I implemented a topology change in roughly half the network, building a “ring” of bonded interfaces, “two-by-two” (a 20 Gb ring using CRS317s on v6.49.3).
I eventually noticed that there were no TX-Drops in that half of the network.

That got me a little hopeful, so I moved the uplink port for the whole network (~8 Gb) from an unbonded port on a switch outside the “ring” to a bonded port on the adjacent ring switch. (The interconnect between those two switches was not changed.)

My overall packet loss from WAN → LAN dropped from ~6% to ~3% instantly (as measured in one of the webscale portals/tools; I cannot say more due to NDA).

I then changed all the connected 1G ports from a “bare interface” to a bonded 802.3ad port with a single slave.
After that, there were no more TX-Drops.


So either this is cosmetic (drops still occur, but ROS does not populate the field in the statistics),
or there really is some quirk of the CRS317 switch chip that makes a bonded port behave differently from a “bare” port, mitigating the lack-of-buffer situation.

Does anyone see similar behavior?

Thanks for writing up these observations. I don’t have a specific answer, but you have me thinking.

I wonder if putting a port into bonding changes the allocation of buffer resources in any way?

Also what version of ROS are you using? Have you tried this on ROSv7?

ROS is v6.49.3

As I have no remote access to these devices, I have refrained from updating any of them (for now).

I have a similar problem with the CRS317 switch:

  • I have an LACP with 4 ports (sfp5, 6, 7 and 8) operating at 1 Gbps each. I have another port (SFP12) without LACP operating at 1 Gbps.
  • LACP and SFP12 port are on a bridge, all have Hardware Offloading active.
  • I have a VLAN 2000 between LACP and SFP12.

Problem: TX DROP is occurring on the SFP12 port.
When I disable LACP and configure only one port without LACP between 1036 and CRS317 (keeping the VLAN), the TX DROP of SFP12 ends.

I cannot understand why.

Note: I changed the SFP12 port to the SFP16 port, still the problem remained on the SFP16 port

Characteristics:
CRS317 running ROS 6.48.6
CRS317 with CPU at 0% usage
CRS317 with HW Offloading only
Only 400Mbps traffic
CRS317 only layer2

please post your 1036 and 317 config to properly understand your scenario (please edit any confidential/private info)

Hi
I simplified/updated the settings to see if the problem would go away. However, even with only 2 ports in the LACP, the TX DROP continues.

1036 -----------------------------------------------------------------------------------------------------------------------------

# apr/28/2022 00:12:39 by RouterOS 6.48.6
#
# model = CCR1036-12G-4S

/interface bridge
add name=IpLocal

/interface ethernet
set [ find default-name=sfp1 ] advertise=10M-full,100M-full,1000M-full comment="LACP3"
set [ find default-name=sfp2 ] auto-negotiation=no comment="LACP3"

/interface bonding
add comment="LACP3" mode=802.3ad name=LACP3-R65 slaves=sfp2,sfp3 transmit-hash-policy=layer-2-and-3

/interface vlan
add comment="" interface=LACP3-R65 name=pppoeservice1 vlan-id=101
add comment="X" interface=LACP3-R65 name=pppoeservice2 vlan-id=103
add comment="X" interface=LACP3-R65 name=pppoeservice3 vlan-id=104
add comment="X" interface=LACP3-R65 name=pppoeservice4 vlan-id=105
add comment="X" interface=LACP3-R65 name=pppoeservice5 vlan-id=106
add comment="X" interface=LACP3-R65 name=pppoeservice6 vlan-id=107
add comment="X" interface=LACP3-R65 name=pppoeservice7 vlan-id=401
add comment="X" interface=LACP3-R65 name=pppoeservice8 vlan-id=402
add comment="X" interface=LACP3-R65 name=pppoeservice9 vlan-id=403
add comment="X" interface=LACP3-R65 name=pppoeservice10 vlan-id=404
add comment="X" interface=LACP3-R65 name=pppoeservice11 vlan-id=405
add comment="X" interface=LACP3-R65 name=pppoeservice12 vlan-id=406
add comment="X" interface=LACP3-R65 name=pppoeservice13 vlan-id=407
add comment="X" interface=LACP3-R65 name=pppoeservice14 vlan-id=408
add comment="X" interface=LACP3-R65 name=pppoeservice15 vlan-id=601
add comment="X" interface=LACP3-R65 name=pppoeservice16 vlan-id=602
add comment="X" interface=LACP3-R65 name=pppoeservice17 vlan-id=603
add comment="X" interface=LACP3-R65 name=pppoeservice18 vlan-id=604
add comment="X" interface=LACP3-R65 name=pppoeservice19 vlan-id=605
add comment="X" interface=LACP3-R65 name=pppoeservice20 vlan-id=606
add comment="X" interface=LACP3-R65 name=pppoeservice21 vlan-id=607
add comment="X" interface=LACP3-R65 name=pppoeservice22 vlan-id=608
add comment="X" interface=LACP3-R65 name=pppoeservice23 vlan-id=801
add comment="X" interface=LACP3-R65 name=pppoeservice24 vlan-id=802
add comment="X" interface=LACP3-R65 name=pppoeservice25 vlan-id=803
add comment="X" interface=LACP3-R65 name=pppoeservice26 vlan-id=804
add comment="X" interface=LACP3-R65 name=pppoeservice27 vlan-id=805
add comment="X" interface=LACP3-R65 name=pppoeservice28 vlan-id=806
add comment="X" interface=LACP3-R65 name=pppoeservice29 vlan-id=807
add comment="X" interface=LACP3-R65 name=pppoeservice30 vlan-id=808

/ip firewall connection tracking set enabled=no

/interface pppoe-server server
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice1 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice1
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice2 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice2
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice3 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice3
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice4 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice4
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice5 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice5
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice6 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice6
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice7 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice7
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice8 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice8
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice9 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice9
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice10 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice10
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice11 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice11
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice12 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice12
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice13 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice13
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice14 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice14
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice15 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice15
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice16 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice16
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice17 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice17
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice18 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice18
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice19 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice19
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice20 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice20
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice21 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice21
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice22 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice22
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice23 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice23
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice24 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice24
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice25 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice25
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice26 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice26
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice27 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice27
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice28 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice28
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice29 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice29
add authentication=chap,mschap1,mschap2 disabled=no interface=pppoeservice30 max-mru=1492 max-mtu=1492 mrru=1600 one-session-per-host=yes service-name=pppoeservice30

/system resource irq rps
set sfp1 disabled=no
set sfp2 disabled=no
set sfp3 disabled=no
set sfp4 disabled=no
set ether1 disabled=no
set ether2 disabled=no

Switch --------------------------------------------------------------------------------------------------------------------------------------------------

# mar/10/1970 10:52:43 by RouterOS 6.48.6
#
# model = CRS317-1G-16S+

/interface ethernet
set [ find default-name=ether1 ] comment=gerencia
set [ find default-name=sfp-sfpplus6 ] advertise=10M-full,100M-full,1000M-full auto-negotiation=no comment="LACP3"
set [ find default-name=sfp-sfpplus7 ] advertise=1000M-full auto-negotiation=no comment="LACP3"
set [ find default-name=sfp-sfpplus12 ] auto-negotiation=no
set [ find default-name=sfp-sfpplus15 ] advertise=10000M-full comment="Dir SP-JQ"
set [ find default-name=sfp-sfpplus16 ] advertise=1000M-full

/interface bridge
add frame-types=admit-only-vlan-tagged ingress-filtering=yes name=bridge2 pvid=3 vlan-filtering=yes

/interface vlan add interface=bridge2 name=XXXXX vlan-id=2001

/interface bonding
add mode=802.3ad name="LACP3" slaves=sfp-sfpplus6,sfp-sfpplus7 transmit-hash-policy=layer-2-and-3

/interface bridge port
add bridge=bridge2 frame-types=admit-only-vlan-tagged ingress-filtering=yes interface="LACP3"
add bridge=bridge2 frame-types=admit-only-vlan-tagged ingress-filtering=yes interface=sfp-sfpplus15
add bridge=bridge2 frame-types=admit-only-vlan-tagged ingress-filtering=yes interface=sfp-sfpplus12

/interface bridge vlan
add bridge=bridge2 tagged="sfp-sfpplus15,LACP3" vlan-ids=101
add bridge=bridge2 tagged="sfp-sfpplus12,LACP3" vlan-ids="103,104,105,106,107,401,402,403,404,405,406,407,408,601,602,603,604,605,606,607,608,801,802,803,804,805,806,807,808"

/ip address
add address=XXXXX interface=gerencia network=XXXXXXXXX

/ip route
add distance=1 gateway=XXXXXXX

/system identity
set name=Switch

/system routerboard settings
set boot-os=router-os

/system routerboard reset-button
set enabled=yes

/system swos
set address-acquisition-mode=static allow-from-ports=p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17 identity=CRS1

maybe try using sfp12 as a member of another bonding interface to see if tx-drops are reduced
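
For reference, a minimal sketch of what that could look like on the CRS317 config posted above. The bond name bond-sfp12 is made up for this example, and the item number used in the last step is an assumption (check the output of the print command first). Whether the device on the far end of sfp-sfpplus12 needs to speak LACP is not established in this thread, so treat this as something to test in a maintenance window, not a confirmed fix.

```routeros
# Hypothetical name: bond-sfp12 does not exist in the posted config.
# Expect a short interruption on sfp-sfpplus12 while the port is moved.

# 1. A bonding slave cannot also be a bridge port, so take the bare port out of the bridge
/interface bridge port remove [find interface=sfp-sfpplus12]

# 2. Create an 802.3ad bond with sfp-sfpplus12 as its only slave
/interface bonding add name=bond-sfp12 mode=802.3ad slaves=sfp-sfpplus12 transmit-hash-policy=layer-2-and-3

# 3. Add the bond to the bridge with the same settings the bare port had
/interface bridge port add bridge=bridge2 interface=bond-sfp12 frame-types=admit-only-vlan-tagged ingress-filtering=yes

# 4. Point the VLAN table at the bond instead of the bare port.
#    Print first, then use the item number of the entry that lists sfp-sfpplus12
#    (assumed to be 1 here).
/interface bridge vlan print
/interface bridge vlan set 1 tagged=bond-sfp12,LACP3
```

After the change, resetting the counters and watching tx-drop on the port for a while gives a before/after comparison.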

I also have TX Drops in CRS317-1G-16S+.
Hardware offload on.
CPU load 0%.
Only 5 VLANs
3 SFP+ 10G: no TX drops
2 MikroTik S-RJ01 1G modules: a lot of TX drops (also tried Ubiquiti UF-RJ45-1G, same result)
ROS 6.49.6 (also tried ROS 7.x and updating the RouterBOARD firmware)
Apparently it’s the mix of 10G and 1G. I read in another post that with SwOS this problem goes away. So is the problem software, not hardware?

For the slower ports/interfaces, try setting the ingress/egress rate (on the switch port) close to what the interface can actually support. In the past, when I used a 2.5 Gb SFP+ adapter, I had to set the rate to 2400 to get basically no retries and full speed (when sending traffic to my device).
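
A minimal sketch of that suggestion for a 1G module in sfp-sfpplus12, using the per-port rate settings in the CRS3xx switch menu. The 950M value is only an illustrative guess at "slightly below line rate" (by analogy with 2400 on a 2.5 Gb link); it is not a tested or recommended number.

```routeros
# Shape egress on the 1G port to a little under line rate (950M is an assumed value)
/interface ethernet switch port set sfp-sfpplus12 egress-rate=950M

# Review the per-port switch settings afterwards
/interface ethernet switch port print
```

The idea is that the shaper paces traffic toward the slow link instead of letting bursts arrive at the port at 10G speed, but whether this actually reduces tx-drop on the CRS317 would have to be confirmed by watching the counters.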

i will try this

I think we have to measure how much is “a lot”.

I was checking some CRS3xx switches that show tx-drop counters, but I have not found any where the drops come even close to 0.01% of the total frames on the interface.

It would be nice to track this with clear numbers so we can measure it properly.
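
One possible way to get comparable numbers is to read the interface counters and express drops per million transmitted packets (0.01% corresponds to 100 per million). This is only a sketch: the interface name is taken from the config earlier in the thread, and it assumes the tx-packet and tx-drop counters shown by print stats can be read with get and reflect hardware-switched traffic on your ROS version.

```routeros
# Show the raw counters (look at the tx-packet and tx-drop columns)
/interface print stats where name=sfp-sfpplus12

# Drops per million transmitted packets, integer arithmetic only.
# Wrapped in { } so the local variables survive when pasted into a terminal.
{
    :local ifc "sfp-sfpplus12"
    :local txp [/interface get [find name=$ifc] tx-packet]
    :local txd [/interface get [find name=$ifc] tx-drop]
    :if ($txp > 0) do={
        :put ("tx-drop per million tx packets: " . ($txd * 1000000 / $txp))
    }
}
```

Resetting the counters before a test period (/interface reset-counters sfp-sfpplus12) makes the ratio reflect that period only instead of the whole uptime.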

I personally have seen “surges” of TX-Drops when instantaneous loads get close to line capacity (such as when running a speedtest).

This makes test results unpredictable (even though low-throughput applications, such as web browsing, remain mostly unaffected).

Mostly what I’ve done is bond the network to 20 Gb wherever possible and put single interfaces into an 802.3ad “one-legged bond”, with good results.

TX-Drops seem to kick in when port load is >70%, though I have no way to know the actual instantaneous load (in WinBox, what we see is the average of the last 1-2 seconds).

Since the switch chip has (extrapolating from its big brother) maybe a 2 MB packet buffer, poor performance under congestion is to be expected.
The interesting part is that adding the port to a “bond” alleviates the problem. That was unexpected.

(source: https://isp-tech.ru/en/switch-asic/ )
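
As a rough back-of-the-envelope check of why such a buffer is small for a 10G to 1G transition: assume (optimistically) that the whole 2 MB is available to one 1G egress port and that traffic arrives at a sustained 10 Gb/s. Both are assumptions for illustration; the real buffer is shared across ports and the 2 MB figure itself is an extrapolation.

```latex
t_{\text{fill}}
  = \frac{B}{R_{\text{in}} - R_{\text{out}}}
  = \frac{2 \times 10^{6}\ \text{bytes} \times 8\ \text{bit/byte}}
         {10 \times 10^{9}\ \text{bit/s} - 1 \times 10^{9}\ \text{bit/s}}
  = \frac{1.6 \times 10^{7}}{9 \times 10^{9}}\ \text{s}
  \approx 1.8\ \text{ms}
```

So a burst lasting only a couple of milliseconds could already overflow the buffer, which is far shorter than the 1-2 second averaging window of the traffic graphs.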

I hope queue management on MikroTik switches is under development. For example, there is no support for multiple output queues per port, which is a standard feature in the industry.

I think some tx-drop under congestion is normal, and internet applications are prepared to deal with it in moderate amounts.

At least MikroTik now does this kind of TX dropping on the outgoing interface, I think to avoid HOL blocking; in the early CRS3xx days, rx-overflow happened on the inbound interface, with worse consequences.