I’m planning to upgrade a 1G company network to a mixed 10/2.5/1G environment using MikroTik switches (two servers with 10G NICs and maybe 30-40 active users doing mostly SQL and SMB). There are some posts about excessive packet drops caused by undersized buffers that leave me somewhat discouraged, though. I hope I wasn’t too optimistic.
Actual numbers for the buffers are really hard to find. But with qos-hw enabled, there seems to be a way to output the buffer size. Are those values accurate?
It’s supposed to have only 2 MB instead of 3. There’s something in a recent changelog about fixing buffer limits for this switch chip, so maybe the page is just out of date.
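For reference, this is the mechanism I mean (a minimal sketch; the top-level setting is the only part I’m sure of, the rest of the menu path is my assumption and may differ between 7.x releases):

/interface/ethernet/switch/set 0 qos-hw-offloading=yes
# the buffer totals are then reportedly visible in the switch QoS menus,
# somewhere under /interface/ethernet/switch/qos (the "qos monitor" quoted below);
# I haven't verified the exact print/monitor command myself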
What are the values for other models? The ones I’m interested in are:
CRS312-4C+8XG-RM - 2MB?
CRS317-1G-16S+RM
CRS326-24S+2Q+RM - another vendor lists a switch based on the 98DX8332 with 3 MB
CRS354-48G-4S+2Q+RM - I really hope the 98DX3257 didn’t inherit the 1.5MB from the 98DX3236..
CRS326-4C+20G+2Q+RM - 98DX8332 again
CRS518-16XS-2XQ-RM
Carefully consider the recent RouterOS 7.17 and 7.18 changes before deploying to locations where physical device access carries a high labor, time, or travel cost.
For these devices I use 7.16.2; taken from the QoS monitor, I have data for the following devices:
CRS326-24G-2S+: 1.5 MB
CRS317: 4 MB, what a champion!
CRS309: 2 MB
CCR2116: 1.5 MB
CCR2216: 8 MB
CRS520: 12 MB
Accessing these stats requires enabling “advanced QoS” for the whole switch.
My question to MikroTik is: how are buffers distributed when operating without the new “qos-hw” mode?
In other words: which “qos-hw” mode is the direct equivalent of the previous “qos-hw-disabled” behaviour?
Also, what would be the “next preferred” MTU size after 1500 (from a buffer conservation / maximization perspective)?
I think that’s a good question. I don’t know the answer, but from my experience the buffer distribution has changed a few times on the CRS3xx series since it has existed. For example, in the early days the CRS326-24G-2S+ had quite a few drops when downloading from a 1G uplink interface towards a 100M interface; later on that problem was mitigated. Since that experience I have avoided mixing speeds on other switches like the CRS317, and I think that policy has saved me some headaches.
I remember another user posted some years ago about a drop issue on a CRS317 that was solved by bonding two interfaces; I think that was related to buffers too.
Hardware Resources
The hardware (switch chips) has limited resources (memory). There are two main hardware resources that are relevant to QoS:
Packet descriptors - contain packet control information (target port, header alteration, etc).
Data buffers - memory chunks containing the actual payload. Buffer size depends on the switch chip model. Usually - 256 bytes.
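A quick back-of-the-envelope using those numbers (assuming 256-byte cells and reading the quoted 2 MB as 2 MiB): a full-size frame of 1500 bytes MTU plus 18 bytes of Ethernet header/FCS is 1518 bytes, i.e. ceil(1518 / 256) = 6 cells (1536 bytes). A 2 MiB packet buffer is 2097152 / 256 = 8192 cells, so the whole chip holds on the order of 8192 / 6 ≈ 1365 full-size frames across all ports, which at 1 Gbps is only roughly 17 ms of backlog. It also partly answers the MTU question above: per frame you waste whatever is left of the last 256-byte cell (at most 255 bytes), so MTUs where MTU + 18 lands just under a multiple of 256 waste the least, and bigger frames waste proportionally less because that remainder is amortised over more payload.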
And you, at #13, added some inside information regarding contention on ingress interfaces, vs. just dropping excess packets on the egress (lower-speed) interface.
And for reference, RouterOS now has tx-drop counters for both bonding interfaces and bonding members.
In a recent comparison I made, bonding significantly reduces tx-drops, but it’s not zero (it appeared to be zero at the time of that post because the fields were not being populated yet).
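For anyone wanting to check the same thing, this is roughly how I read them (treat the exact counter names as chip-dependent; "sfp-sfpplus1" and "bond1" are just placeholder interface names from my setup):

/interface/ethernet/print stats where name=sfp-sfpplus1
/interface/print stats where name=bond1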
Yeah, in that previous topic I was referring to the CRS326-24G-2S+ situation, when downloading from a 1G uplink interface towards a 100M interface.
Now I remember it better:
the switch in that scenario was not dropping on egress; the drops showed up on the ingress uplink interface as overflow errors. Because of that I was suspecting a head-of-line (HOL) blocking situation, and in that case the whole switch’s performance was degraded.
Some time later I stopped seeing overflow errors on CRS3xx switches; now I only see tx-drops, mostly on slower outbound interfaces.
Currently I am testing some scenarios, I think with some success; the new QoS GUI in WinBox, available since 7.16, helps a lot.
On a CRS317 with some interfaces unused, setting the offline tx-manager on those unused interfaces frees some buffers, which are added to the other interfaces. Changing the shared-buffer percentage also has an impact on each interface’s fixed and shared buffers.
In some cases I have reduced drops to close to zero by increasing the in-use interfaces’ buffers with these two changes. I think this was tuned under the hood across the 7.15.x and 7.16.x versions; I get better results with 7.16.2 than with 7.15.3 using the same configs.
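Roughly what those two changes look like from the CLI (a sketch from memory, not a copy-paste recipe: the menu under /interface/ethernet/switch/qos and the exact property names may differ in your build, so verify them in WinBox first; the port names are just examples from my CRS317):

# 1) point unused ports at the built-in "offline" tx-manager so their fixed buffers return to the pool
/interface/ethernet/switch/qos/port set sfp-sfpplus15,sfp-sfpplus16 tx-manager=offline
# 2) adjust the shared-buffer percentage on the tx-manager used by the busy ports
#    (I tuned that part from the WinBox QoS GUI, so I won't guess the property name here)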
The retransmission stats I extracted from a hosted appliance whose traffic transits the network (it’s a proxy measure, not a direct / reproducible-on-demand metric).
At the time the delta was from 3% down to 1.5% reported retransmissions, and that includes a “wifi” hop at the customer’s house.
I still see occasional increases in the drop counters, but since those are not exposed via SNMP, it’s hard to correlate when they occur.
I remember seeing a dramatic increase in the drop frequency when a port was over ~70% utilization,
possibly due to the statistical nature of the “port capacity/usage” sampling:
the reported speed over “N seconds” is actually the integral of the instantaneous rate over $sampling_period, divided by $sampling_period,
and inside that $sampling_period the port very likely saw instantaneous packet rates hitting the physical limit, with the excess being dropped.
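To put numbers on it:

reported_rate = (1 / $sampling_period) * [integral of r(t) over the window]

so a reported ~70% on a 1G port over a 30-second window is perfectly consistent with r(t) sitting at the full 1 Gbps for roughly 0.7 * 30 = 21 of those seconds and near idle for the rest, and during those bursts the egress queue only survives for as long as its buffer share lasts.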
CCR2216 and CRS518 both use the 98DX8525. Nice! ..although that would stretch the budget a bit.
I thought all the 98DX21XX switches would be similar, but the CRS317 seems to be quite different. That also explains why it sticks out in the L3HW Support table. The CRS317 also seems to be one of the few switches with a decent-ish interconnect between the CPU and the switch chip (3Gb). I’m kinda curious whether it would be possible to reduce congestion by diverting known bursty traffic to the CPU.
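Something along these lines is what I had in mind (an untested sketch, assuming the chip’s ACL rules support L4 port matching and that the ~3Gb CPU path plus software forwarding could actually absorb the load; switch and port names are placeholders):

# hypothetical: punt SMB (TCP 445) arriving on the 10G uplink to the CPU instead of the switch fabric
/interface/ethernet/switch/rule add switch=switch1 ports=sfp-sfpplus1 mac-protocol=ip protocol=tcp dst-port=445 redirect-to-cpu=yes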
So not even 3 MB after all, although the packet cap is 11480, like on the 312… weird.
If they aren’t needed, the QSFP sub-ports can be set “offline” to free a decent chunk of port buffers, which can then be reallocated to the shared pool.