AES67 Audio CRS3xxx Switches

sogorman · Mon Jun 27, 2022 9:50 pm

Hello everyone, looking for some ideas on working through some AES67 audio and PTPv2 issues that I am having with my CRS3xxx switches.

Topology

AES67 Device 1G -> CRS32824P4SRM -> TWO 1G RTSP UPLINKS -> CRS3171G16S+ -> TWO 1G RTSP UPLINKS -> CRS32824P4SRM -> 1G AES67 Device

Basically, I have an AES67 PTPv2 device going into a hub CRS32824P4SRM switch with two RSTP uplink ports to our core CRS3171G16S+ switch and then a remote device hanging off another CRS32824P4SRM.

Issue

Every minute or so I am getting PTP offsets of up to 3000ns (3ms) between the two AES67 devices despite taking only two layer 2 hops. (see attached device error log). Everything is on the same VLAN and there is no layer 3 routing. I have ben looking into DSCP QOC but in all honesty it looks like a ball of wax on the current Mikrotik firmware.

Any thoughts as to what I can do to make this settle down on what is essentially just a layer2 'dumb' network?

tangent · Tue Jun 28, 2022 2:49 am

CRS32824P4SRM

Surely you mean the CRS328-24P-4S+RM? If you want to abbreviate, say "CRS328" if it doesn't matter if you're talking about the 20S or 24P version, or "CRS328-24P" if it does matter, as when PoE is involved. That is, use the shortest meaningful prefix. That can go down to "CRS3xx" or "CRS" in some contexts.

Spelling it out fully can be helpful. Those dashes and pluses are meaningful to those of us who know how to read MikroTik's product naming scheme. In this particular case…

TWO 1G RTSP UPLINKS -> CRS3171G16S+

…an informed reading of the product naming scheme calls into question why you aren't using 10G fiber links between those switches. You've got four in the 328, and 16 in the 317. Why copper, in that situation?

Do you really mean "RTSP" here, presumably part of AES67, to provide media stream negotiation?

I ask for two reasons.

One, you use "RSTP" elsewhere in your message. They're very much not the same thing!

Two, when a networking geek like me sees "two 1G uplinks" between two switches, we immediately think of redundant links and the need for a technology like RSTP to prevent loops. If that's what's going on, RSTP election fights could explain your symptom. I'm normally a fan of enabling RSTP everywhere, but in a hard-real-time application like this one, I think you can't afford even "rapid" spanning-tree negotiations. For such a simple topology, loop-protect should suffice.
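
A minimal sketch of what that might look like on one of the CRS328s, assuming the bridge is called bridge1 and the two uplinks are ether23 and ether24 (all three names are placeholders, not taken from your config):

```
# Assumed names: bridge1, ether23, ether24. Substitute your own.
# Stop the bridge from taking part in spanning tree.
/interface/bridge
set bridge1 protocol-mode=none
# Let loop-protect watch the two redundant uplinks instead.
/interface/ethernet
set ether23 loop-protect=on
set ether24 loop-protect=on
```

Loop-protect disables a port when it sees its own probe frames come back, so this is something to try in a maintenance window first rather than during a live show.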

If the reason for two links is some hope of redundancy, what real-world conditions do you achieve that goal under? How often do you get a single port dying, and not an 8-port cluster, or the whole switch? How often does a cable cut affect one cable and not the others that run in the same cable chase?

The only case I can see where it might help is the highly unlikely one where someone manually unplugs one of the two cables while the system is "live." If that's a substantial risk in your situation, are you employing untrained cable monkeys, or what? "Don't do that" should go without saying.

If I can't talk you out of redundant 1G links and RSTP, such as because you're using the SFP+ interfaces for something else, and you must be able to interoperate with non-MikroTik switches that don't understand RouterOS's loop protection scheme, at least configure the bridge priorities to force the root election to go the same way every time, as long as both links are present.
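
Roughly, that could look like the following, again assuming a bridge named bridge1 on each switch (a placeholder). The lowest priority value wins the root election, and RouterOS defaults to 0x8000:

```
# On the core CRS317: make it the preferred root bridge.
/interface/bridge
set bridge1 priority=0x1000

# On each CRS328 spoke: keep the default so it never outbids the core.
/interface/bridge
set bridge1 priority=0x8000
```

With the core pinned as root, the spokes should settle on the same root port each time as long as both links are up.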

If the problem isn't RSTP, then you might be running into the SFP port flapping issue in the CRS328, fixed in 7.4beta2. Alas, that fix isn't in any stable release yet, but I've been running the beta on my CRS328 for 3+ weeks now, and it seems fine. My uptime is back down to less than a day due to the beta5 release earlier today, but the logs have been quiet since.

see attached device error log

Try again…

I have ben looking into DSCP QOC

What's "QOC"? Do you mean QoS?

in all honesty it looks like a ball of wax on the current Mikrotik firmware.

Really? One switch rule should suffice:

/interface/ethernet/switch/rule
add dscp=0 ports=ether1,ether2,ether3… rate=100M switch=switch1

That is, everything on the potentially-conflicting ports without a DSCP tag is limited to 10% of the maximum throughput of the presumed 1G links on those ports, so that it'd take ten of them in concert to swamp a 1G uplink. In the presumably more common case of just one bad actor, 90% of the uplink rate is dedicated to traffic carrying DSCP tags.

Obviously there is plenty of room for adjustment here. Apply with care and sensibility for your local needs.

Incidentally, I can't help but point out that the scheme as-presented is what you'd get for free by using 10G fiber uplinks between the switches: it'd take ten bad actors on 1G ports to swamp the uplink in that case. Even if you need no more than 1G between any two endpoints, flow aggregation may end up being a substantial benefit in your situation.

sogorman · Tue Jun 28, 2022 5:23 am

@tangent you took me to school on a number of things, and I respect and appreciate that. Some background: I am a broadcast audio engineer with a 'decent' background in networking, which has served me well in these days of RVON, Dante, and AES67. That said, I am an audio engineer and not a network engineer. If you want to talk about compression, time alignment, and/or comb filters, I am your man. If you want to talk BGP, I'm not your man. I appreciate your feedback and 'criticism'; it is rightfully deserved.

-- RTSP / RSTP... in my post everything should be RSTP. We use RSTP so the network can recover from fiber outages in a somewhat 'rapid' manner. This has served us well with our Dante audio streams (basically AES67 & PTP v1). We have a hub-and-spoke network with redundant links to all the spokes. To your question on 'how often does a link go down': not that often, but when it does happen the network has to recover rapidly, as this is 'live tv'.

-- I am with you on eliminating RSTP as the cause. I turned down one of the redundant ports on the switches, which unfortunately resulted in no change in behavior.

-- I will try using 10G as the interconnect between the hub and spokes, as well as upgrading to 7.4beta2, and see if there is any change in behavior.

Again, I appreciate all the insight, ideas and feedback on the networking side of audio engineering.
