Multicast issue

Hi,

I’m trying to solve a weird issue with multicast traffic. Maybe you could help me.

The setup:
I have UPnP/DLNA traffic between a media server (Synology NAS), a media renderer (Cambridge Audio CXNv2, an audio player) and a remote app deployed on a mobile phone. All three devices are on three different VLANs. The media server and renderer are connected to a CRS326-24G-2S+RM, which acts as the router, and I have a hAP AC2 connected to the CRS326 which offers wifi connectivity.

The issue:
At first sight, the configuration does not work, as I only see the media server in the 239.255.255.250 group. However, if I reboot the CRS326 while the renderer is turned on, everything starts working fine: both the media server and renderer join the RP, and I can browse the media server content from the renderer and from the mobile app. The media renderer plays the media with no issue. In other words, it works as expected… until I turn the media renderer off.

When I turn it off then turn it back on, it stops seeing the media server… When I check Winbox, the only difference I see is in the PIM configuration: the renderer is no longer part of 239.255.255.250. I can wait 1 minute or 1 day, it won’t rejoin, and the only thing that makes it work again is rebooting the CRS326.

My analysis so far:
I suspect that at the startup of CRS 326 the media renderer is able to join the multicast group as soon as the ethernet port is available and before the “complete activation of its configuration”. After that, however, “something starts blocking it” and the media renderer can no longer join the multicast group.

What could cause such blocking behavior? How can I diagnose what’s going on (Torch? but what should I look for)?

Some more information:

  • there is a single bridge on the CRS326
  • the bridge has the “multicast-router=permanent” option set
  • the bridge has the “igmp-snooping=yes” option
  • ethernet ports used by the two devices are part of this bridge
  • bridge ports used by the devices have the “multicast-router=permanent” option set
  • VLANs used by the media renderer and media server are different
  • all three VLANs (including the one used by the mobile device) are tagged on the bridge
  • all filter rules that drop packets are configured to log a message, but nothing appears in the logs
  • all devices get an IP through DHCP (static leases for the media server and media renderer)
  • option 121 is set on the DHCP servers for the corresponding VLANs (multicast route)
  • the RP is “close” to the media server (gateway IP)
  • a mangle rule is set on the CRS326 in order to increase the TTL of multicast udp packets on port 1900
  • a mangle rule is set on the CRS326 in order to increase the TTL of igmp packets
  • UPnP is enabled for the two VLAN interfaces, with the VLAN interface used by the media server set as “internal” and the one used by the media renderer set as “external”
  • IPv6 is disabled

Your help is greatly appreciated !

Kind regards,

Michel

What happens to the symptom when you put them all on the same LAN?

Don’t guess, find out. If the symptom recurs, it isn’t VLAN, routing, or PIM related. If it disappears, then the inverse is true, and we’ve divided the potential problem space in half.


I have a hAP AC2 connected to the CRS326 which offers wifi connectivity.

Is that relevant?

I do see you saying something about a “mobile device”, implying that you’re trying to use it to start playback, but I also see that this Cambridge Audio player has a front-panel UI. Can you play files using only its UI, leaving the WiFi leg of this out of the problem?

Pare the problem down to the simplest possible configuration, then build it back up until it fails. The last bit of complexity you added before that point is the source of the problem.


I only see the media server in the 239.255.255.250 group.

Are you saying that the DLNA server’s announcement always gets across the network so the audio receiver offers it as an option for selection, regardless of this reboot hack you’re using in your diagnosis?

If so, can you always browse the music as well, or only after the reboot hack?

Is the only failure point the playback step in absence of the reboot hack, or does the failure condition have broader effects?


When I turn it off then turn it back on, it stops seeing the media server…

At this point, I’d want to see network sniffer results (e.g. Torch, Wireshark…) showing a lack of DLNA broadcasts, IGMP group joins, etc.


I suspect that at the startup of CRS 326 the media renderer is able to join the multicast group as soon as the ethernet port is available and before the “complete activation of its configuration”

Since configuration activation likely takes a fraction of a second after bootup, faster than you can work through the playback UI with your fingers, I see only two ways that is true:

  1. It happens before any of the configuration is activated, which means the CRS326 is a dumb switch before this point. This I do not believe, as it would make the thing an awfully leaky router. Reboot it and it becomes transparent for dozens of seconds? Naaah. If we were dealing with something along the lines of a Netgear or D-Link “smart” switch, I’d believe it to be a possibility, but not a proper router.

  2. There is some time-based option that engages after the configuration is applied. My guess? The IGMP querier, which you haven’t said a thing about.

Further to point #2, the output of “/export” has higher information density than prose, and it doesn’t forget to mention key details.

If this is ROS 6, be sure to add hide-sensitive to the export command.

Beware editing anything out on relevance grounds: if you knew everything that was relevant, would you be having these troubles? If in doubt, leave the details in.
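
For reference, a minimal sketch of such an export on ROS 6. The file name dlna-debug is only a placeholder:

```
# ROS 6: write the configuration to dlna-debug.rsc, leaving out passwords and keys
/export hide-sensitive file=dlna-debug
```

As far as I know, ROS 7 leaves sensitive values out by default, so a plain /export should be enough there.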


How can I diagnose what’s going on (Torch? but what should I look for)?

Trace all IGMP traffic. Capture traffic on the relevant DLNA ports, then use Wireshark to analyze the messages.

You may have to learn the DLNA and IGMP protocols to a fairly deep level to work out what they’re trying to say and where they’re getting tripped up.
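
As a starting point, here is a rough sketch using the built-in packet sniffer. The interface name vlan-audio and the capture file name are assumptions; substitute whatever your VLAN interfaces are actually called:

```
# Capture only IGMP on the player's VLAN interface ("vlan-audio" is a placeholder name)
/tool sniffer set filter-interface=vlan-audio filter-ip-protocol=igmp file-name=igmp-audio.pcap
/tool sniffer start
# Power-cycle the player here and wait a minute or two, then:
/tool sniffer stop
```

Download the resulting file and open it in Wireshark. A second pass with filter-ip-protocol=udp and filter-port=1900 should show the SSDP side of the conversation. Keep in mind that the sniffer only sees packets that reach the CPU.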


the bridge has the “igmp-snooping=yes” option

Does the symptom go away if you turn that off?

And before you object, yes, I do realize this makes it flood multicast. The question stands. This is a diagnosis step, not a solution.
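
Something along these lines, assuming the bridge is simply named "bridge":

```
# Diagnostic only: with snooping off, the bridge floods multicast to all ports
/interface bridge set [find name="bridge"] igmp-snooping=no
# Restore the setting once the test is done
/interface bridge set [find name="bridge"] igmp-snooping=yes
```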


a mangle rule is set on the CRS326 in order to increase the TTL of multicast udp packets on port 1900

Why? I see a maximum of 3 hops from what you’ve said. Is the DLNA server or the player actually sending out packets with TTL of 1 or 2?

I don’t believe this to be relevant, but it’s quite rare to see TTL below 4 from the initiating host.

Unless you’re seeing ICMP TTL timeout messages, I’d pull that out of your configuration. Every rule adds processing time, increasing CPU load and decreasing network performance. Pointless rules do that to no compensating benefit.

Conversely, this being a CRS3xx class device, it’s possible everything’s being hardware-offloaded, so your firewall rules are having no effect. This will also affect your ability to use Torch to investigate the problem. On CRS3xx devices, MikroTik tries to do everything it can to make sure no packet transits the CPU.
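
A quick way to check, sketched with a placeholder port name (ether5), is to look at the bridge port flags and, for the duration of a test, force the interesting port through the CPU:

```
# Ports shown with the H flag are hardware-offloaded
/interface bridge port print
# Temporarily disable offloading on one port ("ether5" is a placeholder)
/interface bridge port set [find interface=ether5] hw=no
```

Expect throughput on that port to drop while hw=no is in effect, so set it back to hw=yes afterwards.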


increase the TTL of igmp packets

I doubt that’s helping. IGMP is a LAN-bound protocol, so any TTL ≥ 1 should always work. If it were otherwise, we wouldn’t need PIM.

Speaking of PIM, have you tried the IGMP proxy as a simpler alternative?

Thank you for your long answer, tangent; it allowed me to narrow the issue down to “igmp-snooping” (see below).
I can’t prepare the config export right now; I will do it later.


Browsing of NAS files from the audio receiver works fine with the 2 devices on the same VLAN.


a mangle rule is set on the CRS326 in order to increase the TTL of multicast udp packets on port 1900

Unless you’re seeing ICMP TTL timeout messages, I’d pull that out of your configuration.

I removed it as you suggested.


My guess? The IGMP querier, which you haven’t said a thing about.

I forgot to activate it indeed… However, once activated on the bridge with the default configuration, there is no change: only the RP appears in the MFC as a source (the media server and audio receiver do not appear as sources).


the bridge has the “igmp-snooping=yes” option

Does the symptom go away if you turn that off?

Yes, if I turn off igmp-snooping, everything seems to work fine (the MFC shows the media server and audio receiver as sources with correct incoming/outgoing interfaces).


Speaking of PIM, have you tried the IGMP proxy as a simpler alternative?

I tried it in the beginning, but it wasn’t working. I just tried it again and I’m facing the exact same issue as with PIM: if I set igmp-snooping=yes then it’s not working. If I set it to “no” everything works fine.

I was hoping for the reverse: you had one set up and didn’t tell us. An IGMP querier can only make this problem worse, since its job is to pinch off unwanted streams. My best unstated hypothesis was that it was getting fooled into doing that pinch-off when it shouldn’t have. Thus the minutes of working configuration until the querier stepped in and broke things.

I’m not advocating that you turn it back off, but turning it on cannot help. It’s something to turn on as an optimization after you get things working and wish to prune unwanted abandoned streams.


if I turn off igmp-snooping, everything seems to work fine

That suggests the IGMP group join message isn’t getting from one VLAN to the other, so the IGMP snooping doesn’t set up the forwarding, so PIM doesn’t know to transport the media stream across the routing barrier.

I’d use a sniffer to check: does the player send the IGMP group-join message, and does it appear on the NAS VLAN, coming from the other side of the RP? If not, there’s your problem. Find out who’s dropping the group-join message.
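
Alongside the sniffer, the bridge’s own multicast database is worth a look. This is only a sketch of what I’d run, not a guaranteed diagnosis:

```
# Groups the bridge has learned through IGMP snooping, with their ports and VLANs
/interface bridge mdb print
```

If 239.255.255.250 is listed for the player’s port before you power-cycle it but never comes back afterwards, the join is not being seen by the snooping bridge.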


I’m facing the exact same issue as with PIM: if I set igmp-snooping=yes then it’s not working. If I set it to “no” everything works fine.

To me that says you don’t need PIM. The IGMP proxy will do just fine, once you get the rest of your problem sorted.
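
For comparison, a minimal IGMP proxy setup might look like the following. The interface names vlan-nas, vlan-audio and vlan-mobile are placeholders for your three VLAN interfaces, and I’m assuming PIM is disabled while you test this:

```
# Upstream = the side where the source (the NAS) lives
/routing igmp-proxy interface add interface=vlan-nas upstream=yes
# Downstream = the sides with receivers
/routing igmp-proxy interface add interface=vlan-audio
/routing igmp-proxy interface add interface=vlan-mobile
```

Note that the proxy only forwards multicast from the upstream side to the downstream sides, so this covers the NAS announcing itself to the player and the phone, not the reverse direction.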

Hi again,

I did as you suggested and spent some time using Torch. With IGMP snooping turned on, the multicast querier turned off, and the UPnP configuration “flipped” (NAS VLAN set to “external” and audio receiver VLAN set to “internal”), I see the following:


  • I see IGMP traffic from the audio receiver VLAN arriving to the RP on the NAS VLAN.
  • I see UDP traffic from the media server to 239.255.255.250 on the NAS VLAN, then the same traffic to 239.255.255.250 on the audio receiver VLAN
  • MFC content “makes sense”: for the audio receiver source IP, the RP is used, the incoming interface is its VLAN, and the outgoing interface is the mobile device VLAN; for the media server source IP, the RP is used, the incoming interface is the NAS VLAN, and the outgoing interfaces are the audio receiver and mobile device VLANs
  • MFC is being correctly updated, and sources + outgoing interfaces reappear.
  • If I “standby” the audio receiver then “un-standby” it, everything’s fine.
  • If I completely turn the audio receiver off then turn it on, I stop seeing UDP traffic from media server to 239.255.255.250. As a result, the audio receiver is not able to find/browse the media server anymore. Waiting for 1 hour does not change anything.
  • If I then turn the multicast querier on the bridge on and then off again, everything starts working as expected again after at most 1 min (the delay between announcements)…

Here are the commands I run to make things work again…

/interface bridge set [find name="bridge"] multicast-querier=yes
/interface bridge set [find name="bridge"] multicast-querier=no

So far this ugly workaround works, but I really don’t understand what causes this behavior.

I attach my current configuration file; maybe it can help you help me (I only removed lines which are clearly unrelated):
mikrotik_router1_20220319_dlna.rsc (12.7 KB)

It sounds like the querier is causing more problems than it’s worth. It’s best used with continuous long-running streams like IPTV or OS image updates.

In its absence, the worst that’s likely to happen is that if the receiver is restarted without being able to send out an IGMP group-leave packet first, the NAS will continue sending the stream out until it finishes. It’s wasteful, but it costs what, 1-2 minutes or so on average, for a song to complete?