I have a device that is (at the moment) connected directly to a port on my CCR1009-7G-1C-1S+. Every so often (it could be after a week, or after a few hours) the single TCP connection from that device to a server on the internet fails. I believe I have traced the issue to the device being unable to obtain an IP address. I know, I know, it could be the cable... except I have tried two different, brand-new, tested CAT7 cables, and a brand-new CAT6A patch cable. But hear me out.
The network in general works well, and to the best of my knowledge every device gets its IP from the DHCP server, except this one, from time to time.
But why does the issue manifest itself in this way?
I started a sniffer session on the CCR1009 with these settings:
```
only-headers: no
memory-limit: 100KiB
memory-scroll: yes
file-name:
file-limit: 1000KiB
streaming-enabled: yes
streaming-server: 192.168.10.30:37008
filter-stream: yes
filter-interface: ether2
filter-mac-address: 00:60:34:36:AD:61/FF:FF:FF:FF:FF:FF
filter-mac-protocol:
filter-ip-address:
filter-ipv6-address:
filter-ip-protocol:
filter-port:
filter-cpu:
filter-size:
filter-direction: any
filter-operator-between-entries: and
running: yes
```
What I see is the device sending DHCP Discover requests trying to re-obtain an IP:
And I can see the DHCP server on the CCR1009 claiming it is trying to provide the last held IP to the device:
The question is: why can't I see the CCR1009 sending a DHCP Offer to 00:60:34:36:AD:61 on ether2? The wire is pretty much irrelevant if the Offer never leaves the router.
Yes, the sniffer could drop some packets. But surely not all of them, and not only when the device has trouble obtaining an IP. So that seems unlikely.
The problem goes away (for a while) if I reset the device. It does not go away if I re-plug the cable, which I hoped would trigger link-down detection and a complete DHCP renegotiation (I guess I was hoping for a change in the device's internal network state... DHCP is just DHCP, after all).
The device is actually supposed to be connected to an "infrastructure switch" that is connected to the CCR1009. Today it stopped communicating while connected to that switch. But it also did not help when I unplugged the device from the infrastructure switch and plugged it directly into ether2 on the CCR1009. The CCR1009 should detect that the device has moved from the downstream switch and is now directly connected to ether2, so replying to the DHCP request should not be an issue even in this case.
Ah, I found out why I could not see the DHCP replies: the DHCP server sends them to the broadcast MAC address, so they match neither the source nor the destination MAC in my filter (the source is the router's MAC). Adding the router's MAC address to the sniffer filter, and keeping the direction filter set to any, fixes this.
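For anyone else hitting this, here is a minimal Python sketch of why the original filter missed the Offers. It models the destination rules from RFC 2131 section 4.1 for a DHCPOFFER, plus a simplified sniffer filter that keeps a frame if either its source or destination MAC is in the list. Both models are simplifications, and the function names, the router MAC `02:00:00:00:00:01` and the offered address `192.168.10.50` are hypothetical placeholders; whether the boiler actually sets the BROADCAST flag is not shown in the capture above.

```python
"""Sketch: why a sniffer filter on the client MAC alone misses DHCP Offers.

Models RFC 2131 section 4.1 (where a server sends DHCPOFFER/DHCPACK) and a
simplified MAC filter that passes a frame if its source OR destination MAC
is in the filter list. Illustrative only.
"""

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
ZERO_IP = "0.0.0.0"


def offer_destination(giaddr, ciaddr, broadcast_flag, yiaddr, chaddr):
    """Return (dest_ip, dest_mac) for a DHCPOFFER per RFC 2131 section 4.1.

    dest_mac is None when it would be resolved the normal way (ARP).
    """
    if giaddr != ZERO_IP:
        # Relayed request: the reply goes to the relay agent.
        return giaddr, None
    if ciaddr != ZERO_IP:
        # Client already has an address: unicast to it.
        return ciaddr, None
    if broadcast_flag:
        # Client asked for broadcast replies: IP and MAC broadcast.
        return "255.255.255.255", BROADCAST_MAC
    # Otherwise unicast to the offered address and the client's hardware address.
    return yiaddr, chaddr


def sniffer_passes(src_mac, dst_mac, filter_macs):
    """Simplified filter: keep the frame if either MAC is in the list."""
    wanted = {m.lower() for m in filter_macs}
    return src_mac.lower() in wanted or dst_mac.lower() in wanted


if __name__ == "__main__":
    client_mac = "00:60:34:36:AD:61"
    router_mac = "02:00:00:00:00:01"  # hypothetical placeholder for ether2's MAC

    ip, mac = offer_destination(ZERO_IP, ZERO_IP, True, "192.168.10.50", client_mac)
    print(ip, mac)  # 255.255.255.255 ff:ff:ff:ff:ff:ff
    print(sniffer_passes(router_mac, mac, [client_mac]))              # False
    print(sniffer_passes(router_mac, mac, [client_mac, router_mac]))  # True
```

In the unicast case the destination MAC would be the client's own, so the original filter would have caught it; the fact that nothing showed up is consistent with the Offers being broadcast.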
Now I can see a hopeless Discover-Offer loop. So, yeah, it could be the wire.
But a bad wire would produce random, intermittent failures; it would not explain why only resetting the device fixes it. If a particular traffic pattern triggered some kind of EM interference, I would expect DHCP to fail often, but not every single time. Instead it fails every time until I reset the device, after which it works fine for days.
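To put a rough number on the "random intermittent" argument, here is a small Python sketch. It assumes, purely hypothetically, that each Offer is lost independently with probability `p` on a marginal cable; real faults can be bursty, so treat this as a back-of-the-envelope check rather than a model of the actual link. The run length of 20 is also an arbitrary illustration.

```python
"""Sketch: how likely is a long run of lost Offers if losses are random?

Assumes each DHCP Offer is lost independently with probability p
(a hypothetical model of a bad cable; real faults may be bursty).
"""


def prob_all_lost(p: float, n: int) -> float:
    """Probability that n independent Offers in a row are all lost."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


def expected_rounds_until_success(p: float) -> float:
    """Mean number of Discover/Offer rounds until one Offer gets through (geometric)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(f"p={p}: P(20 lost in a row)={prob_all_lost(p, 20):.3g}, "
              f"mean rounds to succeed={expected_rounds_until_success(p):.2f}")
```

Even with half of all Offers lost, twenty consecutive losses has a probability below one in a million, while a device that fails every time until reset looks deterministic rather than random.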
The DHCP Offer packets from the router are identical according to my Eye Mk I, and my computer agrees: the only difference is the Transaction ID, which is expected. What does differ is the device's response. After the reset, it answered the Offer with a DHCP Request; before the reset, it just kept sending another DHCP Discover.
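To do the "identical except for the Transaction ID" comparison mechanically, here is a minimal Python sketch. It assumes you have the UDP payload of each DHCP packet as raw bytes (for example exported from the capture as hex and converted with `bytes.fromhex()`); the helper names are hypothetical. Field offsets follow the fixed BOOTP/DHCP header in RFC 2131, and option 53 (DHCP message type) is read from the options area to tell Discover, Offer and Request apart.

```python
"""Sketch: diff two captured DHCP payloads, ignoring the transaction ID.

Input is the UDP payload of each packet as bytes. Offsets follow the fixed
BOOTP/DHCP header from RFC 2131; the magic cookie and options follow it.
"""

# (name, start offset, length) of the fixed header fields.
FIELDS = [
    ("op", 0, 1), ("htype", 1, 1), ("hlen", 2, 1), ("hops", 3, 1),
    ("xid", 4, 4), ("secs", 8, 2), ("flags", 10, 2),
    ("ciaddr", 12, 4), ("yiaddr", 16, 4), ("siaddr", 20, 4), ("giaddr", 24, 4),
    ("chaddr", 28, 16), ("sname", 44, 64), ("file", 108, 128),
]
HEADER_LEN = 236  # end of the fixed header
XID_START, XID_END = 4, 8


def field_at(offset: int) -> str:
    """Name of the header field containing offset, or 'options' past the header."""
    for name, start, length in FIELDS:
        if start <= offset < start + length:
            return name
    return "options"  # magic cookie + options


def diff_ignoring_xid(a: bytes, b: bytes) -> list:
    """Sorted names of fields whose bytes differ, excluding the xid."""
    differing = set()
    for i in range(max(len(a), len(b))):
        if XID_START <= i < XID_END:
            continue  # transaction IDs are expected to differ
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            differing.add(field_at(i))
    return sorted(differing)


def dhcp_message_type(pkt: bytes):
    """Option 53 value (1=Discover, 2=Offer, 3=Request, 5=Ack), or None."""
    i = HEADER_LEN + 4  # skip the 4-byte magic cookie
    while i < len(pkt):
        code = pkt[i]
        if code == 255:  # End option
            break
        if code == 0:    # Pad option
            i += 1
            continue
        if i + 1 >= len(pkt):
            break
        length = pkt[i + 1]
        if code == 53 and length >= 1 and i + 2 < len(pkt):
            return pkt[i + 2]
        i += 2 + length
    return None


if __name__ == "__main__":
    # Two synthetic Offers that differ only in xid.
    base = bytearray(240)
    base[0] = 2                                 # op = BOOTREPLY
    base[236:240] = bytes([0x63, 0x82, 0x53, 0x63])  # DHCP magic cookie
    opts = b"\x35\x01\x02\xff"                  # option 53 = Offer, then End
    offer1 = bytes(base[:4]) + b"\x11\x11\x11\x11" + bytes(base[8:]) + opts
    offer2 = bytes(base[:4]) + b"\x22\x22\x22\x22" + bytes(base[8:]) + opts
    print(diff_ignoring_xid(offer1, offer2))  # []
    print(dhcp_message_type(offer1))          # 2
```

Running `dhcp_message_type()` over the device's packets after each Offer is a quick way to confirm the pattern described above: a Request (3) after the reset versus another Discover (1) before it.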
To be more specific, the device having issues is an embedded one: a boiler IP interface. (I am avoiding the word "smart" because I find it dumb to always route traffic through a server on the internet to control a device on the local LAN. And yes, if I wanted remote access, it would require either a VPN or, in this case only, sending data to said server... But I digress.)
So... my question has changed to:
- Is this still a cable issue, or can we agree that it is unlikely in this case?
- And if it is not a cable issue, is it likely that something is wrong with the boiler's network stack?