omahena
just joined
Topic Author
Posts: 3
Joined: Wed Sep 28, 2016 3:07 pm

DHCP offering lease without success

Fri Apr 21, 2023 11:58 am

Before everyone yells: "CABLE!"... And before I tear the house apart... I wanted to understand the information I am seeing in trying to diagnose the issue.

I have a device that is (at the moment) connected directly to a port on my CCR1009-7G-1C-1S+. From time to time (it could be a week or it could be a few hours) the single TCP link from that device to a server on the internet fails. I believe I have traced the issue to the device not being able to obtain an IP address. I know... I know, it could be the cable... I have tried two different, brand-new, tested CAT7 cables, as well as brand-new CAT6A patch cables. But hear me out.

The network in general works well, and all devices get a DHCP server-assigned IP, except for this one, from time to time, to the best of my knowledge.

But. Why do I see the issue manifest itself in this way?

I have started a sniff session on the CCR1009:
                     only-headers: no
                     memory-limit: 100KiB
                    memory-scroll: yes
                        file-name: 
                       file-limit: 1000KiB
                streaming-enabled: yes
                 streaming-server: 192.168.10.30:37008
                    filter-stream: yes
                 filter-interface: ether2
               filter-mac-address: 00:60:34:36:AD:61/FF:FF:FF:FF:FF:FF
              filter-mac-protocol: 
                filter-ip-address: 
              filter-ipv6-address: 
               filter-ip-protocol: 
                      filter-port: 
                       filter-cpu: 
                      filter-size: 
                 filter-direction: any
  filter-operator-between-entries: and
                          running: yes
So, I asked the router to stream the packets seen on the ether2 port that involve the device's MAC address, 00:60:34:36:AD:61, to the Wireshark instance on my computer.

What I see is the device sending DHCP Discover requests trying to re-obtain an IP:
sniff.PNG

And I can see the DHCP server on the CCR1009 claiming it is trying to provide the last held IP to the device:
dhcp.PNG
The question is: Why can I not see the CCR1009 sending out a DHCP Offer packet on ether2 to 00:60:34:36:AD:61? The wire is pretty much irrelevant if the DHCP Offer reply packet does not leave the router.

Yes, some sniffer packets could be lost. But surely not all of them and not just when the device has trouble obtaining an IP. So this does not seem likely.

The problem goes away (for a while) if I reset the device. It does not go away if I re-plug the device, hopefully causing a link-fail detection and a complete DHCP renegotiation (I guess I am hoping for an internal state change in the network state of the device... DHCP is just DHCP after all).

The device is actually supposed to be connected to an "infrastructure switch" that is connected to the CCR1009. Today it stopped communicating while connected to that switch. But it also did not help when I unplugged the device from the infrastructure switch and plugged it directly into ether2 on the CCR1009. The CCR1009 should detect that the device moved from the downstream switch and is now directly connected to ether2. So replying to the DHCP request should not be an issue even in this case.

Ah, I discovered why I cannot see the replies from the DHCP server. It is because replies from the DHCP server are sent to the broadcast MAC address, so they never match a filter on the device's MAC address alone. This can be fixed by adding the router's MAC address to the sniffer filter and keeping the direction filter set to any.
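
A minimal sketch of why the first filter hid the Offers, assuming a simplified model in which a frame is captured when its source or destination MAC is in the filter set. This is not a claim about RouterOS sniffer internals, and the router MAC below is a made-up placeholder:

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
DEVICE = "00:60:34:36:ad:61"   # the boiler interface from the sniffer filter
ROUTER = "aa:bb:cc:00:00:01"   # hypothetical router MAC, for illustration only

def matches(frame, filter_macs):
    """Simplified model: capture the frame if src or dst MAC is in the filter."""
    return frame["src"] in filter_macs or frame["dst"] in filter_macs

# The device broadcasts its Discover; the router broadcasts its Offer.
discover = {"type": "Discover", "src": DEVICE, "dst": BROADCAST}
offer = {"type": "Offer", "src": ROUTER, "dst": BROADCAST}
frames = [discover, offer]

# Filter on the device MAC only: the Offer has neither src nor dst equal to it.
seen_device_only = [f["type"] for f in frames if matches(f, {DEVICE})]

# Filter on device MAC and router MAC: the Offer matches via its source.
seen_with_router = [f["type"] for f in frames if matches(f, {DEVICE, ROUTER})]

print(seen_device_only)
print(seen_with_router)
```

With the device-only filter the broadcast Offer is invisible even though it did leave the router, which is consistent with what the capture showed.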

Now I can see a hopeless Discover-Offer loop. So, yeah, it could be the wire. :lol:
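
For anyone tracing something similar, here is a rough sketch of how such a loop could be spotted in a list of DHCP message types exported from a capture. The function name and the sample sequences are made up for illustration; a healthy exchange is assumed to go Discover, Offer, Request, Ack:

```python
def count_unanswered_offers(messages):
    """Count Offers that are followed by another Discover instead of a Request."""
    count = 0
    for current, following in zip(messages, messages[1:]):
        if current == "Offer" and following == "Discover":
            count += 1
    return count

# Illustrative sequences, not taken from the actual capture.
healthy = ["Discover", "Offer", "Request", "Ack"]
stuck = ["Discover", "Offer", "Discover", "Offer", "Discover", "Offer"]

print(count_unanswered_offers(healthy))
print(count_unanswered_offers(stuck))
```

Any non-zero count means the client saw (or should have seen) an Offer and started over anyway.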

But. A bad wire would produce random, intermittent results. Otherwise, I cannot explain why only a reset of the device fixes it. If this particular packet pattern caused some particular EM interference... then DHCP should fail most of the time. Instead, it fails every time until I reset the device, after which it works fine for days.

The DHCP offer packets from the router are identical, according to my Eye MK I, and my computer agrees. The only difference is the Transaction ID (and that is expected):
dhcp_diff.PNG
Just that after the reboot, the device responded with a DHCP Request packet... And before the reset, it kept sending another DHCP Discover.
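
The comparison can also be done mechanically. A small sketch, assuming the raw DHCP payloads (the part after the UDP header) have been exported as bytes. The Transaction ID sits at byte offsets 4 to 7 in the BOOTP/DHCP header layout from RFC 2131. The payloads below are short synthetic stand-ins, not the real packets:

```python
XID_OFFSET = 4   # op, htype, hlen and hops (1 byte each) precede xid
XID_LENGTH = 4

def mask_xid(payload: bytes) -> bytes:
    """Return the payload with the Transaction ID bytes zeroed out."""
    end = XID_OFFSET + XID_LENGTH
    return payload[:XID_OFFSET] + b"\x00" * XID_LENGTH + payload[end:]

def same_except_xid(a: bytes, b: bytes) -> bool:
    """True if both payloads are identical apart from the Transaction ID."""
    return len(a) == len(b) and mask_xid(a) == mask_xid(b)

def differing_offsets(a: bytes, b: bytes) -> list:
    """Byte offsets at which two payloads differ (over their common length)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Synthetic example: op=2 (reply), htype=1 (Ethernet), hlen=6, hops=0.
header = bytes([2, 1, 6, 0])
rest = bytes(8)  # stand-in for the remaining fields and options
offer_before_reset = header + bytes.fromhex("1a2b3c4d") + rest
offer_after_reset = header + bytes.fromhex("1a2b3c99") + rest

print(same_except_xid(offer_before_reset, offer_after_reset))
print(differing_offsets(offer_before_reset, offer_after_reset))
```

If `same_except_xid` holds for the Offers before and after the reset, the router's side of the exchange is the same in both cases and the difference has to be on the wire or in the device.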

To be a bit more specific - this is an embedded device that is having issues - a boiler IP interface. (I am avoiding the word smart because I find it dumb to always send traffic to a server on the internet to control a device on the local LAN, and yes, if I wanted remote access, it would require either a VPN or data being sent to the said server, in that case only... But I digress.)

So... My question changed to:
  • Is this still a cable issue, or can we agree that it is not likely in this case?
  • And if it is not a cable issue, is it likely there is something wrong with the boiler's network stack?
Thank you for your time. I kept my path of discovery to help other users understand and trace similar issues.
 
holvoetn
Forum Guru
Posts: 5481
Joined: Tue Apr 13, 2021 2:14 am
Location: Belgium

Re: DHCP offering lease without success

Fri Apr 21, 2023 12:17 pm

Nice analysis !
Waiting here for further responses, I'm intrigued now.

FWIW I have an IoT device at home (water softener) which makes it a habit to connect to my IoT Wifi network, send out its data and then "sometimes" completely drop off the network.
But 5 minutes later it's there again:
- if needed, connect to Wifi
- request DHCP (it gets the same, duh !)
- connection request outbound to the central server of that company
- send data
- but sometimes (not sure what scheme it follows) it disconnects Wifi (gone from registration table on AP and it's not that it moves to another AP, that's not the case)
 
biomesh
Long time Member
Posts: 562
Joined: Fri Feb 10, 2012 8:25 pm

Re: DHCP offering lease without success

Fri Apr 21, 2023 2:37 pm

I have only seen this on an older IP camera. I am sure there is a bug in the dhcp client or networking stack on the camera that causes this. Setting the device to a static IP avoids this issue for me.
 
omahena
just joined
Topic Author
Posts: 3
Joined: Wed Sep 28, 2016 3:07 pm

Re: DHCP offering lease without success

Sun May 07, 2023 10:44 pm

Here is a follow-up after a few weeks of sniffing... :D It's a slow process so there was not much to add so far. Perhaps there is precious little to add even now.

But I may have partial answers to my questions.

After a few more failures over the first several days, I investigated the original ethernet cable that shipped with this device, which was also deployed by the installer and was in use during the initial failures of my device. It is the "standard length" of "high quality" CAT5e U/UTP cable. :o Now... My work is not focused on cables and I am not a professional installer either. But... In my book, CAT5e U/UTP is pretty much the lowest-quality patch cable you could find. If a rat farts in the basement there is bound to be some signal distortion on that wire. It gets weirder... When I talked to support about the situation with my boiler, the support engineer brought up this checklist:
The lack of stability is often caused by the components of the network.
Please consider the following options:
- A fixed IPv4 address
- Communication is based on TCP (not UDP)
- The connecting cable must be UTP – not STP or FTP

That should stabilize the module.
When I responded:
The building infrastructure these days uses S/FTP cables pretty much as a rule to shield from EM interference. Can you explain why an Unshielded Twisted Pair (UTP) cable should be used? I used the original patch cable provided to connect the KM200 to the Ethernet socket installed by the electrician. It should be OK.
To my surprise, he responded:
the UTP cable is mandatory to prevent the EM interference hit the KM200 directly.
Too much interference might confuse the module. Your analysis seems to show exactly that.

For the last mile from the socket to the KM200 please use a UTP cable.
Can anyone... See any reason, whatsoever, to use UTP cable in this case... or any other?

I have had nothing but trouble from UTP cables... and by definition they are unshielded. Why would anyone say that "UTP cables are mandatory to prevent EM interference"?

Of course... when I checked what type of cable was shipped with the device and discovered it was indeed U/UTP... I changed it for a CAT6A S/FTP cable and indeed the situation did stabilize. I have not seen the connection drop in more than a week now... perhaps two. So I seem to have shown empirically that shielded cables are indeed... more shielded from interference than unshielded.

But... Any idea if UTP would be desirable in certain situations? I can't think of one.

So... Yes, it was very likely the cable, again. :lol: I should not have trusted that the manufacturer provided adequate cable for their device. But the devil is in the details and even an unshielded cable should only cause intermittent problems.

So to my second question, I do not have a definitive answer. But thinking about the situation... Let's say that a packet would indeed get corrupted by interference and, unfortunately, that packet would be the DHCP lease Offer from the router. The device discards the corrupted packet and sends out a new DHCP Discover packet. To which the router responds with a bit-for-bit identical DHCP lease Offer (apart from the Transaction ID) that for some reason now always gets corrupted, and the device is caught in the DHCP Discover-Offer loop without ever sending a Request for the offered IP. I would think such "reliable interference" caused by the UTP cable to be quite unlikely. So, there is probably also something wrong with the network stack on the device. Perhaps the interference messed up the receiving capability of the device altogether.
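
To put a rough number on "quite unlikely": a back-of-the-envelope sketch, assuming each Offer is corrupted independently with the same probability. Both the independence and the figures below are assumptions for illustration, not measurements from this network:

```python
def prob_all_corrupted(p_corrupt: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent Offers is corrupted."""
    return p_corrupt ** attempts

# Made-up, deliberately pessimistic figure: half of all Offers get corrupted.
p = 0.5
for n in (1, 5, 10, 20):
    print(n, prob_all_corrupted(p, n))
```

Even with a hopeless 50% corruption rate, ten Offers in a row all failing would happen less than once in a thousand tries under this model. A loop that lasts until the device is reset does not look like independent random corruption.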

@holvoetn These IoT devices can be pretty strange. I guess when you try to build something on a tight budget it is easy to make one compromise too many. But a boiler controller and water softener device are not exactly a 5EUR Chinese WiFi lightbulb. At costs of several hundred EUR or even over 1000EUR... I would expect a little more from them.

@biomesh Thank you for sharing your IP camera experience. Perhaps both our devices share the same embedded network stack baked into an ethernet "bridge-type" chip. Or maybe it is the Ethernet PHY chip getting overwhelmed by a small voltage spike that then disconnects the receiver part and does not notify the upper stack about the failure state.

I agree a static IP would likely make the connection loss issue less frequent or even make it go away altogether. But - there is no way to configure it that I am aware of. I think my device used to have a web interface that I believe got removed from the firmware in the recent releases (or on my hw revision). So I can't do more than make the device's lease on the DHCP server static, so that it is always offered the same IP.
 
mkx
Forum Guru
Posts: 11598
Joined: Thu Mar 03, 2016 10:23 pm

Re: DHCP offering lease without success

Mon May 08, 2023 12:03 am

Re shielded cables: in principle they should fare better than unshielded. But the devil is in the details: if the shield is connected to ground on both ends and the two connected devices don't share the same ground potential (because one or both are not properly grounded, or the grounding point is not the same ... quite often the case if the connection runs between two buildings), then a ground loop can form ... and some (serious) stray currents can flow over that shield. At the very least this causes excessive corrosion of any metallic parts. But it can also disturb poorly designed electronic devices which provide a "grounded" RJ45 socket but fail to ground it properly. (And an electric boiler is something that can become a source of stray currents if the heating element gets slightly damaged.)
The same is usually not a problem with UTP cable ... data pins are isolated, receivers can handle ten or twenty volts. But with connected shield, a few volts are enough to drive some serious current.
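
To illustrate the "few volts, serious current" point with Ohm's law (I = V / R): a tiny sketch in which both the ground potential difference and the loop resistance are made-up round numbers, not measured values for any particular cable or installation:

```python
def shield_current(potential_difference_v: float, loop_resistance_ohm: float) -> float:
    """Current through the shield for a given ground potential difference (I = V / R)."""
    return potential_difference_v / loop_resistance_ohm

# Assumed values for illustration only.
ground_offset_v = 2.0      # difference between the two ground potentials
loop_resistance_ohm = 0.5  # shield plus connectors plus ground return path

print(shield_current(ground_offset_v, loop_resistance_ohm), "A")
```

Because the shield is a low-resistance conductor, even a small offset between the two grounds can push amperes through it in this model.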

There are a few solutions. One is the one mentioned by the support engineer (use unshielded TP cables). The better one is to use shielded TP but only grounded on one end. The third one is to modify one of connected devices (boiler comms module or switch/router) to break ground connection of RJ45 jack. Etc.
