DNS problem - with Kasa smart plugs

I have a small network with an RB750Gr2 as the main router and several RB951G-2HnD’s as APs controlled by Capsman on the 750.
All are running 6.47.8

Recently I added a couple of TP-link Kasa KP105 smart switches to the network but whilst they configure and connect fine, they are reverting to a “local only” state and not playing nicely with google home. I can control them fine on the app on the same lan.

While looking into it I did a few packet captures and noticed that when the router is configured to simply pass DNS requests (to google in this case but the same happens with opendns) after about 5 minutes I see requests from the Kasa device but no reply.
When I configure the RB750 to proxy the requests all DNS requests seem to work fine.
I don’t see this behaviour or have any problems with other devices on the network, including a sonoff device that does pretty much the same thing using different servers

I have two Kasa devices, connecting to different AP’s and both are doing the same thing.
They receive their IP address and DNS info from RB750 DHCP server

I have opened an issue with TP-link but I rather suspect the DNS may not be their problem but something on the Mikrotik side.
(I checked the format of the outgoing DNS packets and it seems fine: Wireshark decodes them perfectly, so I don’t think there is an error in the request.)
The DNS request arrives on the wireless interface and on the bridge in the router but seems to go no further.

Anyone come across a similar problem?

Good day, coding machine,
One possibility
MT does something unusual with DNS queries: if a name is not in the proper format, it rewrites it to the proper format.
This can break connectivity, because a device that sends its initial query with a domain name of Kasa
will get a reply from the MT DNS resolver with a domain name of kasa.

Technically the software coders for the plugs were incorrect in using an upper-case letter, so MT corrected it.
The problem is that the plug itself now says "oh, I don’t recognize that domain, I will not accept the reply from MT", and thus reverts to local.
Or something like that.

I suggest that in the DHCP server network the plugs are on, where it says gateway=192.168.x.1 and dns-server=192.168.x.1, you change it to dns-server=1.1.1.1,
for example.
Then it should work if the above is the issue, because other DNS resolvers simply return the same domain name to the originator unchanged.
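
To make the case-sensitivity idea above concrete, here is a minimal sketch in Python. It only illustrates the hypothesis: DNS names are meant to be compared case-insensitively (RFC 1035 / RFC 4343), so a client that compares the name in the reply byte for byte against what it sent could reject a reply whose case differs. Whether RouterOS rewrites the case, or whether the Kasa firmware compares strictly, is not confirmed anywhere in this thread; the names below are made up.

```python
def same_dns_name(a: str, b: str) -> bool:
    """Compare two DNS names ASCII case-insensitively, ignoring a trailing dot."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()


def strict_match(a: str, b: str) -> bool:
    """What a (hypothetical) overly strict client might do: exact comparison."""
    return a == b


# Hypothetical query and reply names, for illustration only.
query_name = "Kasa.example.com"
reply_name = "kasa.example.com"

print("RFC-style comparison:", same_dns_name(query_name, reply_name))
print("strict comparison:   ", strict_match(query_name, reply_name))
```

If the two comparisons disagree for a real query/reply pair in a capture, that would support the hypothesis; if the names are identical in both directions, it can be ruled out.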

Thanks anav,

I had a look and the Kasa device DNS request has no caps in it
the failing one happens to be n-devs.tplinkcloud.com

approx. 432 seconds earlier, exactly the same request was resolved correctly

Screenshot 2020-12-09 125145.png
Strangely, it is the requests that should be passed through without MT involvement that are not working… when the MT is the proxy it works.
I can’t see any case change in either case.

Do you have any firewall rules that restrict traffic for these devices and DNS traffic?

I have over 20 tp-link kasa devices (103/105) that work fine - but I don’t restrict the devices. If my DNS servers are down, then the plugs will be in “local” only mode.

Also look at this post - the kasa devices don’t like certain subnets:

http://forum.mikrotik.com/t/tp-link-smart-plug-minis-not-keeping-connection-to-tp-link-cloud/142504/1

Stick with 192.168.x.x subnets.

Thanks Biomesh
That’s great info… And glad to know you have them running well.

I have no firewall rules that block anything and have no DNS issues with any other devices. (That I know of :wink:)

The network is in 192.168.16.x
(The fact that they don’t play nicely with 10.x.x.x doesn’t fill me with confidence)

The local mode was what prompted me to look into DNS as that sort of made sense as a possible culprit.

The strange thing is that they seem to connect occasionally and then lose connection again..(I occasionally see the devices from a 4G connection on my phone). Which I assume means that some of the dns requests eventually get through… I’ll have to leave packet capture on for longer to test that.

What firmware are you running?

I am running the latest stable on all of my devices (routers/switches/APs) 6.47.8.

I have been using these devices for a long time, so I doubt it is a firmware issue on the routeros side. If you can get a lan trace of one of the kasa devices of about 10-15 minutes it should give you a good idea. You can post it here if you want me or someone else to look at it too.

well, I ran a capture on the wifi interface closest to one of the KP105s using its mac address to pick up everything from boot time
this is with googles dns servers set via DHCP for most devices in the network

the Kasa is definitely getting the first few dns requests back correctly and then appears to be settling into a period of making groups of dns requests that get no reply

in the attached pcap for example there is a gap of nearly 17 minutes between packet 101 (4.6s) and packet 379 (1045s).. and all the 178 packets between are dns requests.

192.168.16.50 is the Kasa KP105 I’m interested in
192.168.16.52 is the other KP105 exhibiting the same behaviour
192.168.16.254 is the router
192.168.16.14 is a google home mini

i turned the kasa switch (.50) off and on first by the kasa app and then via google home app toward the end of the capture

am i missing something obvious?
every other DNS request on the lan seems to be being responded to quickly and correctly.
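
For anyone wanting to do the same check without scrolling through the capture by hand, here is a rough sketch in Python of pairing DNS queries with replies by transaction ID. It assumes the packets have already been exported from the capture into simple records; the `DnsPacket` record, the sample timestamps and the transaction IDs are invented for illustration and are not taken from the attached pcap. Only the header parsing (transaction ID in the first two bytes, QR bit as the top bit of the flags field) comes from the DNS wire format.

```python
import struct
from typing import NamedTuple


class DnsPacket(NamedTuple):
    time: float        # seconds since start of capture
    src: str           # source IP
    dst: str           # destination IP
    txid: int          # DNS transaction ID
    is_response: bool  # QR bit from the DNS header


def parse_dns_header(payload: bytes):
    """Return (transaction id, is_response) from the first 4 bytes of a DNS message."""
    txid, flags = struct.unpack("!HH", payload[:4])
    return txid, bool(flags & 0x8000)  # QR bit: 0 = query, 1 = response


def unanswered_queries(packets):
    """Return every query for which no response with the same ID came back."""
    # A response travels server -> client, so store it as (client, server, id).
    answered = {(p.dst, p.src, p.txid) for p in packets if p.is_response}
    return [
        p for p in packets
        if not p.is_response and (p.src, p.dst, p.txid) not in answered
    ]


# Made-up sample: one answered query, then a query retransmitted with no reply.
sample = [
    DnsPacket(1.0, "192.168.16.50", "8.8.8.8", 0x1A2B, False),
    DnsPacket(1.1, "8.8.8.8", "192.168.16.50", 0x1A2B, True),
    DnsPacket(300.0, "192.168.16.50", "8.8.8.8", 0x3C4D, False),
    DnsPacket(301.0, "192.168.16.50", "8.8.8.8", 0x3C4D, False),
]

for p in unanswered_queries(sample):
    print(f"{p.time:8.1f}s  {p.src} -> {p.dst}  id=0x{p.txid:04x}  no reply")
```

Running the same logic per client IP would show quickly whether only .50 and .52 have unanswered queries, as the capture seems to suggest.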

I have opened a case with TP-link… but they are far from quick :wink: (not that I expect amazing technical service on a £20 device)
kasa full wifi mac15min.zip (36.4 KB)

  • Packet 2: I see a 10 minute lease time in the trace. Most devices will not operate well with such a low lease time. I suggest to make it at least a few hours or a day.
  • Packets 43 & 48: The device cannot ping 8.8.8.8. This could be due to your firewall settings or it could be your ISP. I can do this on my network (via Comcast)

I took a trace from one of my plugs - in this case a HS105, and I don’t see any attempt to ping the DNS server or other devices. This could be due to different hardware / firmware levels on the plugs.

I would increase the lease time as 10 minutes is only really useful in large wifi environments where devices will be joining / leaving the network often. I set my lease time to 10 minutes on my subnet and it did not prompt a change in the plugs’ behavior.

I would then investigate why the pings are failing.
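
As a side note on the lease time, here is a small Python sketch of when a DHCP client would normally try to renew. It assumes the RFC 2131 defaults (renew at 50% of the lease, rebind at 87.5%); a server can override these with options 58 and 59, and nothing in this thread shows what the Kasa firmware actually does.

```python
def dhcp_timers(lease_seconds: float) -> dict:
    """Default renewal (T1) and rebinding (T2) times per RFC 2131, in seconds."""
    return {
        "T1_renew": lease_seconds * 0.5,     # client unicasts a renew request
        "T2_rebind": lease_seconds * 0.875,  # client broadcasts if renew failed
    }


for label, lease in (("10 minutes", 600), ("1 hour", 3600), ("1 day", 86400)):
    timers = dhcp_timers(lease)
    print(f"{label:>10}: renew after {timers['T1_renew']:.0f}s, "
          f"rebind after {timers['T2_rebind']:.0f}s")
```

With a 10 minute lease the renew point falls at 300 seconds, which is in the same range as the "about 5 minutes" mentioned in the first post. That may be nothing more than a coincidence, but it is easy to check by lengthening the lease and seeing whether the timing of the failures moves.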

Thanks for the help biomesh :slight_smile:

I had not altered the lease time.. it was set to whatever MT had as default; in the manual that appears to be 10min
https://wiki.mikrotik.com/wiki/Manual:IP/DHCP_Server

I’ll try altering to an hour and see if it has an impact

Pings not working is also strange… there are no filters on icmp and i can ping 8.8.8.8 just fine from a wired desktop and wifi connected phone on the same network

a few months ago i had trouble with a Siemens VoIP base station not reaching the VoIP server. When i changed it from DHCP to statically assigned IP/Gateway/DNS it started working well. It had all worked fine up until 6.47.1 and reverting back to 6.46.6 fixed the issue immediately with no other changes… I did wonder about what changes had been made to DHCP and/or DNS and if I was just in some corner case where there was an “adverse event”.. The Siemens box has been rock solid since… and no other network changes

I may give 6.46.6 a try as a shot in the dark if i don’t make progress

it may be worth noting that when DNS requests are not answered I see them arrive on the wireless interface but not in the bridge. Whenever they are successful i see the packet hit the bridge as well. Again, strange that identical packets are behaving differently.

You might want to use the tplink tools here to see if the plug is reporting anything odd:

https://github.com/softScheck/tplink-smartplug

thanks. the python script seems to work locally and I can switch off/on and get info but I can see nothing yet that indicates what the problem is
… glad there are some hackers out there taking all these things apart.
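
For reference, here is a sketch in Python of the local encoding that the softScheck script is understood to use: an XOR "autokey" scheme with a starting key of 171, where each output byte becomes the key for the next one, and a 4-byte big-endian length prefix on messages sent over TCP. This is written from memory of that project and should be checked against the repository; whether the KP105 on firmware 1.0.5 behaves identically is an assumption. The sketch deliberately does no network I/O.

```python
import struct

INITIAL_KEY = 171  # starting key, as recalled from the softScheck script


def encrypt(plaintext: str) -> bytes:
    """Encode a JSON command: XOR autokey, prefixed with a 4-byte length."""
    key = INITIAL_KEY
    body = bytearray()
    for byte in plaintext.encode("utf-8"):
        key ^= byte          # the cipher byte is also the next key
        body.append(key)
    return struct.pack(">I", len(body)) + bytes(body)


def decrypt(payload: bytes) -> str:
    """Decode a payload whose 4-byte length prefix has already been removed."""
    key = INITIAL_KEY
    body = bytearray()
    for byte in payload:
        body.append(key ^ byte)
        key = byte           # the previous cipher byte is the next key
    return body.decode("utf-8")


command = '{"system":{"get_sysinfo":{}}}'
wire = encrypt(command)
print(len(wire), "bytes on the wire, including the length prefix")
print(decrypt(wire[4:]))
```

The round trip at the end only shows that the two functions are inverses of each other; it says nothing about why the plugs drop their cloud connection.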

still no joy.. rebooted everything, disabled and re-enabled fasttrack and they still eventually go to local only

I noticed that the kasa app seems to not indicate a problem until i make an attempt to turn on/off a device, at which point it appears to make a dns request for which it does not get a reply and then the app immediately indicates local only and there is suddenly a flurry of unanswered DNS requests
On google home i lose status info and it indicates the device as offline/not responding but commands to turn the devices off and on still work

In that state, when i go to mobile data I cannot control the devices in the kasa app but can in google home (although home still says offline and not responding… so commands get there but nothing comes back?)

in the meantime i have reverted to using the dnsproxy making all devices go via the MT for DNS… that seems to fix DNS responses to the plugs, however the devices still go offline as before. I am guessing that it may not only be the DNS packets that are being dropped but by using the proxy I am reducing one of the likely problems.

You mentioned capsman - are you using local forwarding or capsman forwarding?

I am also guessing that you updated the firmware on the plugs as well. (it normally does this when you first set them up)

no local forwarding.. i wanted all the wireless traffic to go via the router as there are few devices and with APs running 20 MHz 2.4 GHz channels connected via gigE the main router running capsman was not going to be the roadblock and i prefer to see what is going on just for times like this :wink:

yes, firmware on plugs is latest 1.0.5

If you have client to client forwarding enabled in capsman then I am sure this is a tplink issue. I tried to duplicate everything you had but mine worked with no pings from the device at all. The firmware doesn’t seem to be common between their products so it could be a defect on their side.

many thanks for all the help biomesh

for clarity… i am not using local forwarding but it should not matter in this case as the Kasa device I am testing is generally not on the same AP as the things controlling it so there would not be any locally forwarded packets anyway … unless I’m missing something

it feels more likely to be a tp-link thing given the KP105 is a newish product (1yr) vs basic dns/dhcp/routing on a mature platform like routeros

… that said however, I don’t understand why i would be seeing DNS requests come in a capsman interface and apparently be dropped (unless for some reason the packet capture is not working properly.. but that seems very unlikely… ). I’m open to it being a config thing naturally but it’s only intermittent so that feels strange.. and almost everything in my firewall is defconf rules, while capsman is doing little more than a simple config and bridging packets

for the moment the improved DNS using MT proxy makes it better than it was but still losing connection

I don’t think it is a capsman issue, but I just wanted to mention that option in case it helped. I have client-to-client forwarding enabled. Per the wiki:

client-to-client-forwarding – controls if client-to-client forwarding between wireless clients connected to interface should be allowed, in local forwarding mode this function is performed by CAP, otherwise it is performed by CAPsMAN.

I use local forwarding as well.

You seem to have networking issues, either local or on the ISP side; I suspect the ISP side more.

I see many DNS requests and DNS retransmissions, but nothing coming back from 8.8.8.8 or 8.8.4.4.

I suspect the reason it behaves better when using the router as DNS is that the router will cache the address for a while.

Suggest you capture packets on the 750 to see if the 750 receives the DNS requests from Kasa device, as well as replies from DNS servers. If all ok at 750, work backwards
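
To illustrate the caching point above, here is a toy TTL cache in Python. It is not how RouterOS implements its DNS cache; it only shows why a caching resolver can keep answering clients while upstream replies are being lost: once a name is cached, later queries for it never leave the router until the entry expires. The class name, host name and address are all made up.

```python
import time


class TtlCache:
    """Toy DNS-style cache: entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be demonstrated
        self._entries = {}

    def put(self, name: str, address: str, ttl: float) -> None:
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def get(self, name: str):
        """Return the cached address, or None if missing or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return address


# Simulated clock so the example does not need to sleep.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("cloud.example.com", "203.0.113.10", ttl=300)

now[0] = 120.0
print("at 120s:", cache.get("cloud.example.com"))  # still served from cache
now[0] = 301.0
print("at 301s:", cache.get("cloud.example.com"))  # expired, needs upstream
```

If upstream replies are lost only some of the time, a cache like this would hide most of the failures, which fits the observation that DNS looks fine through the proxy while the plugs still go offline.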

thanks CZfan

there are indeed lots of unanswered DNS requests.
however, the only devices this seems to be happening with are the two Kasa ones.

It also happens regardless of the DNS settings… i.e. they don’t get a reply from my ISP’s dns servers, or google, or opendns so it seems unlikely that any of those providers are not replying to a DNS request (unless my ISP is somehow losing or blocking DNS requests but again, that would affect all devices)
I’ve not had any DNS resolution issues with other devices and have run both Namebench and GRC

The idea that having a locally cached copy is what improved things is a good one (and it looks like the MT default cache time is 7 days) but the fact that it is only these devices is confusing and points to some other problem.
When i capture packets from all interfaces and view them in winbox I can see the DNS request come into the wireless interface then immediately go to the bridge and then its passed to my ISP. In the case of unresolved requests the DNS request packet does not get passed to the mikrotik bridge (or if it is, the packet sniffer is not capturing it).

I’ll keep looking and see if TP-link comes up with anything

Let me rephrase what you wrote: the DNS query packets that get responded are seen at both the wireless interface and the bridge, but those that don’t get responded are only seen at the wireless interface, not at the bridge?

If this is the case, what is the destination MAC address of those that reach the bridge, and what is the destination MAC address of those that don’t?
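
One way to answer that question from an exported sniffer log is sketched below in Python. It splits the DNS queries seen on the wireless interface into those that also showed up on the bridge and those that did not, then counts the destination MAC addresses in each group. The record layout, the interface names "cap1" and "bridge", and the MAC addresses are all hypothetical; real data would need to be exported from the packet sniffer into this shape first.

```python
from collections import Counter
from typing import NamedTuple


class Sniffed(NamedTuple):
    interface: str  # interface the sniffer saw the packet on
    src_ip: str     # client that sent the DNS query
    txid: int       # DNS transaction ID, used to pair the two sightings
    dst_mac: str    # destination MAC address in the Ethernet header


def dst_macs_by_outcome(records, wireless="cap1", bridge="bridge"):
    """Tally destination MACs of wireless-side queries, split by bridge arrival."""
    reached = {(r.src_ip, r.txid) for r in records if r.interface == bridge}
    tally = {"reached_bridge": Counter(), "wireless_only": Counter()}
    for r in records:
        if r.interface != wireless:
            continue
        seen_on_bridge = (r.src_ip, r.txid) in reached
        outcome = "reached_bridge" if seen_on_bridge else "wireless_only"
        tally[outcome][r.dst_mac] += 1
    return tally


# Invented records: the second query never appears on the bridge.
records = [
    Sniffed("cap1", "192.168.16.50", 0x0001, "AA:BB:CC:00:00:01"),
    Sniffed("bridge", "192.168.16.50", 0x0001, "AA:BB:CC:00:00:01"),
    Sniffed("cap1", "192.168.16.50", 0x0002, "AA:BB:CC:00:00:99"),
]

for outcome, macs in dst_macs_by_outcome(records).items():
    print(outcome, dict(macs))
```

If the "wireless_only" group turns out to have a different destination MAC from the router's, the packets were never addressed to the router at layer 2, which would explain why the bridge never sees them.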