DNS Cache - Broken (6.38.3 + RC) [SOLVED!]

EDIT: So others may benefit, the answer to the problem has been placed at the end of this post.

Playing with 6.39rc41 on a RouterBoard 951G 2HnD, I ran into an issue with DNS, where the router would resolve names correctly but the client would time out. I stepped back to 6.38.3, where the problem persisted.

I am able to ping the interface and query REMOTE DNS servers. When configured to use alternate DNS servers, I am able to browse the web.

Steps taken to troubleshoot:

  1. Turn OFF Windows Firewall
  2. Verify Allow Remote Requests is checked.
  3. Rebuild bridged interface.
  4. Rebuild DHCP Server.
  5. Rebuild DHCP Pool.
  6. Vanilla config to dumb the issue down.
  7. Downgrade to stable.
  8. Repeat 2-5.
  9. Hard reset RouterBoard.
  10. Rebuild entire config.
  11. Build counter for DNS packets.
  12. Post to the forums.

The simplified config creates a bridge, places ether2 in that bridge, provides a DHCP address, pulls an IP from the lte interface, then nats ANY packet not intended for the local network.


The config:

/interface lte
set [ find ] mac-address=36:4B:50:B7:EF:DA name=lte1
/interface bridge
add name=LOCAL-BRIDGE
add name=REMOTE-BRIDGE
/ip pool
add name=LOCAL-POOL ranges=192.168.35.101-192.168.35.199
/ip dhcp-server
add add-arp=yes address-pool=LOCAL-POOL authoritative=yes disabled=no interface=LOCAL-BRIDGE lease-time=1h name=LOCAL-DHCP
/interface bridge port
add bridge=LOCAL-BRIDGE interface=ether2
add bridge=REMOTE-BRIDGE interface=ether3
add bridge=REMOTE-BRIDGE interface=ether4
add bridge=REMOTE-BRIDGE interface=ether5
/interface wireless cap
# 
set discovery-interfaces=ether1 enabled=yes interfaces=wlan1
/ip address
add address=192.168.35.1/24 interface=LOCAL-BRIDGE network=192.168.35.0
/ip dhcp-client
add dhcp-options=hostname,clientid disabled=no interface=ether1
add default-route-distance=4 dhcp-options=hostname,clientid disabled=no interface=lte1
/ip dhcp-server network
add address=192.168.35.0/24 dns-server=192.168.35.1 gateway=192.168.35.1 netmask=24
/ip dns
set allow-remote-requests=yes servers=8.8.8.8,8.8.4.4
/ip dns static
add address=192.168.35.1 name=inside.local
/ip firewall filter
add action=passthrough chain=input dst-port=53 in-interface=LOCAL-BRIDGE protocol=udp
add action=accept chain=input in-interface=LOCAL-BRIDGE
/ip firewall mangle
add action=mark-packet chain=prerouting in-interface=LOCAL-BRIDGE new-packet-mark=LOCAL-ROUTE passthrough=no
/ip firewall nat
add action=masquerade chain=srcnat dst-address=!192.168.35.0/24 packet-mark=LOCAL-ROUTE
/system clock
set time-zone-name=America/New_York

Ping, showing connection and name resolution on the router.

[admin@MikroTik] > ping www.google.com count=2
  SEQ HOST                                     SIZE TTL TIME  STATUS                                                                                                 
    0 216.58.217.68                              56  53 31ms 
    1 216.58.217.68                              56  53 77ms 
    sent=2 received=2 packet-loss=0% min-rtt=31ms avg-rtt=54ms max-rtt=77ms 

[admin@MikroTik] >

DNS attempts from the client. Queries to 192.168.35.1 time out, while switching to Google DNS works, proving NAT is functioning.

C:\Users\katamba-host>nslookup
DNS request timed out.
    timeout was 2 seconds.
Default Server:  UnKnown
Address:  192.168.35.1

> google.com
Server:  UnKnown
Address:  192.168.35.1

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
*** Request to UnKnown timed-out
> server 8.8.8.8
DNS request timed out.
    timeout was 2 seconds.
Default Server:  [8.8.8.8]
Address:  8.8.8.8

> google.com
Server:  [8.8.8.8]
Address:  8.8.8.8

Non-authoritative answer:
Name:    google.com
Addresses:  2607:f8b0:4004:80b::200e
          216.58.217.78

> server 8.8.4.4
Default Server:  google-public-dns-b.google.com
Address:  8.8.4.4

> google.com
Server:  google-public-dns-b.google.com
Address:  8.8.4.4

Non-authoritative answer:
Name:    google.com
Addresses:  2607:f8b0:4004:80b::200e
          216.58.217.78

Thanks for taking a look.

Answer:

The caching nameserver doesn’t play well with other caching nameservers. In this case, the LTE interface was providing a local address and cached DNS services. Pointing the router to Google and unchecking the box to use peer DNS resolved the issue.
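For anyone who prefers the terminal to WinBox, the fix described above might look roughly like this. It is a sketch only, assuming the DNS servers are being learned through the DHCP client on lte1 as in the configs posted in this thread; adjust the interface name to your own setup.

```
# Stop the DHCP client on lte1 from importing the DNS server offered by the LTE side
/ip dhcp-client set [find interface=lte1] use-peer-dns=no
# Point the router's resolver at Google and keep answering LAN clients
/ip dns set servers=8.8.8.8,8.8.4.4 allow-remote-requests=yes
# Drop anything cached from the old upstream
/ip dns cache flush
# Check the result: dynamic-servers should no longer list the LTE-supplied address
/ip dns print
```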

Seems to be OK on my RB2011 6.39rc41 (10.0.1.1). Tested from Mac OS X 10.8.4. First time resolving katamba.com.

; <<>> DiG 9.8.3-P1 <<>> katamba.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4181
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 13, ADDITIONAL: 0

;; QUESTION SECTION:
;katamba.com.			IN	A

;; ANSWER SECTION:
katamba.com.		3600	IN	A	46.101.152.103

;; AUTHORITY SECTION:
.			261616	IN	NS	j.root-servers.net.
.			261616	IN	NS	i.root-servers.net.
.			261616	IN	NS	l.root-servers.net.
.			261616	IN	NS	a.root-servers.net.
.			261616	IN	NS	d.root-servers.net.
.			261616	IN	NS	k.root-servers.net.
.			261616	IN	NS	g.root-servers.net.
.			261616	IN	NS	c.root-servers.net.
.			261616	IN	NS	e.root-servers.net.
.			261616	IN	NS	b.root-servers.net.
.			261616	IN	NS	f.root-servers.net.
.			261616	IN	NS	m.root-servers.net.
.			261616	IN	NS	h.root-servers.net.

;; Query time: 317 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Tue Mar  7 01:25:25 2017
;; MSG SIZE  rcvd: 256

Looks like your firewall might be the issue. Try disabling the firewall rules and test again.

First, thanks soonwai for the sanity check on the config.

Going with the idea that it could be firewall rules, I disabled the rules.

[admin@MikroTik] > /ip firewall filter print
Flags: X - disabled, I - invalid, D - dynamic 
 0 XI  chain=input action=passthrough protocol=udp dst-address=192.168.35.1 dst-port=53 log=no log-prefix="" 

 1 XI  chain=input action=accept in-interface=LOCAL-BRIDGE log=no log-prefix="" 
[admin@MikroTik] >

That didn’t work, even following a reboot of the router and desktop.

Turning the packet-counting rule back on (passthrough on UDP 53) to get metrics, I see the counter generally isn’t climbing.

It’s strange, but the RouterBoard isn’t seeing the request. Wireshark confirms that requests are leaving.

Leaving Wireshark running for an extended period (15 minutes), there is an even stranger result. A very few DNS requests, i.e. Microsoft background stuff, do end up getting answered.

[admin@MikroTik] > /ip firewall filter print stats from=0
Flags: X - disabled, I - invalid, D - dynamic 
 #    CHAIN                                                                                ACTION                            BYTES         PACKETS
 0    input                                                                                passthrough                         523               7
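A counting setup along these lines can make the numbers above easier to read. This is only a sketch, reusing the LOCAL-BRIDGE name from the first config; the comments are made up for illustration. Passthrough rules only count and never accept or drop, and a disabled rule (flag X) does not count at all.

```
/ip firewall filter
# Count DNS queries reaching the router itself, UDP and TCP separately
add action=passthrough chain=input protocol=udp dst-port=53 in-interface=LOCAL-BRIDGE comment="count DNS/UDP"
add action=passthrough chain=input protocol=tcp dst-port=53 in-interface=LOCAL-BRIDGE comment="count DNS/TCP"
# Zero the counters, run nslookup on the client, then read them back
reset-counters-all
print stats
```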

I am posting from that test system, simply pointing to another DNS server.

Could this be an issue with the mangle and NAT statements? Have you tried disabling your mangle rule that marks traffic, and changing your NAT rule to a basic rule like

add action=masquerade chain=srcnat comment="Masq WAN" dst-address=0.0.0.0/0 out-interface=wan src-address=192.168.35.0/24

Change wan to the out-interface you connect to the internet on, and then test again.

Thanks for the suggestion, I tried…still no luck. I took the suggestion to extreme levels after testing the specific masquerade rule.

In the new config, I’ve removed all bridges, putting the IP and DHCP services on ETH2, which I wire directly into. There are no longer any tagging rules, or anything to distract the router from answering DNS.

  1. I get a DHCP address.
  2. I can ping the gateway by IP, 192.168.35.1
  3. I can ping Google DNS by IP, 8.8.8.8
  4. I cannot look anything up by name utilizing 192.168.35.1
  5. I can look anything up by name utilizing 8.8.8.8

The config in the last test was:

/interface lte
set [ find ] mac-address=36:4B:50:B7:EF:DA name=lte1
/interface ethernet
set [ find default-name=ether1 ] disabled=yes
set [ find default-name=ether3 ] disabled=yes
set [ find default-name=ether4 ] disabled=yes
set [ find default-name=ether5 ] disabled=yes
/ip pool
add name=LOCAL-POOL ranges=192.168.35.101-192.168.35.199
/ip dhcp-server
add add-arp=yes address-pool=LOCAL-POOL authoritative=yes disabled=no interface=ether2 lease-time=1h name=LOCAL-DHCP
/interface wireless cap
# 
set discovery-interfaces=ether1 enabled=yes interfaces=wlan1
/ip address
add address=192.168.35.1/24 interface=ether2 network=192.168.35.0
/ip dhcp-client
add default-route-distance=4 dhcp-options=hostname,clientid disabled=no interface=lte1
/ip dhcp-server network
add address=192.168.35.0/24 dns-server=192.168.35.1 gateway=192.168.35.1 netmask=24
/ip dns
set allow-remote-requests=yes servers=8.8.8.8,8.8.4.4
/ip dns static
add address=192.168.35.1 name=inside.local
/ip firewall nat
add action=masquerade chain=srcnat dst-address=0.0.0.0/0 out-interface=lte1 src-address=192.168.35.0/24
/system clock
set time-zone-name=America/New_York

Additionally, I took the following steps on the host, then did another restart.

ipconfig /flushdns
ipconfig /registerdns
ipconfig /release
ipconfig /renew
NETSH winsock reset catalog
NETSH int ipv4 reset reset.log
NETSH int ipv6 reset reset.log

The results remain exactly the same.
edit: Formatting error.
edit: Additional steps.

Can you torch the port? Filter on protocol UDP and port 53 to see what is hitting it. Just confirm you are connected directly. I have tried to replicate your results but so far have failed. The only difference from your setup is that I am running the bugfix version.

Torching the interface shows the DNS packets arriving. Wireshark does not show them returning. I’ve also visually confirmed the cable.
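For reference, the torch check can be run from the terminal roughly as below. This is a sketch assuming the client is wired to ether2 as in the simplified config; the sniffer line is an optional extra that was not part of the test above, and parameter names may differ between RouterOS versions.

```
# Watch DNS traffic on the LAN port in real time
/tool torch interface=ether2 ip-protocol=udp port=53
# Optionally capture both directions to see whether replies ever leave the router
/tool sniffer quick interface=ether2 ip-protocol=udp port=53
```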

Have you set the domain suffix in your DHCP? In some places it is suggested to add the suffix for the local domain. I don’t see how this would affect things, but it may be worth looking at.

I didn’t have one, and added it just because. I agree with you here, I don’t see it mattering. In the end, it doesn’t. The same results.

I’m truly scratching my head here… I unplug the cable, then plug it into the CRS-226 sitting on the desk, which trunks back to a CCR-1036, and the desktop works flawlessly (with much more complicated configs). Something in this 951G just doesn’t like DNS. :)

Have you tried removing the static entry from DNS? Have you tested with another device, i.e. Linux, a switch, or any other operating system, just to narrow it down?

Removing the static entry didn’t work. I dropped the simplified config onto a CRS109, with the same result.

It does, however, look like the problem is limited to Windows. OSX worked fine. I’m almost willing at this point to call it a bug; the question is where the problem lies… I’d love to put a finger on it, but name resolution is working absolutely fine on the CCR.

The only other thing I can think of to try is adding a firewall rule allowing all traffic from LAN to LAN:

add action=accept chain=input comment="LAN Traffic" dst-address=192.168.35.0/24 src-address=192.168.35.0/24
add action=accept chain=output comment="LAN Traffic" dst-address=192.168.35.0/24 src-address=192.168.35.0/24

The only reason I suggest adding this is that you are not getting a response back from the router.
The other thing you may check is the settings under

/ip settings
export

Otherwise, as you say, it may be a bug.

Nope, no go. Someone else posted a similar problem in General. It looks like there is a bug. Thanks for the effort in trying to track this one down.

http://forum.mikrotik.com/t/dns-server-stop-working-from-time-to-time/106826/1

Set to resolved, and put the answer in the original post. Again, thanks everyone for taking the time to troubleshoot.

I suspected something like that but did not mention it because I did not see anything in your export.