Broken DNS

I set up my main router as a DNS server so that I could add static entries (with an internal TLD, call it “.WXYZ”) for all my subscribers. Other than that, it just passes along DNS requests to my provider’s DNS servers.

Anyone using this DNS server encounters sporadic failures, where the browser claims it can’t find certain hosts even though they exist.

Originally, I thought the problem was that the default maximum UDP packet size was too small, so I increased it. For a while, I thought the problem had gone away, but it hadn’t.

Today, I encountered the same error trying to invoke the “all account activity” report at PayPal. I was told that the host “history.paypal.com” could not be found. I changed my PC’s DNS to my provider’s DNS and the problem immediately went away. Then I logged into the main router and found this entry in the DNS cache:

    Name                            Type     Data           TTL
N   history.paypal.com.22.33.44.55  unknown  0.0.0.0        23:54:17

“22.33.44.55” is my provider’s first DNS server. I can’t find any documentation to tell me what the “N” means in the first column, but I assume it’s not good. No other entry in the cache has anything in that column. (This is in Winbox, not Terminal.)
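
If you export the cache listing to a text file, a quick way to find other entries of this shape is to check whether a cached name ends in something that parses as an IPv4 address. Below is a minimal Python sketch of that check. The function name split_ip_suffix is made up for illustration, and nothing in it is RouterOS-specific; it only looks at the name string.

```python
import ipaddress


def split_ip_suffix(name):
    """Return (hostname, ip) if the name ends in a dotted IPv4 address, else None."""
    labels = name.rstrip(".").split(".")
    # Need at least one hostname label plus the four address octets.
    if len(labels) < 5:
        return None
    candidate = ".".join(labels[-4:])
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        # The last four labels are not a valid IPv4 address.
        return None
    return ".".join(labels[:-4]), candidate
```

For the entry above, split_ip_suffix("history.paypal.com.22.33.44.55") separates the real hostname from the appended server address, while an ordinary name such as "history.paypal.com" yields None.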

I can’t see how this could be caused by anything I did in my DNS. My settings are:

                servers: [provider's first DNS],[provider's second DNS],[common public DNS]
  allow-remote-requests: yes
    max-udp-packet-size: 8192
             cache-size: 2048KiB
          cache-max-ttl: 1w
             cache-used: 1509KiB

I have about 100 static entries, all of which are simple strings (no regexps), and all of which end in “.WXYZ”. I have the following firewall rule:

33   ;;; DNS
     chain=input action=accept protocol=udp in-interface=!1-WAN dst-port=53

Other than that, I’m doing nothing strange or complicated.

I also can’t see how this could be caused by a bug in my provider’s DNS, since manually setting my PC’s DNS to my provider’s DNS instead of my own gives me proper results right off the bat.

What does the N mean, and how might I go about tracking this problem down?
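
One approach to narrowing this down is to send the same A-record query directly to the router and to the provider’s DNS server and compare the response codes when the failure occurs. The Python sketch below does this with a hand-built UDP query so that it does not depend on the PC’s configured resolver. It is only a rough diagnostic under stated assumptions: the function names are invented for illustration, the server addresses you pass in are whatever your own router and provider use, and it reads only the response header (it does not decode the answers).

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_rcode(response):
    """Return (rcode, answer_count) taken from the 12-byte DNS header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def ask(server, hostname, timeout=3.0):
    """Send one UDP query to server on port 53 and return (rcode, answer_count)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        response, _ = sock.recvfrom(4096)
    return parse_rcode(response)


def compare(servers, hostname):
    """Query each server in turn and print what it said (or how it failed)."""
    for server in servers:
        try:
            print(server, ask(server, hostname))
        except OSError as exc:  # includes timeouts
            print(server, "failed:", exc)
```

Calling compare() with the router’s LAN address and the provider’s DNS address, for the hostname that is failing, shows whether the two disagree. An rcode of 0 is NOERROR and 3 is NXDOMAIN; if the router reports 3 or zero answers while the provider returns answers, the bad result is coming from the router’s cache rather than from upstream.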

Maybe not the exact same problem, but I think the current DNS caching implementation needs some attention. Performance is really poor. For example, an HTTP request from a LAN host to a target whose DNS record is already cached by a 450G typically takes 5 seconds.

@macsrwe
Don’t mix DNS servers from different providers.
Use either your provider’s servers or public servers, not both.
OpenDNS is a good choice.
Also set the ‘cache-max-ttl’ parameter to a lower value, e.g. 12h.

HTH,

The RouterOS 5.18 changelog mentioned changes to DNS. Their bearing on this problem was not crystal clear, but I think they must have been relevant, since I haven’t seen it crop up since then.

I will update this to say that I have seen the problem since, but it is very, very rare now. All my subscribers (and I) have been using my internal DNS now for over six months, with no problems reported by them and only one or two instances of failure seen in my own usage.

I find it interesting that no one has come forward to explain what “N” means in that first column. It certainly does not appear anywhere in the wiki.

N - negative, i.e. a negative cache entry (a cached failed lookup).
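
To illustrate why a negative entry with a long TTL matters, here is a toy Python cache that stores "not found" results alongside normal ones. This is only a sketch of the general idea of negative caching, not how RouterOS implements it; the class and method names are invented for the example.

```python
import time


class DnsCache:
    """Toy cache that remembers both successful and failed lookups until they expire."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def store(self, name, address, ttl):
        # address=None marks a negative entry (the lookup failed).
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def lookup(self, name):
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return ("miss", None)
        address, expires = entry
        if self._clock() >= expires:
            # Expired entries are dropped, so the next query goes upstream again.
            del self._entries[key]
            return ("miss", None)
        if address is None:
            return ("negative", None)
        return ("hit", address)
```

Once a failed lookup is stored this way, every client asking for that name gets the failure straight from the cache until the TTL runs out. With a TTL like the 23:54:17 shown in the entry above, a single bad result could keep the host unreachable for most of a day.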

HTH,