split DNS setup problem

I have a site-to-site connection between two routers over wireguard.

Site A: router.lacinet address 192.168.14.254/24
Site B: router.kavicsnet address 192.168.18.254/24

Split-DNS is not working. Example:

[gandalf@router.lacinet] > /ping 192.168.18.254
  SEQ HOST                                     SIZE TTL TIME       STATUS
    0 192.168.18.254                             56  64 39ms412us
    1 192.168.18.254                             56  64 41ms493us
    2 192.168.18.254                             56  64 38ms510us
    sent=3 received=3 packet-loss=0% min-rtt=38ms510us avg-rtt=39ms805us max-rtt=41ms493us

[gandalf@router.lacinet] > /ping borika-pc.kavicsnet
invalid value for argument address:
    invalid value of mac-address, mac address required
    invalid value for argument ipv6-address
    while resolving ip-address: name does not exist
[gandalf@router.lacinet] > :put [/resolve borika-pc.kavicsnet]
failure: dns name does not exist
[gandalf@router.lacinet] > :put [/resolve borika-pc.kavicsnet server=192.168.18.254]
192.168.18.199
[gandalf@router.lacinet] >

So router.lacinet can see router.kavicsnet. DNS works when I specify the server directly. But it does not work when I do not specify the server.

Here is my split DNS setup:

[gandalf@router.lacinet] > /ip/dns/static/
[gandalf@router.lacinet] /ip/dns/static> print detail where type=FWD
Flags: D - dynamic; X - disabled
 0    regexp=".*\.visznet" type=FWD forward-to=192.168.5.254 ttl=1d

 1    ;;; visznet
      regexp=".*\.5\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.5.254 ttl=1d

 2    regexp=".*\.kavicsnet" type=FWD forward-to=192.168.18.254 ttl=1d

 3    ;;; kavicsbanya-base
      regexp=".*\.18\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.18.254 ttl=1d

It is even more disturbing that it works with other site-to-site networks. Example:

[gandalf@router.lacinet] /ip/dns/static> :put [/resolve sanyi-pc.visznet]
192.168.5.104

But with this particular network, it does not want to work. Since the direct request to the remote DNS server works, I think we can rule out any firewall or connection problem.

What is happening here?

IMO the problem is that MT’s DNS server doesn’t perform recursive lookups. In your case it would have to, because of the FWD record. Any other DNS client seeing this record would know to contact the next-hop DNS server, but MT doesn’t.

In short, the DNS server in ROS is very limited in functionality, and if its basic functions are not enough, you should install a proper DNS server (e.g. Pi-hole running on a Raspberry Pi) in your network.

If that is true, then why is it working for the other network (and the other FWD record)?

The error message mentioning a MAC address makes me think the problem might actually involve the routing config … an obscure one for sure. But without knowing the whole config (possibly of both routers) and the exact network layout, it’s a guessing game.

It cannot be a routing problem, because a direct DNS request succeeds. It also precludes any firewall config error.

[gandalf@router.lacinet] > :put [/resolve borika-pc.kavicsnet server=192.168.18.254]
192.168.18.199
[gandalf@router.lacinet] > :put [/resolve borika-pc.kavicsnet]
failure: dns name does not exist

There are many similar sites connected, and only this one is not working as it should:

[gandalf@router.lacinet] /ip/dns/static> /ip/dns/static/print detail where type=FWD
Flags: D - dynamic; X - disabled
 4    regexp=".*\.visznet" type=FWD forward-to=192.168.5.254 ttl=1d

 5    ;;; visznet
      regexp=".*\.5\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.5.254 ttl=1d

 6    regexp=".*\.kavicsnet" type=FWD forward-to=192.168.18.254 ttl=1d

 7    ;;; kavicsbanya-base
      regexp=".*\.18\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.18.254 ttl=1d

 8    regexp=".*\.sznet" type=FWD forward-to=192.168.13.254 ttl=1d

 9    ;;; sznet-base
      regexp=".*\.13\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.13.254 ttl=1d

10    regexp=".*\.eger.magnet" type=FWD forward-to=192.168.19.254 ttl=1d

11    ;;; base-eger.magnet
      regexp=".*\.19\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.19.254 ttl=1d

12    ;;; vlan-eger.magnet
      regexp=".*\.19\.10.\in-addr\.arpa" type=FWD forward-to=192.168.19.254 ttl=1d

40    regexp=".*\.miskolc.magnet" type=FWD forward-to=192.168.20.254 ttl=1d

41    ;;; base-miskolc.magnet
      regexp=".*\.20\.168\.192.\in-addr\.arpa" type=FWD forward-to=192.168.20.254 ttl=1d

42    ;;; vlan-miskolc.magnet
      regexp=".*\.20\.10.\in-addr\.arpa" type=FWD forward-to=192.168.20.254 ttl=1d

I can send the whole config but it is quite long.

The MAC address message comes from ping, and not resolve.

[gandalf@router.lacinet] /ip/dns/static> /ping borika-pc.kavicsnet
invalid value for argument address:
    invalid value of mac-address, mac address required
    invalid value for argument ipv6-address
    while resolving ip-address: name does not exist
[gandalf@router.lacinet] /ip/dns/static> :put [/resolve borika-pc.kavicsnet]
failure: dns name does not exist
[gandalf@router.lacinet] /ip/dns/static>

Does resolving borika-pc.kavicsnet work for clients connected to the problematic router’s LAN segment, with the router set as their DNS server? If yes, what does a Wireshark trace show: who performs the recursive queries, the client or the ROS DNS server?
Is it possible that the local router received a negative answer from the forwarding server for this particular host name and is still using the cached negative answer, which hasn’t expired yet? (It could be using the forwarder record’s TTL for that.)

See if this gives you some useful info:

/system logging add topics=dns

It does not work. Example:

╭─gandalf@laci-desktop nkp-dbeger-laci ~  
╰─$ host borika-pc.kavicsnet 192.168.14.254                                                                                             1 ↵
Using domain server:
Name: 192.168.14.254
Address: 192.168.14.254#53
Aliases: 

Host borika-pc.kavicsnet not found: 3(NXDOMAIN)
╭─gandalf@laci-desktop nkp-dbeger-laci ~  
╰─$ host borika-pc.kavicsnet 192.168.18.254                                                                                             1 ↵
Using domain server:
Name: 192.168.18.254
Address: 192.168.18.254#53
Aliases: 

borika-pc.kavicsnet has address 192.168.18.199



I think it is not possible, because the 192.168.18.254 router has borika-pc.kavicsnet added as a static DNS entry. Just to make sure, I have changed the ttl of all FWD records to 1 minute, but it has no effect.

Looks like it does not even try to forward the question:

08:38:59 dns,packet question: borika-pc.kavicsnet.:A:IN 
08:38:59 dns query from 10.14.10.105: #51485 borika-pc.kavicsnet. A 
08:38:59 dns done query: #51485 dns name does not exist 
08:38:59 dns,packet --- sending reply to 10.14.10.105:47309: 
08:38:59 dns,packet id:103b rd:1 tc:0 aa:0 qr:1 ra:1 QUERY 'name error' 
08:38:59 dns,packet question: borika-pc.kavicsnet.:A:IN 
08:39:00 dns,packet --- got query from 192.168.14.100:1742: 
08:39:00 dns,packet id:f76e rd:1 tc:0 aa:0 qr:0 ra:0 QUERY 'no error' 
08:39:00 dns,packet question: borika-pc.kavicsnet.:A:IN 
08:39:00 dns query from 192.168.14.100: #51486 borika-pc.kavicsnet. A 
08:39:00 dns done query: #51486 dns name does not exist 
08:39:00 dns,packet --- sending reply to 192.168.14.100:1742: 
08:39:00 dns,packet id:f76e rd:1 tc:0 aa:0 qr:1 ra:1 QUERY 'name error' 
08:39:00 dns,packet question: borika-pc.kavicsnet.:A:IN

Whereas a query for another network is forwarded to the forwarder:

08:41:22 dns,packet question: dbserver.visznet.:MX:IN 
08:41:22 dns query from 192.168.14.100: #51646 dbserver.visznet. MX 
08:41:22 dns,packet --- sending udp query to 192.168.5.254:53: 
08:41:22 dns,packet id:b807 rd:1 tc:0 aa:0 qr:0 ra:0 QUERY 'no error' 
08:41:22 dns,packet question: dbserver.visznet.:MX:IN 
08:41:23 dns,packet --- got answer from 192.168.5.254:53: 
08:41:23 dns,packet id:b807 rd:1 tc:0 aa:0 qr:1 ra:1 QUERY 'no error' 
08:41:23 dns,packet question: dbserver.visznet.:MX:IN 
08:41:23 dns done query: #51646 dns name exists, but no appropriate record 
08:41:23 dns,packet --- sending reply to 192.168.14.100:4263: 
08:41:23 dns,packet id:28ca rd:1 tc:0 aa:0 qr:1 ra:1 QUERY 'no error' 
08:41:23 dns,packet question: dbserver.visznet.:MX:IN
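The difference between the two log excerpts above can be checked mechanically. The following is a minimal sketch that groups log lines per query and reports whether a "sending udp query to" line appeared before the query finished. The line formats are taken from the excerpts above. Because the packet lines carry no query number, attributing them to the most recent query is an assumption that only holds for a lightly loaded log.

```python
import re

# Sample lines copied from the two log excerpts above.
LOG = """\
08:39:00 dns query from 192.168.14.100: #51486 borika-pc.kavicsnet. A
08:39:00 dns done query: #51486 dns name does not exist
08:41:22 dns query from 192.168.14.100: #51646 dbserver.visznet. MX
08:41:22 dns,packet --- sending udp query to 192.168.5.254:53:
08:41:23 dns,packet --- got answer from 192.168.5.254:53:
08:41:23 dns done query: #51646 dns name exists, but no appropriate record
"""

QUERY = re.compile(r"dns query from \S+ #(\d+) (\S+) (\S+)")
FORWARD = re.compile(r"sending udp query to ([\d.]+):\d+")
DONE = re.compile(r"dns done query: #(\d+) (.*)")


def classify(log_text):
    """Return {query id: details}, noting whether the query was forwarded."""
    queries = {}
    last_id = None
    for line in log_text.splitlines():
        if m := QUERY.search(line):
            last_id = m.group(1)
            queries[last_id] = {"name": m.group(2), "type": m.group(3),
                                "forwarded_to": None, "result": None}
        elif (m := FORWARD.search(line)) and last_id is not None:
            # Assumption: the packet line belongs to the most recent query.
            queries[last_id]["forwarded_to"] = m.group(1)
        elif m := DONE.search(line):
            if m.group(1) in queries:
                queries[m.group(1)]["result"] = m.group(2).strip()
    return queries


for qid, info in classify(LOG).items():
    target = info["forwarded_to"] or "not forwarded"
    print(f"#{qid} {info['name']} {info['type']}: {target}; {info['result']}")
```

A query that ends in "dns name does not exist" without ever being forwarded was answered locally, which is what the kavicsnet excerpt shows.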

I think when I switched from ROS 6 to ROS 7, forward DNS did not work for me.
Try editing the FWD regex, appending "\.?$" to match the trailing dot in the query, so that it looks like this:
regexp=".*\.visznet\.?$" type=FWD forward-to=192.168.5.254 ttl=1d

I get “dns name does not exist” logged when there’s already a cached negative answer. So it could be that there was a query for that name before you added the FWD record, the negative answer got cached, and you need to either wait until it times out or flush the cache.

It is possible. One and a half days passed, and right now it is working:

╭─gandalf@laci-vivobook-linux.lacinet okt-dbrep-laci ~  
╰─$ host borika-pc.kavicsnet
borika-pc.kavicsnet has address 192.168.18.199
╭─gandalf@laci-vivobook-linux.lacinet okt-dbrep-laci ~  
╰─$ host borika-pc.kavicsnet.
borika-pc.kavicsnet has address 192.168.18.199
╭─gandalf@laci-vivobook-linux.lacinet okt-dbrep-laci ~  
╰─$

For me, it seems to work with or without a trailing dot. It has always been that way, even though the regexp does not match names with a trailing dot:

add forward-to=192.168.5.254 regexp=".*\\.visznet" ttl=1m type=FWD

If that was the real problem, then there is a conclusion for me - I’ll never leave the default ttl for FWD records, because it makes DNS fragile. An intermittent connection error to the forwarder might make a whole subdomain unavailable. It is not probable, but it is possible and can cause lots of problems.

Thank you!

If you’re using your DNS system for only a few (tens of?) devices, then the number of DNS requests won’t be huge and you can safely use TTLs on the order of a few minutes. That number of DNS queries still won’t DDoS your DNS servers.

AFAIK the trailing dot is a local thing; it doesn’t go into DNS packets. If you want to make sure that the regexp matches only the TLD and not something in the middle of the hostname, end it with $.
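To illustrate the point about ending the regexp with $, here is a minimal sketch. It uses Python's re module as a stand-in. Whether the RouterOS matcher behaves the same way (in particular, whether it looks for the pattern anywhere in the name, as re.search does) is an assumption and not something confirmed in this thread. The name sanyi-pc.visznet.example.com is made up for the demonstration.

```python
import re

# Patterns in the style of the FWD entries discussed in this thread.
UNANCHORED = r".*\.visznet"          # as currently configured
ANCHORED = r".*\.visznet$"           # must end in .visznet
ANCHORED_DOT = r".*\.visznet\.?$"    # also accepts one trailing dot


def matches(pattern, name):
    # Assumption: the matcher searches anywhere in the name (re.search).
    return re.search(pattern, name) is not None


names = [
    "sanyi-pc.visznet",
    "sanyi-pc.visznet.",
    "sanyi-pc.visznet.example.com",  # hypothetical name, should not be forwarded
]

for name in names:
    print(name,
          matches(UNANCHORED, name),
          matches(ANCHORED, name),
          matches(ANCHORED_DOT, name))
```

Under these assumptions the unanchored pattern also catches the made-up name with .visznet in the middle, while both anchored variants reject it.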

And I don’t think that the FWD record’s TTL should affect anything. It’s not a real record, only an instruction telling the resolver which server to use. It probably has a TTL only because they added it as a fake record and all other real records have a TTL.

Thanks to Nagyizs and Sob for correcting my mistake about the trailing dot.

The cached entry for a successfully resolved name gets a TTL equal to the FWD record’s TTL and starts counting down from there.
You can override this with Cache Max TTL when its value is lower than the FWD record’s TTL.
While the name record is cached, resolve requests are not forwarded; they are served from the cache until the record expires.

I thought the same. However, the problems went away only after setting ttl=1m on the FWD records and then waiting one day. I suspect that when a forwarder fails, the failure is cached with the TTL of the FWD record. E.g. if the FWD record has ttl=1d and the forwarder is not available at the moment, then the NXDOMAIN is cached for a whole day. It is not documented anywhere (or at least I could not find it), but it seems to work that way.

If the forwarder resolves the name, then it returns the address and its own TTL. I.e. the cached TTL should not be equal to the TTL of the FWD record, because the answer has its own TTL. If the forwarder is not available, then NXDOMAIN is cached, and its TTL will be equal to the TTL of the FWD record. This is my experience; I cannot check it, because NXDOMAIN cache entries are not listed under /ip/dns/cache. Can somebody please confirm this?
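The hypothesis described above can be written down as a toy model. This is a sketch of the suspected behaviour only, not RouterOS code and not a confirmed description of how RouterOS works. The class ToyCache, its negative_ttl parameter, and the two upstream functions are made up for the illustration.

```python
DAY = 86400  # seconds


class ToyCache:
    """Toy resolver cache modelling the hypothesis, nothing more."""

    def __init__(self, negative_ttl):
        # Assumption under test: failures are cached for the FWD record's TTL.
        self.negative_ttl = negative_ttl
        self.entries = {}  # name -> (address or None, expiry time)

    def resolve(self, name, now, upstream):
        cached = self.entries.get(name)
        if cached and cached[1] > now:
            return cached[0]          # served from cache, upstream is not asked
        answer = upstream(name)       # (address, ttl) or None on failure
        if answer is None:
            self.entries[name] = (None, now + self.negative_ttl)
            return None
        address, ttl = answer
        self.entries[name] = (address, now + ttl)  # positive answers keep their own TTL
        return address

    def flush(self):
        self.entries.clear()


def forwarder_down(name):
    return None


def forwarder_up(name):
    return ("192.168.18.199", 60)


cache = ToyCache(negative_ttl=DAY)                                   # FWD ttl=1d
first = cache.resolve("borika-pc.kavicsnet", 0, forwarder_down)      # failure gets cached
an_hour_later = cache.resolve("borika-pc.kavicsnet", 3600, forwarder_up)
next_day = cache.resolve("borika-pc.kavicsnet", DAY + 1, forwarder_up)
print(first, an_hour_later, next_day)
```

In this model a single failed lookup keeps the name unresolvable for the full negative TTL even after the forwarder is reachable again, unless the cache is flushed. With negative_ttl=60 the same name recovers after a minute.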

If the forwarder resolves the name, then it returns the address and its own TTL. I.e. the cached TTL should not be equal to the TTL of the FWD record, because the answer has its own TTL.

My domain is forwarded to a bind9 nameserver.

$TTL 86400
@       IN SOA  ns0 hostmaster (
        202201269  ; serial
        604800     ; refresh (1 week)
        86400      ; retry (1 day)
        2419200    ; expire (4 weeks)
        300        ; minimum - Negative Cache TTL (5min)
        )

That’s correct: not the FWD TTL but the response TTL will be equal to the cached value. I think the MikroTik DNS server sends the response along with the name record’s TTL.

When I stopped bind9 and resolved a domain name in the terminal, I got the error message:
failure: dns server failure
and the MikroTik DNS cache did not change.
When DNS is working (bind9 is running) and I try to resolve an FQDN which does not exist in the domain, bind9 sends NXDOMAIN with the negative cache TTL, i.e. 5 minutes.
This is cached by the MikroTik DNS cache.
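The 5 minutes follow from the zone file above. Per RFC 2308, a negative answer may be cached for the smaller of the SOA record's own TTL and the SOA minimum field. The sketch below applies that rule to the values shown, assuming the SOA record inherits the zone default $TTL of 86400 because no explicit TTL is written on it. The function name is made up.

```python
def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """Negative caching TTL per RFC 2308: the smaller of the two values."""
    return min(soa_record_ttl, soa_minimum)


# Values from the zone file above: $TTL 86400, minimum 300.
ttl = negative_cache_ttl(soa_record_ttl=86400, soa_minimum=300)
print(f"negative answers may be cached for {ttl} s ({ttl // 60} min)")
```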
To list NXDOMAIN-type entries, try this in the terminal:

/ip/dns/cache/all print where negative

Sometimes a flush helps:

/ip/dns/cache/flush

Sob wrote:

And I don’t think that the FWD record’s TTL should affect anything. It’s not a real record, only an instruction telling the resolver which server to use. It probably has a TTL only because they added it as a fake record and all other real records have a TTL.

Today it went wrong again, but with a different hostname.

I followed your advice and I found the host in the negative cache:

[gandalf@router.lacinet] /ip/dns> /ip/dns/cache/all print where negative
Flags: N - NEGATIVE
Columns: NAME, TTL
#   NAME                        TTL
0 N _LDAP._TCP                  8h8m17s
1 N channel.status.request.url  10h44m51s
2 N local                       11h39m53s
3 N mw40.home                   21h33m30s
4 N stun.ideasip.com            11m8s
5 N PENZTAR-PC.VISZNET          23h16m39s
6 N wpad.lacinet                23h59m11s
7 N stands-app.lacinet          23h59m11s

The problematic one is penztar-pc.visznet.

I don’t understand why it has a TTL of 23h16m. That record has ttl=1m on the authoritative server:

[gandalf@viszfuvar.visznet] /ip/dns/static> print detail where name~"penztar.*"
Flags: D - dynamic; X - disabled
23    ;;; #DHCP
      name="penztar-pc.visznet." address=192.168.5.176 ttl=1m

And the FWD record also has 1m ttl:

[gandalf@router.lacinet] /ip/dns/static> print detail where regexp~".visznet"
Flags: D - dynamic; X - disabled
 4    regexp=".*\.visznet" type=FWD forward-to=192.168.5.254 ttl=1m

Where is this 23h coming from??? (I guess its initial value was 1d, because I first experienced the problem about an hour ago.)
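The guess about the initial value can be checked with a little arithmetic. The sketch below parses the remaining TTL from the cache listing above and subtracts it from an assumed starting value of 1d. The starting value is the poster's guess, not something the listing shows, and the helper parse_ros_duration is made up for this example. It only handles the w/d/h/m/s units that appear in the listing.

```python
import re
from datetime import timedelta

UNITS = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_ros_duration(text):
    """Parse strings such as '23h16m39s' or '1d' into a timedelta."""
    parts = re.findall(r"(\d+)([wdhms])", text)
    # Reject anything that is not fully consumed by number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return timedelta(**{UNITS[u]: int(n) for n, u in parts})


assumed_initial = parse_ros_duration("1d")       # assumption: entry started at 1d
remaining = parse_ros_duration("23h16m39s")      # PENZTAR-PC.VISZNET in the listing
elapsed = assumed_initial - remaining
print(f"entry would have been cached {elapsed} ago")
```

If the entry did start at 1d, it was created a little under three quarters of an hour before the listing was taken, which fits "about an hour ago".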