cowpoke
just joined
Topic Author
Posts: 4
Joined: Thu Oct 06, 2022 11:52 am

7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 2:54 pm

Hi all,

I don't have permission to reply to the announcement thread, so I'm posting here. The DNS server in 7.6rc1 serves cached CNAME responses in a way that causes getaddrinfo(3) to error out in at least recent versions of glibc. This causes most Linux and many IoT applications to break when trying to resolve CNAMEs, which most heavily affects CDN type stuff. It looks like at least one other person has seen the issue.

A queries get the normal response, but once the name has been cached, AAAA queries also return the IPv4 addresses as A records in the Additional records section. An example response:
Domain Name System (response)
    Transaction ID: 0x12fc
    Flags: 0x8180 Standard query response, No error
    Questions: 1
    Answer RRs: 1
    Authority RRs: 0
    Additional RRs: 4
    Queries
        api.twitter.com: type AAAA, class IN
            Name: api.twitter.com
            [Name Length: 15]
            [Label Count: 3]
            Type: AAAA (IPv6 Address) (28)
            Class: IN (0x0001)
    Answers
        api.twitter.com: type CNAME, class IN, cname tpop-api.twitter.com
            Name: api.twitter.com
            Type: CNAME (Canonical NAME for an alias) (5)
            Class: IN (0x0001)
            Time to live: 1366 (22 minutes, 46 seconds)
            Data length: 19
            CNAME: tpop-api.twitter.com
    Additional records
        tpop-api.twitter.com: type A, class IN, addr 104.244.42.66
            Name: tpop-api.twitter.com
            Type: A (Host Address) (1)
            Class: IN (0x0001)
            Time to live: 261 (4 minutes, 21 seconds)
            Data length: 4
            Address: 104.244.42.66
        tpop-api.twitter.com: type A, class IN, addr 104.244.42.2
            Name: tpop-api.twitter.com
            Type: A (Host Address) (1)
            Class: IN (0x0001)
            Time to live: 261 (4 minutes, 21 seconds)
            Data length: 4
            Address: 104.244.42.2
        tpop-api.twitter.com: type A, class IN, addr 104.244.42.130
            Name: tpop-api.twitter.com
            Type: A (Host Address) (1)
            Class: IN (0x0001)
            Time to live: 261 (4 minutes, 21 seconds)
            Data length: 4
            Address: 104.244.42.130
        tpop-api.twitter.com: type A, class IN, addr 104.244.42.194
            Name: tpop-api.twitter.com
            Type: A (Host Address) (1)
            Class: IN (0x0001)
            Time to live: 261 (4 minutes, 21 seconds)
            Data length: 4
            Address: 104.244.42.194
    [Request In: 2]
    [Time: 0.015108684 seconds]
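To make the shape of that response easier to discuss, here is a minimal Python sketch that lists the address records whose type does not match the question. The tuple layout and the function name mismatched_records are made up for illustration; the sample data is copied from the capture above.

```python
# Record type numbers as shown in the capture above.
A, CNAME, AAAA = 1, 5, 28

def mismatched_records(qtype, sections):
    """Return (section, name, rtype) for address records that do not match qtype.

    `sections` maps a section name to a list of (owner_name, rtype) tuples.
    CNAME records are skipped, since they are expected for any query type.
    """
    found = []
    for section, records in sections.items():
        for name, rtype in records:
            if rtype in (A, AAAA) and rtype != qtype:
                found.append((section, name, rtype))
    return found

# The cached 7.6rc1 response to the AAAA query for api.twitter.com.
response = {
    "answer": [("api.twitter.com", CNAME)],
    "additional": [("tpop-api.twitter.com", A)] * 4,
}

for section, name, rtype in mismatched_records(AAAA, response):
    print(f"{section}: {name} has type {rtype}, query was type {AAAA}")
```

For this response it reports the four A records in the additional section and nothing else.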
And here's how socat sees it:
D getaddrinfo("api.twitter.com", NULL, {1,0,1,6,0,0x0,0x0,0x0}, 0x7ffc82af8d60)
D getaddrinfo(,,,{0x0}) -> -2
E getaddrinfo("api.twitter.com", "NULL", {1,0,1,6}, {}): Name or service not known
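For anyone who wants to reproduce this without socat, the same libc call can be made from Python, since CPython's socket.getaddrinfo goes through the C library's getaddrinfo(3) on Linux. This is only a sketch: the helper name try_resolve is hypothetical, and the hints are my reading of the {1,0,1,6} in the log above (AI_PASSIVE, AF_UNSPEC, SOCK_STREAM, TCP).

```python
import socket

def try_resolve(host):
    """Call getaddrinfo with hints like those in the socat log and report the outcome."""
    try:
        infos = socket.getaddrinfo(
            host, None,
            family=socket.AF_UNSPEC,
            type=socket.SOCK_STREAM,
            proto=socket.IPPROTO_TCP,
            flags=socket.AI_PASSIVE,
        )
    except socket.gaierror as exc:
        # On glibc, error code -2 is EAI_NONAME ("Name or service not known").
        return ("error", exc.errno)
    # Each entry is (family, type, proto, canonname, sockaddr); keep the address.
    return ("ok", sorted({info[4][0] for info in infos}))
```

On an affected client, try_resolve("api.twitter.com") would be expected to come back as an error with code -2 once the router has the name cached, matching the socat log.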
I have confirmed with strace that the UDP messages are making it to userspace, and RouterOS 7.6rc1 is the only DNS server I have tested that causes this behaviour, so the structure of the response looks to be the cause. I don't think that having A records in the Additional records section of a response to an AAAA query necessarily violates the spec, but it is certainly unexpected and does not appear very helpful, since most software (including glibc) sends both A and AAAA requests simultaneously. The glibc DNS stack could definitely use some robustness improvements, but this response behaviour will break quite a few clients, including many that won't be meaningfully updated (printers, TVs, etc.).

I have sent full PCAPs and debug logs to support@. I tested this on my CCR-2116, and rolling back to 7.6beta8 resolves the issue. Deployments with RouterOS DNS caching and Linux clients should probably avoid 7.6rc1.
 
Znevna
Forum Guru
Posts: 1347
Joined: Mon Sep 23, 2019 1:04 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 5:36 pm

I'm sure that they can replace whatever they built in-house for DNS and DHCPv4 with a minimal build of dnsmasq but for some reason they don't want to.
They'd get rid of so many complaints regarding both of those services, and they wouldn't have to keep fixing randomly introduced bugs all the time.
 
cowpoke
just joined
Topic Author
Posts: 4
Joined: Thu Oct 06, 2022 11:52 am

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 6:33 pm

I don't think I would go that far without knowing the codebase. Integrating third-party developed services in a robust way requires a lot of work and tends to have a large amount of hidden fiddling. I would hope that there would be a test suite of clients to ensure compatibility for a service like DNS, though. Even with a widely used project like dnsmasq, broad testing would still be needed to make sure the integration doesn't deteriorate unnoticed.
 
pe1chl
Forum Guru
Posts: 10183
Joined: Mon Jun 08, 2015 12:09 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 6:49 pm

At first glance, the problem appears to be that the A/AAAA records are sent as "additional records" instead of "answer records".
I think that is only to be done when an NS query is answered (with domain names), to give the A/AAAA of those domain names.

Answering a request for a DNS name that is a CNAME with both the CNAME and A/AAAA records is common practice, and certainly is handled correctly by glibc.
But the result should be Answer records, not Additional records.
 
cowpoke
just joined
Topic Author
Posts: 4
Joined: Thu Oct 06, 2022 11:52 am

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 7:05 pm

I think that in this case having A records sent in the Answer section of a response to an AAAA query would also lead to undesired behaviour, though I have not tested it specifically.
 
pe1chl
Forum Guru
Posts: 10183
Joined: Mon Jun 08, 2015 12:09 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 8:20 pm

I can assure you that is not true. Those answers are sent by regular resolvers all the time:

dig -t A www.youtube.com @8.8.8.8

; <<>> DiG 9.11.5-P4-5.1+deb10u7-Debian <<>> -t A www.youtube.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36903
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.youtube.com. IN A

;; ANSWER SECTION:
www.youtube.com. 21149 IN CNAME youtube-ui.l.google.com.
youtube-ui.l.google.com. 300 IN A 142.250.179.142
youtube-ui.l.google.com. 300 IN A 142.251.36.46
youtube-ui.l.google.com. 300 IN A 142.250.179.174
youtube-ui.l.google.com. 300 IN A 142.250.179.206
youtube-ui.l.google.com. 300 IN A 142.251.36.14
youtube-ui.l.google.com. 300 IN A 142.251.39.110
youtube-ui.l.google.com. 300 IN A 172.217.168.206
youtube-ui.l.google.com. 300 IN A 216.58.208.110

;; Query time: 11 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Oct 06 19:20:16 CEST 2022
;; MSG SIZE rcvd: 206

When I do the same query via RouterOS v7.6rc1 I get a similar response:

dig -t A www.youtube.com @192.168.1.1

; <<>> DiG 9.11.5-P4-5.1+deb10u7-Debian <<>> -t A www.youtube.com @192.168.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39335
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 13, ADDITIONAL: 6

;; QUESTION SECTION:
;www.youtube.com. IN A

;; ANSWER SECTION:
www.youtube.com. 2034 IN CNAME youtube-ui.l.google.com.
youtube-ui.l.google.com. 231 IN A 216.58.214.14
youtube-ui.l.google.com. 231 IN A 216.58.208.110
youtube-ui.l.google.com. 231 IN A 142.251.36.14
youtube-ui.l.google.com. 231 IN A 142.250.179.206
youtube-ui.l.google.com. 231 IN A 142.251.39.110
youtube-ui.l.google.com. 231 IN A 172.217.168.238
youtube-ui.l.google.com. 231 IN A 142.251.36.46
youtube-ui.l.google.com. 231 IN A 142.250.179.142
youtube-ui.l.google.com. 231 IN A 142.250.179.174
youtube-ui.l.google.com. 231 IN A 172.217.168.206

;; AUTHORITY SECTION:
com. 1058 IN NS b.gtld-servers.net.
com. 1058 IN NS k.gtld-servers.net.
com. 1058 IN NS d.gtld-servers.net.
com. 1058 IN NS l.gtld-servers.net.
com. 1058 IN NS g.gtld-servers.net.
com. 1058 IN NS i.gtld-servers.net.
com. 1058 IN NS f.gtld-servers.net.
com. 1058 IN NS h.gtld-servers.net.
com. 1058 IN NS c.gtld-servers.net.
com. 1058 IN NS e.gtld-servers.net.
com. 1058 IN NS m.gtld-servers.net.
com. 1058 IN NS j.gtld-servers.net.
com. 1058 IN NS a.gtld-servers.net.

;; ADDITIONAL SECTION:
b.gtld-servers.net. 1058 IN A 192.33.14.30
d.gtld-servers.net. 1058 IN A 192.31.80.30
f.gtld-servers.net. 1058 IN A 192.35.51.30
c.gtld-servers.net. 1058 IN A 192.26.92.30
e.gtld-servers.net. 1058 IN A 192.12.94.30
a.gtld-servers.net. 1058 IN A 192.5.6.30

;; Query time: 7 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Oct 06 19:21:28 CEST 2022
;; MSG SIZE rcvd: 550

As you can see, the IP addresses of the DNS servers are sent in the Additional section, and the IP addresses relating to the CNAME are in the Answer section.
 
pe1chl
Forum Guru
Posts: 10183
Joined: Mon Jun 08, 2015 12:09 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 06, 2022 9:35 pm

In fact, the only difference I notice from most other resolvers is the inclusion of an authority and an additional section in the reply to a query for A or AAAA records of a subdomain.
 
cowpoke
just joined
Topic Author
Posts: 4
Joined: Thu Oct 06, 2022 11:52 am

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 07, 2022 1:11 am

dig -t A www.youtube.com @8.8.8.8
This is not the equivalent of what is happening here. First, the resolver in dig is not affected by this issue; it is much more robust than the resolver in the various libc implementations. This issue is really best viewed with a packet capture. Second, you are not looking at the request that has the problem. The equivalent to what glibc does would be:
dig -t A www.youtube.com @8.8.8.8; dig -t AAAA www.youtube.com @8.8.8.8
Due to how it is programmed, glibc will fail if it doesn't like one of the responses. I have not done extensive testing on this front, but it seems that records of a type it does not expect will reliably trigger this issue, no matter which section they are found in. A records in the Answer section for an A request are fine, and AAAA records in the Answer section for an AAAA request are fine, but nothing puts A records in the Answer section for an AAAA request. The packet analysis in my first post shows that the request was for AAAA records, and what was sent back (in the additional section) was A records. This is what I believe is the issue: IPv4 records are being sent in response to a query for which the resolver expects IPv6 records.
 
Sob
Forum Guru
Posts: 9119
Joined: Mon Apr 20, 2009 9:11 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 07, 2022 1:22 am

I see two differences for CNAMEs:

1) Order of records in answers. For an A/AAAA query, the first response (when there's no cached data) returns the CNAME followed by A or AAAA records (other servers do it like this). Subsequent queries (when the router has cached answers) return A/AAAA before the CNAME.

2) If there's a query for AAAA and the router also has cached A records, it sends them in the additional section.

I'd guess that 2) is probably harmless, only useless, so my bet is on 1). I can't say if it's wrong; I didn't find any RFC about the order of records in answers (it could be me not looking hard enough, I don't know). But I can imagine that if everything always returned CNAMEs first, (some) resolvers could simply assume that's always the case and not handle a different order.
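To illustrate what "handling a different order" would look like on the client side, here is a small Python sketch that follows the CNAME chain by name instead of by position. It is not how glibc or any other resolver is actually written, just an example of an order-independent parser. The function name and tuple layout are invented for illustration.

```python
def resolve_from_answers(qname, answers, max_hops=8):
    """Follow CNAMEs from qname and return its addresses, ignoring record order.

    `answers` is a list of (owner_name, rtype, rdata) tuples with rtype
    given as "CNAME", "A" or "AAAA".
    """
    cnames = {owner: rdata for owner, rtype, rdata in answers if rtype == "CNAME"}
    name = qname
    for _ in range(max_hops):  # bound the walk so a CNAME loop cannot hang us
        if name not in cnames:
            break
        name = cnames[name]
    return [rdata for owner, rtype, rdata in answers
            if owner == name and rtype in ("A", "AAAA")]

cname_first = [
    ("www.youtube.com", "CNAME", "youtube-ui.l.google.com"),
    ("youtube-ui.l.google.com", "A", "142.250.179.142"),
    ("youtube-ui.l.google.com", "A", "142.251.36.46"),
]
# Same records, but with the addresses ahead of the CNAME.
address_first = cname_first[1:] + cname_first[:1]

print(resolve_from_answers("www.youtube.com", cname_first))
print(resolve_from_answers("www.youtube.com", address_first))
```

Both calls return the same two addresses. A parser that instead assumes the first record is the CNAME would only cope with the first ordering.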
 
chiem
newbie
Posts: 41
Joined: Fri Oct 24, 2014 4:48 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 07, 2022 6:05 am

There may be a 3rd issue, but I have held off reporting it until #1 is fixed, to be sure. It appeared to me that static DNS results were bypassed if the upstream result was a CNAME.
 
Sob
Forum Guru
Posts: 9119
Joined: Mon Apr 20, 2009 9:11 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 07, 2022 2:39 pm

Yeah, there's something happening with that too. But it's not just 7.6rc1; the problem is already there in 7.5 (at least, I don't have an older 7.x right now).

In 7.6rc1, if there's already CNAME cached, then it has priority. If not, then static A is used and upstream is not asked about it.
In 7.5, cached CNAME has priority too. But when it's not cached and there's static A, it will still ask upstream, ignore the response and return nothing.

They should do more testing for changes like these, it can cause so many headaches...
 
JoshDi
newbie
Posts: 37
Joined: Fri May 21, 2021 4:49 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Wed Oct 12, 2022 5:01 pm

I am also experiencing this DNS issue with cached results with ROS v7.5 and v7.6rc. Following the thread for a fix or potential workaround.
 
pe1chl
Forum Guru
Posts: 10183
Joined: Mon Jun 08, 2015 12:09 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Wed Oct 12, 2022 6:47 pm

It is better to follow the 7.6rc release topic in that case. There is another fix in 7.6rc2 but I have not yet installed it.
 
Sob
Forum Guru
Posts: 9119
Joined: Mon Apr 20, 2009 9:11 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Thu Oct 13, 2022 6:53 am

Quick test with 7.6rc2:

- it fixed the order of records, CNAME is now before A/AAAA(s) (same as other resolvers do it)
- responses to AAAA queries still include unsolicited A(s) in additional section

So it should be better, but in the release thread there's also a complaint about another problem with CNAME chains (I didn't test it myself), so it's still too early to celebrate.
 
chiem
newbie
Posts: 41
Joined: Fri Oct 24, 2014 4:48 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 14, 2022 12:43 pm

MikroTik said they couldn't reproduce the CNAME chain issue, and neither can I:

Config:
/ip dns static add cname=bar name=foo type=CNAME
/ip dns static add cname=baz name=bar type=CNAME
/ip dns static add address=1.2.3.4 name=baz
/ip dns static add cname=www.youtube.com name=youtube type=CNAME
...
 $ dig foo

; <<>> DiG 9.18.7 <<>> foo
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32230
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;foo.                           IN      A

;; ANSWER SECTION:
foo.                    86400   IN      CNAME   bar.
bar.                    86400   IN      CNAME   baz.
baz.                    86400   IN      A       1.2.3.4

;; Query time: 0 msec
;; SERVER: 192.168.0.1#53(192.168.0.1) (UDP)
;; WHEN: Fri Oct 14 02:40:54 PDT 2022
;; MSG SIZE  rcvd: 71
...
 $ dig youtube

; <<>> DiG 9.18.7 <<>> youtube
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9232
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;youtube.                       IN      A

;; ANSWER SECTION:
youtube.                86400   IN      CNAME   www.youtube.com.
www.youtube.com.        271     IN      CNAME   youtube-ui.l.google.com.
youtube-ui.l.google.com. 271    IN      A       142.250.191.46
youtube-ui.l.google.com. 271    IN      A       142.250.189.174
youtube-ui.l.google.com. 271    IN      A       142.250.189.206
youtube-ui.l.google.com. 271    IN      A       142.251.32.46
youtube-ui.l.google.com. 271    IN      A       142.250.188.14
youtube-ui.l.google.com. 271    IN      A       142.251.214.142
youtube-ui.l.google.com. 271    IN      A       142.250.189.238
youtube-ui.l.google.com. 271    IN      A       142.251.46.206

;; Query time: 8 msec
;; SERVER: 192.168.0.1#53(192.168.0.1) (UDP)
;; WHEN: Fri Oct 14 02:41:25 PDT 2022
;; MSG SIZE  rcvd: 219
 
pe1chl
Forum Guru
Posts: 10183
Joined: Mon Jun 08, 2015 12:09 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 14, 2022 1:09 pm

I think what you need is an external DNS name, not a static one.
And the A/AAAA records must have a shorter TTL than the CNAME records.
In that case, when the A/AAAA records have expired and the toplevel name is queried again, only the cached CNAME records are returned and no A/AAAA.
The resolver should do another A/AAAA query for the lowest level name and include that result in the response.
Also, I think it should not put arbitrary records in the additional section. The standard resolvers do not do that. But likely that will not hurt, it just will not be used.
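The TTL scenario described above can be modelled with a toy cache in Python. This is only a sketch of the suspected failure mode, not RouterOS code: the class ToyCache and its methods are hypothetical, and it answers from cache alone without ever asking upstream. The TTL values are the ones from the dig output earlier in the thread.

```python
class ToyCache:
    """Toy DNS cache keyed by (name, rtype); not RouterOS's implementation."""

    def __init__(self):
        self.entries = {}  # (name, rtype) -> (expires_at, [rdata, ...])

    def put(self, name, rtype, rdata, ttl, now):
        self.entries[(name, rtype)] = (now + ttl, list(rdata))

    def get(self, name, rtype, now):
        expires_at, rdata = self.entries.get((name, rtype), (0, []))
        return rdata if now < expires_at else []

    def answer(self, qname, qtype, now):
        """Build an answer from cache alone, following at most one CNAME."""
        records = []
        target = qname
        for cname in self.get(qname, "CNAME", now):
            records.append((qname, "CNAME", cname))
            target = cname
        for addr in self.get(target, qtype, now):
            records.append((target, qtype, addr))
        return records

cache = ToyCache()
# CNAME cached for 2034 s, A record for 231 s, both stored at time 0.
cache.put("www.youtube.com", "CNAME", ["youtube-ui.l.google.com"], ttl=2034, now=0)
cache.put("youtube-ui.l.google.com", "A", ["216.58.214.14"], ttl=231, now=0)

print(cache.answer("www.youtube.com", "A", now=100))  # CNAME plus A
print(cache.answer("www.youtube.com", "A", now=300))  # A expired: CNAME only
```

At time 300 the answer contains only the CNAME, which is the incomplete response in question. A correct resolver would notice the missing address records at that point and query upstream for the target name before replying.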
 
chiem
newbie
Posts: 41
Joined: Fri Oct 24, 2014 4:48 pm

Re: 7.6rc1 cached DNS CNAME responses break with glibc/Linux

Fri Oct 14, 2022 2:33 pm

I think what you need is an external DNS name, not a static one.
And the A/AAAA records must have a shorter TTL than the CNAME records.
In that case, when the A/AAAA records have expired and the toplevel name is queried again, only the cached CNAME records are returned and no A/AAAA.
The resolver should do another A/AAAA query for the lowest level name and include that result in the response.

2 of the 3 records involved in my youtube dig are external, and the 2nd CNAME and the last A record have the same TTL because they were both uncached, so there doesn't appear to be an issue with looking up an external A record for an external CNAME when neither is cached. Where are you getting all these requirements from? Do you know of an external CNAME to CNAME to A record result that triggers this? Feel free to share if you do; that'll help the MikroTik folks diagnose this.
