Hi all,
I don’t have permission to reply to the announcement thread, so I’m posting here. The DNS server in 7.6rc1 serves cached CNAME responses in a way that causes getaddrinfo(3) to error out in at least recent versions of glibc. This causes most Linux and many IoT applications to break when trying to resolve CNAMEs, which most heavily affects CDN type stuff. It looks like at least one other person has seen the issue.
A queries still get the normal response, but once the name has been cached, AAAA queries come back with the CNAME in the answer section and the IPv4 addresses as A records in the Additional records section. An example response:
Domain Name System (response)
Transaction ID: 0x12fc
Flags: 0x8180 Standard query response, No error
Questions: 1
Answer RRs: 1
Authority RRs: 0
Additional RRs: 4
Queries
api.twitter.com: type AAAA, class IN
Name: api.twitter.com
[Name Length: 15]
[Label Count: 3]
Type: AAAA (IPv6 Address) (28)
Class: IN (0x0001)
Answers
api.twitter.com: type CNAME, class IN, cname tpop-api.twitter.com
Name: api.twitter.com
Type: CNAME (Canonical NAME for an alias) (5)
Class: IN (0x0001)
Time to live: 1366 (22 minutes, 46 seconds)
Data length: 19
CNAME: tpop-api.twitter.com
Additional records
tpop-api.twitter.com: type A, class IN, addr 104.244.42.66
Name: tpop-api.twitter.com
Type: A (Host Address) (1)
Class: IN (0x0001)
Time to live: 261 (4 minutes, 21 seconds)
Data length: 4
Address: 104.244.42.66
tpop-api.twitter.com: type A, class IN, addr 104.244.42.2
Name: tpop-api.twitter.com
Type: A (Host Address) (1)
Class: IN (0x0001)
Time to live: 261 (4 minutes, 21 seconds)
Data length: 4
Address: 104.244.42.2
tpop-api.twitter.com: type A, class IN, addr 104.244.42.130
Name: tpop-api.twitter.com
Type: A (Host Address) (1)
Class: IN (0x0001)
Time to live: 261 (4 minutes, 21 seconds)
Data length: 4
Address: 104.244.42.130
tpop-api.twitter.com: type A, class IN, addr 104.244.42.194
Name: tpop-api.twitter.com
Type: A (Host Address) (1)
Class: IN (0x0001)
Time to live: 261 (4 minutes, 21 seconds)
Data length: 4
Address: 104.244.42.194
[Request In: 2]
[Time: 0.015108684 seconds]
And here’s how socat sees it:
D getaddrinfo("api.twitter.com", NULL, {1,0,1,6,0,0x0,0x0,0x0}, 0x7ffc82af8d60)
D getaddrinfo(,,,{0x0}) -> -2
E getaddrinfo("api.twitter.com", "NULL", {1,0,1,6}, {}): Name or service not known
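For anyone who wants to check a client without socat, here is a minimal Python sketch of the same lookup. It assumes the hints in the socat log decode as AI_PASSIVE, AF_UNSPEC, SOCK_STREAM, IPPROTO_TCP, and that -2 is EAI_NONAME as on Linux/glibc. The `probe` helper and its `resolver` parameter are names I made up for illustration; CPython's socket.getaddrinfo calls the C library's getaddrinfo underneath.

```python
import socket


def probe(host, resolver=socket.getaddrinfo):
    """Resolve host with socat-like hints; return (ok, detail).

    `resolver` is a hypothetical injection point so the function can be
    exercised without a network; by default it is the real getaddrinfo.
    """
    try:
        results = resolver(
            host,
            None,                       # no service, as in the socat log
            family=socket.AF_UNSPEC,    # ask for both A and AAAA
            type=socket.SOCK_STREAM,
            proto=socket.IPPROTO_TCP,
            flags=socket.AI_PASSIVE,
        )
    except socket.gaierror as exc:
        # On glibc the failure above shows up as EAI_NONAME (-2).
        return False, f"{exc.errno}: {exc.strerror}"
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return True, sorted({info[4][0] for info in results})
```

Calling probe("api.twitter.com") on an affected client should return False with the EAI_NONAME message once the name is cached on the router, and a list of addresses otherwise.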
I have confirmed with strace that the UDP messages are making it to userspace and RouterOS 7.6rc1 is the only DNS server I have tested causing this behaviour, so the structure of the response looks to be the cause. I don’t think that having A responses in the Additional records section of an AAAA request violates the spec necessarily, but it is certainly unexpected, and does not appear very helpful, since most software (including glibc) sends both A and AAAA requests simultaneously. The glibc DNS stack could definitely use some robustness improvements, but this response behaviour will break quite a few clients, including many that won’t be meaningfully updated (printers, TVs, etc.).
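If you want to check your own captures for the same pattern, here is a rough Python sketch that walks a raw DNS response (the UDP payload) and reports whether an AAAA query came back with A records in the Additional section. The function names are mine, it only reads the fixed header and record headers, and it assumes a well-formed message with no truncation, so treat it as a starting point rather than a real parser.

```python
import struct

TYPE_A = 1
TYPE_AAAA = 28


def skip_name(msg, off):
    """Return the offset just past the (possibly compressed) name at off."""
    while True:
        length = msg[off]
        if length == 0:               # root label terminates the name
            return off + 1
        if length & 0xC0 == 0xC0:     # compression pointer: 2 bytes, ends the name
            return off + 2
        off += 1 + length             # ordinary label: length byte + data


def additional_types(msg):
    """Return (qtype of the last question, [type of each additional record])."""
    _id, _flags, qd, an, ns, ar = struct.unpack_from("!6H", msg, 0)
    off = 12
    qtype = None
    for _ in range(qd):
        off = skip_name(msg, off)
        qtype, _qclass = struct.unpack_from("!2H", msg, off)
        off += 4
    types = []
    for i in range(an + ns + ar):
        off = skip_name(msg, off)
        rtype, _rclass, _ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        off += 10 + rdlen             # fixed record header is 10 bytes
        if i >= an + ns:              # only collect the Additional section
            types.append(rtype)
    return qtype, types


def has_a_in_additional_for_aaaa(msg):
    """True if an AAAA query's response carries A records as additionals."""
    qtype, types = additional_types(msg)
    return qtype == TYPE_AAAA and TYPE_A in types
```

For the response shown above this should give a question type of 28 and four additional records of type 1.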
I have sent full PCAPs and debug logs to support@. I tested this on my CCR-2116, and rolling back to 7.6beta8 resolves the issue. Deployments with RouterOS DNS caching and Linux clients should probably avoid 7.6rc1.