Mikrotik RouterOS 6.49.6 strange issue when requesting TXT records from integrated DNS proxy

Hi folks,
I have a strange issue with the integrated DNS proxy of a Mikrotik RB4011 running RouterOS 6.49.6.
To reproduce the problem, I first cleared the DNS cache with

/ip dns cache flush

All other router configuration was reset to a bare minimum.
For this test I’m using Cloudflare (1.1.1.1) as the DNS forwarder in the Mikrotik settings.
Then I queried the Mikrotik’s DNS with dig

dig TXT gmx.net

At first the results are OK and include all TXT records of the domain, but after some time (5-10 minutes) the values returned by the Mikrotik are incomplete!
To prove this, I first sent a request to the Mikrotik’s DNS proxy and then one directly to the Cloudflare server. You can see the difference; the entries from the Mikrotik are incomplete:
screenshot.png
Then I cleared the DNS cache again, and after some time the values were incomplete again.

To me it looks like the Mikrotik is using the two entries with the 24-hour TTL (86400 s) as the expiration for the whole cache entry: it forgets the entries with the 5-minute TTL (300 s) after 5 minutes, but then does not refresh the cache when queried again.

Is this a known problem with the DNS proxy? To me this seems to be a serious bug in the cache. Can someone reproduce this issue?

Thanks for any help.

Regards @colinardo

It seems clear what happens: it’s those different TTLs. The router first gets all six records, but four of them expire after five minutes. When the next query comes, the router looks in the cache, still sees two perfectly good records, and returns them. And yes, it’s wrong:

First, such mixed TTLs should not exist. But since it’s technically possible, it’s unavoidable that it will happen, and a DNS cache should handle it by using the lowest TTL for all records. When I test it with the Unbound resolver, it returns mixed TTLs to clients, but it looks like it expires its cache based on the lowest TTL.
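
To illustrate the difference, here is a minimal Python sketch. It is not RouterOS or Unbound code; the class names and the placeholder records are made up for this example, and the upstream answer is assumed to be the six records described above (four with 300 s, two with 86400 s). One cache expires each record on its own TTL and answers from whatever is left, the other expires the whole set at the lowest TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    data: str
    ttl: int  # seconds, as received from upstream


# Hypothetical stand-in for the upstream answer: four short-lived
# records and two long-lived ones (placeholder data, not real TXT values).
UPSTREAM = [Record(f"short-{i}", 300) for i in range(4)] + [
    Record(f"long-{i}", 86400) for i in range(2)
]


class PerRecordCache:
    """Expires every record on its own TTL (the suspected faulty behaviour)."""

    def __init__(self):
        self.entries = []  # list of (record, absolute expiry time)

    def lookup(self, now):
        alive = [rec for rec, expiry in self.entries if expiry > now]
        if alive:
            # Anything still cached is returned, even if part of the set is gone.
            return alive
        self.entries = [(rec, now + rec.ttl) for rec in UPSTREAM]
        return list(UPSTREAM)


class RRsetCache:
    """Expires the whole record set at the lowest TTL it contains."""

    def __init__(self):
        self.records = []
        self.expires = 0

    def lookup(self, now):
        if self.records and self.expires > now:
            return self.records
        # Refetch the complete set and remember only one expiry time for it.
        self.records = list(UPSTREAM)
        self.expires = now + min(rec.ttl for rec in self.records)
        return self.records


naive, by_set = PerRecordCache(), RRsetCache()
for t in (0, 299, 301):
    print(t, len(naive.lookup(t)), len(by_set.lookup(t)))
# prints:
# 0 6 6
# 299 6 6
# 301 2 6
```

After 301 seconds the per-record cache answers with only the two long-lived records, which matches the incomplete result seen on the router, while the set-based cache refetches and returns all six.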

If you want it fixed, write to support; they may not see it here.

Thanks for your clarifying post :+1: OK, this absolutely makes sense. I will open a ticket and report back if or when it is fixed.

Regards @colinardo

ISC’s BIND (version 9.11.5) running on Linux handles this situation by setting all TTLs to the same value when it first receives the RRs from upstream (either via recursion or from a forwarder) and returning those “floored” TTLs to the client. In my case I see all TTLs set to 300 for the case reported by the OP. Expiring the cache is then straightforward: the clean-up procedure doesn’t have to check whether an RR is part of an RRset, because each RR can be handled separately.

BTW, Google (8.8.8.8) does the same.
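
The “flooring” described above can be sketched in a few lines of Python. This is only an illustration of the idea, not BIND’s implementation; the function name `floor_ttls` and the placeholder records are assumptions made for the example.

```python
def floor_ttls(rrset):
    """Return a copy of the RRset with every TTL set to the lowest TTL seen.

    `rrset` is a list of (data, ttl) tuples belonging to one name and type.
    """
    if not rrset:
        return []
    lowest = min(ttl for _, ttl in rrset)
    return [(data, lowest) for data, _ in rrset]


# Placeholder records with mixed TTLs, as an upstream server might send them.
received = [("txt-a", 86400), ("txt-b", 300), ("txt-c", 300), ("txt-d", 86400)]
normalized = floor_ttls(received)
print(normalized)
# prints: [('txt-a', 300), ('txt-b', 300), ('txt-c', 300), ('txt-d', 300)]
```

Once every record carries the same TTL, expiring them one by one gives the same result as expiring the set as a whole, so the cache can never end up holding only part of an answer.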

I would also expect the cache to fix it and return the same low TTL to clients. But clearly not everything does. On the other hand, each client must still be prepared to receive mixed TTLs and be able to handle them. So if a cache passes them on and only takes care not to drop any records, that is a sort of neutral approach (it doesn’t make things better, but not worse either), and in a way it’s not too bad. I don’t know if there’s any standard dealing with this.

Most client applications simply use the query results and don’t need to worry about TTLs. If the same DNS data is needed again, they run a new query. Apps that do cache query results should indeed handle TTLs properly.

Positive feedback from support

Hello,

Thank you for your report. 
I managed to reproduce such behavior and we are looking forward to fixing it in the further RouterOS releases.

Best regards,
Artūrs C.