DNSSEC Problems (large UDP packets)

Hi!

I have a problem with my DNS solution and hope to find help here.

Currently I’m trying to run a pi-hole with unbound as a DNS resolver. However, unbound won’t resolve any names.
I’m pretty sure it’s the large DNSSEC UDP packets that somehow get dropped (?) by the MikroTik, because the exact same setup works flawlessly with a FritzBox router.
Choosing any other upstream DNS server without DNSSEC also works without a problem.

If unbound is already running and I swap the FritzBox for the MikroTik, it can still resolve normal DNS requests (which I guess are quite small). However, if I then restart unbound, it won’t resolve anything anymore. I guess that’s because at startup it contacts the root servers, which send these huge replies…

This is really frustrating. I’ve done quite a bit of research and searched the web and the forum, but I haven’t found anything that helps.
Clearly I can’t be the only one with this setup…
Does anybody have some hints at what to do?

If you want to be completely sure, the packet sniffer is your friend. But if these are just packets passing through the router, they are like any other UDP traffic; RouterOS doesn’t have anything against them.

Set Unbound to use a packet size of 1232; if a bigger response is returned, it switches over to TCP/53, so you also need to accept that in your firewall.
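Once the switch happens, the retry goes to TCP/53, where every DNS message is preceded by a two-byte length field (RFC 1035, section 4.2.2). A minimal Python sketch of that framing, for anyone who wants to check by hand whether TCP/53 gets through the router; the helper names are hypothetical, and the query passed in must already be an encoded DNS message:

```python
import socket
import struct


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length, as DNS over TCP requires."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket, failing if the peer closes early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed in the middle of a DNS message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def query_over_tcp(server: str, message: bytes, timeout: float = 5.0) -> bytes:
    """Send one already-encoded DNS query to server:53 over TCP and return the raw reply."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(message))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

The length prefix is needed because TCP is a byte stream with no message boundaries, unlike UDP where each datagram is one DNS message.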

I’ve recently seen a case where huge packets came in via WAN, got reassembled, and were forwarded to a LAN destination without being fragmented again, because the mtu parameter of the bridge had been left at auto and all the member ports had an MTU larger than the common value of 1500. So here too, if the recipient (running the pi-hole) has a smaller MTU than the one set on the bridge and its member ports, packets exceeding the recipient’s MTU get lost. Sniffing will show the sizes of the received and sent “versions” of the packets.
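To put numbers on this: an IPv4 router that has to send a datagram out through a smaller MTU is expected to split it into fragments (unless the DF bit is set), and every fragment except the last must carry a multiple of 8 bytes of data. A minimal Python sketch, assuming IPv4 without options and a single UDP datagram; the function name is hypothetical:

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
UDP_HEADER = 8    # bytes


def ipv4_fragment_sizes(udp_payload: int, mtu: int) -> List[int]:
    """Return the IP data size carried by each fragment of one UDP datagram.

    Every fragment except the last must carry a multiple of 8 bytes, because the
    fragment offset field counts in 8-byte units.
    """
    ip_data = udp_payload + UDP_HEADER
    if ip_data + IPV4_HEADER <= mtu:
        return [ip_data]  # fits in one packet, no fragmentation needed
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while ip_data > 0:
        chunk = min(per_fragment, ip_data)
        sizes.append(chunk)
        ip_data -= chunk
    return sizes


# A 4096-byte DNS response leaving through a 1492-byte PPPoE link:
print(ipv4_fragment_sizes(4096, 1492))  # → [1472, 1472, 1160]
```

Each of those fragments fits the link (1472 + 20 = 1492 bytes). The failure described above is the opposite case: the fragments get reassembled and forwarded as one 4124-byte packet, which a host with a 1500-byte MTU cannot receive.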

@msatter, are you saying that when the query response is too big, the DNS server initiates a TCP session to the sender of the query?

..are you saying… brrrr, Channel 4 News vibes over here too?
https://m.youtube.com/watch?v=aMcjxSThD54

No, the querier states in its request what size of response will fit.

Unbound:

edns-buffer-size:

Number of bytes size to advertise as the EDNS reassembly buffer size. This is the value put into datagrams over UDP towards peers. The actual buffer size is determined by msg-buffer-size (both for TCP and UDP). Do not set higher than that value. Default is 1232 which is the DNS Flag Day 2020 recommendation. Setting to 512 bypasses even the most stringent path MTU problems, but is seen as extreme, since the amount of TCP fallback generated is excessive (probably also for this resolver, consider tuning the outgoing tcp number).

Mikrotik: max-udp-packet-size: 4096. The wiki says: “Maximum size of allowed UDP packet.” The docs are silent on this setting.
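For reference, the advertised size travels in the EDNS(0) OPT pseudo-record (RFC 6891) that the querier appends to its query: the CLASS field carries the UDP payload size, and the DO bit inside the TTL field asks for DNSSEC records. A minimal Python sketch of such a query, assuming plain ASCII names; the function and its defaults are illustrative, not unbound’s code:

```python
import struct


def build_query(name: str, qtype: int = 1, edns_size: int = 1232,
                dnssec_ok: bool = True, query_id: int = 0x1234) -> bytes:
    """Encode a DNS query carrying one EDNS(0) OPT pseudo-record (RFC 6891)."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # QNAME as length-prefixed labels ending in the zero-length root label
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT record: root owner name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL=extended RCODE (8 bits), version (8 bits), DO flag + zero bits (16 bits), RDLENGTH=0
    ttl = 0x8000 if dnssec_ok else 0  # DO ("DNSSEC OK") is the top bit of the flags part
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, ttl, 0)
    return header + question + opt
```

`build_query(".", qtype=2)` produces a root NS query similar to the priming query a validating resolver sends at startup, so sending it with `socket.sendto` to a root server is one way to see whether large answers make it back through the router.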

Bear in mind that my native language differs from English a bit more than yours does, so what exactly is wrong with “are you saying” as compared to “do you have in mind that” or some other phrase equivalent to “please clarify:”?

English is not my native language either, but it serves well as a kind of Esperanto and is a very simple language that can be used around the world.

When you watch the video, you will see why it comes across as a negative vibe to many. It implies that what I am saying does not match what you are thinking or what is generally acceptable. It pushes one onto the defensive instead of keeping both sides equal.

The automatic response to those words tends to be… NO, because one has just said it; if the other person disagrees, they should give their own arguments. Then the conversation is equal again.

The NO in my answer was also a direct reply to your question, accompanied by an extract from the manual on that subject.

I am the last person to claim that I know everything, and I am eager to learn more.

That interview certainly makes a lasting impression. :laughing: But it’s under completely different circumstances. Here we have a nice, friendly technical forum, and there’s no reason whatsoever to assume that someone is here to get you. Except maybe @anav, one needs to be careful around him, but even that’s just fun. :slight_smile:

Yeah, Anav: as long as he is on the other side of the ocean, I feel safe. :wink:

I have no hard feelings towards Sindy… or, for that matter, Anav. I hope that is still mutual.

To me, there is a difference between “Are you saying?” and “So what you are saying is:”, but maybe that difference is not big enough to change the perception. Mental note made.

But we’ve gotten too far from the technical topic. The mechanism triggering the retry of the query over TCP does not depend on EDNS. EDNS is just a way to tell the server, by indicating a higher limit, that the querier is ready to accept responses larger than the 512-byte maximum specified by RFC 1035. The retry over TCP is always triggered by receiving a response with the TC (truncated) bit set.
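The UDP-then-TCP rule fits in a few lines. A minimal Python sketch; `send_udp` and `send_tcp` are hypothetical callables standing in for whatever transport the resolver uses, so this illustrates the rule rather than unbound’s actual implementation. Note that it never looks at EDNS, only at the TC bit:

```python
import struct
from typing import Callable

TC_FLAG = 0x0200  # TC bit in the 16-bit flags word of the DNS header (RFC 1035, 4.1.1)


def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message has the TC (truncated) bit set."""
    if len(message) < 12:
        raise ValueError("too short to be a DNS message")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)


def resolve(query: bytes,
            send_udp: Callable[[bytes], bytes],
            send_tcp: Callable[[bytes], bytes]) -> bytes:
    """Ask over UDP first and repeat the same query over TCP only if the reply is truncated."""
    reply = send_udp(query)
    if is_truncated(reply):
        return send_tcp(query)
    return reply
```

If the UDP reply never arrives at all (as turned out to be the case later in this thread), there is no TC bit to see, so no TCP retry happens either.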

Also, as @Sob already wrote, the UDP query being forwarded from unbound to a remote server is just another UDP packet to the Mikrotik. So if the whole chain works with the Fritzbox in place of the Mikrotik, there must be a difference in how the two handle large UDP packets. That’s why I concentrate on finding and fixing that difference rather than suggesting a workaround specific to DNS traffic: it may affect other traffic too, as was the case where I’ve seen it before.

Thanks a lot for your replies!

Regarding the EDNS buffer size, this is already the default in unbound. I even set it as low as 512, and it’s still not working…
So would that mean there is actually a TCP problem here? Do I have to add a firewall rule?

I haven’t touched the MTU sizes on any of my devices or on the MikroTik, so they’re all at the default of 1500.

Sorry, I’m anything but a network expert; I actually bought the MT to learn a bit about it :wink:

One of the great features of Mikrotik when it comes to learning about networking (but of course also for analysis of problems encountered) is that it can sniff the network traffic. So open a command line window (by pressing the [New Terminal] button in Winbox), make it as wide as your screen allows, run /tool sniffer quick ip-address=ip.of.dns.server port=53 in it, and make unbound send the problematic query to that server.

You will immediately see on the screen the UDP query packet “bubbling through the router”, and you should also see the UDP response packet bubbling back. If you see a subsequent TCP packet from unbound to the server, the retry over TCP has been triggered properly. If you can only see it at the interface facing unbound but not at your WAN interface, your firewall rules have prevented it from being delivered to the server.

If there is no TCP retry, you’ll have to run /tool sniffer set file-name=unbound.pcap, repeat the procedure above, download the file unbound.pcap, open it in Wireshark, and check the value of the TC (truncated) flag in the DNS response.
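If Wireshark isn’t at hand, the same TC check can be scripted. A minimal Python sketch, assuming the file is classic pcap (not pcapng) with Ethernet framing, IPv4, and DNS over UDP; a capture taken on the PPPoE interface may use a different link type, in which case the function raises an error instead of guessing. The function name is hypothetical:

```python
import struct
from typing import Iterator, Tuple

DNS_PORT = 53
QR_FLAG = 0x8000  # set in responses
TC_FLAG = 0x0200  # set when the response was truncated


def dns_tc_flags(pcap: bytes) -> Iterator[Tuple[int, bool]]:
    """Yield (DNS ID, TC flag) for each DNS response found in a classic pcap file.

    Assumes libpcap format (not pcapng), Ethernet link type, IPv4 and UDP from port 53.
    """
    magic = pcap[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    (linktype,) = struct.unpack(endian + "I", pcap[20:24])
    if linktype != 1:
        raise ValueError(f"expected Ethernet link type 1, got {linktype}")

    offset = 24  # size of the pcap global header
    while offset + 16 <= len(pcap):
        (incl_len,) = struct.unpack(endian + "I", pcap[offset + 8:offset + 12])
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len  # skip the record header and the captured bytes

        if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # IPv4 EtherType only
            continue
        ip = frame[14:]
        if ip[9] != 17:  # protocol 17 = UDP
            continue
        (frag,) = struct.unpack("!H", ip[6:8])
        if frag & 0x1FFF:  # later fragments carry no UDP header
            continue
        udp = ip[(ip[0] & 0x0F) * 4:]  # skip the IPv4 header (IHL counts 32-bit words)
        if len(udp) < 8 + 12:  # UDP header plus DNS header
            continue
        (src_port,) = struct.unpack("!H", udp[0:2])
        dns_id, flags = struct.unpack("!HH", udp[8:12])
        if src_port == DNS_PORT and flags & QR_FLAG:
            yield dns_id, bool(flags & TC_FLAG)
```

Usage would be along the lines of reading unbound.pcap in binary mode and passing its bytes to `dns_tc_flags`; an empty result means no DNS responses were captured at all.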

So I tried the packet sniffer as you advised, filtering for the unbound IP and port 53, and also on the pppoe interface, filtering for port 53.
In both cases I do see the UDP requests from unbound, going to the DNS root servers, trying all of them, but there never is any answer back… neither UDP nor TCP.

So is there actually a firewall rule preventing the answers? I only have the standard rules there…
I’m confused…

Can you see any responses if you send other queries, for which the responses are small enough?

If not, are your NAT rules correct, i.e. do the queries leave via the PPPoE interface with the IP address attached to that interface as their source one?

As usual, better post the export of the configuration as per the hint in my automatic signature below.

Yes, if I set the DNS server in pi-hole to e.g. 1.1.1.1, there are UDP requests and responses, and that works without a problem.


# jan/16/2022 12:25:33 by RouterOS 6.49.2
# software id = QFBI-B40J
#
# model = RB760iGS
# serial number = xxx
/interface bridge
add mtu=1500 name=LAN
/interface pppoe-client
add add-default-route=yes disabled=no interface=ether1 max-mru=1492 max-mtu=\
    1492 name=pppoe-out1 user=xxxx
/interface list
add name=listLAN
add name=WAN
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/interface bridge port
add bridge=LAN interface=ether2
add bridge=LAN interface=ether3
add bridge=LAN interface=ether4
add bridge=LAN interface=ether5
add bridge=LAN interface=sfp1
/ip neighbor discovery-settings
set discover-interface-list=listLAN
/interface list member
add interface=LAN list=listLAN
add interface=LAN list=WAN
/ip address
add address=203.0.113.1/24 interface=LAN network=203.0.113.0
/ip dhcp-client
add interface=ether1
/ip dns
set servers=203.0.113.40
/ip firewall filter
add action=accept chain=input comment="accept established,related" \
    connection-state=established,related
add action=drop chain=input comment="drop invalid" connection-state=invalid
add action=accept chain=input comment="allow ICMP" in-interface=pppoe-out1 \
    protocol=icmp
add action=drop chain=input comment="block everything else" in-interface=\
    pppoe-out1
add action=fasttrack-connection chain=forward comment=\
    "fast-track for established,related" connection-state=established,related
add action=accept chain=forward comment="accept established,related" \
    connection-state=established,related
add action=drop chain=forward comment="drop invalid" connection-state=invalid
add action=drop chain=forward comment=\
    "drop access to clients behind NAT form WAN" connection-nat-state=!dstnat \
    connection-state=new in-interface=pppoe-out1
/ip firewall nat
add action=masquerade chain=srcnat out-interface=pppoe-out1
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www disabled=yes
set api disabled=yes
set winbox address=203.0.113.0/24
/ip ssh
set strong-crypto=yes
/system clock
set time-zone-name=Europe/Berlin
/tool bandwidth-server
set enabled=no
/tool mac-server
set allowed-interface-list=listLAN
/tool mac-server mac-winbox
set allowed-interface-list=listLAN

And yes, my internal network is 203.0.113.0/24. I once set it that way for highest VPN compatibility :wink:

What is the size of the query packets that get no response? A problem with the PPPoE MTU is the last idea I’ve got for now. There may be something about the contents of the queries, but I cannot see why they would differ depending on whether the query is sent via the Fritzbox or via the Mikrotik. The Mikrotik doesn’t redirect the queries, and they cannot evade the masquerade rule.

So if the responses don’t fit into the PPPoE MTU, it means that all the servers send responses larger than 512 bytes and do not respect the edns-buffer-size value.
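The arithmetic behind that: PPPoE takes 8 bytes of the 1500-byte Ethernet payload (a 6-byte PPPoE header plus a 2-byte PPP protocol field), which is where the 1492 in the export comes from, and the IPv4 and UDP headers take another 28. A minimal Python sketch, assuming IPv4 without options; the names are hypothetical:

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field (RFC 2516)
IPV4_HEADER = 20     # assuming no IP options
UDP_HEADER = 8


def max_unfragmented_dns(link_mtu: int) -> int:
    """Largest DNS message that fits in a single IPv4/UDP packet on this link."""
    return link_mtu - IPV4_HEADER - UDP_HEADER


pppoe_mtu = ETHERNET_MTU - PPPOE_OVERHEAD   # 1492, matching max-mtu in the export
limit = max_unfragmented_dns(pppoe_mtu)      # 1464
for advertised in (512, 1232, 4096):
    verdict = "fits" if advertised <= limit else "may need IP fragmentation"
    print(f"edns-buffer-size {advertised}: {verdict} (limit {limit} bytes)")
```

So even the default of 1232 leaves room to spare on this link; a server that honours the advertised size should not need fragments here.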

And the difference between the Fritzbox and the Mikrotik may be in how they handle PPPoE. I’m not an expert here; search the forum for posts by people with more experience in this (the MTU of a PPPoE interface being lower than the 1500 commonly used on Ethernet interfaces, and ways to work around that).

Thanks a lot for your help so far.
I already tried different MTU/MRU/MRRU sizes for the pppoe interface without success.
I think I’m going to try it without pppoe and use a double NAT with my Fritzbox being another router in the network.
Right now the Fritzbox works only as a modem, which requires a somewhat hacky workaround, so maybe that’s the issue…

What I’d definitely try is sniffing on the unbound host itself while the Mikrotik is out of the path, using tcpdump or Wireshark directly, and then using Wireshark to see what the queries and responses look like, i.e. whether the edns-buffer-size value is indeed present in the queries and whether the responses nevertheless arrive complete (and therefore fragmented).

Ok, so I just connected the MT directly to the Fritzbox without PPPoE and everything is working. There seems to be a problem with the PPPoE connection.
So it’s nothing to do with the MT itself. I guess I have to look into this or get another router.
Anyway, thanks for the assistance!