Netwatch DNS Probe - Attempt to Replace ICMP

Scenario:
OP has a primary adguard DNS server. OP has a backup PI server.
He wants an approach that is as elegant (aka hands-off) as possible: detect when the adguard DNS server is not working and switch the responsibility for DNS to a backup PI server.
There are multiple private VLANs at play but only one WAN.

Facts:
AdguardServer 10.20.30.30
PiServer 10.10.10.5

Approach: Attempt to apply netwatch DNS probe to accomplish the goal
Obstacle: We use DSTNAT to force users to Adguard or if applicable to Pi, and since there is no such thing as distance in dstnat rules, how are we going to select the correct Server to force users through??

ex.
dst-port=53 in-interface-list=LAN src-address-list=!excluded to-address=10.20.30.30
dst-port=53 in-interface-list=LAN src-address-list=!excluded to-address=10.10.10.5

Where the excluded list includes devices that should not be forced out adg/pi for DNS.

Attempt

  1. We modify the DSTNAT rules to include address-lists that will be used for "activating" the rules later as follows:

dst-port=53 in-interface-list=LAN src-address-list=!excluded to-address=10.20.30.30
dst-address-list=Ad-UP
dst-port=53 in-interface-list=LAN src-address-list=!excluded to-address=10.10.10.5
dst-address-list=Pi-UP

The idea is that system scripts, called by Netwatch (up or down) based on the DNS probe results, add firewall address-list entries to the EMPTY address-lists shown above. With no entries in these dynamic lists, neither dstnat rule matches by default, so no traffic is captured by the router: the conditions for the rules are not met.

Note: Address-lists that are only named inside rules like this will not show up in the firewall address-list table, which shows the current lists. However, when you create a new entry and pull down the list drop-down, both of these lists will be available as an option. It works the same whether you make entries via winbox or CLI.
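For reference, the abbreviated rules above might look like this when written out in full. This is only a sketch: the chain, action, the split into one udp and one tcp rule per server, and the comments are assumptions, not the OP's exported config. NAT rules are evaluated top-down, so if both lists ever held an entry at the same time, the AdGuard rules would match first.

```routeros
/ip firewall nat
# Match only while the Ad-UP list holds an entry covering the destination (e.g. 0.0.0.0/0)
add chain=dstnat action=dst-nat protocol=udp dst-port=53 in-interface-list=LAN \
    src-address-list=!excluded dst-address-list=Ad-UP to-addresses=10.20.30.30 \
    comment="force DNS to AdGuard (udp)"
add chain=dstnat action=dst-nat protocol=tcp dst-port=53 in-interface-list=LAN \
    src-address-list=!excluded dst-address-list=Ad-UP to-addresses=10.20.30.30 \
    comment="force DNS to AdGuard (tcp)"
# Match only while the Pi-UP list holds an entry covering the destination
add chain=dstnat action=dst-nat protocol=udp dst-port=53 in-interface-list=LAN \
    src-address-list=!excluded dst-address-list=Pi-UP to-addresses=10.10.10.5 \
    comment="force DNS to Pi (udp)"
add chain=dstnat action=dst-nat protocol=tcp dst-port=53 in-interface-list=LAN \
    src-address-list=!excluded dst-address-list=Pi-UP to-addresses=10.10.10.5 \
    comment="force DNS to Pi (tcp)"
```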

  2. Next we set up the DNS probe methodology. Instead of relying on an external canary (like ICMP), the idea is to use a local "always available" interface, similar to the workaround we use for wireguard:

a. We use the lo interface as an 'always available' address: a DNS name pointing to it will always get a positive response as long as the Adguard DNS server is working. Thus we create a static DNS entry.

/ip dns static
add address=127.0.0.1 name=checkadguard.net ttl=0.1 type=A

b. we use Netwatch DNS probe to check the host domain address that points to the lo-interface.

/tool netwatch
add dns-server=10.20.30.30 down-script="/system script run Switch2Pi" \
   host=checkadguard.net name=Verify-UP record-type=A type=dns \
   interval=4s timeout=1s up-script="/system script run AllGood"

Any positive resolved response is what we want, and it tells us that the Adguard device is functional. If the router cannot resolve the domain name 'checkadguard.net' (which points to the lo interface), then Adguard can be considered 'down'.

c. Finally, the scripts come into play. If the adguard server is operational (the DNS probe gets a response), the script adds 0.0.0.0/0 to the firewall address list Ad-UP used by the destination nat rule that forces users to the Adguard server; with its conditions met, the rule becomes active. The system script that adds 0.0.0.0/0 to the address list gives the entry a timeout of 5 seconds. To keep the indication of adguard availability continuously valid, we reduced the Netwatch interval from the default 10s to 4s.

The opposite occurs on the down side of the Netwatch rule. If there is no response from the Adguard server to the DNS probe, then the other script is called and executed, which adds 0.0.0.0/0 to the Pi-UP firewall address list and times out after 5 seconds (also covered by the DNS probe occurring every 4 seconds).

Thus every four seconds a firewall address list entry is made for one of the two pairs of destination nat rules (udp/tcp), ensuring that the associated rules are active and being used. If Adguard goes down, there is a one second gap in availability, which seems acceptable, and the same gap should occur when Adguard comes back online. We are not expecting the server to flap, only an infrequent equipment failure of some sort.

/system script
add dont-require-permissions=yes name=AllGood owner=OP \
   policy=read,write source="/ip firewall address-list add list=Ad-UP \
   address=0.0.0.0/0 timeout=5s"

And the other

add dont-require-permissions=yes name=Switch2Pi owner=OP \
   policy=read,write source="/ip firewall address-list add list=Pi-UP \
   address=0.0.0.0/0 timeout=5s"
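A caveat worth checking: the Netwatch documentation describes up-script and down-script as running when the probe changes state, not on every probe. If that holds, a 5s entry would not be refreshed every 4s and the active rule would lapse. A minimal alternative sketch under that assumption uses permanent entries and has each script clear both lists before adding its own entry (clearing first also avoids a duplicate-entry error after a reboot). These script bodies are hypothetical rewrites of AllGood and Switch2Pi and have not been tested on the OP's router.

```routeros
/system script
# Runs on the down -> up transition: clear both lists, then enable the AdGuard rules
add name=AllGood policy=read,write source="/ip firewall address-list remove [find list=Pi-UP]; \
    /ip firewall address-list remove [find list=Ad-UP]; \
    /ip firewall address-list add list=Ad-UP address=0.0.0.0/0"
# Runs on the up -> down transition: clear both lists, then enable the Pi rules
add name=Switch2Pi policy=read,write source="/ip firewall address-list remove [find list=Ad-UP]; \
    /ip firewall address-list remove [find list=Pi-UP]; \
    /ip firewall address-list add list=Pi-UP address=0.0.0.0/0"
```

With permanent entries the Netwatch interval no longer has to be shorter than a timeout, so it only determines how quickly a failure is noticed.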

+++++++++++++++++++

Good idea, bad idea?
Where have I got it wrong?
Where can it be improved?

Recent EDITS:
A. We determined that the TTL of the IP DNS static entry should be zero, because we want a fresh look each time netwatch runs and didn't want the router to cache results. However, the netwatch status stayed hard down with this setting, so we switched it to 0.1. That works.

B. Added interval=4s for Netwatch and a timeout of 1s

C. The system script didn't work at all, so the next thing to try is removing all permissions other than read/write.

I'll check more when I have time.
But isn't it easier to provide both DNS servers in the DHCP server, so if one fails, the other can be used?

When its convenient for you of course.
Yes, I had not thought of two DNS servers on dhcp server network settings, BUT, the stipulation is to ONLY use adguard DNS server ( full time ) and on the very rare occasion of failure, switch to the PI server. To put it another way its not load balancing the DNS that is required :wink:

I want to ensure this approach is valid, and if so, to get it working. Then explore any other more elegant solutions.

But wouldn't it be simpler, in the meantime,
to have the DNS resolved by the RouterBOARD with fixed firewall/nat rules that don't need to be changed,
but simply change the DNS in ip/DNS when adguard doesn't work?

And do you also block DoH / DoT / DoQ?
NATting only standard DNS does not solve anything.
By default, most software uses hardcoded DoH or DoQ that you cannot divert, only block...

Of course, if you have control of the device you need to block, everything is simpler, by changing the program settings...
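As a rough sketch of what blocking could look like: DoT (RFC 7858) uses tcp port 853 and DoQ (RFC 9250) uses udp port 853, so both can be dropped by port. DoH runs over ordinary HTTPS on port 443 and cannot be told apart by port alone; it would need something like an address list of known DoH resolvers, which is not shown here. The in-interface-list=LAN match and the comments are assumptions carried over from the rest of the thread.

```routeros
/ip firewall filter
# DNS over TLS: tcp/853
add chain=forward action=drop protocol=tcp dst-port=853 in-interface-list=LAN \
    comment="block DoT"
# DNS over QUIC: udp/853
add chain=forward action=drop protocol=udp dst-port=853 in-interface-list=LAN \
    comment="block DoQ"
```

These rules would have to sit above any rule that accepts new LAN to WAN connections.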

Okay, in your 'simpler proposal'

how do I simply detect when adguard doesnt work
how then do I simply switch to a second pi server

You have captured the essence of the thread for sure.
No need at this time to block DoH, DoT, DoQ.
Similarly, the dstnat rules could be considered overkill, so if you want to remove that from the equation in your simple solution, by all means.
I just dont see how it will be accomplished.

Simply use Netwatch:

/tool netwatch
add disabled=no \
    name=test_adguard \
    interval=10s \
    type=dns \
    dns-server=10.20.30.30 \
    record-type=A \
    host=www.example.com \
    timeout=1s \
    up-script="/ip dns set servers=10.20.30.30" \
    down-script="/ip dns set servers=10.10.10.5"

This tests AdGuard [10.20.30.30] every 10s, checking whether www.example.com resolves.
If not, PiServer [10.10.10.5] is used instead.
When AdGuard resolves www.example.com again, AdGuard is used again.
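A few commands that may help confirm the probe is doing what is described, run from the router terminal. They only read state; the hostname and IP are the ones from the rule above.

```routeros
# Ask AdGuard directly, regardless of which server /ip dns currently points at
:put [:resolve www.example.com server=10.20.30.30]
# Show the probe's current status and when it last changed
/tool netwatch print detail where name=test_adguard
# Show which upstream server the router is using right now
/ip dns print
```

Stopping the AdGuard service for a moment and re-running the last two commands should show the status go down and servers change to 10.10.10.5.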

Sounds easy but what does one do with the rest of the IP DNS setup.
a. dhcp network-servers for all the vlans? dns=10.20.30.30,10.10.10.5 ?????
b. are we allowing remote requests
c. what about the initial resolve both the pi and adguard server need to reach their encrypted links etc.
aka still need a "normal dns" (like 1.1.1.1)

Devils in the details !!

a. you can't fail-over the DHCP-server settings as changes are ignored by clients until they refresh their leases. Which can be days ...
b. yes, in this case we should
c. pi and adguard don't use router as their resolver so they should not be affected by the script by @rextended

a. No, I meant to have them both listed in the dhcp-server network settings so they are both available to the user. I'm assuming that the IP DNS setting you change with set servers= (depending upon netwatch results) will force which of the two gets used ( ip dns set servers=x.y.z ??? )

b. okay

c. Not my understanding, and I am probably incorrect, but I thought they need a first resolve from an unencrypted dns address.

For example with DOH: Only one DoH server is supported.
Note that you need at least one regular DNS server configured for the router to resolve the DoH hostname itself.

Yes, in AdGuard Home you stipulate the dns server adguard should use, like 1.1.1.1
How do we ensure that the adguard server can reach 1.1.1.1?

By IP address:
a. not looped back to itself
b. not forced into dstnat rule that forces users to the adguard server (excluded) if we have such a rule.
c.??????

In other words, I have no clue how the adguard server will reach any specific dns server,
IS IT SUFFICIENT THAT adguard server SIMPLY HAS INTERNET ACCESS ( lan to wan )???

Yes, obviously, but at the same time obviously not reachable from WAN, as with the DEFAULT firewall rules...

AdGuard and PiServer do not use DHCP since they have fixed IPs and fixed DNS servers configured inside, like 1.1.1.1 and 8.8.8.8

no, the DNS on the DHCP Network is the IP of the RouterBOARD...


Assuming the RouterBOARD IP is 192.168.88.1

/ip firewall address-list
add address=192.168.88.1 list=skip_DNS_redirect
add address=10.20.30.30 list=skip_DNS_redirect
add address=10.10.10.5 list=skip_DNS_redirect
add address=192.168.88.1 list=allowed_direct_DNS
add address=10.20.30.30 list=allowed_direct_DNS
add address=10.10.10.5 list=allowed_direct_DNS

/ip firewall nat
add chain=dstnat protocol=tcp dst-port=53 action=dst-nat to-addresses=192.168.88.1 in-interface-list=LAN \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS
add chain=dstnat protocol=udp dst-port=53 action=dst-nat to-addresses=192.168.88.1 in-interface-list=LAN \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS

skip_DNS_redirect = for these source IPs the redirect does not happen

allowed_direct_DNS = any device can use these IPs directly as DNS, without redirection

Typically we have used an excluded list to identify the Adguard IP and any other IPs that should not be redirected. You have added allowed_direct_DNS; I am assuming these would be other private LAN IPs that should not be redirected either??? How will they access DNS?

When you say 192.168.88.1 is the routerboard IP................. why is this there, or why important? Dont we want the TO ADDRESS to be the ip address of the adguard server???????????

So in the dhcp server network settings we don't want any DNS servers from the router identified??

/ip dhcp-server network
add address=192.168.0.0/24 comment=vlan10 gateway=192.168.0.1
add address=192.168.1.0/24 comment=vlan11 gateway=192.168.1.1
add address=192.168.2.0/24 comment=vlan20 gateway=192.168.2.1
add address=192.168.3.0/24 comment=vlan30 gateway=192.168.3.1

If thats the case, then why do we want to allow remote requests??

+++++++++++++++++++++++
Further, you cannot be trivial with DNS. In fact, according to the docs, since we are NOT using dynamic servers but setting fixed servers, we should do this:

/ip dhcp-server network
add address=192.168.0.0/24 comment=vlan10 gateway=192.168.0.1 dns-none=yes
add address=192.168.1.0/24 comment=vlan11 gateway=192.168.1.1 dns-none=yes
add address=192.168.2.0/24 comment=vlan20 gateway=192.168.2.1 dns-none=yes
add address=192.168.3.0/24 comment=vlan30 gateway=192.168.3.1 dns-none=yes

So again I ask how will the adguard server get its own dns resolved, as I see yes, they have a spot to identify dns addresses in the program. .........I am thinking that is okay because we allow LAN to WAN traffic ??

Now how do any users, on the allowed direct DNS list, actually access any DNS???

They do NOT access any DNS other than the RouterBOARD DNS service. (DoH, DoT, DoQ are not affected.)
Any standard DNS request that is NOT directed to an address in allowed_direct_DNS is redirected to the RouterBOARD (except for devices with an IP in skip_DNS_redirect).


Because THE ROUTERBOARD IS the DNS server for all, except the exclusion list.

But wouldn't it be simpler, in the meantime,
to have the DNS resolved by the RouterBOARD with fixed firewall/nat rules that don't need to be changed,
but simply change the DNS in ip/DNS when adguard doesn't work?


In the same way as in a default config that uses the RouterBOARD as DNS proxy: use customized DNS inside the device config...

It is also possible to customize DNS (also NTP and other options) for each single MAC, as long as the device respects the DNS servers provided by the DHCP server:

/ip dhcp-server lease
add address=192.168.0.7 lease-time=1h mac-address=00:0C:29:DE:AD:BE
add address=192.168.0.8 lease-time=1h mac-address=00:0C:29:DE:AD:EF
/ip dhcp-server network
add address=192.168.0.7/32 dns-server=9.9.9.9,8.8.4.4 gateway=192.168.0.1 netmask=24
add address=192.168.0.8/32 dns-server=8.8.8.8,1.0.0.1 gateway=192.168.0.1 netmask=24

Okay, I will stop fighting the current LOL. But I still dont understand where the heck did 192.168.88.1 come from? Are you saying I create a faux subnet, a faux address, why not use 127.0.0.1 then???

So what you are saying is Normal default setup, ensuring the following

/ip dhcp-server network
add address=192.168.0.0/24 comment=vlan10 gateway=192.168.0.1 dns-server=192.168.0.1
add address=192.168.1.0/24 comment=vlan11 gateway=192.168.1.1 dns-server=192.168.1.1
add address=192.168.2.0/24 comment=vlan20 gateway=192.168.2.1 dns-server=192.168.2.1
add address=192.168.3.0/24 comment=vlan30 gateway=192.168.3.1 dns-server=192.168.3.1

/ip dns
set allow-remote-requests=yes servers=10.20.30.30,10.10.10.5

/ip firewall address-list
add address=127.0.0.1  list=skip_DNS_redirect
add address=10.20.30.30 list=skip_DNS_redirect
add address=10.10.10.5 list=skip_DNS_redirect
add address=127.0.0.1 list=allowed_direct_DNS
add address=10.20.30.30 list=allowed_direct_DNS
add address=10.10.10.5 list=allowed_direct_DNS

/ip firewall nat
add chain=dstnat protocol=tcp dst-port=53 action=dst-nat to-addresses=127.0.0.1  \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS
add chain=dstnat protocol=udp dst-port=53 action=dst-nat to-addresses=127.0.0.1 \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS

makes no sense to me at all......
in fact more confused than before.
No idea how the above will work at all with your netwatch rule.

Getting nowhere fast. I am wasting your time as I have no clue how dns works.

Since 192.168.88.1 is only the default IP for a router, you must use the IP actually used on your router, one reachable from the other devices...
why 127.0.0.1???...

/ip firewall address-list
add address=192.168.0.1 list=skip_DNS_redirect
add address=192.168.1.1 list=skip_DNS_redirect
add address=192.168.2.1 list=skip_DNS_redirect
add address=192.168.3.1 list=skip_DNS_redirect
add address=10.20.30.30 list=skip_DNS_redirect
add address=10.10.10.5 list=skip_DNS_redirect
add address=192.168.0.1 list=allowed_direct_DNS
add address=192.168.1.1 list=allowed_direct_DNS
add address=192.168.2.1 list=allowed_direct_DNS
add address=192.168.3.1 list=allowed_direct_DNS
add address=10.20.30.30 list=allowed_direct_DNS
add address=10.10.10.5 list=allowed_direct_DNS

/ip firewall nat
add chain=dstnat protocol=tcp dst-port=53 action=dst-nat to-addresses=192.168.0.1 in-interface-list=LAN \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS
add chain=dstnat protocol=udp dst-port=53 action=dst-nat to-addresses=192.168.0.1 in-interface-list=LAN \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS

You annoy anyone who asks questions without specifying the details (like IPs, VLAN, and the schema...),
and you're the first to ask questions without specifying the parameters!!! :rofl: :rofl: :rofl:

Simply put, the netwatch rule changes EXCLUSIVELY the DNS server IP used by the RouterBOARD,
because the RouterBOARD acts as a "DNS proxy" and there is no need to change DNS everywhere when AdGuard does not work...
and the firewall rules force all devices (that do not already use the RouterBOARD IPs or the AdGuard IP) to use the RouterBOARD IPs...

Hahahaha,
Whats annoying is that you forgot to specify in-interface-list=LAN on your dstnat rules ( inviting the whole world to my dns server ) :wink:

/ip firewall nat
add chain=dstnat protocol=tcp dst-port=53 action=dst-nat to-addresses=192.168.0.1  \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS \
    in-interface-list=LAN
add chain=dstnat protocol=udp dst-port=53 action=dst-nat to-addresses=192.168.0.1 \
    src-address-list=!skip_DNS_redirect dst-address-list=!allowed_direct_DNS \
    in-interface-list=LAN

So back to life, back to reality......... all users except those on the address lists, that are:
a. any user on local LAN ( aka by source address list=! )
b. any dst address ( but not the ones on the list by dst-address-list=! )

When directed to port 53, shall go to 192.168.0.1 a routerboard address!
Got that...

Now since we have

/ip dns
set allow-remote-requests=yes

AND netwatch which is providing a static Server for the router.

/tool netwatch
add disabled=no \
    name=test_adguard \
    interval=10s \
    type=dns \
    dns-server=10.20.30.30 \
    record-type=A \
    host="adguard.example.com" \
    timeout=1s \
    up-script="/ip dns set servers=10.20.30.30" \
    down-script="/ip dns set servers=10.10.10.5"

Means that every user ( other than those on lists ) will be directed to the routerBoard and will use one of the two STATIC provided servers, depending upon UP or DOWN status ( static DNS has priority over dynamic for example ).
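Pulled together, the router-side pieces of this approach might look like the sketch below. It assumes the IPs from the Facts at the top of the thread and the netwatch rule as posted. The cache flush in each script is an added assumption, not part of the original rule: without it, answers cached from one server could keep being served for their TTL after the switch to the other.

```routeros
/ip dns
# Router answers LAN clients and starts out forwarding to AdGuard
set allow-remote-requests=yes servers=10.20.30.30

/tool netwatch
# Probe AdGuard directly; swap the router's upstream server on each state change
add name=test_adguard type=dns dns-server=10.20.30.30 record-type=A \
    host=www.example.com interval=10s timeout=1s \
    up-script="/ip dns set servers=10.20.30.30; /ip dns cache flush" \
    down-script="/ip dns set servers=10.10.10.5; /ip dns cache flush"
```

Clients never see the change: they keep asking the router address handed out by DHCP, and only the router's upstream server changes.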

So in effect, the netwatch provides us with continuous DNS service, checking every 10 seconds and resulting in:
If UP status

/ip dns
set allow-remote-requests=yes servers=10.20.30.30

and if down

/ip dns
set allow-remote-requests=yes servers=10.10.10.5

PS. Much grief could have been saved if, instead of using 192.168.88.1, you had stated "any subnet gateway IP" as the address to force all DNS traffic to: an address on a local subnet that already exists. That way we use the routerboard DNS functionality and control which static server is available to the routerboard.

Does this mean we also then allow DNS like normal on the Input chain to all of LAN??
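On that last question, a hedged sketch: with allow-remote-requests=yes the router itself answers on port 53, so the input chain has to accept DNS from LAN, and only from LAN. The default firewall already allows this, since on input it drops only traffic that does not come from LAN. A stricter ruleset that ends input with a plain drop, which is assumed here, would need explicit accepts above that drop.

```routeros
/ip firewall filter
# Let LAN clients query the router's DNS service; WAN stays blocked by the later drop rule
add chain=input action=accept protocol=udp dst-port=53 in-interface-list=LAN \
    comment="allow LAN DNS to router (udp)"
add chain=input action=accept protocol=tcp dst-port=53 in-interface-list=LAN \
    comment="allow LAN DNS to router (tcp)"
```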