DNS packets going missing

I have the following setup:

[ HOST1 192.168.10.30 ] — 192.168.10.0/24 — [ eth0 - RB750GL - eth2 ] — 192.168.30.0/24 — [ DNS_SERVER 192.168.30.250 ]

I noticed delays when initiating SSH connections from HOST1 by DNS name, so I started troubleshooting. What I’ve found is that DNS packets are consistently going missing, and it is the same packet every time.

Packet capture on HOST1:

1: 00:19:44.728839 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
2: 00:19:44.728865 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
3: 00:19:44.730292 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
4: 00:19:49.733644 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
5: 00:19:49.735060 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
6: 00:19:49.735196 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
7: 00:19:49.736165 IP 192.168.30.250.domain > 192.168.10.30.57708: 11385* 1/1/0 CNAME HOST2-VLAN70. (120)

Packet capture on the RB750GL:

1: 00:19:44.733296 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
2:
3: 00:19:44.733939 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
4: 00:19:49.738091 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
5: 00:19:49.738930 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
6: 00:19:49.739722 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
7: 00:19:49.740008 IP 192.168.30.250.domain > 192.168.10.30.57708: 11385* 1/1/0 CNAME HOST2-VLAN70. (120)

You can see that the first AAAA query (packet 2) never shows up in the capture on the RB750GL. HOST1 only gets its AAAA answer after retrying about 5 seconds later.
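
For longer captures, the side-by-side comparison can be done mechanically. The following is a minimal Python sketch, not part of the original troubleshooting: it assumes both captures are tcpdump one-line text output like the listings above, and the names `packet_keys` and `missing_packets` are made up for illustration. It keys each packet on source, destination, DNS ID and the query/answer text, then subtracts the router's packets from the host's.

```python
import re
from collections import Counter

# Matches the tcpdump one-line format used in the listings above, e.g.
# "... IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)"
LINE_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): (?P<txid>\d+)(?P<flags>[+*]*) "
    r"(?P<rest>.*?) \(\d+\)\s*$"
)

def packet_keys(capture_text):
    """Return one (src, dst, dns_id, summary) tuple per parsable line."""
    keys = []
    for line in capture_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # blank lines and placeholder rows such as "2:" are skipped
        keys.append((m["src"], m["dst"], int(m["txid"]), m["rest"]))
    return keys

def missing_packets(upstream_text, downstream_text):
    """Packets seen upstream more often than downstream (multiset difference)."""
    return Counter(packet_keys(upstream_text)) - Counter(packet_keys(downstream_text))

HOST1_CAPTURE = """
00:19:44.728839 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:44.728865 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
00:19:44.730292 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
00:19:49.733644 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:49.735060 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
00:19:49.735196 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
00:19:49.736165 IP 192.168.30.250.domain > 192.168.10.30.57708: 11385* 1/1/0 CNAME HOST2-VLAN70. (120)
"""

ROUTER_CAPTURE = """
00:19:44.733296 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:44.733939 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
00:19:49.738091 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:49.738930 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
00:19:49.739722 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
00:19:49.740008 IP 192.168.30.250.domain > 192.168.10.30.57708: 11385* 1/1/0 CNAME HOST2-VLAN70. (120)
"""

for key, count in missing_packets(HOST1_CAPTURE, ROUTER_CAPTURE).items():
    print(count, key)
```

Run against the first pair of captures, this reports a single packet: the AAAA query with ID 11385. The clocks on the two machines do not need to agree, because timestamps are not part of the key.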

Now, if I re-address the DNS server, putting it on the same VLAN as the server running the DNS query, to give this network setup:

[ HOST1 192.168.10.30 ] — 192.168.10.0/24 — [ DNS_SERVER 192.168.10.250 ]

And re-test:

Packet capture on HOST1:

1: 00:36:29.535517 IP 192.168.10.30.49950 > 192.168.10.250.domain: 16817+ A? HOST2-VLAN70. (43)
2: 00:36:29.535543 IP 192.168.10.30.49950 > 192.168.10.250.domain: 33342+ AAAA? HOST2-VLAN70. (43)
3: 00:36:29.536564 IP 192.168.10.250.domain > 192.168.10.30.49950: 33342* 1/1/0 CNAME HOST2-VLAN70. (120)
4: 00:36:29.536610 IP 192.168.10.250.domain > 192.168.10.30.49950: 16817* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)

Packet capture on DNS Server:

1: 00:36:29.537829 IP 192.168.10.30.49950 > 192.168.10.250.domain: 16817+ A? HOST2-VLAN70. (43)
2: 00:36:29.537895 IP 192.168.10.30.49950 > 192.168.10.250.domain: 33342+ AAAA? HOST2-VLAN70. (43)
3: 00:36:29.538501 IP 192.168.10.250.domain > 192.168.10.30.49950: 33342* 1/1/0 CNAME HOST2-VLAN70. (120)
4: 00:36:29.538546 IP 192.168.10.250.domain > 192.168.10.30.49950: 16817* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)

In my eyes, this points the finger at the RB750GL losing the packet.
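
Since the A and AAAA queries leave HOST1 from the same source port within microseconds of each other, one way to take ssh and the system resolver out of the picture is to send the same pattern by hand. A rough sketch in Python, assuming plain UDP to port 53; `build_query` and `probe` are hypothetical helpers and the transaction IDs are arbitrary:

```python
import socket
import struct

QTYPE_A = 1
QTYPE_AAAA = 28

def build_query(name, qtype, txid):
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)

def probe(server, name, timeout=2.0):
    """Send A and AAAA back-to-back from one socket.

    Returns the set of query types ("A", "AAAA") that were answered
    before the timeout expired.
    """
    ids = {0x1111: "A", 0x2222: "AAAA"}  # arbitrary IDs, one per query
    answered = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Same socket, so both queries share one source port,
        # like the two queries in the captures.
        sock.sendto(build_query(name, QTYPE_A, 0x1111), (server, 53))
        sock.sendto(build_query(name, QTYPE_AAAA, 0x2222), (server, 53))
        try:
            while len(answered) < len(ids):
                data, _ = sock.recvfrom(4096)
                if len(data) < 2:
                    continue
                (txid,) = struct.unpack("!H", data[:2])
                if txid in ids:
                    answered.add(ids[txid])
        except socket.timeout:
            pass  # whatever is missing from `answered` never came back
    return answered
```

Something like `probe("192.168.30.250", "HOST2-VLAN70")` run from HOST1 would show which of the two queries got an answer within the timeout. If the AAAA one is consistently absent only when the traffic is routed through the RB750GL, that would match the captures.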

My next step is to put a hub between HOST1 and the RB750GL and packet sniff there.

Whilst I’m prepping that, I was wondering if anyone else had any suggestions for tracking this fault down?

For reference, HOST1 is a physical server running CentOS 6 (fully patched), and the DNS server is a VM on VMware ESXi 5, with the same O/S.

Interesting. I see DNS timeouts in my setup, where the RB acts as DNS server/cache, when some of the queues are loaded, despite low CPU usage.

I’ll be following your thread to see if it gives me some pointers to solve my issue.

OK, I’ve now tested with a hub in between HOST1 and the RB750GL:

[ HOST1 192.168.10.30 ]
|
[ HUB ] — 192.168.10.0/24 — [ eth0 - RB750GL - eth2 ] — 192.168.30.0/24 — [ DNS_SERVER 192.168.30.250 ]
|
[ Laptop + Wireshark ]

Same results:

On the laptop:

1: 20:45:12.354998 IP 192.168.10.30.37122 > 192.168.30.250.domain: 25419+ A? HOST2. (43)
2: 20:45:12.355001 IP 192.168.10.30.37122 > 192.168.30.250.domain: 40043+ AAAA? HOST2. (43)
3: 20:45:12.356828 IP 192.168.30.250.domain > 192.168.10.30.37122: 25419* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
4: 20:45:17.359059 IP 192.168.10.30.37122 > 192.168.30.250.domain: 25419+ A? HOST2. (43)
5: 20:45:17.360617 IP 192.168.30.250.domain > 192.168.10.30.37122: 25419* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
6: 20:45:17.360976 IP 192.168.10.30.37122 > 192.168.30.250.domain: 40043+ AAAA? HOST2. (43)
7: 20:45:17.362011 IP 192.168.30.250.domain > 192.168.10.30.37122: 40043* 1/1/0 CNAME HOST2-VLAN70. (120)
8: 20:45:17.492407 IP 192.168.10.30.40009 > 192.168.30.250.domain: 8150+ PTR? 90.70.168.192.in-addr.arpa. (44)
9: 20:45:17.493323 IP 192.168.30.250.domain > 192.168.10.30.40009: 8150* 1/1/1 PTR HOST2-VLAN70. (123)

On the RB750GL:

1: 20:45:15.550704 IP 192.168.10.30.37122 > 192.168.30.250.domain: 25419+ A? HOST2. (43)
2:
3: 20:45:15.551537 IP 192.168.30.250.domain > 192.168.10.30.37122: 25419* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
4: 20:45:20.554938 IP 192.168.10.30.37122 > 192.168.30.250.domain: 25419+ A? HOST2. (43)
5: 20:45:20.555782 IP 192.168.30.250.domain > 192.168.10.30.37122: 25419* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
6: 20:45:20.556777 IP 192.168.10.30.37122 > 192.168.30.250.domain: 40043+ AAAA? HOST2. (43)
7: 20:45:20.557183 IP 192.168.30.250.domain > 192.168.10.30.37122: 40043* 1/1/0 CNAME HOST2-VLAN70. (120)
8: 20:45:20.687810 IP 192.168.10.30.40009 > 192.168.30.250.domain: 8150+ PTR? 90.70.168.192.in-addr.arpa. (44)
9: 20:45:20.688453 IP 192.168.30.250.domain > 192.168.10.30.40009: 8150* 1/1/1 PTR HOST2-VLAN70. (123)

And here’s a full dump of one of the packets that goes missing:

20:45:12.355001 3c:4a:92:74:27:52 > 00:0c:42:c4:8e:d3, ethertype IPv4 (0x0800), length 85: 192.168.10.30.37122 > 192.168.30.250.domain: 40043+ AAAA? clementine.liakakos.me.uk. (43)
        0x0000:  000c 42c4 8ed3 3c4a 9274 2752 0800 4500  ..B...<J.t'R..E.
        0x0010:  0047 fba0 4000 4011 949c c0a8 0a1e c0a8  .G..@.@.........
        0x0020:  1efa 9102 0035 0033 6cd3 9c6b 0100 0001  .....5.3l..k....
        0x0030:  0000 0000 0000 0a63 6c65 6d65 6e74 696e  .......clementin
        0x0040:  6508 6c69 616b 616b 6f73 026d 6502 756b  e.liakakos.me.uk
        0x0050:  0000 1c00 01

So, the million-dollar question: what is the RB750GL doing with this packet?
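
Before blaming the router, it may be worth confirming that the dropped frame is itself well-formed. The sketch below is an illustration only: it takes the hex bytes from the dump above, checks the IPv4 header checksum, and decodes the UDP and DNS fields by hand. The function names `ipv4_header_ok` and `decode` are made up for this example, and the UDP checksum is deliberately not verified.

```python
import struct

# Hex bytes copied from the dump above (one Ethernet frame, 85 bytes).
FRAME_HEX = """
000c 42c4 8ed3 3c4a 9274 2752 0800 4500
0047 fba0 4000 4011 949c c0a8 0a1e c0a8
1efa 9102 0035 0033 6cd3 9c6b 0100 0001
0000 0000 0000 0a63 6c65 6d65 6e74 696e
6508 6c69 616b 616b 6f73 026d 6502 756b
0000 1c00 01
"""

def ipv4_header_ok(header):
    """True if the one's-complement sum over the header folds to 0xFFFF."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

def decode(frame):
    """Decode Ethernet / IPv4 / UDP / DNS question fields from a raw frame."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                      # IPv4 header length in bytes
    (ip_total_len,) = struct.unpack("!H", ip[2:4])
    udp = ip[ihl:ip_total_len]
    sport, dport, udp_len, _cksum = struct.unpack("!HHHH", udp[:8])
    dns = udp[8:udp_len]
    txid, flags, qdcount = struct.unpack("!HHH", dns[:6])
    # Walk the length-prefixed labels of the question name.
    pos, labels = 12, []
    while dns[pos]:
        n = dns[pos]
        labels.append(dns[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    qtype, qclass = struct.unpack("!HH", dns[pos + 1:pos + 5])
    return {
        "ethertype": ethertype,
        "protocol": ip[9],
        "checksum_ok": ipv4_header_ok(ip[:ihl]),
        "sport": sport,
        "dport": dport,
        "dns_len": len(dns),
        "txid": txid,
        "flags": flags,
        "qdcount": qdcount,
        "qname": ".".join(labels),
        "qtype": qtype,
        "qclass": qclass,
    }

frame = bytes.fromhex("".join(FRAME_HEX.split()))
for field, value in decode(frame).items():
    print(field, value)
```

Decoded this way, the frame is an ordinary UDP packet to port 53 with a valid IPv4 header checksum, carrying one question of type 28 (AAAA) with ID 40043, which agrees with the tcpdump summary line. Nothing in the packet itself obviously explains why it would be dropped.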

The problem appears to be resolved with RouterOS 6.0rc7 - the AAAA requests now make it all the way to the DNS server and back again.