I have the following setup:
[ HOST1 192.168.10.30 ] — 192.168.10.0/24 — [ eth0 - RB750GL - eth2 ] — 192.168.30.0/24 — [ DNS_SERVER 192.168.30.250 ]
I noticed delays when initiating SSH connections from HOST1 by DNS name, so I started troubleshooting. What I’ve found is that a DNS packet consistently goes missing, and it is the same packet every time.
Packet capture on HOST1:
1: 00:19:44.728839 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
2: 00:19:44.728865 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
3: 00:19:44.730292 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
4: 00:19:49.733644 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
5: 00:19:49.735060 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
6: 00:19:49.735196 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
7: 00:19:49.736165 IP 192.168.30.250.domain > 192.168.10.30.57708: 11385* 1/1/0 CNAME HOST2-VLAN70. (120)
Packet capture on the RB750GL:
1: 00:19:44.733296 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
2: (AAAA? query 11385 never seen)
3: 00:19:44.733939 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
4: 00:19:49.738091 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
5: 00:19:49.738930 IP 192.168.30.250.domain > 192.168.10.30.57708: 23374* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
6: 00:19:49.739722 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
7: 00:19:49.740008 IP 192.168.30.250.domain > 192.168.10.30.57708: 11385* 1/1/0 CNAME HOST2-VLAN70. (120)
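You can see that the first AAAA query (ID 11385) never arrives at the RB750GL. HOST1 only gets its answers after the resolver times out and resends both queries about five seconds later, which matches the delay I see with SSH.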
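A quick way to confirm which queries went missing is to match the outgoing queries in both captures by DNS ID and query type. The sketch below is a hypothetical helper (the function names are mine, not from any tool) that parses the query lines copied from the two captures above, reports the ones seen on HOST1 but not on the RB750GL, and measures the gap before the retry. It assumes tcpdump's default one-line DNS output format as shown here.

```python
import re
from collections import Counter
from datetime import datetime

# Outgoing queries copied from the captures above (replies omitted).
HOST1_CAPTURE = """\
00:19:44.728839 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:44.728865 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
00:19:49.733644 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:49.735196 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
"""
ROUTER_CAPTURE = """\
00:19:44.733296 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:49.738091 IP 192.168.10.30.57708 > 192.168.30.250.domain: 23374+ A? HOST2-VLAN70. (43)
00:19:49.739722 IP 192.168.10.30.57708 > 192.168.30.250.domain: 11385+ AAAA? HOST2-VLAN70. (43)
"""

# Timestamp, DNS ID and query type of a query line, e.g. "23374+ A?".
QUERY_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) IP \S+ > \S+: (\d+)\+ (\w+)\?")


def parse_queries(capture):
    """Return (time, dns_id, qtype) for every query line in a capture."""
    queries = []
    for line in capture.splitlines():
        m = QUERY_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            queries.append((ts, int(m.group(2)), m.group(3)))
    return queries


def missing_queries(sent, seen):
    """(dns_id, qtype) keys sent more often than they were seen downstream."""
    sent_count = Counter((i, q) for _, i, q in sent)
    seen_count = Counter((i, q) for _, i, q in seen)
    return sent_count - seen_count  # Counter subtraction keeps positive counts only


def retry_delay(queries, key):
    """Seconds between the first and last transmission of one query."""
    times = [t for t, i, q in queries if (i, q) == key]
    return (max(times) - min(times)).total_seconds()


host = parse_queries(HOST1_CAPTURE)
router = parse_queries(ROUTER_CAPTURE)
lost = missing_queries(host, router)
print(dict(lost))                         # {(11385, 'AAAA'): 1}
print(retry_delay(host, (23374, "A")))    # 5.004805
```

The roughly five-second gap lines up with the resolver's default per-try timeout (the `timeout:` option in /etc/resolv.conf, 5 seconds unless changed).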
Now, if I re-address the DNS server and put it on the same VLAN as the host making the query, giving this network setup:
[ HOST1 192.168.10.30 ] — 192.168.10.0/24 — [ DNS_SERVER 192.168.10.250 ]
And re-test:
Packet capture on HOST1:
1: 00:36:29.535517 IP 192.168.10.30.49950 > 192.168.10.250.domain: 16817+ A? HOST2-VLAN70. (43)
2: 00:36:29.535543 IP 192.168.10.30.49950 > 192.168.10.250.domain: 33342+ AAAA? HOST2-VLAN70. (43)
3: 00:36:29.536564 IP 192.168.10.250.domain > 192.168.10.30.49950: 33342* 1/1/0 CNAME HOST2-VLAN70. (120)
4: 00:36:29.536610 IP 192.168.10.250.domain > 192.168.10.30.49950: 16817* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
Packet capture on the DNS server:
1: 00:36:29.537829 IP 192.168.10.30.49950 > 192.168.10.250.domain: 16817+ A? HOST2-VLAN70. (43)
2: 00:36:29.537895 IP 192.168.10.30.49950 > 192.168.10.250.domain: 33342+ AAAA? HOST2-VLAN70. (43)
3: 00:36:29.538501 IP 192.168.10.250.domain > 192.168.10.30.49950: 33342* 1/1/0 CNAME HOST2-VLAN70. (120)
4: 00:36:29.538546 IP 192.168.10.250.domain > 192.168.10.30.49950: 16817* 2/1/1 CNAME HOST2-VLAN70., A 192.168.70.90 (124)
In my eyes, this points the finger at the RB750GL losing the packet: with the router out of the path, both queries are answered within about a millisecond.
My next step is to put a hub between HOST1 and the RB750GL and packet sniff there.
Whilst I’m preparing that, does anyone have any other suggestions for tracking down this fault?
For reference, HOST1 is a physical server running CentOS 6 (fully patched), and the DNS server is a VM on VMware ESXi 5 running the same OS.
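To reproduce the pattern without SSH, here is a rough sketch of a probe (hypothetical, not part of any existing tool) that mimics what the resolver did in the captures: it sends an A and an AAAA query back-to-back from a single UDP socket, so both share one source port, and records which of them get answered. It assumes Python 3 is available on HOST1 (the system Python on CentOS 6 is 2.6, which this code does not target). The server address and name come from the first setup; the trial count and timeout are arbitrary choices.

```python
import random
import socket
import struct

# Values from the first setup; set DNS_SERVER to ("192.168.10.250", 53)
# to repeat the test with the server on the same subnet.
DNS_SERVER = ("192.168.30.250", 53)
QNAME = "HOST2-VLAN70"
QTYPE_A, QTYPE_AAAA = 1, 28


def build_query(qid, name, qtype):
    """Encode a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server, name, timeout=2.0):
    """Send A then AAAA from one socket; return the set of types answered."""
    qid = random.randrange(1 << 16)
    pending = {qid: ("A", QTYPE_A), (qid + 1) & 0xFFFF: ("AAAA", QTYPE_AAAA)}
    answered = set()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # applies to each recvfrom() call
    try:
        # One socket means one source port for both queries, as in the captures.
        for query_id, (_, qtype) in pending.items():
            sock.sendto(build_query(query_id, name, qtype), server)
        while len(answered) < len(pending):
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                break  # anything still unanswered was lost (or is very late)
            if len(data) >= 2:
                reply_id = struct.unpack("!H", data[:2])[0]
                if reply_id in pending:
                    answered.add(pending[reply_id][0])
        return answered
    finally:
        sock.close()


def run(trials=100, server=DNS_SERVER, name=QNAME):
    """Repeat the probe and count unanswered queries per type."""
    lost = {"A": 0, "AAAA": 0}
    for _ in range(trials):
        answered = probe(server, name)
        for qtype in lost:
            if qtype not in answered:
                lost[qtype] += 1
    return lost
```

Calling `run()` against 192.168.30.250 and then against 192.168.10.250 should show whether the AAAA losses only occur when the RB750GL is in the path.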