In our deployments, we are experiencing significant DHCP issues. We can't be entirely sure, but they usually seem to occur after RADIUS outages, and the common symptom is that the DHCP server function of our MikroTiks stops working as intended. The most common way we restore communication between a device and the MikroTik is to run the IP scan tool, which sends a BOOTP request to all possible clients off of the ether5 interface.
We've attempted to simulate RADIUS outages by placing a firewall between a test controller and the RADIUS server to filter RADIUS traffic, and we've confirmed with the third-party group that supports our controllers that we are filtering all traffic between their equipment and the RADIUS server. Sadly, this hasn't reproduced the problem in our lab, and we cannot work toward a solution without identifying the exact cause. Below is a summary of our efforts thus far:
Checked the provided PCAP for BOOTP and saw this single packet that is sent by the router to synchronize clients:
1550 10.779919 255.255.255.255 255.255.255.255 BOOTP 288 Boot Request from b8:69:f4:a5:8b:38 (Routerbo_a5:8b:38)
(((See attached bootp.jpg image)))
We accessed the test AP and verified that this packet appears on both the Ethernet WAN interface (eth0) and the WLAN interface where you had clients (wlan32):
tcpdump -i eth0 -vvv | grep -i bootp
255.255.255.255.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from b8:69:f4:9a:70:8c, length 246, xid 0x8, Flags [none] (0x0000)
tcpdump -i wlan32 -vvv | grep -i bootp
255.255.255.255.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from b8:69:f4:9a:70:8c, length 246, xid 0x8, Flags [none] (0x0000)
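For anyone who wants to pull the same fields (xid, flags, client MAC) out of a raw capture without relying on grep, here is a minimal Python sketch that decodes the fixed BOOTP header and looks for DHCP option 53. The function name parse_bootp and the sample bytes are our own illustration: the sample is hand-built to mirror the xid and MAC in the tcpdump output above and is not taken from the actual capture.

```python
import struct

BOOTP_FIXED_LEN = 236              # fixed BOOTP header length (RFC 951)
DHCP_MAGIC = b"\x63\x82\x53\x63"   # cookie that precedes DHCP options (RFC 2131)


def parse_bootp(payload: bytes) -> dict:
    """Decode the fields tcpdump prints from a raw BOOTP/DHCP UDP payload."""
    if len(payload) < BOOTP_FIXED_LEN:
        raise ValueError("payload shorter than the fixed BOOTP header")
    # op, htype, hlen, hops, xid, secs, flags occupy the first 12 bytes.
    op, _htype, hlen, _hops, xid, _secs, flags = struct.unpack(
        "!BBBBIHH", payload[:12]
    )
    # chaddr starts at offset 28; only the first hlen bytes are the MAC.
    mac = ":".join(f"{b:02x}" for b in payload[28:28 + min(hlen, 16)])
    dhcp_type = None  # stays None for plain BOOTP (no option 53 present)
    opts = payload[BOOTP_FIXED_LEN:]
    if opts[:4] == DHCP_MAGIC:
        i = 4
        while i < len(opts):
            code = opts[i]
            if code == 255:          # end option
                break
            if code == 0:            # pad option has no length byte
                i += 1
                continue
            if i + 1 >= len(opts):   # truncated option, stop parsing
                break
            length = opts[i + 1]
            if code == 53 and length == 1 and i + 2 < len(opts):
                dhcp_type = opts[i + 2]   # 1 = Discover, 2 = Offer, ...
            i += 2 + length
    return {"op": op, "xid": xid, "flags": flags, "mac": mac,
            "dhcp_type": dhcp_type}


# Hypothetical sample payload (assumption: option 53 = Discover).
sample = (
    bytes([1, 1, 6, 0]) + struct.pack("!IHH", 0x8, 0, 0)
    + bytes(16)                                   # ciaddr, yiaddr, siaddr, giaddr
    + bytes.fromhex("b869f49a708c") + bytes(10)   # chaddr padded to 16 bytes
    + bytes(64 + 128)                             # sname and file fields
    + DHCP_MAGIC + bytes([53, 1, 1, 255])         # message type, then end
)
info = parse_bootp(sample)
print(info)
```

A dhcp_type of None would mean the packet is a plain BOOTP request with no DHCP message type, which is one way to tell the router's scan packet apart from a normal client Discover.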
Checked the PCAP taken on the router's ether5 interface for a client that was not getting a valid IP and was instead reporting an APIPA (169.254.x.x) address. We see the DHCP Discover packets on the correct VLAN (227), but no replies from the router or DHCP server:
(((See Attached DHCP.jpg and Tell0000.png image)))
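To make the "Discovers but no replies" check repeatable across captures, the idea can be sketched in Python as below. This is only an illustration under assumptions: it expects the DHCP packets to have already been exported from the capture as (vlan, xid, dhcp_type) tuples, the function name unanswered_discovers is ours, and the VLAN IDs and transaction IDs in the sample are made up.

```python
from collections import defaultdict

DISCOVER, OFFER = 1, 2   # DHCP message types (option 53) per RFC 2131


def unanswered_discovers(events):
    """Return {vlan: {xid: discover_count}} for Discovers that saw no Offer.

    events is an iterable of (vlan, xid, dhcp_type) tuples in capture order.
    """
    discovers = defaultdict(int)   # (vlan, xid) -> number of Discovers seen
    answered = set()               # (vlan, xid) pairs that received an Offer
    for vlan, xid, dhcp_type in events:
        key = (vlan, xid)
        if dhcp_type == DISCOVER:
            discovers[key] += 1
        elif dhcp_type == OFFER:
            answered.add(key)
    result = defaultdict(dict)
    for (vlan, xid), count in discovers.items():
        if (vlan, xid) not in answered:
            result[vlan][xid] = count
    return dict(result)


# Hypothetical events: one client retransmitting with no reply,
# and one client on another VLAN that is answered normally.
events = [
    (227, 0x1A, DISCOVER),
    (227, 0x1A, DISCOVER),
    (108, 0x2B, DISCOVER),
    (108, 0x2B, OFFER),
    (227, 0x1A, DISCOVER),
]
stuck = unanswered_discovers(events)
print(stuck)
```

A growing Discover count for the same xid on one VLAN, while other VLANs are answered, is the pattern we are describing in the captures.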
Based on the packet capture analysis, we need to focus on digging into the MikroTik DHCP server function.
We currently don’t see any external factors heavily contributing to this issue.
The capture has traffic on two VLANs, 50 and 108, from two clients concurrently.
• Client on VLAN 50 – struggling to get an IP address from the respective DHCP-50/VLAN-50/Unauthenticated pool. Possible causes:
o MikroTik DHCP server app didn't receive the Discover packet
o MikroTik DHCP server app received the Discover packet but dropped it
o MikroTik DHCP server app received the Discover packet, but failed to send the reply with the proper VLAN tag (50)
o MikroTik transmit side is in an inactive state
• Client on VLAN 108 – exchanging bidirectional traffic without any issue, while the client on VLAN 50 is struggling with DHCP Discover retransmits.
Moreover, this correlates with the "magic" fix of running IP scan or rebooting the MikroTik, either of which probably brings the stranded logical interface or DHCP server process back up.
Below is a snapshot of what we believe is happening.
(((See attached simu.png image)))
Attached are packet captures from two separate devices, each taken on its ether5 interface.
Below is our current lab setup:
The typical MikroTik deployment consists of a MikroTik acting as gateway for a WattBox PDU, a Ruckus access point, and client devices. It is usually bridged to a modem, which receives DHCP from a node. The access points are managed via a vSZ (virtual SmartZone) controller, which sends their configurations and handles much of the RADIUS process.
(((See attached lab.png image)))
Example.rsc (265 KB)