CHR stops forwarding DHCP traffic

Dear all,

We’re evaluating CHR 7.20.2 to handle routing for our Wi-Fi network traffic. Our setup is as follows:

  • Proxmox 9.0.10 Host
  • Already tested on two different physical hosts with very different configurations (first Intel Xeon, now AMD EPYC)
  • Virtual interface for CHR, bridged to a 10Gbit Intel SFP+ interface on Proxmox
  • 32 VLANs (each a 512-address subnet, i.e. a /23) to distribute the clients in a round-robin fashion, with VRRP configured against our existing routing solution
  • The VLAN interfaces are created on the CHR on top of the single ether interface backed by the virtual interface provided by Proxmox
  • CHR has 48 cores, and multiqueue is enabled on the virtual interface with 24 queues
  • Cisco WLC 5500 controllers, using DHCP Select Feature from the Cisco WLC to handle client distribution among the VLANs
  • All client traffic is handled by the WLC (tunnelled)
  • Around 12k clients on the network, peaks at 15k
  • Dual stack (IPv4/IPv6) for the clients; all clients receive public IPs (no private networks)
  • DHCP is handled by a Kea server in these VLANs, for which the Cisco WLC acts as a relay (we do not use the MikroTik relay, although it is enabled for the VLANs)
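
For reference, here is a rough IPv4 capacity check for the numbers above. It is only a sketch: it assumes each VLAN is a /23 (512 addresses), the prefix 198.18.0.0/23 is just a placeholder and not one of our real networks, and it ignores the addresses taken by the routers and VRRP.

```python
import ipaddress

VLAN_COUNT = 32          # number of client VLANs from the list above
PEAK_CLIENTS = 15_000    # peak client count from the list above

# Assumption: every VLAN is an IPv4 /23; the prefix itself is a placeholder.
subnet = ipaddress.ip_network("198.18.0.0/23")

# Subtract the network and broadcast addresses; gateway/VRRP addresses
# are not accounted for here, so the real headroom is slightly smaller.
usable_per_vlan = subnet.num_addresses - 2
total_usable = VLAN_COUNT * usable_per_vlan
utilisation = PEAK_CLIENTS / total_usable

print(f"usable addresses per VLAN: {usable_per_vlan}")
print(f"total usable addresses:    {total_usable}")
print(f"utilisation at peak:       {utilisation:.1%}")
```

Under these assumptions the pools are heavily used at peak but not exhausted.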

This setup works fine, with no overload, packet loss, or other issues.

However, at random intervals (sometimes after 2 days, other times in less than 3 hours), the CHR appears to stop forwarding, or start dropping, DHCP offers to the clients.

Simply switching the routing back to our existing solution (a FortiGate device) by enabling its interfaces on these VLANs "fixes" the issue: clients immediately start receiving IP addresses again. If we then disable the FortiGate interfaces again (so the CHR takes over the VLANs once more), the CHR handles client traffic normally and stays that way until the next "crash".

Any ideas on how we could debug this on the CHR? We looked for extended logging or ARP/MAC-related settings on the CHR but found nothing that helped.