Yup, we are aware of this, and I would LOVE for them to add more VPLS/MPLS offloading capabilities (RFC2544 fails on VPLS over CPU). But in our case, we use LDP/MPLS solely as loopback-to-loopback transport for VPLS. It is filtered to loopback addresses only, so no customer or backbone traffic runs over MPLS on our network. In the extensive testing we’ve done trying to figure out what is causing this issue, we even disabled LDP/MPLS completely, with absolutely no change in the outcome. Like I mentioned before, offloading does what it is expected to do (it moves the load from the CPU to the ASIC), but it breaks routes in the system.
One curious detail we noticed is that each time l3hw offloading is enabled, it breaks a different set of routes. With each ON/OFF cycle, a different group of customers becomes unreachable. It would appear that when the routing table gets sent to the ASIC, some of the data is incorrect or corrupted, but I’m just speculating here. As I mentioned before, as long as everything stays on the CPU there are no issues at all, but CPU usage is high and we cannot keep running like this long term, much less scale for growth.
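
For anyone who wants to check whether the broken routes really change from cycle to cycle, here is a rough sketch of how the comparison could be done. It assumes you have already collected, by whatever means you use (ping sweeps, monitoring alerts), the list of unreachable prefixes after each ON/OFF cycle. The function names and the sample prefixes are made up for illustration (the prefixes are documentation ranges, not our real data), and nothing here talks to the router.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_cycles(cycles):
    """Compare unreachable prefixes across offload ON/OFF cycles.

    cycles: dict mapping a cycle label to an iterable of unreachable prefixes.
    Returns (pairwise overlap per pair of cycles, prefixes broken in every cycle).
    """
    overlaps = {}
    for (name1, set1), (name2, set2) in combinations(cycles.items(), 2):
        overlaps[(name1, name2)] = jaccard(set1, set2)
    sets = [set(s) for s in cycles.values()]
    always_broken = set.intersection(*sets) if sets else set()
    return overlaps, always_broken


# Hypothetical sample data: prefixes found unreachable after each cycle.
observed = {
    "cycle1": ["192.0.2.0/24", "198.51.100.0/24"],
    "cycle2": ["198.51.100.0/24", "203.0.113.0/24"],
}
overlaps, always_broken = compare_cycles(observed)
```

A low overlap between cycles, with few or no prefixes broken every time, would be consistent with what we are seeing (a different group each time). A high overlap would instead point at specific routes being the problem.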