Having some concerns with iperf testing and CPU usage

Hello,
I have been testing a configuration for four CRS326-24S+2Q+RM switches. When I test from a VM running on XCP-NG to the firewall I only get ~2Gbps, and from VM to VM on two separate hosts I get ~5Gbps. A little more concerning is that CPU usage on these four switches ranges from 14% to 58%. I am under the impression that if hardware offloading is being used there should be hardly any CPU usage. I have attached a diagram showing the logical network setup, the configs of all 4 switches, and a screenshot of the iperf testing. A review of my setup would be much appreciated to identify where I am messing this up. Thanks in advance.
Test-Switch-1.rsc (9.61 KB)
iperf-test.png
Test-Switch-4.rsc (7.61 KB)
Test-Switch-3.rsc (7.61 KB)
Test-Switch-2.rsc (9.24 KB)
Network.png

I got the CPU usage down to 3%-35% by disabling IGMP on the bridges. I also determined that ~5Gbps is the best I can get using VIFs for the VMs running on XCP-NG; after moving the VMs to PIFs I saw ~8Gbps. I think I have this pretty much figured out now. Thanks!
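
For anyone landing here with the same symptoms, the checks and the change look roughly like the following on RouterOS. This is only a sketch, not the exact commands from the attached configs: it assumes "IGMP" above refers to the bridge igmp-snooping setting, and it uses [find] to apply the change to every bridge on the switch, so adjust it if you only want to change one.

```
# List bridge ports; an "H" flag next to a port means it is hardware offloaded
/interface bridge port print

# Disable IGMP snooping on all bridges (assumption: this is the setting that was turned off)
/interface bridge set [find] igmp-snooping=no

# See which processes are consuming CPU while an iperf test is running
/tool profile
```

If the ports show the "H" flag and CPU is still high during a test, the profile output should give a hint about which traffic is still being handled in software.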