Second: in my experience, when looking at official test results (published on the official product pages), the single number that best represents real-life performance is the one under “routing → 25 ip filter rules → 512 byte packet size” (with a decently high error margin). For the hEX S that number is 385 Mbps, and reaching it usually requires FastTrack to be active (or no firewalling at all, as in OP’s case). As I said, the margin is probably +100% / −50%, but the 500 Mbps figure mentioned by @OP is already in the higher range. And then @OP enabled queuing, which mostly excludes FastTrack. So I’d say the result @OP is seeing (400 Mbps) is pretty decent for this device.
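For reference, FastTrack is typically enabled with firewall rules like the defaults below. This is only a sketch of the usual default-configuration rules, not @OP’s actual config:

```
/ip firewall filter
add chain=forward action=fasttrack-connection connection-state=established,related \
    comment="fasttrack established/related"
add chain=forward action=accept connection-state=established,related
```

Note that fasttracked packets bypass simple queues, which is why enabling queuing effectively means giving up FastTrack (and its speed benefit) for that traffic.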
If I bridge all ports and then go to Switch > Ports to test the INGRESS/EGRESS settings: when I set EGRESS to 270M, download is about 256M, so that part is perfect. But when I set INGRESS to 270M, I only get 12M upload on the client.
Is the switch chip not capable of handling the correct speed for ingress traffic?
/interface ethernet switch port
set 4 egress-rate=270.0Mbps ingress-rate=270.0Mbps
Under bridge ports, the ports have the H (hardware-offload) flag.
AFAIK ingress rate control can only work somewhat smoothly if flow control is active (the alternative is to drop frames, and packet loss absolutely kills TCP performance). However, flow control on MT devices is sometimes iffy …
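Flow control can be enabled per Ethernet port. A minimal sketch (the port name is an assumption; use the port you’re rate-limiting):

```
/interface ethernet
set ether4 rx-flow-control=on tx-flow-control=on
```

Both ends of the link have to support and honor 802.3x pause frames for this to actually smooth anything out; if the peer ignores pause frames, you’re back to dropping.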
The thing is: egress bandwidth limiting is always easy, one only needs a Tx buffer, and then it’s as simple as adding an appropriate delay between transmissions of subsequent frames/packets. Ingress limiting is hard … the other party delivers frames whenever it sees fit, and the only “smooth” way of telling it to slow down is flow control. Any action taken solely on the Rx side is inevitably brutal under certain conditions. And this really doesn’t depend on where exactly it’s implemented (switch chip vs. bridge vs. anywhere else).
For ingress, you’d want a queue that sends ECN marks, which the switch chip cannot do. fq_codel in a simple queue would do this. It consumes more CPU than the switch chip’s blunt dropping, but it would likely make your speedtest results more even/higher, since your speedtest uses TCP, which slows down on packet loss.
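On RouterOS v7 this could look roughly like the sketch below. The queue-type name, target interface, and the 270M limit are assumptions for illustration:

```
/queue type
add name=fq-codel-ecn kind=fq-codel fq-codel-ecn=yes
/queue simple
add name=client-shaper target=ether4 max-limit=270M/270M \
    queue=fq-codel-ecn/fq-codel-ecn
```

With `fq-codel-ecn=yes`, ECN-capable TCP flows get marked instead of dropped when the queue builds up, so they back off without losing packets. Remember this runs on the CPU, not in the switch chip.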
I think you’re experiencing the “caching” effect. Do a search on the forums and you’ll find lots about this. v6 had caching because the Linux kernel it ran on supported it; at some point the routing cache was removed from the kernel, and v7 runs on a newer kernel, so it doesn’t have caching. From what I’ve seen, if this is a real deal-breaker for you and you don’t use any v7 features, downgrade to the latest v6 LTS and be happy.