Simple queues and core router

Hello! I have an Intel® Core™ i7-870 processor (8M cache, 2.93 GHz) and an Intel dual-port server network adapter (82576 chipset) on an Intel motherboard, with MikroTik RouterOS 5.0rc5 installed. RPS is turned on and the network adapter uses all cores for its own queues. There are 4000 simple queues, ~400 Mbit/s of traffic, NAT for 3500 users, and BGP (default routes from two uplinks and ~3500 routes, without a full view).

The problem is that the simple queues put a very high load on the router. Moving to PCQ is unrealistic (even with a mask), since ~1000 users have several IP addresses each (such as 172.16.0.100 and 172.16.1.122). What can I do? Would two multi-core Xeon processors solve the problem? (I don't know how well MikroTik balances simple queues across cores.)

Redesign your network so you can use PCQ.

Scaling to 4,000 simple queues is ridiculous. Feasible, maybe, but the administrative overhead is no smaller than that of redesigning the network and doing things right in the first place.

Only PCQ will help you. Also use a different router for NAT and for BGP routing.
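
As a rough illustration of the obstacle raised in the question (one user holding addresses in different subnets), the sketch below computes the smallest network that contains all of a user's addresses. It assumes a PCQ address mask groups every address sharing a prefix into one sub-stream; the function name `smallest_common_network` is made up for this example and is not part of RouterOS.

```python
import ipaddress


def smallest_common_network(addresses):
    """Return the smallest IPv4 network containing every given address."""
    ips = [ipaddress.IPv4Address(a) for a in addresses]
    first = int(ips[0])
    prefix = 32
    for ip in ips[1:]:
        # Bits that differ from the first address mark where the shared prefix ends.
        diff = first ^ int(ip)
        prefix = min(prefix, 32 - diff.bit_length())
    # strict=False masks off the host bits of the first address.
    return ipaddress.ip_network(f"{ips[0]}/{prefix}", strict=False)


if __name__ == "__main__":
    net = smallest_common_network(["172.16.0.100", "172.16.1.122"])
    print(net, net.num_addresses)
```

For the two addresses from the question this gives 172.16.0.0/23, a block of 512 addresses. A mask wide enough to merge that user's two addresses would therefore also merge every other user in the same /23, which is why renumbering so that each user gets one address (or one small aligned block) is probably a prerequisite for PCQ here.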