Traffic disappearing on CCR1036-12G-4S

Last week one of our clients (also an ISP) started complaining about the quality of his Internet connection.
He showed me a traffic graph from his outgoing interface with holes in it, like this:
holes_in_traffic.png
He is passing about 200 Mbit/s of download traffic and about 30 Mbit/s of upload traffic through us, so breaks like these should not be possible with that much traffic.
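To make the "holes" measurable instead of eyeballing the graph, here is a minimal Python sketch. It assumes the throughput has been exported as a list of per-interval samples in Mbit/s; the function name `find_holes`, the sample values, and the 20% threshold are made up for illustration and are not taken from the actual graph.

```python
from statistics import median


def find_holes(rates_mbps, threshold_ratio=0.2):
    """Return inclusive (start_index, end_index) pairs for runs of samples
    that fall below threshold_ratio * median rate."""
    if not rates_mbps:
        return []
    cutoff = median(rates_mbps) * threshold_ratio
    holes = []
    start = None
    for i, rate in enumerate(rates_mbps):
        if rate < cutoff:
            if start is None:
                start = i  # a new hole begins at this sample
        elif start is not None:
            holes.append((start, i - 1))  # hole ended at the previous sample
            start = None
    if start is not None:
        holes.append((start, len(rates_mbps) - 1))  # hole runs to the last sample
    return holes


# Hypothetical samples: roughly 200 Mbit/s with two short dropouts.
samples = [198, 205, 3, 1, 201, 199, 0, 202]
print(find_holes(samples))
```

The median is used as the baseline so that the dropouts themselves do not drag the reference level down the way a plain average would.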
He buys a 300/300 Mbit/s link from us.
He complains that websites load very slowly, and that a speed test from a single PC connected directly to his router does not get above about 30 Mbit/s.
Our routers are connected by a fibre link (through a different provider, because the distance between us is about 400 kilometres), with the fibre plugged directly into the routers' SFP ports.
I tested this: while the holes occur, no pings are lost, with either 50-byte or 1400-byte packets.
bandwith_test1.png
Moreover, I ran a bandwidth test from his router to the public server btest.planetcoop.com (through my CCR1036), and the test ran fine even while his traffic was dropping out.
no_lost_pings.png
I was about to tell him “go fix your network”, but:
I saw similar breaks on my own outgoing interface (although I don’t notice any problems because of them).
Also, when I ran some bandwidth tests between two of my routers whose traffic has to pass through the CCR1036, the holes in traffic were much more frequent.
So I blame this CCR1036 for all of these problems. It is our border router with BGP from our provider; it also serves two public /24 networks with a plain DHCP server, no PPPoE clients.
Right now it has 28 firewall rules, 92 mangle rules, and a queue tree with 82 rules, which is not much for this kind of hardware.
The software is the newest bugfix release, 6.39.3, with the firmware upgraded.
The average CPU load is 5-13%.
I tried disabling all mangle rules, the queues, and even the firewall for a while, as well as connection tracking.
There was no improvement. But I noticed that at any given moment one CPU core is heavily loaded while the others are idle. For example, cpu23 is at 95%, then cpu6 is at 95%, then cpu10 is at 95%. It changes every second, and the whole time the other cores are doing nothing (according to the profile graph).
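The pattern above (one core pegged, the rest idle, low overall average) can be checked mechanically. The following is only a rough Python sketch, assuming per-core load percentages have been collected somehow into one dict per sample; the names `hot_core_per_snapshot` and `average_load`, the 90%/10% thresholds, and all the numbers are hypothetical. It also shows why a 36-core average can sit in single digits while one core is saturated.

```python
def average_load(snapshot):
    """Mean load in percent across all cores in one snapshot."""
    return sum(snapshot.values()) / len(snapshot)


def hot_core_per_snapshot(snapshots, hot=90.0, idle=10.0):
    """For each snapshot (dict: core name -> load %), return the name of the
    core if exactly one core is >= hot and every other core is <= idle,
    otherwise None."""
    result = []
    for snap in snapshots:
        hot_cores = [core for core, load in snap.items() if load >= hot]
        others_idle = all(
            load <= idle for core, load in snap.items() if core not in hot_cores
        )
        if len(hot_cores) == 1 and others_idle:
            result.append(hot_cores[0])
        else:
            result.append(None)
    return result


# Hypothetical samples taken one second apart (only three cores shown).
snapshots = [
    {"cpu23": 95, "cpu6": 1, "cpu10": 2},
    {"cpu23": 0, "cpu6": 95, "cpu10": 1},
    {"cpu23": 40, "cpu6": 45, "cpu10": 50},
]
print(hot_core_per_snapshot(snapshots))

# Hypothetical 36-core snapshot: one core at 95%, the other 35 at 3%.
snap36 = {f"cpu{i}": 3 for i in range(36)}
snap36["cpu23"] = 95
print(round(average_load(snap36), 2))
```

If most snapshots report a single hot core, the low average load is misleading: whatever runs on that one core is the bottleneck, regardless of how idle the rest of the CPU looks.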
FastTrack rules are enabled.

Can someone tell me how to optimize this, or how to test it?

You should check what is loading your CPU so much.
It could be BGP. BGP always uses only a single core.
With multiple BGP peers and a lot of route flapping, it could load the CPU.