With an RFC2544 test we see 100% load on one of the CCR2004 cores and frame drops at just 3.9 Gbps of traffic (350k pps at 1396 bytes) between two loop-backed 10G ports.
This is rather poor: we cannot load even two of the twelve 10G ports the device has. I stripped the config to the bare minimum, with no bridge firewall etc. Learning is disabled. ROS 6.47.3. Any ideas on further tuning?
Uhm, and how was this test done exactly?
Apart from the "two loop-backed 10G ports" nothing is mentioned: what utility, what generated the traffic on which port, what captured it on which port. Apart from RFC2544, which specifies hours of testing, nothing.
So, again, what and how did you test exactly?
I did mention that it’s 350k pps with a 1396-byte frame size (~3.9 Gbps of traffic). Port pairs are in bridges, as you can see in the config, e.g. sfp1/sfp2, and learning is disabled. There is a hardware loopback on port sfp1 and the tester is connected to sfp2. Traffic is generated with a Digital Lightwave tester, using the typical L2 frames for RFC2544.
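For reference, the offered load quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the 1396-byte frame size includes the FCS and that each frame costs an extra 20 bytes on the wire (8-byte preamble/SFD plus 12-byte inter-frame gap). The function names are made up for illustration.

```python
def l2_throughput_bps(pps: int, frame_bytes: int) -> int:
    """Layer-2 throughput in bits per second, counting frame bytes only."""
    return pps * frame_bytes * 8


def line_rate_pps(link_bps: int, frame_bytes: int, overhead_bytes: int = 20) -> float:
    """Maximum frames per second a link can carry.

    overhead_bytes is an assumption: 8 bytes preamble/SFD + 12 bytes inter-frame gap.
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


offered_bps = l2_throughput_bps(350_000, 1396)          # the load from the test
max_pps = line_rate_pps(10_000_000_000, 1396)           # one 10G port, one direction

print(f"offered load: {offered_bps / 1e9:.2f} Gbps")
print(f"10G line rate at 1396 bytes: {max_pps:,.0f} pps")
print(f"share of line rate: {350_000 / max_pps:.0%}")
```

Under these assumptions the drops start at roughly 40% of what a single 10G port could carry in one direction.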
This device doesn’t have a switch chip: even with these two ports on the same bridge, CPU-wise this is routing.
You removed everything, even the fast path rule. This device is rated at about 13 Gbps without fast path and with 25 IP rules. That is about 4 times what you are seeing, but only when all 4 cores are used! Did you test with a single network stream? If so, that would explain the single-core load and the low throughput.
I’m aware it is a CPU-based device. However, the datasheet says 40G throughput with bridging and I’m not able to reach that number. The test is a single stream, but wouldn’t you expect to be able to fill two 10G ports with a single stream on a 12-port device?
Which “fast path rule” are you referring to exactly? Keep in mind that I’m testing a bridge, and rules under /ip filter have no effect.
In my initial post there is “interface monitor-traffic” output, where you can see that fast path is being used by the bridge (fp-tx-bits, fp-rx-bits).
@Paternot got it a little wrong.
But those tests say 39444 Mbps, 3248 kpps at 1518-byte packet size on ALL ports (thus involving all 12x 10G ports and the two remaining 25 Gbps uplinks out of the 4 total on that 98PX1012, two of which are connected to the CPU?).
Is performance that limited when using only the network ports without the other two uplinks? Maybe.
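As a sanity check on those datasheet figures, here is a rough Python sketch. The variable names are hypothetical, and the 20-byte per-frame wire overhead (preamble/SFD plus inter-frame gap) is an assumption about how line rate is counted. It converts the quoted packet rate to bits per second and expresses it as a number of 10G ports running at line rate in one direction.

```python
FRAME_BYTES = 1518
WIRE_OVERHEAD_BYTES = 20      # assumption: 8 bytes preamble/SFD + 12 bytes inter-frame gap
DATASHEET_PPS = 3_248_000     # 3248 kpps, as quoted from the test results

# Packet rate converted to megabits per second, frame bytes only.
datasheet_mbps = DATASHEET_PPS * FRAME_BYTES * 8 / 1e6

# Frames per second on one 10G port at line rate, one direction.
port_pps = 10_000_000_000 / ((FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8)

# How many such ports the datasheet packet rate corresponds to.
ports_equiv = DATASHEET_PPS / port_pps

print(f"{datasheet_mbps:.0f} Mbps, equal to about {ports_equiv:.1f} x 10G at line rate")
```

So the two datasheet numbers agree with each other, and together they correspond to roughly four 10G ports' worth of 1518-byte frames in one direction. How that total was spread across the ports is not something this arithmetic can tell.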
Some talk here: http://forum.mikrotik.com/t/just-going-to-leave-this-here/137223/1
I’m starting to think so too. The specs are vague on how exactly the RFC2544 figures were measured. Possibly the bandwidth numbers are not the amount of traffic transiting the device, but the sum of rx+tx traffic on all ports. So in my test case, with sfp1/sfp2 in a bridge, the tester sending 3.9 Gbit on sfp2 and a hw loopback on sfp1, the spec would report the result as 3.9*4 = 15.6 Gbit. In reality I’m simulating a scenario of two ports sending 3.9 Gbit to each other, so 7.8 Gbit is actually transiting the device.
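The two ways of counting can be put side by side in a small Python sketch. It models only the loopback setup described in this post (tester on sfp2, hardware loopback on sfp1, so every port both receives and transmits the full offered load). The helper name is hypothetical, and nothing here says which method the datasheet really uses.

```python
def port_counters(offered_gbps: float, ports=("sfp1", "sfp2")) -> dict:
    """Per-port rx/tx rates when each port receives and transmits the offered load."""
    return {port: {"rx": offered_gbps, "tx": offered_gbps} for port in ports}


counters = port_counters(3.9)

# Counting method A: add up rx and tx on every port.
sum_rx_tx = sum(c["rx"] + c["tx"] for c in counters.values())

# Counting method B: traffic actually forwarded, i.e. each received frame counted once.
forwarded = sum(c["rx"] for c in counters.values())

print(f"sum of rx+tx over all ports: {sum_rx_tx:.1f} Gbit")
print(f"traffic transiting the device: {forwarded:.1f} Gbit")
```

The same test run therefore reads as 15.6 or 7.8 depending purely on the accounting, a factor of two.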
It is also disappointing that only a single CPU core is used, while traffic is on two different 10G ports.
Well, as mentioned in the other thread, this is advertised as a router, not a switch, so performance between the 25 Gbps ports and the 10 Gbps ports might be better than when using only the 10 Gbps ports. I see that config as the intended use for this device.
Archived datasheet of that PX is found here: https://www.datasheetarchive.com/whats_new/e0f274c1ed596a366491eca03055bf0c.html
I guess it depends on the implementation. From MikroTik’s block diagram it isn’t clear whether the 98PX1012 links to the CPU are dedicated to a particular port group or aggregated.
As for it being a router or a switch: I’m evaluating the CCR2004 for VPLS/EoIP services, so it needs to do both basic bridging and L2.5/L3 encapsulation/routing. However, it seems I can barely use it for 1G services. Above 1G I start to see out-of-sequence frames. So I decided to verify whether it meets the basic datasheet specs.
Which, if I understand things correctly, is the basic reason for a single stream consuming a single CPU core: it is much easier to get the timing right and avoid out-of-sequence frames caused by different processing times on different CPU cores.
RFC2544 is documented. One of the rules is to test ALL interfaces at the same time - half of them with inbound traffic, the other half with outbound.
As I said above: a single core will be used if the traffic is a single stream. Try multiple streams.
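Why one stream ends up on one core can be illustrated with a generic per-flow hashing sketch in Python. This is not RouterOS code, and whether the CCR2004 distributes traffic this way is an assumption. The flow key (source MAC, destination MAC, EtherType), the CRC32 hash and the function name are all picked for illustration. The point is only that every frame of one flow produces the same hash, so it lands on the same core and stays in order.

```python
import zlib
from collections import Counter


def core_for_flow(src_mac: str, dst_mac: str, ethertype: int, n_cores: int = 4) -> int:
    """Map a flow to a core index by hashing its header fields (illustrative only)."""
    key = f"{src_mac}|{dst_mac}|{ethertype:#06x}".encode()
    return zlib.crc32(key) % n_cores


# Single stream: 1000 frames with identical headers, as in a one-stream RFC2544 run.
single = Counter(
    core_for_flow("00:00:00:00:00:01", "00:00:00:00:00:02", 0x0800)
    for _ in range(1000)
)

# Multiple streams: 64 flows that differ in their source MAC.
multi = Counter(
    core_for_flow(f"00:00:00:00:00:{i:02x}", "00:00:00:00:00:02", 0x0800)
    for i in range(1, 65)
)

print("single stream, frames per core:", dict(single))
print("64 streams, flows per core:", dict(multi))
```

With a single stream the counter has exactly one entry, however many cores exist. With many distinct flows the hash normally spreads them over several cores, which is why a multi-stream test is worth trying.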
You’d better read up on it. The spec sheet says “fast path”. The manual page I sent you says “fast path”. You insist on “fast track”, don’t answer whether the test was with a single connection, and don’t read up on the RFC used for the test. And yet you want a solution.
Firstly, the post you are responding to is from another forum member, not me. I already answered that it is a single-stream test. This is how telco carriers worldwide test circuit performance.
As for the fast-path rule, which rule exactly do you think should be added? Note that I provided “monitor-traffic” output showing that all traffic was already going through fast path.