CCR2004 poor bridge performance

Hi All,

With an RFC2544 test we see 100% load on one of the CCR2004's cores and frame drops at just 3.9 Gbps of traffic (350k pps @ 1396 bytes) between two loop-backed 10G ports.

This is rather poor: we cannot load even two of the twelve 10G ports the device has. I stripped the config to the bare minimum, with no bridge firewall etc. Learning is disabled. ROS 6.47.3. Any ideas on further tuning?
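As a quick sanity check of the figures above, here is a minimal Python sketch that converts packet rate and frame size into an L2 bit rate and compares it with the 10G line-rate ceiling. The 20 bytes of per-frame wire overhead (preamble plus inter-frame gap) and the function names are assumptions for illustration, not something reported by the tester. Under those assumptions the observed rate works out to roughly 40% of what a single 10G port can carry at this frame size.

```python
# Sanity check of the figures quoted in the post (assumptions noted inline).

# Assumption: 8-byte preamble/SFD + 12-byte inter-frame gap per frame on the wire.
ETHERNET_OVERHEAD_BYTES = 20


def l2_rate_gbps(pps: float, frame_bytes: int) -> float:
    """Layer 2 bit rate in Gbps for a given packet rate and frame size."""
    return pps * frame_bytes * 8 / 1e9


def line_rate_pps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second on a link, counting per-frame wire overhead."""
    return link_gbps * 1e9 / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    observed_pps = 352_114  # rx-packets-per-second from monitor-traffic
    frame_bytes = 1396      # frame size used in the test

    print(f"observed L2 rate: {l2_rate_gbps(observed_pps, frame_bytes):.2f} Gbps")
    ceiling = line_rate_pps(10, frame_bytes)
    print(f"10G ceiling at {frame_bytes} bytes: {ceiling:,.0f} pps")
    print(f"fraction of line rate: {observed_pps / ceiling:.0%}")
```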

> /system resource monitor 
          cpu-used: 25%
  cpu-used-per-cpu: 0%,0%,100%,0%
       free-memory: 1742696KiB

> /interface monitor-traffic sfp-sfpplus1
                         name:  sfp-sfpplus1
        rx-packets-per-second:       352 114
           rx-bits-per-second:       3.9Gbps
     fp-rx-packets-per-second:       352 066
        fp-rx-bits-per-second:       3.9Gbps
        tx-packets-per-second:       352 114
           tx-bits-per-second:       3.9Gbps
     fp-tx-packets-per-second:       352 066
        fp-tx-bits-per-second:       3.9Gbps
    tx-queue-drops-per-second:             0

Config:

# jan/02/1970 14:30:23 by RouterOS 6.47.3
# software id = FDNS-1V9I
#
# model = CCR2004-1G-12S+2XS
# serial number = D4F00C4D3EE2
/interface bridge
add name=br-sfp12 protocol-mode=none
add name=br-sfp34 protocol-mode=none
add name=br-sfp56 protocol-mode=none
add name=br-sfp78 protocol-mode=none
/interface ethernet
set [ find default-name=sfp-sfpplus1 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus2 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus3 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus4 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus5 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus6 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus7 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus8 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus9 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus10 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus11 ] l2mtu=9200 mtu=9200
set [ find default-name=sfp-sfpplus12 ] l2mtu=9200 mtu=9200
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/user group
set full policy=local,telnet,ssh,ftp,reboot,read,write,policy,test,winbox,password,web,sniff,sensitive,api,romon,dude,tikapp
/interface bridge port
add bridge=br-sfp12 interface=sfp-sfpplus1 learn=no
add bridge=br-sfp12 interface=sfp-sfpplus2 learn=no
add bridge=br-sfp34 interface=sfp-sfpplus3 learn=no
add bridge=br-sfp34 interface=sfp-sfpplus4 learn=no
add bridge=br-sfp56 interface=sfp-sfpplus5 learn=no
add bridge=br-sfp56 interface=sfp-sfpplus6 learn=no
add bridge=br-sfp78 interface=sfp-sfpplus7 learn=no
add bridge=br-sfp78 interface=sfp-sfpplus8 learn=no
/ip neighbor discovery-settings
set discover-interface-list=none
/ip address
add address=192.168.88.1/24 comment=defconf interface=ether1 network=192.168.88.0
add address=172.x.x.x/24 interface=ether1 network=172.x.x.0
/ip route
add distance=1 gateway=172.x.x.254

Have you taken a look at

tool profile cpu=all

?

Not in tool profile, but the CPU load shows 100% due to IRQ.

What is the MTU on the client side?

Uhm, and how was this test done exactly?
Apart from the "two loop-backed 10G ports" there's nothing mentioned: what utility, what generated the traffic on which port, what captured it on what port... Apart from RFC2544, which specifies hours of testing, nothing.
So, again, what and how did you test exactly?

I did mention that it’s 350k pps with a 1396-byte frame size (~3.9 Gbps of traffic). Port pairs are in bridges, as you can see in the config, e.g. sfp1/sfp2, and learning is disabled. There is a hardware loopback on port sfp1 and the tester is connected to sfp2. Traffic is generated using a Digital Lightwave tester, with the typical L2 frames used for RFC2544.

What else do you need to know?

9000 byte MTU.

Some thoughts about it:

  1. This device doesn’t have a switch chip: even with these two ports on the same bridge, this is routing, CPU-wise.
  2. You removed everything, even the fast path rule. This device is rated to about 13Gbps without fast path and with 25 IP rules. This is about 4 times what you are seeing, but only if using all 4 cores! Did you test with a single network stream? If so, that would explain the single core load and the low throughput.

Here you can find the test results for this unit, and the block diagram too (under the Support & Downloads tab).
https://mikrotik.com/product/ccr2004_1g_12s_2xs#fndtn-testresults

I’m aware it is a CPU-based device. However, the datasheet says 40G throughput with bridging and I’m not able to reach that number. The test is a single stream, but wouldn’t you expect to be able to fill two 10G ports with a single stream on a 12-port device?

Which “fast path rule” are you referring to exactly? Keep in mind that I’m testing a bridge, and rules under /ip firewall filter have no effect.

In my initial post there is “interface monitor-traffic” output, where you can see that fast path is being used by the bridge (the fp-tx and fp-rx counters).

@Paternot got it a little wrong.
But those tests say 39444 Mbps, 3248 kpps at 1518-byte packet size on ALL ports (thus involving all 12x 10G ports and the two remaining 25Gbps uplinks out of the 4 total on that 98PX1012, two of which are connected to the CPU?).
Is performance that limited when using only the network ports without the other two uplinks? Maybe.
Some talk here: http://forum.mikrotik.com/t/just-going-to-leave-this-here/137223/1

I’m starting to think so too. The specs are vague on how exactly the RFC2544 test was measured. Possibly the bandwidth numbers are not the amount of traffic transiting the device, but the sum of rx+tx traffic on all ports. So, in my test case with sfp1/sfp2 in a bridge, the tester sending 3.9 Gbit on sfp2, and a hardware loopback on sfp1, the spec would report the result as 3.9*4 = 15.6 Gbit. In reality I’m simulating a scenario of two ports sending 3.9 Gbit to each other, so 7.8 Gbit is actually transiting the device.
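To make the two ways of counting explicit, here is a small Python sketch. It only illustrates the hypothesis above; how the datasheet figure was actually measured is not known, and the function names are made up for this example.

```python
# Two ways of counting the same test traffic (illustrative only).

def transit_gbps(per_port_gbps: float, ports: int) -> float:
    """Traffic actually crossing the device: each port's ingress counted once."""
    return per_port_gbps * ports


def rx_plus_tx_gbps(per_port_gbps: float, ports: int) -> float:
    """Hypothetical datasheet-style figure: rx and tx summed on every port."""
    return per_port_gbps * ports * 2


if __name__ == "__main__":
    rate = 3.9   # Gbps offered by the tester
    ports = 2    # sfp1 (hardware loopback) and sfp2 (tester)

    print(f"transiting the device: {transit_gbps(rate, ports):.1f} Gbps")
    print(f"sum of rx+tx counters: {rx_plus_tx_gbps(rate, ports):.1f} Gbps")
```

If the hypothesis holds, any figure quoted as a sum of rx+tx counters would need to be halved to compare it with traffic actually transiting the device.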

It is also disappointing that only a single CPU core is used, while traffic is on two different 10G ports.

Well, like in the other thread, it is mentioned that this is advertised as a router, not a switch, so maybe performance between the 25Gbps ports and 10Gbps ports might be better than only using the 10Gbps ports. I see that config as the intended use for this device.
Archived datasheet of that PX is found here: https://www.datasheetarchive.com/whats_new/e0f274c1ed596a366491eca03055bf0c.html

I guess it depends on the implementation. From Mikrotik’s block diagram it isn’t clear whether the 98PX1012 links to the CPU are dedicated to a particular port group or aggregated.

As regards being a router or a switch: I’m evaluating the CCR2004 for VPLS/EoIP services, so it needs to do both basic bridging and L2.5/L3 encapsulation/routing. However, it seems I can barely use it for 1G services. Above 1G I start to encounter out-of-sequence frames. So I decided to verify whether it meets the basic datasheet specs.

Which, if I understand things correctly, is the basic reason for a single stream consuming a single CPU core: it is much easier to get the timing right and avoid introducing out-of-sequence frames caused by different processing times on different CPU cores.
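A rough Python sketch of that idea follows. It is the general flow-hashing technique, NOT RouterOS's actual mechanism, which isn't documented in this thread; the hash choice (CRC32 over the MAC pair) and the function name are assumptions made for illustration. Every frame of a flow hashes to the same core, so order within the flow is preserved, and only multiple distinct flows can spread across cores.

```python
import zlib

NUM_CORES = 4  # the CCR2004 reports four entries in cpu-used-per-cpu


def core_for_flow(src_mac: str, dst_mac: str, num_cores: int = NUM_CORES) -> int:
    """Pick a core from a hash of the flow key (hypothetical key: the MAC pair)."""
    key = f"{src_mac}>{dst_mac}".encode()
    return zlib.crc32(key) % num_cores


if __name__ == "__main__":
    # A single test stream: every frame carries the same MAC pair,
    # so every frame lands on the same core.
    single = {core_for_flow("00:00:00:00:00:01", "00:00:00:00:00:02")
              for _ in range(1000)}
    print("cores used by one stream:", sorted(single))

    # Many streams with different source MACs can spread over several cores.
    many = {core_for_flow(f"00:00:00:00:00:{i:02x}", "00:00:00:00:00:ff")
            for i in range(64)}
    print("cores used by 64 streams:", sorted(many))
```

This is why a single-stream RFC2544 run would be expected to load only one core, whatever the per-flow hashing details turn out to be.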

Datasheet says 40 Gb WITH fast path. Read the specs. You removed it, so it will NOT do 40 Gb.

And bridges do use fast path:
https://wiki.mikrotik.com/wiki/Manual:Fast_Path

Not really. Take a look at the specs again. The 40Gbps is WITH fast path. And yes, bridges do use fast path:
https://wiki.mikrotik.com/wiki/Manual:Fast_Path

RFC2544 is documented. One of the rules is to test ALL interfaces at the same time - half of them with inbound traffic, the other half with outbound.
As I said above: a single core will be used if the traffic is a single stream. Try multiple streams.

Paternot, you are confusing fast path with FastTrack. Read up on the difference between the two.

You’d better read up on it. The spec sheet says “fast path”. The manual page I sent you says “fast path”. You insist on “fast track”, don’t answer whether the test was with a single connection, and don’t read up on the RFC used in the test. And yet you want a solution.

I wash my hands. Good luck with this.

Firstly, the post you are responding to is from another forum member, not me. I already answered that it is a single-stream test. This is how telco carriers worldwide test circuit performance.

As regards the fast-path rule, which rule exactly do you think should be added? Note that I provided “monitor-traffic” output which showed that all traffic was already going through fast path.