Community discussions

 
lapsio
Member
Topic Author
Posts: 472
Joined: Wed Feb 24, 2016 5:19 pm

CCR1009 - low single tcp tunnel performance?

Fri Aug 31, 2018 1:08 am

I recently managed to get my hands on an Intel X710-DA4, a CRS317 and a CCR1009. Unfortunately, performance is quite disappointing and I don't know what to blame. When I run multiple parallel streams in iperf, everything is cool: full 10G. With a single stream... not so much.

If I use UDP in iperf I again get full 10G, even on a single "connection", but TCP caps out between 3.5 and 4 Gbps (with 9k jumbo frames, of course). While I expected the CCR to struggle with a single TCP stream (the Tile-Gx is a many-core design that favors parallel throughput over single-stream performance), I expected something around, say, 6-8 Gbps, not 3.5. I'm not really sure what to blame now. The CCR is the first suspect, but on the other hand, why would UDP work fine then? Firewall processing is exactly the same, jumbo frames the same, features the same; the only difference is in the TCP stack itself. The tests also didn't scale well: 2 connections achieved only ~4.5 Gbps, not 7. If it were a CCR issue, I'd expect more linear scaling.

On the other hand, the workstation uses an i7-2600K and PCIe 2.0 (the Intel X710 is a PCIe 3.0 card), so it's relatively dated and could be the issue as well. Also, I performed the test on a single machine with 2 interfaces (using a dstnat+masquerade hack on the CCR to force packets through the router rather than the loopback). With UDP, iperf printed a giant wall of "out of order" warnings, so there's probably a lot of packet reordering hitting TCP too. That sounds like the root of the issue, but I don't really know what it means for this problem or what I can do to reduce it. The path is really simple: workstation -> CRS317 -> CCR1009.

Using 3 m DACs. The CRS and CCR are connected via a tagged trunk. The NAT rule translates 6.6.6.6 to the workstation's second interface address, plus masquerade, so that running iperf against 6.6.6.6 loops traffic out of the first interface, through the router, and back to the workstation on the other interface.
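For anyone wanting to reproduce the loop, a sketch of the hairpin setup described above. The workstation's second interface address (10.237.230.227) and the interface name (vlan-test) are placeholders, not taken from my actual config:

```
# Hypothetical RouterOS rules for the loop-through-router trick.
# dst-nat rewrites the fake destination to the workstation's second NIC;
# masquerade rewrites the source so replies come back through the CCR.
/ip firewall nat
add chain=dstnat dst-address=6.6.6.6 action=dst-nat to-addresses=10.237.230.227
add chain=srcnat out-interface=vlan-test action=masquerade
```

With this in place, iperf3 pointed at 6.6.6.6 from the first interface never touches the host loopback; every packet crosses the switch and router twice.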

Neither machine indicated much load. No core exceeded 30% utilization or IPC, yet performance was poor.

So my question is: can someone confirm that the CCR1009 indeed bottlenecks around ~3.7 Gbps with a single TCP connection? Otherwise I'll continue investigating on the workstation side. Right now I'm puzzled, disappointed and demotivated :P
MTCNA, MTCRE, MTCINE
 
lapsio
Member
Topic Author
Posts: 472
Joined: Wed Feb 24, 2016 5:19 pm

Re: CCR1009 - low single tcp tunnel performance?  [SOLVED]

Sat Sep 01, 2018 7:18 pm

So yeah, it's a CCR1009 issue. It really does bottleneck on a single TCP connection: 3.5 Gbps even with 9k jumbo frames, and around 1.2 Gbps with standard 1500-byte frames, with fasttrack disabled and the bridge IP firewall in use.

Removing the bridge interface (so the IP is assigned directly to the VLAN interface instead of to a bridge containing the VLAN interface), while still using the conventional firewall, increases throughput to around 4.5 Gbps with jumbo frames and around 1.5 Gbps at 1500, which is already a bit better.
Enabling fasttrack gives full 10G (9.8 Gbps) with jumbo frames and around 8.5 Gbps at 1500, which is more than okay.

So fasttrack gives full 10G on the CCR1009 even with a single TCP connection. I'm still a bit disappointed, though; I'd hoped I wouldn't have to use fasttrack, and I'm still a little concerned about fasttrack security. For now I've enabled fasttrack only between the machines with Intel X710 NICs, because only they need full 10G on a single TCP connection (mostly for storage).
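For reference, a minimal sketch of how to scope fasttrack to just those hosts. The 10.237.230.0/24 subnet is a placeholder for the X710 machines; the pattern of fasttracking only established/related traffic is the standard RouterOS one:

```
# Fasttrack only established/related connections between the 10G hosts
# (addresses are placeholders); everything else stays in the slow path.
/ip firewall filter
add chain=forward action=fasttrack-connection connection-state=established,related \
    src-address=10.237.230.0/24 dst-address=10.237.230.0/24
add chain=forward action=accept connection-state=established,related
```

The matching accept rule is needed because fasttracked connections still occasionally send packets through the regular firewall path.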
MTCNA, MTCRE, MTCINE
 
chechito
Forum Guru
Posts: 1740
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia
Contact:

Re: CCR1009 - low single tcp tunnel performance?

Sat Sep 01, 2018 10:17 pm

I suppose you put the 10G NIC in the PCI Express x16 slot of your motherboard?
 
lapsio
Member
Topic Author
Posts: 472
Joined: Wed Feb 24, 2016 5:19 pm

Re: CCR1009 - low single tcp tunnel performance?

Sun Sep 02, 2018 12:46 am

I suppose you put the 10G NIC in the PCI Express x16 slot of your motherboard?
Technically x8, because the P67 chipset runs x8/x8 PCIe 2.0, but the card has an x8 connector anyway. As a 4x10G NIC it has a theoretical slot throughput of around 32 Gbps in that configuration; in practice probably above 20 or so, I didn't test. Nevertheless, a single 10G port works.

I wanted 4 ports because MikroTik switches don't support VEPA (viewtopic.php?f=2&t=135226&p=666067&hilit=vepa#p666067), so if I want VMs to be switched through the CRS317 I need to attach each VM to a separate port (otherwise the CRS won't forward a packet destined for the same port it came in on). VMs sharing a single port have to be routed instead: I can split the VMs into groups of 4 that are switched in hardware, and route between those groups through the CCR1009. Without routing I could have only 4 VMs. It works as expected.

All in all, I wanted 10G on a single TCP connection and I got it on the CCR1009 with fasttrack, so I guess I'm satisfied with the result.
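The x8 slot figure can be sanity-checked with quick arithmetic: PCIe 2.0 runs 5 GT/s per lane with 8b/10b line coding, so only 80% of the raw bit rate carries payload:

```python
# Back-of-envelope PCIe 2.0 x8 bandwidth estimate.
GT_PER_LANE = 5.0          # PCIe 2.0: 5 gigatransfers/s per lane
ENCODING_EFFICIENCY = 0.8  # 8b/10b line coding: 8 payload bits per 10 bits
LANES = 8

effective_gbps = GT_PER_LANE * ENCODING_EFFICIENCY * LANES
print(effective_gbps)  # 32.0 Gbit/s of payload bandwidth
```

So the slot tops out around 32 Gbps before protocol overhead, comfortably above a single 10G stream but short of all four ports at line rate.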

Iperf (MTU 1500):

No fasttrack, single TCP with bridge-ip-firewall and ip on bridge
lapsio@linux-qzuq ~> iperf3 -c 6.6.6.6 -p 8150 -P 1 -Z -b 10G
Connecting to host 6.6.6.6, port 8150
[  4] local 10.237.230.226 port 42584 connected to 6.6.6.6 port 8150
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  68.8 MBytes   577 Mbits/sec    1    341 KBytes
[  4]   1.00-2.00   sec   154 MBytes  1.29 Gbits/sec  398    529 KBytes
[  4]   2.00-3.00   sec   120 MBytes  1.01 Gbits/sec  923    327 KBytes
[  4]   3.00-4.00   sec   150 MBytes  1.26 Gbits/sec    0    571 KBytes
[  4]   4.00-5.00   sec   115 MBytes   962 Mbits/sec  587    433 KBytes
[  4]   5.00-6.00   sec   136 MBytes  1.14 Gbits/sec  472    423 KBytes
[  4]   6.00-7.00   sec   111 MBytes   932 Mbits/sec  653    318 KBytes
[  4]   7.00-8.00   sec   119 MBytes  1.00 Gbits/sec  370    238 KBytes
[  4]   8.00-9.00   sec   124 MBytes  1.04 Gbits/sec    0    494 KBytes
[  4]   9.00-10.00  sec   143 MBytes  1.20 Gbits/sec  418    461 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.21 GBytes  1.04 Gbits/sec  3822             sender
[  4]   0.00-10.00  sec  1.21 GBytes  1.04 Gbits/sec                  receiver

iperf Done.
Fasttrack, ip on vlan:
lapsio@linux-qzuq ~> iperf3 -c 6.6.6.6 -p 8150 -P 1 -Z -b 10G
Connecting to host 6.6.6.6, port 8150
[  4] local 10.237.230.226 port 42718 connected to 6.6.6.6 port 8150
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   893 MBytes  7.49 Gbits/sec  232    793 KBytes
[  4]   1.00-2.00   sec  1012 MBytes  8.49 Gbits/sec  577    445 KBytes
[  4]   2.00-3.00   sec  1.00 GBytes  8.60 Gbits/sec  460    601 KBytes
[  4]   3.00-4.00   sec  1.02 GBytes  8.72 Gbits/sec  163    544 KBytes
[  4]   4.00-5.00   sec  1.01 GBytes  8.67 Gbits/sec  337    648 KBytes
[  4]   5.00-6.00   sec  1.02 GBytes  8.73 Gbits/sec  326    882 KBytes
[  4]   6.00-7.00   sec  1.00 GBytes  8.61 Gbits/sec  188    882 KBytes
[  4]   7.00-8.00   sec  1.00 GBytes  8.62 Gbits/sec  245    411 KBytes
[  4]   8.00-9.00   sec  1020 MBytes  8.55 Gbits/sec  183    554 KBytes
[  4]   9.00-10.00  sec  1021 MBytes  8.56 Gbits/sec  186    881 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  9.90 GBytes  8.51 Gbits/sec  2897             sender
[  4]   0.00-10.00  sec  9.90 GBytes  8.50 Gbits/sec                  receiver

iperf Done.
MTCNA, MTCRE, MTCINE
 
Paternot
Long time Member
Posts: 607
Joined: Thu Jun 02, 2016 4:01 am
Location: Niterói / Brazil

Re: CCR1009 - low single tcp tunnel performance?

Sun Sep 02, 2018 1:13 am

So fasttrack gives full 10G on the CCR1009 even with a single TCP connection. I'm still a bit disappointed, though; I'd hoped I wouldn't have to use fasttrack, and I'm still a little concerned about fasttrack security. For now I've enabled fasttrack only between the machines with Intel X710 NICs, because only they need full 10G on a single TCP connection (mostly for storage).
Why would fasttrack be less secure than no fasttrack? The stream is only marked to be fasttracked after the firewall has looked at it, so I don't get this.
 
chechito
Forum Guru
Posts: 1740
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia
Contact:

Re: CCR1009 - low single tcp tunnel performance?

Sun Sep 02, 2018 2:18 am

The question about the PCI Express x16 slot is because, on those platforms, it's the only slot almost guaranteed to be connected directly to the CPU on most motherboards.
 
lapsio
Member
Topic Author
Posts: 472
Joined: Wed Feb 24, 2016 5:19 pm

Re: CCR1009 - low single tcp tunnel performance?

Sun Sep 02, 2018 2:51 am

Why would fasttrack be less secure than no fasttrack? The stream is only marked to be fasttracked after the firewall has looked at it, so I don't get this.
In the filter chain, yes, but there are plenty of caveats, for example the mangle chain and packet marking. IIRC fasttracked packets aren't processed on a per-packet basis in the mangle chain, so I think it could cause issues with PBR based on packet marks. For example, if you want to route all management traffic via a separate gateway, I'm afraid fasttrack would result in connections leaking to the primary gateway. The basic concept is that fasttrack bypasses the firewall under certain conditions, so one bad move and you may be in trouble.
MTCNA, MTCRE, MTCINE
 
Paternot
Long time Member
Posts: 607
Joined: Thu Jun 02, 2016 4:01 am
Location: Niterói / Brazil

Re: CCR1009 - low single tcp tunnel performance?

Sun Sep 02, 2018 8:35 pm

Why would fasttrack be less secure than no fasttrack? The stream is only marked to be fasttracked after the firewall has looked at it, so I don't get this.
In the filter chain, yes, but there are plenty of caveats, for example the mangle chain and packet marking. IIRC fasttracked packets aren't processed on a per-packet basis in the mangle chain, so I think it could cause issues with PBR based on packet marks. For example, if you want to route all management traffic via a separate gateway, I'm afraid fasttrack would result in connections leaking to the primary gateway. The basic concept is that fasttrack bypasses the firewall under certain conditions, so one bad move and you may be in trouble.
Yes, it bypasses the firewall, but only after the first packets of the connection have been processed by it. I use the mangle chain to divide traffic between two WANs, and fasttrack doesn't seem to cause problems with it. It's true that some things don't work with fasttrack, but that's more a nuisance than a security problem.
 
lapsio
Member
Topic Author
Posts: 472
Joined: Wed Feb 24, 2016 5:19 pm

Re: CCR1009 - low single tcp tunnel performance?

Sun Sep 02, 2018 9:36 pm

I use the mangle chain to divide traffic between two WANs, and fasttrack doesn't seem to cause problems with it.
I thought routing-mark was per-packet, not per-connection. If you assign the routing mark at the connection level, will it persist and be taken into account by the routing rules?
MTCNA, MTCRE, MTCINE
 
Paternot
Long time Member
Posts: 607
Joined: Thu Jun 02, 2016 4:01 am
Location: Niterói / Brazil

Re: CCR1009 - low single tcp tunnel performance?

Mon Sep 03, 2018 4:29 am

I use the mangle chain to divide traffic between two WANs, and fasttrack doesn't seem to cause problems with it.
I thought routing-mark was per-packet, not per-connection. If you assign the routing mark at the connection level, will it persist and be taken into account by the routing rules?
I believe it's per connection. It's working for me, and the option is "mark connection". True, it could use a per-packet mark to do it, but I believe it doesn't.
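The approach being described could be sketched like this. This is a hypothetical dual-WAN mangle setup, not my actual config: interface names, mark names, and the PCC split are all placeholders. The connection is marked once on its first packet, and the routing-mark is derived from the connection-mark:

```
# Mark each new connection once (first packet only), then derive the
# routing-mark from the connection-mark; all names are placeholders.
/ip firewall mangle
add chain=prerouting connection-state=new in-interface=lan \
    per-connection-classifier=both-addresses:2/0 \
    action=mark-connection new-connection-mark=wan1-conn
add chain=prerouting connection-state=new in-interface=lan \
    per-connection-classifier=both-addresses:2/1 \
    action=mark-connection new-connection-mark=wan2-conn
add chain=prerouting connection-mark=wan1-conn action=mark-routing \
    new-routing-mark=via-wan1 passthrough=no
add chain=prerouting connection-mark=wan2-conn action=mark-routing \
    new-routing-mark=via-wan2 passthrough=no
```

Since fasttrack only lets the first packets of a connection traverse mangle, a per-connection mark set on the first packet survives, while a rule that relied on re-marking every packet would not: which matches the behavior discussed above.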
