syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

question on tunnel performance and getting past single core limits

Mon Apr 15, 2024 7:27 pm

Hi all, I'm looking to do some tunneling, but my testing keeps hitting limitations. The target is 2Gbps aggregate, i.e. 1Gbps 'full duplex'.

First, note that I'm using the built-in bandwidth test, so results would be a bit higher if the router were only handling the tunnel; but the RB5009 can do 5Gbps on the TCP test and 8.7Gbps on UDP, so the testing isn't going to steal more than 10-20% of the performance.
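
For reference, the tests are run roughly like this from the terminal (the address and credentials are placeholders for the far-end router):

/tool bandwidth-test address=10.0.0.2 protocol=udp direction=both user=admin password=""
/tool bandwidth-test address=10.0.0.2 protocol=tcp direction=both user=admin password=""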

First issue: I need fragmentation handled seamlessly and transparently. The connections run over 1500-MTU commodity services and I need to pass full frames. I also don't need encryption, since the data being carried is already encrypted.

My test boxes and target platform are RB5009s, though a CCR2xxx could also be used; one of the modern arm64 devices, in any case. For my tests I'm using the 10G SFP+ port, so that's not a limitation.

I can get about 1Gbps aggregate in a bidirectional UDP test out of ZeroTier, about half my target. It is multi-threaded, but uses 100% of 4 CPUs to get there. I tried on a CCR2116 and got similar results; it looks like it won't scale past 2 cores per direction. One direction maxes out 2 cores at ~418-506Mbps.

Next up is WireGuard. WireGuard can do about 1.5Gbps aggregate with 1600-byte UDP packets (i.e., everything fragmented) and about 1Gbps one way on the same test. The problem is that it doesn't do layer 2, and stacking something on top takes this down to about ZeroTier level. It does appear to be multi-threaded, pushing up CPU usage on all 4 cores.
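
That's the same built-in test, just with oversized UDP payloads so every packet fragments; roughly this, with the address a placeholder for the far end of the tunnel:

/tool bandwidth-test address=10.255.0.2 protocol=udp direction=both local-udp-tx-size=1600 remote-udp-tx-size=1600 user=admin password=""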

L2TP is fast with encryption off, but it won't get out of a single core. UDP encapsulation hits 4 cores, but decapsulation only hits one: that's 4x 50% usage on the sending side and 1x 96% on the receiving side. TCP seems to be limited to 1 core on the sending side too. So that means a 700-800Mbps speed limit, and I just don't know if there's a reasonable way around it. L2TP can't go into a bonding interface, which would have a CPU cost anyway.
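
For what it's worth, the L2TP leg is nothing exotic; roughly this, with addresses and credentials as placeholders:

# server side
/interface l2tp-server server set enabled=yes
/ppp secret add name=tun1 password=test local-address=10.254.0.1 remote-address=10.254.0.2
# client side, no IPsec since the payload is already encrypted
/interface l2tp-client add connect-to=203.0.113.1 user=tun1 password=test use-ipsec=no disabled=no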

GRE seems to do much of what L2TP does. However, push much over 1Gbps through GRE and it quits; I have to wait 20 seconds or so for the tunnel to re-establish. The sending side sits at 4x 50% usage.

IPIP is the same as GRE in that I can't get much more than 1Gbps over it, but it does lean a bit closer to 1Gbps one way and 1.6Gbps aggregate (again, this is with fragmentation). IPIP appears to be single-threaded.
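
The GRE and IPIP legs were plain point-to-point tunnels, along these lines (addresses are placeholders):

/interface gre add name=gre1 local-address=192.0.2.1 remote-address=203.0.113.1
/interface ipip add name=ipip1 local-address=192.0.2.1 remote-address=203.0.113.1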

Any ideas here? There aren't a lot of options for faster boxes, especially when single-threading is part of the deal. Ideally I want to be layer 2 and use the MikroTik to bridge the tunnel to an ethernet port, so the IP-only options aren't ideal. WireGuard seems to extract the most from the hardware, but it needs a layer-2 tunnel over the top.

Thanks.
 
rplant
Long time Member
Posts: 659
Joined: Fri Sep 29, 2017 11:42 am

Re: question on tunnel performance and getting past single core limits

Tue Apr 16, 2024 3:04 am

I have seen posters having success using VXLAN; apparently it runs multi-core and supports fast path.

If the above works, for later experimentation: you could have the encrypted traffic in the internal network(s) use bigger-than-1500-byte packets (say 2800+) and let them be fragmented over the WAN with a relatively small loss of efficiency.
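
A minimal unicast VXLAN setup would be something along these lines (VNI and remote address are placeholders):

/interface vxlan add name=vxlan1 vni=10
/interface vxlan vteps add interface=vxlan1 remote-ip=203.0.113.1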
 
syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

Re: question on tunnel performance and getting past single core limits

Tue Apr 16, 2024 4:07 am

Unfortunately VXLAN doesn't handle fragmentation. It is fast though: it can push ~5Gbps UDP and >3.6Gbps TCP on an RB5009, though I think those numbers are being influenced by the bandwidth test quite a bit...

And VXLAN over WireGuard works but maxes out the CPU. I still have my testbed up, so I set up the following:

vxlan@1600MTU > wireguard@2800MTU > SFP+@1500MTU > wireguard@2800MTU > vxlan@1600MTU

I'm able to ping with 1550-byte packets, so WireGuard is handling fragmentation and passing the oversized VXLAN packets through just fine. However, WireGuard's CPU use shows up and limits things to around 800Mbps one way and 600-700Mbps both ways on UDP.
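
Roughly what each side of the testbed looks like, if anyone wants to reproduce it (keys, addresses and names are placeholders):

/interface wireguard add name=wg1 listen-port=13231 mtu=2800
/interface wireguard peers add interface=wg1 endpoint-address=203.0.113.1 endpoint-port=13231 public-key="<peer-public-key>" allowed-address=0.0.0.0/0
/ip address add address=10.255.0.1/30 interface=wg1
/interface vxlan add name=vxlan1 vni=10 mtu=1600
/interface vxlan vteps add interface=vxlan1 remote-ip=10.255.0.2
/ip address add address=192.168.77.1/24 interface=vxlan1
# oversized ping across the vxlan to confirm fragmentation end to end
/ping 192.168.77.2 size=1550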

As far as handling this in the encryption layer, sure, but MikroTik doesn't appear to support much in the way of expanded MTU here. Using l2tp or ipip with IPsec, the tunnel itself is still doing the anti-fragmentation. Setting a transport-mode policy doesn't allow for any sort of MTU configuration, and tunnels set with high MTUs still throw a 'packet too large' error.
 
rplant
Long time Member
Posts: 659
Joined: Fri Sep 29, 2017 11:42 am

Re: question on tunnel performance and getting past single core limits

Tue Apr 16, 2024 9:15 am

vxlan has an option to override/force allow fragmentation. It might need working pmtu to work correctly?
 
syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 12:22 am

vxlan has an option to override/force allow fragmentation. It might need working pmtu to work correctly?
where are you seeing this configuration?
 
syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 12:30 am

I'm not finding anything allowing for packet re-assembly on MikroTik. I know other platforms can do this, but it seems unimplemented on MikroTik.
 
syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 12:46 am

Excuse the spam here, but I put the vxlan interfaces into a bridge, with the vxlan interfaces at 2800 MTU and the bridge set to 2800, and it works. With IPs on the vxlan interfaces, oversized packets would not pass, but in a bridge they do. Confirmed: 1600-MTU UDP speed tests cross the bridge on an SFP+ port set to 1514 MTU.
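
Roughly what I did (names are placeholders):

/interface vxlan set vxlan1 mtu=2800
/interface bridge add name=br-tun mtu=2800
/interface bridge port add bridge=br-tun interface=vxlan1
/interface bridge port add bridge=br-tun interface=ether2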

However, this introduces an issue where the tunnel can't handle more than about 1Gbps aggregate UDP. Small pings still cross the vxlan interface, but oversized pings don't; it's almost like I'm crashing the fragmenter/defragmenter. I can get 1.4Gbps on TCP and that seems to be the limit, at about 60-65% CPU on 4 cores. It seems like there's headroom, but that's all it will do. Bidirectional is the same: 1.4Gbps seems to be the limit.

If I raise the SFP+ MTU, the UDP test goes up to 2.8Gbps and doesn't drop out, so I think there's a severe penalty for this fragmentation.
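
That is, lifting the wire MTU so the encapsulated packets fit without fragmenting; on the RB5009 that's something like:

/interface ethernet set sfp-sfpplus1 mtu=2800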
 
syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 1:19 am

I just re-ran this experiment on the other tunnels. This is definitely the fragmentation code freaking out and performing very slowly. I submitted a bug report about my 'crashes', where it would stop handling fragmented packets for a burst. I think this basically answers my question: this hardware can't really do what I'm asking of it with the fragmentation.
 
rplant
Long time Member
Posts: 659
Joined: Fri Sep 29, 2017 11:42 am

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 2:56 am

I think at this point you know far more than me about vxlan and fragmentation.
But anyway...
where are you seeing this configuration?
dont-fragment (disabled | enabled | inherit; Default: disabled)
(default disabled looks correct)
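
If you want to set it explicitly (interface name is a placeholder):

/interface vxlan set vxlan1 dont-fragment=disabled
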
I'm not finding anything allowing for packet re-assembly on MikroTik. I know other platforms can do this, but it seems unimplemented on MikroTik.
Sorry, I don't know of anything either, but more than likely there would be NAT involved in the 'real world', and reassembly would happen then.
 
Amm0
Forum Guru
Posts: 4965
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 5:56 am

I suppose you could try the old /ip/packing, as that lets you set an aggregated size. It's old as dirt, but if "packing" smaller packets into a bigger one is the goal, worth a look/try:
https://help.mikrotik.com/docs/display/ROS/IP+packing
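
A minimal sketch, assuming both ends are MikroTik (M3P only works between RouterOS devices; interface and size are placeholders):

/ip packing add interface=ether1 packing=simple unpacking=simple aggregated-size=1500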
 
syadnom
Forum Veteran
Topic Author
Posts: 849
Joined: Thu Jan 27, 2011 7:29 am

Re: question on tunnel performance and getting past single core limits

Thu Apr 18, 2024 9:06 am

IP packing is backwards for my case: I have big packets that need to fit into small ones.