Very Slow L2TP between multiple CHR and RB750Gr3

Howdy,

I will start by saying I am pretty sure the issue is with CHR, not the RB750Gr3..

I have CHR installed as VMs in New York and Seattle, at different companies. Both have full BGP configured with different /24s. Both have 2 vCPUs; one has 2 GB RAM, the other 4 GB.

Both have a bridge interface configured with the .1 IP from the /24, and route a .64/27 subnet to the RB750Gr3 over L2TP.

It is extremely slow.. https://www.speedtest.net/result/c/69882aed-9c1f-4d20-8197-a61d46aa13a4

I am not at the location of the RB750Gr3.. I have VMs installed behind it doing the speedtests though.

While the speedtest is running from one of the IPs in the /27, the latency when pinging from my workstation goes up considerably..

Idle:
CHR’s .1 IP 68-71ms
RB750Gr3 .65 IP 107-112ms
VM’s .75 IP 111-114ms

Running Ookla’s Speedtest
CHR 500-900ms
RB750Gr3 1200-1600ms or timeout
VM 1200-1600ms or timeout

The non-BGP-advertised IP never changes, always 68-71 ms.

Same thing happens on two different VM hosts, but the latency numbers are a little different, as expected.

What am I missing? Any ideas?

Is that bare L2TP or L2TP/IPsec?

I tried both, first bare L2TP then added IPSec to test.

You are not alone in suffering from this, but in other cases, the "with" and "without" IPsec runs yield different results.

It would be interesting to sniff the traffic on the CHR to see what the processing time of the packets is, whether they are fragmented, etc. But it would require filtering by both the IP of the L2TP client and the IP of the speedtest server, to catch all three appearances of the request packets from the client (the L2TP transport one, the payload one emerging from the L2TP virtual interface, and the payload one leaving src-nated towards the server) and all three appearances of the responses. You cannot filter by ports, because port numbers are only present in the first fragment of fragmented packets.
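A sniffer setup along those lines could look like the following; the two addresses are placeholders for the L2TP client's public IP and the speedtest server's IP, and the file name is arbitrary:

```
# Capture everything to/from either address; "or" so a match on
# either filter entry is enough (catches transport and payload packets).
/tool sniffer set filter-ip-address=203.0.113.10/32,198.51.100.20/32 \
    filter-operator-between-entries=or file-name=l2tp-debug file-limit=51200KiB
/tool sniffer start
# ... run the speedtest, then:
/tool sniffer stop
```

The resulting l2tp-debug file can be downloaded from Files and opened in Wireshark to compare timestamps of the same packet at each of its appearances.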

Do the test results change when you use a mangle rule to force MSS to, say, 1300 for traffic both to and from the L2TP tunnel? (I know that ping already shows long RTT and no MSS rule can affect that, but the effect on TCP would be interesting anyway.)
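For reference, an MSS clamp of that kind would be a pair of mangle rules like the following (the tunnel interface name l2tp-out1 is an assumption; adjust to yours):

```
# Clamp MSS on TCP SYNs in both directions through the tunnel,
# only touching packets whose MSS is already above 1300.
/ip firewall mangle
add chain=forward out-interface=l2tp-out1 protocol=tcp tcp-flags=syn \
    tcp-mss=1301-65535 action=change-mss new-mss=1300
add chain=forward in-interface=l2tp-out1 protocol=tcp tcp-flags=syn \
    tcp-mss=1301-65535 action=change-mss new-mss=1300
```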

I considered the with/without IPsec difference when I saw

encoding: “MPPE128 stateless”

I thought it was taxing the CPU, so I turned IPsec on to check.

I set the VM's interface MTU to 1400 to 'test' MTU/MSS, with and without the MRU set at 1600. I'll try 1300 as well later today.
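For anyone following along, the tunnel-side values can also be set on the L2TP client interface itself (interface name assumed here):

```
# Lower both the MTU and MRU negotiated on the tunnel.
/interface l2tp-client set l2tp-out1 max-mtu=1400 max-mru=1400
```

Lowering the MTU on the VM only affects what the VM sends; setting it on the tunnel interface affects every packet entering it from either side.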

The part I find most “interesting” is that the .1 IP on the CHR router also takes a hit while the speedtest is running.

I will attempt the packet captures. The RB750Gr3 does have other ‘real’ traffic on it, but the two CHRs are just for getting the /27 routed over L2TP (or another tunnel) to multiple endpoints.

IPsec maxing out the CPU, usually? Yes, I ran into that when I was still using the RB2011.. Don't have those anymore.. Hardware-accelerated IPsec devices only.


I will also add that there is no NAT'd traffic involved. I set up RFC1918 IPs for the L2TP interfaces, but they are used only for that. I'm routing a subnet of the /24 I am BGP-advertising with the CHR routers to the RB750Gr3. The RB750Gr3 has some NAT'd traffic, but it is for the 'live' network, using different interfaces and public IPs.

On the CHR routers, 'IP - Firewall' has one raw rule to drop an address list, which currently has 3 IPs in it (just a few IPs that were filling my logs with SSH and/or WinBox connection attempts).
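The rule in question is of this form (the address-list name is an assumption):

```
# Drop anything from the listed sources before connection tracking sees it.
/ip firewall raw
add chain=prerouting src-address-list=blocklist action=drop
```

Since raw rules fire before connection tracking, a short list like this should have negligible CPU cost.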

KISS principle

@kevinds - We have given up trying to get the L2TP virtual LAN to run at full WAN link speed. @Sindy is right about not being alone. However, it works and it is stable if you are not expecting to run at 1 Gb/sec speeds. Let's hope the situation will improve with the new hardware or v7.

I'd be content with 100 Mbps and very happy with 500 Mbps.. But I'm seeing less than 1 Mbps and ~25% packet loss.

I don't understand why pinging the bridge interface with the .1 from the BGP-advertised /24 takes a crap when there is traffic over the L2TP to the /27. Should I be changing to straight GRE tunnels?
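For reference, the straight-GRE alternative I'm considering would be along these lines on the CHR side (all addresses here are placeholders for the real public IPs and the routed /27):

```
# GRE tunnel to the RB750Gr3, with a small transfer net on top
# and a static route pushing the /27 through it.
/interface gre add name=gre-to-rb750 local-address=198.51.100.1 \
    remote-address=203.0.113.10
/ip address add address=10.255.0.1/30 interface=gre-to-rb750
/ip route add dst-address=192.0.2.64/27 gateway=10.255.0.2
```

The RB750Gr3 end mirrors this with the addresses swapped. Unlike L2TP, GRE has no per-session control channel, which is partly why I want to compare the two.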

GRE tunnel made a difference, but I still wasn't expecting these results..

https://www.speedtest.net/result/c/711117e2-fa3d-485b-b8ff-a3f9821b24a3

Download speed is still terrible, but upload is much, much better…

Latency jumps during both the upload and download tests, but during the upload test it only increases by ~100 ms. During the download test the latency jumps 3000+ ms.

This doesn’t make sense to me.

Beginning to think it is the CHR and the host.. Changed to Vultr's 2-CPU High-Compute (Seattle) and things are slightly better, but still terrible.

Downloading an ISO file: 105 KB/s.. And after the download starts, latency jumps on all four IP addresses; instead of 70 ms on the VPS IP, I'm seeing 300.

Might try a traditional RouterOS install next, just to see if it makes a difference. The traditional install didn't work, though: the long-term ISO wouldn't boot, and the beta ISO would boot/install but the system wouldn't boot afterwards.

Download is still less than 1 Mbps on the speedtests, but upload is showing ~25 Mbps.