Both sites are connected to the same ISP via PPPoE (MTU 1492) on 100/100 Mbps connections, with ~4 ms latency and 3-4 hops between them. Almost ideal.
Both routers are on the same software level, 6.35.4.
L2TP tunnel MTU is at 1450.
IPsec is sha1/aes-256-cbc in phase 2.
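For reference, the setup described above corresponds roughly to the following RouterOS configuration (a sketch, not an actual export — the interface name, username and peer address are placeholders):

```
# Sketch of the described setup (RouterOS; names and the address are placeholders)
/interface l2tp-client add name=l2tp-site2 connect-to=198.51.100.1 user=site2 mtu=1450
/ip ipsec proposal set [ find name=default ] auth-algorithms=sha1 enc-algorithms=aes-256-cbc
```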
Case 1
PC1 and PC2 are physical Linux machines.
iperf (TCP) and scp transfers in either direction almost max out the link at 100 Mbps (with overhead, actual throughput on the tunnel is around 80-90 Mbps).
Case 2
PC1 is a Windows machine, PC2 is Linux.
TCP transfers from PC2->PC1 give similarly good performance, throughput around 80-90 Mbps.
TCP transfers from PC1->PC2 give barely 10-11 Mbps of throughput.
I’ve tested different Windows machines and different protocols: SMB, FTP and iperf.
I’ve also played with TCP adaptive windowing on the Windows machine, as well as TCP offloading, to no avail…
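For anyone wanting to repeat those Windows-side experiments, these are the knobs in question (commands from memory — verify them on your Windows version; run in an elevated prompt):

```
:: Disable TCP receive-window auto-tuning
netsh interface tcp set global autotuninglevel=disabled

:: Restore the default afterwards
netsh interface tcp set global autotuninglevel=normal

:: Inspect the current global TCP settings (including offload-related ones)
netsh interface tcp show global
```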
Case 3
PC1 and PC2 are Windows machines.
Throughput is around 10 Mbps in both directions with the same tests.
Case 4
PC1 and PC2 are Windows machines.
IPsec is disabled, so there is only the L2TP tunnel between the sites.
Throughput in both directions is again circa 80-90 Mbps.
I’ve compared packets in Wireshark in situations where the throughput is good and bad, and I couldn’t identify anything different.
Packets are the same size; I couldn’t identify any MTU problems.
There’s a dynamic mangle rule (from the PPPoE profile) for TCP SYN packets to keep the MSS below 1410.
I’ve also played with GRE/IPsec tunnels, a GRE MTU of 1400, DF disabled and TCP MSS clamping enabled - no change in behavior.
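The dynamic MSS rule mentioned above is roughly equivalent to this manual mangle rule (a sketch; the out-interface name is a placeholder):

```
# Manual equivalent of the dynamic MSS-clamping rule (RouterOS)
/ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn \
    out-interface=pppoe-out1 action=change-mss new-mss=1410 passthrough=yes
```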
I think it is due to the bad TCP implementation in Windows.
The setup you depict is potentially re-ordering packets, because the multicore CCR architecture operates
on individual packets on separate cores, and the processing of a later packet may finish sooner.
This should not be a problem, because IP is a datagram protocol that does not guarantee integrity or sequence
of datagrams; it is the task of the end system to re-order datagrams into the correct sequence.
However, Linux performs this task much better than Windows. Windows may ask for re-transmits of
packets immediately when it receives an out-of-sequence packet, assuming the packet in between is
lost, while in fact it will arrive immediately thereafter and no action should be taken other than re-ordering.
AES-256-CBC uses hardware “acceleration”. I put it in quotes because it seems to be coded in one thread. Change from AES-256-CBC to AES-256-CTR or Camellia-256 and try.
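Switching the cipher is a one-liner (a sketch, assuming the default phase 2 proposal is the one in use):

```
# Try CTR mode instead of CBC in the phase 2 proposal (RouterOS)
/ip ipsec proposal set [ find name=default ] enc-algorithms=aes-256-ctr
```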
PS: and better use GRE or IPIP instead of L2TP in this case…
No, it is not coded in one thread. All cores are used for encryption/decryption. Of course it will try to stay on one core as much as it can, to avoid the packet re-ordering that pe1chl mentioned, but if one core is not enough to process the amount of packets then the next core is used.
Note that a “performance measurement” is also more likely to trigger that condition than normal traffic.
Of course copying a large file may look the same as a performance measurement, but “bad results” when pushing
traffic do not necessarily mean it is always that bad.
But the fact is that AES-256-CBC is much slower for transfers in a single stream (copying via Samba, for example) compared to AES-256-CTR or Camellia-256. I’ve tested on 6.36, CCR-1016 ↔ CCR-1016: CBC gives a maximum of 4-5 Mbit, CTR about 30-35 Mbit, Camellia 35-40 Mbit for a 100 Mbit inter-office connection. 100M channels on both sides, 40 ms latency, i.e. the theoretical maximum is ~50 Mbit for the default Windows TCPWindowSize. I tried GCM but the connection failed in phase 2.
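The ~50 Mbit ceiling quoted above follows from the bandwidth-delay product: a single TCP stream cannot exceed window / RTT. A quick sanity check (the window sizes are assumptions — 64 KiB is the classic Windows default, ~256 KiB matches the quoted figure):

```shell
# Single-stream TCP throughput ceiling = window / RTT (bandwidth-delay product).
# RTT from the post above: 40 ms; window sizes are assumed values.
rtt_ms=40
for window_kib in 64 256; do
  # window in bits, divided by RTT in ms, gives kbit/s (integer arithmetic)
  kbits=$(( window_kib * 1024 * 8 / rtt_ms ))
  echo "${window_kib} KiB window over ${rtt_ms} ms RTT -> ${kbits} kbit/s"
done
```

So a 64 KiB window caps out near 13 Mbit/s at 40 ms, while ~256 KiB gives roughly 52 Mbit/s, consistent with the ~50 Mbit theoretical maximum mentioned.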
With SMB, the window size is not the only limitation; there is also the request size.
Also, when you want optimum speed it is not wise to use such heavy encryption.
Use it only in phase 1, if you consider yourself that important, and use 128-bit in phase 2 with a short lifetime.
I can confirm I’m definitely getting better throughput with sha1/aes-256-ctr (aes-128-ctr gives similar performance) on Windows machines.
I’m getting around 70 Mbps in either direction, both with Linux and Windows.
This is a huge improvement for Windows, yet it’s about 20-30% worse for Linux to Linux.
The worst thing about MikroTik is that there is basically almost no testing and verification of software releases.
I mean, after so many changes to IPsec in relation to CCRs, one would expect it to be working much better by now.
There really should be a list of test cases that are verified BEFORE the software is released,
e.g. have a CCR-to-CCR testbed and simply do some throughput testing.
Actually, with aes-256-ctr I’m getting more like 50-60 Mbps (even with multiple TCP connections!).
Win some, lose some… it’s like a game with IPsec.
Here are some UDP stats:
aes256cbc:
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 416 KByte (WARNING: requested 16.0 MByte)
------------------------------------------------------------
[ 3] local 10.1.0.102 port 5001 connected with 10.2.0.18 port 39546
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-60.0 sec 357 MBytes 50.0 Mbits/sec 0.329 ms 35/254921 (0.014%)
[ 3] 0.0-60.0 sec 67360 datagrams received out-of-order
aes256ctr:
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 416 KByte (WARNING: requested 16.0 MByte)
------------------------------------------------------------
[ 3] local 10.1.0.102 port 5001 connected with 10.2.0.18 port 40971
[ 3] 0.0-60.0 sec 356 MBytes 49.7 Mbits/sec 0.335 ms 191/253921 (0.075%)
[ 3] 0.0-60.0 sec 1740 datagrams received out-of-order
Wow… 67,000 vs 1,700.
Is there really no hope for improvement?
As for AES-CTR, I can’t figure out what the bottleneck is in this case - I’m not seeing any CPU core being overloaded.
Out-of-order packets are way lower than with AES-CBC, and there isn’t much packet loss either.
That explains (see earlier in the thread) why there is less re-ordering.
However, I think you are creating your own problem by using AES-256 instead of AES-128.
Sure, you can argue that “it is offered so it should work”, but you should be able to use AES-128 in phase 2 without much concern.
Also, I persist in my opinion that it is a Windows problem; you should report it to Microsoft. The IP layer is fully within spec
when it re-orders some of the packets.