GRE over IPSEC, CCR, VERY SLOW

This really sounds unbelievable to me. It’s like a car manufacturer explaining to their customers that this specific car only tops out at 15 km/h on tarmac roads, and that they’re working on the issue and will fix it “sometime” in the future. But offroad the car is fine. As if people drove their car to work “offroad” every day.

To be honest, everyone buying these CCRs is using them for TCP. This bottleneck is really critical and should be fixed ASAP. I cannot accept that I just bought a CCR for a pretty steep price and it is not able to handle TCP traffic properly over an IPsec tunnel. :frowning:

I guess your credibility is at stake here.

@mrz

I am very surprised by your explanation.
We bought several CCR models to be able to handle IPsec VPN tunnels at high speed, also because of the hardware AES support.
We build VPN networks for our customers. That’s our job.

Now you are telling us that the IPsec speed problems (which are mentioned a lot on this forum) will be fixed in the future???
Please make fast IPsec tunnel and transport handling a top priority.

We are going to consider other brands for VPN if Mikrotik is not going to improve.

It is very difficult to find other vendors selling Tile-based hardware. For now, you should try my advice of using multiple tunnels. CPUs like Tile and GPUs are something new to MikroTik, so there are bound to be difficulties in making full use of them.
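To illustrate the multiple-tunnels workaround: one way to spread load is an ECMP route across several GRE tunnels. This is only a sketch; the interface names and addresses below are placeholders, and both tunnels are assumed to already exist and terminate on the same remote router:

```routeros
# Placeholder addressing for two existing GRE tunnels
/ip address add address=10.255.0.1/30 interface=gre-tunnel1
/ip address add address=10.255.0.5/30 interface=gre-tunnel2

# ECMP route: two equal-cost gateways, one per tunnel,
# so flows are balanced across both tunnels
/ip route add dst-address=192.168.100.0/24 gateway=10.255.0.2,10.255.0.6
```

Note that ECMP balances per connection (per src/dst pair), so a single TCP stream will still be limited to one tunnel’s throughput.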

If this is really just a fragmentation-related issue, wouldn’t explicit MSS clamping solve it? Just wondering.
I personally don’t have access to any of the CCRs, unfortunately, otherwise I would have tested it myself.
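In case anyone wants to try it: on RouterOS, MSS clamping is done with a mangle rule along these lines. The interface name and MSS value here are illustrative; the MSS needs to be chosen to fit the GRE+IPsec overhead of your actual link MTU:

```routeros
# Clamp the TCP MSS on SYN packets in both directions of the tunnel
/ip firewall mangle
add chain=forward out-interface=gre-tunnel1 protocol=tcp tcp-flags=syn \
    tcp-mss=1361-65535 action=change-mss new-mss=1360
add chain=forward in-interface=gre-tunnel1 protocol=tcp tcp-flags=syn \
    tcp-mss=1361-65535 action=change-mss new-mss=1360
```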

It’s ok, these OEM users don’t seem to want to listen to people who use MikroTik hardware in their hacking activities, or even to experienced and expert users, about ways to overcome or get around the problem. With that attitude I’m just glad I’m not one of their customers for the services they offer.

Edit: looking at the history of the VLIW architecture, there have been problems making full use of it, as evident in the AMD 5xxx and 6xxx series GPUs.

Yeah. Because I really want to go from 10 tunnels to 40 just to get the throughput I need. Talk about network diagram nightmares.

It’s not “just” fragmentation. I’m using MSS clamping and throughput is still pretty poor.
I do believe it has something to do with the offloading, because I don’t see any CPU spikes on the Tilera CPU.

It looks like Mikrotik has finally improved VPN performance in CCR models. :smiley:

Here is the changelog for 6.24rc2

What's new in 6.24rc2 (2014-Dec-10 11:04):

*) fixed problem where some of ethernet cards do not work on x86;
*) improved CCR ethernet driver (less dropped packets);
*) improved queue tree parent=global performance (especially on SMP systems and CCRs);
*) eoip/eoipv6/gre/gre6/ipip/ipipv6/6to4 tunnels have improved per core balancing on CCRs;
*) fixed tx for 6to4 tunnels with unspecified dst address;
*) fixed vrrp - could sometimes not work properly because of advertising bad set of ip addresses;

My testing started at 6.33.3. While others may be reporting performance improvements over earlier versions, I find the hardware encryption quality with IPsec on the CCR1036 unsatisfactory. Testing with everything identical except setting and unsetting ipsec-secret in the GRE config (enabling and disabling encryption) shows a night-and-day difference in connection quality (TCP retransmissions, out-of-order packets, packet loss, etc.). Unfortunately, many applications are sensitive to this. For example, my SMB test results dropped by about 10x (from 250Mbps/450Mbps down/up to 25/45Mbps).

While software encryption improves the latter SMB numbers by about 4x (not showing the same retransmissions, reordering, loss, etc. as before), it also means I no longer benefit from parallel streams (now limited by a single CPU’s software encryption ability). That tells me the single core doing hardware encryption isn’t the limitation (since it still benefits from parallel streams); rather, there are quality issues as already mentioned. It has been over a year since that 6.24rc2 release. I hope that doesn’t mean work to fix this has ceased.

I also noticed in-state-sequence-errors under /ip ipsec statistics increasing when quality is poor.

Is this only a “GRE tunnel + IPsec” problem, or does “IP tunnel + IPsec” also have problems?

It consists of 3 bottlenecks:

  1. “General” RouterOS multi-core scalability, which is slowly improving from version to version.
  2. Lack of truly scalable cipher modes.
  3. The scalability of the ciphers themselves.

For example, Serpent (https://en.wikipedia.org/wiki/Serpent_(cipher)) in EAX mode (https://en.wikipedia.org/wiki/EAX_mode) was vastly more scalable in multi- and many-core environments than Rijndael/AES in CBC, XTS and GCM modes (EAX has a bit more overhead than, say, CCM, being a two-pass scheme, but it SCALES!!! and benefits from both hardware and software fine-tuning, while the more ancient modes do NOT).

AES-GCM should be enough
https://en.wikipedia.org/wiki/Galois/Counter_Mode

For “multi-core” (i.e. with stream/core counts of 4x and below), maybe.
But in many-core environments, including most Tilera chips, it both wastes a lot of resources and does NOT scale well.
There isn’t any “should be” in engineering or science.
There is only the “can do” and relevant know-how of good engineers, and the “can’t do that, Dave” excuses of bad ones.
And that’s basically all, leaving aside the non-technical aspects.
In my opinion, OCB, CWC and EAX support is as essential for performance, scalability and security as GCM support is, and for similar reasons (compared to CCM and more ancient modes).
By the way, there was also a very funny and interesting GCM fork named SGCM (https://en.wikipedia.org/wiki/Sophie_Germain_Counter_Mode), which may be handy in networking for its obvious advantages over the original GCM, and it considerably boosts security too.
But so far I think CWC is the most interesting candidate to start such work with (clearer benefits, fewer legal obstacles, documented source code, etc.).

+1 for massive packet reordering in the simple IPsec+GRE case.

The bug can easily be reproduced on 2 CCRs connected by a typical 5–15Mb WAN.
iperf3 in UDP mode reports out-of-order packets even with manual bandwidth limiting (10–30% of the actual link bandwidth):

[root@gw-sev.bigcar.local ~]# iperf3 -c 192.168.2.2 -b 1M  -u -V
iperf 3.0.11
Linux gw-sev.bigcar.local 2.6.32-573.12.1.el6.x86_64 #1 SMP Tue Dec 15 21:19:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Time: Fri, 25 Dec 2015 21:51:28 GMT
Connecting to host 192.168.2.2, port 5201
      Cookie: gw-sev.bigcar.local.1451080288.34838
[  4] local 10.1.0.7 port 45391 connected to 192.168.2.2 port 5201
Starting Test: protocol: UDP, 1 streams, 8192 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-1.00   sec   112 KBytes   917 Kbits/sec  14  
[  4]   1.00-2.00   sec   120 KBytes   983 Kbits/sec  15  
[  4]   2.00-3.00   sec   128 KBytes  1.05 Mbits/sec  16  
[  4]   3.00-4.00   sec   120 KBytes   983 Kbits/sec  15  
[  4]   4.00-5.00   sec   120 KBytes   983 Kbits/sec  15  
[  4]   5.00-6.00   sec   128 KBytes  1.05 Mbits/sec  16  
[  4]   6.00-7.00   sec   120 KBytes   983 Kbits/sec  15  
[  4]   7.00-8.00   sec   120 KBytes   983 Kbits/sec  15  
[  4]   8.00-9.00   sec   120 KBytes   983 Kbits/sec  15  
[  4]   9.00-10.00  sec   128 KBytes  1.05 Mbits/sec  16  
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec  1.19 MBytes   996 Kbits/sec  0.833 ms  42/152 (28%)  
[  4] Sent 152 datagrams
CPU Utilization: local/sender 0.4% (0.0%u/0.4%s), remote/receiver 0.0% (0.0%u/0.0%s)

(the iperf client counts reordered packets as lost)

Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.1.0.7, port 46784
[  5] local 192.168.2.2 port 5201 connected to 10.1.0.7 port 45391
iperf3: OUT OF ORDER - incoming packet = 3 and received packet = 4 AND SP = 5
iperf3: OUT OF ORDER - incoming packet = 6 and received packet = 7 AND SP = 5
....
iperf3: OUT OF ORDER - incoming packet = 148 and received packet = 149 AND SP = 5
iperf3: OUT OF ORDER - incoming packet = 151 and received packet = 152 AND SP = 5
[  5]   9.00-10.00  sec   128 KBytes  1.05 Mbits/sec  0.833 ms  5/16 (31%)  
[  5]  10.00-10.09  sec  0.00 Bytes  0.00 bits/sec  0.833 ms  0/0 (-nan%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.09  sec  1.19 MBytes   987 Kbits/sec  0.833 ms  42/152 (28%)  
[SUM]  0.0-10.1 sec  42 datagrams received out-of-order
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

And this is only 1Mbit between 2 CCRs.

IPsec itself works stably: UDP flooding over IPsec always utilises the full bandwidth. The statistics are almost clean:

[artem@mow01.r.severscan.ru] > /ip ipsec statistics print 
                  in-errors: 0
           in-buffer-errors: 0
           in-header-errors: 0
               in-no-states: 60
   in-state-protocol-errors: 416
       in-state-mode-errors: 0
   in-state-sequence-errors: 0
           in-state-expired: 0
        in-state-mismatches: 0
           in-state-invalid: 49
     in-template-mismatches: 0
             in-no-policies: 0
          in-policy-blocked: 0
           in-policy-errors: 0
                 out-errors: 0
          out-bundle-errors: 0
    out-bundle-check-errors: 0
              out-no-states: 594195
  out-state-protocol-errors: 4242
      out-state-mode-errors: 0
  out-state-sequence-errors: 0
          out-state-expired: 4242
         out-policy-blocked: 0
            out-policy-dead: 0
          out-policy-errors: 0

All routerboards are flashed with the latest firmware:

[artem@noz01.r.severscan.ru] > /system package print 
Flags: X - disabled 
 #   NAME                     VERSION                     SCHEDULED              
 0   routeros-tile            6.33                                               
 1   system                   6.33                                               
 2 X wireless-cm2             6.33                                               
 3 X ipv6                     6.33                                               
 4   wireless-fp              6.33                                               
 5   hotspot                  6.33                                               
 6   dhcp                     6.33                                               
 7   mpls                     6.33                                               
 8   routing                  6.33                                               
 9   ppp                      6.33                                               
10   security                 6.33                                               
11   advanced-tools           6.33  

[artem@noz01.r.severscan.ru] > /system routerboard print 
       routerboard: yes
             model: CCR1009-8G-1S
     serial-number: *************
     firmware-type: tilegx
  current-firmware: 3.27
  upgrade-firmware: 3.27

@artemlight Run the same setup between directly connected CCRs and see if you still have the same problem.

Sorry, but I do not have another CCR to connect it directly.

Solved for me by switching to the AES-GCM algorithm.

Could you please post a sample config with GCM? There’s clearly more to it than just changing the encryption algorithm in the proposal. I wasn’t able to establish a tunnel after updating the proposal. What else needs to be set?!

You just need to remove the auth algorithm and set the encryption algorithm to GCM.
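Roughly like this, assuming the default proposal is the one in use (adjust the key size to taste, and note the proposal must match on both peers since GCM is an AEAD mode that provides its own authentication):

```routeros
# Switch the default IPsec proposal to AES-GCM:
# auth goes to null because GCM authenticates internally
/ip ipsec proposal set [find default=yes] \
    auth-algorithms=null enc-algorithms=aes-128-gcm
```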

GRE+IPsec still slow:
http://forum.mikrotik.com/t/slow-speed-through-gre-ipsec-tunnel/128714/1