VPN Connectivity: Severely Degraded Throughput

Issue:
Throughput over VPN connections between RouterOS devices is severely degraded.
This does not appear to be related to the hardware model used and appears to be an issue with the OS itself.

I have extended the troubleshooting beyond MikroTik RouterOS on both ends,
and have created connections to pfSense and Cisco ASA firewalls at one end with the same results
(i.e. Mikrotik OS <> pfSense IPSEC VPN || Mikrotik OS <> Cisco ASA IPSEC VPN).

Fix:
Unknown

Troubleshooting / Connectivity Overview:

SITE A (Primary Data Center)

Routerboard: CCR-1036-8G-2S+
RouterOS: v6.37
Internet Connection:

  • Download: 10 Gbps
  • Upload: 10 Gbps

SITE B (Corporate Office)

Routerboard: RB3011UiAS
RouterOS: v6.37
Internet Connection:

  • Download: 100 Mbps
  • Upload: 10 Mbps

SITE C (Secondary Data Center)

Routerboard: x86 - Dual Core 3.0GHz
RouterOS: v6.37
Internet Connection:

  • Download: 1 Gbps
  • Upload: 1 Gbps


    VPN Connectivity Tests

SITE B <> SITE A
Tunnel Type: None - Direct Public to Public
Max Throughput: 100 Mbps / 10 Mbps

SITE B <> SITE A
Tunnel Type: IP-IP
Max Throughput: 32-35 Mbps

SITE B <> SITE A
Tunnel Type: GRE
Max Throughput: 32-35 Mbps

SITE B <> SITE A
Tunnel Type: IPSEC AES-256/SHA1
Max Throughput: 28 Mbps

SITE C <> SITE A
Tunnel Type: None - Direct Public to Public
Max Throughput: 1 Gbps / 1 Gbps

SITE C <> SITE A
Tunnel Type: IP-IP
Max Throughput: 32-35 Mbps

SITE C <> SITE A
Tunnel Type: GRE
Max Throughput: 32-35 Mbps

SITE C <> SITE A
Tunnel Type: IPSEC AES-256/SHA1
Max Throughput: 28 Mbps

SITE B <> SITE C
Tunnel Type: None - Direct Public to Public
Max Throughput: 100 Mbps / 10 Mbps

SITE B <> SITE C
Tunnel Type: IP-IP
Max Throughput: 32-35 Mbps

SITE B <> SITE C
Tunnel Type: GRE
Max Throughput: 32-35 Mbps

SITE B <> SITE C
Tunnel Type: IPSEC AES-256/SHA1
Max Throughput: 28 Mbps

Mikrotik Support Questions

Q: How do you perform these tests?

  • Mikrotik Bandwidth Test Tool
  • Public Speedtest Server (HTTPS)
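
For reference, the same test can be launched from the RouterOS CLI; this is a generic sketch, where the address and credentials are placeholders rather than values from this setup:

```
# TCP bandwidth test against the remote router's btest server.
# 10.10.200.1, "admin" and "secret" are placeholders.
/tool bandwidth-test address=10.10.200.1 protocol=tcp direction=both \
    user=admin password=secret
```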

Q: What protocol do you use?

Q: Packet fragmentation?

  • Under the interface configuration options, I have tried both “inherit” and “no”
    for the “don’t fragment” value.
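
For context, that setting lives on the tunnel interface itself; a sketch, with the interface name as a placeholder:

```
# Send tunnel packets without the DF bit so they can be fragmented
# in transit; "ipip-tunnel1" is a placeholder interface name.
/interface ipip set ipip-tunnel1 dont-fragment=no
```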

Q: What kind of CPU usage do you see under System/Resource/CPU menu?

  • Overall CPU usage is less than 10% on either end.
  • One core on the sending router is maxed at 100% by the btest process.
  • When testing against a public speed test site (HTTP), there are no noticeable CPU spikes or limitations.

Q: What consumes most of the resources under Tool/Profile during these throughput tests?
Sending Router

  • process = btest
  • cpu = 1 core @ 100% while testing.
  • same process and cpu usage when testing public IP to public IP without a tunnel.

Receiving Router

  • process = btest
  • cpu = 0-3% not related to btest
  • same process and cpu usage when testing public IP to public IP without a tunnel.

There are many posts here on that.

Running btest directly on the router(s) you’re testing will not give proper test results.

It could be an issue with the OS you are using to test it…
The setup you describe will re-order the sequence of packets due to multicore processing.
Some OSes are less capable of handling that than others.
It may also be more of a problem during testing/benchmarking than in real usage by many users.
There are many discussions about this topic on the forum; you could have investigated a bit more before posting.
Here is a summary/status posting, but please also search for others: http://forum.mikrotik.com/t/is-re-ordering-fixed-yet-with-ipsec-and-hardware-acceleration-updating-thread/101814/1

Thank you for your input, however, you are jumping to the conclusion that I did no searching of the forums for previous discussions. I did many hours of research before posting and verified all of my testing thoroughly.

This has nothing to do with the “OS” that is being used to perform the testing. When trying to do a file transfer from the primary site to the remote site, whether via Windows Samba, Linux SCP, SFTP, or any other method of file transfer, the throughput tops out at 32 Mbps.

When I then tested with the btest tool integrated into RouterOS, I got the same results.
When I tested with the btest tool outside of RouterOS (on my Windows workstation), I got the same results.

@pe1chl

The other point that I would like to direct you to is that I’m not relying on IPSEC for the testing. This is affecting GRE and IP-IP tunnels as well, WITHOUT encryption. 99.9% of the other topics revolve around IPSEC encryption and the performance degradation within the hardware encryption.

You must be making a mistake somewhere. There is the known issue of packet re-ordering but there is
no issue with plain GRE or other simple tunneling on those powerful routers.
(it is not RB750s or similar)

Maybe you are relying on PMTU discovery on a platform where it does not work very well?
I have seen, for example, that the CDN used by MikroTik for firmware updates suffers from PMTU inefficiency.
When the MSS in a TCP connection is not clamped to the MTU, the server sends too-big packets, gets an
ICMP “fragmentation required”, reduces the segment size, but very soon increases the segment size
again, and lots of traffic is wasted on too-large segments. When MSS is properly clamped, it works OK.
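
On RouterOS, MSS clamping is typically done with mangle rules along these lines; this is a generic sketch where the interface name is a placeholder, and `new-mss=clamp-to-pmtu` derives the MSS from the outgoing interface MTU:

```
# Clamp the MSS of TCP SYNs crossing the tunnel in both directions
# so segments fit within the tunnel MTU.
/ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn \
    out-interface=ipip-tunnel1 action=change-mss new-mss=clamp-to-pmtu
/ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn \
    in-interface=ipip-tunnel1 action=change-mss new-mss=clamp-to-pmtu
```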

Is the PMTU discovery on the physical interface configuration or the GRE tunnel configuration?
Attached is a screenshot of my IP-IP tunnel currently in place, which only gets 32-35Mbps throughput.
ORL-FL Primary Router - WinBox v6.37 on CCR1036-8G-2.png

This is one of the worst responses I have ever received from a support group regarding their own product, and it proves that MikroTik support either 1) did not read ANY of my support request information or 2) does not care to support their product.

Just based on this lack of concern for their customers, I will no longer be purchasing MikroTik products or software.

On 10/18/2016 02:17 AM, Emils [MikroTik Support] wrote:

Hello,

This is an issue with the Bandwidth test, which is a single-threaded process, meaning the speed test result is limited to single-core processing speed.

Best regards,
Emils



This just proves the stupidity in their response.

BTEST ON SITE B ROUTER – FROM SITE A ROUTER {PUBLIC NETWORK}
Screenshot from 2016-10-18 10-16-53.png
BTEST ON SITE B ROUTER – FROM SITE A ROUTER {IP-IP VPN TUNNEL}
Screenshot from 2016-10-18 10-23-22.png

What results do you get when you run a UDP bandwidth test with, let's say, 1300-byte packets?

By the numbers you get, I would say that tunnel traffic is queued somewhere in the provider's network. Can you connect the routers directly and see if you still get the same low speed (just to eliminate the possibility that the ISP is queuing tunnel traffic)?
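
Such a test could look like the following sketch (the address and credentials are placeholders); a 1300-byte payload stays comfortably below a typical 1400-1480-byte tunnel MTU, which helps separate fragmentation effects from queuing:

```
# UDP transmit test with a fixed 1300-byte packet size.
/tool bandwidth-test address=10.10.200.1 protocol=udp direction=transmit \
    local-udp-tx-size=1300 user=admin password=secret
```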

Based on the discussion that pe1chl and I started down, this does smell like an MTU issue; however, I’ve not had any luck getting better performance when setting up a mangle rule on both ends of the tunnel to clamp the MSS at 1400. But setting the UDP packet size to 1300 does reflect better throughput on the tunnel.
2016-10-18 10_56_52-admin@10.10.100.1 (fw-maitland-01.corp.atlantic.net) - WinBox v6.37 on RB3011UiA.png
2016-10-18 10_58_37-admin@10.10.100.1 (fw-maitland-01.corp.atlantic.net) - WinBox v6.37 on RB3011UiA.png

>ping 209.xxx.xxx.172 -f -l 1473
Pinging 209.xxx.xxx.172 with 1473 bytes of data:
Packet needs to be fragmented but DF set.

Ping statistics for 209.xxx.xxx.172:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),

>ping 209.xxx.xxx.172 -f -l 1472
Pinging 209.xxx.xxx.172 with 1472 bytes of data:
Reply from 209.xxx.xxx.172: bytes=1472 time=12ms TTL=54
Reply from 209.xxx.xxx.172: bytes=1472 time=13ms TTL=54
Reply from 209.xxx.xxx.172: bytes=1472 time=22ms TTL=54
Reply from 209.xxx.xxx.172: bytes=1472 time=13ms TTL=54

Ping statistics for 209.xxx.xxx.172:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 12ms, Maximum = 22ms, Average = 15ms

> ping 10.10.200.1 -f -l 1453
Pinging 10.10.200.1 with 1453 bytes of data:
Packet needs to be fragmented but DF set.

>ping 10.10.200.1 -f -l 1452
Pinging 10.10.200.1 with 1452 bytes of data:
Reply from 10.10.200.1: bytes=1452 time=14ms TTL=63
Reply from 10.10.200.1: bytes=1452 time=13ms TTL=63
Reply from 10.10.200.1: bytes=1452 time=12ms TTL=63

Ping statistics for 10.10.200.1:
    Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 12ms, Maximum = 14ms, Average = 13ms
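
The arithmetic in these probes is consistent: Windows ping’s -l option sets the ICMP payload size, so 1472 + 28 bytes of IP/ICMP headers = 1500 on the WAN path, and 1452 + 28 = 1480 inside the tunnel, which is exactly the 1500-byte MTU minus the 20-byte outer IP header that IP-IP adds. A sketch of the equivalent probe from RouterOS (the address is a placeholder; note that RouterOS counts the full packet size, headers included):

```
# Full IP packet size of 1480 = 1452-byte payload + 28 bytes of headers.
/ping 10.10.200.1 size=1480 do-not-fragment
```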

Both ends of the tunnel are set up with an MSS clamp; I am still only getting 32-35 Mbps throughput. I have tried values from 1300 to the current 1452.

/ip firewall mangle
add action=change-mss chain=forward new-mss=1452 out-interface=ipip-MATILAND__ORL-FL passthrough=no protocol=tcp tcp-flags=syn tcp-mss=1453-65535
add action=change-mss chain=forward in-interface=ipip-MATILAND__ORL-FL new-mss=1452 passthrough=no protocol=tcp tcp-flags=syn tcp-mss=1453-65535

I’ve got the same issue. The problem appears when transferring TCP over long distances (a long fat network situation).
If RTT is low, there is no problem!
Testbed is:

Physical connections are copper (no SFP modules are used)
Windows/Linux PC => |3011UiAS| => GRE/PPTP/L2TP (or any other tunnel) => |CCR1009-8G-1S-1S+|
Link speeds:
3011UiAS: 100/100 Mbit WAN
CCR1009-8G-1S-1S+: 500/500 Mbit WAN
Firmware and RouterOS (6.37.1) are the latest; after the upgrade they were reset to defaults

Btest results for: TCP send, 1 connection

Through the tunnel (using their private tunnel IPs):
3011 => CCR is 90+ Mbit |OK|
CCR => 3011 is 90+ Mbit |OK|
PC => CCR is 90+ Mbit |OK|
CCR => PC is 90+ Mbit |OK|

Then let’s test against the public MikroTik bandwidth-test server, 207.32.195.2 (PUB), at 200+ ms latency:
CCR => PUB is 60-70 Mbit |OK|, and it’s linearly scalable with connection count
3011 => tunnel => CCR => PUB is 10-20 Mbit and the graph is a sawtooth
PC => 3011 => tunnel => CCR => PUB looks the same as the previous

UDP flows at 90+ Mbit in every test; only TCP has the issue.
I’ve tried all sorts of MTU/MSS combinations.
I’ve tried all ways of steering the traffic from the PC to the CCR: all kinds of tunnels, FastTrack, mangle with routing marks, a simple manual route in Routes, an EoIP tunnel to use the CCR as the gateway in the PC config, and so on…
I’ve tried replacing the CCR in the datacenter with a Linux server; same low speed through VPNs/tunnels from the 3011 to high-latency sites.

The only way to get good results is to turn on the SOCKS server on the CCR and work through it with Proxifier.
In that case I get 70+ Mbit in the last scenario.
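
For reference, the workaround described above only needs the built-in SOCKS server enabled on the CCR; a minimal sketch (the port is an assumption, and access rules should be added in practice):

```
# Enable the RouterOS SOCKS proxy; the CCR then originates its own TCP
# connection to the distant host, so the high-RTT leg uses the router's
# TCP stack instead of the client's connection through the tunnel.
/ip socks set enabled=yes port=1080
```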

Please, could you comment on this strange situation?

By your tests and results, it’s now clear that either the ISPs or a router in the datacenter is throttling TCP speeds.

Do a test between two routers connected to your ISP; if speeds are fine, then by elimination it is the NOC you must complain to.

No problem with the connection between my CCR and 3011; TCP flows between them with no problem.
When I connect my PC directly (physically, to one of its ether ports) to the CCR in the datacenter, it shows good results to the PUB test server (high RTT, 200+ ms).
But if I connect through some kind of tunnel to the CCR, speed is bad to high-RTT sites.
If I test through the tunnel to a low-RTT (<30 ms) site, I see no difference in speed.
The other way to get full speed from the CCR is to use it as a SOCKS proxy through the tunnel. Then I get almost the same speed from my PC to the PUB test server.

P.S. It looks like MikroTik does all the TCP magic (window size, etc.) based only on the connection between the two tunnel endpoints, not the connection to the END host.
If the END host is far enough away to need many more “packets in flight”, then the problems begin.
A long fat network situation.
And again: if I use the SOCKS proxy on the MikroTik, I achieve good speeds.
PC[socks] => 3011 => tunnel => CCR[socks] => PUB[200+ ms RTT]: 60-70 Mbit |OK|
3011 => tunnel => CCR => PUB[200+ ms RTT]: 5-10 Mbit
PC => 3011 => tunnel => CCR => PUB[200+ ms RTT]: 5-10 Mbit

Just wondering, are you seeing any difference with the latest IPsec reorder fix?