GRE performance problems

Actually, it’s an everything performance problem – a problem that’s all over the net, but no answers… forgive the length of this – there’s a lot here, but I want to make sure I covered everything in this case…

I have a CHR Router under VMWare (7.1.3), attached to gigabit fiber. Plenty of RAM, CPU never rises above 30%. VMXNet3 drivers. On the other end, I’m using a new RB5009 router. At another location, the same CHR connects to the new HEX series. The only difference between the two paths – one has a last mile of Comcast, the other Spectrum.

  • A Mikrotik /tools/speedtest from the CHR on fiber to the HEX on Spectrum shows about TCP 120/35 down/up respectively for a 600/40 link. This is just raw transport - no tunnels

  • A Mikrotik /tools/speedtest from the CHR on fiber to the RB5009 on Comcast shows TCP 100/7 down/up for a 1Gb/40M link. Again, raw transport.

  • The CHR to Hex on Spectrum shows a UDP speedtest of 583 down/35 up


  • The CHR to RB5009 on Comcast shows a UDP speedtest of 300/9 Mbps

So far, we can see Comcast is having issues – but of course, this is Comcast, so I’m holding it wrong. Their network is never broken. But even the Spectrum link seems odd.
Now, if we add GRE, no encryption:

  • CHR to Hex on Spectrum gets about 150/35 GRE


  • CHR to 5009 on Comcast gets 80/5

I’ve adjusted the GRE MTU to 1280, enabled TCP MSS clamping, and allowed fastpath. Packet loss is around 1% max.
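For reference, the MTU and MSS arithmetic behind those settings can be sketched roughly as follows. This is only a back-of-the-envelope calculation: it assumes plain IPv4 without options, a basic 4-byte GRE header (no key or sequence fields), and a 1500-byte path MTU for the comparison, none of which has been verified on these links. The function names are made up for illustration.

```python
# Header sizes in bytes. Assumptions: IPv4 without options, TCP without
# options, GRE without the optional key/sequence/checksum fields.
IPV4_HEADER = 20
TCP_HEADER = 20
GRE_HEADER = 4


def gre_tunnel_mtu(path_mtu: int) -> int:
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return path_mtu - IPV4_HEADER - GRE_HEADER


def tcp_mss(tunnel_mtu: int) -> int:
    """Largest TCP payload whose full segment still fits in the tunnel MTU."""
    return tunnel_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(gre_tunnel_mtu(1500))  # 1476 on an assumed 1500-byte path
    print(tcp_mss(1476))         # 1436
    print(tcp_mss(1280))         # 1240 with the tunnel MTU used here
```

With the tunnel MTU at 1280 the clamped MSS works out to 1240, well under what an assumed 1500-byte path would allow, so under these assumptions fragmentation of the GRE packets should not be the cause.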

YET…

If I use any of the VPN services, NordVPN, etc. over the same routers (no GRE obviously), they get nearly link speed. So, I tried setting up Wireguard and OpenVPN, and they were even worse due to the encryption overhead! I am clearly missing something! I even tried SSTP thinking it couldn’t possibly get any worse – I was wrong.

On the wireguard issue – For a site-to-site Mikrotik wireguard tunnel, what should allowed IPs be set to for the following:

-----(Internet)---CHR(BGP)--199.181.204.0/24---CHR(WireguardR1)----tunnel-over-net--5009--199.182.204.128/26

I thought AllowedIPs on the CHR should be 199.181.204.128/26 and on the 5009 it should be 0.0.0.0/0, right? That doesn’t actually work though…

Re /tools/speedtest: when you run it, does any CPU core go up to 100%? ROS bandwidth tests are not very CPU friendly, and most of the time the apps themselves are the bottleneck, not ROS or the hardware in general. Speed tests should properly be run by separate devices through the routers.

Re wireguard: allowed-address on the CHR should be the 5009’s subnet (those IP addresses will be the dst-address when entering the tunnel on the CHR end and the src-address when exiting the tunnel on the CHR end).
Your last sentence sends a mixed signal, as the subnet address in the text doesn’t correspond to the subnet address in the ASCII diagram preceding it. If you’re actually trying to use a part of the CHR’s subnet on the 5009 end, then you at least need to run proxy-arp on the CHR so that other LAN devices on the CHR side know to send frames to the CHR if they want to reach the 5009 LAN.

CPUs never go above 35% on the speedtests…

To be more specific:

The CHR on the Internet edge is sending IN 199.181.204.0/24 from our BGP edge router. The far branch end will consume a slice of that (199.181.204.128/26).
On the wireguard tunnel, I’ve allocated two IP addresses (192.168.88.1/30 and 192.168.88.2/30). So, I can look at this in two possible ways:

  • The wireguard AllowedIPs should simply be CHR=192.168.88.2 and 5009=192.168.88.1


  • The wireguard AllowedIPs on the CHR end should be 199.181.204.128/26 and the 5009=0.0.0.0/0

The second way is right. Normal routing sends the packets to the Wireguard process; the Wireguard process acts as another router where the allowed-address plays the role of dst-address and the peer plays the role of gateway, except that these “routes” are evaluated top to bottom until the first match rather than being chosen by the best (longest prefix) match.
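To make the difference between the two options concrete, here is a rough Python sketch of the check described above: the destination must fall inside the sending side's allowed-address list for the peer, and the source must fall inside the receiving side's list. The host addresses and the helper names are hypothetical, chosen only for illustration; this models the matching logic, not RouterOS itself.

```python
import ipaddress


def allowed(addr: str, allowed_ips: list[str]) -> bool:
    """True if addr falls inside any prefix of the allowed-address list."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(net) for net in allowed_ips)


# Hypothetical endpoints: a host in the branch slice and some Internet host.
BRANCH_HOST = "199.181.204.130"
INTERNET_HOST = "8.8.8.8"

# allowed-address configured on each router for its single peer.
OPTION_1 = {"chr": ["192.168.88.2/32"], "rb5009": ["192.168.88.1/32"]}
OPTION_2 = {"chr": ["199.181.204.128/26"], "rb5009": ["0.0.0.0/0"]}


def round_trip_ok(option: dict[str, list[str]]) -> bool:
    """Check Internet -> branch and branch -> Internet through the tunnel."""
    # CHR sends: dst must match the CHR's list; 5009 receives: src must match its list.
    downstream = allowed(BRANCH_HOST, option["chr"]) and allowed(INTERNET_HOST, option["rb5009"])
    # 5009 sends: dst must match the 5009's list; CHR receives: src must match its list.
    upstream = allowed(INTERNET_HOST, option["rb5009"]) and allowed(BRANCH_HOST, option["chr"])
    return downstream and upstream


if __name__ == "__main__":
    print(round_trip_ok(OPTION_1))  # False: only the tunnel addresses are admitted
    print(round_trip_ok(OPTION_2))  # True
```

Note that with the second option the CHR's list does not cover 192.168.88.2, so if you also want to reach the tunnel addresses themselves (for pings or monitoring), they would have to be added to the list as well.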

Regarding the suboptimal results on TCP over GRE, this may be related to packets arriving at the destination out of order. You can sniff into a file, or stream the sniffed packets to a locally connected PC, at both ends. Wireshark will show you what happened, as it displays the sequence numbers in the TCP headers even if they are encapsulated in GRE.

I did do that – Wireshark shows a lot of retransmits and sequence events – but I’m not sure what I can do about it. The MTU is 1280, not 14xx, TCP MSS clamping is set, what else is there to do?

I’m not saying sniffing is a solution; it should just show whether the packets are actually lost or only reordered. A small enough payload-side MTU makes sure that the transport packets do not need fragmentation, i.e. that way you avoid problems with lost second fragments that happen on some network paths. But if there are indeed multiple network paths between your endpoints, causing the packets to arrive out of order, you can do little about it, especially if it is not just GRE but also UDP-based tunnels that suffer from this. Since you seem to have some issue with Wireguard, have you tried L2TP between your endpoints? I mean, if you would be happy with GRE without any encryption, you can use L2TP as well, at least to test whether UDP gets reordered too. Both L2TP and Wireguard use sequence numbers too, so you can see the arrival order and any loss at the receiving side.
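If you export the sequence numbers from the receiving-side capture in arrival order, telling loss from reordering is straightforward. A minimal sketch, assuming the numbers are already available as a plain list of integers for one flow and do not wrap around; the function name is made up for illustration:

```python
def classify_arrivals(seqs: list[int]) -> tuple[int, list[int]]:
    """Return (count of late arrivals, sorted list of sequence numbers never seen).

    seqs holds the sequence numbers in the order the packets arrived.
    A packet counts as reordered if a higher number arrived before it.
    Duplicates are ignored.
    """
    if not seqs:
        return 0, []
    seen: set[int] = set()
    highest = None
    reordered = 0
    for seq in seqs:
        if seq in seen:
            continue  # duplicate, e.g. a retransmission
        if highest is not None and seq < highest:
            reordered += 1  # arrived after a later packet
        else:
            highest = seq
        seen.add(seq)
    missing = sorted(set(range(min(seen), max(seen) + 1)) - seen)
    return reordered, missing


if __name__ == "__main__":
    # 3 arrives after 4 (reordered), 6 never arrives (lost).
    print(classify_arrivals([1, 2, 4, 3, 5, 7]))  # (1, [6])
```

Many late arrivals with few missing numbers would point at multiple paths, while many missing numbers would point at real loss.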

Of course, feel free to post your wireguard configuration for inspection.