GRE MTU issue

Hello everyone!

On one end, I have a CHR instance running on a cloud server at a provider, and on the other end, I have an RB4011.

Currently, I announce my /24 via BGP from my CHR cloud router. The general idea was to build a GRE tunnel to the RB4011 so I can use the /24 on devices behind the RB4011.


My current setup (slightly modified to not leak any IPs):

Public BGP subnet: 192.168.50.0/24
CHR instance local GRE IP: 10.0.0.1/30
RB4011 instance local GRE IP: 10.0.0.2/30

Both the CHR and the RB4011 can ping each other via their respective local IPs. I then created a static route on the CHR instance that routes 192.168.50.0/24 via 10.0.0.2.

On the RB4011, I have some devices running behind a switch on Ether2, so I assigned 192.168.50.1/24 to Ether2 and give the devices addresses from that subnet as needed.

The outside internet can now ping 192.168.50.1, and every service under it! Great!

My issue is with MTU/TCP MSS. Going to speedtest.net just loads forever, many pages render without styles, and some connections just drop outright.
This happens when the MTU is set to 1476, 1452, or 1400. TCP MSS clamping is enabled on both sides, and the MTU is set to the same value on both sides.

Once I set the MTU to 1500 on both sides, it seems to work all right. The pages that didn’t load before load just fine now. However, as far as I understand, an MTU of 1500 is not the right way to do it.


As a test, I tried WireGuard instead, on a hEX router, but I only got around 100 Mbit/s through it (hEX at 100% CPU) on a gigabit connection. I could go all WireGuard and just live with the slower speeds. Using WireGuard resolves the MTU issues completely; however, for now I just want it working with GRE.

I’ve tried different MTUs, with and without fast path, and with and without TCP MSS clamping. Nothing resolves this issue.

192.168.50.0/24 is just an example; in the real setup it is a publicly routed /24.

What could cause this behaviour, and why won’t it work with MTU 1476 (which MikroTik also uses normally)?

The latency between 10.0.0.1 and 10.0.0.2 is around 12 ms.

CHR version: newest (7.0.12?)
RB4011 version: 6.49 LTS

If you need any other information please let me know. I hope to get this resolved! :slight_smile:

It’s tricky business. IMO, letting PMTUD do its thing is most important, which just means allowing ICMP through the firewall. Since you’re in charge of the firewall, allowing PMTUD isn’t a problem, and the protocols (TCP, more advanced UDP things) will sort it out.

E.g., since you’re routing public IPs and control the entire path, PMTUD should work fine to determine the MTU. Since 1500 works, you [seemingly] don’t block the ICMP it needs. That’s why 1500 should be “safe”.

Now, you should be able to set a lower MTU (e.g. the actual ones), and theoretically that would be best... but not always in practice, since the needed MTU calculation gets tricky fast: one mistake and you’re worse off than with potential fragmentation. And any MSS adjustment is always a little risky, since it assumes no one else is also playing MSS tricks in the path; also, if the mss-adjust value is too high, nothing else will fix that.
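
To give a rough idea of the arithmetic involved, here is a minimal Python sketch, not anything RouterOS itself runs. It assumes plain GRE over IPv4 with no IP or TCP options (20-byte IPv4 header, 4-byte base GRE header, 20-byte TCP header). The function names and the `gre_options` parameter are made up for illustration. Each optional GRE field (key, checksum, sequence number) adds 4 more bytes.

```python
# Header sizes in bytes; assumes IPv4 and TCP without options
# and the 4-byte base GRE header.
IPV4_HEADER = 20
GRE_HEADER = 4
TCP_HEADER = 20


def gre_tunnel_mtu(underlay_mtu: int, gre_options: int = 0) -> int:
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return underlay_mtu - IPV4_HEADER - GRE_HEADER - gre_options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in a packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # 1500 = plain Ethernet underlay, 1492 = a typical PPPoE underlay.
    for underlay in (1500, 1492):
        tunnel = gre_tunnel_mtu(underlay)
        print(f"underlay {underlay}: tunnel MTU {tunnel}, MSS {tcp_mss(tunnel)}")
```

With a 1500-byte underlay this gives the familiar 1476 tunnel MTU and an MSS of 1436. If the real path between the two routers is smaller than 1500 anywhere (PPPoE, another tunnel at the provider), both numbers shrink accordingly, and that is the kind of mistake that is easy to make.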

Hello!
Thank you a lot for taking the time to explain and answer my question.

I’m happy to have it confirmed that I’m not going insane! So in fact, an MTU of 1500 “might” be feasible in my situation.