Possible L2 MTU issues with EoIP Tunnel and Bridge

Hello all:

I am using an RBwAPGR-5HacD2HnD as my home internet router. It’s fed by LTE. To get around all the LTE double-NAT junk, I’m also using WireGuard to VPN everything from the router out to a pfSense box I own with a full public IP. Life is great, everything works awesome, all websites work flawlessly, etc., etc.

Now I want to throw into the mix an HDHomeRun TV tuner located off site. Unfortunately, these tuners are notorious for many (most?) applications not working when there’s an L3 hop in the mix; they’re designed and intended to be on the same LAN as the device using them. I had a spare old MikroTik router hanging around, so I figured I’d set up an EoIP tunnel between them (without worrying about additional encryption: the device is actually located on a “secure” physical LAN off the pfSense box that’s handling my WireGuard, so all the parts of the network that might matter for encryption are already encrypted by the WireGuard link). I built my EoIP tunnel, added it to my bridge with ether1 in it at the home end, and built a bridge for it on the remote MikroTik with just the ports for the HDHR in it.

I set it up, and the HDHR, plugged into the far MikroTik, gets an IP from my home MikroTik DHCP server, and I can browse its web pages using an IP from my LAN. Looks promising, but not quite fully working: autodetection didn’t work with Jellyfin, and I still get extremely regular pauses in video when viewing 1080p, though it works great at lower resolutions. The total bandwidth usage is consistent with 720p and 1080p streams; I am NOT hitting bandwidth limits.

The weird thing is some web pages on the internet at large are no longer working at home. Three specific examples I’ve pinpointed are bankofamerica.com, yahoo.com, and duckduckgo.com. If I remove the EoIP interface from the bridge at home, all starts working just fine, and so far, the rest of the internet I’ve tested works fine (some sites seem to load more slowly, but I don’t have empirical data to support that), although other occupants at home said “the internet is being weird” and weren’t able to be more specific. Once I removed the EoIP from the bridge, all went back to normal immediately.

My first thought is to suspect MTU issues on this. This feels like large packets not making it through (especially since the 1080p video streams coming through the EoIP tunnel hiccup so regularly that it makes me think it’s a packet that just exceeded the MTU and was dropped). I’ve fought similar issues before, and this “feels” like an MTU issue, but I could be completely wrong. In any case, I did a bunch of MTU-focused troubleshooting.

I noticed that the EoIP interface reports an actual MTU of 1378. I noticed that the bridge has a few different MTUs reported: an Actual MTU (which changes to 1378 when the EoIP tunnel is bridged and resets to 1500 when it isn’t), and an L2 MTU of 1598. I tried manually specifying my MTU to match the smaller 1378, but it didn’t make a difference.

I did some more testing, with manual MTU discovery using ping and DF, and discovered that the internet traffic is in fact limited to a 1420 MTU due to the wireguard. Interestingly enough, even though the EoIP link showed a 1378 MTU, I could ping the HDHR across the link with an effective MTU of 1500! Also, when the bridge actual MTU was 1378 (due to the EoIP link bridged in), I saw no change in MTU to 1.1.1.1.
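
For anyone repeating this test, a DF ping sweep from the RouterOS terminal might look roughly like the sketch below. The tuner address 192.168.88.50 is a made-up placeholder, and the sketch assumes the pings are sent from the router itself, where `size` is the full IP packet size with headers included. The last comment shows where the 1378 figure likely comes from: the WireGuard MTU minus EoIP's 42 bytes of overhead.

```routeros
# DF-bit ping sweep (sketch; addresses are placeholders, adjust to your network).
# RouterOS "size" counts the whole IP packet, headers included.

# Through WireGuard only: with a 1420 path MTU, 1420 should pass and 1421 should not.
/ping 1.1.1.1 size=1420 do-not-fragment count=3
/ping 1.1.1.1 size=1421 do-not-fragment count=3

# Across the EoIP bridge to the tuner (192.168.88.50 is a hypothetical address).
/ping 192.168.88.50 size=1500 do-not-fragment count=3

# Where 1378 comes from: WireGuard MTU minus EoIP overhead
#   1420 - (20 IP + 8 GRE + 14 Ethernet) = 1420 - 42 = 1378
```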

So now, I’m confused about what’s going on. I really did not expect to be able to ping the HDHR with an effective MTU of 1500... That is going through the EoIP AND WireGuard link! Especially when going through the WireGuard alone I have the 1420 MTU. I would also assume that websites (using TCP) would not set the DF bit and thus would be fragmented when necessary and just work.

I’m not really sure where to go from here. It seems like there should be a solution... And my test cases also seem to conflict. Any ideas how to address this? I know in the past I set up a VPN system with ubnt gear that used IPsec and, I think, OpenVPN in tap mode with some MSS clamping and something else to force fragmentation when needed, and through that I was able to get multicast and UDP traffic through correctly that was definitely suffering from MTU-related packet drops. I’m not sure how to do that here, or if that is “the right answer”.

Thanks for your help!

Hi JimKusz,

I have the same issue. I use WireGuard because of NAT on the WAN side and no public IPs at the remote sites.
On top of WG I use an EoIP tunnel. If I add the EoIP interfaces to the bridge, I can’t reach some websites.

Did you already find a solution? I haven’t tested different MTUs yet.

Regards,
Jan

Did you ping with DF set when you got 1500 bytes through?

I assume that the EoIP tunnel is correctly configured with the IPs of the two WireGuard ends. There is no additional routing for the WireGuard tunnel. The EoIP is inside the local bridge. The bridge has its IP gateway and DHCP server, and the DNS server is the same gateway.
You also have a NAT rule:

/ip firewall nat
add action=redirect chain=dstnat comment="DNS LOCAL" dst-port=53 protocol=udp src-address=YOURLAN to-ports=53

And the external DNS servers are set in IP/DNS.
On the other hand, it is good practice to use leases on the DHCP server.
You can review those points; maybe something was not right.
As for the services on the box (Jellyfin, TVHeadend, etc.), they can be very configurable regarding internal or external networks, but I wouldn’t trust autodetection.

You can set both the WireGuard and EoIP MTUs to 1500. It becomes less efficient as the larger packets are fragmented, but they get rebuilt at the endpoint. Perhaps set the EoIP MTU to 1500 and leave the WireGuard one at 1420 (1420 assumes no PPPoE).
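
As a rough sketch of that second option, something like the following could be run on both routers. The interface name `eoip-tunnel1` is an assumption, so substitute whatever your tunnel is called. With the EoIP MTU at 1500, a full-size frame becomes a 1500 + 42 = 1542 byte outer packet, which no longer fits in WireGuard's 1420 and so gets split into two fragments.

```routeros
# Sketch: pin the EoIP tunnel MTU to the LAN's 1500 on BOTH routers.
# "eoip-tunnel1" is an assumed interface name; substitute your own.
/interface eoip set [find name="eoip-tunnel1"] mtu=1500

# Check the result; the actual MTU reported for the tunnel should now be 1500.
/interface eoip print detail
```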

Alternatively, you can use a mangle rule to do MSS clamping (clamp to PMTU), but that only helps TCP traffic.
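
A minimal sketch of such a rule is below. It uses the commonly seen form with no interface matcher, so it applies to every TCP SYN the router forwards; narrowing it to the WireGuard interface is left as an exercise. Note that it only touches traffic the router actually routes, and bridged frames crossing the EoIP port normally bypass the IP firewall.

```routeros
# Sketch: clamp the TCP MSS on forwarded SYN packets to the path MTU.
# Only affects routed TCP; UDP and purely bridged traffic are untouched.
/ip firewall mangle
add chain=forward action=change-mss new-mss=clamp-to-pmtu passthrough=yes \
    protocol=tcp tcp-flags=syn comment="clamp MSS to PMTU"
```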

When talking about non-standard MTU values, remember that EoIP carries L2 traffic ... where there’s neither PMTU discovery nor fragmentation. If the EoIP interface is “enslaved” as a bridge port with the goal of providing end-to-end L2 connectivity, then one has to set the EoIP interface MTU to the same value as the rest of the LAN has (most likely 1500) ... and accept the inefficiency it brings.

Things are different if EoIP is used as an interface (e.g. for routing); then reducing the MTU is a valid option. It’s still a nuisance though, and I’d rather accept a (say) 10% performance drop than face potential issues due to a smaller MTU size (which may or may not happen and are thus a PITA to troubleshoot).