Hey guys, I know this one will probably be simple to answer so here goes…
I have 2 sites linked via an EoIP/IPsec tunnel. Each site’s internet connection is 1 Gbps/1 Gbps, however I’ve never been able to get more than about 180 Mbps of variable throughput between the sites, which at peak times causes a bottleneck, and users complain of slow internet. To address that, I’ve blocked DHCP across the tunnel and set up a separate DHCP service at the remote office that points their gateway at their local internet connection, to speed up internet traffic. I implemented that and thought all was well until I started getting certain websites not loading. Google was fine, but speedtest.net, for example, only loaded text without any media, and many other websites I tested just failed to connect. I looked down the interfaces list and noticed that the bridge containing the EoIP port has dropped its MTU to 1388, the same as the EoIP connection itself.
The MTU of our ISP’s PPPoA connection is 1480, so is this what would be causing the loading issues, and what would the solution be? Any help appreciated.
Cheers up front
The root cause of your issue is most likely broken PMTUD (Path MTU Discovery) on the path between your router(s) and those servers on the internet. The MTU of a bridge is always automatically adjusted to that of the member port with the lowest MTU, to prevent the router itself from attempting to send larger packets.
So for any connection that passes through this bridge, either PMTUD must work properly, or you have to use a bandaid to substitute for PMTUD. The bandaid is a pair of change-mss rules in chain forward of /ip firewall mangle, one matching on in-interface and one on out-interface, that clamp the TCP MSS to a value such as 1360.
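As a rough sketch of what such a pair of rules could look like (the interface name bridge-lan is a hypothetical placeholder for the bridge that contains the EoIP port, and 1360 is only a starting value):

```
/ip firewall mangle
# Clamp the MSS of TCP SYN packets arriving from the bridge
add chain=forward in-interface=bridge-lan protocol=tcp tcp-flags=syn \
    tcp-mss=1361-65535 action=change-mss new-mss=1360 passthrough=yes
# Clamp the MSS of TCP SYN packets leaving towards the bridge
add chain=forward out-interface=bridge-lan protocol=tcp tcp-flags=syn \
    tcp-mss=1361-65535 action=change-mss new-mss=1360 passthrough=yes
```

The tcp-mss=1361-65535 match makes the rules touch only SYN packets that advertise a larger MSS than the target, so smaller values are left alone. Assuming IPv4 and TCP headers without options (40 bytes in total), a bridge MTU of 1388 leaves room for an MSS of 1388 - 40 = 1348, so a value like 1360 may need to be lowered.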
If 1360 turns out to be still too high, start from 1280 and try finding the largest working value iteratively.
The above solution is just one of the possible ones. A much better one would be to use an L3 tunnel instead, but I figure you have strong reasons to use an L2 one.
At the price of doubling the packet rate, you can either force the MTU of the EoIP tunnel to 1500, which means that the EoIP transport packets will be fragmented, with the risk of losing them because some networks drop non-first fragments, or use L2TP with MLPPP and BCP, which splits the payload packets already during encapsulation, so the transport packets do not get fragmented.
The first approach (the two mangle rules) cannot actually break anything unless you force a lower MSS than 1280, so you can safely try that even during the rush hours.
Heya wiseroute, I’ve had the tunnel running for 2 years and it’s fine. It’s only since getting remote clients on the same bridge to take their internet traffic directly from their own router that I’m having these webpage loading issues. I’m pretty sure it’s MTU related.
As far as I’m aware, devices in the same L2 network are unlikely to do PMTUD for link-local packets. They expect the network to support whatever the interface MTU is set to, so you should set the EoIP MTU to 1500.
Obviously this will lead to fragmentation, as 1500 + EoIP encapsulation + IPsec encapsulation > WAN MTU. I’ve found it is better to ensure the fragmentation isn’t seen across the internet, as ISPs sometimes don’t handle fragments very well.
L3 routing and MSS clamping would be a better solution unless you really need L2 connectivity between sites.
I have the same issue. My EoIP interface is on a bridge with multiple other ports, and I’m using v7.11.2. If I lower the MTU of the EoIP interface from 1500 to 1458, traffic flows much better across the tunnel, but it causes issues for other people on other interfaces on the bridge. I can’t add those mangle rules to lower the MSS just on the EoIP interface, as it gives an error saying that you must use the bridge interface it is a slave to, which would be back to causing issues for everyone. We never seemed to have the MTU issue with the EoIP tunnel until we moved to 7.x firmware; for some reason the older firmware allowed it all to work. Any ideas?
You can use those mangle rules in a slightly more complicated way. Use a bridge filter rule to assign a packet mark (for simplicity, EoIP) to frames entering the bridge via the EoIP interface, and use mangle/prerouting rules to add the source IP addresses of packets bearing this packet mark to an address list (also named EoIP), with an address-list-timeout of 1h or so. Then replace the in-interface and out-interface matching in the change-mss rules with src-address-list=EoIP and dst-address-list=EoIP, respectively.
Thanks, I’ll give that a try. Is there any other way to lower the MTU below 1500 on an EoIP interface without affecting everyone else on the bridge, even though they aren’t even sending traffic over the EoIP interface? Both ends of the EoIP link are on NAT’d firewalls to the LAN bridges.
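A minimal sketch of that arrangement, assuming the EoIP interface is called eoip-tunnel1 (a hypothetical name) and reusing 1360 as the MSS value. Which bridge filter chain the frames actually traverse depends on whether they are routed by the router (input) or bridged to another port (forward), so both are shown here as an assumption to be verified on the actual setup:

```
/interface bridge filter
# Mark frames that enter the bridge through the EoIP port
add chain=input in-interface=eoip-tunnel1 action=mark-packet \
    new-packet-mark=EoIP passthrough=yes
add chain=forward in-interface=eoip-tunnel1 action=mark-packet \
    new-packet-mark=EoIP passthrough=yes

/ip firewall mangle
# Learn the source addresses of hosts seen behind the EoIP port
add chain=prerouting packet-mark=EoIP action=add-src-to-address-list \
    address-list=EoIP address-list-timeout=1h
# Clamp the MSS only for connections from or to those hosts
add chain=forward src-address-list=EoIP protocol=tcp tcp-flags=syn \
    tcp-mss=1361-65535 action=change-mss new-mss=1360 passthrough=yes
add chain=forward dst-address-list=EoIP protocol=tcp tcp-flags=syn \
    tcp-mss=1361-65535 action=change-mss new-mss=1360 passthrough=yes
```

Hosts on the other bridge ports never end up in the address list, so their connections keep the MSS they negotiated themselves.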
There isn’t. From the point of view of the IP stack, which deals with the MTU, the bridge (as in “the virtual interface of the router connected to the virtual port of the virtual switch”) is the IP interface. So if the bridge interface has an MTU of 1500 and you give it a packet that has more than the EoIP’s MTU, the bridge (as in “the virtual switch”) will forward that packet to the EoIP port but EoIP will drop it without sending an ICMP notification back because bridges (as in “virtual switches”) do not generate ICMP.
So the MTU of the bridge interface automatically decreases to the MTU of the member port whose MTU is the smallest of all; if you let EoIP set its MTU automatically, it takes the MTU of the interface through which it sends the transport packets and subtracts the size of the headers from it. The bridge then adapts to that, making the router send “fragmentation needed” if a packet to be routed via the bridge interface is bigger than that.
If you force the EoIP MTU to 1500, it simply means that EoIP will split the transport packets, carrying payload packets larger than the limit, into fragments, and even if the resulting doubling of the packet rate doesn’t bother you, you may hit the issue of some networks randomly dropping non-first fragments of packets. A workaround to this is to use L2TP in BCP (bridge) mode with MLPPP, which splits the payload packets before loading them into the transport ones, and the transport ones are never fragmented. The issue of double packet rate remains, though.
Thanks again for the info. Is there a better, more modern way to connect remote sites across the internet instead of EoIP, where MTU sizing is not an issue and which may also take less CPU resources and have better throughput?
The best way is to avoid L2 tunneling completely, so the MTU issues are handled at routing level as they should be. Without restricting the payload MTU, any tunneling (even an L3 one) always causes (almost) a doubling of the packet rate, simply because any kind of tunneling adds overhead to the payload packets when forming the transport ones, so the excess bytes overflow into a fragment. “Almost” because some payload packets are smaller than the tunnel MTU.
So if you do need L2 tunneling across the internet, L2TP with BCP and MLPPP is still your best current bet, as its transport packets are not fragmented. As for CPU load, it doesn’t matter much whether you use EoIP, L2TP, or VXLAN. And if you use encryption, the process of encapsulating L2 frames into L3 transport packets is a negligible part of the CPU load compared to the encryption and decryption.
I was able to get L2TP set up with BCP pretty easily. The issue I’m having is that pinging any host on the other end of the tunnel, from either end, returns DUP pings non-stop, so I’m assuming traffic packets are duplicated as well. Speed through the L2TP link is much better than through the EoIP link, though. When I put the EoIP link back online, pings are normal again. On the L2TP I have max MTU set to 1450, max MRU set to 1450, and MRRU set to 1600 to force MLPPP. I’ve checked the ARP tables and host tables on either end and everything looks fine. I’ve searched the internet and can’t figure out why there are duplicate packets when using ping across the L2TP link. Could it be because the L2TP interface doesn’t have a MAC address assigned to it when you view it under interfaces? Any ideas?
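For reference, an L2TP/BCP setup with the MTU, MRU and MRRU values mentioned above would look roughly like the sketch below. Every name, the password and the connect-to address are hypothetical placeholders, and this is not the actual configuration from either site:

```
# Server side
/ppp profile
add name=bcp-profile bridge=bridge1
/interface l2tp-server server
set enabled=yes default-profile=bcp-profile max-mtu=1450 max-mru=1450 mrru=1600
/ppp secret
add name=site2 password=CHANGE-ME profile=bcp-profile service=l2tp

# Client side
/ppp profile
add name=bcp-profile bridge=bridge1
/interface l2tp-client
add name=l2tp-to-site1 connect-to=203.0.113.1 user=site2 password=CHANGE-ME \
    profile=bcp-profile max-mtu=1450 max-mru=1450 mrru=1600 disabled=no
```

The bridge= setting in the PPP profile is what makes the L2TP interface join the LAN bridge as a port via BCP, and setting mrru above the MTU is what enables MLPPP on the single link.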
I would guess you have somehow managed to engage the L3 and L2 tunnel in parallel, but it’s just a wild guess. I’d have to see the complete configuration exports from both ends of the tunnel as well as the addresses you used for the ping (both source and destination). Don’t forget to anonymize the exports while maintaining integrity of address prefixes if any public or global addresses are used in the configuration.
The addresses were just 192.168.xxx.xxx ones that we are pinging on our internal LAN, which this link connects across the internet between 2 sites on NAT’d firewalls.
There are no addresses specified. The only things specified in the secret are the name, password, and profile. Should there be a MAC address associated with this interface somewhere? Also, should arp=proxy-arp be set on the bridges, along with the admin-mac address, at both ends? I see some people use that in their setups, but it was never specified in the MUM presentations for L2TP with BCP.