I’ve noticed today something very peculiar when working with MTUs and jumbo frames.
There’s something weird happening when I change the MTU from 4136 bytes to 4137 bytes.
I first saw this because I had an HTTP sensor in PRTG, and I saw huge delay increases when I started using big frames on the RB3011 (2 ms vs 200 ms).
I thought it could be normal, because bigger frames might take longer to build and send, but after decreasing the MTU again the delay dropped right back, so it looked like an on/off issue.
So I started testing with different MTU values and finally found that the 2-to-200 ms jump happens exactly when I change the MTU between the said values (4136 to 4137 bytes).
The L2 MTU doesn't seem to be involved, since I maxed it out and the variance stayed the same.
Wireshark also agrees with this increase, though with smaller values (1-2 ms vs 50-60 ms).
The variance shows up between the first packet, “GET / HTTP/1.1”, and the second packet, “HTTP/1.1 200 OK”.
In other words, it's the delay between the GET request and the first reply from the MikroTik.
Any ideas on what can cause this?
How can I further investigate the issue?
I assume the L2MTU is still set to 8156 on the ten copper gigabit interfaces, and 8158 on SFP1 (which the Wiki states are the values for an RB3011)
Are you using any kind of IPSec or tunnel interfaces?
Are there other devices in the path between your 3011 and PRTG?
If so, what is their maximum MTU at the relevant layers? (L2MTU for switches / L3MTU for router interfaces)
Does this delay also happen on similar connections which are simply being forwarded through the 3011 at layer 3?
If so, do the two devices show the delay when directly connected to each other?
What is the MTU on your PRTG box?
Have you confirmed that its ethernet interface and interface driver are capable of supporting jumbo frames of this size?
Make sure that the entire path MTU is >= 4137 on all layer-3 devices, plus any additional overhead your path may require, such as 802.1q VLAN headers (scroll down in the article linked below to the section “Simple Examples”). If it is, and the problem only happens when polling 3011 routers, then it could just be some glitch in RouterOS for ARM on the 3011…
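To put numbers on that overhead check, here's the arithmetic as a quick sketch (my own illustration, not from the article; the constants are standard Ethernet framing sizes):

```python
# Sketch of the path-overhead arithmetic using standard Ethernet values.
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
FCS = 4           # frame check sequence
VLAN_TAG = 4      # one 802.1q tag, if the path uses VLANs

def required_frame_size(l3_mtu, vlan_tags=0):
    """Smallest maximum frame size every L2 hop must accept."""
    return l3_mtu + ETH_HEADER + FCS + vlan_tags * VLAN_TAG

print(required_frame_size(4137))               # 4155 bytes, untagged
print(required_frame_size(4137, vlan_tags=1))  # 4159 bytes with one 802.1q tag
```

So every switch in the path would need to accept at least 4155-byte frames for a 4137-byte L3 MTU to pass untagged.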
EDIT:
Another question: are you filtering ICMP at any point in your path? If so, and it is an MTU-related issue, then something could be interfering with the PMTU discovery process…
Honestly, the copper ports are all at 1500 MTU / 1598 L2MTU.
The SFP is the only one in the local bridge, for testing purposes. This way I can also guarantee that the bridge always has the same MTUs as the SFP interface.
No IPSec, tunnel or VLANs at all.
There is a D-Link DGS-1510-28 switch in the path, which supports 9k frames. It correctly passes 9014-byte frames between two hosts.
The switch has per-interface MRU configuration, so it is set to 9018 for the PRTG host and 8176 for the RB3011. (The MRU should be 4 bytes larger, to allow for the FCS.)
This delay happens with another host machine too, and I can reproduce it via Wireshark.
Could this be some issue with the Intel NIC, since both hosts have the same Intel CT Gigabit cards?
I’ve been trying to reproduce this between the hosts but it doesn’t seem to happen.
MTU on the PRTG host is 9014.
And yes, the switch has been configured to drop all packets that go a single byte over the capabilities of each device. So they all are capable of handling frames of this size.
PMTU discovery appears to be working correctly.
I confirmed this because the acknowledged MSS is exactly equal to the current MTU minus 40 bytes (MSS = 4096).
I’ve introduced a rule in the RB3011 for Clamping MSS to PMTU.
I can confirm via Wireshark that the packets won’t exceed the MTU because they both agree on the MSS.
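The MSS arithmetic checks out (a quick sketch of my own, assuming standard 20-byte IPv4 and TCP headers with no options):

```python
# IPv4 MSS = MTU minus the IP and TCP headers (no options assumed).
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options

def mss_for(mtu):
    """MSS a host typically advertises for a given interface MTU."""
    return mtu - IP_HEADER - TCP_HEADER

print(mss_for(4136))  # 4096, matching the acknowledged MSS in the capture
print(mss_for(4137))  # 4097
```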
I know that this probably won't cause any real-life issue, but as a future network engineer I feel driven to test these things as deeply as possible.
P.S.: Here is the exact place where the issue happens!
MTU 4136:
Well, the first rule of thumb about MTU is that everything’s MTU setting should match when on the same network.
If anything is different, in my experience it doesn't hurt anything for a switch to have a higher L2MTU than the devices attached to it will use. (Obviously the other way around is bad.)
In theory, there isn't any harm in having varying sizes of L3MTU within your network so long as ICMP is not filtered, so that ICMP “fragmentation needed” messages can be delivered for PMTU discovery to work properly.
In my experience, though, there are several organizations whose infrastructure breaks PMTUD (Akamai, a big content delivery network, is one such organization). Our PPPoE-using customers would report that certain websites would not work. Adding clamp-mss to our access routers' PPPoE interfaces fixed this for them. Basically, our router would send the "packet too large" message to the web server, but their network would drop the message somewhere, and the HTTP server would keep trying to send 1500-byte packets while our router was screaming at them to reduce to 1492. This mismatch occurred because the customers' networks were 1500-byte MTU internally, so the initial SYN/SYN-ACK/ACK handshake negotiated an MSS of 1460, which was 8 bytes too large for their PPPoE-based WAN connections. All the websites that worked for them were ones that didn't drop our router's ICMP messages about the MTU (or didn't set the DF bit on their IP packets).
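To put numbers on that story, here's the PPPoE arithmetic as a sketch (my own, assuming the standard 8-byte PPPoE overhead and plain IPv4/TCP headers):

```python
# The PPPoE MSS mismatch described above, as simple arithmetic.
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol ID
IP_TCP_HEADERS = 40  # 20-byte IPv4 + 20-byte TCP headers

lan_mtu = 1500
pppoe_mtu = lan_mtu - PPPOE_OVERHEAD    # 1492: what the WAN link can carry
lan_mss = lan_mtu - IP_TCP_HEADERS      # 1460: what the hosts negotiate
pppoe_mss = pppoe_mtu - IP_TCP_HEADERS  # 1452: what clamp-mss rewrites it to

print(lan_mss - pppoe_mss)  # 8 bytes too large without MSS clamping
```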
In the end, I'd say try using the same MTU on both the client machine (PRTG host) and the RB3011, and if possible, connect the two directly to see if it continues. (I doubt the switch is the issue, though, since it supports up to ~9k jumbo frames.)
See if you get the same behavior with other devices in various combinations, and if the only host that does it is the 3011, then there's probably a minor glitch in the code for that RouterOS build. I would then test between two unaffected hosts via the 3011 as a layer-3 hop to see whether it's just the internal web service process on the 3011 or something in its forwarding engine. Almost certainly it's the former and not the latter.
Yes, I’m trying to figure out where the problem comes from.
Although it doesn't appear to come from the MikroTik. All packets are the same size in either situation, and no ICMP messages are being sent, so it doesn't appear to be an MTU mismatch.
Here I managed to capture both situations on the MikroTik, with MTU=4137 and MTU=4136 respectively.
What I see is that when I set 4137, there's one extra ACK right after a 4150-byte frame, and that's the one that introduces the 200 ms delay.
When I use 4136, there appears to be a delayed ACK only after the 4150+3123-byte frames, and there's no delay at all.
How weird.
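With hindsight (given the DelayedAckTimeout finding below), this pattern matches classic delayed-ACK behavior. A toy model of my own, not the actual Windows implementation, using the classic 200 ms default timer:

```python
# Toy model of a delayed-ACK receiver: it ACKs every second segment
# immediately, but a lone trailing segment sits unacknowledged until
# the delayed-ACK timer fires.
DELAYED_ACK_MS = 200  # classic delayed-ACK timer default

def ack_delay_ms(segments):
    """Delay before the final ACK of a burst of segments."""
    return 0 if segments % 2 == 0 else DELAYED_ACK_MS

print(ack_delay_ms(2))  # 0 ms: the reply lands as a pair, instant ACK
print(ack_delay_ms(3))  # 200 ms: odd trailing segment waits for the timer
```

Changing the MTU by one byte can change how the reply splits into segments, which flips whether a lone segment is left waiting on the timer.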
Edit:
I’m now going to try and replicate the same between two hosts without the RB.
I was able to track this issue down to the hosts themselves.
It appears that they were taking too long to answer because of a TCP setting called DelayedAckTimeout.
On Windows 8.1 you can view the settings via PowerShell: Get-NetTCPSettings
There are several setting profiles, and the one in use had a delayed ACK of 50 ms.
They can be changed via Set-NetTCPSettings.
On Windows Server 2008 R2 I'm still trying to figure out how to solve this, because I can't find where to configure those parameters, but I'm pretty sure it also has a 200 ms timeout!
My apologies, since this had nothing to do with MikroTik, only with the OS!
In any case, it stays here for future reference.
Edit:
For WS 2008R2, I needed to add a DWORD registry to the following location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces
Find the desired interface in that key and then add the DWORD “TcpAckFrequency” (values: 1 = no delayed ACK; 2-255 = number of segments per ACK)
I still can't find a way to change the delay value itself.
Cool. This is a great illustration of why troubleshooting requires carefully changing one thing at a time to see which variable the problem follows.
Glad you got to the bottom of it.