Packet fragmentation - high ping

Hi, I have a question regarding packet fragmentation.
I have an L2 network that passes through another operator’s infrastructure. Unfortunately, increasing the MTU on that path is not possible.
Through this network I run EoIP tunnels that have to carry 1900-byte packets, so I have the MTU of the EoIP tunnel set to 2000.
Of course, I understand that this means the EoIP traffic will be fragmented, but when I perform a simple ping test through the tunnel I already see high ping times (oddly, some pings are short), while CPU usage on the devices is practically zero. So why does the ping increase so much when the resources (which could be spent on fragmentation) sit idle?

Thanks in advance for the clarification
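
To illustrate the fragmentation I expect, here is a rough back-of-the-envelope sketch. The overhead figures are my assumption (the commonly cited EoIP encapsulation: 20-byte outer IPv4 + 8-byte GRE + 14-byte inner Ethernet header), not something measured on these boxes:

```python
import math

TRANSPORT_MTU = 1500   # IP MTU of the operator's L2 path
INNER_PAYLOAD = 1900   # traffic the EoIP tunnel must carry
ETH_HDR = 14           # inner Ethernet header carried inside EoIP (assumed)
GRE_HDR = 8            # GRE header used by EoIP (assumed)
IP_HDR = 20            # outer IPv4 header

# Size of the outer IP packet produced by the tunnel for one inner frame
outer_packet = IP_HDR + GRE_HDR + ETH_HDR + INNER_PAYLOAD   # 1942 bytes

# IPv4 fragments carry payload in multiples of 8 bytes
max_frag_payload = (TRANSPORT_MTU - IP_HDR) // 8 * 8        # 1480 bytes
payload = outer_packet - IP_HDR                             # 1922 bytes
fragments = math.ceil(payload / max_frag_payload)           # 2 fragments

print(outer_packet, fragments)   # 1942 2
```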

It really sounds weird.
So first, when you ping between the EoIP endpoints outside the EoIP tunnel, is the round-trip time much better, or the same as when pinging through the tunnel?
Second, do you specify any echo request packet size when pinging through the EoIP tunnel, or do you ping with the default small size, so that packet fragmentation doesn’t even happen and the round-trip time is still higher than for pinging alongside the tunnel?
Third, if you run tool sniffer quick ip-protocol=icmp,gre at both ends while pinging through the tunnel, what are the delays between ICMP and GRE on departure and between GRE and ICMP on arrival? Also, if you specify an echo request packet size of about 1700, what is the time difference between the first and second fragment of the carrying GRE packet?

Hi, maybe I’ll start with the configuration.
MikroTik 1 (RB4011):
interface sfp+: mtu = 2100 l2mtu = 9586
interface vlan (on sfp+): mtu = 1500 l2mtu = 9582
The addressing for the EoIP connection is configured on the vlan interface
EoIP interface: mtu = 2100 actual-mtu = 2100 l2mtu = 65535

MikroTik 2 (RB750Gr3):
interface ether1: mtu = 1500 l2mtu = 1596
interface vlan: mtu = 1500 l2mtu = 1592
The addressing for the EoIP connection is configured on the vlan interface
EoIP interface: mtu = 2100 actual-mtu = 2100 l2mtu = 65535

When pinging within the connection addressing, up to a size of 1500 (without do-not-fragment) the times stay steady at 0–2 ms, while after increasing the size to 1510 (again without do-not-fragment) the times fluctuate widely:
0 10.0.0.2 1510 64 31ms
1 10.0.0.2 1510 64 27ms
2 10.0.0.2 1510 64 17ms
3 10.0.0.2 1510 64 7ms
4 10.0.0.2 1510 64 108ms
5 10.0.0.2 1510 64 1ms
6 10.0.0.2 1510 64 233ms
7 10.0.0.2 1510 64 227ms
8 10.0.0.2 1510 64 217ms

The same is true of the stress test inside the tunnel. A ping of size 1400 (with do-not-fragment) goes clean. However, a size of 1500 (with do-not-fragment), which with the EoIP headers added will already be fragmented, behaves similarly to the ping outside the tunnel.
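
A quick sketch of why the inner do-not-fragment flag doesn’t stop this: the DF bit rides on the inner packet, while it’s the outer GRE packet that the transport network fragments. The 42-byte overhead here is my assumption for EoIP (20 outer IPv4 + 8 GRE + 14 inner Ethernet):

```python
IP_HDR, GRE_HDR, ETH_HDR = 20, 8, 14  # assumed EoIP encapsulation overhead
TRANSPORT_MTU = 1500                  # IP MTU of the underlying path

inner_packet = 1500                   # inner ping with do-not-fragment set
outer_packet = inner_packet + ETH_HDR + GRE_HDR + IP_HDR  # 1542 bytes

# The transport network only sees the outer packet; the inner DF bit does
# not prevent the outer packet from being fragmented on this path.
needs_fragmentation = outer_packet > TRANSPORT_MTU
print(outer_packet, needs_fragmentation)  # 1542 True
```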

So in other words, it is not an EoIP problem, because a bare ping with large fragments has the same issue. OK, so what about the sniffing? For every ping request sent, you should see two ICMP packets in the sniff, for example:

[me@HyperV-CHR-1] > tool sniffer quick ip-protocol=icmp
INTERFACE    TIME    NUM DIR SRC-MAC           DST-MAC           VLAN   SRC-ADDRESS       DST-ADDRESS       PROTOCOL   SIZE CPU FP
ether1      6.935      1 ->  00:15:5D:FC:E9:01 00:15:5D:FC:E9:06        192.168.227.11    192.168.227.13    ip:icmp    1514   0 no
ether1      6.935      2 ->  00:15:5D:FC:E9:01 00:15:5D:FC:E9:06        192.168.227.11    192.168.227.13    ip:icmp     534   0 no
ether1      6.935      3 <-  00:15:5D:FC:E9:06 00:15:5D:FC:E9:01        192.168.227.13    192.168.227.11    ip:icmp    1514   0 no
ether1      6.935      4 <-  00:15:5D:FC:E9:06 00:15:5D:FC:E9:01        192.168.227.13    192.168.227.11    ip:icmp     534   0 no
ether1      7.946      5 ->  00:15:5D:FC:E9:01 00:15:5D:FC:E9:06        192.168.227.11    192.168.227.13    ip:icmp    1514   0 no
ether1      7.946      6 ->  00:15:5D:FC:E9:01 00:15:5D:FC:E9:06        192.168.227.11    192.168.227.13    ip:icmp     534   0 no
ether1      7.946      7 <-  00:15:5D:FC:E9:06 00:15:5D:FC:E9:01        192.168.227.13    192.168.227.11    ip:icmp    1514   0 no
ether1      7.946      8 <-  00:15:5D:FC:E9:06 00:15:5D:FC:E9:01        192.168.227.13    192.168.227.11    ip:icmp     534   0 no

As you can see, both fragments in both directions arrive within the same millisecond in my case (two CHRs on a Hyper-V). Run the same test in your case to find out whether the issue is inside one of your Tiks or outside. You should see more rows per packet, as there are more interfaces to pass through in your case (the VLAN and the physical interface).
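
For what it’s worth, the two frame sizes above are consistent with a 2000-byte ping (in RouterOS the ping size includes the IP header). A small, hypothetical fragment-size calculator reproduces them:

```python
def fragment_frames(ip_packet_size, mtu=1500, ip_hdr=20, eth_hdr=14):
    """Split one IPv4 packet into fragments and return the sizes seen on
    the wire (Ethernet frame sizes, i.e. IP total length + 14 bytes)."""
    payload = ip_packet_size - ip_hdr
    max_frag = (mtu - ip_hdr) // 8 * 8   # fragment payloads are 8-byte aligned
    frames = []
    while payload > 0:
        part = min(payload, max_frag)
        frames.append(ip_hdr + part + eth_hdr)
        payload -= part
    return frames

print(fragment_frames(2000))   # [1514, 534] -- matches the sniff above
```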

When I send (using the connection addressing) a ping from the 4011 to the 750Gr3, e.g. with size=1600 (forcing fragmentation), the sniffer on the 750 shows this:
vlanMGMT 36.949 159 <- C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp 1514 2 no
vlanMGMT 36.95  160 <- C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp  134 2 no
vlanMGMT 36.95  161 -> C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp 1514 2 no
vlanMGMT 36.95  162 -> C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp  134 2 no
vlanMGMT 37.946 163 <- C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp 1514 2 no
vlanMGMT 37.984 164 <- C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp  134 2 no
vlanMGMT 37.984 165 -> C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp 1514 2 no
vlanMGMT 37.984 166 -> C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195
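
The gap between the two fragments of one echo request can be read straight off those timestamps; a trivial sketch using rows 163 and 164 above:

```python
# Sniffer timestamps (seconds) of the two fragments of one echo request,
# copied from rows 163 and 164 of the capture above
first_fragment = 37.946
second_fragment = 37.984

gap_ms = round((second_fragment - first_fragment) * 1000)
print(gap_ms)  # 38 -- the same order of magnitude as the ping spikes
```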

However, when sending a ping from the 750Gr3 to the 4011, on the 4011 side it looks like this:
vl 6.836  9 <- C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp
vl 6.836 10 <- C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp
vl 6.836 11 -> C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp
vl 6.836 12 -> C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp
vl 7.839 13 <- C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp
vl 7.839 14 <- C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp
vl 7.839 15 -> C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp
vl 7.839 16 -> C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp
vl 8.841 17 <- C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp
vl 8.841 18 <- C4:AD:34:9F:49:3B C4:AD:34:70:41:41 10.91.255.195 10.91.255.253 ip:icmp
vl 8.841 19 -> C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp
vl 8.841 20 -> C4:AD:34:70:41:41 C4:AD:34:9F:49:3B 10.91.255.253 10.91.255.195 ip:icmp

To me, those timestamps (rows 163 and 164 in the 4011→750Gr3 sniff: 37.946 for the first fragment vs 37.984 for the second, a 38 ms gap) indicate that it is the RouterBoard’s network stack which delays the 2nd fragment on departure, so it is worth a support ticket.

If the “however” refers to the different number of columns in the output, that’s because some columns are suppressed when the terminal window is not wide enough.

If you had something else in mind, please elaborate.