MPLS - massive throughput difference on CHR when using explicit nulls

I built a test environment last week: 4 brand-new CHRs running a minimal config, interconnected through separate vswitches with no physical interfaces (so no NIC driver is involved) and no VLANs. I tested MTUs up to 65500, avoided btest, which is buggy with TCP on CHR (I already filed a support case for that), and tested with an FTP fetch directly on the CHR instead.

Long story short: with MPLS disabled I download the 1GiB file in 10 to 12 seconds (running through the 3 routers); with MPLS enabled it drops to a few hundred KiB in the same timespan.
Interestingly enough, UDP isn’t affected and runs properly with or without MPLS. It’s just TCP that’s affected.

I filed a support case today with the whole setup, screenshots, etc.
It took me a day to pinpoint the exact problem, reproduce it reliably and file a proper bug report.
I hope it doesn’t get thrown to trash like yours.

In the meantime I’ll do some testing with KVM, just to see if I can reproduce the problem, and let you know.

Nope, no UCS, we are using standalone servers, with local storage, no vCenter or dvs or anything tricky.

OK, I can reproduce it with KVM on totally different hardware.
I’m puzzled !
Does nobody use CHR to push MPLS labels ?

Apparently not. MPLS seems to work UNLESS labels are applied! I will keep trying to resolve this and post if I find anything.

Some more testing this morning, on KVM since it was easier to do without side effects :
I changed the NIC types from virtio to e1000 just to make sure the problem isn’t with virtio.

I get exactly the same behavior.
So the problem doesn’t seem to be in the virtio driver.

We reproduced the issue. It currently looks like the problem is related to packet size; stay tuned for updates.

YAY!!! thanks Mikrotik (and of course Tibobo) !

The problem appears because the hosts reassemble packets into large buffers (up to 65000) to reduce CPU load. This causes problems because MTUs are not respected.

On KVM please try to disable TSO and GSO.
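For anyone wanting to try this on a Linux KVM host, a minimal sketch using `ethtool -K` could look like the following. The interface name `eth0`, the `IFACES` and `DRY_RUN` variables and the `disable_offloads` helper are assumptions for illustration, not something given in this thread; substitute the host-side NICs, taps or bridges that actually carry the CHR traffic. It only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical interface list: replace with the host interfaces in the CHR path.
IFACES="${IFACES:-eth0}"
# Safety default: print the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Turn off TCP segmentation offload (TSO) and generic segmentation
# offload (GSO) on every interface passed as an argument.
disable_offloads() {
    for ifc in "$@"; do
        cmd="ethtool -K $ifc tso off gso off"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

disable_offloads $IFACES
```

Running it with `DRY_RUN=0 IFACES="eth0 vnet0"` (as root) would apply the change to both interfaces; `ethtool -k <interface>` shows the resulting offload state. Settings made this way do not survive a reboot.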

On ESXi, please try to disable TSO and LRO:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2055140

Problems are not observed on Hyper-V.

I already did it when running on ESXi5.5 U2 (see post #18).
I think tazdan tried it too, but I don’t know on which version.

Is this behavior also observed when using vswitches without physical nics ?

Anyway, we recently upgraded to ESXi 6.5 build 5310538.
I will give it a try and keep you posted.

TSO, GSO and GRO also need to be disabled on the guests, so you will have to wait for a new CHR build.

On ESXi host :

[root@ESX-BGP2:~] vmware -v
VMware ESXi 6.5.0 build-5310538
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/UseHwTSO
   Path: /Net/UseHwTSO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: When non-zero, use pNIC HW TSO offload if available
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/UseHwTSO6
   Path: /Net/UseHwTSO6
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: When non-zero, use pNIC HW IPv6 TSO offload if available
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet2HwLRO
   Path: /Net/Vmxnet2HwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform HW LRO on pkts going to a LPD capable vmxnet2
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet3HwLRO
   Path: /Net/Vmxnet3HwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to enable HW LRO on pkts going to a LPD capable vmxnet3
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
   Path: /Net/TcpipDefLROEnabled
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: LRO enabled for TCP/IP
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet2SwLRO
   Path: /Net/Vmxnet2SwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform SW LRO on pkts going to a LPD capable vmxnet2
[root@ESX-BGP2:~] esxcli system settings advanced list -o /Net/Vmxnet3SwLRO
   Path: /Net/Vmxnet3SwLRO
   Type: integer
   Int Value: 0
   Default Int Value: 1
   Min Value: 0
   Max Value: 1
   String Value:
   Default String Value:
   Valid Characters:
   Description: Whether to perform SW LRO on pkts going to a LPD capable vmxnet3
[root@ESX-BGP2:~]
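
The session above only reads the settings. To reproduce the same host state, a sketch of the matching `set` calls might look like this. It assumes the usual `esxcli system settings advanced set -o <option> -i <value>` form; the option paths are the seven listed above, while `DRY_RUN` and `set_option_zero` are hypothetical helpers added for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# The TSO/LRO options shown in the session above.
OPTIONS="/Net/UseHwTSO /Net/UseHwTSO6 /Net/Vmxnet2HwLRO /Net/Vmxnet3HwLRO /Net/TcpipDefLROEnabled /Net/Vmxnet2SwLRO /Net/Vmxnet3SwLRO"
# Safety default: print the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Set one advanced integer option to 0 (disabled).
set_option_zero() {
    cmd="esxcli system settings advanced set -o $1 -i 0"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

for opt in $OPTIONS; do
    set_option_zero "$opt"
done
```

As the fetch results in this post show, having all of these at 0 on the host was not enough on its own to restore TCP throughput in this setup.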

On CHR_BTEST4 :

[admin@CHR_BTEST_4] > /tool fetch keep-result=no url="ftp://100.65.2.1/1Gio.dat" user=admin password=""
      status: downloading
  downloaded: 192KiB
    duration: 13s

[admin@CHR_BTEST_4] > /tool fetch keep-result=no url="ftp://100.65.0.3/1Gio.dat" user=admin password=""
      status: finished
  downloaded: 1048576KiB
    duration: 13s

[admin@CHR_BTEST_4] >

OK, that explains it.
Do you know if it will make it into bugfix ?
And if not, will it be in the next current ?

At first it will be in an RC version; after that, it is possible that the change will also be pushed to bugfix.

Excellent - could you give us an idea of when this may be released? Or are we able to get an advance copy for testing, perhaps? My VMware version for testing is 6.5.0 build 5318154.

cheers
Dan.

Any news ? Did I miss that in the changelogs ?

Thanks !

I followed up with Maris, and below is the response:

-----Original Message-----
From: Maris (MikroTik Support) [mailto:support@mikrotik.com]
Sent: Thursday, 14 September 2017 6:29 PM
To: Dan French
Subject: Re: [Ticket#2017060822000545] CHR MPLS and MTU

Hello,

Currently it is not yet fixed, unfortunately I cannot tell when exactly it will happen.

Best regards,
Maris

Thanks Tazdan.
I can’t understand what could take so long.
It really looks like a simple switch to change.
For now CHR is basically unusable for MPLS …

My thoughts exactly - hopefully the wait won’t be too much longer.

Great troubleshooting work guys! I’m anxious to see the results of this fix as we have been planning to use CHR for a number of MPLS applications.

Any update on this, MikroTik?