MPLS - massive throughput difference on CHR when using explicit nulls

Hi,

I’ve set up MPLS between two routers: one is a CCR1009, the other a CHR (with a PU license, just for the record).
Both are running the latest bugfix release: 6.37.5.
The link between the two routers has 200M of bandwidth and <1 ms latency.
MTU is not an issue on this link (link MTU is 1590, MPLS MTU set to 1508 on both sides, ESXi vSwitch MTU set to 7500 on jumbo-enabled ports, etc.).
OSPF is up and running, LDP runs smoothly too.

Let’s consider the following topology:
A — CHR – 200M link – CCR1009 — B

When I run LDP without explicit nulls I can fill the 200M link in both directions.
When I run LDP with explicit nulls I still get 200M from B to A, but the throughput from A to B drops to a few Mbps (fluctuating around 3 to 5 Mbps depending on the number of concurrent sessions).
Obviously, I’ve taken care of using the same explicit-nulls setting on both routers.
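
For reference, the MPLS-related part of the config looks roughly like this on both routers (a sketch from memory; addresses are placeholders, not the real ones):

# MPLS MTU on the default entry (applies to all interfaces)
/mpls interface set [ find default=yes ] mpls-mtu=1508
# LDP with explicit nulls; LSR ID / transport address are placeholders
/mpls ldp set enabled=yes lsr-id=10.0.0.1 transport-address=10.0.0.1 use-explicit-null=yes
/mpls ldp interface add interface=ether1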

My interpretation is that the throughput drops when the CHR has to push the MPLS label (which only happens with explicit nulls; otherwise it only has to route the packet, not label it).
CPU on the CHR is really low (less than 2%), memory is amply available (90% free), no single core does any significant amount of work, the ESXi host NICs are fine too, etc.

Have you already seen this problem?
Were you able to fix it, and if so, how?

I did some research on the forum, the internet, and the changelogs, but I couldn’t find anything similar.

Any idea would be welcome!

Have a nice weekend!

Without explicit nulls, are any labels actually applied with so few hops on the link? Perhaps some fragmentation is occurring? You could try smaller packets while bandwidth testing, or do a proper packet capture to see.

No labels are applied without explicit nulls, thanks to penultimate-hop label popping.
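
With PHP the egress router advertises the implicit-null label (3), so the hop before it pops the label; with explicit nulls it advertises label 0 and packets stay labelled all the way to the egress. A quick way to check which one is in effect (menu path from memory):

# on the penultimate router, look at the labels advertised by the egress LER
/mpls remote-bindings print
# label 3 = implicit null (PHP, label popped at the penultimate hop)
# label 0 = IPv4 explicit null (label kept until the egress router)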

MTU was the problem before I upgraded the vSwitch MTU to 7500. MTU is now correct (and tested).
Before I changed that MTU, full packets just didn’t go through.

Now they pass, but slowly in one direction (when the CHR pushes the MPLS label) and fast in the other (when the CHR pops the label).
That’s what puzzles me.

I’ll try to see if I can set up a test environment and do a proper packet capture.
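
For the capture I’ll probably use something along these lines (parameter names from memory, to be double-checked):

# capture on the CHR-facing interface to a file, then open it in Wireshark
/tool sniffer set filter-interface=ether1 file-name=mpls-test file-limit=10000
/tool sniffer start
# ... run the bandwidth test ...
/tool sniffer stop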

Thanks for your help!

I have exactly the same issue here. I have labbed it up and asked MikroTik support for help, and they say to check the MTU and the NIC driver in VMware. This is causing major issues for a project I have on the go. Please post back if you find a result!

Also, for info, I have R1 → R2 → R3 → R4. Modifying the use of explicit null gets throughput up to 900 Mbps between R1 and R3, but it still stays low (almost 600 bps) between R1 and R4 - this is when it imposes an MPLS label, so I’m not sure it’s resolved until the CHR is capable of imposing an MPLS label without reducing bandwidth to almost zero!

Perhaps you’re masking the actual problem by setting the ‘explicit null’ tag to off - the actual problem is the imposing of MPLS labels, as far as I can tell.

Any help or suggestions gratefully appreciated :slight_smile:

Which vNIC are you guys using in VMware? VMXNET3 or something else?

VMXNET3.
It works like a charm for everything else.

Did you change the vSwitch MTU in VMware? Did it change something for you?

Of course, I will!

Which ones of R1, R2, R3, and R4 are CHR routers? Just R4?
Are they all part of the MPLS?

I had to put the customer in production (without explicit nulls, which will hinder some other projects at that customer, but allowed us to stay within the deadlines).
So I’ll have to build another setup to run further tests, but I won’t have time for that any time soon.

Okay so:

  1. I have set, checked, and verified, then had someone else verify, the following:
  • MTU value on the physical host using Cisco UCSM - set to 9000 bytes
  • MTU value on the ESXi host - set to 9000 bytes
  • MTU value on the vSwitch in VMware - set to 9000 bytes
  2. All routers are CHR, all CHR routers are on the same physical host for testing. All routers have the same config and are all part of the MPLS.
  3. The setup is as follows:
  • R1 ← vlan10 → R2
  • R2 ← vlan20 → R3
  • R3 ← vlan30 → R4
  4. If I create a VPLS tunnel over the top of the MPLS setup from R1 to R4 it works flawlessly! (Rough sketch of what I mean just below.)
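
For completeness, the VPLS tunnel is just a plain static one between the loopbacks, roughly like this (peer address and IDs are placeholders):

# on R1 (R4 mirrors this, with remote-peer pointing back at R1's loopback)
/interface vpls add name=vpls-r1-r4 remote-peer=10.255.0.4 vpls-id=10:4 disabled=no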

Happy to provide any more info needed :slight_smile:
Cheers

I am using VMXNET3 - I have also tried E1000E, with no change in the result.

Just to be sure, what version of CHR are you using?
I tried with the new bugfix (v6.38.7) and the problem is still there.

I have tried with versions 6.36, 6.39, and 6.39.2 - all with the same result. Still no response from MikroTik support…

I think this will drive me crazy…

I just tried on v6.39.2 with the following brand-new, from-scratch setup.
CHR1 ↔ CHR2 ↔ CHR3 ↔ CHR4 <-nompls-> BTEST4 (also CHR)

All links except CHR4 to BTEST4 are MPLS enabled.
All CHRx are configured with explicit-null and loop-detect, using a loopback (bridge) address as the transport and LSR ID address.
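
Concretely, each CHRx has something along these lines (a sketch; addresses are placeholders):

# loopback = empty bridge carrying the LSR ID / transport address
/interface bridge add name=loopback
/ip address add address=10.255.0.2/32 interface=loopback
/mpls ldp set enabled=yes lsr-id=10.255.0.2 transport-address=10.255.0.2 use-explicit-null=yes loop-detect=yes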

When I run a bandwidth test from CHR2 to BTEST4 and run “torch” on the CHR2-to-CHR3 interface:

  • outgoing packets have a MAC protocol of 8847 (MPLS)
  • incoming packets have a MAC protocol of 800 (IP)

But CHR2 is configured with explicit nulls, so I should see incoming packets with a MAC protocol of 8847 too, or did I miss something?
Packet sniffing shows the same results as torching, so I don’t think this is a torch bug.
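
For the sniffing cross-check I just used the quick mode, something like this (the protocol column is where ip vs mpls shows up):

# live capture; ether2 stands for the CHR2-CHR3 interface here
/tool sniffer quick interface=ether2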

Looking at the CHR3-CHR4 interface, I see outgoing and incoming 8847 packets.
So the CHR3 router is popping the label even though I ask for explicit nulls on CHR2.

If I uncheck “Use explicit nulls” on CHR2 I get the exact same behavior.
I should see two distinct behaviors depending on whether explicit nulls is checked or not, no?

I’ll try upgrading CHR2 and CHR3 to the RC just to see if it changes anything, and I’ll keep you posted.

Upgrading to RC or going back to bugfix doesn’t change anything.
The problem doesn’t seem to be version related.

The key factor seems to be the protocol:
UDP works well whether you have explicit nulls or not,
TCP seems to be affected.
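
For reference, these are roughly the two invocations I’m comparing (parameter names from memory, the target address being CHR1’s loopback):

# UDP behaves fine
/tool bandwidth-test address=10.255.0.1 protocol=udp direction=both local-tx-speed=5M remote-tx-speed=5M local-udp-tx-size=1200 remote-udp-tx-size=1200
# TCP collapses over the labelled path
/tool bandwidth-test address=10.255.0.1 protocol=tcp direction=both tcp-connection-count=2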

More testing:
Bandwidth test from CHR4 to CHR1 (loopback), with explicit nulls OFF (on all CHRx).
Packet size 1200 bytes, direction both, TX speeds (local & remote) 5M.
UDP: 5M
TCP with 2 sessions: less than 1 kbps, CPU at 50% with half of it showing as unclassified in the profile tool (I opened a case for this one)

Bandwidth test from BTEST4 to CHR1 (loopback), with explicit nulls OFF (on all CHRx).
Packet size 1200 bytes, direction both, TX speeds (local & remote) 100M.
UDP: 99.9 Mbps / 99.3 Mbps
TCP with 20 sessions: 740.9 kbps / 1793.1 kbps (CPU maxed out on both ends)

Bandwidth test from BTEST4 to a CCR beyond CHR1, with explicit nulls OFF (on all CHRx).
Packet size 1200 bytes, direction both, TX speeds (local & remote) 10M.
UDP: 7.9 Mbps / 6.9 Mbps
TCP with 20 sessions: 7.9 Mbps / 6.9 Mbps (CPU maxed out on BTEST4)

Further testing:
/tool fetch keep-result=no url="http://proof.ovh.net/files/1Gio.dat"
Run from CHR1, CHR2, or CHR3: several MBytes/sec.
Run from CHR4: between 40 and 80 KBytes/sec.

If I disable LDP on CHR1, the same command run on CHR4 gets several MBytes/sec instantly, which corroborates our previous diagnosis: pushing MPLS labels on a CHR kills the performance.
And doing a TCP btest on a CHR kills the CPU.
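
To be precise, “disable LDP on CHR1” just means running this on CHR1:

/mpls ldp set enabled=no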

Maybe there is some special advanced setting that we need to give to VMware?
What VMware version are you running?

I am running version 6.5 at one site and version 6.0 at another (I upgraded to 6.5 to make sure it wasn’t the version of VMware causing the issue)… also this is the response from MikroTik support:

“I have tested your exact setup, interface MTU 1500, MPLS MTU 1590 and VPLS MTU 1500
Default vswitch settings with MTU set to 9000.
We are using:
Supermicro SYS-5018D-FN8T
and ESXi-6.5.0-4564106-standard
I was able to push 900Mbps over VPLS tunnel as well as simple label switching, so there are no problems with MPLS on CHRs or virtual interface drivers included in CHR. Problem is on your hardware and ESXi combination or vswitch settings. ESXi is known to be unstable/buggy.”


I’m happy to accept that the problem is on my hardware/software, but I need to know what to change in order to fix it!

I tried removing VMware TSO and LRO and rebooted the ESXi host, as suggested in some VMware docs.
No change.

I set up a traffic generator to test the throughput through CHR1-CHR2-CHR3-CHR4.
I get pretty good speeds going through this whole chain in UDP, but my TCP fetch is still ultra slow.

[admin@CHR_BTEST_4] /tool traffic-generator>  quick mbps=2000
SEQ    ID      TX-PACKET   TX-RATE     RX-PACKET   RX-RATE        RX-OOO   RX-BAD-CSUM   LOST-PACKET LOST-RATE LAT-MIN LAT-AVG LAT-MAX JITTER 
......
TOT    3       2 586 149 1999.9...     2 585 538 1999.4...                           0           611 472.5kbps 44us    396us   4.32ms  4.27ms 
TOT    4       2 586 156 1999.9...     2 584 995 1999.0...                           0         1 161 897.8kbps 49.1us  416us   4.22ms  4.17ms 
TOT    TOT     5 172 305   3.9Gbps     5 170 533   3.9Gbps                           0         1 772 1370.3... 44us    406us   4.32ms  4.27ms 

[admin@CHR_BTEST_4] /tool traffic-generator> /tool fetch keep-result=no url="http://proof.ovh.net/files/1Gio.dat"
      status: downloading
  downloaded: 1427KiB
       total: 1048576KiB
    duration: 20s

I’m out of ideas…
I guess I will just have to start sniffing on all involved interfaces at the same time to try to pinpoint where the packet(s) get lost…

I am still running ESXi 5.5. Upgrading was my best guess and last-resort option (a DC guy is needed and downtime has to be planned).
I agree that UDP works nicely, and VPLS is likely to behave the same.
But plain TCP over MPLS just doesn’t work properly.
Guess I’ll have to upgrade then.

BTW, if you want to try the TSO/LRO settings:
https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2055140&sliceId=1&docTypeID=DT_KB_1_1&dialogID=512450636&stateId=0%200%20512464428
https://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1027511
Don’t forget to reboot the host afterwards.
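
From memory, the esxcli side of those KBs boils down to something like this (option names may vary between ESXi versions, so double-check against the KB articles):

# disable hardware TSO on the host
esxcli system settings advanced set -o /Net/UseHwTSO -i 0
# disable LRO for vmxnet3 vNICs
esxcli system settings advanced set -o /Net/Vmxnet3HwLRO -i 0
esxcli system settings advanced set -o /Net/Vmxnet3SwLRO -i 0
# check the current values
esxcli system settings advanced list -o /Net/UseHwTSO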

Thanks for all your help with this - I will give it a go and see if it makes any difference.

Okay, I have now tried this and rebooted the host and all the CHRs, with no difference at all. I think this leaves either (a) an advanced setting somewhere in VMware that I don’t know about, or (b) a setting in the UCS setup that VMware is running on. Are you running VMware on a Cisco chassis? Perhaps I could rule this out if you’re not, and log a support case with VMware?