Mon Apr 05, 2021 2:45 am
I am testing this and seeing promising results, but there is still some weird behaviour.
When running a TCP btest on a hardware router (1100AHx2) across an MPLS network to a CHR, I get full rates for both send and receive.
When I run the btest on the CHR against the same 1100AHx2, I get full rates for receive but only 2-3 Mbps for send. That is really bizarre to me. CHR-to-hardware traffic runs at full rate when the 1100AHx2 initiates the test (its receive direction), but the same CHR-to-hardware direction drops to 2-3 Mbps when the CHR initiates (its send direction). The behaviour seems to depend on which side initiated the btest.
VPLS seems fine between the hardware router and the CHR: I get full rates across the VPLS tunnel regardless of which side initiates the btest.
Things get even stranger with two CHRs routed to each other through a CCR, i.e. CHR1 <--> CCR <--> CHR2. When running a TCP btest on CHR2 against CHR1, I see 1 Gbps receive and ~3 Mbps send. When I run the btest on CHR1 against CHR2, I also see 1 Gbps receive and ~3 Mbps send. Whether the traffic goes from CHR1 to CHR2 or vice versa doesn't seem to matter. The only thing that matters is which CHR initiated the btest: its send is always slow and its receive is always fast.
I also tried connecting two CHRs via a VPLS tunnel. It doesn't seem to pass any traffic other than neighbor discovery. After adding IPs to both ends of the tunnel, pinging the far side gets no response and ARP does not complete, although a dynamic entry with the far side's MAC appears in the ARP table without the C (complete) flag. So CHR-to-CHR VPLS does not seem to work at all.
I have tried changing the use-explicit-null setting on both CHRs and it does not change this behaviour.
Update: I figured out the reason for the different send/receive behaviour. I have advertise filters on MPLS so that only the loopbacks are advertised for tunnel purposes. Btest doesn't allow specifying the source interface, so depending on which side initiates it, the traffic in a given direction may or may not have labels applied. Whenever a label is pushed, throughput is slow, so the same problem described before still seems to exist.
So it looks like it is *not* fixed. Maybe I still have to disable LRO and TSO on the ESXi host; I will try that next.
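To illustrate why the direction depends on who initiates: with an advertise filter that only covers loopbacks, a packet gets a label only if its destination is a loopback. Below is a minimal Python sketch of that reasoning, assuming the btest target is the remote loopback and the initiator picks its egress interface address as the source (since btest has no source option). The prefixes and addresses are hypothetical, and the model ignores details like PHP and explicit-null.

```python
from __future__ import annotations

from ipaddress import ip_address, ip_network

# Hypothetical: loopbacks live in 10.255.255.0/24 and only that prefix is
# advertised by the (assumed) LDP advertise filter.
ADVERTISED = [ip_network("10.255.255.0/24")]


def is_labeled(dst: str) -> bool:
    """A packet is label-switched only if its destination is in an advertised prefix."""
    addr = ip_address(dst)
    return any(addr in net for net in ADVERTISED)


def btest_paths(client_src: str, server_dst: str) -> dict[str, bool]:
    """Return whether each btest direction would be label-switched.

    client_src: source address the initiating router picks (hypothetically its
                egress interface address, not its loopback).
    server_dst: address given to btest (hypothetically the remote loopback).
    """
    return {
        # Client -> server packets are destined to the remote loopback.
        "client send (to server)": is_labeled(server_dst),
        # Server -> client replies are destined to the client's interface address.
        "client receive (from server)": is_labeled(client_src),
    }


if __name__ == "__main__":
    # CHR2 (hypothetical source 192.0.2.2) tests against CHR1's loopback 10.255.255.1.
    print(btest_paths("192.0.2.2", "10.255.255.1"))
```

Under these assumptions the initiator's send direction is labeled (slow) and its receive direction is not (fast), which matches the pattern seen on both CHRs.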