i40e problem with DAC

  1. Version 7.1beta1
  2. CHR x86
  3. Using the i40e driver, bring an interface up with a DAC already inserted and linked. The interface does not register “link ok”. The DAC must be removed and plugged again to get the port to link.
/interface ethernet
set [ find default-name=ether6 ] mtu=9100 name=ether6

I have noticed a problem using the i40e driver. My adapter is an Intel X710-DA4. With a DAC in the SFP+ port, the interface does not register “link ok” when it is brought up. The DAC must be removed and reinserted to get the port to link. I also have an LR module (Edimax MG-10GAS1), and interestingly it does not have this problem. The problem occurs both when the router boots and when I manually disable and enable the interface.

The port shows that it is receiving bytes, but nothing is transmitted. That is no surprise, because it thinks the link is down.

   #      NAME                 RX-BYTE        TX-BYTE  RX-PACKET  TX-PACKET  R  T  TX-QU  R  T
   7  RS  ether6                 3 972              0         66          0  0  0      0  0  0

I didn’t see this problem when using Linux.

Here is my DAC from ethtool:

	Identifier                                : 0x03 (SFP)
	Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
	Connector                                 : 0x21 (Copper pigtail)
	Transceiver codes                         : 0x00 0x00 0x00 0x00 0x00 0x04 0x00 0x00 0x00
	Transceiver type                          : Passive Cable
	Encoding                                  : 0x00 (unspecified)
	BR, Nominal                               : 10300MBd
	Rate identifier                           : 0x00 (unspecified)
	Length (SMF,km)                           : 0km
	Length (SMF)                              : 0m
	Length (50um)                             : 0m
	Length (62.5um)                           : 0m
	Length (Copper)                           : 1m
	Length (OM3)                              : 0m
	Passive Cu cmplnce.                       : 0x01 (SFF-8431 appendix E) [SFF-8472 rev10.4 only]
	Vendor name                               : CISCO-MOLEX
	Vendor OUI                                : 00:09:3a
	Vendor PN                                 : 74752-9519
	Vendor rev                                : 09
	Option values                             : 0x00 0x00
	BR margin, max                            : 0%
	BR margin, min                            : 0%
	Date code                                 : 120917

This problem is unchanged with 7.1beta2.

I have also tried a couple of different SFP+ to copper transceivers. They have the same problem.

My theory is that there is a race condition with the link status. When using the LR fiber transceiver (the one which works fine), the link is a bit slower to come up. This additional delay gives the MikroTik software time to register the link-up status. With the DAC and copper transceivers, the link comes up faster, which causes the MikroTik software to miss the link transition.
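To make the theory concrete, here is a toy model of the kind of race I mean. It is only a sketch of the idea, not how RouterOS or the i40e driver actually works. The names (`Transceiver`, `link_registered`, `read_state_on_start`) and all of the timings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transceiver:
    name: str
    link_up_after_ms: int  # hypothetical time from interface-up to PHY link


def link_registered(xcvr: Transceiver, listener_ready_ms: int,
                    read_state_on_start: bool = False) -> bool:
    """Return True if the software ends up believing the link is up.

    Assumption: the software only reacts to link-change events that
    arrive once its listener is ready (edge-triggered behaviour).
    """
    if xcvr.link_up_after_ms >= listener_ready_ms:
        # The transition happens while the listener is running: it is seen.
        return True
    # The link came up before anyone was listening, so the event is lost
    # unless the current state is also read once when the listener starts.
    return read_state_on_start


LISTENER_READY_MS = 100                      # made-up value
dac = Transceiver("DAC", link_up_after_ms=10)        # fast link, made-up value
lr = Transceiver("LR module", link_up_after_ms=1500)  # slow link, made-up value

if __name__ == "__main__":
    for xcvr in (dac, lr):
        print(xcvr.name, link_registered(xcvr, LISTENER_READY_MS))
    print("DAC with initial state read:",
          link_registered(dac, LISTENER_READY_MS, read_state_on_start=True))
```

If something like this is what is happening, reading the current link state once at start, in addition to waiting for events, would be enough to catch the fast DAC case.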

This problem still exists with 7.1beta3.

This problem still exists with 7.1beta4.

Please contact MikroTik Support directly.

Regards.

I have the same problem in 7.7rc3. Has there been any progress?



I noticed that 7.8beta2 has the following updates:

*) x86 - added support for TP-Link TG-3468;
*) x86 - fixed SR-IOV support for Intel X710 series NIC;
*) x86 - improved Intel 500 series 10G SFP module support;
*) x86 - improved stability for Intel X550 series NIC with SR-IOV;

But unfortunately, when I pass the X710 network card through to CHR on Proxmox VE (PVE) 7.3-3, I still have the same problem.

XXV710 SFP0 fails to get link on power-up. The interface must be disabled and enabled once.
Auto-negotiation always fails.
Snipaste_2023-01-30_11-07-00.png
Snipaste_2023-01-30_11-05-33.png

IRQs are read-only and cannot be changed manually. RPS uses a lot of CPU resources.
Snipaste_2023-01-30_11-12-23.png

Any update on this?

This still appears to be an issue, even in 7.13.3.

A Minisforum MS-01 with the X710 has the same issue. Disabling and enabling the interface on the x86 RouterOS box does nothing, but disabling and enabling it on the CRS326 brings it back up. It then works fine until the x86 box is rebooted again. Really frustrating. I was hoping to build a killer RouterOS box out of this thing, but it looks like it’s going to have to run OPNsense or something.

I am having the same issue with an MS-01 and X710 on the latest Proxmox with passthrough.
I have to disable/enable the interface on the switch, or unplug the DAC and plug it back in.
I tried two DACs and two SFP+ RJ45 modules. No difference.
I am using a CRS312-4C+8XG as the switch and a MikroTik XS+DA0001 DAC. The MikroTik S+RJ10 does not work either.
I contacted MikroTik support and they advised upgrading to a 7.15.something beta. Same issue.
So, on 7.14.2, it is still the same issue.

I found a strange “workaround” for this issue.
I was pretty sure this was an issue with Mikrotik CHR, but it seems it has nothing to do with it.

Short YouTube recording showing the issue and the “workaround”: https://www.youtube.com/watch?v=Dt97NDDTiU0

Short description of what happens in the video:

Machine was just started.
SSH to Proxmox host and do an ip a to show the interface is there.
Login to Proxmox web.
Start Mikrotik CHR VM.
Open VM’s console and check ethernet for link.
Result: no link.

Reboot machine.
SSH to Proxmox host and issue ethtool enp3s0f0 (with or without watch -n1 does not matter)
Login to Proxmox web.
Start Mikrotik CHR VM.
Open VM’s console and check ethernet for link.
Result: link.

I have repeated this multiple times to make sure it was not just a fluke.

Now, I have absolutely no clue why just issuing ethtool enp3s0f0 before starting the VM fixes this problem.
Nothing is changed on the NIC; the command just displays some information about it.
Issuing the command after the VM has been started throws an error saying that it cannot find the interface, and it does not fix the issue.


How to automate it until a fix is available (if ever):

Edit /etc/network/interfaces file and add pre-up /usr/sbin/ethtool like:

auto enp3s0f0
iface enp3s0f0 inet manual
        pre-up /usr/sbin/ethtool enp3s0f0

auto enp3s0f1
iface enp3s0f1 inet manual
        pre-up /usr/sbin/ethtool enp3s0f1

The auto enp3s0f0 and auto enp3s0f1 lines are needed, or else the pre-up command is not executed.
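If you have more ports than the two above, the stanzas can be generated instead of typed by hand. This is just a small sketch: the function names (`ethtool_preup_stanza`, `render`) are my own, and the interface names and the /usr/sbin/ethtool path are the ones from my host, so adjust them to yours. It only prints the text; pasting it into /etc/network/interfaces is still up to you.

```python
def ethtool_preup_stanza(iface: str, ethtool_path: str = "/usr/sbin/ethtool") -> str:
    """Build one ifupdown stanza that runs ethtool on the port before it comes up."""
    return (
        f"auto {iface}\n"                      # needed, or pre-up is not executed
        f"iface {iface} inet manual\n"
        f"        pre-up {ethtool_path} {iface}\n"
    )


def render(ifaces: list[str]) -> str:
    """Join the stanzas for several ports, separated by a blank line."""
    return "\n".join(ethtool_preup_stanza(i) for i in ifaces)


if __name__ == "__main__":
    # Interface names taken from my Proxmox host; replace with your own.
    print(render(["enp3s0f0", "enp3s0f1"]), end="")
```

Running it as-is prints the same two stanzas shown above.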



Maybe someone more versed can shed some light on this?

Forget all the stuff above.
Just setting RX and TX flow control on the CHR interface to auto fixed my issue (it is off by default).
Strangely, whether it is set to off or auto, flow control still shows as off when running

/interface/ethernet/monitor 0

but somehow it works. I guess setting it to auto also changes something else.

I guess something is up with the new Intel ixgbe drivers running on RouterOS V7.1XX.

I am running bare-metal x86_64. The BGP server has a Mellanox ConnectX-4 CX4121 with 2 ports and an Intel X710-DA4 mezzanine card with 4 SFP+ ports.

The other server is a PPPoE server with a 2-port Mellanox card and an Intel i5xx with 2 SFP+ ports and 2 1Gb ports.

I connected an Intel i5xx port on the PPPoE server to an Intel X710 port on the BGP side, using Intel 850nm GBICs that I bought and a 3 m multimode patch cord.

On the PPPoE server side (Intel i5xx), auto-negotiation completes at 10Gbps. On the Intel X710-DA4 port, the link shows as connected at 10Gbps, but auto-negotiation failed.

The link is up and passing around 2Gbps of traffic, but it throws errors: TX queue drops and rx-errors.

I changed the queues to multi-ethernet with a queue size of 5000 packets on both servers, but I still get errors.

What is strange is that, in the meantime, the Mellanox ConnectX-4 ports on the BGP server and the Mellanox ports on the PPPoE server (where the OLTs are connected) do not show any errors. So it is something in the ixgbe drivers for sure.

I was also told that my Intel NICs had outdated firmware, so I went to the Intel website and upgraded both cards with the latest 2024 ixgbe firmware. I plugged them back into the servers and still get the same errors: around 5k to 200000 rx-errors every 24h, although the total traffic passed through the NICs is around 100000,00 GiB. I have also received some Chelsio T540 quad-port cards and will do some testing on them with the latest v7 stable versions to see if I get anything better.



One other thing: on the ethernet interface, the ports are not detected as sfp-plus. They are all detected as ethernet, and therefore we cannot see diagnostics for the GBICs.

I even had 2 different models of Intel GBICs, tested them as well, and they still throw rx-errors.

What’s funny is that we left a ping and a traceroute running on the BGP server to an external domain IP, and we did not lose any packets on either. That makes me wonder where those rx-errors are being lost or discarded. Inside the local network, for sure.

It looks like the problem is that only CPU 0 is used for all queues, whilst NICs from other vendors (Mellanox, Chelsio, etc.) use auto CPU and split the load across all CPUs.