S-RJ01 SFP Module in RB4011iGS+ flapping

We have a branch site with a leased 1 Gbps Ethernet fibre uplink to our head office. The fibre enters the building in the basement, where it is converted to 1 Gbps copper Ethernet by an ISP-owned Cisco switch in a locked cabinet. The reason for this setup is to reuse the existing CAT6 cabling from the basement up to our floor. The cable run is about 70 m long and was successfully checked with a professional Fluke cable tester.

As the branch router we use an RB4011iGS+ with an S-RJ01 SFP connected to the leased 1 Gbps Ethernet uplink. The copper cable from the basement is planned to be replaced with fibre later this year. Once that has happened, we will only have to swap the S-RJ01 SFP module for an optical one.
Installation and setup with VLANs, queues and VoIP went smoothly and everything was working. All was rainbows and unicorns until users complained about sporadic short (3-5 seconds) interruptions in Teams/Zoom meetings and SIP VoIP.
The log of the RB4011 shows short link down/up events on the SFP(+) interface:

mar/04 11:48:11 interface,info sfp-sfpplus1 link down 
mar/04 11:48:12 interface,info sfp-sfpplus1 link up (speed 10M, half duplex) 
mar/04 11:48:14 interface,info sfp-sfpplus1 link down 
mar/04 11:48:15 interface,info sfp-sfpplus1 link up (speed 1G, full duplex) 
...
mar/04 14:12:17 interface,info sfp-sfpplus1 link down 
mar/04 14:12:18 interface,info sfp-sfpplus1 link up (speed 10M, half duplex) 
mar/04 14:12:20 interface,info sfp-sfpplus1 link down 
mar/04 14:12:21 interface,info sfp-sfpplus1 link up (speed 1G, full duplex) 
...
mar/04 14:26:14 interface,info sfp-sfpplus1 link down 
mar/04 14:26:15 interface,info sfp-sfpplus1 link up (speed 10M, half duplex) 
mar/04 14:26:17 interface,info sfp-sfpplus1 link down 
mar/04 14:26:18 interface,info sfp-sfpplus1 link up (speed 1G, full duplex)

The timing pattern seems random: sometimes the link is stable for many hours, at other times it flaps several times per hour. Another observation is that the DHCP client on the SFP interface does not renew the WAN IP after the link comes back up. Either the link interruption is too short or the DHCP client does not see it.
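
For anyone who wants to pull the same events out of their own log, the following is a minimal sketch. It assumes the interface name from our setup and a single DHCP client bound to that interface; the count only covers what is still in the log buffer.

```
# Show only the link up/down events of the SFP+ port
/log print where message~"sfp-sfpplus1 link"

# Count the link-down events currently held in the log buffer
:put [:len [/log find message~"sfp-sfpplus1 link down"]]

# Manually trigger a DHCP renew on the WAN interface after a flap
/ip dhcp-client renew [find interface="sfp-sfpplus1"]
```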

Setup:
RB4011iGS+ running ROS 6.48.1
MikroTik S-RJ01 (supported for RB4011 according to the MikroTik SFP support table) 1 Gbps copper SFP module for the WAN uplink to the ISP Cisco switch

As a temporary solution, we moved the uplink from the S-RJ01/SFP to the ether1 port of the RB4011. This works fine, without issues or link flaps, using the same cable to the basement. Is there any known issue with the S-RJ01 in the RB4011? Or should we RMA the S-RJ01 module?

A known issue with most (if not all) RJ45 SFP modules is excessive heating due to the high power needed to transmit data at 1 Gbps over UTP cable. A long run (70 metres) adds to the problem. The RB4011 being a passively cooled device adds to it (big time) as well.

I suggest you keep using the ether1 port until you switch to an optical SFP (those are less problematic as they need less power).

Thanks for the reply.
I noticed that the part of the S-RJ01 SFP metal case sticking out of the RB4011 gets quite hot.
We will keep the WAN on ether1 until the uplink is upgraded to fibre.

I also tried a 2 metre link between two PowerBox Pro units with S-RJ01 modules, and the port flapping is constant.

We replaced the MT S-RJ01 with a spare SwissGBIC SG-1G-T (OEM version of FS.com SFP-GB-GE-T https://www.fs.com/uk/products/75324.html).
The SG-1G-T is 1000BASE-T only. After disabling auto-negotiation and forcing 1G full duplex on sfp-sfpplus1 we got a stable link over the same 70 m S/FTP cabling.

> interface ethernet monitor sfp-sfpplus1 once
                    name: sfp-sfpplus1
                  status: link-ok
        auto-negotiation: disabled
                    rate: 1Gbps
             full-duplex: yes
         tx-flow-control: no
         rx-flow-control: no
      sfp-module-present: yes
             sfp-rx-loss: no
            sfp-tx-fault: no
                sfp-type: SFP-or-SFP+
  sfp-link-length-copper: 100m
         sfp-vendor-name: SwissGBIC
  sfp-vendor-part-number: SG-1G-T
     sfp-vendor-revision: 2.0
       sfp-vendor-serial: EB2101XXXXX
  sfp-manufacturing-date: 21-01-20
         eeprom-checksum: good
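
For reference, the interface settings described above would look roughly like this on ROS 6.x. This is only a sketch: the parameter values are the ones the 6.48 CLI accepts, so check them against your own version before applying.

```
# Disable auto-negotiation and force 1 Gbps full duplex on the SFP+ port
/interface ethernet set sfp-sfpplus1 auto-negotiation=no speed=1Gbps full-duplex=yes

# Verify the resulting link state
/interface ethernet monitor sfp-sfpplus1 once
```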

When I find some time, I will test the MT S-RJ01 in a non-MT device to see whether the problem is with the SFP itself or is an S-RJ01 ↔ RB4011 compatibility issue.

Thanks for that.

Regards.

You could also try setting S-RJ01 to fixed speed to see if it behaves better …

The MT S-RJ01 does not support forcing link speed (https://wiki.mikrotik.com/wiki/MikroTik_SFP_module_compatibility_table#S-RJ01):

Use these modules only with auto-negotiation enabled, forced link speeds are not supported. They will negotiate to correct duplex and highest possible rate.

We tried anyway to force the S-RJ01 to 1000BASE-T FD in the RB4011 interface settings. The result was that the interface reported a 1 Gbps FD link as OK, but did not pass any data.

PS
The modules from fs.com are available under many different OEM brands for prices up to 5 times higher (SwissGBIC, StarTech, Ubiquiti, …). The only difference is the label and the Vendor/Type tags in the EEPROM. It is recommended to buy them directly from fs.com.

I’ve used something like this without any port flapping problems, but my RB4011 still runs 6.46.7.
I had to ditch the S-RJ01 SFP module when I needed MTU>1500 over it for RFC 4638 (PPPoE MTU 1500). I couldn’t get it to work properly: the PPPoE session MRU came up correctly at 1500, but the MTU wouldn’t go higher than 1492.
The same module tested in an RB2011, with the same ISP, same config and same RouterOS version, works fine. I guess something goes wrong with it in the RB4011.

I have the same thing happening on fiber; disabling and re-enabling the SFP lets it reconnect with an MTU of 1500. I made a script in the PPPoE profile that checks after a few seconds whether the MTU is 1500 and, if not, restarts the SFP. Works fine for me.

In PPP profiles on-up:

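{
    # Give the PPPoE session a few seconds to settle before checking it
    :delay 4s
    # pppoe-ikev2 is the name of the PPPoE client interface
    :local pppoeMtu ([/interface pppoe-client monitor pppoe-ikev2 once as-value]->"mtu")
    :if ($pppoeMtu < 1500) do={
        # Bounce the SFP port so the session renegotiates
        /interface disable sfp-sfpplus1
        :delay 50ms
        /interface enable sfp-sfpplus1
        :log warning "PPPoE MTU lower than 1500 so the SFP port is restarted"
    }
}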

That I didn’t try, will do when I get back there, thank you!!

According to our tests, the RB4011 SFP+ port only works reliably with fixed-rate SFP modules and auto-negotiation disabled in the ROS interface settings.

Technically, copper 1000BASE-T cannot work without auto-negotiation. Gigabit auto-negotiation includes essential things like clock master/slave role determination and link training. So most likely, behind the scenes, our “1000BASE-T only” SFP is actually doing auto-negotiation on the copper side while advertising only 1000BASE-T FD, and presenting itself as a fixed-speed device to the SFP host.


Edit:
After some experiments in the Lab and some googling I came to the following conclusion:

  1. If the RB4011 SFP+ Ethernet mode is forced to 1 Gbps with auto-negotiation disabled, the SFP slot is put into SERDES mode (raw 1 Gbps Ethernet with 8b/10b coding).
    SERDES mode is intended for 1000BASE-X fibre SFPs. Fixed-mode copper SFPs use SERDES mode towards the MAC and auto-negotiate 1000BASE-T FD only towards the copper wire. To the SFP host they look like a 1000BASE-X optical link and hence work in any slot originally built for optical 1000BASE-X modules, at the price of supporting neither 100 Mbps nor 10 Mbps.

  2. If the RB4011 SFP+ Ethernet mode is set to 1000/100/10 auto-negotiation, the SFP interface changes to SGMII mode.
    SGMII is electrically compatible with SERDES, but uses different control signaling within the 8b/10b stream. SGMII modules do support auto-negotiation at 1000/100/10 Mbps, but require SFP slots with SGMII support. The SGMII link itself always runs at the 1 Gbps rate; for 100 Mbps every byte is repeated 10x, for 10 Mbps 100x.

  3. There are some copper SFPs available that support both SERDES and SGMII and adapt automatically to the SFP host mode.

It seems the RB4011 SFP+ port has link/MTU issues in SGMII mode (auto-negotiation enabled), but works stably with 1 Gbps-only SFPs in 1 Gbps SERDES mode (auto-negotiation disabled).

So I’ve tried a disable and enable of sfp-sfpplus1 after a reboot and the S-RJ01 now works properly on this RB4011: I’m getting 1500 MRU/MTU on the PPPoE client :) Thank you for the hint! I’ll try to write to support about this too. Or ... maybe they are reading this?
For now I decided to go with a scripted disable/enable scheduled at startup with a delay of 15 seconds. I won’t put something like that in the profile on-up, because if my ISP decides to cut me off from RFC 4638 it won’t play out well: an endless loop of disable/enable, and this is a pretty remote site for me.
I still have to improve the startup script so that it checks every second or so whether the interface is up and then issues a disable/enable, instead of guessing the timing. I’ll work on this.
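
A rough, untested sketch of what that improved startup script could look like is below. The script name "sfp-bounce", the 60 second limit and the 1 second pause are my own placeholder choices, not something taken from a working setup.

```
# Hypothetical startup script "sfp-bounce" (untested sketch).
# Polls once per second, for at most 60 tries, until the SFP link is running,
# then disables and re-enables the port exactly once (no loop, so no endless bouncing).
:local ifName "sfp-sfpplus1"
:local tries 0
:while (([/interface get [find name=$ifName] running] != true) && ($tries < 60)) do={
    :delay 1s
    :set tries ($tries + 1)
}
/interface disable $ifName
:delay 1s
/interface enable $ifName
:log info ("startup: " . $ifName . " disabled/enabled after " . $tries . " polls")
```

It could then be attached to the scheduler with something like /system scheduler add name=sfp-bounce start-time=startup on-event=sfp-bounce, assuming the script is stored under that name.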
Thanks again!

This was finally solved with 6.47.10 (long-term) and 6.48.3 (stable), see change logs.
We were able to remove all our scripts fiddling with the SFP MTU and disabling/enabling the port, and the SFPs are running stably (RB4011 running 6.48.3).
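
To check which version a router is on and move it to a fixed release, something along these lines should do on ROS 6.x. Treat it as a sketch: pick the channel that matches your policy, and note that the install step reboots the router.

```
# Show the currently installed RouterOS version
/system resource print

# Select the release channel (long-term or stable) and look for updates
/system package update set channel=long-term
/system package update check-for-updates

# Download and install the update (the router reboots afterwards)
/system package update install
```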

This was a long-standing issue with RB4011 SFP(+) modules and I’m wondering why it took so long to be fixed.