CCR2004 with SFP ONU module: both PSUs report fail

Dears,

A few weeks ago (on 26 April, to be exact) I ordered a CCR2004-16G-2S+.
I set up the router at the beginning of May and upgraded it to the stable 7.2.3 release, which had been working fine so far.
Two days ago I added a new SFP ONU. A few hours later I noticed that the Fail LED on the unit is ON and the fans are making a strange noise. The green USER LED is OFF (I think it was always ON after boot before, but I might be wrong).
Symptoms:
The fan speed ramps up, then slows down, and the cycle repeats every couple of seconds.
In the log I found that both PSUs failed at the same moment, yet the unit keeps working fine.
I thought the cause might be the recent upgrade, since everything had been fine before, or perhaps the UPS the router was running on, so I switched it to direct mains power instead.
I also downgraded the software to version 7.1.5, downgraded the firmware as well, and rebooted. The router started working fine again.
After a few hours the issue came back. Every reboot resolves it for a couple of hours.

Under SYSTEM - HEALTH, when the issue appears, the unit reports Temperature and Fan Speed out of range.

Currently the unit is working fine (but I rebooted it 2 hours ago).

The health readings are as follows:

board-temperature1 42C
board-temperature2 46C
cpu-temperature 57C
fan1-speed 3570 RPM
fan2-speed 3540 RPM
psu1-state OK
psu2-state OK
sfp-temperature: 47C
temperature: 47C

Has anybody had this issue before?
Do you know what the root cause is?

I’m really disappointed that a nearly new unit fails after 3 weeks of use. For me this was an expensive router. It is my home router, and I’m only using the SFP+ slots: one with the ONU and one with an SFP DAC cable as uplink to a CRS328.

The router sits in a closed rack cabinet, but the cabinet is constantly ventilated, so the temperature inside is no higher than 22.8C, which is 1.8C higher than outside the cabinet.

So, as I mentioned in my previous post:
I rebooted the router at 20:15 and it worked fine without a failure until 06:01.

Below you can see images from when the System Health report was OK

Moment of the failure

System Health report after the failure

Temperature inside of the Cabinet

It seems like a firmware or board failure and I’d say you have to contact support@mikrotik.com with the problem. Let them comment on it.

The failure definitely looks like a problem with health data collection (not the PSU or fan hardware itself), but it is annoying because fan control does seem to be affected and you don’t have any health data available (some HW could well die for real and you wouldn’t know).

@jagrok
please post on the forum the output of:

/system routerboard print
/system package print
/system resource print
/system health print

but censor the serial-number

I have already created a support case and sent them the supout file.

Please find the output:

[myuser@MikroTik] > /system routerboard print
       routerboard: yes
             model: CCR2004-16G-2S+
     serial-number: HC....N
     firmware-type: al64
  factory-firmware: 7.1.2
  current-firmware: 7.1.5
  upgrade-firmware: 7.1.5
[myuser@MikroTik] > /system package print
Columns: NAME, VERSION
# NAME      VERSION
0 routeros  7.1.5  
[myuser@MikroTik] > /system resource print
                   uptime: 15h2m7s
                  version: 7.1.5 (stable)
               build-time: Mar/22/2022 11:03:33
         factory-software: 7.1.2
              free-memory: 3807.2MiB
             total-memory: 4032.0MiB
                cpu-count: 4
                 cpu-load: 0%
           free-hdd-space: 104.6MiB
          total-hdd-space: 129.0MiB
  write-sect-since-reboot: 10796
         write-sect-total: 734138
               bad-blocks: 0%
        architecture-name: arm64
               board-name: CCR2004-16G-2S+
                 platform: MikroTik
[myuser@MikroTik] > /system healt print
Columns: NAME, VALUE, TYPE
#  NAME                VALUE       TYPE
0  temperature         -274        C   
1  cpu-temperature     50          C   
2  sfp-temperature     -274        C   
3  fan1-speed          4294967295  RPM 
4  fan2-speed          4294967295  RPM 
5  board-temperature1  0           C   
6  board-temperature2  0           C   
7  psu1-state          fail            
8  psu2-state          fail            
[myuser@MikroTik] >
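
A side note on the health output above: -274 C is below absolute zero (-273.15 C) and 4294967295 is 2^32 - 1, so these look like placeholder values rather than measurements. Below is a minimal Python sketch that flags such rows in a pasted copy of the table. The function names and the flagging rules are my own assumptions for illustration, not anything RouterOS provides.

```python
import re

ABSOLUTE_ZERO_C = -273.15
UINT32_MAX = 2**32 - 1  # 4294967295, what -1 looks like in an unsigned 32-bit field

# Table row: index, name, value, optional unit (psu*-state rows have no unit).
ROW = re.compile(r"^\s*\d+\s+(\S+)\s+(\S+)(?:\s+(\S+))?\s*$")

SAMPLE = """\
#  NAME                VALUE       TYPE
0  temperature         -274        C
1  cpu-temperature     50          C
2  sfp-temperature     -274        C
3  fan1-speed          4294967295  RPM
4  fan2-speed          4294967295  RPM
5  board-temperature1  0           C
6  board-temperature2  0           C
7  psu1-state          fail
8  psu2-state          fail
"""

def parse_health(text):
    """Return (name, value, unit) tuples for every table row in the pasted text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), m.group(2), m.group(3) or ""))
    return rows

def suspicious(value, unit):
    """True if the value reports a failure or cannot be a real measurement."""
    if value == "fail":
        return True
    try:
        number = float(value)
    except ValueError:
        return False
    if unit == "C":
        return number < ABSOLUTE_ZERO_C   # colder than physically possible
    if unit == "RPM":
        return number == UINT32_MAX       # all bits set, not a fan speed
    return False

for name, value, unit in parse_health(SAMPLE):
    if suspicious(value, unit):
        print(f"{name}: {value} {unit}".rstrip())
```

The two board-temperature rows read 0 C, which is physically possible, so this sketch does not flag them, although next to the earlier 42C and 46C readings they are probably not real either. Only cpu-temperature still looks like a live sensor.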

uh …
CPU 0% ???

If you can, as a last resort, export .rsc, netinstall with 7.2.3 without applying defaults, and also update RouterBOOT.
Warning: you will lose everything that the .rsc export does not save (this also fixes the problem if it lies in the internal configuration database)

I have a suspicion about my issue.
On Friday I received an ONU from FS.COM (GPON-ONU-34-20BI) and I have a feeling that this module is causing the issue.
I did a netinstall, and within a few seconds of plugging in the SFP ONU my CCR entered fail mode, reporting the same failure.
So I swapped to the ONT, reconfigured the router to use ETH1, and removed the SFP ONU to confirm my suspicion.
This port previously worked fine with an SFP module, so it might be an incompatibility issue.

Let’s see.
I will post my findings.

13 hours have passed since the last restart of the router.
Nothing new has happened, so it looks like an incompatibility between the ONU and MikroTik?
My next step is to swap the interfaces, connecting the 10G link to sfpplus1 and the ONU to sfpplus2, to see if anything changes.

I have checked the compatibility pages, but I’m not sure whether MikroTik lists other vendors’ modules. Probably not, as there are so many on the market.

https://wiki.mikrotik.com/wiki/MikroTik_wired_interface_compatibility

I don’t think they even bother doing any compatibility tests with other vendors’ modules, as you noted there are way too many of them.
It does seem that throwing random SFP modules at Mikrotik fails more often than not. So the best way is to check this forum for any reports on compatibility (either positive or negative) and go with that information. And compatibility with non-trivial SFP modules (e.g. xPON ONU modules, xDSL modules, etc.) is even more problematic than with “normal” SFP modules (non-trivial modules usually offer/require management interface but ROS doesn’t provide any hooks for it).

MikroTik confirmed the following:

“We have discovered that this problem is caused by SFP ONU module in the CCR2004-16G-2S+.
When the SFP ONU module initializes, usually, it is ~30-60 seconds when the CCR2004 does not receive necessary data on the I2C bus which results in interfering with system health operation.
After SFP ONU module initialization the system health should normalize.”

They are currently working on a software fix, probably to prevent this type of issue.
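
To tell the expected start-up glitch apart from the persistent failure reported earlier in this thread, one could compare the time of each failed health sample with the time the module was inserted. A rough Python sketch, assuming you log both timestamps yourself. The 60-second window is the upper end of the figure quoted by support, the function name is hypothetical, and the dates in the example are made up (only the 20:15 and 06:01 clock times come from this thread).

```python
from datetime import datetime, timedelta

# Upper end of the ~30-60 s initialization time quoted by support.
INIT_WINDOW = timedelta(seconds=60)

def still_failing_after_init(inserted_at, sample_time, sample_ok, window=INIT_WINDOW):
    """True when a failed health sample falls outside the module's init window."""
    return (not sample_ok) and (sample_time - inserted_at) > window

# Hypothetical dates; the clock times match the reboot and failure mentioned above.
plugged_in = datetime(2022, 5, 1, 20, 15, 0)
during_init = plugged_in + timedelta(seconds=40)
next_morning = datetime(2022, 5, 2, 6, 1, 0)

print(still_failing_after_init(plugged_in, during_init, sample_ok=False))
print(still_failing_after_init(plugged_in, next_morning, sample_ok=False))
```

A failure 40 seconds after insertion would fit the explanation from support, while one that shows up hours later and stays until a reboot would not.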

It’s probably running over the same I2C-bus as the sensors, and the ONU corrupts the data.

Hello,

I have the same issue, is there a solution?

Hello guys,
I have exactly the same issue: a few minutes after inserting the module, the fans go crazy and the red fail light comes on…
All interfaces go up and down… the router is then totally unstable.
A HW issue between the ONU module and the CCR2004 for sure.

I am also having the same issue on the CCR2004. I bought the same FS SFP ONU; when I plug it into the CCR, after some time the fans ramp up and down and the red fault LED comes on. On the system health page I can see that none of the sensors are reading. Everything else still works fine, like internet, VPN, and firewall. Only the fans are annoying.

I’m not sure what the problem is; currently the simple fix is to reboot the CCR =.=

Camping here for a better solution.

I’m also having this same issue and have opened a support ticket.

The GPON module was working fine in my RB3011 but causes the health failure in my ccr2004 on the latest stable and testing releases.

The supout shows that the I2C bus is likely locked up, as there is nothing but timeouts waiting for the bus to be ready. Not sure if it relates to the single-byte read request here:
http://forum.mikrotik.com/t/request-support-i2c-sfp-sfp-secuential-singlebyte-reads-to-obtain-transceiver-details-from-eeprom/142454/1

----------------------------------------------
root@SFP:/home/ONTUSER# sfp_i2c -h
usage: sfp_i2c [options]
 -h, --help
                help screen
 -v, --version
                print version
 -a, --activate-monitor
                activate monitoring daemon
 -d, --default
                set default values (CAUTION: will overwrite all previous settings)
                        Requires an argument "yes" for safety!
 -i, --index
                index selection
 -l, --length
                length in bytes of area to access
 -s, --set-string
                set string, index selects:
                        0: vendor name
                        1: vendor part number
                        2: vendor revision
                        3: serial number
                        4: date code
                        5: vendor data
                        6: equipment id of ONU2-G OMCI ME
                        7: vendor id of ONU-G OMCI ME
                        8: GPON Serial Number
                        9: LOID
                        10: Logic password
                        11: PLOAM password
 -0, --save-a0-low-128
                byte wide read/write access
 -1, --byte-access (default)
                word wide read/write access
 -2, --word-access
                dword wide read/write access
 -4, --dword-access
                read location
 -r, --read-location
                write location
 -w, --write-location
                show threshold values: (multiple times)
                        0: high alarms
                        1: low alarms
                        2: high warnings
                        3: low warnings
                        4: show raw (hex)values of the above selected
 -t, --show-thresholds
                mask
 -m, --mask
                set EEPROM 0 base I2C address

Does it need to be set to -0 or -1?

This issue still persists on ROS 7.9rc2; I also opened a case with MikroTik support.
Other people on the forum have the same problem: http://forum.mikrotik.com/t/ccr2004-1g-12s-2xs-psu1-and-psu2-enter-in-fail/157781/1
Has anyone found a workaround?

CCR2004-1G-12S+2XS | ROS 7.14 beta3 (2023/12/25)

It also fails with the following GPON sticks. It seems that the authorities have forgotten their promises.

  • Huawei | MA5671A
  • Nokia (aka: Alcatel-Lucent) | G-010-S-A
  • Alcatel-Lucent | G-010-S-P

I use GPON-ONU-34-20BI from FS.com on a CCR2004 without any of the issues described regarding the health stats.
I do get link flapping once every 7-10 days though.

My CCR2004-1G-12S+2XS has been working fine for over a year.
I encountered this strange behaviour recently and today it happened for the second time.
So I started searching and found this thread…

The FS ONU causing the issue seems likely, since I plugged one in just a couple of weeks ago.
But the fiber is not even active yet and the interface is marked with “Rx Loss”.