itscmo
just joined
Topic Author
Posts: 15
Joined: Thu Jul 27, 2017 7:46 pm

CCR 2004 All SFP Crash

Wed Jun 30, 2021 11:22 pm

I'm having an issue with a CCR2004 where all of the SFPs go completely offline. The router itself seems to stay online, but all SFPs crash. Because the unit is in production, I have not been able to connect a console cable or an Ethernet cable to the ether1 port. I'm going to replace the CCR2004 tomorrow so that I can run memtest on it.

I'm aware of viewtopic.php?t=164578 but my issue seems to be slightly different.

There are 6x 10Gb SFPs and 3x RJ45 SFPs in this unit. I have more than two dozen CCR2004 units in production and haven't seen this behavior (yet) on the other deployed units. Has anyone else run into this issue with all SFPs going offline?

Edit: perhaps this update in 6.49beta36 is relevant:
sfp - improved link stability for 10G, 25G and 40G modules on CRS309, CRS312, CRS326-24S+2Q+, CRS354 and CCR2004 devices;
Last edited by itscmo on Wed Jun 30, 2021 11:44 pm, edited 1 time in total.
 
Cablenut9
Long time Member
Posts: 542
Joined: Fri Jan 08, 2021 5:30 am

Re: CCR 2004 All SFP Crash

Wed Jun 30, 2021 11:23 pm

Give us the result of this: /export hide-sensitive
 
itscmo
just joined
Topic Author
Posts: 15
Joined: Thu Jul 27, 2017 7:46 pm

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 12:11 am

/export hide-sensitive
(10233 messages not shown)
jun/30/2021 15:23:48 system,error,critical System rebooted because of ping watchdog timeout
jun/30/2021 15:29:39 system,error,critical System rebooted because of ping watchdog timeout
jun/30/2021 15:43:10 system,error,critical System rebooted because of ping watchdog timeout
jun/30/2021 15:46:24 system,error,critical router rebooted because some critical program crashed
[username@ROUTERBOARD] > /export hide-sensitive
# jun/30/2021 16:47:37 by RouterOS 6.48.3
# software id = XXXX-XXXX
#
# model = CCR2004-1G-12S+2XS
# serial number = D4F10C33XXXX
/interface gre
add allow-fast-path=no !keepalive name=gre-helpdesk remote-address=X.X.158.28
/interface vlan
add interface=sfp-sfpplus1 name=jc-tt-bv-vlan30 vlan-id=30
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=pool1 ranges=192.168.16.10-192.168.16.20
add name=pool2 ranges=192.168.17.10-192.168.17.20
add name=pool3 ranges=192.168.18.10-192.168.18.20
/ip dhcp-server
add address-pool=pool1 disabled=no interface=sfp-sfpplus1 name=server1
add address-pool=pool2 disabled=no interface=sfp-sfpplus3 name=server2
add address-pool=pool3 disabled=no interface=sfp-sfpplus5 name=server3
/snmp community
set [ find default=yes ] addresses=X.X.56.0/22,X.X.156.0/22
/ip neighbor discovery-settings
set discover-interface-list=!dynamic
/ip address
add address=192.168.88.1/24 comment=defconf interface=ether1 network=192.168.88.0
add address=192.168.16.1/24 interface=sfp-sfpplus1 network=192.168.16.0
add address=192.168.17.1/24 interface=sfp-sfpplus3 network=192.168.17.0
add address=192.168.18.1/24 interface=sfp-sfpplus5 network=192.168.18.0
add address=192.168.1.50/24 interface=sfp-sfpplus1 network=192.168.1.0
add address=X.X.56.150/30 disabled=yes interface=sfp-sfpplus3 network=X.X.56.148
add address=X.X.56.161/30 interface=sfp-sfpplus5 network=X.X.56.160
add address=192.168.3.1/24 interface=sfp-sfpplus3 network=192.168.3.0
add address=X.X.6.34/30 interface=gre-helpdesk network=X.X.6.32
add address=X.X.104.5/24 interface=sfp-sfpplus1 network=X.X.104.0
add address=192.168.64.250/24 interface=sfp-sfpplus1 network=192.168.64.0
add address=X.X.159.153/30 interface=jc-tt-bv-vlan30 network=X.X.159.152
add address=192.168.19.1/24 interface=jc-tt-bv-vlan30 network=192.168.19.0
/ip dhcp-server network
add address=192.168.16.0/24 gateway=192.168.16.1
add address=192.168.17.0/24 gateway=192.168.17.1
add address=192.168.18.0/24 gateway=192.168.18.1
/ip dns
set servers=1.1.1.1,75.75.75.75
/ip firewall nat
add action=masquerade chain=srcnat dst-address=192.168.1.0/24
add action=masquerade chain=srcnat dst-address=!10.0.0.0/8 src-address=192.168.16.0/21
add action=dst-nat chain=dstnat dst-port=9976 in-interface=sfp-sfpplus5 protocol=tcp to-addresses=192.168.16.13 to-ports=443
add action=dst-nat chain=dstnat dst-port=9975 in-interface=sfp-sfpplus5 protocol=tcp to-addresses=192.168.16.5 to-ports=80
add action=masquerade chain=srcnat dst-address=192.168.18.0/24
add action=masquerade chain=srcnat dst-address=192.168.3.0/24
add action=masquerade chain=srcnat src-address=192.168.3.0/24
add action=masquerade chain=srcnat dst-address=X.X.104.11
add action=masquerade chain=srcnat dst-address=192.168.64.0/24
add action=dst-nat chain=dstnat dst-port=9965 protocol=tcp to-addresses=192.168.16.20 to-ports=443
add action=dst-nat chain=dstnat dst-port=9966 protocol=tcp to-addresses=192.168.1.3 to-ports=443
add action=dst-nat chain=dstnat dst-port=9967 protocol=tcp to-addresses=192.168.64.5 to-ports=80
add action=dst-nat chain=dstnat dst-port=9968 protocol=tcp to-addresses=192.168.16.5 to-ports=80
/ip route
add distance=1 gateway=X.X.56.162
/ip service
set telnet disabled=yes
set ftp disabled=yes
set api disabled=yes
set winbox disabled=yes
set api-ssl disabled=yes
/system clock
set time-zone-name=America/New_York
/system identity
set name=ROUTERBOARD
/system ntp client
set enabled=yes server-dns-names=pool.ntp.org
/system watchdog
set watch-address=8.8.8.8
[username@ROUTERBOARD] >

Interestingly, we found it with no SFP lights active. So the soft reboots that the watchdog was apparently attempting (based on the logs) were not enough to reset the SFP chip.
IMG_3290.jpeg

Also interesting: it was very hot when it rebooted and came online. It seems the fan controller never activated the fans while the router was in the crashed state. The temperature started dropping as soon as the fans spooled up after a hard power cycle:
temp-after-reboot.PNG
 
rextended
Forum Guru
Posts: 11967
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 11:22 am

Suggestion: leave empty cages between modules: sfp1-sfp2-nothing-sfp4-sfp5-nothing-sfp7-sfp8-nothing-sfp10-nothing-sfp12


Too many reboots will kill you:
One reboot every 5 minutes to block them all...
10233+ reboots...
(10233 messages not shown)
jun/30/2021 15:23:48 system,error,critical System rebooted because of ping watchdog timeout
jun/30/2021 15:29:39 system,error,critical System rebooted because of ping watchdog timeout
jun/30/2021 15:43:10 system,error,critical System rebooted because of ping watchdog timeout
Are you sure the board can ping 8.8.8.8?
Choose another IP for the watchdog, or check whether some rule is blocking the watchdog ping...
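
The spacing of the reboots can be checked directly from the timestamps in the quoted log. Below is a minimal Python sketch, assuming the `mon/dd/yyyy hh:mm:ss` timestamp format seen in the log above; the function names `parse_timestamp` and `reboot_intervals` are made up for illustration. It only covers the four visible entries, so it says nothing about the 10233 messages that were not shown.

```python
from datetime import datetime

# Log lines copied from the post above.
LOG_LINES = [
    "jun/30/2021 15:23:48 system,error,critical System rebooted because of ping watchdog timeout",
    "jun/30/2021 15:29:39 system,error,critical System rebooted because of ping watchdog timeout",
    "jun/30/2021 15:43:10 system,error,critical System rebooted because of ping watchdog timeout",
    "jun/30/2021 15:46:24 system,error,critical router rebooted because some critical program crashed",
]

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_timestamp(line):
    """Parse the leading 'mon/dd/yyyy hh:mm:ss' stamp of a log line."""
    date_part, time_part = line.split()[:2]
    month, day, year = date_part.split("/")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS[month.lower()], int(day), hour, minute, second)


def reboot_intervals(lines):
    """Return the gaps, in seconds, between consecutive reboot entries."""
    stamps = [parse_timestamp(line) for line in lines if "rebooted" in line]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for seconds in reboot_intervals(LOG_LINES):
        print(f"{seconds // 60} min {seconds % 60} s")
```

For these four entries the gaps come out as 5 min 51 s, 13 min 31 s and 3 min 14 s, so at least the visible reboots are not spaced exactly five minutes apart.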
 
itscmo
just joined
Topic Author
Posts: 15
Joined: Thu Jul 27, 2017 7:46 pm

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 3:22 pm

Thanks for your reply. The router can ping that address if the SFP ports are online. Are you suggesting that I increase the delay between reboots?
 
rextended
Forum Guru
Posts: 11967
Joined: Tue Feb 25, 2014 12:49 pm
Location: Italy

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 3:35 pm

Are you sure the board can ping 8.8.8.8?

I understand the SFPs can "crash", but exactly every 5 minutes???

No, I suggest leaving space between the SFP modules to improve ventilation.

Are all the modules made by MikroTik?
 
itscmo
just joined
Topic Author
Posts: 15
Joined: Thu Jul 27, 2017 7:46 pm

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 4:53 pm

What's happening is this:
-Router is fine
-All SFP modules go offline (lights go dark)
-The watchdog can no longer ping beyond this router
-The watchdog soft reboots the router
-The SFP modules do not come back online
-We dispatch someone who physically unplugs the router and plugs it back in
-The SFP modules come back online and the router seems OK
-After a period of time (days to weeks), this process repeats

So yes, we cannot ping 8.8.8.8 when the SFP modules crash. That's the point, and that's why I pointed the watchdog at that address: it is only reachable when the SFPs are online, so a failed ping should make the router reboot and, I hoped, resolve the problem by itself.

The fiber SFP+ modules are made by 10GTek. I have something like 75 of them deployed in MikroTik routers with no other issues so far.
 
Cablenut9
Long time Member
Posts: 542
Joined: Fri Jan 08, 2021 5:30 am

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 5:17 pm

Contact the Big Mik's support because this sounds like a hardware problem.
 
itscmo
just joined
Topic Author
Posts: 15
Joined: Thu Jul 27, 2017 7:46 pm

Re: CCR 2004 All SFP Crash

Thu Jul 01, 2021 8:27 pm

Contact the Big Mik's support because this sounds like a hardware problem.
On the one hand, I was afraid this was going to be a PITA hardware issue. On the other hand, I'm glad it's not something systemic :).

Thanks for this feedback, I do have a case open with MT. I wanted to post here too though in case anyone else runs into this kind of thing.
