I’ve got two Cloud Core Routers on the other side of the Atlantic, and one of them is dropping out for about 8.6 seconds at a time, roughly once an hour on average (but nowhere near metronomic: it happened 3 times in the last 60 minutes).
log print shows:
12:44:04 interface,info sfp1 link down
12:44:04 interface,info ether1-toAX1 link down
12:44:04 interface,info ether2-AS2 link down
12:44:04 interface,info ether3-AS4b link down
12:44:04 interface,info ether4-AS3b link down
12:44:04 interface,info ether8-hubengswitch link down
12:44:04 interface,info ether11-BMI link down
12:44:04 interface,info ether12-internet link down
followed by
12:44:06 interface,info ether1-toAX1 link up (speed 1G, full duplex)
12:44:06 interface,info ether11-BMI link up (speed 1G, full duplex)
12:44:06 interface,info ether12-internet link up (speed 1G, full duplex)
12:44:07 interface,info ether2-AS2 link up (speed 1G, full duplex)
12:44:07 interface,info ether3-AS4b link up (speed 1G, full duplex)
12:44:07 interface,info ether8-hubengswitch link up (speed 1G, full duplex)
12:44:10 interface,info sfp1 link up (speed 1G, full duplex)
12:44:14 interface,info ether4-AS3b link up (speed 100M, full duplex)
(I snipped out the PIM, BGP, etc. entries.)
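In case it helps anyone compare the ports, here is a rough Python sketch that works out how long each interface stayed down from log lines like the ones above. The function name `link_downtimes` and the `LOG` sample are just my own illustration (only three ports are pasted in, for brevity), and it assumes every line has the shape "time topics interface link down/up". The log timestamps only have one-second resolution, and the sketch does not handle an outage that spans midnight.

```python
from datetime import datetime

# Sample lines copied from the log above (only three ports shown for brevity).
LOG = """\
12:44:04 interface,info sfp1 link down
12:44:04 interface,info ether1-toAX1 link down
12:44:04 interface,info ether4-AS3b link down
12:44:06 interface,info ether1-toAX1 link up (speed 1G, full duplex)
12:44:10 interface,info sfp1 link up (speed 1G, full duplex)
12:44:14 interface,info ether4-AS3b link up (speed 100M, full duplex)
"""


def link_downtimes(log_text):
    """Return {interface: seconds between 'link down' and the next 'link up'}."""
    down_at = {}    # interface -> time of the most recent 'link down'
    downtimes = {}  # interface -> length of the outage in seconds
    for line in log_text.splitlines():
        parts = line.split()
        # Assumed shape: <HH:MM:SS> <topics> <interface> link <down|up> ...
        if len(parts) < 5 or parts[3] != "link":
            continue
        stamp = datetime.strptime(parts[0], "%H:%M:%S")
        name, state = parts[2], parts[4]
        if state == "down":
            down_at[name] = stamp
        elif state == "up" and name in down_at:
            downtimes[name] = (stamp - down_at.pop(name)).total_seconds()
    return downtimes


for name, seconds in link_downtimes(LOG).items():
    print(f"{name}: down for {seconds:.0f}s")
```

On those sample lines it gives 2 seconds for ether1-toAX1, 6 seconds for sfp1 and 10 seconds for ether4-AS3b, so the ports do not all come back at the same time.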
Pinging interfaces, both the local interface (ether2) and interfaces beyond (loop and sfp1), shows 8 or 9 lost pings when it happens, and the traffic at the far end that routes through the router drops for between 8.5 and 8.7 seconds.
It started happening on Friday. Yesterday I took a software update (and reboot) from 6.45.6 to 6.47.1, but that didn’t improve things.
Health looks fine
system health print
fan-mode: auto
use-fan: main
active-fan: main
cpu-overtemp-check: yes
cpu-overtemp-threshold: 100C
cpu-overtemp-startup-delay: 1m
voltage: 23.7V
current: 1597mA
temperature: 35C
cpu-temperature: 56C
power-consumption: 37.8W
fan1-speed: 2986RPM
fan2-speed: 2949RPM
CPU
# CPU LOAD IRQ DISK
0 cpu0 0% 0% 0%
1 cpu1 0% 0% 0%
2 cpu2 1% 1% 0%
3 cpu3 0% 0% 0%
4 cpu4 0% 0% 0%
5 cpu5 0% 0% 0%
6 cpu6 0% 0% 0%
7 cpu7 0% 0% 0%
8 cpu8 43% 40% 0%
9 cpu9 0% 0% 0%
10 cpu10 2% 2% 0%
11 cpu11 0% 0% 0%
12 cpu12 0% 0% 0%
13 cpu13 0% 0% 0%
14 cpu14 0% 0% 0%
15 cpu15 0% 0% 0%
16 cpu16 0% 0% 0%
17 cpu17 0% 0% 0%
18 cpu18 0% 0% 0%
19 cpu19 1% 0% 0%
20 cpu20 0% 0% 0%
21 cpu21 0% 0% 0%
22 cpu22 24% 21% 0%
23 cpu23 0% 0% 0%
24 cpu24 0% 0% 0%
25 cpu25 0% 0% 0%
26 cpu26 0% 0% 0%
27 cpu27 0% 0% 0%
28 cpu28 0% 0% 0%
29 cpu29 0% 0% 0%
30 cpu30 0% 0% 0%
31 cpu31 0% 0% 0%
32 cpu32 21% 21% 0%
33 cpu33 0% 0% 0%
34 cpu34 0% 0% 0%
35 cpu35 0% 0% 0%
It started on Friday (at 09:34:08 GMT). There were no config changes on Friday. I did add a new bridge (with VRRP, VLAN, etc.) on Thursday, but I did that on both router 1 and router 2, and only router 2 is having issues.
Any ideas? Is there any increased debugging I can add?