SSH daemon causing high CPU load since RouterOS upgrade > 7.15

Hi folks!

Since RouterOS version 7.15 we have been experiencing intermittent problems connecting to switches via SSH, across various switch models.

When you try to open an SSH connection to an affected device, the connection to the SSH service is established, but no login prompt appears. After a while, the connection is terminated by a timeout.
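
For anyone who wants to check a whole fleet for this symptom, here is a minimal Python sketch of a probe. It assumes (and this is only an assumption, I have not confirmed it on an affected device) that the hang happens before the SSH daemon sends its identification string, so "TCP connects, but no `SSH-` line arrives" is treated as the failure signature. The function name `probe_ssh_banner` and the 5 second timeout are made up for illustration.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Classify an SSH endpoint by whether it sends an identification string.

    Returns "unreachable" if the TCP connect fails, "no-banner" if the
    connection opens but no line starting with "SSH-" arrives within
    `timeout` seconds, or the banner text otherwise.
    Simplification: only the first line sent by the server is inspected.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        data = b""
        try:
            # Read until the first newline or 255 bytes, whichever comes first.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break  # peer closed the connection
                data += chunk
        except socket.timeout:
            pass  # connected, but the daemon stayed silent
    line = data.split(b"\n", 1)[0].strip()
    if line.startswith(b"SSH-"):
        return line.decode("ascii", errors="replace")
    return "no-banner"
```

Calling `probe_ssh_banner("192.0.2.10")` (a placeholder address) from a host inside one of the allowed SSH source networks should return the banner on a healthy switch and, if the assumption above holds, "no-banner" on an affected one.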

We see this happen from time to time on a number of RouterOS devices, including but not limited to: CRS326-24G-2S+, CRS328-24P-4S+, CRS354-48P-4S+2Q+

I noticed that the CPU usage on at least one core is consistently extremely high when this error occurs. However, connecting to the devices using WinBox still works without problems.

When trying to export the config while a device is experiencing this issue, I noticed that exporting “/ip/ssh” runs into a timeout.

Disabling/enabling SSH in /ip/service has no effect. I have also tried blocking all incoming SSH connections using the router's firewall, but that has no effect on the CPU usage or the “/ip/ssh” export timeout.

The only solution seems to be rebooting the device, either via WinBox or simply by cutting the power. When rebooting via WinBox, it takes quite a while (roughly 45 to 90 seconds, depending on the model) before the device actually reboots. WinBox stays connected during this time.

Is anyone here experiencing similar issues, or does anyone have an idea what the actual f**k is going on? Any help on this would be much appreciated.

I will also add a config export and some pics of the resource usage in a couple of minutes.

This is on a CRS326 running ROS 7.15.3; the bootloader is on the same version.

[admin@xxx] > /system/resource/monitor 
          cpu-used: 52%
  cpu-used-per-cpu: 5%,100%
       free-memory: 448176KiB
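
If you collect this output from several devices (for example pasted from a WinBox terminal, since SSH is not usable at that point), a small Python sketch like the following can flag the "one core pegged" pattern. The function name `pegged_cores` and the 95% threshold are my own choices for illustration, not anything defined by RouterOS.

```python
def pegged_cores(monitor_output, threshold=95):
    """Return indices of cores whose load is at or above `threshold` percent.

    Expects text in the shape printed by /system/resource/monitor, i.e. a
    line such as "cpu-used-per-cpu: 5%,100%". Returns an empty list if no
    such line is present.
    """
    for line in monitor_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu-used-per-cpu":
            loads = [int(v.strip().rstrip("%")) for v in value.split(",") if v.strip()]
            return [i for i, load in enumerate(loads) if load >= threshold]
    return []


sample = """\
          cpu-used: 52%
  cpu-used-per-cpu: 5%,100%
       free-memory: 448176KiB
"""
print(pegged_cores(sample))  # [1]
```

For the output above this reports core 1, which matches what the CPU graphs show: one core is stuck at 100% while the other is nearly idle.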

Here comes the config:

# 2024-08-14 21:38:13 by RouterOS 7.15.3
# software id = xxx
#
# model = CRS326-24G-2S+
# serial number = xxx
/interface bridge
add admin-mac=DC:2C:6E:53:7A:69 auto-mac=no fast-forward=no frame-types=admit-only-vlan-tagged name=bridge-VLANs port-cost-mode=short vlan-filtering=yes
/interface vlan
add interface=bridge-VLANs name=bridge-VLANs-luk-2999-mgm vlan-id=99
/interface lte apn
set [ find default=yes ] ip-type=ipv4 use-network-apn=no
/port
set 0 name=serial0
/routing bgp template
set default disabled=no output.network=bgp-networks
/routing ospf instance
add disabled=no name=default-v2
/routing ospf area
add disabled=yes instance=default-v2 name=backbone-v2
/snmp community
set [ find default=yes ] addresses=xxxxx authentication-protocol=SHA1 encryption-protocol=AES name=mgm security=private write-access=yes
/system logging action
set 3 remote=10.110.99.10
add bsd-syslog=yes name=remoteError remote=10.110.99.10 syslog-severity=error target=remote
add bsd-syslog=yes name=remoteWarning remote=10.110.99.10 syslog-severity=warning target=remote
add bsd-syslog=yes name=remoteCritical remote=10.110.99.10 syslog-severity=critical target=remote
/interface bridge port
add bridge=bridge-VLANs interface=ether2 internal-path-cost=10 path-cost=10 pvid=98
add bridge=bridge-VLANs interface=ether3 internal-path-cost=10 path-cost=10 pvid=98
add bridge=bridge-VLANs interface=ether4 internal-path-cost=10 path-cost=10 pvid=98
add bridge=bridge-VLANs interface=ether5 internal-path-cost=10 path-cost=10 pvid=98
add bridge=bridge-VLANs interface=ether6 internal-path-cost=10 path-cost=10 pvid=98
add bridge=bridge-VLANs interface=ether7 internal-path-cost=10 path-cost=10 pvid=98
add bridge=bridge-VLANs frame-types=admit-only-untagged-and-priority-tagged interface=ether8 internal-path-cost=10 path-cost=10 pvid=31
add bridge=bridge-VLANs frame-types=admit-only-vlan-tagged interface=sfp-sfpplus2
/ip firewall connection tracking
set udp-timeout=10s
/ip neighbor discovery-settings
set discover-interface-list=!dynamic protocol=lldp
/ip settings
set max-neighbor-entries=8192
/ipv6 settings
set accept-redirects=no accept-router-advertisements=no disable-ipv6=yes forward=no
/interface bridge vlan
add bridge=bridge-VLANs comment=LUK-pc-raum1 tagged=sfp-sfpplus2 vlan-ids=21
add bridge=bridge-VLANs comment=LUK-pc-raum2 tagged=sfp-sfpplus2 vlan-ids=22
add bridge=bridge-VLANs comment=LUK-pc-raum3 tagged=sfp-sfpplus2 vlan-ids=23
add bridge=bridge-VLANs comment=LUK-pc-raum4 tagged=sfp-sfpplus2 vlan-ids=24
add bridge=bridge-VLANs comment=LUK-drucker tagged=sfp-sfpplus2 vlan-ids=31
add bridge=bridge-VLANs comment=LUK-praesentation tagged=sfp-sfpplus2,ether2,ether3,ether4,ether5,ether6,ether7 vlan-ids=32
add bridge=bridge-VLANs comment=LUK-dsb tagged=sfp-sfpplus2 vlan-ids=33
add bridge=bridge-VLANs comment=LUK-labor disabled=yes tagged=sfp-sfpplus2 vlan-ids=34
add bridge=bridge-VLANs comment=LUK-buecherei tagged=sfp-sfpplus2 vlan-ids=35
add bridge=bridge-VLANs comment=LUK-schueler tagged=sfp-sfpplus2,ether2,ether3,ether4,ether5,ether6,ether7 vlan-ids=40
add bridge=bridge-VLANs comment=LUK-lehrer tagged=sfp-sfpplus2,ether2,ether3,ether4,ether5,ether6,ether7 vlan-ids=50
add bridge=bridge-VLANs comment=LUK-schulsoz tagged=sfp-sfpplus2 vlan-ids=61
add bridge=bridge-VLANs comment=LUK-hausmeister disabled=yes tagged=sfp-sfpplus2 vlan-ids=62
add bridge=bridge-VLANs comment=LUK-glt tagged=sfp-sfpplus2 vlan-ids=71
add bridge=bridge-VLANs comment=LUK-voip disabled=yes tagged=sfp-sfpplus2 vlan-ids=72
add bridge=bridge-VLANs comment=LUK-solar disabled=yes tagged=sfp-sfpplus2 vlan-ids=73
add bridge=bridge-VLANs comment=LUK-ela disabled=yes tagged=sfp-sfpplus2 vlan-ids=74
add bridge=bridge-VLANs comment=LUK-byod-lehrer tagged=sfp-sfpplus2,ether2,ether3,ether4,ether5,ether6,ether7 vlan-ids=80
add bridge=bridge-VLANs comment=LUK-gyod disabled=yes tagged=sfp-sfpplus2 vlan-ids=84
add bridge=bridge-VLANs comment=LUK-byod-schueler tagged=sfp-sfpplus2,ether2,ether3,ether4,ether5,ether6,ether7 vlan-ids=88
add bridge=bridge-VLANs comment=LUK-gast tagged=sfp-sfpplus2,ether2,ether3,ether4,ether5,ether6,ether7 vlan-ids=96
add bridge=bridge-VLANs comment=LUK-mgm-wlan tagged=sfp-sfpplus2 vlan-ids=98
add bridge=bridge-VLANs comment=LUK-mgm tagged=bridge-VLANs,sfp-sfpplus2 vlan-ids=99
/interface ovpn-server server
set auth=sha1,md5
/ip address
add address=10.129.99.11/24 interface=bridge-VLANs-luk-2999-mgm network=10.129.99.0
/ip cloud
set update-time=no
/ip dns
set servers=10.129.99.254
/ip route
add disabled=no dst-address=0.0.0.0/0 gateway=10.129.99.254
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www disabled=yes
set ssh address=10.110.10.0/24,10.110.99.0/24,10.252.1.0/24,192.168.8.0/24
set api disabled=yes
set winbox address=10.110.10.0/24,10.110.99.0/24,10.252.1.0/24,192.168.8.0/24
set api-ssl disabled=yes
#error exporting "/ip/ssh" (timeout)
/snmp
set enabled=yes location="xxx"
/system clock
set time-zone-name=Europe/Berlin
/system identity
set name=xxx
/system logging
set 0 topics=info,!account
add action=remoteError topics=error
add action=remoteWarning topics=warning
add action=remoteCritical topics=critical
/system note
set show-at-login=no
/system ntp client
set enabled=yes
/system ntp client servers
add address=10.129.99.254
/system routerboard settings
set boot-os=router-os
/system scheduler
add name=upgradeRouterboardToCurrentFirmware on-event="/system routerboard\r\
    \n:if ([get current-firmware] < [get upgrade-firmware]) do={ \\\r\
    \n\t:put \"current ROUTERBOARD-FW is older than Upgrade-FW...\"\r\
    \n\t:put \"starting ROUTERBOARD upgrade in 5 seconds...\"\r\
    \n\t:delay 5\r\
    \n\t:execute { /system routerboard upgrade }\r\
    \n\t:put \"Done. System will be restarted in 5 seconds...\"\r\
    \n\t:delay 5\r\
    \n\t:execute { /system reboot }\r\
    \n}\r\
    \n" policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-time=startup

The “/ip/ssh” export ran into a timeout, just as I expected, but there are no configured settings there apart from “forwarding-enabled=remote” anyway (on some, not all devices).

What I forgot to mention earlier: after a device with this error is rebooted, the problem is gone for the moment, but there is a high probability that it will recur in the near future (minutes, hours, or days after the reboot). So far I haven’t been able to detect any pattern.
cpu_profile_cpu1.png
cpu_profile_total.png
cpu_profile_cpu0.png

Hi,


Try regenerating your SSH host keys as rsa/2k.

I’ve seen some strange behavior on a hAP ac3 with rsa/4k and on an RB5009 with rsa/8k.
But at that time I was running 7.10, maybe 7.11… I don’t remember.

I have since switched to ed25519 (as of 7.15) and haven’t seen any problems.

Hi.

Thanks for your input.
Haven’t tried to switch to an ed25519 host key yet, but will give it a try!

I have also been using an ed25519 public key for authentication since it was implemented, but the error appears to occur regardless of the key type used for auth.

However, I will give regenerating the host keys a try and post a follow-up asap. :unamused:

A short follow-up.

Over the last two weeks I have changed the SSH host key type to ed25519 and regenerated the host key on every RouterOS device, followed by a reboot.
Unfortunately, the problem still occurs. Subjectively, it has become somewhat less severe, but I could be mistaken.

What I also noticed is that an “autosupout.rif” is generated on every device at the time the error occurs. In parallel, I will now submit another support request to MikroTik and include the “autosupout.rif”.

However, further ideas from the community are also much appreciated!
2024-08-27_08h18_34.png
cpu_utilization.png