100% CPU on RB750Gr3 6.40.2

I have 2 x RB750Gr3 deployed running 6.40.2. Features used are DHCP + OSPF + BGP + PPPoE.

Both units have one of their four cores pinned at 100% all the time, and the web interface does not work on either unit.

Device 1

[x@y] > /system package print
Flags: X - disabled
 #   NAME                                               VERSION                                              SCHEDULED
 0   routeros-mmips                                     6.40.2
 1   system                                             6.40.2
 2 X ipv6                                               6.40.2
 3   wireless                                           6.40.2
 4   hotspot                                            6.40.2
 5   dhcp                                               6.40.2
 6   mpls                                               6.40.2
 7   routing                                            6.40.2
 8   ppp                                                6.40.2
 9   security                                           6.40.2
10   advanced-tools                                     6.40.2

profile…

[x@y] > /tool profile cpu=all
NAME                    CPU        USAGE
www                       0        87.5%
ethernet                  0           0%
networking                0         0.5%
management                0           7%
cpu0                                 95%
www                       1         4.5%
firewall                  1           0%
networking                1           0%
management                1         0.5%
routing                   1           1%
cpu1                                  6%
console                   2         0.5%
networking                2           0%
management                2           0%
routing                   2           0%
unclassified              2           0%
cpu2                                0.5%
ssh                       3           0%
firewall                  3           0%
networking                3           0%
management                3           0%
cpu3                                  0%

Device 2

[y@z] /system package> print
Flags: X - disabled
 #   NAME                                               VERSION                                              SCHEDULED
 0   routeros-mmips                                     6.40.2
 1   system                                             6.40.2
 2 X ipv6                                               6.40.2
 3   wireless                                           6.40.2
 4   hotspot                                            6.40.2
 5   dhcp                                               6.40.2
 6   mpls                                               6.40.2
 7   routing                                            6.40.2
 8   ppp                                                6.40.2
 9   security                                           6.40.2
10   advanced-tools                                     6.40.2

profile..

[x@z] > /tool profile cpu=all
NAME                    CPU        USAGE
networking                0           1%
management                0         5.5%
routing                   0           2%
cpu0                                8.5%
firewall                  1           0%
networking                1         1.5%
management                1           0%
routing                   1         0.5%
cpu1                                  2%
console                   2           0%
networking                2         0.5%
management                2         0.5%
routing                   2           0%
profiling                 2           0%
unclassified              2           0%
cpu2                                  1%
firewall                  3           0%
networking                3           0%
management                3          95%
bridging                  3           0%
cpu3                                 95%

SSH access to both devices is fine, and they route traffic and terminate RADIUS-authenticated PPPoE OK, but the web interface never loads. LibreNMS shows one CPU core maxed out all the time.

thanks!

How many PPPoE sessions does each of your Gr3s terminate? If there are a lot, and you have redundant links with masquerade, the problem might be masquerade: when one link fails, the CPU needs much more power than normal to flush and recalculate the tracked connections. In that case it is recommended to use src-nat instead of masquerade.
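
For reference, a rough sketch of the two rule styles. The interface name ether1 and the address 203.0.113.10 are placeholders I made up, not values from this thread, and src-nat only makes sense if the uplink address is static:

```routeros
# Masquerade: the source address is taken from the outgoing interface,
# and tracked connections are flushed when that interface goes down
/ip firewall nat add chain=srcnat out-interface=ether1 action=masquerade

# src-nat: the source address is fixed explicitly with to-addresses
# (ether1 and 203.0.113.10 are hypothetical, replace with your own)
/ip firewall nat add chain=srcnat out-interface=ether1 action=src-nat to-addresses=203.0.113.10
```

Only one of the two rules would be used for a given uplink.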

Under 50 PPPoE sessions each, no local masquerading - all public IPs.

I upgraded both overnight from 6.40.2 to 6.40.4 and CPU on the pegged core dropped from 100% to 1-2%.

Use the firewall connections table to see if you are being attacked. Disable www, telnet and ssh in IP services, and manage the MikroTik with WinBox.
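
Something along these lines from the console, as a sketch only. Check that WinBox access works before disabling ssh, since doing it from an SSH session can lock you out:

```routeros
# Look at the tracked connections; many entries aimed at the router's
# own service ports would suggest someone is probing it
/ip firewall connection print

# See which services are enabled and on which ports
/ip service print

# Disable the web and telnet services
/ip service disable www,telnet

# Disable ssh only once WinBox management is confirmed working
/ip service disable ssh
```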

Upgrade WinBox to the latest version (3.11), RouterOS to the latest version (6.40.4), and RouterBOOT to the latest version (3.41).
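
A sketch of the usual console steps, assuming the router can reach the MikroTik update servers. The version that gets installed is whatever the configured update channel currently offers, which may not be the exact version named above:

```routeros
# Check for a newer RouterOS release, then download and install it
# (install reboots the router)
/system package update check-for-updates
/system package update install

# After the reboot, compare current-firmware with upgrade-firmware
/system routerboard print

# Upgrade RouterBOOT; the new firmware takes effect after another reboot
/system routerboard upgrade
/system reboot
```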