I’ve started getting this notification from the Dude a couple of times a week for one specific router, usually followed within a minute by “is now up.” I’m not sure exactly what I’m being told… that is, what does it mean for the “cpu service” to be “down”? Clearly the router hasn’t stopped running and magically revived itself (which I guess does fairly describe a crash and reboot, but there is no rebooting going on here). What exactly is the Dude looking at to determine whether a router’s “CPU service” is up or down?
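It queries the CPU load figures via SNMP, so “down” most likely means one of those SNMP queries went unanswered, not that the router itself stopped running. If there’s an outage, it always seems to take multiples of 3:30 to recover.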
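To make the up/down logic concrete, here is a rough sketch of how an SNMP-based CPU probe could decide its status. It is not the Dude's actual code: the OID is the standard HOST-RESOURCES-MIB `hrProcessorLoad` column, which I'm assuming is what gets polled, and `cpu_service_status`, `answering_router` and `silent_router` are made-up names standing in for the real SNMP query.

```python
from typing import Callable, List

# HOST-RESOURCES-MIB hrProcessorLoad column (per-CPU load, in percent).
# Assumption: this is the figure the probe reads; the thread does not confirm it.
HR_PROCESSOR_LOAD_OID = "1.3.6.1.2.1.25.3.3.1.2"


def cpu_service_status(walk: Callable[[str], List[int]]) -> str:
    """Return "up" if the SNMP walk yields at least one load value, else "down".

    `walk` is a hypothetical stand-in for a real SNMP walk: it takes an OID
    and returns the load values, or raises TimeoutError if nothing answers.
    """
    try:
        loads = walk(HR_PROCESSOR_LOAD_OID)
    except TimeoutError:
        # No SNMP reply in time: the service is reported down,
        # even though the router itself may be running fine.
        return "down"
    return "up" if loads else "down"


def answering_router(oid: str) -> List[int]:
    # Simulated router that replies with a single CPU at 7% load.
    return [7]


def silent_router(oid: str) -> List[int]:
    # Simulated router whose SNMP agent does not reply.
    raise TimeoutError("no SNMP response")


print(cpu_service_status(answering_router))  # up
print(cpu_service_status(silent_router))     # down
```

The point is that a busy or briefly unresponsive SNMP agent is enough to flip the status to "down" and back, with no reboot involved.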