How can people have faith in these products for business critical work?
a reasonably big cisco network with ~90 asr9010 routers experienced random line card reboots, and therefore outages, without any apparent reason. incidents followed each other at random, but sometimes just a couple of minutes apart, creating a chain-reaction-like effect: routers connected to each other via long-haul transmission links, but otherwise up to 100km apart, suddenly died within minutes of each other. no one could tell why. the issue kept repeating - the only resolution was to power-cycle the boxes using out-of-band management. a reboot meant 30-40 minutes (yes, minutes, not seconds) to recover.
some even suspected cyber-criminal activity or cyber-terrorism must be behind it.
all the linecards (a bit over 250 pieces) were replaced with next-generation ones as the repair.
later it was found and demonstrated that the packet parser mistakenly identified an ethernet frame carrying an ipv4 packet as an ipv6 packet and branched to the wrong point on the NP, resulting in a processor lockup - a situation from which the device cannot recover, not even with the implemented watchdog.
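to make the failure mode concrete, here is a minimal sketch of the kind of check a parser can do before branching. this is not cisco's NP microcode (that was never disclosed); `classify_frame` and its return labels are hypothetical names made up for illustration. the idea is simply to cross-check the ethertype against the version nibble of the ip header and refuse to branch when they disagree:

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
VLAN_TAGS = {0x8100, 0x88A8}  # 802.1Q and 802.1ad tag protocol identifiers


def classify_frame(frame: bytes) -> str:
    """Hypothetical helper: return 'ipv4', 'ipv6', 'other' or 'malformed'."""
    offset = 12  # skip destination and source MAC addresses (6 bytes each)
    if len(frame) < offset + 2:
        return "malformed"
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2

    # step over any stacked VLAN tags: 2 bytes of TCI, then the next ethertype
    while ethertype in VLAN_TAGS:
        if len(frame) < offset + 4:
            return "malformed"
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4

    if ethertype not in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        return "other"
    if len(frame) <= offset:
        return "malformed"

    # the top nibble of the first payload byte is the ip version field
    version = frame[offset] >> 4
    if ethertype == ETHERTYPE_IPV4 and version == 4:
        return "ipv4"
    if ethertype == ETHERTYPE_IPV6 and version == 6:
        return "ipv6"
    # ethertype and version disagree: do not branch into either parser
    return "malformed"
```

the point of the last line is that a disagreement ends in a safe "drop" path instead of a jump into the wrong parser, which is the opposite of what happened on the NP.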
total secrecy was kept: not even the owner & operator of the network was told the exact mechanism and the conditions under which the bug could be triggered.
several months later the issue was disclosed as a "severity 2 DDTS", along with the appropriate patches.
and look, even after all that, we still see some NP lockups in the network, caused by other mis-parsed packets, but so far the NPUs have been able to recover from them using "fast reset" - resulting in an outage of a couple of seconds instead of the box turning unresponsive. does that mean all the issues got fixed? actually no, it's just an automated workaround.
bottom line: routers are computers running software that people write. even if they test lots of things, there's nothing like a live production network, where shit just happens. regardless of the make.
i could have been whining at the TAC for hours because most of the "cisco enterprise APs" were affected by the WPA2 issue and there was no software fix available, even weeks after the bug was publicly disclosed. and look, RouterOS current & bugfix releases had the fix on the day of the disclosure. serious gear for serious business?