We are having serious problems with The Dude monitoring about 1000 devices. I would like to know its limits so I can decide whether to fix it (if that is possible) or move to another system, such as Zabbix. Some people tell me that Dude is designed for 10 to 100 devices, and I don't know whether that is true.
The problem is that the whole MikroTik virtual machine simply hangs. It happened twice this month, and both times we had to reboot the VM. Another issue is that some of the graphs stop reading and drawing until the Dude process restarts, but once those graphs start working again, other ones stop. There are also many failures when reading custom SNMP probes (such as RSSI on Ubiquiti links or UPS voltages). These fail at random, and Dude reports that there is no response from the device.
We moved half of our network to a Dude Agent, but the problem is exactly the same. The agent did not help at all!
Does anybody have a similar problem? What is the maximum number of devices that Dude can support? How can I debug or troubleshoot this situation? Which factors can reduce Dude's performance (SNMP version, RouterOS checks, etc.)?
Our setup:
- ROS 6.43.16 on a CHR.
- The host is running ESXi 6.5.0 Update 1.
- The host has a Xeon X5670 @ 2.93GHz and an HP P410 RAID controller with 2 Samsung Pro SSDs.
- The Dude VM has 4 GB RAM, of which only 0.16 GB is active, and 6 CPUs. The CPU load does not go above 25% at peaks.
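
One check that might help separate Dude problems from device or network problems is to poll one of the failing OIDs from another machine and count how often it times out. Below is a minimal sketch of that idea, assuming the net-snmp `snmpget` command is installed on the test machine. The helper names (`snmp_probe`, `failure_rate`) and the host, community, and OID in the usage comment are hypothetical placeholders, not values from our network.

```python
import subprocess


def snmp_probe(host, oid, community="public", timeout_s=1):
    """Return True if a single snmpget (SNMP v2c, no retries) gets an answer."""
    cmd = [
        "snmpget", "-v2c", "-c", community,
        "-t", str(timeout_s),  # per-request timeout in seconds
        "-r", "0",             # no retries, so every timeout is counted
        host, oid,
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s + 2)
    except (OSError, subprocess.TimeoutExpired):
        # snmpget is missing or did not finish: count it as a failure
        return False
    return result.returncode == 0


def failure_rate(probe, attempts):
    """Call probe() `attempts` times and return the fraction of calls that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not probe())
    return failures / attempts


# Example usage (placeholder address and the standard sysUpTime OID):
# rate = failure_rate(lambda: snmp_probe("192.0.2.10", "1.3.6.1.2.1.1.3.0"), 100)
# print(f"{rate:.1%} of requests failed")
```

If the failure rate measured this way is close to zero while Dude still reports no response for the same device, that would point at Dude itself rather than at the device or the path to it.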