What is the default number of threads used by the poller?
Everything has been running fine, but now that I have added some more devices, data on the map screens keeps disappearing randomly on various devices. It really looks like some sort of timeout or response-time issue. The Dude server is an i7 machine with plenty of RAM, etc. There do not appear to be any performance or network problems on the server. Suggestions?
I am using Dude to monitor about 50 WinXP and Win7 PCs, plus two routers, two firewalls, 100+ Cisco switch ports, another 10 Linux servers, and 40 LAN printers on SNMPv2, so it is collecting a pretty decent amount of data.
Additionally, I just saw something about a 2 GB limit in the db and noticed that my dude.db is 2085 MB. What is the issue, and what do I need to do to correct or work around this? My historic data is not really critical.
Vacuum the database now. If it reaches the critical size you will get a weird crash of some sort.
http://forum.mikrotik.com/t/the-dude-4-0-beta-3-graphs-stop-updating/56136/4
Set Chart Raw Value keep time to a maximum of 2 days. If your database is not much smaller after a vacuum, set it to 1 day. It will take a couple of days for the Dude to reduce the data, but the database will not shrink, so you may have to vacuum again. For all other chart values (10m, 2h, 1day), keep the approximate storage size as low as reasonable; mine is 1250M.
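To keep an eye on how close dude.db is getting to the limit, a quick file-size check is enough. This is only a rough Python sketch: it assumes the 2 GB limit means 2 GiB (2048 MB), and the function names, the 80% threshold, and the example path are all made up for illustration. Under that assumption, a 2085 MB file is already past the limit.

```python
import os

# Assumption: the "2 GB limit" from this thread is 2 GiB (2048 MB).
LIMIT_BYTES = 2 * 1024**3


def db_headroom(path, limit_bytes=LIMIT_BYTES):
    """Return (size in MB, fraction of the limit used) for the file at path."""
    size = os.path.getsize(path)
    return size / 1024**2, size / limit_bytes


def needs_vacuum(path, threshold=0.8, limit_bytes=LIMIT_BYTES):
    """True once the file has used at least `threshold` of the limit.

    The 0.8 default is an arbitrary safety margin, not a Dude setting.
    """
    _, used = db_headroom(path, limit_bytes)
    return used >= threshold


# Example (hypothetical path, adjust to wherever your dude.db lives):
# size_mb, used = db_headroom(r"C:\path\to\dude.db")
# print(f"{size_mb:.0f} MB, {used:.0%} of the assumed limit")
```

Run it from a scheduled task and vacuum when `needs_vacuum` returns True, rather than waiting for the crash.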
The Dude looks to be single-threaded. I force the Dude service to real-time priority and I set polling to 1 minute for ~1200 probes. Make sure the Dude is not collecting the BGP routing table from an internet-connected router. This could take a toll on the Dude's collection process.
(I would have to completely agree that something is fishy about the collection process/timers and that it is in some conflict with disk I/O or something else. For example, html probes time out instantly with the error “connection closed” all the time, even though I watched the web page come across the interface with Wireshark. A probe will show as unstable, I will re-probe, and it will be back up in a few seconds. Note: I force the negative cache time to 5 seconds instead of 300; if you don’t, a single “unstable” will turn into an outage and will not be re-probed until the negative cache time expires…)
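On the single-threaded guess: if probes really do run strictly one after another (an assumption, nothing here confirms it), the arithmetic shows why adding devices can start to look like timeouts. A small Python sketch, with function names made up for illustration:

```python
def per_probe_budget_ms(probe_count, interval_ms):
    """Average milliseconds available per probe when probes run serially."""
    if probe_count <= 0:
        raise ValueError("probe_count must be positive")
    return interval_ms / probe_count


def max_serial_probes(interval_ms, avg_probe_ms):
    """How many probes fit in one polling interval at a given average duration."""
    if avg_probe_ms <= 0:
        raise ValueError("avg_probe_ms must be positive")
    return interval_ms // avg_probe_ms


# ~1200 probes on a 1-minute interval, as in my setup:
print(per_probe_budget_ms(1200, 60_000))  # 50.0 ms per probe on average

# If an average probe took 200 ms, a serial poller could fit only this many:
print(max_serial_probes(60_000, 200))     # 300
```

So under that assumption every probe you remove, and every slow probe you avoid, buys time for the rest.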
Remove any probes that are non-essential, e.g. if you are collecting the CPU utilization from a Cisco switch, remove ping, since successfully collecting a CPU utilization value is just as telling as ping.
HTH,
Lebowski