The Dude: Best practices for efficiency?

What intervals should I be using to scan the most common probes?

Also, what kind of footprint does monitoring via RouterOS generate compared with SNMP? I’m not sure whether I should poll bandwidth usage via SNMP or just pull it in real time from RouterOS.

I run 1-minute intervals for most things. The Dude is really efficient: it generates an average of about 500 kbps, and that is with about 700 devices.
I don’t think the Dude generates enough traffic to worry about unless you are on a T1 line. As a test, I added ping probes to over 2000 devices (scanning 192.168.0.0/16) and let that run for a few weeks with no problem. That test was to prove that people shut their computers off at night, so there was no need to add a “green” server if everyone is already shutting down their machines. In terms of footprint, monitoring via SNMP or RouterOS is the same, since both require a packet to be sent and received. The Dude caches positive reads for 5 seconds, so if a value is used often it can reuse the recent poll.
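To make the 5-second reuse concrete, here is a minimal Python sketch of a time-to-live cache that only stores successful reads. It is a hypothetical illustration of the idea, not how the Dude is actually implemented; the class name `PositiveReadCache`, the `poll` callback, and the device/item names are all made up for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical illustration of caching positive reads for a few seconds.
# This is NOT The Dude's implementation; all names here are made up.
Poller = Callable[[str, str], Optional[float]]


class PositiveReadCache:
    def __init__(self, ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl          # seconds a successful read stays reusable
        self.clock = clock      # injectable clock so the demo is deterministic
        self._entries: Dict[Tuple[str, str], Tuple[float, float]] = {}
        self.polls_sent = 0     # how many real requests would go on the wire

    def read(self, device: str, item: str, poll: Poller) -> Optional[float]:
        key = (device, item)
        now = self.clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]    # still fresh: reuse it, no packet sent
        self.polls_sent += 1
        value = poll(device, item)
        if value is not None:   # only successful ("positive") reads are cached
            self._entries[key] = (now, value)
        return value


def fake_poll(device: str, item: str) -> Optional[float]:
    return 123.0  # stands in for a real SNMP/RouterOS request


# Demo with a fake clock: reads at 0, 1 and 4.9 s share one poll; the read at 6 s polls again.
now = [0.0]
cache = PositiveReadCache(ttl=5.0, clock=lambda: now[0])
for t in (0.0, 1.0, 4.9, 6.0):
    now[0] = t
    cache.read("router1", "cpu-load", fake_poll)
print(cache.polls_sent)  # 2
```

Because failed reads are not stored, a device that stops answering is retried on every request rather than being masked by a stale value.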

Lebowski

Will running 1-minute intervals be less efficient for anything in particular?

I don’t believe so. The default is 30 seconds, so 1-minute polling already generates half as much traffic. If you want to go back to the stone age of 5-minute intervals, that would be 10x less traffic than the default. I like frequent polling so I can see changes quickly. I would run 30-second intervals, but I wanted to make sure my database never hit that critical 2 GB. The raw value keep time on the Chart tab should never be set above 2 days unless you are certain it will not make the database too big. I am still running 4.0b3, which crashes the database if it gets too big.
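The interval arithmetic above can be checked with a small Python sketch. It assumes polling traffic scales linearly with the number of polls for the same probes and devices, which ignores differences between probe types; `polls_per_day` and `traffic_vs_default` are hypothetical helper names.

```python
# Sketch: relative polling load for different probe intervals.
# Assumption: traffic scales linearly with the number of polls (same probes, same devices).
DEFAULT_INTERVAL_S = 30  # the Dude's default probe interval, per the post above
SECONDS_PER_DAY = 86_400


def polls_per_day(interval_s: int) -> int:
    """Number of polls one probe makes per day at the given interval."""
    return SECONDS_PER_DAY // interval_s


def traffic_vs_default(interval_s: int) -> float:
    """Polling traffic relative to the 30-second default."""
    return polls_per_day(interval_s) / polls_per_day(DEFAULT_INTERVAL_S)


for interval in (30, 60, 300):
    print(f"{interval:>3} s: {polls_per_day(interval):>5} polls/day per probe, "
          f"{traffic_vs_default(interval):.2f}x the default traffic")
```

Running it prints 2880, 1440 and 288 polls per day per probe, i.e. 1.00x, 0.50x and 0.10x the default traffic. The same ratio applies roughly to how many raw values end up in the database.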

Good questions on efficiency. I’m getting more and more comfortable with Dude monitoring and SNMP (historically, I’ve always found graphing in the Dude to be unreliable).

I’m moving a lot of our graphing and more intensive monitoring to the Dude and monitoring multiple links, so I was wondering what those who know would recommend:

SNMP or RouterOS? For reliability, load, efficiency, etc.

We’ve got close to 5000 devices actively polling in our Dude currently (generally ICMP only). However, we’re going to start turning on more logging on links, say one graph of outbound traffic per MT on site across 400 maps, and monitoring all ports on core devices: approximately 60 ports, including cross-connects and handovers with sustained load. One of our core CCR routers runs at about 10% sustained CPU with only router protection on the firewall. If I’m monitoring all interfaces, what will that load look like? Is SNMP the better option, or RouterOS?
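For bandwidth graphs, whichever method is used, the poller has to read each interface’s traffic counters once per interval and derive a rate from the difference between readings. Below is a minimal Python sketch of that step for SNMP, assuming 64-bit IF-MIB counters such as `ifHCInOctets`; the helper names and sample readings are hypothetical.

```python
# Sketch: turning two SNMP interface counter samples into a bit rate.
# Assumes 64-bit IF-MIB octet counters (e.g. ifHCInOctets); sample values are made up.
COUNTER64_MODULO = 2 ** 64


def octets_delta(previous: int, current: int, modulo: int = COUNTER64_MODULO) -> int:
    """Octets counted between two readings, tolerating at most one counter wrap."""
    return (current - previous) % modulo


def bits_per_second(previous: int, current: int, interval_s: float) -> float:
    """Average bit rate over the polling interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return octets_delta(previous, current) * 8 / interval_s


# Two readings taken 120 s apart (the 2-minute polling interval mentioned in this thread).
rate = bits_per_second(previous=1_000_000_000, current=1_750_000_000, interval_s=120)
print(f"{rate / 1e6:.1f} Mbps")  # 50.0 Mbps
```

This is also why 64-bit counters matter on busy links: a 32-bit octet counter (`ifInOctets`) wraps after 2^32 octets, about 34 seconds at a sustained 1 Gbps, so it can wrap more than once between 2-minute polls and the single-wrap correction above would then under-report.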

We may also increase some of the other monitoring.

Polling is currently every 2 minutes, except on some core devices/servers.
We’re also periodically scanning and polling our CPEs just to do inventory.

We’ve also stepped up the granularity on the graphing and hold the data for longer:

Under Settings > Chart:
Raw data: keep 2 days
10-minute interval: keep 7 days
2-hour interval: keep 30 days
1-day interval: keep 365 days
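As a rough way to compare retention settings, this Python sketch counts how many points each tier keeps per chart series. It assumes raw samples arrive once per 2-minute polling interval and counts points only, since the bytes stored per point are not known here; `TIERS` and `points_kept` are illustrative names.

```python
# Sketch: stored points per chart series for each retention tier.
# Assumption: raw samples arrive once per polling interval (2 minutes here).
RAW_POLL_INTERVAL_MIN = 2
MINUTES_PER_DAY = 1440

# (label, sample spacing in minutes, retention in days), taken from the settings above
TIERS = [
    ("Raw data", RAW_POLL_INTERVAL_MIN, 2),
    ("10-minute", 10, 7),
    ("2-hour", 120, 30),
    ("1-day", 1440, 365),
]


def points_kept(spacing_min: int, retention_days: int) -> int:
    """Number of samples a tier holds for one series."""
    return retention_days * MINUTES_PER_DAY // spacing_min


total = 0
for label, spacing, days in TIERS:
    n = points_kept(spacing, days)
    total += n
    print(f"{label:>9}: {n:>5} points")
print(f"    total: {total:>5} points per series")
```

With these settings that is 1440 raw, 1008 ten-minute, 360 two-hour and 365 daily points, or 3173 per series. The raw tier dominates, so multiplying by the number of graphed series (for example in and out on 60 core ports) gives a sense of how the raw-data keep time drives database growth.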

We may look at storing more granular data for longer, but this will at least give us some indication of link performance over time. We love Cacti and SmokePing, but we are slowly starting to consolidate these operations into the Dude.

We also use SNMP to pull voltage and PV charge rates from remote sites, as well as for some custom tools/probes/functions.

And since I think we may have one of the bigger Dude instances, here are some stats:
About 15 concurrent Dude client sessions
Dude server: x86 (due for an upgrade to a few more cores)
2 cores at 1600 MHz, CPU load 31-40%
Dude.db approx. 500 MB; total including maps/files/logs is 1 GB
Sustained monitoring traffic of around 1 Mbps on both interfaces

Hope that helps people decide to use the Dude and push the MT team to develop more features. Keep up the good work, boys!