There are some conditions that can lead to high CPU on Cisco devices. I know of (and have really experienced) two of them:
You have a lot of interfaces on the device, for example a router holding a lot of PPTP sessions.
You have a lot of routes on the device, for example a router holding eBGP sessions.
In such cases, the high CPU shows up when The Dude tries to build the tables for the device's SNMP tabs.
The solution is to disable the corresponding SNMP subtrees on the Cisco device.
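To get a feel for why big interface or route tables hurt, here is a rough back-of-the-envelope sketch. It assumes the table is walked naively with one GETNEXT per cell (no GETBULK) and uses the standard column counts of ifTable (RFC 2863) and ipRouteTable (RFC 1213). The row counts (1,000 interfaces, 300,000 routes) are hypothetical, and this is not a claim about how the Dude actually walks tables.

```python
# Back-of-the-envelope cost of walking an SNMP table with plain GETNEXT
# (one varbind per request, no GETBULK). Column counts follow the standard
# MIB definitions; the row counts used below are hypothetical.

IF_TABLE_COLUMNS = 22        # ifTable (1.3.6.1.2.1.2.2): ifIndex .. ifSpecific
IP_ROUTE_TABLE_COLUMNS = 13  # ipRouteTable (1.3.6.1.2.1.4.21): ipRouteDest .. ipRouteInfo


def getnext_requests(rows: int, columns: int) -> int:
    """GETNEXT requests needed to visit every cell of a table,
    plus the final request that steps past the end of the table."""
    return rows * columns + 1


print(getnext_requests(1000, IF_TABLE_COLUMNS))           # 22001
print(getnext_requests(300_000, IP_ROUTE_TABLE_COLUMNS))  # 3900001
```

Whatever the exact request pattern is, the work grows with rows times columns, which is why excluding the big subtrees makes such a difference.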
Besides that, you also contributed a lot to the Cisco probe back then. Would you like to do some experimenting on how to get the Dude to use less CPU?
I didn’t mean that I only want to discuss this with Lebowski; he just posts a lot about it.
I was out all last week (Call Manager training…), but gsandul is sharp as usual.
I have to agree with gsandul that the Dude doesn’t run the CPU up very high on Cisco devices unless you build the probe poorly, e.g. oid_column("1.3.6") would be bad. I did turn off SNMP access to the BGP routes on the external routers, though.
#sho proc cpu sort | e 0.00
CPU utilization for five seconds: 82%/16%; one minute: 39%; five minutes: 34%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
196 2988263362440177117 0 60.43% 26.83% 23.45% 0 SNMP ENGINE
184 456210516 23618850 19315 2.23% 1.20% 1.02% 0 IP SNMP
191 4625225643723604616 0 1.75% 1.58% 1.45% 0 IP Input
185 761816562165763256 0 0.79% 0.26% 0.18% 0 PDU DISPATCHER
Kinda unhealthy, eh? These should be able to handle a little SNMP traffic.
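On IOS, the second number in the five-second figure ("82%/16%") is the share of CPU spent at interrupt level, so roughly 82 minus 16 = 66% went to scheduled processes, with SNMP ENGINE alone at 60.43%. A minimal Python sketch of pulling those numbers out of the header line (the function name parse_cpu_header is just for illustration):

```python
import re

# First line of IOS "show processes cpu", e.g.
# "CPU utilization for five seconds: 82%/16%; one minute: 39%; five minutes: 34%"
CPU_HEADER = re.compile(
    r"five seconds: (\d+)%/(\d+)%; one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_header(line: str) -> dict:
    """Split the header into total, interrupt and process-level CPU."""
    m = CPU_HEADER.search(line)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total_5s, interrupt_5s, one_min, five_min = (int(g) for g in m.groups())
    return {
        "total_5s": total_5s,
        "interrupt_5s": interrupt_5s,
        "process_5s": total_5s - interrupt_5s,  # scheduled processes, e.g. SNMP ENGINE
        "one_min": one_min,
        "five_min": five_min,
    }


header = "CPU utilization for five seconds: 82%/16%; one minute: 39%; five minutes: 34%"
print(parse_cpu_header(header)["process_5s"])  # 66
```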
CPU goes back to about 3-6% when I disable the Dude server… so back to the reading room for me.
It seems it’s enough to have the Dude server enabled and this device added to the map with its IP and SNMP profile. I will now check all tooltips and refresh intervals and (if everything else fails) rebuild the Dude from scratch.
I monitor CPU, temperature, traffic and memory on all my Cisco routers, and on the VoIP access servers also the number of active calls, DSP resources, ASR (Answer Seizure Ratio) and the number of E1s in the UP state.
None of my probes cause high CPU.
Interesting. I wouldn’t have imagined monitoring call status with the Dude (Cisco must hate you for not using their fine software!)
How are you polling this data in your interface/device labels? Did you just create a function “oid(1.2.3.4.5.6.7…)”, or did you create a function with conditions for each? How often do you refresh the labels?
I monitor CPU and free RAM to catch things like memory leaks and other issues that only become apparent over several weeks, but I also need to monitor interface status and interface descriptions (needed for automatic alarms/notifications to our NOC for customer connections).
BTW: I like the pps in your labels; somehow more useful than bits/second for big trunks!
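Packets per second are usually derived from two samples of a packet counter such as ifHCInUcastPkts (1.3.6.1.2.1.31.1.1.1.7 in IF-MIB) taken a known interval apart. A minimal sketch of that calculation, assuming at most one counter wrap between samples; it is not taken from the Dude's internals.

```python
def counter_rate(prev: int, curr: int, seconds: float, bits: int = 64) -> float:
    """Per-second rate from two samples of an SNMP Counter32/Counter64.

    The modulo arithmetic tolerates at most one wrap between samples;
    more than one wrap cannot be detected from two readings.
    """
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    delta = (curr - prev) % (2 ** bits)
    return delta / seconds


# Hypothetical readings of a packet counter taken 60 s apart:
print(counter_rate(1_000, 61_000, 60))              # 1000.0 packets/s
# A 32-bit counter that wrapped between two samples 10 s apart:
print(counter_rate(2**32 - 100, 400, 10, bits=32))  # 50.0
```

On fast links a 32-bit counter (especially an octet counter) can wrap more than once per polling interval, which is one reason to prefer the 64-bit HC counters.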
Something else I wonder about: roughly how many devices are you guys monitoring, and with how many probes per device? And how many monitored links do you have (with interface Rx/Tx polled, too)?
I am monitoring about 300 devices: 200 switches/routers, ~50 servers and some other devices that are monitored just because we can monitor them and I practiced writing probes for them (like firewalls, laser output levels or the paper level in a Xerox printer).
I monitor 10-20 services per device (depending on type), so about 2650 services in total, plus about 400 interfaces with Rx/Tx polled and graphed
(1-minute polling interval, label refresh set to 30 seconds, 10 seconds for the important links).
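For a sense of scale, those numbers translate into an average probe rate like this. It is simple arithmetic that ignores label refreshes, retries and timeouts; the function name is just for illustration.

```python
def polls_per_second(services: int, interval_s: float) -> float:
    """Average probe rate if each service is polled once per interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return services / interval_s


print(round(polls_per_second(2650, 60), 1))  # 44.2 with 1-minute polling
print(polls_per_second(2650, 10))            # 265.0 if everything were polled every 10 s
```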
I totally love the Dude by now and have not yet been able to reproduce many of its features in Cacti/Nagios, and definitely not its ease of use and accessibility, so I’d like to stick with it.
Some people have suggested that the Dude might not scale well, so I am thinking about setting up distributed polling across several Dude servers, as I have been getting too many timeouts lately (but that might also be SNMP-related, so I will first try the Cisco SNMP view hacks gsandul sent me).
More black magic, and just what I needed! Thanks!
Well, reading the interface description is possible, but getting it dynamically into a notification email is not. I had to create a lot of probes for that, because I could not come up with a better solution than this:
Example ifindex: 10001 (interface FastEthernet1/0/1)
<sys-name>if_10001_status</sys-name>
<code>if(array_size(oid_column("1.3.6.1.2.1.2.2.1.8",10,29)), oid_raw("1.3.6.1.2.1.2.2.1.8.10001", 10, 29),"False")</code>
<descr>polls the status of ifindex 10001 (1 means 'up', 2 means 'down')</descr>
I used a script to create probes for all the interfaces I want auto-discovered. When an alarm is sent, the error status is no longer just “down” but the interface number and description plus the location and name of the device, so the NOC knows what is connected there and can look at it (instead of calling me on my mobile at night), which was the whole purpose of this exercise.
There are probably easier/smarter ways to get the interface description into an email (filling an array, using a database, modifying the XML file, etc.), but I wanted to use as few external tools as possible (none so far).
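The script itself isn't shown, so here is a minimal Python sketch of what such a generator could look like. It assumes you only need the three elements from the example above repeated per ifIndex; the helper name probe_fragment and the ifIndex list are hypothetical, and the surrounding structure of a Dude probe file is not reproduced.

```python
from xml.sax.saxutils import escape

IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"  # ifOperStatus column of ifTable


def probe_fragment(ifindex: int) -> str:
    """Return the sys-name/code/descr elements from the example for one ifIndex."""
    code = (
        f'if(array_size(oid_column("{IF_OPER_STATUS}",10,29)), '
        f'oid_raw("{IF_OPER_STATUS}.{ifindex}", 10, 29),"False")'
    )
    descr = f"polls the status of ifindex {ifindex} (1 means 'up', 2 means 'down')"
    # escape() only touches &, < and >, so the quotes in the Dude code stay as-is.
    return (
        f"<sys-name>if_{ifindex}_status</sys-name>\n"
        f"<code>{escape(code)}</code>\n"
        f"<descr>{escape(descr)}</descr>"
    )


if __name__ == "__main__":
    for idx in (10001, 10002):  # hypothetical list of ifIndexes
        print(probe_fragment(idx))
```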
Test patient: Cisco 7606 router/switch with SUP720 supervisor
Created a new community for testing purposes:
snmp-server community ********* view TheDude RO 6
I am allowing only the CPU and memory OIDs (for testing):
snmp-server view TheDude lsystem.57 included
snmp-server view TheDude ciscoMemoryPoolEntry.5 included
snmp-server view TheDude ciscoMemoryPoolEntry.6 included
The ACL allows only one IP, and the Dude generates this in less than 5 minutes:
Standard IP access list 6
10 permit 10.16.147.0, wildcard bits 0.0.0.255 (24614 matches)
20 deny any log
And the CPU skyrockets:
DIST76-01.AMS4(config)#do sho proc cpu sort | i SNMP ENGINE
196 3601375082453772088 0 86.62% 80.99% 46.39% 0 SNMP ENGINE
There is one more test I have to run: start from scratch and try it all again, but I am starting to lose confidence…
Am I doing anything wrong? Why is the Dude polling so much?
I found out that when you open the device and go to the SNMP tab, it generates about 20000 extra packets on the ACL. Is this a feature? Half of the packets are UDP, the other half TCP.
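For reference, Cisco wildcard bits work the opposite way from a netmask: a 1 bit means "don't care". So entry 10 above permits any address in 10.16.147.0/24, and 24614 matches in under 5 minutes works out to at least about 82 packets per second. A small Python sketch of both checks (function names are illustrative):

```python
import ipaddress


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """Cisco ACL matching: wildcard bits set to 1 are "don't care",
    every other bit of addr must equal the same bit of base."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


def matches_per_second(matches: int, seconds: float) -> float:
    """Average hit rate of an ACL entry over an observation window."""
    return matches / seconds


print(wildcard_match("10.16.147.42", "10.16.147.0", "0.0.0.255"))  # True
print(wildcard_match("10.16.148.1", "10.16.147.0", "0.0.0.255"))   # False
print(round(matches_per_second(24614, 300), 1))                    # 82.0 (lower bound)
```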
The Dude is polling by default.
Yes, you are doing something wrong in the Cisco config.
You only included some subtrees, but you also have to exclude all the others; as it stands, the whole SNMP tree is included in your config.
Best practice is to include everything by default and exclude whatever you do not want to be accessible.
In your case the config should be:
snmp-server view TheDude iso included
snmp-server view TheDude iproute excluded
snmp-server community ********* view TheDude ro 6
First, include the root (and all its subtrees).
Second, exclude the routing information.
That is all.
The Dude will try to get your routing information, Cisco will return nothing for that SNMP request, and so The Dude will not send any more SNMP packets for it.
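The include/exclude logic can be sketched as a longest-match lookup: among the view entries whose subtree is a prefix of the requested OID, the most specific one decides. This is a simplified model in the spirit of RFC 3415 (VACM) that ignores subtree masks, and it assumes that "iproute" refers to ipRouteTable (ip.21, 1.3.6.1.2.1.4.21). With iso included, every OID is in the view unless a more specific excluded entry covers it.

```python
# Simplified SNMP view lookup in the spirit of RFC 3415 (VACM):
# the longest view subtree that is a prefix of the OID decides,
# and an OID matching no entry is not in the view. Masks are ignored.

def parse_oid(oid: str) -> tuple[int, ...]:
    # Compare OIDs component by component, so that 1.3.6.1.2.1.4.2
    # is NOT treated as a prefix of 1.3.6.1.2.1.4.21.
    return tuple(int(part) for part in oid.strip(".").split("."))


def in_view(oid: str, view: list[tuple[str, bool]]) -> bool:
    target = parse_oid(oid)
    best_len, best_included = -1, False
    for subtree, included in view:
        prefix = parse_oid(subtree)
        if target[: len(prefix)] == prefix and len(prefix) > best_len:
            best_len, best_included = len(prefix), included
    return best_included


# "iso included" + route table excluded, in numeric form
# (assumes "iproute" means ipRouteTable = ip.21 = 1.3.6.1.2.1.4.21).
THE_DUDE_VIEW = [("1", True), ("1.3.6.1.2.1.4.21", False)]

print(in_view("1.3.6.1.2.1.1.1.0", THE_DUDE_VIEW))             # True  (sysDescr.0)
print(in_view("1.3.6.1.2.1.4.21.1.1.10.0.0.0", THE_DUDE_VIEW))  # False (a route entry)
print(in_view("1.3.6.1.2.1.4.2.0", THE_DUDE_VIEW))              # True  (ipDefaultTTL.0)
```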
OK, I did it “textbook style”, applying the SNMP views:
access-list 6 permit 10.xxx.yyy.zzz 0.0.0.255
access-list 6 permit x.x.x.x 0.0.1.255
access-list 6 deny any log
access-list 6 remark SNMP view for The Dude - snmp polling system
snmp-server view TheDude iso included
snmp-server view TheDude at excluded
snmp-server view TheDude snmpUsmMIB excluded
snmp-server view TheDude snmpVacmMIB excluded
snmp-server view TheDude snmpCommunityMIB excluded
snmp-server view TheDude ip.21 excluded
snmp-server view TheDude ip.22 excluded
snmp-server community ******** view TheDude RO 6
And guess what:
CPU usage dropped 20-40% all over the network, the gaps in the graphs have disappeared and there are no more false alarms. The Dude now accounts for 2-3% CPU in total (instead of 10%).
You were right, in a way, about the routing table / ARP table: the devices that were hit the hardest were either aggregation switches for a lot of customers or routers carrying the internet routing table.
I can now poll every 10 seconds with no outages, and do entire snmpwalks to find interesting OIDs with only the occasional timeout, just like many years ago when the network was small and easy to maintain.
I will play with remote polling now once I can get my hands on some dedicated servers.
My Cisco CPUs have never been that high. Even with the Dude polling every 30 seconds, I have routers with CPU so low that 0% is returned for the average and the Dude raises a false positive on it.
I have been watching this thread, and my only idea as to why you’re seeing such high CPU is that maybe you modified a global device label, but I like your solution and would consider it if I needed it.
Hey guys, the CPU usage of my RB1000U increases when it is monitored by the Dude. The profiler shows the management process at 20-28%. I’m monitoring only CPU usage, active PPPoE users and 2 queue rules, using OIDs. How can I disable the other SNMP sources (like interfaces, routes, IP addresses…) on my RB1000?
I have the same issue: CPU is through the roof due to SNMP. I don’t think we can limit the viewable OIDs like on a Cisco device…
CPU usage drops from 100% to 40-50% when I disable SNMP on the router. The router I’m monitoring is an RB1200 running RouterOS 5.5, monitored by Dude 4.0beta3.
I have 450+ PPPoE interfaces on this device, and each PPPoE interface has its own routes and queues.