Probe Thread

I would like to start a thread for custom probe examples. My hope is not only to further my own knowledge but also to help others with this great software.


Here are a few I have made.

Cisco CPU
Type: Function
Available: if(oid("1.3.6.1.4.1.9.2.1.58.0")>0, 1, -1)
Error: ""
Value: oid("1.3.6.1.4.1.9.2.1.58.0")
Unit: % of cpu load

APC PDU LOAD
Type: Function
Available: if(oid("1.3.6.1.4.1.318.1.1.12.2.3.1.1.2.1")>0, 1, -1)
Error: if(oid("1.3.6.1.4.1.318.1.1.12.2.3.1.1.2.1")>0, "", "No Load")
Value: oid("1.3.6.1.4.1.318.1.1.12.2.3.1.1.2.1")
Unit: Load amps in decimal

Check if a certain program is running on a Windows system (‘OUTLOOK.EXE’ in this example):

Type: function
Available: if(array_find(oid_column("1.3.6.1.2.1.25.4.2.1.2"),"OUTLOOK.EXE")>0, 1, -1)
Error: if(array_find(oid_column("1.3.6.1.2.1.25.4.2.1.2"),"OUTLOOK.EXE")>0, "", "OUTLOOK.EXE not detected by SNMP probe")
Value: 1 (any constant will do; it is purely for charting, and I return 1 when the program is running)
Unit: running (or whatever you want to call the value above)
Rate: none

This of course requires that the SNMP agent is running and configured properly on the Windows system.
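
For anyone unsure what the two fields are doing, here is a rough Python sketch of the same logic. It is not Dude code: it assumes the process-name column has already been fetched into a plain list, and the function name and sample data are made up for illustration.

```python
def process_probe(running_names, wanted):
    """Mimic the probe's Available and Error fields for one process name."""
    found = wanted in running_names
    available = 1 if found else -1          # same 1 / -1 convention as the probe
    error = "" if found else f"{wanted} not detected by SNMP probe"
    return available, error


# Hypothetical contents of the process-name column on a Windows host.
names = ["System Idle Process", "svchost.exe", "OUTLOOK.EXE"]
print(process_probe(names, "OUTLOOK.EXE"))
print(process_probe(names, "sqlservr.exe"))
```

The comparison here is case-sensitive, so the name has to be written the way the agent reports it.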

Add this to a device’s notes (right-click the device, Notes):

[oid("1.3.6.1.2.1.1.1.0")]

Then the device’s popup will list a description of the system.

For example, for Windows systems it will show the hardware and software platform, for Cisco devices the hardware and firmware revisions, etc.

The standard CPU load figure (“34%”) is the average of all available CPUs, but if you add this to a device’s ‘Appearance’:

Load on [array_size(oid_column("iso.org.dod.internet.mgmt.mib-2.host.hrDevice.hrProcessorTable.hrProcessorEntry.hrProcessorLoad"))] CPU('s): [oid_column("iso.org.dod.internet.mgmt.mib-2.host.hrDevice.hrProcessorTable.hrProcessorEntry.hrProcessorLoad")]

the device label will show the number of CPUs in the system and the load on each separate CPU (for example: ‘Load on 4 CPU('s): 12, 15, 46, 2’).

(Only tested for Windows target systems…)

This is my default label
Thanks for the addin guys


Device [Device.Name] ([Device.Type])
IP: [Device.AddressesCommaList]
Services ([Device.ServicesCount]):
Up: [Device.ServicesUp]
Unstable: [Device.ServicesUnstable]
Down: [Device.ServicesDown]
Acked: [Device.ServicesAcked]
Unknown: [Device.ServicesUnknown]
Dell Tag [oid("iso.3.6.1.4.1.674.10892.1.300.10.1.11.1")]
[oid("1.3.6.1.2.1.1.1.0")]
[snmp_name][snmp_description][snmp_uptime][snmp_contact][snmp_location]
Load on [array_size(oid_column("iso.org.dod.internet.mgmt.mib-2.host.hrDevice.hrProcessorTable.hrProcessorEntry.hrProcessorLoad"))] CPU('s): [oid_column("iso.org.dod.internet.mgmt.mib-2.host.hrDevice.hrProcessorTable.hrProcessorEntry.hrProcessorLoad")]
Notes:
[Device.NotesColumn]

Dell temperature alert if it gets over 95 (the 350 threshold suggests the reading is in tenths of a degree Celsius, so 350 = 35.0 °C = 95 °F)

Available: if(oid("iso.3.6.1.4.1.674.10892.1.700.20.1.6.1.3")>0, 1, -1)
Error: if(oid("iso.3.6.1.4.1.674.10892.1.700.20.1.6.1.3")<350, "", "Over Temp 95")
Value: oid("iso.3.6.1.4.1.674.10892.1.700.20.1.6.1.3")
Unit: C
Rate: none
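
To make the 350 threshold less mysterious, here is a small Python sketch of the unit arithmetic. It assumes the raw reading is in tenths of a degree Celsius, which is an inference from the threshold and not something stated in the probe; the function names are made up for illustration.

```python
def tenths_c_to_f(raw):
    """Convert a raw reading in tenths of a degree Celsius to Fahrenheit."""
    celsius = raw / 10.0
    return celsius * 9.0 / 5.0 + 32.0


def temp_error(raw, limit_raw=350):
    """Same rule as the probe's Error field: empty string means no error."""
    return "" if raw < limit_raw else "Over Temp 95"


print(tenths_c_to_f(350))   # 350 tenths = 35.0 C = 95.0 F
print(repr(temp_error(280)))
print(repr(temp_error(360)))
```

If you prefer a threshold in Fahrenheit, convert it back to tenths of a degree Celsius first and put that number in the Error expression.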

Warning when disk usage goes over 89%

Type: function
Available: if(hdd_usage()>0, 1, -1)
Error: if(hdd_usage()<90, "", "Disk usage > 89%")
Value: hdd_usage()
Unit: %
Rate: none

Note: this probe uses the built-in hdd_usage function, so for devices with multiple hard disks it looks at the average disk space usage. Of course, the above example is easily adapted for a specific hard drive: just replace ‘hdd_usage()’ with the appropriate oid("xxxxx") call.

First of all, I like this tool very much.

I managed to plot a reachability graph. 100% reachability=no packet loss. “reachability (%)” = 100 - “packet loss (%)”

1st) I created a function:
Name: packet_loss_test
Desc: number of replied pings from 10 ping requests (0-10)
Code:
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 ) +
if( array_element(ping(device_property("FirstAddress")) , 0)<0 , 0 , 1 )

2nd) I created the probe:
Name: reachability
Type: function
Available: ping(device_property("FirstAddress")) >= 0
Error: ""
Value: packet_loss_test()*10
Unit: %

If you want finer values, you can sum 20 pings instead of 10 and change Value in the probe to
packet_loss_test()*5
but the probe will be more intrusive.
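
The arithmetic behind the probe can be sketched in Python as follows. This is only an illustration of "count the replies, scale to a percentage": the function name and sample values are made up, and it assumes, as the Dude function above does, that a negative result means no reply.

```python
def reachability_percent(ping_results):
    """ping_results: one round-trip time per ping; a negative value means no reply."""
    if not ping_results:
        raise ValueError("need at least one ping result")
    replied = sum(1 for rtt in ping_results if rtt >= 0)
    return 100.0 * replied / len(ping_results)


# Ten hypothetical pings, two of them lost.
sample = [12, 15, -1, 11, 13, -1, 10, 12, 14, 11]
print(reachability_percent(sample))   # 8 of 10 answered
```

Because the result is scaled by the number of pings, the same function covers the 10-ping and 20-ping variants without changing the multiplier.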
------------------- Post edit -------------------------------
I have noticed that the Dude only performed two of the ten pings I wrote in the function. Options are:

  1. Change the function to make only two pings, and change the probe Value to packet_loss_test()*50.
  2. Execute an external ping. I’m working on this.
  3. Enhance the ping function with a parameter for the number of packets to send, and have it return the number of answered packets.

------------------- Post edit, even later -------------------------------
Now testing Dude 4beta3. The Dude stores the answer of the first ping somewhere, so it only sends one ping, and this probe is almost useless.
The only interesting result is if you understand it as a pulse-code-modulated signal.
------------------- Post edit, even later than the previous one -------------------------------
Not so useless. I still have my graphs.
(attachment: Clipboard01.gif)
Here you can see three graphs.
The first shows, mostly in yellow, the reachability of four of my WiFi computers.
The second shows several computers on the Internet.
The third shows a place that suffered a communications problem today.
It is not perfect, especially around the time the chart switches from raw data to the 10-minute summary, as you can see at the right side of the third chart.

Regards

This is almost a copy and paste of the http probe.

Probe definition:
Name: Google through proxy
Type: TCP
Port: 8080 (this may change at your company)
Connect Only UNCHECKED
First Receive, Then Send UNCHECKED
Send: GET http://www.google.com/ HTTP/1.0\r\n\r\n
Receive: HTTP/1.1 200 OK
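
If you want to try the same exchange outside the Dude before building the probe, a minimal Python sketch could look like this. The function names are made up, the proxy host is whatever yours is, and the sketch assumes the reply only needs to start with the expected status line.

```python
import socket


def build_request(url):
    """Build the bytes that the probe's Send field describes."""
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("ascii")


def response_matches(data, expected=b"HTTP/1.1 200 OK"):
    """True if the reply starts with the expected status line (assumed prefix match)."""
    return data.startswith(expected)


def check_via_proxy(proxy_host, proxy_port=8080, url="http://www.google.com/", timeout=5.0):
    """Connect to the proxy, send the request, and test the first bytes received."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_request(url))
        data = sock.recv(1024)
    return response_matches(data)
```

Calling check_via_proxy("your.proxy.host") returns True or False; a proxy that demands authentication would answer with a different status line and the check would fail, just as the probe would.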

Some other probes.
Bluecoat is an HTTP cache proxy. I have probes to measure the CPU of a Solaris host, and the CPU and pages/sec of the Bluecoat.

Function definition ---------------------------------
Name: cpu_bluecoat_usage
Desc: cpu usage for blue coat device
Code: oid("1.3.6.1.4.1.3417.2.4.1.1.1.4.1")

Probe definition ------------------------------------
Name: cpu_bluecoat
Type: Function
Available: cpu_bluecoat_usage()
Error: if(cpu_bluecoat_usage(), "", "down")
Value: cpu_bluecoat_usage()
Unit: %




Function definition ---------------------------------
Name: cpu_solaris_idle_ticks
Desc: timeticks in idle mode for solaris host
Code: oid("1.3.6.1.4.1.42.3.13.4.0")

Function definition ---------------------------------
Name: cpu_solaris_usage_ticks
Desc: timeticks not in idle mode for solaris host
Code: oid("1.3.6.1.4.1.42.3.13.1.0") +
oid("1.3.6.1.4.1.42.3.13.2.0") +
oid("1.3.6.1.4.1.42.3.13.3.0")

Function definition ---------------------------------
Name: cpu_solaris_total_ticks
Desc: cpu timeticks for solaris host
Code: oid("1.3.6.1.4.1.42.3.13.1.0") +
oid("1.3.6.1.4.1.42.3.13.2.0") +
oid("1.3.6.1.4.1.42.3.13.3.0") +
oid("1.3.6.1.4.1.42.3.13.4.0")

Function definition ---------------------------------
Name: cpu_solaris_usage
Desc: cpu usage for solaris host
Code: 100 *
rate(cpu_solaris_usage_ticks()) /
rate(cpu_solaris_total_ticks())

Probe definition ------------------------------------
Name: cpu_solaris
Type: Function
Available: cpu_solaris_idle_ticks()
Error: if(cpu_solaris_idle_ticks(), "", "down")
Value: cpu_solaris_usage()
Unit: %
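
The idea behind cpu_solaris_usage is that the tick counters only ever grow, so the load is the share of non-idle ticks among all ticks added between two polls. A Python sketch of that calculation, with made-up names and sample numbers, and assuming both counters are read at the same two moments:

```python
def cpu_usage_percent(prev_usage, prev_total, curr_usage, curr_total):
    """Percentage of non-idle ticks among all ticks elapsed between two samples."""
    delta_total = curr_total - prev_total
    if delta_total <= 0:
        return None  # no ticks elapsed (or a counter reset): nothing to report
    return 100.0 * (curr_usage - prev_usage) / delta_total


# Hypothetical samples: 300 busy ticks out of 1000 elapsed ticks.
print(cpu_usage_percent(1000, 5000, 1300, 6000))
```

Since both rates cover the same interval, the elapsed time cancels out of the division, which is why the Dude function can simply divide one rate() by the other.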



Function definition ---------------------------------
Name: http_requests_bluecoat
Desc: http requests for a bluecoat device
Code: oid("1.3.6.1.3.25.17.3.2.1.1.0")

Function definition ---------------------------------
Name: http_rate_bluecoat
Desc: http request rate for a bluecoat device
Code: rate( oid("1.3.6.1.3.25.17.3.2.1.1.0") )

Probe definition ------------------------------------
Name: http_pages_bluecoat
Type: Function
Available: http_requests_bluecoat()
Error: if(http_requests_bluecoat(), "", "down")
Value: http_rate_bluecoat()
Unit: pages/sec
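
For readers new to counters: the request OID is a running total, so pages/sec comes from the difference between two polls divided by the time between them. A Python sketch of that step, assuming this is roughly what rate() does (the function name and numbers are made up):

```python
def per_second_rate(prev_count, curr_count, seconds):
    """Average increase per second of a growing counter between two polls."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (curr_count - prev_count) / seconds


# Hypothetical polls 60 seconds apart: 600 new requests.
print(per_second_rate(12000, 12600, 60))
```

The sketch ignores counter wrap-around and device restarts, which would show up as a negative value.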

Have I said I like this tool?
Regards

I have the following probes

Check CPU, warning @ 80% CPU usage
Name: CPU usage < 80%
Type: Function
Available: if(cpu_usage()>0, 1, -1)
Error: if(cpu_usage()<80, "", "CPU usage > 79%")
Value: cpu_usage()
Unit: %
Rate: none

Check Memory, warning @ 80 % Memory usage
Name: Memory usage < 80%
Type: Function
Available: if(mem_usage()>0, 1, -1)
Error: if(mem_usage()<80, "", "Memory usage > 79%")
Value: mem_usage()
Unit: %
Rate: none

Check Virtual Memory, warning @ 80% Virtual Memory usage
Name: Virtual Memory usage < 80%
Type: Function
Available: if(virtual_mem_usage()>0, 1, -1)
Error: if(virtual_mem_usage()<80, "", "Virtual Memory usage > 79%")
Value: virtual_mem_usage()
Unit: %
Rate: none
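
All three probes follow the same pattern, which can be sketched in Python like this. It is an illustration only, with a made-up function name; it copies the comparisons exactly as written in the probes.

```python
def threshold_probe(value, limit, label):
    """Mimic the Available and Error fields of the three usage probes."""
    available = 1 if value > 0 else -1
    error = "" if value < limit else f"{label} usage > {limit - 1}%"
    return available, error


print(threshold_probe(42, 80, "CPU"))
print(threshold_probe(85, 80, "Memory"))
print(threshold_probe(0, 80, "CPU"))
```

One thing the sketch makes visible: with the `> 0` test, a reading of exactly 0 comes back as not available. For a CPU that is fully idle at poll time that may not be what you want, so a `>= 0` test could be worth trying.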

thanks to winkelman

Are there any probes to check the disk status?

What do you mean? Status like ‘up’ or ‘down’ (perhaps for external disks)? Or status like ‘80% full’?

With a RAID 5, I want to check whether all disks are OK,
so that when one of the disks goes down I get a notification.

Is it possible to use parameters in user-created functions, as in the built-in ones?
Right now I get the error message “too many parameters for functioname”.
How do I read the parameters? $1, $2?

By itself that is not possible. The OS (and thus the standard SNMP agent) just sees a RAID-set as a ‘single disk’. It wouldn’t know the status of any of its sub-parts. However, I do know that for example IBM ServeRAID adapters allow you to install the IBM ServeRAID Manager program, which optionally includes an additional SNMP agent. That makes RAID-info available through SNMP and thus to the Dude.

Perhaps your brand of RAID adapter also has such management software available.

So it’s not possible to check the status of an individual disk?
Because when I do an SNMP walk I see the physical disks’ OIDs.
So it must be possible to make a probe that checks that the disk is OK and sends a notification when the status is failed?

I have a question about this probe.
If you want to check something else, like sqlservr.exe,
do you only need to replace OUTLOOK.EXE with sqlservr.exe, or do you need another OID?

I would be interested in a probe/function that would allow detection of DHCP servers on the wire. It would prove invaluable in tracking down the odd rogue that pops up from time to time when someone puts a misconfigured router on the network. Can anyone assist in this?

thanks!

[edit] OK, I am still trying to work this out and would like to know if I am on the right track. If I create a probe with the following setup, my thought is that I would get an alert if another DHCP server is detected.

name: dhcp probe // just a name
type: snmp // may or may not be the right way to go
oid: 1.3.6.1.4.1.5.1.1.55.1.1.22 // DhcpSrvDomainServer IpAddress used to match against known dhcp server address
oid type: IP Address
compare method: != (not equal) // this will provide my comparison
ip address: xxx.xxx.xxx.xxx // ip addy of known dhcp server

So, what I am thinking is that something like this should detect a rogue on the wire, and if I have notifications set up for this, I should get a near-immediate alert when it is detected. Does anyone have any input or suggestions? I’m open to them.

cheers

Bump, and a request to make this thread a sticky. I’ll be more than happy to share any probes I get working if the rest of the community is willing. :smiley:

double bump and I’ll contribute my only custom probe. =( wish i was better at this stuff.

Name: wirlessID
Type: snmp
OID:iso.anonymous#62.anonymous#63.ieee802dot11.dot11smt.dot11StationConfigTable.dot11StationConfigEntry.dot11DesiredSSID.1
OID type: octet string
compare method: ==
String Value: (your SSID)

I use this to take auto-discovered wireless devices and auto-ID them into the Wirless APs device type.