Disk Probe Status always shows OK even when device is down

I have set up a disk probe to monitor C: drive space on all servers. The problem is that even if all the servers are down, the disk probe status always shows OK, so the device color doesn’t change to red; it remains orange. Why does the disk probe show OK even when the device is down?
How can I overcome this issue?
disk-probe.png
low disk space in c PROBE.png

It most likely has to do with the logic of your probe's Error statement. If the device is down, the OID query cannot retrieve any value. This is evaluated as 0 and may make your error statement come out as TRUE. I always write my probes with an if inside an if statement: the outer if checks whether the device is contactable and the inner if does the error check. Others guard against timeouts by adding +1 to numeric values and then subtracting it later on.
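To make the if-inside-an-if idea concrete, here is a rough Python model of the logic. It is not Dude syntax. It assumes, as described above, that a failed query counts as 0; the function names and the "device unreachable" message are made up for illustration.

```python
from typing import Optional


def percent_used(used: Optional[int], size: Optional[int]) -> float:
    """Model a probe read where a failed query counts as 0 (assumption)."""
    used = used or 0
    size = size or 0
    # Assumption: a missing or zero size yields 0 instead of an error.
    return (used / size) * 100 if size else 0.0


def naive_error(used: Optional[int], size: Optional[int], limit: int = 85) -> str:
    """Single check: an empty string means 'no error'."""
    pct = percent_used(used, size)
    return "" if pct < limit else f"Disk used space is at {pct:.0f}%"


def guarded_error(used: Optional[int], size: Optional[int], limit: int = 85) -> str:
    """Outer check for contact, inner check for the threshold."""
    if not size:
        # Outer if: no size value, so the device did not answer.
        return "device unreachable"
    pct = percent_used(used, size)
    # Inner if: the real disk space check.
    return "" if pct < limit else f"Disk used space is at {pct:.0f}%"
```

With no reply from the device, naive_error(None, None) returns an empty string, which reads as "OK", while guarded_error(None, None) returns an error message.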

So what is the solution?

My probe is as below:

Name: Low Disk Space in C:
Type : Function

Available: 
if((oid("1.3.6.1.2.1.25.2.3.1.6.1")/oid("1.3.6.1.2.1.25.2.3.1.5.1"))*100>0, 1, -1)

Error: 
if((oid("1.3.6.1.2.1.25.2.3.1.6.1")/oid("1.3.6.1.2.1.25.2.3.1.5.1"))*100<85, "", concatenate("Disk C:  Used space is currently at (", string_substring (oid("1.3.6.1.2.1.25.2.3.1.6.1")/oid("1.3.6.1.2.1.25.2.3.1.5.1"))*100 , ") % "))

Value:
hdd_usage()

Unit: %

[edit - as geoffsmith31 already said an if inside an if gets you clearer results]

I like to use a function then call the function from the probe. This way you can determine if the device is down or if the device is in error.

Make a function called disk01test
if(array_size(oid_column("1.3.6.1.2.1.25.2.3.1.6",10,5)),round((oid("1.3.6.1.2.1.25.2.3.1.6.1",10,5)/oid("1.3.6.1.2.1.25.2.3.1.5.1",10,5))*100),"False")
Repeat the above for as many disks as you have, incrementing the names and oid values by one. Note Drive C is normally disk02.
disk02test
if(array_size(oid_column("1.3.6.1.2.1.25.2.3.1.6",10,5)),round((oid("1.3.6.1.2.1.25.2.3.1.6.2",10,5)/oid("1.3.6.1.2.1.25.2.3.1.5.2",10,5))*100),"False")

Create a probe DriveC
Available disk02test()<>"False"
Error if(disk02test()<>"False",if(disk02test() < 80, "", concatenate("Warning: Drive C = ", disk02test(), "%")), "Failed to read Drive C the server might be down")
Value disk02test()
Unit %
disk02test.png
DriveC.png

lebowski you are Legend of DUDE :sunglasses:
Thank you it worked perfectly.

However, one minor issue:
The OID for the C: or D: drive is not the same on all of my servers. On a few servers I get C: drive data on .1, and on others .2 works.
For example:
on server1, the C: drive has OID = 6.1
on server2, the C: drive has OID = 6.2

Why is there a difference in the OID for the C: or D: drive on various servers?
For this reason, I have to create 2 functions (with .1 and .2) and 2 probes to monitor the C: drive on 2 servers.

Maybe one server has a floppy drive? The lack of variables makes this a real problem. Gsandul is really the one who deserves the credit for the probe design. You will just have to deal with the disks manually.
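To see which index a drive has on a given server, you can walk the description column of the same table (1.3.6.1.2.1.25.2.3.1.3, hrStorageDescr) and look for the drive letter. Below is a small Python sketch of that lookup. It is not something The Dude runs, and the sample descriptions are invented for illustration.

```python
from typing import Dict, Optional

# Columns of the storage table used in this thread.
HR_STORAGE_DESCR = "1.3.6.1.2.1.25.2.3.1.3"
HR_STORAGE_SIZE = "1.3.6.1.2.1.25.2.3.1.5"
HR_STORAGE_USED = "1.3.6.1.2.1.25.2.3.1.6"


def find_drive_index(descr_by_index: Dict[int, str], drive: str) -> Optional[int]:
    """Return the table index whose description starts with the drive letter."""
    prefix = drive.upper()
    for index in sorted(descr_by_index):
        if descr_by_index[index].upper().startswith(prefix):
            return index
    return None  # the drive is not present on this server


# Hypothetical walk results: server2 has a floppy drive that takes index 1.
server1 = {1: "C:\\ Label:System", 2: "D:\\ Label:Data"}
server2 = {1: "A:\\", 2: "C:\\ Label:System", 3: "D:\\ Label:Data"}

for name, table in (("server1", server1), ("server2", server2)):
    idx = find_drive_index(table, "C:")
    print(name, "C: used OID =", f"{HR_STORAGE_USED}.{idx}")
```

The index found this way tells you whether a server needs the .1 or the .2 version of the function.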

This is the drive space probe that I use. If the server is down the return value is “no response” which is a keyword in the alerting script that I use. All of my probes have this feature and it helps prevent “alert spam” and false alarms when timeouts happen.

I am not promoting this as “better” than what Lebowski has posted, just a different way of dealing with the same issue.

if(oid("1.3.6.1.2.1.25.2.3.1.5.2")>0,if((oid("1.3.6.1.2.1.25.2.3.1.6.2")/oid("1.3.6.1.2.1.25.2.3.1.5.2"))*100<90, "", concatenate("Disk 2 (", string_substring(oid("1.3.6.1.2.1.25.2.3.1.3.2"),0,2), ") used space (", string_substring (oid("1.3.6.1.2.1.25.2.3.1.6.2")/oid("1.3.6.1.2.1.25.2.3.1.5.2"))*100 , ") % ")),"no response")

The if(oid("1.3.6.1.2.1.25.2.3.1.5.2")>0 check determines whether The Dude can read the size of the disk. If there is a timeout (or the disk does not exist), the return value is “no response”. If a non-zero value is found, the real disk probe is evaluated.

Hmm, thanks for the clarification. Yes, I have a floppy drive in only a few servers, and for those servers I have to create another function/probe with a different OID.


geoffsmith31, lebowski, I really appreciate your help. You people are really great at helping the community without any benefit to yourselves. Thank you, and please keep up the good work :slight_smile:

I use snmp-informant for my disk space probes: http://snmp-informant.com - just install the free version.

You can create a probe for each disk, then when you run a discover, it will only add the probes for disks that exist in the system.

Here is an example of the disk space probe: http://i.imgur.com/mgJEa.jpg

To add additional drives, just increment the second from last number, e.g. d:\ drive = iso.org.dod.internet.private.enterprises.wtcs.informant.standard.logicalDiskTable.logicalDiskEntry.lDiskPercentFreeSpace.2.68.58
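The trailing numbers in that OID look like the drive name encoded as an SNMP string index: a length followed by the ASCII code of each character, so 2.68.58 reads as the 2-character string "D:". That reading is inferred from the example above, not taken from the snmp-informant documentation. A short Python sketch of the encoding, with a made-up helper name:

```python
def drive_index_suffix(drive: str) -> str:
    """Encode a drive name such as 'D:' as length + ASCII codes, dot-separated."""
    codes = [str(ord(ch)) for ch in drive.upper()]
    return ".".join([str(len(codes))] + codes)


for drive in ("C:", "D:", "E:"):
    print(drive, "->", drive_index_suffix(drive))
```

Under that assumption, the C: drive would end in 2.67.58 and the E: drive in 2.69.58.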

Hello emmdeeess!

I would like to try the WTCS SNMP information too, but my Dude won’t import the MIB file :frowning:

How did you install the WTCS MIB?

I tried both
#1 http://www.wtcs.org/informant/files/MIBS/Advanced/SMIv1/WTCS.MIB.TXT
#2 http://www.wtcs.org/informant/Files/MIBS/Advanced/SMIv2/WTCS.MIB.TXT
source: http://www.wtcs.org/informant/mibs.htm

The import of the MIB: Informant-Standard (freeware) was no problem.

See the screenshots; maybe you have an idea? I use The Dude 4.0beta3.
2012-12-27 14_04_27-Transfers.jpg
2012-12-27 14_03_58-dex@172.20.200.73 - The Dude 4.0beta3.jpg

bump

@vikdex

You probably don’t need the MIB to access the OIDs directly. Try an SNMP walk of your device and see if the table gets populated. The Dude doesn’t like your file, so either edit it and try to determine what is wrong with it, or find one that follows the standard.

HTH,
Lebowski