
netwatch is not working on 2.9.17

Posted: Sat Mar 11, 2006 1:17 pm
by gottin
I had hoped so much that version 2.9.17 would be a stable one; however, it looks like it's not. Last night I experienced lots of problems after upgrading our core routers to 2.9.17. For example, after I tried to set up OSPF, one of the routers got loaded up to 100% CPU (3.3 GHz Intel). There were also problems with RIP: all RIP routes had a metric of 16. There was no such problem with version 2.8.26/2.8.28. Well, I'm not 100% certain about my OSPF config, and RIP is working now (after some reboots) :(. I think there are still problems with these routing protocols.

Moreover, netwatch is definitely not working. There are hosts that are up and pingable, yet their status is down. If I disable and re-enable a host, its status is reported correctly the first time. After the configured interval elapses, the host is again reported as down (it is really up). This problem does not affect only one host.

Is there someone who does not have any problems with 2.9.17?

Posted: Sun Mar 12, 2006 2:33 am
by meister
I upgraded our spare from 2.8.28 to 2.9.17. Everything seems to be working, and I have definitely tested netwatch. Can you ping the netwatch host manually from the box?

On another box, OSPF is running trouble-free on 2.9.17, whereas 2.9.11 was giving a little grief.

Posted: Sun Mar 12, 2006 4:03 pm
by gottin
Yes, I can ping the netwatch host manually.

I'll read the OSPF documentation again, more carefully than before, and try to set it up again. One suspicious thing about my current OSPF config is that I have two networks on the interfaces that exchange OSPF messages with each other. I have and on my eth interface, for example, and I'm adding and networks in the OSPF config. Could this be a misconfiguration?

Posted: Sun Mar 12, 2006 5:22 pm
by mag
Netwatch is working for me (only 4 MT boxes with 2.9.17 so far).

Posted: Sun Mar 12, 2006 8:50 pm
by gottin
Netwatch is definitely not working for me. What interval and timeout are you using? I've tested with:
timeout / interval
2s / 30s
2s / 1m
1s / 30s
1s / 1m

I have almost 100 monitored hosts.

Posted: Sun Mar 12, 2006 9:16 pm
by mag
One example:
/ tool netwatch 
add host=x.y.8.158 timeout=1s interval=1m up-script="" down-script="" comment="rt-wil-hs041" disabled=no 
add host=x.y.8.160 timeout=1s interval=1m up-script="" down-script="" comment="rt-wdn-hm014" disabled=no 
add host= timeout=1s interval=5m up-script="" down-script="" comment="VPN Weidenau" disabled=no 
add host= timeout=1s interval=5m up-script="" down-script="" comment="VPN Wilnsdorf" disabled=no 
But with 100 devices I would go for a real network-management system. I use netwatch only for a few devices, mostly without scripts.

Posted: Mon Mar 13, 2006 9:47 am
by macgaiver
Netwatch is working for me too.

Did you upgrade from 2.8.xx? If so, maybe try recreating the setup manually from the beginning on 2.9.

Posted: Mon Mar 13, 2006 10:09 am
by gottin
I upgraded from 2.9.12, in which everything was working.

Now I will test whether it works with fewer than 60 monitored hosts. There are about 90 hosts with an interval of 2m20s and a timeout of 1s, and it is still not working properly. Some hosts are marked as down while at the same time they are fully pingable.

Posted: Tue Mar 14, 2006 3:03 am
by gottin
I tried a second box with 2.9.17 installed and the same netwatch rules, and the same problems occurred.

Posted: Tue Mar 14, 2006 8:20 am
by macgaiver
So how many hosts can it handle now?

Posted: Tue Mar 14, 2006 11:06 am
by gottin
Well, it looks like it's not a problem with the number of hosts, because with 50 hosts the situation was the same :(.

Posted: Tue Mar 14, 2006 12:16 pm
by sergis
I have 50 monitored hosts and similar problems with netwatch: they keep going up and down, and if SMS notification is configured this is a nightmare, yet all the hosts are up and running with no problems the whole time.

Posted: Tue Mar 14, 2006 5:41 pm
by vklimovs
Yes, same problem here. I could easily ping the desired host from a Linux box, but netwatch showed it as down. I thought it was an issue with the router's fairly high load, but after your posts it seems that it is not. I wrote a little script as a workaround:
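# remember when the check started and whether the "down" mail went out
:local time
:local mailsent
:set time [/system clock get time]
:set mailsent false
# loop for as long as a single ping gets no reply
:while ([/ping HOST IP ADDRESS count=1] = 0) do={
  :if ( !( $mailsent ) ) do={
    # mail only once, and only after 1 minute of failed pings
    :if ( [/system clock get time] - $time >= 1m ) do={
      /tool e-mail send to="MY_EMAIL" subject="host is down"
      :set mailsent true
    }
  }
  :delay 10s
}
# the host answers again; report recovery only if an outage was reported
:if ($mailsent) do={
  /tool e-mail send to="MY_EMAIL" subject="host is up"
}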

E-mail and SMS sending is what I personally need from netwatch, but you can substitute any other actions. The point is that the script sends an e-mail only if the host has been down for a certain amount of time (1 minute here), while still checking the host status every 10 seconds. So a false one-minute "outage" like the ones netwatch reports never triggers an alert.
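The hold-down idea behind the script (check often, but alert only after a sustained failure, and report recovery only if an outage was reported) can be modelled outside the router. Below is a minimal Python sketch, assuming one reachability sample per check and elapsed time in seconds. `HoldDownMonitor` and `simulate` are hypothetical names; this is not RouterOS code and not how netwatch itself works.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HoldDownMonitor:
    """Hypothetical model of the workaround script's logic (not RouterOS code).

    A "down" alert fires only after every check has failed for at least
    `hold_down` seconds; an "up" alert fires only if a "down" alert was sent.
    """
    hold_down: float = 60.0             # the script's 1m threshold
    down_since: Optional[float] = None  # time of the first failed check
    alerted: bool = False               # plays the role of $mailsent

    def observe(self, now: float, reachable: bool) -> List[str]:
        events: List[str] = []
        if reachable:
            # Recovery: report it only if the outage was reported earlier.
            if self.alerted:
                events.append("host is up")
            self.down_since = None
            self.alerted = False
        else:
            if self.down_since is None:
                self.down_since = now
            # Alert once, and only after the outage has lasted hold_down seconds.
            if not self.alerted and now - self.down_since >= self.hold_down:
                events.append("host is down")
                self.alerted = True
        return events


def simulate(samples: List[Tuple[float, bool]],
             hold_down: float = 60.0) -> List[Tuple[float, str]]:
    """Feed (time, reachable) samples to a monitor and collect the alerts."""
    monitor = HoldDownMonitor(hold_down=hold_down)
    alerts: List[Tuple[float, str]] = []
    for now, reachable in samples:
        for event in monitor.observe(now, reachable):
            alerts.append((now, event))
    return alerts


if __name__ == "__main__":
    # Checks every 10 s; the host is unreachable from t=10 through t=90.
    outage = [(t, not (10 <= t < 100)) for t in range(0, 130, 10)]
    print(simulate(outage))  # [(70, 'host is down'), (100, 'host is up')]

    # A 20-second blip (two lost pings) raises no alert at all.
    blip = [(t, t not in (10, 20)) for t in range(0, 60, 10)]
    print(simulate(blip))  # []
```

One deliberate difference from the RouterOS script: the model measures elapsed seconds, whereas the script subtracts times of day taken from `/system clock`, which may need extra care if an outage spans midnight.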