I have been having some issues with the NTP component on one of my core MikroTik routers lately.
I use it to sync to a special non-public remote NTP server, and servers on my network then use the router as their NTP server. That is why the NTP package is installed on the router.
According to my monitoring system, the issues had been going on for at least a few months, with the NTP server slowly drifting out of sync more and more.
I can see that the clock on the router is off by more than 60 seconds now.
So I concluded that the NTP client isn’t setting the router’s clock anymore, and likely hasn’t for months.
Looking at the NTP client settings, I don’t see anything useful; it simply says “Reached” at the bottom.
I did a quick Wireshark sniff, and I could see the server replied, so I would assume it isn’t an issue with the remote NTP server or the connection to it.
(I can recapture the traffic if anyone thinks it is useful to see it.)
Do any of you have any ideas about what the issue could be?
I can see that the issue started after upgrading the router to RouterOS v6.43.16 (long-term) from v6.42.10 (long-term).
I tested RouterOS v6.44.5 (long-term) on a test router, but the same issue appears there.
Monitoring was set up shortly after the router was upgraded.
You should probably generate a supout file and send it to support@mikrotik.com … this is clearly some internal working (either a bug or a problem with your particular device), and none of the ordinary users can help you with that.
[Deantwo@NTP Client Server Router] > system ntp client print
enabled: yes
mode: unicast
primary-ntp: xxx.xxx.xxx.xxx
secondary-ntp: 0.0.0.0
dynamic-servers:
status: started
Running the print command seems to say the router is just “started”, and I can’t seem to get it to say “reached” right now.
It might just be the status label on the NTP Client window is a little buggy, so that it can get stuck showing an old state.
Problem with it not getting synchronized is still there though. Waiting on the next reply from MikroTik support.
Looking through the patch notes I did notice this one change:
What’s new in 6.43.4 (2018-Oct-17 06:37):
Changes in this release:
…
*) ntp - fixed possible NTP server stuck in “started” state;
…
Could it be that the stratum of the remote NTP server isn’t low enough? (Just a suggestion. I don’t know whether the RouterOS’ NTP client even looks at the stratum value.)
Thanks to help from MikroTik support, we found out that the remote NTP server was mismanaged.
The owners had started taking the server offline some weeks/months back and hadn’t told anyone about it. Supposedly they didn’t think anyone was using the IP address instead of the domain name, or some nonsense like that.
The NTP client can only synchronize with an NTP server if the “Root Dispersion” value is less than 16 seconds. In my case here, the remote NTP server was reporting a Root Dispersion a little over 22 seconds. This caused the NTP client status to simply report “started”.
The SNTP client ignores Root Dispersion, which I guess is both good and bad.
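For anyone who wants to check a server's Root Dispersion without opening Wireshark, here is a minimal Python sketch. It is not how RouterOS does it internally (I don't know that); it only decodes the header fields of an NTP reply and applies the "less than 16 seconds" rule described above. The function names (`parse_header`, `acceptable`, `query`) and the `MAX_DISPERSION` constant are my own, made up for illustration.

```python
import socket
import struct

# Assumption: the 16 second limit described above (RFC 5905 calls it MAXDISP).
MAX_DISPERSION = 16.0


def short_to_seconds(raw: int) -> float:
    """Convert an NTP 'short format' value (16.16 fixed point) to seconds."""
    return raw / 65536.0


def parse_header(packet: bytes) -> dict:
    """Decode the first 12 bytes of an NTP packet into named fields."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    flags, stratum, _poll, _precision, root_delay, root_disp = struct.unpack(
        "!BBbbII", packet[:12]
    )
    return {
        "leap": flags >> 6,
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,
        "stratum": stratum,
        "root_delay": short_to_seconds(root_delay),
        "root_dispersion": short_to_seconds(root_disp),
    }


def acceptable(header: dict) -> bool:
    """Apply only the Root Dispersion rule from this thread, nothing else."""
    return header["root_dispersion"] < MAX_DISPERSION


def query(host: str, timeout: float = 2.0) -> dict:
    """Send one client request to UDP port 123 and decode the reply header."""
    request = bytes([0x23]) + bytes(47)  # LI=0, VN=4, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        reply, _addr = sock.recvfrom(512)
    return parse_header(reply)
```

Calling `acceptable(query("xxx.xxx.xxx.xxx"))` with your own server's address in place of the placeholder should tell you whether that server would pass this one check. A server reporting a little over 22 seconds, like mine was, would fail it.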
I am not totally sure why it was giving the status “reached” when I first started this. It was either a Winbox UI error, or the Root Dispersion only gradually exceeded the limit over the last few weeks/months. And I can’t explain why the Root Dispersion was only 5.358322 seconds in the above packet sniff, maybe because there had been nothing to compare it with until I had messed with it or something.
Either way, I gave the NTP server owners a kick in the butt and it all works now.
Also got this reply from support, when I asked for a bug report to be made about the missing NTP logging and user feedback. So I guess all is good now.
RouterOS v7 will have a completely rewritten NTP client and server implementation which will include more logging and more possibilities than before.