NTP Reached but not Synchronized

I have been having some issues with the NTP component on one of my core MikroTik routers lately.
I use it to sync to a special non-public remote NTP server, and servers on my network then use the router as their NTP server. That is why the NTP package is installed on the router.

According to my monitoring system, the issues had been going on for at least a few months, with the NTP server slowly drifting out of sync more and more.
I can see that the clock on the router is off by more than 60 seconds now.
So I concluded that the NTP client isn’t setting the router’s clock anymore, and likely hasn’t for months.

Looking at the NTP client settings I don’t see anything useful; it simply says “Reached” at the bottom.
[Attachment: NTP Reached Issue.png]
I did a quick Wireshark sniff, and I could see the server replied, so I would assume it isn’t an issue with the remote NTP server or the connection to it.
(I can recapture the traffic if anyone thinks it is useful to see it.)

Do any of you have any ideas about what the issue could be?

I can see that the issue started after upgrading the router to RouterOS v6.43.16 (long-term) from v6.42.10 (long-term).
I tested RouterOS v6.44.5 (long-term) on a test router, but the same issue appears on that.
[Attachment: NTP Reached Issue Monitoring.png]

  1. Monitoring setup shortly after router is upgraded.
  2. Alarms start triggering.

You should probably generate a supout file and send it to support@mikrotik.com. This is clearly something internal (either a bug or a problem with your particular device), and none of us ordinary users can help you with that.

Yeah, I was gonna do that here in a bit after getting some packet captures and such.

EDIT: Mail sent, awaiting a reply as [Ticket#2019102222004601].

[Deantwo@NTP Client Server Router] > system ntp client print 
          enabled: yes
             mode: unicast
      primary-ntp: xxx.xxx.xxx.xxx
    secondary-ntp: 0.0.0.0
  dynamic-servers: 
           status: started

Running the print command seems to say the router is just “started”, and I can’t seem to get it to say “reached” right now.
It might just be the status label on the NTP Client window is a little buggy, so that it can get stuck showing an old state.

The problem with it not synchronizing is still there, though. Waiting on the next reply from MikroTik support.

Looking through the patch notes I did notice this one change:

What’s new in 6.43.4 (2018-Oct-17 06:37):

Changes in this release:


*) ntp - fixed possible NTP server stuck in “started” state;

So that is interesting.

Could it be that the stratum of the remote NTP server isn’t low enough? (Just a suggestion. I don’t know whether the RouterOS NTP client even looks at the stratum value.)

Using Wireshark I can see the stratum the server replies with as 3.

Network Time Protocol (NTP Version 4, server)
    Flags: 0x24, Leap Indicator: no warning, Version number: NTP Version 4, Mode: server
    Peer Clock Stratum: secondary reference (3)
    Peer Polling Interval: 6 (64 seconds)
    Peer Clock Precision: 0.000001 seconds
    Root Delay: 0.003113 seconds
    Root Dispersion: 5.358322 seconds
    Reference ID: xxx.xxx.xxx.xxx
    Reference Timestamp: Oct 18, 2019 10:39:31.012301732 UTC
    Origin Timestamp: Oct 22, 2019 12:35:21.523945431 UTC
    Receive Timestamp: Oct 22, 2019 12:25:24.820052570 UTC
    Transmit Timestamp: Oct 22, 2019 12:25:24.820092977 UTC
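
For reference, the flags byte and polling interval in a dump like the one above can be decoded by hand. This is just a small Python sketch based on the standard NTP header layout (2 bits leap indicator, 3 bits version, 3 bits mode); the function names are made up for illustration and have nothing to do with RouterOS.

```python
def decode_flags(flags):
    """Split the first NTP header byte into (leap indicator, version, mode)."""
    leap = (flags >> 6) & 0x3      # top 2 bits
    version = (flags >> 3) & 0x7   # middle 3 bits
    mode = flags & 0x7             # bottom 3 bits
    return leap, version, mode


def poll_seconds(poll_exponent):
    """The poll field is a log2 value: 6 means 2**6 seconds."""
    return 2 ** poll_exponent


# Only the two modes relevant to this thread; other values print as numbers.
MODE_NAMES = {3: "client", 4: "server"}

if __name__ == "__main__":
    leap, version, mode = decode_flags(0x24)
    print(leap, version, MODE_NAMES.get(mode, mode), poll_seconds(6))
```

For 0x24 this gives leap indicator 0, version 4 and mode 4 (server), and a poll value of 6 works out to 64 seconds, which matches what Wireshark shows.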

The NTP client is really bad at giving feedback.
The SNTP client is a lot more useful with its status and even allows domain names.

Ok, the issue is resolved.

Thanks to help from MikroTik support, we found out that the remote NTP server was mismanaged.
The owners had started taking the server offline some weeks/months back and hadn’t told anyone about it. Supposedly they didn’t think anyone was using the IP address instead of the domain name, or some nonsense like that.

The NTP client can only synchronize with an NTP server if the “Root Dispersion” value is less than 16 seconds. In my case, the remote NTP server was reporting a Root Dispersion of a little over 22 seconds. This caused the NTP client status to simply report “started”.
The SNTP client ignores Root Dispersion, which I guess is both good and bad.
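
To check a captured reply against that limit, something like the following Python sketch might help. It assumes the standard 48-byte NTP header, where Root Delay and Root Dispersion are 32-bit 16.16 fixed-point values at byte offsets 4 and 8. The 16-second limit is the figure from support; it appears to match the MAXDISP constant in RFC 5905, but whether RouterOS applies exactly that check is an assumption. The names `MAX_ROOT_DISPERSION`, `parse_root_fields`, `is_usable` and `make_sample` are hypothetical, and the sample packets are synthetic, not real captures.

```python
import struct

# Limit quoted by MikroTik support in this thread (assumed to be a strict "<").
MAX_ROOT_DISPERSION = 16.0


def short_to_seconds(raw):
    """Convert an NTP short-format value (16.16 fixed point) to seconds."""
    return raw / 65536.0


def parse_root_fields(packet):
    """Return (stratum, root_delay, root_dispersion) from an NTP header."""
    if len(packet) < 48:
        raise ValueError("an NTP header is at least 48 bytes")
    # Byte 0: flags, 1: stratum, 2: poll, 3: precision,
    # bytes 4-7: root delay, bytes 8-11: root dispersion.
    _flags, stratum, _poll, _prec, delay, disp = struct.unpack(
        "!BBbbII", packet[:12]
    )
    return stratum, short_to_seconds(delay), short_to_seconds(disp)


def is_usable(packet):
    """True if the reply's root dispersion is under the assumed limit."""
    _stratum, _delay, dispersion = parse_root_fields(packet)
    return dispersion < MAX_ROOT_DISPERSION


def make_sample(dispersion_seconds):
    """Build a synthetic server reply; precision -20 is an assumed value."""
    header = struct.pack(
        "!BBbbII",
        0x24,                              # LI 0, version 4, mode server
        3,                                 # stratum
        6,                                 # poll (64 seconds)
        -20,                               # precision (assumed)
        round(0.003113 * 65536),           # root delay
        round(dispersion_seconds * 65536), # root dispersion
    )
    return header + bytes(36)              # reference ID and timestamps zeroed


if __name__ == "__main__":
    for seconds in (5.358322, 22.0):
        print(seconds, is_usable(make_sample(seconds)))
```

With these synthetic packets, a dispersion of 5.358322 seconds passes the check and 22 seconds does not. For a real capture, the raw UDP payload of the server reply would be passed to `is_usable` instead.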

I am not totally sure why it was giving the status “reached” when I first started this. It was either a Winbox UI error, or the Root Dispersion only gradually grew past the limit over the last few weeks/months. And I can’t explain why the Root Dispersion was only 5.358322 seconds in the above packet capture; maybe there had been nothing to compare it with until I had messed with it, or something.
Either way, I gave the NTP server owners a kick in the butt and it all works now.

Also got this reply from support, when I asked for a bug report to be made about the missing NTP logging and user feedback. So I guess all is good now.

RouterOS v7 will have a completely rewritten NTP client and server implementation which will include more logging and more possibilities than before.