Leap second bug present on TILE devices?

Did anyone else see any TILE-based RouterOS devices go unresponsive at the leap second insertion today (00:00 UTC)?

Three of my CCRs (one running 6.27 and two running 6.29.1) became unresponsive at 00:00 UTC on the dot. The LCD screens were also unresponsive, and I couldn’t get any output via the serial console either. The only fix was a hard power cycle.

Yes, they all crashed: http://forum.mikrotik.com/t/all-ccr-crashed/86731/1

All of our CCRs were impacted. Those running 6.28 required a hard reset. We had some on 6.17 that seemed to just reboot.

+1 here on two CCR 1016 :confused:

I can confirm that some CCR units experienced a crash due to the introduction of the leap second.

Only those CCR units that use the client from the NTP npk package were affected. It currently seems the issue was in the Linux kernel; the bug was fixed, but RouterOS did not yet include this kernel fix.

If the CCR uses the default SNTP client (i.e. NTP.npk is not installed), then nothing happened.

False. 95% of my routers don’t have the NTP package installed and they all crashed badly.

I can confirm we suffered from this problem at the stroke of 00:00 GMT.

http://forum.mikrotik.com/t/leap-second-insertion/87724/6

2 of our 3 CCRs failed and required a hard power cycle to function again.

All CCRs are on 6.27 and have the NTP package installed, including the 1 that stayed online.

Not good that we find out about this problem before the leap second event. :blush:

My CCR1009 (v6.29.1), using “SNTP Client” (NTP package not installed) and no BGP, didn’t hang or reboot.

All of my border routers (all CCRs) that were synced with an ntp.org pool crashed. All of my edge routers were synced to the border routers, so I didn’t have to power cycle those. It still caused a significant outage. Unfortunately, this is the last straw, and I will be replacing all these devices, border or not, with more reliable Cisco equipment.

If MikroTik worked more closely with the open source Linux community, I’m sure this wouldn’t have happened.

A little too late, don’t you think? When is the next leap second? I won’t have any MikroTik devices on my networks when it happens.

For this one, yes, but the next leap second will be added in around 2 years.
Could you please tell me whether you had the NTP package on all the routers, or whether you used SNTP?

If I may respond as well …

NTP, no SNTP. 6.29, build time May/27/2015 11:19:36.

Unless a bug in the hardware driver of some NTP server triggers an unexpected leap second (as happened to me on 1st April, http://forum.mikrotik.com/t/all-ccr-crashed/86731/1), or unless a malicious user wants to bring down an entire ISP network by hacking one public NTP server.
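For anyone who wants to check what their upstream NTP server is announcing: the leap warning travels in the Leap Indicator field, the top two bits of the first byte of every NTP packet. Below is a minimal Python sketch that decodes that byte. The function names and the sample packet are made up for illustration, and it does not talk to a real server or reflect how RouterOS handles the flag internally.

```python
# Sketch only: decode the first byte of an NTP packet header.
# Layout of that byte: Leap Indicator (2 bits), Version (3 bits), Mode (3 bits).

LEAP_MEANINGS = {
    0: "no warning",
    1: "last minute of the day has 61 seconds (leap second insertion)",
    2: "last minute of the day has 59 seconds (leap second deletion)",
    3: "alarm: clock not synchronized",
}


def decode_first_byte(packet: bytes):
    """Return (leap_indicator, version, mode) from an NTP packet."""
    if len(packet) < 1:
        raise ValueError("empty packet")
    first = packet[0]
    leap = (first >> 6) & 0b11      # top two bits
    version = (first >> 3) & 0b111  # next three bits
    mode = first & 0b111            # lowest three bits
    return leap, version, mode


def announces_leap(packet: bytes) -> bool:
    """True if the packet warns of an upcoming leap second (insert or delete)."""
    leap, _, _ = decode_first_byte(packet)
    return leap in (1, 2)


# Hypothetical 48-byte sample: LI=1, version 4, mode 4 (server reply),
# with the rest of the header zeroed out.
sample = bytes([(1 << 6) | (4 << 3) | 4]) + bytes(47)

leap, version, mode = decode_first_byte(sample)
print(leap, version, mode, LEAP_MEANINGS[leap])
```

A server that sets this flag on the wrong day would look exactly the same to a client as a genuine announcement, which is why a single misbehaving upstream can matter.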

Dear Normis!

Besides the probable driver/NTPd bug in the kernel, I don’t understand why the routers hung and why they were not restarted by the hardware watchdog. As I remember, Tilera processors have a hardware watchdog, and it seems it doesn’t function properly!

Should we trust the watchdog in these cases? The main problem is that operators had to restart the routers on-site.

Happened to all our CCRs today.

Multinational meltdown on our BGP networks; right now network engineers are running around to physically reboot them.

:frowning:

Is it just me or is Normis incapable of saying “sorry - we screwed up”?? I have read through all his replies and I don’t see the apology anywhere - but frankly I am not in the least bit surprised…

Let me explain a bit more what happened tonight in my situation:

Being aware of the problems that the leap second caused accidentally to my routers on 1st April, I disabled NTP ( /system ntp client set enabled=no ) on every router, except those I could reach easily. The result is that the routers with NTP disabled didn’t crash.

The ones where NTP wasn’t disabled all crashed, including those that didn’t have the NTP package installed. This means that if the NTP package causes the problem, there is a chance that it causes something else to fail; for example, a BGP routing update could trigger the bug/crash and somehow propagate it to other routers (this is just a guess).

The point of this post is to warn and emphasize that I found MANY routers in an unresponsive state and only a couple of them had the NTP package installed.

CCR1009, CCR1016 and CCR1036, 6.19 and 6.27, sntp client. None crashed.

Hi, we have MikroTiks everywhere, plus around 20 CCRs. NTP is configured on all devices in our network.
Our conclusions are below:

  1. RBs with the NTP client or SNTP client were not affected (versions 6.13 to 6.28).
  2. Only CCRs running a version after 6.20 with the NTP client were affected.
    For example, a CCR with 6.18 was not affected, even though it has NTP running.

That is very interesting. Maybe those units used a different NTP server? The NTP package and kernel were not changed in 6.18, or in any other v6 version for that matter.