Clbh
just joined
Topic Author
Posts: 20
Joined: Tue May 12, 2015 5:22 am

Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:50 am

Did anyone else see any TILE-based RouterOS devices go unresponsive at leap second insertion today? (00:00 UTC)

Three of my CCRs (one running 6.27 and two running 6.29.1) became unresponsive at 00:00 UTC on the dot. The LCD screens were also unresponsive, and I couldn't get any output via the serial console either. The only fix was a hard power cycle.
 
coylh
Member Candidate
Posts: 160
Joined: Tue Jul 12, 2011 12:11 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:53 am

 
drb
just joined
Posts: 9
Joined: Mon Jul 25, 2011 8:20 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 6:47 am

All of our CCRs were impacted. Those running 6.28 required a hard reset. We had some on 6.17 that seemed to just reboot.
 
SwissWISP
Member Candidate
Posts: 181
Joined: Fri Sep 23, 2011 12:16 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 9:49 am

+1 here on two CCR 1016 :?
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 9:55 am

I can confirm that some CCR units experienced a crash due to the introduction of the leap second.

Only CCR units that use the client from the NTP npk package were affected. It currently seems the issue was in the Linux kernel; the bug has been fixed there, but RouterOS did not have this kernel fix yet.

If the CCR uses the default SNTP client (i.e. NTP.npk is not installed), nothing happened.
 
paoloaga
Member Candidate
Posts: 222
Joined: Tue Mar 08, 2011 2:52 am
Location: Vaprio d'Agogna (NO) - Italy

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 9:56 am

[quote="normis"]I can confirm that some CCR units experienced a crash due to the introduction of the leap second.

Only CCR units that use the client from the NTP npk package were affected. It currently seems the issue was in the Linux kernel; the bug has been fixed there, but RouterOS did not have this kernel fix yet.

If the CCR uses the default SNTP client (i.e. NTP.npk is not installed), nothing happened.
[/quote]
False. 95% of my routers don't have the NTP package installed and they all crashed badly.
 
bmv
just joined
Posts: 17
Joined: Sun Aug 15, 2010 1:15 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 9:58 am

I can confirm we suffered from this problem at the stroke of 00:00 GMT.

http://forum.mikrotik.com/viewtopic.php ... 59#p488659

2 of our 3 CCRs failed and required a hard power cycle to function again.

All CCRs are on 6.27 and have the NTP package installed, including the 1 that stayed online.

Not good that we find out about this problem before the leap second event. :oops:
 
MartijnVdS
Frequent Visitor
Posts: 93
Joined: Wed Aug 13, 2014 9:36 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 10:08 am

My CCR1009 (v6.29.1), using "SNTP Client" (NTP package not installed) and no BGP, didn't hang or reboot.
 
madman2233
just joined
Posts: 6
Joined: Wed Feb 06, 2013 1:31 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 10:14 am

All of my border routers (ALL CCR) that were synced with an ntp.org pool crashed. All of my edge routers were synced to the border routers, so I didn't have to power cycle those. It still caused a significant outage. Unfortunately, this is the last straw and I will be replacing all these devices, border or not, with more reliable Cisco equipment.

If MikroTik worked more closely with the open source Linux community, I'm sure this wouldn't have happened.
 
madman2233
just joined
Posts: 6
Joined: Wed Feb 06, 2013 1:31 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 10:33 am

[quote]We have found how to fix the issue in the kernel, fix is coming soon.[/quote]
Little too late, don't you think? When is the next leap second? I won't have any MikroTik devices on my networks when it happens.
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 10:49 am

[quote="madman2233"]Little too late, don't you think?[/quote]
For this one, yes, but next leap second will be added in around 2 years.
Could you please tell me if you had NTP package on all the servers, or you used SNTP?
 
antondollmaier
just joined
Posts: 2
Joined: Fri Mar 07, 2014 4:09 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 11:03 am

If I may respond as well ...
[quote="normis"]Could you please tell me if you had NTP package on all the servers, or you used SNTP?[/quote]
NTP, no SNTP. 6.29, build time May/27/2015 11:19:36.
 
paoloaga
Member Candidate
Posts: 222
Joined: Tue Mar 08, 2011 2:52 am
Location: Vaprio d'Agogna (NO) - Italy

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 11:07 am

[quote="normis"]next leap second will be added in around 2 years.[/quote]
Unless a bug in the hardware driver of some NTP server triggers an unexpected leap second (as happened to me on 1st April, http://forum.mikrotik.com/viewtopic.php?f=3&t=95455 ), or unless a malicious user wants to bring down an entire ISP network by hacking one public NTP server.
 
viacomkft
just joined
Posts: 1
Joined: Wed Jul 01, 2015 11:22 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 11:31 am

Dear Normis!

Besides the probable driver/NTPd bug in the kernel, I don't understand why the routers hung and why they were not restarted by the hardware watchdog. As I remember, Tilera processors have a hardware watchdog, and it seems it doesn't function properly!

Should we trust in the watchdog in these cases? The main problem is that operators had to restart the routers on-site.
We have found how to fix the issue in the kernel, fix is coming soon.
 
wenasong
just joined
Posts: 20
Joined: Thu Jul 10, 2014 6:54 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 11:34 am

Happened to all our CCRs today.

Multinational meltdown on our BGP networks; right now network engineers are running around to physically reboot them.

:sad:
 
mstead
Member Candidate
Posts: 113
Joined: Sat Mar 04, 2006 2:41 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 11:52 am

Is it just me or is Normis incapable of saying "sorry - we screwed up"?? I have read through all his replies and I don't see the apology anywhere - but frankly I am not in the least bit surprised....
 
paoloaga
Member Candidate
Posts: 222
Joined: Tue Mar 08, 2011 2:52 am
Location: Vaprio d'Agogna (NO) - Italy

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:00 pm

Let me explain a bit more what happened tonight in my situation:

Being aware of the problems that the leap second caused accidentally to my routers on 1st April, I disabled NTP ( /system ntp client set enabled=no ) on every router, except those I could reach easily. The result is that the routers with NTP disabled didn't crash.

The ones where NTP hadn't been disabled all crashed, including those that didn't have the NTP package installed. This means that if the NTP package causes the problem, there is a chance that it causes something else to fail. For example, it could be a BGP routing update that triggers the bug/crash and somehow propagates it to other routers (this is just a guess).

The point of this post is to warn and emphasize that I found MANY routers in an unresponsive state, and only a couple of them had the NTP package installed.
 
pukkita
Trainer
Posts: 2982
Joined: Wed Dec 04, 2013 11:09 am
Location: Spain

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:09 pm

CCR1009, CCR1016 and CCR1036, 6.19 and 6.27, sntp client. None crashed.
Simplicity is the Ultimate Sophistication - Da Vinci
Getting the most out of this forum
 
alexcherry
just joined
Posts: 20
Joined: Tue Jan 11, 2011 5:01 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:12 pm

Hi, we have MikroTiks everywhere, plus around 20 CCRs. NTP is configured on all devices in our network.
Our conclusions:
1. RBs with the NTP client or SNTP client were not affected (versions 6.13 to 6.28).
2. Only CCRs with a version after 6.20 and the NTP client running were affected.
For example, a CCR with 6.18 was not affected, even though it has NTP running.
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:14 pm

[quote="alexcherry"]Hi, we have MikroTiks everywhere, plus around 20 CCRs. NTP is configured on all devices in our network.
Our conclusions:
1. RBs with the NTP client or SNTP client were not affected (versions 6.13 to 6.28).
2. Only CCRs with a version after 6.20 and the NTP client running were affected.
For example, a CCR with 6.18 was not affected, even though it has NTP running.
[/quote]
That is very interesting; maybe those units used a different NTP server? The NTP package and kernel were not changed in 6.18, or in any v6 version for that matter.
 
eehan
just joined
Posts: 10
Joined: Fri Aug 18, 2006 2:45 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:23 pm

Hi,

I can report the same issue here. I have two RouterBOARD CCR1036-12G-4S units in operation. Only one was affected.

Both routerboards synchronise with external Australian official NTP pool servers.

The unit that froze up was running v6.28, whereas the unit that was not affected was running v6.15.

The unit that froze up was a lab router. I did not realise there was an issue until I was in the lab 2 hours later and could not log in to anything. The CCR required a power cycle to restore operation.
It wasn't until I checked the logs a few hours later that I saw the last entry before it froze was 09:59:57 (the leap second occurred at 10am local time), and so here we are...
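
For anyone lining up log timestamps like this against the event: the leap second was inserted as 23:59:60 UTC on 30 June, so the first normal second afterwards is 00:00:00 UTC on 1 July. Below is a minimal Python sketch of the conversion to the local times mentioned in this thread. It assumes fixed UTC offsets (AEST +10, CEST +2, PDT -7) instead of a real timezone database, and the zone labels and the `local_time` helper are illustrative names only. Note that Python's `datetime` cannot represent 23:59:60 itself.

```python
from datetime import datetime, timedelta, timezone

# First instant after the inserted leap second (23:59:60 UTC on 30 June 2015).
AFTER_LEAP_UTC = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

# Assumed fixed offsets, for illustration only; a real tool would use a tz database.
OFFSETS = {
    "AEST": timedelta(hours=10),
    "CEST": timedelta(hours=2),
    "PDT": timedelta(hours=-7),
}


def local_time(label: str) -> str:
    """Return the local wall-clock time of AFTER_LEAP_UTC for a zone label."""
    tz = timezone(OFFSETS[label], name=label)
    return AFTER_LEAP_UTC.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


if __name__ == "__main__":
    for label in OFFSETS:
        print(local_time(label))
```

A last log entry at 09:59:57 on a router in eastern Australia is therefore consistent with a hang at the leap second.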
Last edited by eehan on Wed Jul 01, 2015 12:52 pm, edited 1 time in total.
 
maznu
Member Candidate
Posts: 197
Joined: Tue May 05, 2015 11:12 am
Location: Manchester, UK

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:29 pm

Two CCR1009-8G-1S-1S+

Both v6.27 with v3.22 firmware.
Both with "ntp" package installed and enabled.
Both with "NTP Server" enabled.
Both with "NTP Client" enabled and syncing to (different) NTP servers.

One crashed at 23:59:60.
One running ok.
Marek
 
rudihnio
just joined
Posts: 5
Joined: Tue Aug 28, 2012 4:10 pm
Location: Luxembourg

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:31 pm

Hi,

We have about 65 CCR1036s in service, with the NTP package enabled and syncing via NTP to 2 MT1100AHx2 units acting as NTP servers.
About 90%-95% crashed at exactly 02:00 CEST.

But we experienced 2 versions of the problem:
Release 6.24: all of those rebooted and came back after the watchdog timer.
Releases 6.27 & 6.28: stuck until a power cycle on site.

Perhaps this info helps.

Steve
 
Clbh
just joined
Topic Author
Posts: 20
Joined: Tue May 12, 2015 5:22 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 12:40 pm

2x CCR1036-8G-2S+ running 6.29.1 w/ NTP package installed (client enabled, server disabled)
1x CCR1016-12G-1S+ running 6.27 w/ NTP package installed (client enabled, server disabled)

All those configurations crashed for me and were not rebooted by the watchdog (required hard power cycle; thank god for out of band access & switched PDU outlets).
 
czolo
Member
Posts: 418
Joined: Fri Mar 04, 2005 9:49 am
Location: Poland (Warsaw)

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 1:05 pm

Hi Normis
In our network we are using the NTP package on all routers. All our CCRs crashed today. I couldn't connect, and only a hard power cycle helped. I tried touching the LCD, but nothing happened. It seems that the devices were in a hung state.

When can we expect the fix?
| --= Czo|_o =--
| http://wifi4eu.pl
| Innovation in WiFi
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 1:09 pm

Status update!

We have found that this issue is related to a Linux Kernel issue that was patched in Linux Kernel v3.4 (RouterOS v6 uses Linux Kernel v3.3.5).

The problem happens only if all of the following criteria are met:
1) 64bit RouterOS (only tile)
2) any RouterOS v6.x
3) installed and synchronized NTP client from NTP package (NOT the default SNTP client)
4) synchronization to server that have proper Leap Second implementation, not just time adjustment on next synchronization

We have not yet been able to reproduce the issue with the default SNTP client (i.e. without the NTP package), nor to confirm that the problem doesn't happen on older RouterOS versions. We are still working on this issue, so we may be able to confirm those later.
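
The four criteria above can be read as a simple "all must hold" check. Here is a minimal Python sketch of that reading; the `Router` record, its field names and the `matches_reported_criteria` helper are hypothetical and only restate the list, they are not a MikroTik tool or API. Several posters in this thread report crashes without the NTP package, so treat it as a summary of the stated criteria, not a guarantee.

```python
from dataclasses import dataclass


@dataclass
class Router:
    # All field names are illustrative assumptions, not RouterOS properties.
    architecture: str                # e.g. "tile" for CCR devices
    version: str                     # RouterOS version string, e.g. "6.28"
    ntp_package_client_synced: bool  # NTP-package client installed and synchronized
    server_sends_leap_flag: bool     # upstream server announces the leap second


def matches_reported_criteria(r: Router) -> bool:
    """True only if all four criteria from the status update hold."""
    major = r.version.split(".")[0]
    return (
        r.architecture == "tile"          # 1) 64-bit RouterOS (tile only)
        and major == "6"                  # 2) any RouterOS v6.x
        and r.ntp_package_client_synced   # 3) NTP package client, not SNTP
        and r.server_sends_leap_flag      # 4) server signals the leap second
    )


if __name__ == "__main__":
    ccr = Router("tile", "6.28", True, True)
    print(matches_reported_criteria(ccr))
```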
 
maznu
Member Candidate
Posts: 197
Joined: Tue May 05, 2015 11:12 am
Location: Manchester, UK

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 1:23 pm

[quote="normis"]4) synchronization to server that have proper Leap Second implementation, not just time adjustment on next synchronization
[/quote]

This is interesting.

Of the two identical CCRs we have, the one that crashed was synchronised to our stratum 1 DCF time server. Our DCF time server was running at stratum 2 at the time, because of signal quality problems. During the night, our "DCF" server was actually synchronised via NTP to our stratum 1 GPS+PPS time server instead.

The CCR that kept running was synchronised to our stratum 1 GPS+PPS time server.

Both our DCF and GPS+PPS time servers run the same version of ntpd, on Debian Linux.
Marek
 
marrold
Member
Posts: 406
Joined: Wed Sep 04, 2013 10:45 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 1:33 pm

that post is from April?
I'm a SIP / VoIP engineer. Feel free to ask questions...
 
macgaiver
Forum Guru
Posts: 1721
Joined: Wed May 18, 2005 5:57 pm
Location: Sol III, Sol system, Sector 001, Alpha Quadrant

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 1:41 pm

Same leap second problem, only caused by a misconfiguration in those particular NTP servers.
An early warning that was missed by everyone :) But you must admit - the choice of date was just too good :)
With great knowledge comes great responsibility, because of ability to recognize id... incompetent people much faster.
 
marting
Member Candidate
Posts: 160
Joined: Thu Aug 21, 2014 2:07 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 1:44 pm

[quote="marrold"]that post is from April?[/quote]
When you scroll down, you will find more recent posts from today. The cause is the same. On 1st of April there was a leap second insertion on some Italian NTP servers, and today it was worldwide: http://forum.mikrotik.com/viewtopic.php ... 99#p488599
My CCR1036-12G-4S also crashed completely (NTP package enabled).
 
eehan
just joined
Posts: 10
Joined: Fri Aug 18, 2006 2:45 am

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 2:15 pm

[quote="normis"]The problem happens only if all of the following criteria are met:
1) 64bit RouterOS (only tile)
2) any RouterOS v6.x
3) installed and synchronized NTP client from NTP package (NOT the default SNTP client)
4) synchronization to server that have proper Leap Second implementation, not just time adjustment on next synchronization
[/quote]
Can you please explain what you mean by "server that have proper Leap Second implementation"? That is, how does the "proper" implementation differ from "time adjustment on next synchronization"?
 
rudihnio
just joined
Posts: 5
Joined: Tue Aug 28, 2012 4:10 pm
Location: Luxembourg

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 2:36 pm

[quote="normis"]Status update!

4) synchronization to server that have proper Leap Second implementation, not just time adjustment on next synchronization
[/quote]

Does the MT1100AHx2 on release 5.26 have this proper Leap Second implementation?

What changed between 6.24, where we had 50 CCRs rebooted by the watchdog timer, and the newer releases, which needed a power cycle on site?

Thanks,
Steve
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:07 pm

[quote="eehan"]Can you please explain what you mean by "server that have proper Leap Second implementation" ? That is, how does the "proper" implementation differ from "time adjustment on next synchronization" ?[/quote]
This:
The NTP packet includes a leap second flag, which informs the user that a leap second is imminent. This, among other things, allows the user to distinguish between a bad measurement that should be ignored and a genuine leap second that should be followed.
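
To make the flag concrete: in the standard NTP packet header, the first byte carries a 2-bit Leap Indicator, a 3-bit version number and a 3-bit mode. The following minimal Python sketch decodes that byte. The function and dictionary names are illustrative assumptions, and it says nothing about how RouterOS or its NTP package handles the value internally.

```python
from typing import Tuple

# Meaning of the 2-bit Leap Indicator field in the NTP header.
LEAP_MEANINGS = {
    0: "no warning",
    1: "last minute of the day has 61 seconds (leap second insertion)",
    2: "last minute of the day has 59 seconds (leap second deletion)",
    3: "unknown (clock not synchronized)",
}


def parse_first_byte(packet: bytes) -> Tuple[int, int, int]:
    """Return (leap_indicator, version, mode) from a raw NTP packet."""
    if not packet:
        raise ValueError("empty packet")
    first = packet[0]
    leap = first >> 6              # top 2 bits
    version = (first >> 3) & 0x7   # next 3 bits
    mode = first & 0x7             # low 3 bits
    return leap, version, mode


if __name__ == "__main__":
    # 0x64 = 0b01_100_100: leap insertion pending, version 4, mode 4 (server).
    reply = bytes([0x64]) + bytes(47)
    leap, version, mode = parse_first_byte(reply)
    print(leap, version, mode, LEAP_MEANINGS[leap])
```

A client that only steps its clock at the next synchronization never sees or acts on this field, which is the distinction drawn in the answer above.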
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:07 pm

[quote="rudihnio"]Does the MT1100AHx2 on release 5.26 have this proper Leap Second implementation.[/quote]
Only CCR was affected by this. RB1100 and all other devices worked fine.
 
rudihnio
just joined
Posts: 5
Joined: Tue Aug 28, 2012 4:10 pm
Location: Luxembourg

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:10 pm

[quote="rudihnio"]Does the MT1100AHx2 on release 5.26 have this proper Leap Second implementation.[/quote]
[quote="normis"]only CCR was affected by this. RB1100 and all other devices worked fine[/quote]
Normis, what I meant was: does the NTP server in release 5.26 on the MT1100AHx2 have this proper SERVER implementation?
 
japaeye4u
just joined
Posts: 6
Joined: Tue Jun 20, 2006 1:21 pm
Location: Rio de Janeiro

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:26 pm

We have about 150 CCRs (1009/1016/1036), and 24 CCRs with versions above 6.24 crashed.
All (about 150) use the NTP client package with the client enabled.
I am now checking whether there are CCRs with the same versions that did not crash. I will return soon with new information.
 
jakubj
just joined
Posts: 5
Joined: Mon Nov 07, 2011 5:58 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:31 pm

We have two CCR1016-12G units running RouterOS v6.27 with firmware v3.22. One crashed and the other did not. So I'm not sure why one but not the other... odd.
 
petrisimo
just joined
Posts: 14
Joined: Sat Apr 06, 2013 8:15 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 3:59 pm

Hello Mikrotik,

Which ROS and firmware versions are safe to use on the CCR1036-8G-2S+EM?

+++

> system health print
fan-mode: auto
use-fan: main
active-fan: main
use-fan2: main
active-fan2: main
cpu-overtemp-check: yes
cpu-overtemp-threshold: 100C
cpu-overtemp-startup-delay: 1m
voltage: 23.6V
current: 1906mA
temperature: 40C
cpu-temperature: 54C
power-consumption: 44.9W
fan1-speed: 10155RPM
fan2-speed: 9953RPM

> system resource print
uptime: 30m7s
version: 6.28
build-time: Apr/15/2015 15:18:31
free-memory: 3745.2MiB
total-memory: 3966.7MiB
cpu: tilegx
cpu-count: 36
cpu-frequency: 1200MHz
cpu-load: 0%
free-hdd-space: 703.6MiB
total-hdd-space: 1024.0MiB
architecture-name: tile
board-name: CCR1036-8G-2S+
platform: MikroTik

> system routerboard print
routerboard: yes
model: CCR1036-8G-2S+
serial-number: 52A002DC3D1F
current-firmware: 3.18
upgrade-firmware: 3.22

> system package print
Flags: X - disabled
 #   NAME             VERSION  SCHEDULED
 0   ntp              6.28
 1   routeros-tile    6.28
 2   system           6.28
 3 X wireless-fp      6.28
 4 X ipv6             6.28
 5 X wireless         6.28
 6 X hotspot          6.28
 7   dhcp             6.28
 8   mpls             6.28
 9   routing          6.28
10   ppp              6.28
11   security         6.28
12   advanced-tools   6.28
13 X openflow         6.28
14   multicast        6.28
 
denke
just joined
Posts: 20
Joined: Sun Jun 27, 2010 12:49 pm
Location: Hungary

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 4:49 pm

Hello Normis!

I have another question for you:

We have 2 CCR1036s which were connected to the internet, with NTPD enabled and synced at the time of the incident. Both were affected; both had the watchdog enabled.

As far as I know, Tile CPUs have an integrated hardware watchdog, and it is supported in the Linux kernel, so the watchdog feature should have been hardware-based.

Why didn't the watchdog reset the routers when they were both frozen solid?
 
royalpublishing
Frequent Visitor
Posts: 50
Joined: Mon Sep 23, 2013 5:47 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 4:54 pm

All 6 of my CCRs crashed and burned this morning because of the leap second issue, so I came into an office where nothing was working and all of my branch offices were down. All the routers required a cold boot. Fun times.
 
Beeski
newbie
Posts: 34
Joined: Sat Apr 02, 2005 4:42 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 5:11 pm

We have over 20 CCR's in production.
The only one that locked up was acting as an NTP Server.
All other CCR's are NTP clients that sync with our Cisco Core/Edge router.
 
IPANetEngineer
Trainer
Posts: 1053
Joined: Fri Aug 10, 2012 6:46 am
Location: Jackson, MS, USA

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 5:37 pm

It would have been helpful to have a patch from MikroTik, but for most of the customer networks we manage, we began leap second planning a while ago and removed any equipment from an NTP server that was suspect until the leap second passed and then re-enabled it. That proved to be a very simple, yet effective mitigation technique to script even on some of the larger networks we work on (50,000+ network devices)
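
As a rough illustration of scripting that kind of mitigation, here is a minimal Python sketch that builds a disable/re-enable schedule around a known leap second. The disable command is the one quoted earlier in this thread; the re-enable command is assumed to be its inverse, and the router names, the `mitigation_plan` helper and the default margin are hypothetical. How long before the event the client really needs to be disabled is not established in this thread.

```python
from datetime import datetime, timedelta, timezone

DISABLE_CMD = "/system ntp client set enabled=no"   # quoted earlier in this thread
ENABLE_CMD = "/system ntp client set enabled=yes"   # assumed inverse of the above


def mitigation_plan(routers, leap_utc, margin=timedelta(hours=24)):
    """Return (when, router, command) tuples, sorted by time.

    The margin is a placeholder assumption, not a tested safe value.
    """
    plan = []
    for name in routers:
        plan.append((leap_utc - margin, name, DISABLE_CMD))
        plan.append((leap_utc + margin, name, ENABLE_CMD))
    return sorted(plan)


if __name__ == "__main__":
    leap = datetime(2015, 7, 1, 0, 0, tzinfo=timezone.utc)
    for when, router, command in mitigation_plan(["border-1", "border-2"], leap):
        print(when.isoformat(), router, command)
```

Pushing the commands to the devices (SSH, API, or a scheduler on the router itself) is left out on purpose, since that depends on the network.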
Global - MikroTik Support & Consulting - English | Francais | Español | Portuguese +1 855-645-7684
https://iparchitechs.com/services/mikro ... l-support/ mikrotiksupport@iparchitechs.com
 
paoloaga
Member Candidate
Posts: 222
Joined: Tue Mar 08, 2011 2:52 am
Location: Vaprio d'Agogna (NO) - Italy

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 6:07 pm

[quote="IPANetEngineer"]It would have been helpful to have a patch from MikroTik, but for most of the customer networks we manage, we began leap second planning a while ago and removed any equipment from an NTP server that was suspect until the leap second passed and then re-enabled it. That proved to be a very simple, yet effective mitigation technique to script even on some of the larger networks we work on (50,000+ network devices)[/quote]
This won't protect you from unexpected (wrong) leap seconds. A few public NTP servers here in Italy were affected during March and applied the leap second on 1st of April. A deeper investigation traced the cause to a bug in a hardware clock driver...

This also won't protect you from a malicious hacker who could break into a public NTP server and crash the whole network.
 
rkj
just joined
Posts: 15
Joined: Sun Jun 11, 2006 7:38 pm

Re: Leap second bug present on TILE devices?

Wed Jul 01, 2015 6:12 pm

[quote]Leap Second was a one time only event. It has passed. You can use any release now.

We will make a fix today that will make sure you don't see this issue again in 2-3 years, when next leap second happens[/quote]
One CCR crashed just 10 minutes ago, so it might not be a one-time event.

Also, some users reported the same issue with SXT devices running RouterOS 6.27, starting at 00:00 UTC, and some reported it while not using NTP.

So, using NTP on CCR is sure to be the largest contributing factor, but it's not 100% limited to that scope.
 
lele
just joined
Posts: 5
Joined: Thu Apr 02, 2015 1:20 am

Re: Leap second bug present on TILE devices?

Thu Jul 02, 2015 12:38 am

[quote="rkj"]So, using NTP on CCR is sure to be the largest contributing factor, but it's not 100% limited to that scope.[/quote]
While unrelated bugs can't be ruled out, this specific issue is tied to the processing of a leap second event passed from the NTP subsystem to the Linux kernel.

So it cannot happen if
  1. you are not using some kind of NTP client
  2. there are no leap seconds (real or spurious) propagated through NTP
Also, the specific occurrence of April 1 cannot repeat now, because that ntpd bug (there was an ntpd bug behind the spurious leap-second propagation) required a real leap second to happen within 3 months.
 
scampbell
Trainer
Posts: 457
Joined: Thu Jun 22, 2006 5:20 am
Location: Wellington, NZ

Re: Leap second bug present on TILE devices?

Thu Jul 02, 2015 2:38 am

[quote="madman2233"]Little too late, don't you think?[/quote]
[quote="normis"]For this one, yes, but next leap second will be added in around 2 years.
Could you please tell me if you had NTP package on all the servers, or you used SNTP?[/quote]
I can confirm CCRs with SNTP were OK, and CCRs with NTP crashed and became unresponsive.
MTCNA, MTCWE, MTCRE, MTCTCE, MTCSE, MTCINE, Trainer
___________________
Mikrotik Distributor - New Zealand
http://www.campbell.co.nz
 
IPANetEngineer
Trainer
Posts: 1053
Joined: Fri Aug 10, 2012 6:46 am
Location: Jackson, MS, USA

Re: Leap second bug present on TILE devices?

Thu Jul 02, 2015 3:42 am

[quote="IPANetEngineer"]It would have been helpful to have a patch from MikroTik, but for most of the customer networks we manage, we began leap second planning a while ago and removed any equipment from an NTP server that was suspect until the leap second passed and then re-enabled it. That proved to be a very simple, yet effective mitigation technique to script even on some of the larger networks we work on (50,000+ network devices)[/quote]
[quote="paoloaga"]This won't protect you from unexpected (wrong) leap seconds. A few public NTP servers here in Italy were affected during March and applied the leap second on 1st of April. A deeper investigation traced the cause to a bug in a hardware clock driver...

This also won't protect you from a malicious hacker who could break into a public NTP server and crash the whole network.
[/quote]
Certainly getting the code patched is the ideal, but planning for a known network issue that will happen at a specific date and time and defending against daily attacks are two different animals.
Global - MikroTik Support & Consulting - English | Francais | Español | Portuguese +1 855-645-7684
https://iparchitechs.com/services/mikro ... l-support/ mikrotiksupport@iparchitechs.com
 
darkorigins
just joined
Posts: 2
Joined: Thu Jul 02, 2015 9:28 am

Re: Leap second bug present on TILE devices?

Thu Jul 02, 2015 9:34 am

Having also suffered from all our CCRs locking solid (no network, serial or LCD), what I would now like to know is:

What happened to the watchdog? This was enabled on all devices but failed to save all but a couple of them.

Mark
 
wildbill442
Forum Guru
Posts: 1050
Joined: Wed Dec 08, 2004 7:29 am
Location: Sacramento, CA

Re: Leap second bug present on TILE devices?

Thu Jul 02, 2015 7:48 pm

I can confirm that my CCRs running the NTP server package all crashed at 5:00 PM PDT on Jun/30/15 (00:00 GMT on Jul/01/15).

The only fix was a physical reboot of the routers.

CCRs using the NTP client to set the time were not affected.

All CCRs were CCR1036-12G-4S
William Burnett
Network Engineer
 
normis
MikroTik Support
Posts: 24206
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: Leap second bug present on TILE devices?

Fri Jul 03, 2015 8:40 am
