paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

all CCR crashed

Wed Apr 01, 2015 4:53 am

I am writing from my mobile phone because I still haven't revived my whole network. Suddenly, soon after 00:00 UTC (maybe between 00:00 and 00:15), all the CCR routers in my network crashed hard at the same time. The only way to get them working again is to physically disconnect the power.

It happened on ~60 routers at the same time, various models; the only thing they have in common is that they are all CCRs.

I read quickly online that an unexpected leap second caused trouble for various network equipment. I don't know yet whether this is related.

Is there any info/advice to avoid it happening again? Our network is big and there are routers installed far apart.
 
BartoszP
Forum Guru
Posts: 2879
Joined: Mon Jun 16, 2014 1:13 pm
Location: Poland

Re: all CCR crashed

Wed Apr 01, 2015 9:04 am

Either this is an April 1st joke or, IMHO, there was an attack on your equipment.
I cannot imagine different CCR models, from different production batches, with different uptimes, with different power sources, with ....... all crashing at the same time. There must be an external reason.
 
macgaiver
Forum Guru
Posts: 1764
Joined: Wed May 18, 2005 5:57 pm
Location: Sol III, Sol system, Sector 001, Alpha Quadrant

Re: all CCR crashed

Wed Apr 01, 2015 9:44 am

Brace yourself
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Apr 01, 2015 3:40 pm

It's not an April Fools' joke. It just happened on the wrong day of the year. Unfortunately I spent the whole night going back and forth between all my nodes to restart the routers.

I would rule out an attack, because it would have had to start from the end nodes and work backwards in order to reach all the routers, and whoever did it would have needed to know the exact topology of the network. This happened to exactly ALL the CCRs at the same time.

If it was a DoS attack, anyway, it means that every CCR router is in danger and can be crashed at any time.
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Apr 01, 2015 3:42 pm

Or it could be something caused by a bug in the routing protocols (OSPF or BGP). It's the only thing all the routers have in common.
 
BartoszP
Forum Guru
Posts: 2879
Joined: Mon Jun 16, 2014 1:13 pm
Location: Poland

Re: all CCR crashed

Wed Apr 01, 2015 4:27 pm

Your CCRs could have been hacked earlier and poisoned today.
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Apr 01, 2015 4:40 pm

Your CCRs could have been hacked earlier and poisoned today.
It could be anything. If this happened only to me and nobody else, then it's either an attack directed at my network or something triggered by my setup.

All the other routers, which include RB1100 (various models), RB1200, RB2011 (various models), RB750, RB951, CRS (various models), Netmetal, Netbox, SXT, QRT, were not affected. I suspect this trouble is related to the TILE architecture.
 
lele
just joined
Posts: 18
Joined: Thu Apr 02, 2015 1:20 am

Re: all CCR crashed

Thu Apr 02, 2015 1:33 am

Since it has been mentioned, there *is* something weird concerning leap seconds that *could* be related.

We had a number of Linux hosts incorrectly adding a leap second last night:
2015-04-01T01:59:59.003687+02:00 fe-a-01 kernel: [9475817.256006] Clock: inserting leap second 23:59:60 UTC
We are still investigating the cause. I also heard rumors of unrelated (i.e. not MT) network devices having issues at midnight UTC because of spurious NTP leap second insertions, but I can't comment much on those; I will ask for more details.
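
For anyone wanting to check their own Linux hosts for the same message, here is a minimal sketch that scans syslog lines for it. The pattern is modelled only on the single log line quoted above, so treat the regex and the function name `find_leap_insertions` as assumptions; other kernels or syslog formats may word the message differently.

```python
import re
from datetime import datetime

# Pattern modelled on the one kernel log line quoted above (assumption:
# ISO 8601 timestamp, then hostname, then "kernel:" and the message).
LEAP_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel:.*"
    r"inserting leap second\s+(?P<leap>\S+)\s+UTC"
)


def find_leap_insertions(lines):
    """Return (timestamp, host, leap label) for each matching log line."""
    hits = []
    for line in lines:
        match = LEAP_RE.search(line)
        if match:
            # fromisoformat keeps the +02:00 offset, so the result is tz-aware.
            ts = datetime.fromisoformat(match.group("ts"))
            hits.append((ts, match.group("host"), match.group("leap")))
    return hits


if __name__ == "__main__":
    sample = (
        "2015-04-01T01:59:59.003687+02:00 fe-a-01 kernel: "
        "[9475817.256006] Clock: inserting leap second 23:59:60 UTC"
    )
    for ts, host, leap in find_leap_insertions([sample]):
        print(host, ts.isoformat(), leap)
```

Feeding it the lines of a syslog file (for example from `open(path)`) lists which hosts logged an insertion and when, which makes it easier to see whether they all share an upstream NTP source.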
 
doneware
Trainer
Posts: 647
Joined: Mon Oct 08, 2012 8:39 pm
Location: Hungary

Re: all CCR crashed

Thu Apr 02, 2015 11:25 am

Not sure whether this applies to every device running Linux, but there are vendors out there that are quite concerned about the "leap seconds bug":
https://access.redhat.com/articles/15145
 
lele
just joined
Posts: 18
Joined: Thu Apr 02, 2015 1:20 am

Re: all CCR crashed

Thu Apr 02, 2015 12:13 pm

There is at least one report of spurious leap seconds observed in Italy on March 31 at 23:59:60, with a possible explanation:

http://lists.ntp.org/pipermail/pool/201 ... 07338.html

Given what happened, I would try to simulate the addition of a leap second on the CCRs well before June 30.

cheers,
L.
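
As background for such a simulation: in NTP (RFC 5905), a pending leap second is announced through the 2-bit Leap Indicator (LI) field in the first byte of every packet, next to the 3-bit version and 3-bit mode fields. The sketch below only packs and unpacks that byte; the helper names are hypothetical, and it is neither a working NTP server nor a reproduction of the CCR crash.

```python
# Meanings of the 2-bit Leap Indicator field, per RFC 5905.
LEAP_NAMES = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def make_first_byte(li, version=4, mode=4):
    """Pack LI (2 bits), version (3 bits) and mode (3 bits) into one byte.

    Mode 4 is a server reply, mode 3 a client request.
    """
    if not (0 <= li <= 3 and 0 <= version <= 7 and 0 <= mode <= 7):
        raise ValueError("field out of range")
    return bytes([(li << 6) | (version << 3) | mode])


def decode_first_byte(packet):
    """Return (li, version, mode) from the first byte of an NTP packet."""
    first = packet[0]
    return (first >> 6) & 0x3, (first >> 3) & 0x7, first & 0x7


if __name__ == "__main__":
    header = make_first_byte(li=1)  # server reply announcing a leap second
    li, version, mode = decode_first_byte(header)
    print(hex(header[0]), LEAP_NAMES[li], version, mode)
```

A test server that answers with LI=1 during the last day of a month is what makes clients schedule 23:59:60, so this is the field a lab setup would have to control.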
 
AlexS
Member Candidate
Posts: 272
Joined: Thu Oct 10, 2013 7:21 am

Re: all CCR crashed

Wed May 06, 2015 7:45 am

Any news on this?
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed May 06, 2015 1:34 pm

Any news on this?
No news. I wrote a script that connects to every CCR router and removes the NTP servers from its configuration during the last hour of the last day of the month, then adds them back one hour into the first day of the next month.
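
The timing part of such a workaround can be sketched as below. This is not the script mentioned above (which is not shown in this thread); it covers only the decision "are we within one hour of a month boundary in UTC?", and the function name `in_month_boundary_window` is hypothetical. Connecting to the routers and changing their NTP configuration is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def in_month_boundary_window(now_utc, margin=timedelta(hours=1)):
    """True if now_utc is within `margin` of 00:00 UTC on the 1st of a month.

    Covers the last hour of a month and the first hour of the next one,
    i.e. the period during which NTP would stay disabled.
    """
    # Boundary at the start of the current month.
    start_this = now_utc.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Boundary at the start of the next month: jump past any month end,
    # then snap back to day 1.
    start_next = (start_this + timedelta(days=32)).replace(day=1)
    return (now_utc - start_this) < margin or (start_next - now_utc) <= margin


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    if in_month_boundary_window(now):
        print("inside the window: keep NTP disabled")
    else:
        print("outside the window: NTP can be enabled")
```

Run from cron every few minutes, a check like this could drive whatever mechanism actually toggles NTP on the routers.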

I don't have enough resources to mock an NTP server and introduce a fake leap second to reproduce the bug.

There is an open ticket [Ticket#2015040166000161] where I provided all the details, supouts, and everything I could document about this.
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Jul 01, 2015 2:48 am

Tonight a leap second will be introduced by NTP servers all over the world. I disabled NTP on all CCRs except a few that I have close at hand. Let's see how it goes this time.
 
jmorby
just joined
Posts: 13
Joined: Sat Jan 04, 2014 6:06 pm

Re: all CCR crashed

Wed Jul 01, 2015 3:21 am

Tonight a leap second will be introduced by NTP servers all over the world. I disabled NTP on all CCRs except a few that I have close at hand. Let's see how it goes this time.
Well at about 00:00 UTC we had pretty much all of our CCRs crash and lock solid .. at the same time

Thankfully we've already removed (and ebay'd) most of the CCRs so these were just a few edge cases and not the whole network

Still a pita :(
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Jul 01, 2015 3:30 am

I confirm that the bug is due to the leap second. All the CCRs where the NTP configuration has not been removed have crashed very hard.

Last time it happened because of a bug in a few Italian NTP servers; now that the leap second has been introduced officially, it may have happened worldwide.

There is a ticket open for this issue, but it hasn't been given much consideration: [Ticket#2015040166000161].

I think it should be investigated and solved because it's critical and dangerous. It happens only on CCR routers and not on MIPSBE / PPC.
 
coylh
Member Candidate
Posts: 159
Joined: Tue Jul 12, 2011 12:11 am

Re: all CCR crashed

Wed Jul 01, 2015 3:37 am

I also had all CCRs reboot themselves by watchdog at 17:00 PST (probably after locking up). Even the one running 6.29.1 crashed. :-x

Really disappointing.
 
dhoulbrooke
Trainer
Posts: 65
Joined: Sun Apr 19, 2015 7:24 am
Location: Whakatāne, New Zealand

Re: all CCR crashed

Wed Jul 01, 2015 3:53 am

Same here :(

3x CCR1009's all hard locked no winbox/serial. Had to go on-site and power cycle.

2x Running 6.29.1 + NTP package (whether that makes any diff). And 1x running latest rc22.
 
mstead
Member Candidate
Posts: 114
Joined: Sat Mar 04, 2006 2:41 am

Re: all CCR crashed

Wed Jul 01, 2015 4:00 am

I can confirm all my border CCR crashed at 01:00BST. The common factor was BGP and NTP server. All other CCR in my network were just using OSPF and NTP client. What a pile of shite - seriously!!!!

All running v6.27 and were CCR1036-8G-1S
 
andressis2k
Member Candidate
Posts: 104
Joined: Mon Apr 18, 2011 12:47 am

Re: all CCR crashed

Wed Jul 01, 2015 4:06 am

CCR1016-12G
RouterOS 6.27
RouterBOOT 3.19
NTP Server running

At 0:00 UTC (2:00 Spain local time), it died

Not reachable by MAC ping, or ping. All interfaces working and blinking. Can't see it on IP > Neighbors from other devices.

After rebooting, all working again.

We have 3 more CCR1016s, but they're 12S-S+. All of them kept working fine (they had no NTP package installed)

Any explanation from Mikrotik?
Last edited by andressis2k on Wed Jul 01, 2015 4:47 am, edited 1 time in total.
 
IntrusDave
Forum Guru
Posts: 1286
Joined: Fri May 09, 2014 4:36 am
Location: Rancho Cucamonga, CA

Re: all CCR crashed

Wed Jul 01, 2015 4:06 am

WTF!?

I had 7 CCR1009's crash at 5:00PST!

What is going on?
 
mstead
Member Candidate
Posts: 114
Joined: Sat Mar 04, 2006 2:41 am

Re: all CCR crashed

Wed Jul 01, 2015 4:14 am

Can people please edit their posts to include ROS version and if BGP, NTP server etc was running?
 
coylh
Member Candidate
Posts: 159
Joined: Tue Jul 12, 2011 12:11 am

Re: all CCR crashed

Wed Jul 01, 2015 4:22 am

I have 28 tile devices running NTP (and no routing protocols), versions are between 6.22 and 6.29.1. All crashed, though 25 of them rebooted themselves via watchdog timer. 3 needed power cycling.
 
tagno25
newbie
Posts: 38
Joined: Wed Feb 25, 2009 11:24 pm
Location: Kansas City, MO

Re: all CCR crashed

Wed Jul 01, 2015 4:39 am

Have three CCRs, and they all crashed. They were each running a different version (6.26, 6.28, and 6.29)

We have a customer with a CCR that didn't lock up, but I don't know if it is polling NTP. It could also be that the others caused it to be unable to update its clock, so it didn't get the time 23:59:60.
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Jul 01, 2015 5:10 am

I just got back from a trip around my network nodes (it took about 2 hours by car. Fortunately I disabled the NTP server on most of the routers in time; I have more than 60 installed and some are very far away)...

Same here :(
3x CCR1009's all hard locked no winbox/serial. Had to go on-site and power cycle.
2x Running 6.29.1 + NTP package (whether that makes any diff). And 1x running latest rc22.
My routers have 6.26, 6.27, 6.28, 6.29, 6.29.1 and they all crashed. All kinds of CCR (1009, 1016, 1032 in various flavors).

They all run the BGP routing protocol; only a few have the full routing table, while most of them have only my internal routes.

They all have their clocks synchronized through NTP.

It happened both to routers with and without the NTP package, at 00:00 UTC (02:00 local time).


The funny thing is that by chance it happened for the first time on the 1st of April, and other forum users thought that it was an April Fools' joke.

MikroTik didn't take the ticket seriously because they said it happened only to me. (As I stated, it was by chance, due to a driver bug on public Italian NTP services, and it wasn't easy to find out!).
 
madman2233
just joined
Posts: 6
Joined: Wed Feb 06, 2013 1:31 am

Re: all CCR crashed

Wed Jul 01, 2015 5:31 am

20+ CCRs all dead.


Time to replace all my mikrotiks.
 
umraf
just joined
Posts: 2
Joined: Fri Aug 23, 2013 4:48 am

Re: all CCR crashed

Wed Jul 01, 2015 5:38 am

I have two CCR with NTP package installed. Both crashed today.

No NTP package - no crash.
 
texmeshtexas
Member Candidate
Posts: 151
Joined: Sat Oct 11, 2008 11:17 pm

Re: all CCR crashed

Wed Jul 01, 2015 7:33 am

All our CCR's locked up right at 7pm CDT. 3 x86 units did not, MT493 units did not.
This is very strange.
 
Abdock
Member Candidate
Posts: 261
Joined: Sun Sep 25, 2005 10:50 pm

Re: all CCR crashed

Wed Jul 01, 2015 8:08 am

Same here, almost all CCRs went down, and the only way to recover was a power cycle. NTP, BGP, OSPF, ver 6.29.1, OSPF 3.

Does setting up the watchdog help?
 
antondollmaier
just joined
Posts: 2
Joined: Fri Mar 07, 2014 4:09 pm

Re: all CCR crashed

Wed Jul 01, 2015 8:36 am

all CCR down here as well, until manual power cycle. 6.29, no BGP, static routing, NTP as client from 192.53.103.104/192.53.103.108.
 
andersonlich
Frequent Visitor
Posts: 55
Joined: Thu Feb 26, 2009 1:05 pm

Re: all CCR crashed

Wed Jul 01, 2015 8:53 am

I have 4 CCR1036s on v6.27, and all 4 of them crashed. The other CCRs are safe, among them units running v6.29 and v6.19.
:lol:
 
macgaiver
Forum Guru
Posts: 1764
Joined: Wed May 18, 2005 5:57 pm
Location: Sol III, Sol system, Sector 001, Alpha Quadrant

Re: all CCR crashed

Wed Jul 01, 2015 8:59 am

OK, first of all, sorry about the previous joke poster, because it looks like I had a similar problem, but only on a few CCRs.

Am I right to say that only CCRs with the NTP package installed and configured were affected?
 
LatinSuD
Member Candidate
Posts: 181
Joined: Wed Jun 29, 2005 1:05 pm
Location: Spain

Re: all CCR crashed

Wed Jul 01, 2015 9:05 am

2 CCRs crashed, I hope the bug gets fixed soon.

More details:
  • CCR1036-12G-4S
  • version 6.27
  • NTP package installed. NTP client configured to a public server
Last edited by LatinSuD on Wed Jul 01, 2015 9:29 am, edited 2 times in total.
 
andersonlich
Frequent Visitor
Posts: 55
Joined: Thu Feb 26, 2009 1:05 pm

Re: all CCR crashed

Wed Jul 01, 2015 9:07 am

Yes, it seems so. I have a CCR on v6.27 with the NTP package installed, and it crashed.
The others on v6.27 without the NTP package installed are fine.
But some of my CCRs on v6.19 with the NTP package installed did not crash.

OK, first of all, sorry about the previous joke poster, because it looks like I had a similar problem, but only on a few CCRs.

Am I right to say that only CCRs with the NTP package installed and configured were affected?
 
macgaiver
Forum Guru
Posts: 1764
Joined: Wed May 18, 2005 5:57 pm
Location: Sol III, Sol system, Sector 001, Alpha Quadrant

Re: all CCR crashed

Wed Jul 01, 2015 9:09 am

Yes, it seems so. I have a CCR on v6.27 with the NTP package installed, and it crashed.
The others on v6.27 without the NTP package installed are fine.
But some of my CCRs on v6.19 with the NTP package installed did not crash.
And were all of them synchronized (configured) to a public server?
 
lele
just joined
Posts: 18
Joined: Thu Apr 02, 2015 1:20 am

Re: all CCR crashed

Wed Jul 01, 2015 9:29 am

Given what happened, I would try to simulate the addition of a leap second on the CCRs well before June 30.
Now, tell me I did not warn you.
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Jul 01, 2015 9:30 am

Am I right to say that only CCRs with the NTP package installed and configured were affected?
Most of my CCRs don't have the NTP package installed, and they all crashed badly.
 
1obro
just joined
Posts: 4
Joined: Fri Oct 10, 2014 7:00 pm

Re: all CCR crashed

Wed Jul 01, 2015 9:31 am

I have one ccr (ccr1036-8g-2s+) running BGP, OSPF, OSPF3 on v6.19 (uptime 200+ days)

Yesterday, I saw this thread while searching for possible issues with the upcoming leap second
and disabled the NTP client in time.

It did not crash.

Maybe the time on the router is not that important;
maybe we could leave it disabled?
Could time drift cause any problems, for example with BGP?
 
normis
MikroTik Support
Posts: 26376
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: all CCR crashed

Wed Jul 01, 2015 9:34 am

I have one ccr (ccr1036-8g-2s+) running BGP, OSPF, OSPF3 on v6.19 (uptime 200+ days)

Yesterday, I saw this thread while searching for possible issues with the upcoming leap second
and disabled the NTP client in time.

It did not crash.

Maybe the time on the router is not that important;
maybe we could leave it disabled?
Could time drift cause any problems, for example with BGP?
You can also use "IP Cloud" automatic time. It will not be as precise, usually within 1-2 seconds.
 
paoloaga
Member Candidate
Topic Author
Posts: 227
Joined: Tue Mar 08, 2011 2:52 am
Location: Lugano - Switzerland

Re: all CCR crashed

Wed Jul 01, 2015 9:37 am

Having the time synchronized is a *must*, because otherwise it's difficult (if not impossible) to compare logs from multiple devices, and in a complex network it helps to debug problems quickly.

Normis, IP Cloud time synchronization is not an option for many reasons, and it's not always applicable.
 
normis
MikroTik Support
Posts: 26376
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: all CCR crashed

Wed Jul 01, 2015 9:44 am

Having the time synchronized is a *must*, because otherwise it's difficult (if not impossible) to compare logs from multiple devices, and in a complex network it helps to debug problems quickly.

Normis, IP Cloud time synchronization is not an option for many reasons, and it's not always applicable.
Of course, but it is better than no NTP at all.
 
normis
MikroTik Support
Posts: 26376
Joined: Fri May 28, 2004 11:04 am
Location: Riga, Latvia

Re: all CCR crashed

Wed Jul 01, 2015 9:52 am

Let's move this discussion to a topic with a more specific title: http://forum.mikrotik.com/viewtopic.php?f=2&t=98138
