A CCR1036-8G-2S+, with about 180 days uptime. Mysteriously just starts flapping all ports (Ethernet + SFP), and on reboot the device is just dead? Reboots, shows Booting Router → Loading Kernel, and then just reboots, over and over?
You call this rubbish enterprise ready? This will be the last CCR I purchase, ever. A D-Link ADSL router is more stable than your “flagship” enterprise equipment. I guess at the end of the day, you get what you pay for yes.
Let me rather go and buy that Cisco ASR like I should have done right from the start.
It probably is possible to re-install with netinstall yes… There’s just a few problems with doing that…
The router is some 9500km away from me,
Netinstall is Windows only, there’s no Linux equivalents (such as bootp/tftp, like 99% of all other SANE network equipment vendors support),
Now I have to pay over EUR200 to ship the routers (both ways) to reinstall with Netinstall because I don’t have (and never will have) Windows or X-Windows servers in a datacenter,
It still doesn’t change the fact that the router just point blank, out of the blue, started flapping all ports,
It still doesn’t change the fact that the router just point blank, refused to reboot successfully
Your equipment, is not reliable.
EDIT: From 3 CCR’s purchased, 1 has been RMAed already, 2nd one now died. 2/3 failure rate - rather concerning to say the least.
Any device can die, even expensive Cisco/Juniper kit. We have over 50 CCR’s in our network for the last three years, not one has died. The nice thing is that if one were to die, it will not cost an arm and leg to replace.
Not within 180 days of purchasing it. And not 2 out of 3 devices purchased either (of which one has already been RMA’ed too).
Frankly, whilst I’m sure it does (and can) happen, being a CCNA and CCNP and working with Cisco my entire life basically, I have not once had a Cisco device die on me personally.
Only three devices doesn’t sound like statistically significant…
Have you bought them more or less at the same time? Are all of them of the same model? Faulty batch, perhaps?
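To put rough numbers on the sample-size question, here is a minimal sketch. It assumes failures are independent and that every device has the same failure probability p over the period. The values of p below are hypothetical, not measured CCR failure rates.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent devices fail,
    when each one fails with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-device failure probabilities, for illustration only.
for p in (0.02, 0.05, 0.20):
    chance = prob_at_least(2, 3, p)
    print(f"p={p:.2f}: P(at least 2 of 3 fail) = {chance:.4f}")
```

If 2 failures out of 3 come out as very unlikely for any plausible p, the independence assumption is the first thing to doubt, which is where a faulty batch or a problem at the site would fit.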
That’s your opinion and you’re entitled to it.
These forums are FULL of complaints about ethernet ports just starting to flap. Across a LOT of different types of devices too (I just had a RB912 a few days ago, also just randomly flapping ethernet ports but that one thankfully still had its software and rebooted successfully). CCR’s are also riddled with one problem after the next regarding their SFP(+) ports, again, just have a look on this very forum.
The last time I upgraded software on a CCR, I uploaded the packages, rebooted, and the router never came back again. After putting a console on it, there was no software on the router at all. So even a SOFTWARE UPDATE on a CCR is buggy… With Cisco (for example), you upload two images - the new one and the existing old one. If the new one fails to load, the device falls back and loads the old image, simple. Hell, if the packages aren’t uploaded to the MT devices correctly, a SIMPLE MD5 checksum is enough for MT to dump the package file and not attempt to update the router, but nooooo. Then there are more complicated solutions too, where NAND storage could be split into two partitions (one being for recovery software), but again, MT can’t be bothered.
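The checksum-then-fall-back idea described above can be sketched roughly as follows. This is an illustration only, not how RouterOS or IOS actually handle upgrades. The function names, the file names and the source of the expected checksum are all assumptions.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_image(new_image: Path, expected_md5: str, old_image: Path) -> Path:
    """Pick the new image only if it exists and matches the expected checksum;
    otherwise keep the old, known-good image."""
    if new_image.is_file() and md5_of(new_image) == expected_md5.lower():
        return new_image
    return old_image
```

A truncated or corrupted upload then leaves the device on the old image instead of attempting the upgrade.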
MT is frankly more interested in dumping the one feature after the next onto ROS, instead of FIXING problems (and simple things, I’m not even talking about complicated things). Then I’m not even going to START about “netinstall” being your ONLY recourse to recover a router either.
i spent my 20 years of experience building BF networks with Cisco and Juniper. My last 10 years were infected with RouterOS. there are good sides, there are bad sides. the cisco world you look for is long time over. their big boxes run IOS-XR, which is more like the “windows of Internet”, literally takes 20+ minutes to boot, and upgrades and SMUs are kind of nightmares in many cases, you may need to re-boot the box twice, remove conflicting SMUs. even Huawei manages that better. JunOS is far better on the ISSU area, basically their MX960 (and all MX2+ series boxes) were the first one i experienced to do ISSU as expected.
coming back to IOS-XR: the “single binary on the flash is decompressed to ram and booted” times are long over. basically IOS-XR uses the same concept as routeros, there is an actual filesystem on the flash (ok, not on linear NAND but say an SSD) and there’s the filesystem the router will boot from. and updates modify that. and there are actual routers (3-4 year old asr9k boxes with RSP4G) that are not physically capable of hosting 2 different images. at some time it was a viable option to do Z/X-modem and upload stuff even through RomMon, but today’s images are in the hundreds of megabytes and even over 1GB. and those are not enterprise but service provider devices. and they seem to have the weirdest issues - like not being able to do packet fragmentation on PPPoE sessions, or simply discarding too-big packets with the DF bit set w/o sending the ICMP unreachable message. and it takes more than 1 year to get this fixed.
regarding the ASR1k, we have some in our network, not too many, but enough. they are pretty solid boxes. but the prices associated with them are in another league. you can buy a CCR1072 for merely 3k USD, whereas you’ll be charged 10kUSD only for IOS-XE software. and then come the licenses (right to use, and scalability). you may not use all 1G/10G ports available on the box, and so on. i just finished 2 asr1002-FX configurations. just the bare metal with 4x10GE ports and 4xGE ports enabled on it, it’s a 2-slot pizza box, crypto module for up to 10G throughput, up to 16k users PPPoE termination and IPSEC licenses. guess what: the IPSEC box w/o PPPoE goes for 136kUSD and the PPPoE w/o IPSec for 153k USD. even if we get say 50% discount, it will be way more than 10x the price of the CCR, whereas the features are more or less on par. all this without support or access to software updates - that will cost an additional 4-8% of the list price.
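to make the arithmetic explicit, a quick sketch using only the figures quoted above (3k USD for the CCR1072, 136k/153k USD list for the two ASR configurations, a 50% discount, 4-8% of list for support). the variable and function names are just for illustration.

```python
CCR1072_USD = 3_000  # CCR1072 price quoted above

# List prices quoted above for the two ASR configurations.
ASR_LIST_USD = {
    "IPsec without PPPoE": 136_000,
    "PPPoE without IPsec": 153_000,
}


def discounted(list_price: float, discount: float) -> float:
    """Price after applying a fractional discount (0.50 means 50% off)."""
    return list_price * (1 - discount)


def support_range(list_price: float, low: float = 0.04, high: float = 0.08) -> tuple[float, float]:
    """Support cost range, computed as a fraction of the list price."""
    return (list_price * low, list_price * high)


for name, list_price in ASR_LIST_USD.items():
    street = discounted(list_price, 0.50)
    lo, hi = support_range(list_price)
    print(
        f"{name}: {street:,.0f} USD after discount, "
        f"{street / CCR1072_USD:.1f}x the CCR, "
        f"support {lo:,.0f}-{hi:,.0f} USD"
    )
```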
i guess the lessons are the following: distant devices shall have 2 partitions, and console access. and you shall upgrade one at a time.
to my best knowledge, routerOS validates the packages found on the disk before updating the base OS with them.
however i totally agree with you on that windows-ish toolkit: i hate it as well. you can get the device up&asking for the “net install” image, but you don’t have access to that directly. you could (theoretically) extract it from the netinstall binary, but i would call it rather tinkering than actual troubleshooting. probably mikrotik doesn’t want to hand out scripts which reveal how license management is done. i don’t know, and i don’t care. i just don’t like this windows-only stuff.
i’d rather have a “netinstall image” feature set for say any (or a specific) mikrotik device with USB slot (say map2n) and deploy it as last resort to those locations to act as a boot server. cause getting transparent L2 connectivity is not easy from 1000+ kms. and this is required for netinstall.
OTOH, something just came to mind that would probably be faster/easier than shipping the box back and forth:
start netinstall on your PC which is connected to the internet.
forward TFTP port on the outside of your router to that PC. set up CCR to use DHCP as boot protocol.
next to the dead CCR on any device (say a mikrotik with dhcp package) start dhcp server and set option 66 to the router’s IP address and option 67 to the boot filename (/image?) and don’t forget to send the gateway so the net-booted CCR will be able to access the internet. (if it is connected with its net-bootable-if to the network).
it should be able to load the required stuff from your PC all over the internet and show up in netInstall.
or you could ask someone to plug any mikrotik device to the CCR’s boot port and netinstall it over an eoip tunnel
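for reference, this is roughly what the DHCP options from the steps above look like on the wire. a minimal sketch, assuming the standard code/length/value option encoding with option 3 (router), 66 (TFTP server name) and 67 (bootfile name). the addresses are documentation placeholders and the boot filename is made up, since the real one is not known here. it only builds the option bytes, it is not a DHCP server, and whether the CCR bootloader accepts this setup is untested.

```python
import socket

OPT_ROUTER = 3        # default gateway
OPT_TFTP_SERVER = 66  # TFTP server name
OPT_BOOTFILE = 67     # boot file name
OPT_END = 255         # end-of-options marker


def encode_option(code: int, value: bytes) -> bytes:
    """Encode one DHCP option as code (1 byte), length (1 byte), value."""
    if not 0 < code < 255:
        raise ValueError("option code must be between 1 and 254")
    if len(value) > 255:
        raise ValueError("option value longer than 255 bytes")
    return bytes([code, len(value)]) + value


def boot_options(gateway_ip: str, tftp_server: str, boot_file: str) -> bytes:
    """Build the options block: gateway, TFTP server, boot file, end marker."""
    return (
        encode_option(OPT_ROUTER, socket.inet_aton(gateway_ip))
        + encode_option(OPT_TFTP_SERVER, tftp_server.encode("ascii"))
        + encode_option(OPT_BOOTFILE, boot_file.encode("ascii"))
        + bytes([OPT_END])
    )


# Placeholder values: documentation addresses and a hypothetical file name.
options = boot_options("192.0.2.1", "203.0.113.10", "boot-image")
print(options.hex(" "))
```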
It happens all the time with Cisco if you work on very large networks (10,000+ routers/switches/firewalls/etc). In the last year I’ve had to deal with failures (hardware and software) on Cisco Nexus 2K, 3K, 5K, 7K, Cisco ASR 1002/1006 as well as Cisco 6500VSS, 4500XVSS and 2960/2960X. That’s just the stuff I remember… I’m sure there are more.
It’s also worth noting that we have clients running CCRs in multi-billion dollar enterprises that have had no issues and great uptime/stability.
It’s impossible to account for all of the hardware and software issues you could come across with any given vendor, which is why, as a network engineer and designer, it is better to assume that all hardware will fail at some point and design the right HA strategy.
It happens enough in the ASR series, Cisco has a howto guide on diagnosing crashes
For YEARS NOW Mikrotik suffers (and continues to suffer) from random, mysterious port flaps.
The CCRs have NUMEROUS issues, which aren’t getting fixed and/or better.
I couldn’t care less about what works and what doesn’t, or this vendor vs that vendor. MIKROTIK (which is the hardware we are discussing here) is NOT enterprise ready. I simply don’t buy it that a “flagship product” just randomly “loses” its software image, or that a “flagship product” just randomly starts flapping ports. If Mikrotik expects me to pay thousands of $ for a CCR with a shelf life of a few months, then sorry, that’s not buying hardware - that’s wasting money.
The majority of the Internet is built on Cisco, Lucent, Brocade, etc. for a reason… Finished.
Couldn’t care less about the rest - I won’t ever buy a CCR again, and I will advise everyone that asks me, strongly against buying a CCR too.
Ok. Bad personal experience. It can happen. You have lost your reason to keep your networks running on mikrotik. OK, so be it. Never mind. Just keep using whatever make of routers you still believe in. But this can always happen in the future, even with Cisco. My habit is to keep a backup device on site when the site is critical and out of my direct reach. In such cases I can easily instruct someone to swap the devices quickly, and there is no significant outage. Just a hint for you…
It is all about the result of (budget * flexibility * uptime * stability).
If you can afford more expensive solutions then go for it. But even then you cannot be sure that devices are stable enough to not fail.
This starts to sound more and more like the usual “give me all the Cisco/Juniper ‘benefits’, but for 1/10 or 1/20 of the price” topic that we have here time after time. i will not go into a discussion of whether it is wrong or right, it is just the way it is. and from where i sit, every next year is an overall improvement.
On critical sites i usually deploy 3 devices (we do buy them from different distributors): the 1st is online now; the 2nd is a backup device that is online and just needs some scripts triggered to take over from the 1st; and the 3rd is not even connected to power, but has all the configuration loaded and cables attached.
worst case scenario - someone needs to go there and unplug/plug power cords.
YES, it is 3 times more expensive to have 3 devices, but it is still several times cheaper than having one device from the competition. So the CCR can be “enterprise ready”; it is simply your solution that isn’t.
Thanks, again, three CCRs, 1 already failed, another failed now (after unplug/plug the power cable as you so nicely put it - it doesn’t boot up again!! PS: Not the first time either - multiple devices)
The point is, these devices being ‘flagship’ devices, aren’t stable at all. If I can’t even TRUST a Mikrotik (that does MULTIPLE gbps in traffic) to come up again after a simple reboot, HOW ON EARTH can you trust it for mission critical services?
I mean really now. Seriously. A SIMPLE REBOOT!!! I guess the standard of ‘reliability’ is just lower when it comes to Mikrotik then, because if THIS is acceptable to Mikrotik and to a community that uses Mikrotik, then sorry. It’s COMPLETELY unacceptable that a device won’t even reboot (and that there’s NO way to recover it either - except for a crappy netinstall that runs on Windows only).
Failure of multiple CCRs is not normal. You should contact your distributor to make sure these devices are not from a bad batch and get a replacement or RMA.
Things to think about - You have RMA’d a router at this location before? Maybe it’s the location? Do you have stats on it? Do you monitor the power or temp?
Personally, every router I manage “calls home” every minute to report its health. Also, why didn’t you have a second partition?
That said… If you have had issues for years with Mikrotik, why do you keep using them?
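For anyone wondering what the receiving end of a “call home” setup might look like, here is a minimal sketch of the bookkeeping. It assumes each router reports once a minute and is flagged after a few missed reports. The class name, the router names and the 3-minute timeout are made up for illustration, and how the reports actually arrive (HTTP, syslog, SNMP trap) is left out.

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Tracks the last report time per router and lists the ones gone quiet."""

    def __init__(self, timeout: timedelta = timedelta(minutes=3)):
        self.timeout = timeout
        self.last_seen: dict[str, datetime] = {}

    def record(self, router: str, when: datetime) -> None:
        """Store the time of the latest report from a router."""
        self.last_seen[router] = when

    def overdue(self, now: datetime) -> list[str]:
        """Return, sorted by name, the routers whose last report is older than the timeout."""
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


# Hypothetical example: ccr-a last reported 5 minutes ago, ccr-b 1 minute ago.
monitor = HeartbeatMonitor()
start = datetime(2024, 1, 1, 12, 0)
monitor.record("ccr-a", start)
monitor.record("ccr-b", start + timedelta(minutes=4))
print(monitor.overdue(start + timedelta(minutes=5)))
```

Keeping temperature, voltage and port error counters from each report alongside the timestamp would also give some history to look at when ports start flapping.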
It’s not an argument…what I presented are facts. Doesn’t mean I never have an issue with CCRs, but it’s not any more frequent than Cisco or any other vendor.
If you’re relying on the inherent stability of hardware / software on a single platform to keep critical services online, then you need to rethink the design, which includes power and physical layer concerns as well as the network hardware.
Please give us the reasons, or even the one reason, why you bought a CCR if you trust other brands as core devices?
Why have you chosen a device for NNN dollars over devices that are 3x, 4x, 5x more expensive?
Why haven’t you bought a backup device if the location is so far from your office? That does not seem to be such a clever move.
Remember: I am not suggesting that cheap means lower quality.
I agree with this 100%. Every router that is more than 3 hours from me, has a second, identical router in a box already configured enough to get online.