xrobau
just joined
Topic Author
Posts: 9
Joined: Wed Jan 03, 2018 4:18 am

CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 4:54 am

Unfortunately, it looks like there's something in the way the PCIe card reboots that upsets HP servers.

When I reboot the CCR, the server INSTANTLY hangs, with errors like "Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000013, Status 0xFE200000'000C110A, Address 0x00000000'80500000, Misc 0x44FC3816'00402086)" and "PCI Bus Error (Slot 2, Bus 0, Device 2, Function 0)"

Has anyone managed to get these cards working with enterprise-ish servers at all? I can understand that a normal PC without all the failsafe/monitoring would probably not notice, or even mind, that a PCI card vanishes, but these bigger servers obviously do!

This SPECIFIC host is an HP DL360 G9 with 2 x E5-2687W v3 CPUs, but I suspect that almost all servers are going to have a similar problem.
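
For anyone trying to read the Machine Check line above: a minimal sketch, assuming the Status field follows the layout of Intel's IA32_MCi_STATUS register (flag bits 63..57, model-specific code in bits 31..16, MCA error code in bits 15..0). The helper names are made up for illustration, and this does not interpret the vendor-specific meaning of the codes.

```python
# Assumed layout: Intel IA32_MCi_STATUS (flag bits 63..57).
FLAG_BITS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # Misc register holds valid data
    58: "ADDRV",  # Address register holds valid data
    57: "PCC",    # processor context may be corrupted
}


def parse_status(text):
    """Convert a string like 0xFE200000'000C110A into an integer."""
    return int(text.replace("'", ""), 16)


def decode_mci_status(status):
    """Split a 64-bit status value into flags and error codes."""
    flags = [name for bit, name in sorted(FLAG_BITS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


decoded = decode_mci_status(parse_status("0xFE200000'000C110A"))
print(decoded["flags"])
print(hex(decoded["model_specific_code"]), hex(decoded["mca_error_code"]))
```

For the status quoted in this post, the decoded flags include UC (uncorrected) and PCC (processor context corrupt), which would be consistent with the host halting on the spot rather than logging the error and carrying on.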
 
mkx
Forum Guru
Posts: 11593
Joined: Thu Mar 03, 2016 10:23 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 12:34 pm

Or perhaps not. If a reboot of this device indeed makes it (momentarily) vanish from the PCIe bus, then this will cause problems for hardware and OSes that cannot handle hot-plug events on PCIe. Hot-plug support is a fairly recent development (less than 10 years old), and not all hardware engineered around that time supports it ... which is true for OS kernels as well. I would expect the latest hardware (e.g. HPE Gen11 servers) and OSes (e.g. Linux kernels 5.x ...) to handle this without issues. Older servers (Gen9 is possibly affected) and older OSes (e.g. Linux kernel 4.x and older, Windows Server 2016 and older) don't expect this to happen and freak out.

Mind that a proper CCR2004-PCIe reboot may be harder on the PCIe host than a chip reset on a more ordinary PCIe card ...
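
One way to check what a given slot advertises, on a Linux host: a rough sketch that parses the `SltCap:` line printed by `lspci -vv` for the port above the card. The sample line below is illustrative, not captured from any server in this thread, and the function name is hypothetical.

```python
import re


def slot_hotplug_flags(lspci_text):
    """Return {capability: bool} from the first SltCap line, or {} if none."""
    for line in lspci_text.splitlines():
        line = line.strip()
        if line.startswith("SltCap:"):
            # lspci marks each capability with a trailing + (present) or - (absent)
            pairs = re.findall(r"(\w+)([+-])", line[len("SltCap:"):])
            return {name: sign == "+" for name, sign in pairs}
    return {}


# Illustrative input only; real text would come from `lspci -vv` run as root.
SAMPLE = "\t\tSltCap:\tAttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-\n"

flags = slot_hotplug_flags(SAMPLE)
print("HotPlug:", flags.get("HotPlug"), "Surprise:", flags.get("Surprise"))
```

A slot showing `HotPlug-` and `Surprise-` is not advertising support for a device disappearing underneath it, although how the firmware reacts is still up to the vendor.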
 
xrobau
just joined
Topic Author
Posts: 9
Joined: Wed Jan 03, 2018 4:18 am

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 2:04 pm

It causes a crash even if the machine is still in POST. It's pretty clear that the iLO-equipped devices are far more picky about how PCIe devices are hot-plugged than a normal dumb machine. I plugged it into a generic Dell PC and rebooting the CCR didn't bother the PC at all.

I don't have a Dell server with a DRAC, so I can't check with that, but two HP servers both crashed with similar errors when the 2004 was rebooted.

There's obviously something that the HPs don't like about them unplugging and replugging themselves - even though I can literally PHYSICALLY unplug a normal NIC while the machine is on.
 
mkx
Forum Guru
Posts: 11593
Joined: Thu Mar 03, 2016 10:23 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 2:34 pm

Another possibility would be a power surge following a full board reset ... it could momentarily overload the PCIe slot's power capability. I wasn't able to find a good reference on that, but off the top of my head the PCIe power capability is somewhere around 30-50W, possibly depending on the PCIe version and the particular mainboard implementation.
 
r00t
Long time Member
Posts: 674
Joined: Tue Nov 28, 2017 2:14 am

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 6:20 pm

Typical max PCIe slot power is 75W, and it's quite common for GPUs or things like AI/crypto accelerators to use all of it... continuously.
The MikroTik card should not be consuming that much power. And any brief spikes should be smoothed out by the capacitors on board.
So I'd say a hardware cause is far less likely than some software issue in iLO or BIOS, especially if it keeps happening only on a range of HP servers and not with other vendors...
 
pe1chl
Forum Guru
Posts: 10221
Joined: Mon Jun 08, 2015 12:09 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 8:02 pm

When initially reading about this card I thought "wow! that is interesting! that could be the ideal router for a 1U server in a colocation datacenter!".
But issues like the one you mention here (and also the reverse: what will happen to the card when the host is rebooted? will it remain running?), plus the lack of drivers for VMware ESXi, have made me put it aside as "unfortunately, not so interesting".
What we need is a card that can function almost independently from the host, with only the data path via the PCI bus. Then we could run a cable from the ethernet port to the iLO/DRAC port and have a 10Gbit or faster link to the ISP, and then be able to set up a VPN to manage the host both via iLO and directly on the network interface.

It seems it is still a dream.
 
r00t
Long time Member
Posts: 674
Joined: Tue Nov 28, 2017 2:14 am

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Dec 10, 2022 9:47 pm

If there is ever a revision 2 of this card, adding optional external power (a jack, PoE, or an internal connector) would help. That way it would be possible to run it at all times, whatever the state of the PC. Or even use it as a "router on a stick", like some users tried (and fried) using a mining riser.

As for drivers, that's still a problem. Too bad it requires modified ones so running any unsupported OS is just out of the question...
 
wpeople
Member
Posts: 380
Joined: Sat May 26, 2007 6:36 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 2:43 pm

Would the card probably work on a PCIe riser that does nothing but power it?
 
woland
Member Candidate
Posts: 258
Joined: Mon Aug 16, 2021 4:49 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 3:31 pm

I wouldn't! Here is how I tried, fried, and also repaired it:
viewtopic.php?t=189441

Besides there are no proper cases available.
 
wpeople
Member
Posts: 380
Joined: Sat May 26, 2007 6:36 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 3:39 pm

If I'm right, that card blew because of the OLD picoPSU...
A proper case is not an issue.
 
woland
Member Candidate
Posts: 258
Joined: Mon Aug 16, 2021 4:49 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 4:12 pm

I'm not sure whether it was my PicoPSU or the riser. I measured the outputs of that PicoPSU after the fuse blew and they were OK. So I tend to think the riser had issues; the capacitor on it exploded, so I can't test it any more.

I would not use a plastic case, but something made of sheet metal for fire protection (it should run 24 hours a day for many years). Of course it's doable, but manufacturing it takes time and effort.
Cooling should be considered as well. I don't have the time for a custom-made one.
So to me it is an issue. Not an unsolvable one; I could even use one of my Mini-ITX cases.
They are, however, bulky compared to an RB5009. Taking care of low-noise cooling + custom case + power would also add to the cost.

Besides, my most important use case requires OpenBSD compatibility, which is not available, and there was no information on whether MT will support driver development.
 
wpeople
Member
Posts: 380
Joined: Sat May 26, 2007 6:36 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 4:15 pm

Well, I thought of using them in a DC/hosting environment, probably in a 2U rack chassis holding 6-9 cards in 2-3 columns.
 
woland
Member Candidate
Posts: 258
Joined: Mon Aug 16, 2021 4:49 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 4:29 pm

It seems to me that the cards are perfect for that case. You don't need a host for the cards, and you could probably get multi-slot PCIe riser boards of higher quality. Besides, I'm sure you could just use any server case, powering the whole thing with a beefy standard server PSU.
The missing drivers are not needed in this case, while you get the best price/performance for a router from MT.
Great idea! The only problem I see is the availability of the cards....
 
pe1chl
Forum Guru
Posts: 10221
Joined: Mon Jun 08, 2015 12:09 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 5:07 pm

I don't see a need to hack a case for this card; in that situation I would just buy a router in a case.
My use-case is in colocation environments where you rent only a single 1U slot and thus there is no space for an external router, only your 1U server.
With this card you would be able to have an external router to protect the management interfaces of your server (ILO/DRAC, RDP, SSH etc) using a VPN, and have routing functionality in general.
The card would be placed in a slot of the server, the ethernet port connected to the ILO/DRAC port, and the SFP used to connect to the ISP network.
 
woland
Member Candidate
Posts: 258
Joined: Mon Aug 16, 2021 4:49 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Dec 12, 2022 6:15 pm

I did see a need to hack a case, because my use case is a remote location ("spoke", meaning my family's flat) equipped with an x86 OpenSense box, which should have been prepared for 10G interfaces.
At the time I bought the MT card, purchasing a CCR2004 card instead of a dual SFP+ Intel card was no big additional cost.
It turned out that the OpenBSD drivers for the CCR2004 are unusable.
My next idea would have been to use my CCR2004 as a standalone router instead of an RB5009, but there were some issues....
 
srTeCHNoiD
just joined
Posts: 7
Joined: Thu Feb 28, 2019 4:04 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Sat Jan 14, 2023 1:09 pm

I have the same problem with this card on an HP DL165 G9. After a firmware update and a RouterOS reboot, the server hangs and then reboots too.


Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000013, Status 0xBE200000'000C110A, Address 0x00000000'80600000, Misc 0xCCFC3816'00402086)

PCI Bus Error (Slot 1, Bus 0, Device 3, Function 0)
 
internetolog
just joined
Posts: 20
Joined: Wed Jan 31, 2007 5:40 pm
Location: Wilmington, DE

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Mar 18, 2024 2:09 am

I just installed two of these cards in two separate servers. When I reboot the MikroTik router, the server running ESXi crashes and hangs at a purple screen. The one running Linux reboots. I believe it is a bug, and maybe the reboot command is what causes the server crash. I was going to use them in all of my servers, but I cannot go on with this problem. I have to shut down the server safely, but then the MikroTik card shuts down as well. Crazy chicken-and-egg problem.
 
pe1chl
Forum Guru
Posts: 10221
Joined: Mon Jun 08, 2015 12:09 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Mar 18, 2024 11:00 am

Yeah, it is clear that this card, which seemed very attractive for co-located servers requiring a router, is not usable in practice.
E.g. it would also have to keep running when the system is in STANDBY state, so you can power off the server via iLO/DRAC and still be connected to send a power-on command later.
So, it all was just a dream.
 
wpeople
Member
Posts: 380
Joined: Sat May 26, 2007 6:36 pm

Re: CCR2004-1G-2XS-PCIe causes INSTANT host crash when it's rebooted

Mon Mar 18, 2024 4:43 pm

Probably the card does some strange PCI init thing that crashes the machine or forces a reboot.
If the host could survive a soft restart (like a watchdog reboot), and only a firmware upgrade needed a cold restart,
that would be something I can live with.

Until then, I can only ask whether someone builds a 2U chassis (like those for crypto mining) where I can put some of these cards, and which does nothing but power the PCIe bus from redundant power supplies?
