Redundant hardware or spares in enterprise infrastructure

With the cost of energy rising, I am wondering whether it is worth it for a company to have redundant network hardware, or just one spare (racked or not) with its configuration ready. Especially since I looked at the https://mikrotik.com/product/crs504_4xq_in, which seems to imply using one switch for all the hardware.

To be more concrete, I currently have 2 ISPs + 1 5G backup, over which I announce my own IP range using BGP. For now this is only used in a failover manner (I still have to figure out how to share the load using IPv6).

Until now I have used 2 CCR2004s (R1, R2) + 1 Chateau, connected to 2 CRS317s (SW1, SW2). Routing itself is handled by 2 virtual firewalls on the servers connected to each switch. On the CRS317s I have a PoE switch connected to power the WiFi APs:
(diagram: Untitled 5.png)
I am testing MLAG with some success (though I sometimes have an issue when a VM is migrated to another server), but I was thinking of just using VXLANs instead. For the PoE switch I already have a spare racked with the configuration duplicated on it; in case of a failure someone will have to move the cables over to the spare.

All of this is very redundant and allows me to update the configuration when needed (not that often) or to survive a possible failure, but it is also costly in terms of power. Not only the direct power cost of each piece of equipment, but also the indirect cost of cooling the room (more HW = more cooling). So I am now thinking of removing that redundancy and just pre-installing a spare for a possible failure. Since it's only the office, that should be enough. But I am wondering how many of you do it this way, what would be the best way to manage the configuration of the spare, and how to power it on/off since the hardware has no power button. Thoughts?
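For keeping a cold spare's configuration in sync, one common approach on RouterOS is a plain-text export from the live device that you re-apply on the spare after every change (a sketch; file names are placeholders, and it assumes the spare is the same model and RouterOS version):

```
# On the live device: plain-text export, editable and portable between units
/export file=sw1-config
# Or a full binary backup (includes users/passwords, but is tied to the device)
/system backup save name=sw1-backup
# On the spare, after uploading sw1-config.rsc:
/import file-name=sw1-config.rsc
```

Note that interface naming and MAC-dependent settings may still need a manual check after the import.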

Any feedback/experience sharing/hints are welcome :slight_smile:

The reasons for redundancy are of course availability and the ability to perform maintenance without downtime.

If an hour or so to restore functionality is acceptable, and if you can plan downtime for maintenance, then you’re fine with “spares on the shelf”.

“Spares on the shelf” has some advantages:

less power consumption, both by the equipment itself and by the cooling, so lower costs

more battery or fuel generator autonomy if utility power fails

if you suffer a lightning strike, spares on the shelf that are disconnected will be saved from being burned, and will be ready to deploy to replace the burned equipment

of course you will have downtime if some equipment fails, until you get it replaced; a fully redundant active-active setup solves that, but without the benefits of “spares on the shelf”

Concur, the only reason to have live switchover is extremely critical connectivity. In other words, the requirement will dictate and pay for such a setup.
Otherwise, 99% of the redundancy needed is independent multi-WAN ISPs and possibly redundant servers…

I already have dual homing using my own AS, and redundant servers, both locally and at a remote location. That comes with 2 distinct electrical paths in the building with distinct batteries (though they go to the same endpoint).

So for the servers redundancy can be achieved, but since locally there is that bottleneck for the APs (a PoE switch with one cable to each AP), I hesitate to simplify the whole thing and rely only on the remote endpoint for service redundancy. But then having 2 CCR2004s is maybe too much? Maybe I should simplify down to one. What would be the best way to handle these 2 WAN links (+ BGP backup) in one router in that case?
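For the single-router case, a rough RouterOS v7 sketch would be two eBGP sessions for the main ISPs plus a third on the 5G backup with AS-path prepending, so the backup path is only preferred when both ISPs are down. All ASNs, prefixes, and peer addresses below are placeholders, and the filter details will need adjusting to the actual setup:

```
# Announce our own prefix via an address-list (placeholder prefix)
/ip firewall address-list add list=bgp-networks address=192.0.2.0/24
# Two eBGP sessions to the main ISPs (placeholder ASNs and peer addresses)
/routing bgp connection add name=isp1 as=65000 local.role=ebgp \
    remote.address=198.51.100.1 remote.as=64501 output.network=bgp-networks
/routing bgp connection add name=isp2 as=65000 local.role=ebgp \
    remote.address=203.0.113.1 remote.as=64502 output.network=bgp-networks
# 5G backup: prepend our AS so this path loses to both ISP paths
/routing bgp connection add name=backup-5g as=65000 local.role=ebgp \
    remote.address=100.64.0.1 remote.as=64503 output.network=bgp-networks \
    output.filter-chain=prepend-backup
/routing filter rule add chain=prepend-backup rule="set bgp-path-prepend 3; accept"
```

Keep in mind prepending only influences inbound traffic; outbound failover between the two WAN links still needs route preference (e.g. distances or recursive gateway checks on the default routes). The trade-off is that the single router itself becomes the single point of failure the second CCR2004 currently covers.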

In the good old days, when providing process control computers to airport environments, we used a simple switchover technique, triggered manually.
The backup system was on cold stand-by, all external interfaces (DI/DO, serial lines …) were switched from the failed live system to the backup using one button.
And then the backup was powered up, doing standard re-start.
Although these were old 16-bit process control machines, we achieved 99.98% availability of the combined HW+SW for production, 24/365.
The 99.98% included scheduled maintenance.
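To put that figure into perspective, a quick calculation of the downtime budget an availability target leaves per year (99.98% works out to under two hours, which here had to cover scheduled maintenance as well):

```python
# Yearly downtime budget implied by an availability target.
HOURS_PER_YEAR = 365 * 24  # 8760

def allowed_downtime_hours(availability: float) -> float:
    """Return the yearly downtime budget in hours for an availability fraction."""
    return HOURS_PER_YEAR * (1.0 - availability)

print(round(allowed_downtime_hours(0.9998), 2))  # 1.75
```

This is also a useful sanity check for the "spares on the shelf" approach: if a swap takes an hour, a single failure per year already consumes most of a 99.98% budget.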

For a low-cost solution, with less strict availability requirements or more reliable equipment, manual patching of the external connections can be reasonable.
We practically never relied on automatic switchover, because the switchover logic has to be considered another (single) point of failure, and there is the danger of it auto-switching back and forth.