For two 10-Gig BGP CHR routers, this is what I do (and it works well).
- One physical VMware ESXi box (Free version of VMware ESXi)
- Two Xeon CPUs - with a minimum of 10 cores per Xeon CPU.
- Lots of Xeon CPU cache helps
- Disable Hyper-threading
-- On the 1st CHR, configure nine vCPUs (CPUs 2 through 10) ((( a dedicated affinity configuration so this CHR runs only on the 1st Xeon CPU; see the .vmx sketch after this list )))
-- On the 2nd CHR, configure nine vCPUs (CPUs 12 through 20) ((( a dedicated affinity configuration so this CHR runs only on the 2nd Xeon CPU )))
((( Note: each CHR will run on a different physical Xeon processor, and each CHR will have access to that CPU's entire built-in cache )))
((( Note: the 2nd CHR, which runs on the 2nd Xeon CPU, might have slightly faster throughput )))
((( Note: on your CHRs, use VMXNET3 network interfaces only )))
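One way to express that pinning is via the VM's scheduling affinity. Here is a rough sketch of the .vmx entries for the 1st CHR (the same settings can be made in the VM's settings in the ESXi UI). The CPU numbers are only an example - check which logical CPUs belong to which socket on your own host before pinning - and the ethernetX lines are what force VMXNET3 on the interfaces:

numvcpus = "9"
cpuid.coresPerSocket = "9"
sched.cpu.affinity = "2,3,4,5,6,7,8,9,10"
ethernet0.virtualDev = "vmxnet3"
ethernet1.virtualDev = "vmxnet3"

For the 2nd CHR, the sched.cpu.affinity list would instead name the logical CPUs on the 2nd socket (12 through 20 in the example above).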
You should have a second cold-spare physical VMware ESXi server ready to use if you have a hardware problem with the first physical box.
Your 2nd physical box could also be running a 3rd CHR to handle any OSPF traffic and/or routing to your customers.
Your 2nd physical box could also be running a 4th CHR to handle any customer bandwidth-limiter configurations you may have for limiting customer upload/download rates.
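As a minimal sketch of what such a limiter CHR might carry (RouterOS simple queues; the customer names, subnets, and rates below are made-up examples):

/queue simple
add name=cust-1001 target=100.64.10.0/24 max-limit=50M/100M comment="cust-1001: 50M up / 100M down"
add name=cust-1002 target=100.64.20.0/24 max-limit=100M/250M comment="cust-1002: 100M up / 250M down"

Keeping these queues on their own CHR keeps the queue-processing load off the BGP and OSPF routers.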
I always suggest a stand-by physical hypervisor server (VMware ESXi) with some CHRs configured, licensed, and ready to go, to minimize any possible downtime in the event of a serious hardware failure. When they are not needed, the spare stand-by physical box and spare stand-by CHRs make a great lab system for testing.
CHR P-Unlimited licenses are cost-effective (about $250 per license). The only real expense is the physical servers (in my case, SuperMicro servers) and some 10-gig switches.
This is how I do my core networks (including BGP systems).
North Idaho Tom Jones
Tom - I really appreciate all of your insight and interaction as it relates to running BGP on CHR nodes. We are a small MSO serving a rather demanding customer base; we currently run between 30 and 36 Gbps at peak hour, and hit our full 40 Gbps during any Call of Duty update. We presently have a single default route to our primary transit provider, though we are moving towards multiple transit providers and will need the capacity to run two full BGP tables.
Given these throughput requirements, can you share your thoughts on how you might approach this? After reading this thread, I was thinking perhaps two physical ESXi or Hyper-V hosts running in parallel, each servicing 20Gbps of traffic.
Would you present the CHRs as individual peers to the upstream transit providers, or apply some type of load balancing? When evaluating other carrier-class routers capable of running dual full-view tables, the exorbitant costs led me to search for alternatives, and that is how I came across your thread.
Wow - You've got some good high-end questions. There is no single correct answer.
However, if it were me... I think I would consider something like this:
- One primary high-end VMware ESXi server (and possibly a second warm-spare server)
-- Two or four Xeon CPUs (lots of cores and lots of CPU cache)
-- 128 GB or 256 GB of RAM
-- One or two high-throughput network cards (100-Gig would be best)
- A high-throughput Layer 2 switch (100-Gig ports would be best)
I would try this in a single VMware ESXi server:
- Two CHR BGP peering routers
* Each BGP peering router has its own 10- to 100-Gig interface - so you are using two physical ports for the WAN uplinks to your two BGP routers
- A third CHR (or pfSense) router which is doing OSPF to the two BGP peering routers
* The two BGP routers would do OSPF to your OSPF router, and no traffic would be going through an external switch * faster I/O on the network interfaces * (a RouterOS sketch of this layout follows this list)
- A fourth CHR (or more) as distribution routers on the same VMware ESXi server (these routers talk to your OSPF router - not the BGP routers)
* The WANs on your distribution routers don't go through an external switch * faster network I/O
** At this point, everything (excluding your distribution router LANs) is running inside one single VMware ESXi server * faster network I/O and throughput because you are not going through external networks/switches
*** You only have four virtual machines on your VMware ESXi server (CHR#1-BGP & CHR#2-BGP & CHR-OSPF-router & CHR-Distribution-router)
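As a minimal RouterOS sketch of that layout (RouterOS v6 syntax; all AS numbers, addresses, and names are placeholders, and v7 structures the BGP/OSPF commands differently) - each BGP CHR peers with one transit provider and speaks OSPF toward the OSPF CHR over an internal port group, and you would typically originate only a default route into OSPF rather than redistributing the full tables:

On CHR#1-BGP:
/routing bgp instance set default as=65000 router-id=10.255.0.1
/routing bgp peer add name=transit-a remote-address=203.0.113.1 remote-as=64601
/routing ospf instance set default router-id=10.255.0.1 distribute-default=always-as-type-1
/routing ospf network add network=10.255.0.0/29 area=backbone

On CHR-OSPF-router:
/routing ospf instance set default router-id=10.255.0.3
/routing ospf network add network=10.255.0.0/29 area=backbone

CHR#2-BGP would mirror CHR#1-BGP with its own router-id and its own transit peer.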
Note - anything talking to your distribution router LANs should go through its own switches *** BGP WANs and distribution LANs are not on the same switch *** keep it simple and keep it fast
Note: In theory, if you are on a fast enough VMware ESXi server, virtual switches should outrun physical switches (because a virtual switch is software instead of hardware, and the only limit on a virtual switch's throughput is how fast the CPU is).
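For the internal-only links (such as the OSPF transit network between the CHRs), one way to build such a virtual switch from the ESXi shell is sketched below (names are placeholders; the same can be done in the ESXi UI). Leaving the vSwitch without an uplink means that traffic never touches a physical switch:

esxcli network vswitch standard add --vswitch-name=vSwitch-Internal
esxcli network vswitch standard set --vswitch-name=vSwitch-Internal --mtu=9000
esxcli network vswitch standard portgroup add --portgroup-name=OSPF-Transit --vswitch-name=vSwitch-Internal

Each CHR then gets a VMXNET3 interface attached to the OSPF-Transit port group.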
my thoughts
North Idaho Tom Jones