TomjNorthIdaho
Forum Veteran
Topic Author
Posts: 998
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Full BGP tables with two upstream ISPs using CHR - Performance question

Tue Feb 28, 2017 9:29 pm

We will soon be running full BGP tables (IPv4 and IPv6) for a /19 with two upstream ISPs, over a 10-Gig link and a 1-Gig link.

My questions are:
- 1) How reliable is a CHR with full BGP tables from two upstream bandwidth providers?
- 2) Is a CHR reliable enough? How much memory and CPU is recommended for full tables (IPv4 and IPv6)?
- 3) Who is running full BGP tables, and how many of you are?
- 4) How long after power-up does it take to load all IPv4 and IPv6 tables?

Thank you for any answers.

North Idaho Tom Jones
 
shaoranrch
Member Candidate
Posts: 183
Joined: Thu Feb 13, 2014 8:03 pm

Wed Mar 01, 2017 1:12 am

I haven't implemented it in production yet. However, I have one taking feeds from a CCR as a test.

It seems reliable so far and takes around 1 minute to load a full table (over 600k routes). However, I've noticed some rather weird things (it's ROS 6.37.4).

For instance, sometimes after it loads the full table it stays at 30% CPU usage for hours. The profiler shows the CPU usage is mostly the routing process, and there's no traffic at all. Other times this doesn't happen and the CPU goes back to idle after the load.

I've also seen an issue where, if the peer goes down and comes back up after a while, the table takes over 5 minutes to load instead of 1 minute. This is rather annoying and I don't know why it happens.
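
To put the two load times side by side, here is a rough back-of-the-envelope calculation in Python. It only uses the figures quoted above (about 600k routes, about 1 minute on a clean load, over 5 minutes after a flap); the function name is made up for illustration, and the results are averages, not measurements.

```python
# Figures quoted in the post above; the 600k route count is approximate.
FULL_TABLE_ROUTES = 600_000


def routes_per_second(routes: int, seconds: float) -> float:
    """Average install rate for a table of `routes` prefixes loaded in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return routes / seconds


initial_load = routes_per_second(FULL_TABLE_ROUTES, 60)      # about 1 minute on a clean load
after_flap = routes_per_second(FULL_TABLE_ROUTES, 5 * 60)    # over 5 minutes after a flap

print(f"clean load: {initial_load:,.0f} routes/s")
print(f"after flap: {after_flap:,.0f} routes/s (at best)")
print(f"slowdown:   {initial_load / after_flap:.1f}x or more")
```

So the same table goes in at roughly a fifth of the rate (or less) once the peer has flapped.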

It's running on a single Xeon vCPU at 2.7 GHz with 1 GB of RAM.

Rafael Carvallo
Telecommunications Engineer

Need consultation?
Need a hotspot with facebook integration?
Send a PM!

We speak Spanish and serve the Latin American market. Visit our website:
http://www.tuproximosalto.com
 
TomjNorthIdaho
Forum Veteran
Topic Author
Posts: 998
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Wed Mar 01, 2017 1:20 am

Thank you.

If BGP works properly and is stable on a CHR, my next decision is whether to run it under VMware ESXi as a virtual server (8 CPUs with multiple gigabytes of RAM and 10-Gig network cards)
-or-
run CHR directly without a hypervisor (booting straight into CHR), so that all CPUs and all RAM belong to the CHR (possibly a faster machine this way).

North Idaho Tom Jones
 
savage
Forum Guru
Posts: 1213
Joined: Mon Oct 18, 2004 12:07 am
Location: Cape Town, South Africa

Wed Mar 01, 2017 10:11 am

Yup. Not only is the initial load slow, but convergence is very slow too (full tables IPv4/IPv6 - 600k+ and peering IPv4/IPv6 - 150k+)... If MT doesn't improve BGP soon, I'll be looking to replace the routers carrying my large BGP tables with other devices. The time from when the BGP process receives a route until that route is active in the routing table can easily be 2+ minutes as well.

If a peer is down for a long time, the router eventually stops attempting to reconnect to it. You have to disable/enable the peer to make it reconnect... For something as critical as BGP, there are a lot of 'issues' with the MT implementation. I can't say I'm entirely happy running MT on the BGP side at this stage.
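
The disable/enable workaround could in principle be automated by a small watchdog. Below is a minimal Python sketch of just the decision logic. Nothing here comes from RouterOS: the `PeerStatus` fields, the 10-minute threshold and the `should_bounce` name are all hypothetical, and how the state is read from the router and how the peer is toggled are left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    name: str
    established: bool
    # Hypothetical field: seconds since the router last tried to connect.
    seconds_since_last_attempt: float


def should_bounce(peer: PeerStatus, stall_threshold: float = 600.0) -> bool:
    """Return True when a down peer looks like it has stopped retrying."""
    if peer.established:
        return False
    return peer.seconds_since_last_attempt >= stall_threshold


peers = [
    PeerStatus("upstream-a", established=True, seconds_since_last_attempt=0.0),
    PeerStatus("upstream-b", established=False, seconds_since_last_attempt=30.0),
    PeerStatus("upstream-c", established=False, seconds_since_last_attempt=3600.0),
]

# Only the peer that is down AND has gone quiet is a candidate for disable/enable.
to_bounce = [p.name for p in peers if should_bounce(p)]
print(to_bounce)
```

A peer that is down but still retrying is left alone, so the watchdog does not add flapping of its own.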
Regards,
Chris
 
shaoranrch
Member Candidate
Posts: 183
Joined: Thu Feb 13, 2014 8:03 pm

Wed Mar 01, 2017 2:49 pm

I have not seen an initial load with CHR take over 1 minute 20 seconds from start to routes being active. My issue comes after some sort of flapping with an already established peer: when this happens, it takes a LOT of time to reload all the routes that peer was sending.

Also, I don't believe 2 minutes is a lot of time for BGP-related work. I've worked with Cisco ASR 1XXX routers and this seems to be the standard time. Maybe the guys from IParchitechs can comment on this, since they seem to use CHR a lot.

Other than this, have you experienced any other issues?

Btw, not related to this, but is there any documentation about the performance of a CCR working as a route reflector for a combination of IP, VPNv4, VPLS and IPv6, for around, say, 2,000,000 routes (combined across all the families above)?
 
savage
Forum Guru
Posts: 1213
Joined: Mon Oct 18, 2004 12:07 am
Location: Cape Town, South Africa

Wed Mar 01, 2017 2:58 pm

Consider the routing table:
AB a.b.c.d gw 1.2.3.4
B a.b.c.d gw 1.2.3.5

When the active route is removed from the routing table (say peer 1.2.3.4 goes down), it takes *minutes* for the backup route via 1.2.3.5 to become active. When 1.2.3.4 is established again, it takes minutes for it to become the active / preferred route in the routing table again.

Definitely not 'standard' or 'normal'. I've seen (and personally worked on) Cisco routers with 800K routes switching between active/backup paths within seconds (both routes are already in the FIB). Yes, once the peer re-establishes, it could take a while for the routes to make it back into the FIB, but with two (or more) routes ALREADY in the FIB, it should most definitely NOT take minutes for a backup path to become active. I have entries in my routing table with over 10 paths to a peer, and when the active peer goes down, I'm stuck without an active route to the prefix for quite a long time.

The point I'm trying to make is that it's not only BGP that's slow. With large routing tables, even MT's "FIB" becomes slow. The routing table itself is slow to update / change.
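
To illustrate the expectation in this post, here is a minimal sketch of a routing table that keeps every learned path per prefix, so that withdrawing one peer promotes the backup without relearning anything. It is a toy model in Python, not how RouterOS or Cisco implement it. The `Rib` and `Path` names, and the single `preference` number standing in for the whole BGP best-path decision, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Path:
    preference: int                      # lower wins; stand-in for BGP best-path selection
    gateway: str = field(compare=False)  # next hop, ignored when ordering paths


class Rib:
    """Toy routing table that stores all paths per prefix, best first."""

    def __init__(self) -> None:
        self.paths: dict = {}

    def add(self, prefix: str, gateway: str, preference: int) -> None:
        self.paths.setdefault(prefix, []).append(Path(preference, gateway))
        self.paths[prefix].sort()

    def withdraw_peer(self, gateway: str) -> None:
        """Drop every path via `gateway`; remaining paths stay in place."""
        for prefix in list(self.paths):
            remaining = [p for p in self.paths[prefix] if p.gateway != gateway]
            if remaining:
                self.paths[prefix] = remaining
            else:
                del self.paths[prefix]

    def active(self, prefix: str) -> Optional[str]:
        candidates = self.paths.get(prefix)
        return candidates[0].gateway if candidates else None


rib = Rib()
rib.add("a.b.c.d", "1.2.3.4", preference=10)   # active path
rib.add("a.b.c.d", "1.2.3.5", preference=20)   # backup path
print(rib.active("a.b.c.d"))   # 1.2.3.4

rib.withdraw_peer("1.2.3.4")   # peer 1.2.3.4 goes down
print(rib.active("a.b.c.d"))   # 1.2.3.5
```

Because the backup is already stored, the switch is a per-prefix list update and no route has to be received again, which is why a failover measured in minutes looks wrong.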

Personally, I haven't experienced any issues other than what has been mentioned (apart from buggy route filters, which require a lot of TLC when handling them).
Regards,
Chris
 
shaoranrch
Member Candidate
Posts: 183
Joined: Thu Feb 13, 2014 8:03 pm

Wed Mar 01, 2017 3:33 pm

I just ran a test inducing 5 flaps on a CHR. As I commented, it takes eons to load the full table back, but the last time I left the peering disabled. What I'm seeing is the device keeping a number of routes in the FIB as ACTIVE even though there's no BGP session at all. The routes are stuck there (97658 in total) and have been for the past 15 minutes.

Now that I recall, I saw this behavior weeks ago, with the same number of routes getting stuck. It goes like this:

The peer goes down, then up, then down.
CHR starts removing all the routes from the FIB (slowly), going from 600k or so routes to 97658, and stays in this state unless rebooted (admittedly I haven't had the patience to wait more than 20 minutes before rebooting the instance).

It's RouterOS 6.37.4
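
One way to confirm this kind of stall is to sample the active route count at intervals after the session goes down and check whether it flattens out above zero. A minimal Python sketch follows. The function name is hypothetical, the intermediate sample values are made up for illustration, and only the 600k starting point and the 97658 plateau come from the test above.

```python
def stuck_route_count(samples, min_flat_samples=3):
    """Return the plateau value if the count stopped falling above zero, else None.

    `samples` are active-route counts taken at regular intervals after the
    BGP session went down, oldest first.
    """
    if len(samples) < min_flat_samples:
        return None
    tail = samples[-min_flat_samples:]
    if tail[0] > 0 and all(count == tail[0] for count in tail):
        return tail[0]
    return None


# Illustrative samples: only the first and the last three values are from the post.
samples = [600_000, 410_000, 230_000, 97_658, 97_658, 97_658]
print(stuck_route_count(samples))   # 97658
```

A healthy withdrawal ends at zero and returns `None`, so only a plateau of leftover routes is reported.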
 
sri2007
Member Candidate
Posts: 191
Joined: Wed May 20, 2015 10:14 pm
Location: Quito

Sat Feb 10, 2018 3:05 am

Hi guys!! We’ve worked with several CHRs as eBGP routers, and they perform well. We also recommend installing a hypervisor, because you can then add as many CHRs as you require. However, while we wait for RouterOS 7, which may handle BGP in a multicore way, some of the best-performing routers are ESXi servers with an i7 processor at 4.2 GHz per core, which can load a full routing table (600k routes) in a matter of seconds. On another router it can take some minutes to load everything.
MikroTik Support and Consulting - Español / English +593 98 709 3502
https://www.safenet.ec/consultoria.html/ soporte@safenet.ec
 
merlinios
just joined
Posts: 21
Joined: Sat Oct 07, 2006 9:37 pm

Fri Mar 16, 2018 4:20 pm

So the final question is whether someone can use MikroTik for a small ISP with 3 upstream providers sending full BGP feeds, using x86 hardware. Do you recommend this?
 
TomjNorthIdaho
Forum Veteran
Topic Author
Posts: 998
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Fri Mar 16, 2018 8:42 pm

Well, I am fairly new to running BGP on a CHR, but it does appear to work fairly well.
I am currently running BGP with full tables on my 64-bit CHR, which is a virtual machine on VMware ESXi 6.5.0.
My physical server is a SuperMicro with 128 GB of RAM, two physical 10-core Intel Xeon 3 GHz processors (hyper-threading disabled, 20 cores total) and 10-Gig network cards.
I assigned my 64-bit CHR BGP router 6 GB of RAM (more than needed), 8 processors and vmxnet3 10-Gig Ethernet interfaces.
My CHR license level is P-unlimited.

I would not recommend running BGP on a 32-bit x86 ROS router. The 32-bit x86 ROS does not support paravirtual vmxnet3 Ethernet interfaces and (for me) 32-bit x86 has suffered hundreds of lockups under heavy load. Also, the 32-bit x86 ROS version has a limited maximum usable memory, whereas the 64-bit CHR version can use more than 2 GB of RAM. Lots and lots of RAM is critical when running BGP.

FYI - on boot-up, my CHR appears to load the BGP tables (using 10-Gig interfaces) in seconds (not minutes). :)

Although I am still new to BGP on a 64-bit CHR, I give it a thumbs-up.

EDIT: Note - Currently, MikroTik's ROS BGP is single-threaded (i.e. BGP only runs on 1 of the available CPUs). At least that is what I think and have read. Thus I suspect you want a very fast CPU core to run BGP.
- Note: Getting full BGP tables on a MikroTik CCR1016-12S-1S+ with a 1-Gig interface takes up to a minute (tilegx 1.2 GHz 16-core CPU)
- Note: Getting full BGP tables on a MikroTik CHR with 10-Gig vmxnet interfaces takes only a few seconds (Intel Xeon 3 GHz 8-core CPU w/ 25 MB CPU cache)

North Idaho Tom Jones
 
sri2007
Member Candidate
Posts: 191
Joined: Wed May 20, 2015 10:14 pm
Location: Quito

Tue May 01, 2018 4:26 pm

Btw, you can check this link for a more specific analysis too: https://mum.mikrotik.com/presentations/ ... 562405.pdf
 
jmginer
Member Candidate
Posts: 115
Joined: Tue Dec 11, 2012 4:56 am

Mon Sep 24, 2018 4:44 pm

Hello! Thanks for sharing this!

In your tests with Proxmox, you generated less than 80,000 PPS, whereas with ESXi and Hyper-V you exceeded 500,000 PPS.

Does Proxmox have a problem handling PPS?
