TomjNorthIdaho
Forum Guru
Topic Author
Posts: 1492
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Full BGP tables with two upstream ISPs using CHR - Performance question

Tue Feb 28, 2017 9:29 pm

We are soon going to be doing full BGP (IP-V4 & IP-V6) with a /19 to two outside upstream ISPs over a 10-Gig link and a 1-Gig link.

My questions are:
1) How reliable is a CHR with full BGP tables from two upstream bandwidth providers?
2) Is a CHR reliable enough? How much memory and CPU is recommended for full tables (IP-V4 and IP-V6)?
3) Who is running full BGP tables, and how many of you are?
4) How long after power-up does it take to load all IP-V4 and IP-V6 tables?

Thank you for any answers.

North Idaho Tom Jones
 
shaoranrch
Member Candidate
Posts: 184
Joined: Thu Feb 13, 2014 8:03 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Wed Mar 01, 2017 1:12 am

I haven't implemented it in production yet. However, I've got one taking feeds from one CCR as a dummy test.

It seems to be reliable so far and takes around 1 minute to load a full table (over 600k routes). However, I've noticed some rather weird things (it's ROS 6.37.4).

For instance, sometimes after it loads the full table it stays at 30% CPU usage for hours. The profiler shows the CPU usage is mostly the routing process (and there's no traffic at all). Other times this doesn't happen and the CPU goes back to idle after the load.

I've also experienced an issue where, if the peer goes down and then comes back up after a while, the table takes over 5 minutes to load instead of 1 minute. This is rather annoying and I don't know why it happens.

It's running on a single Xeon vCPU at 2.7 GHz with 1 GB of RAM.

 
TomjNorthIdaho
Forum Guru
Topic Author
Posts: 1492
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Wed Mar 01, 2017 1:20 am

Thank you.

If BGP works properly and is stable on a CHR, then the next thing I am going to consider is whether to run it under VMware ESXi as a virtual server (8 CPUs with multiple gigabytes of RAM and 10-Gig network cards)
-or-
run CHR directly without a hypervisor (boot directly to CHR) so that all CPUs and all RAM are dedicated to the CHR (possibly a faster machine this way).

North Idaho Tom Jones
 
savage
Forum Guru
Posts: 1262
Joined: Mon Oct 18, 2004 12:07 am
Location: Cape Town, South Africa

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Wed Mar 01, 2017 10:11 am

Yup. Not only is the initial load slow, but convergence is very slow too (full tables IPv4/IPv6 - 600k+ and peering IPv4/IPv6 - 150k+)... If MT doesn't improve BGP soon, I'll be looking to replace my routers carrying large BGP tables with other devices. The time from when the BGP process receives a route until that route is active in the routing table can easily be 2+ minutes as well.

If a peer is down for a long time, the router eventually stops attempting to reconnect to it as well. You have to disable/enable the peer to reconnect... For something as critical as BGP, there are a lot of 'issues' with the MT implementation. I can't say I'm entirely happy running MT on the BGP side at this stage.
 
shaoranrch
Member Candidate
Posts: 184
Joined: Thu Feb 13, 2014 8:03 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Wed Mar 01, 2017 2:49 pm

I have not seen an initial load with CHR take over 1 minute 20 seconds from start to routes being active. My issue comes after some sort of flapping with an already established peer. When this happens, it takes a LOT of time to reload all the routes that peer was sending.

Also, I don't believe 2 minutes is a lot of time for BGP-related stuff. I've worked with Cisco ASR 1XXX and this seems to be the standard time. Maybe the guys from IParchitechs can say something about this, since they seem to use CHR a lot.

Other than this, have you experienced any other issues?

By the way, unrelated to this: is there any documentation about the performance of a CCR working as a route reflector for a combination of IP, VPNv4, VPLS and IPv6, for around, say, 2,000,000 routes (combined from all the families above)?
 
savage
Forum Guru
Posts: 1262
Joined: Mon Oct 18, 2004 12:07 am
Location: Cape Town, South Africa

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Wed Mar 01, 2017 2:58 pm

Consider the routing table:
AB a.b.c.d gw 1.2.3.4
B a.b.c.d gw 1.2.3.5

When the active route is removed from the routing table (say peer 1.2.3.4 goes down), it takes *minutes* for the backup route via 1.2.3.5 to become active. When 1.2.3.4 is established again, it will take minutes for it to become the active / preferred route in the routing tables again.
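
To make the active/backup example above concrete, here is a minimal Python sketch of a table that holds two paths for one prefix and promotes the backup when the active peer goes away. The `Path` class, the `peer_down` helper and the `local_pref` values are assumptions made up for illustration. This is a conceptual model only and says nothing about how RouterOS actually implements route selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    gateway: str      # next hop learned from a BGP peer
    local_pref: int   # higher wins in this simplified comparison


def best_path(paths):
    """Return the preferred path, or None when no path is left."""
    return max(paths, key=lambda p: p.local_pref, default=None)


# Hypothetical table mirroring the example: one prefix, two candidate paths.
rib = {
    "a.b.c.d": [
        Path(gateway="1.2.3.4", local_pref=200),  # active (AB)
        Path(gateway="1.2.3.5", local_pref=100),  # backup (B)
    ],
}


def peer_down(rib, gateway):
    """Drop every path via the failed gateway and return the new best paths."""
    active = {}
    for prefix, paths in rib.items():
        rib[prefix] = [p for p in paths if p.gateway != gateway]
        active[prefix] = best_path(rib[prefix])
    return active


before = best_path(rib["a.b.c.d"])
after = peer_down(rib, "1.2.3.4")["a.b.c.d"]
print(before.gateway, "->", after.gateway)
```

In this toy model the backup is already stored next to the active path, so promoting it is a single pass over the affected prefixes. That is the behaviour being compared against the minutes-long switchover described here.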

Definitely not 'standard' or 'normal'. I've seen (and personally worked on) Ciscos with 800K routes switching between active/backup paths within seconds (both routes are already in the FIB). Yes, once the peer re-establishes, it could take a while for the routes to make it back into the FIB, but with two (or more) routes ALREADY in the FIB, it should most definitely NOT take minutes for a backup path to become active. I have entries in my routing table where I have over 10 paths to a peer, and when the active peer goes down, I'm stuck without an active route to the prefix for quite a long time.

The point I'm trying to make is that it's not only BGP that's slow. With large routing tables, even MT's "FIB" becomes slow. The routing table itself is slow to update / change.

Personally, I haven't experienced any other issues beyond what has been mentioned (other than buggy route filters, which require a lot of TLC when handling them).
 
shaoranrch
Member Candidate
Posts: 184
Joined: Thu Feb 13, 2014 8:03 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Wed Mar 01, 2017 3:33 pm

I just did a test inducing a flap 5 times on a CHR. It basically takes eons to load the full table back, as I commented, but the last time I left the peering disabled. What I'm seeing is the device keeping a number of routes in the FIB as ACTIVE even though there's no BGP session at all. The routes are stuck in there (97658 in total) and it has been like this for the past 15 minutes.

Now that I recall, I saw this behavior weeks ago, with the same number of routes getting stuck. It goes like this:

The peer goes down, then up, then down.
The CHR starts removing all the routes from the FIB (slowly), going from 600k or so routes down to 97658. It stays in this state unless rebooted (admittedly I haven't had the patience to wait more than 20 minutes before rebooting the instance).

It's RouterOS 6.37.4
 
sri2007
Member Candidate
Posts: 205
Joined: Wed May 20, 2015 10:14 pm
Location: Lake Grove, NY

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Sat Feb 10, 2018 3:05 am

Hi guys! We've worked with several CHRs as eBGP routers, and they perform well. We also recommend installing a hypervisor, because you can then add as many CHRs as you require. However, while we wait for RouterOS 7, which may handle BGP in a multicore way, some really good routers are ESXi servers with an i7 processor at 4.2 GHz per core, which can load a full routing table (600k routes) in a matter of seconds. On another router it can take some minutes to load everything.
 
merlinios
just joined
Posts: 21
Joined: Sat Oct 07, 2006 9:37 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Fri Mar 16, 2018 4:20 pm

So the final question is whether someone can use MikroTik for a small ISP with 3 upstream providers, taking full BGP feeds from them, on x86 hardware. Do you recommend this?
 
TomjNorthIdaho
Forum Guru
Topic Author
Posts: 1492
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Fri Mar 16, 2018 8:42 pm

Well, I am fairly new to running BGP on a CHR, but it does appear to work fairly well.
I am currently running BGP with full tables on my 64-bit CHR system, which is a virtual machine on VMware ESXi 6.5.0.
My physical server is a SuperMicro with 128 GB of RAM, two physical Intel Xeon 3 GHz 10-core processors (hyper-threading disabled, 20 cores total) and 10-Gig network cards.
I assigned my 64-bit CHR BGP router 6 GB of RAM (more than needed), 8 processors and vmxnet3 10-Gig Ethernet interfaces.
My CHR license level is P-unlimited.

I would not recommend running BGP on a 32-bit x86 ROS router. The 32-bit x86 ROS does not support paravirtual vmxnet3 Ethernet interfaces and (for me) 32-bit x86 has been subject to hundreds of lockups under heavy load. Also, the 32-bit x86 ROS version has a limited maximum usable memory, whereas the 64-bit CHR version can use more than 2 GB of RAM. Lots and lots of RAM is critical when running BGP.

FYI - on boot-up, my CHR appears to load the BGP tables (using 10-Gig interfaces) in seconds (not minutes). :)

Although I am still new to BGP on a 64-bit CHR system, I give it a thumbs-up.

EDIT: Note - Currently, MikroTik's ROS BGP is single-threaded (i.e. BGP only runs on one of the available CPU cores), at least that is what I think and have read. Thus I suspect you want a very fast CPU core to run BGP.
- Note: Getting full BGP tables on a MikroTik CCR1016-12S-1S+ with a 1-Gig interface takes up to a minute (tilegx 1.2 GHz 16-core CPU)
- Note: Getting full BGP tables on a MikroTik CHR with 10-Gig vmxnet interfaces takes only a few seconds (Intel Xeon 3 GHz 8-core CPU w/ 25 MB CPU cache)

North Idaho Tom Jones
 
sri2007
Member Candidate
Posts: 205
Joined: Wed May 20, 2015 10:14 pm
Location: Lake Grove, NY

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Tue May 01, 2018 4:26 pm

btw, you can check this link for a more specific analysis too: https://mum.mikrotik.com/presentations/ ... 562405.pdf
 
jmginer
Member Candidate
Posts: 153
Joined: Tue Dec 11, 2012 4:56 am

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Mon Sep 24, 2018 4:44 pm

Hello! Thanks for sharing this!

In your tests with Proxmox you generated fewer than 80,000 PPS, whereas with ESXi and Hyper-V you exceeded 500,000 PPS.

Does Proxmox have a problem handling PPS?
 
seriousblack
newbie
Posts: 36
Joined: Tue Apr 03, 2018 4:02 am

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Mon Dec 14, 2020 4:44 pm

Hello..

Using a 1036 with two peers with full tables and some local peers.

Everything seems to be okay. No delays / -ve's
 
ste
Forum Guru
Posts: 1924
Joined: Sun Feb 13, 2005 11:21 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Mon Dec 14, 2020 6:31 pm

Interesting stuff: https://blog.kroy.io/2019/08/23/battle- ... l-routers/
Maybe the older CHR kernel slows things down a bit.

Throw as much CPU power at VMware/CHR as possible ...
 
TomjNorthIdaho
Forum Guru
Topic Author
Posts: 1492
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Mon Dec 14, 2020 9:11 pm

I don't see these test results as truly accurate.
I can think of dozens of issues that may not have been considered and that could skew the results, such as:
- On the physical hypervisor test box, which physical CPUs/cores were running more than one virtual machine (job) at the same time?
- Was hyper-threading enabled or disabled?
 
j2sw
Member Candidate
Posts: 131
Joined: Mon Sep 04, 2006 5:42 am
Location: Indiana

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Sat Dec 19, 2020 1:29 pm

We have multiple instances where we have CHRs running with 2+ full internet v4/v6 feeds. Here are my observations.

1. Make sure your convergence time issues are not related to the ability to pass traffic. I have seen many instances where the underlying hardware was not enough to keep up with the traffic. The forwarding of traffic was slowing down the router more than BGP.

2. Convergence time can be slow, even on a CHR. We always pull in a default route plus full routes for this purpose. This way traffic can be shoved out a default route while convergence is happening. Traffic going somewhere is better than traffic going nowhere.

3. Anytime I make BGP changes I typically wait 10-15 minutes before I start getting worried. Announcement changes usually happen within the allotted BGP 3 minutes. It usually takes the route table quite a bit longer to converge. It also depends on what you do. If you are adding a new peer, convergence time is usually pretty quick. If you drop a peer and don't wait for the tables to flush, it can take a while because the CPU is maxing out withdrawing the routes while trying to pull in new routes.

It can be done, you just have to account for the limitations until v7 is ready for prime time.
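
The "default route plus full routes" idea in point 2 relies on longest-prefix matching: a more specific route wins when it is installed, and the default catches everything else while the table is still filling. Below is a minimal Python sketch of that lookup. The `lookup` function and the addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are assumptions for illustration, not anything taken from a real router configuration.

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: return the gateway of the most specific covering route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in table.items() if addr in net]
    if not matches:
        return None  # no route at all: traffic goes nowhere
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical fully converged table: default route plus a specific route.
converged = {
    ipaddress.ip_network("0.0.0.0/0"): "192.0.2.1",        # default from upstream A
    ipaddress.ip_network("198.51.100.0/24"): "192.0.2.5",  # specific from upstream B
}
# During convergence the specific route may not be installed yet.
converging = {
    ipaddress.ip_network("0.0.0.0/0"): "192.0.2.1",
}

print(lookup(converged, "198.51.100.10"))   # uses the specific route
print(lookup(converging, "198.51.100.10"))  # falls back to the default
```

With an empty table the same lookup returns `None`, which is the "traffic going nowhere" case the default route is there to avoid.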
Last edited by j2sw on Sat Dec 19, 2020 1:46 pm, edited 1 time in total.
 
ste
Forum Guru
Posts: 1924
Joined: Sun Feb 13, 2005 11:21 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Sat Dec 19, 2020 1:43 pm

Yes. But it is very annoying/worrying waiting those 15 minutes. Last winter, for example, we had one occasion where too much snow dropped the 80 GHz link between 2 BGP CCRs (this link has a hell of a lot of spare signal and a backup link). You sit watching BGP sessions while alarms keep coming in. The BGP sessions did not settle until I rebooted one side. With these wait times in between, a lot of time passes by.
 
j2sw
Member Candidate
Posts: 131
Joined: Mon Sep 04, 2006 5:42 am
Location: Indiana

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Sat Dec 19, 2020 1:53 pm

I feel your pain. I look at it as a business decision. When I talk to clients about such issues, the conversation usually goes one of two ways.

Way 1:
Me: Your options for upping BGP performance are Juniper, Cisco, Whitebox or Arista.
Them: Cool. What's that going to take?
Me: A used Cisco that will route multiple 10 gigs of traffic is going to be around $6,000 on the used market. Add on SmartNet for another $1,000 a year.
Them: What about the others?
Me: They are around the same price for a decent router.
Them: I can live with my little $1,000 router and still afford to have a spare on the shelf. I will deal with it.

Way 2:
Them: Great. Let me write a check for the router and a spare. About $15k you say?
Me: Yup. Do you know Cisco?
Them: No. What's that gonna cost?
Me: We charge $150 an hour. It will take 5-10 hours to get this implemented. Any changes after that we can discuss.

I just ordered a Cisco because I can't find hardware that is not a long Dell server that can pass 40+ gigs of traffic. That cost me $15,000 USD. The router was $6,000 and the 40 gig card was $7,000.
 
ernieball17
just joined
Posts: 16
Joined: Thu Jan 28, 2021 9:55 pm

Re: Full BGP tables with two upstream ISPs using CHR - Performance question

Thu Nov 18, 2021 2:25 am

Hi guys!
I'm testing a CHR hosted on Proxmox with a 1 Gb Ethernet NIC, 2 GB of RAM and 12 cores @ 2.9 GHz. I need to simulate a full routing table flood, so I am using a Linux VM that generates routes and sends them to the CHR. The weird thing is the time needed to take in the routes: almost 1 hour for 200k and a little more than 2 hours for 500k. Does anyone have any idea what could be wrong? I have seen tests where flooding that number of routes takes 1 or 2 minutes.
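
To put those timings next to the ones reported earlier in the thread, here is a small Python calculation of the average route install rate. The durations are rounded from the posts ("almost 1 hour", "2 and a little bit more", "around 1 minute"), so the results are rough approximations, and the `routes_per_second` helper is just an illustrative name.

```python
def routes_per_second(routes, seconds):
    """Average install rate over the whole load."""
    return routes / seconds


# Figures quoted in this thread; durations are rounded approximations.
slow_200k = routes_per_second(200_000, 60 * 60)      # "almost 1 hour for 200k"
slow_500k = routes_per_second(500_000, 2 * 60 * 60)  # a little over 2 hours for 500k
fast_600k = routes_per_second(600_000, 60)           # "around 1 minute" for 600k+ routes

print(round(slow_200k), round(slow_500k), round(fast_600k))
print(round(fast_600k / slow_200k))  # how many times faster the earlier report is
```

That works out to roughly 56 and 69 routes per second in this test against about 10,000 routes per second in the earlier report, a gap of around 180 times, which suggests the bottleneck is something other than normal BGP load time.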
