Advice on how to grow an ISP network

Hi everyone,

I run a small ISP that is growing, and I’m at a loss on how to grow the network efficiently. As of now the topology looks like this:

[Image: image_2024-05-15_113957206.png]

We don’t have many users, but most of them are companies which require 500M to 5G, and everything is connected by fiber. The reason I’m creating this topic is to ask, if possible, for ideas on how to keep connecting more services.

I thought about connecting a switch to an SFP+ port and reaching the CP with one VLAN each, but then I started wondering about the throughput the 10G port will give me and what limitations come with this. What would be the best logic to overcome this?

I hope I’m not asking too much of the community and I’ll appreciate any ideas.

Kind regards,

“everything is connected by fiber” => So you have actual fiber-pairs coming in for each of your customers ??
Or you have some wholesale-service that you take from a larger ISP that does the last-mile to customers or something ?

We have fiber connecting our core routers and also our customers (which come out of our routers). Some special services are connected with FTTx (OLT), but what I’m concerned about is how to integrate more routers for more connectivity.

There are two different topics here: your BGP PE kit that connects your AS to upstreams and advertises your address space, and then your access network.

I would NOT connect customers directly to my network edge. The industry norm is to have aggregation switches that connect back to the core with high-density ports. Using the hardware in the picture, the 100G ports from the BGP kit would connect to an aggregation switch.

Provision customers with Q-in-Q, strip the outer VLAN on the aggregation switch, and trunk the inner VLAN back to the BGP kit.

You want to allow your customers to pick the outer VLAN ID; it makes no difference to you. The inner VLAN ID is picked by you and must not overlap with any other.

I also don’t see any benefit of OSPF here; it’s just another potential point of failure. Essentially, if you are running full table BGP on those edge Tiks, you want NOTHING other than BGP running on them and your inner VLANs.

I found this guide to be very helpful for growing our network.

https://web.archive.org/web/20230326170759/https://stubarea51.net/2022/05/02/webinar-isp-design-separation-of-network-functions/

The original article is still available from IP ArchiTechs: https://iparchitechs.com/webinar-isp-design-separation-of-network-functions.

This one might also provide some general tips: https://www.daryllswer.com/edge-router-bng-optimisation-guide-for-isps

This is backwards and a bit misleading.

OSPF is fine for an IGP. It’s a little bit easier to work with if you’re not as familiar with all of BGP’s settings and tweaks, and can make redundancy between sites much easier to configure. I overlay BGP on top of OSPF, using OSPF for announcing the transport networks between sites, and BGP for announcing customer subnets. 600 customers, 20+ sites, multiple redundant paths, redundant everything in the core. It works great.

iBGP requires all routers to be fully meshed (i.e. establishing a connection between each of them, which can get unruly pretty quickly), or route reflectors that can be reachable from the whole network, or strategically staggered as you go.
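To put numbers on "unruly pretty quickly", here is a minimal sketch that compares session counts for a full mesh against a route-reflector layout. The function names and the choice of two reflectors are assumptions made for illustration; the reflector model assumes every client peers with every reflector and the reflectors are meshed among themselves.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions when every router peers with every other router."""
    if routers < 0:
        raise ValueError("router count cannot be negative")
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 2) -> int:
    """Sessions when each client peers with every reflector and the
    reflectors are fully meshed among themselves (assumed layout)."""
    if not 0 < reflectors <= routers:
        raise ValueError("need between 1 and `routers` reflectors")
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    for n in (4, 10, 20, 50):
        print(f"{n:>3} routers: full mesh {full_mesh_sessions(n):>4} sessions, "
              f"2 reflectors {route_reflector_sessions(n):>3} sessions")
```

The full mesh grows with the square of the router count, while the reflector layout grows linearly, which is why the mesh stops being practical once a network reaches a few dozen routers.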

Some operators use eBGP internally (using private ASNs for each site). That can work as well, but MikroTik’s BGP implementation doesn’t support ECMP, ruling out the ability to load balance across redundant links. BGP overlaid on an underlying IGP (OSPF, soon IS-IS) allows ECMP to work.

You’ve got your tagging backwards. Customers don’t get to pick their VLANs facing you. That’s a no-no. They only pick the inner tags. And that’s only needed for site-to-site services (a.k.a. ELAN, EVPN, Metro-E, etc.), not for Internet (a.k.a. High Speed IP, Dedicated Internet Access).

First off, if you’re doing fiber and you’re looking at those kinds of speeds, it is standard to put a managed carrier-grade switch or ONT at the customer premise with a number of ports that can be dedicated to the various types of services you offer (i.e. ELAN vs. DIA). Many large US carriers use Ciena for circuit hand-off, but for simple tagging, MikroTik would work.

Also, if tag stacking is needed (again only for private circuits), the industry standard way is to use 802.1ad S-tags on the service provider network (EtherType 0x88a8) for the outer tags, and then the customer tags their packets (if desired) with normal 802.1Q tags (EtherType 0x8100). One example is if a business customer wants to connect two sites together. You’d tag their traffic on both customer-facing ports with the same S-tag (outer tag). Then they could use C-tags on their own VLANs and your network wouldn’t care what they are. If you’re also providing Internet to them, you use a different S-tag at your CPE, typically on another port, as customer gear shouldn’t be generating S-tags at all.
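For anyone who wants to see what the stacked tags look like on the wire, here is a small sketch that packs an Ethernet header with an outer S-tag and an inner C-tag. The MAC addresses, the VLAN IDs (100 and 10), and the helper names are made up for illustration; only the two TPID values come from the post above.

```python
import struct

TPID_S_TAG = 0x88A8      # 802.1ad service tag (outer, set by the provider)
TPID_C_TAG = 0x8100      # 802.1Q customer tag (inner, set by the customer)
ETHERTYPE_IPV4 = 0x0800  # payload type used in this example


def vlan_tag(tpid: int, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Pack one 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    # TCI layout: 3 bits priority, 1 bit drop-eligible, 12 bits VLAN ID
    tci = ((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | vid
    return struct.pack("!HH", tpid, tci)


def qinq_header(dst: bytes, src: bytes, s_vid: int, c_vid: int) -> bytes:
    """Ethernet header carrying an outer S-tag and an inner C-tag."""
    return (dst + src
            + vlan_tag(TPID_S_TAG, s_vid)
            + vlan_tag(TPID_C_TAG, c_vid)
            + struct.pack("!H", ETHERTYPE_IPV4))


# Hypothetical addresses and VLAN IDs, chosen only for the example.
header = qinq_header(bytes.fromhex("ffffffffffff"),
                     bytes.fromhex("020000000001"),
                     s_vid=100, c_vid=10)
print(header.hex())
```

The provider network only needs to look at the first tag (0x88a8, VLAN 100 here); whatever the customer puts in the second tag rides along untouched.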

Looking at your design, here’s what I’d do pretty quickly.

ISP A → 2216 no. 1
ISP B → 2216 no. 2

Customers → 2216 no. 3 (& 4?)

2216 nos. 1, 2, & 3 (& 4) all connected via backbone.


Internet comes into border routers. Borders may aggregate all routes between each other, or may feed into a central core where all routes learned via BGP are combined.

Customers generally meet at edge routers. Edge routers then feed into aggregation routers, which can be regional or at the core. Functions such as CGNAT, PPPoE, MPLS, queuing, etc. are handled in the core. Your core can be a single router handling multiple functions, or a stack of routers each performing separate functions. The manipulated (i.e. Internet-ready) customer traffic is then handed to the BGP routers.

When you’re small, it’s not uncommon to have just a couple of routers that do everything, but doing that begins to complicate the configuration and makes redundancy difficult. If you’re planning to grow, it’s best to start separating responsibilities, as explained in the links the others have referred to already.

2216 nos. 1 & 2 are your borders. The third 2216 is a customer-facing edge. As already suggested, move all customers off 2216 no. 2 to a separate router, possibly another 2216. If you don’t need the CPU horsepower (i.e. no queueing or MPLS/VXLAN tagging), a CRS can route in hardware. The switches that most closely match your 2216’s port capacity are the CRS317, CRS510, and CRS518.

Depending on your backbone utilization, if customers don’t need more than 10Gbps handoff, I’d use one or more CRS317’s to face customers, with 2 or 3 10Gbps links LAG’d back to a 2116 or 2216, and possibly another aggregation switch in between. If you expect a huge amount of growth quickly, then use CRS510 or CRS518 to face the customers and uplink to a 2216 over the 100G ports, or LAG a couple 25G ports and use the 100G ports as cross connects between the 2216’s.
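A rough way to sanity-check how many 10G links to LAG back is to compare what has been sold on a switch against its uplink capacity. The sketch below does that arithmetic. The customer speeds are invented for the example (within the 500M to 5G range mentioned at the top of the thread), the function names are hypothetical, and the target ratio is a business decision rather than a rule. It also treats the LAG as one big pipe, which ignores that a LAG balances per flow, so a single flow cannot exceed one member link.

```python
import math


def oversubscription(customer_mbps, uplink_links, link_mbps=10_000):
    """Ratio of sold customer bandwidth to total uplink capacity."""
    if uplink_links < 1:
        raise ValueError("need at least one uplink")
    return sum(customer_mbps) / (uplink_links * link_mbps)


def links_needed(customer_mbps, target_ratio=1.0, link_mbps=10_000):
    """Smallest number of uplinks that keeps the ratio at or below target."""
    if target_ratio <= 0:
        raise ValueError("target ratio must be positive")
    return max(1, math.ceil(sum(customer_mbps) / (target_ratio * link_mbps)))


# Hypothetical customer hand-offs on one access switch, in Mbps.
customers = [500, 1000, 1000, 2000, 5000, 5000]

for links in (1, 2, 3):
    ratio = oversubscription(customers, links)
    print(f"{links} x 10G uplink: {ratio:.2f}:1 sold-to-capacity")

print("links for 1:1:", links_needed(customers))
print("links for 2:1:", links_needed(customers, target_ratio=2.0))
```

Real traffic rarely peaks on every circuit at once, so measured utilization from the backbone is a better input than sold speeds once you have it.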

(I love designing this stuff. My forum profile has my contact info, should you feel so inclined.)

Really great overview and summary! You’re clearly passionate about designing network architectures. Totally agree with you on OSPF and the challenges of iBGP full mesh.

+1

Hi everyone, thank you for your replies, and I’m sorry it took me a while to update.


Sirbryan, thank you for the insight; it really helped with more ideas on how to do things. chronos31337 and Larsa, thank you for the links to the articles. I will pay them a visit.


I hope I didn’t mess up badly, but what I came up with is as follows:

[Image: image_2024-06-19_180207250.png]

I know that I’m not supposed to connect clients directly to my core, but because we are just starting and didn’t expect to grow this much this fast, we didn’t have much choice. We are buying new equipment and I’m trying to do better, so I will try to fix it.

As shown above, I think my idea for delivering to clients is still wrong, considering that the switch is directly connected to one of my core routers. Am I correct, or is this acceptable?

Thank you again for all your knowledge.

Kind regards,

How large your POPs are determines whether you have just one router for customers to connect to, or a stack of routers, with one being customer-facing (PE or Provider Edge) and another one at the POP being that POP’s core. On small networks, like mine, I have one switch/router facing the customers’ routers. That router then talks to other site routers as part of a ring. Those all come back to two aggregation routers at my core.

On larger networks, you might have a few PE switches or routers that face the customers’ routers, and then those switches or routers combine into the site’s (POP’s) “core”. The POP core could be one big beefy router or a stacked pair. This would mainly be needed to handle larger volumes of ports or of traffic and guarantee higher uptimes. Similar to the small-site design, the POP cores all connect to each other (like a ring layout) and/or back to an aggregation POP or straight to the core (like a star/hub & spoke layout).

Best I can tell, your chart looks fine. As long as you keep the separate layers in mind (Border, Core, Edge, with varying degrees of Aggregation devices sandwiched between those three), you can repeat the pattern and scale effectively.
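One way to keep the layers honest as the network grows is to write the diagram down as data and check it. The sketch below is only an illustration: the device names, the role labels, and the table of which layers may connect to which are assumptions, and a small network may reasonably relax them (for example, by linking borders straight to an edge router).

```python
# Which roles may be directly linked (an assumed rule set, kept symmetric).
ALLOWED = {
    "upstream":    {"border"},
    "border":      {"upstream", "border", "core"},
    "core":        {"border", "core", "aggregation", "edge"},
    "aggregation": {"core", "aggregation", "edge"},
    "edge":        {"core", "aggregation", "edge", "customer"},
    "customer":    {"edge"},
}


def violations(roles, links):
    """Return the links that join two roles the rule set does not allow."""
    bad = []
    for a, b in links:
        if roles[b] not in ALLOWED[roles[a]]:
            bad.append((a, b))
    return bad


# Hypothetical devices; cust2 hangs directly off a border router.
roles = {
    "isp-a": "upstream", "br1": "border", "br2": "border",
    "core1": "core", "edge1": "edge",
    "cust1": "customer", "cust2": "customer",
}
links = [
    ("isp-a", "br1"), ("br1", "br2"), ("br1", "core1"), ("br2", "core1"),
    ("core1", "edge1"), ("edge1", "cust1"), ("br2", "cust2"),
]

for a, b in violations(roles, links):
    print(f"layer violation: {a} ({roles[a]}) <-> {b} ({roles[b]})")
```

In this made-up topology the only link flagged is the customer attached to a border router, which is the same situation as the customers that were suggested to be moved off 2216 no. 2 earlier in the thread.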