I want to try a CCR1072 as a PPPoE server with shaping for ~10k simultaneous connections.
There are roughly 10 different possible data rates that a client might get (e.g. 2Mb/s, 20Mb/s, 40Mb/s, 60Mb/s, 100Mb/s).
After reading the Wiki I’m still confused about what kind of shaping to implement for a large number of clients.
The most obvious way is to rely on the default behavior, where RADIUS provides the desired speed and the CCR creates a dynamic simple queue for each connection.
But the documentation says that PCQ is better suited for a large number of clients. In that case, though, I don’t really see a non-hacky way to assign the right speed to each client.
Could someone share wisdom?
Thanks!
P.S.:
Is there a good way to test a large number of PPPoE connections in a lab?
We are still working on the scripting to allow us to test bandwidth at a scale over 10,000 connections, but virtual machines are a must if you want to test with those numbers and you’ll need a decent amount of RAM and 10 gig cards if you want to stress the 1072.
Here is a quick look at what we have done so far with PPPoE:
I did read your blog post about the MikroTik stress test. Your 80G bandwidth post was probably the reason we decided to buy a 1072 in the first place. So thanks for your efforts, I hope we will enjoy replacing our Cisco ASR1004 with the CCR1072 once all the testing is over.
Btw, why exactly do you need virtual machines to do the testing? Isn’t it just additional load for the CPU?
Oh, it’s quite simple. ASR was quite pricey when we were purchasing it in the first place. And now we’re about to hit the 10G limit, which means another pricey upgrade.
If the CCR1072 experiment works out, I’d be happy to sell our ASR and live happily ever after.
Really? Is that your personal experience?
Because before actually buying a CCR I contacted official MikroTik support and asked if such loads can be handled. They replied that one or two CCR1072s should be able to handle ~15k simultaneous PPPoE connections with decent traffic.
I’d love to hear some official comment.
But anyway we’re planning to test it live some time next week and see how many actual, live sessions it can handle.
Btw, is there anything specific we should do to optimize for such loads? Because right now we’re pretty much planning to just have our RADIUS server provide the MikroTik-specific fields to do the shaping.
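For reference, the MikroTik-specific field that drives the dynamic simple queue is the Mikrotik-Rate-Limit reply attribute, whose basic form is "rx-rate/tx-rate" from the router's point of view (rx is client upload, tx is client download). Below is a minimal sketch of how a RADIUS backend might build that value per plan. The plan names, the `PLAN_MBPS` table, and both helper functions are made up for illustration, and the sketch assumes symmetric speeds unless an upload rate is given.

```python
# Hypothetical plan table: names are invented, speeds (Mb/s) are the
# example rates mentioned earlier in this thread.
PLAN_MBPS = {"basic": 2, "standard": 20, "plus": 40, "fast": 60, "max": 100}


def rate_limit(download_mbps, upload_mbps=None):
    """Build a Mikrotik-Rate-Limit value in the basic "rx-rate/tx-rate" form.

    Rates are from the router's point of view:
    rx = client upload, tx = client download.
    """
    if upload_mbps is None:
        upload_mbps = download_mbps  # assumption: symmetric plan
    if download_mbps <= 0 or upload_mbps <= 0:
        raise ValueError("rates must be positive")
    return f"{upload_mbps}M/{download_mbps}M"


def reply_attributes(plan):
    """Return the reply attributes a RADIUS backend would send for a plan."""
    return {"Mikrotik-Rate-Limit": rate_limit(PLAN_MBPS[plan])}


if __name__ == "__main__":
    for plan in PLAN_MBPS:
        print(plan, reply_attributes(plan))
```

With ~10 plans this stays a simple lookup on the RADIUS side; the router still creates one dynamic simple queue per session.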
We have worked with several Telcos that use ASRs as their BRAS and I have to say for a router that can cost upwards of $100,000 with licensing and modules, I was not very impressed with the amount of PPPoE traffic it could handle vs. a CCR.
We started to see CPU spiking at 15k connections on an ASR1002 and it was supposed to be rated at more than 25k for the configuration we had according to Cisco TAC and the release notes.
Virtual machines allow us to build just about any network topology quickly without having to physically rework the lab. Also, some processes are more efficient with x86 than on a CCR, so we use MikroTik VMs in addition to CentOS to build whatever environment is needed for the test.
Also, we do a lot of work integrating MikroTik with Cisco/Juniper/etc. in data centers and service providers, so it’s helpful to be able to spin up a Cisco or Juniper VM to test the design for a network integration.
Currently we have 4 VMware ESXi 6.x hosts in our lab that can generate up to 80 Gbps of traffic collectively.
See how badly it distributes the load. Watching it live at peak hours, when bandwidth gets a bit higher, you can easily see that some CPU cores reach 100%.
Nevertheless it does its job up to maybe 1500 users.
I hope v7 comes with more multithreading support.
I’m using around 10 IP firewall mangle rules to mark specific traffic and allow a higher speed for it, and of course dynamic simple queues per user for Internet speed limits.
So far, with approximately 200 PPPoE sessions connected and around 150-160 Mbps of traffic, I’ve already started seeing some of the CPU cores reach 70-80%.
I have a few questions:
I’m right now using only 2 x 10Gig ports (1 is for WAN and 2nd is for LAN). Will it be more optimised if I use more 10Gig ports?
I’m doing NAT on the CCR1072 as well. Is this the main CPU eater?
All right, so here’s a quick summary of our testing:
At about 3900 online PPPoE sessions and ~1900 shaping queues we started seeing very high CPU load on all 72 cores (and at some point we saw them ALL busy in the range between 90% and 100%).
We also did speed tests: one under the account with speed limit and the other under the account with no shaping restrictions.
The unrestricted account worked well and could reach ~100Mbit/s. The user that had a speed limit of 60Mbit/s could reach only ~50Mbit/s. So we had to conclude that the service quality had actually degraded.
Tonight we decreased the number of sessions, and our CCR1072 is now serving about 3000 PPPoE sessions with shaping rules for about 1500 users. At these numbers we see proper results in speed tests, so it pretty much looks like this is the actual limit the device is capable of in our configuration.
At the peak moment they were consuming 2.43Gb/s (in) + 1.2Gb/s (out), and 356Kpps (in) + 239Kpps (out).
At that time we saw ~50 cores having load in a range of 30-60% and ~22 cores having load within 60-90%.
As I said, it looks like our current configuration can serve ~3000 sessions without service degradation.
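For a rough sense of scale, the peak figures above can be turned into a per-session average and an average packet size. This is only a back-of-the-envelope sketch: it assumes the in and out counters can simply be added together, and that all ~3000 sessions were online at the peak. The variable names are just for illustration.

```python
# Peak figures quoted above (assumption: in + out can be summed).
sessions = 3000
bits_in, bits_out = 2.43e9, 1.2e9   # bits per second
pps_in, pps_out = 356e3, 239e3      # packets per second

total_bps = bits_in + bits_out
total_pps = pps_in + pps_out

# Average traffic per online session, in Mb/s.
per_session_mbps = total_bps / sessions / 1e6

# Average packet size in bytes (bits -> bytes, divided by packet rate).
avg_packet_bytes = total_bps / 8 / total_pps

print(f"total: {total_bps / 1e9:.2f} Gb/s, {total_pps / 1e3:.0f} Kpps")
print(f"per session: {per_session_mbps:.2f} Mb/s")
print(f"average packet: {avg_packet_bytes:.0f} bytes")
```

So the box was moving roughly 3.6 Gb/s in total at the point where all cores were still below saturation, which is well under what the 10G ports can carry.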
In your case, you should actually do some tests to find out. But I’m pretty sure that the 100% load is a sign of problems.
I think that some of the tasks you perform on your CCR are very badly optimized core-wise. It may be NAT, but you need to do some testing to find out. We do not NAT our users on the CCR. We only terminate PPPoE sessions and shape traffic for some of them.