PPPoE server performance and packet loss problems

Hi all,

We have big performance problems with our PPPoE server running 3.x versions. Today we run version 3.4 and still have the problem: we see packet loss on localhost (127.0.0.1) and on all interfaces. It looks like a performance problem or a problem with the interface queues.

We run the PPPoE server on a quad-core Xeon 2400 CPU, on a board with the Intel 3000 server chipset and two onboard gigabit Ethernet cards. Traffic is up to 40/10 Mbit download/upload.

Enclosed is a screenshot of our PPPoE server. You can see packet loss to address 127.0.0.1 and high ping response times. Has anyone else observed the same problem?

We have this problem on all PPPoE servers under heavy load.
pppoe-problem.png

I have the same problem here. I changed the machine and it still loses about 4% on all interfaces. Even with low CPU usage (20%) I have packet loss on all interfaces (even local).

I already tried disabling conntrack and TCPMSS to reduce processing, but it didn't solve the problem.

In fact I also saw this happening in version 2.9.50, but not in earlier versions.

This is a huge problem that I cannot find a solution for.

At least, I am not alone.

And what are you doing now? Downgrading back to 2.9.xx?

I have a new observation.

When I switched off “Change TCP MSS” on all profiles in use, so that there were no dynamic iptables mangle rules, the problem diminished. It is still there, but not as severe.

I only have one router using version 3.4 (the other ones still use old 2.9 versions).

For me, disabling Change TCPMSS doesn't solve the problem. I ran this test today, and even with this option disabled I still have packet loss (about 2%). The packet loss is probably lower with TCPMSS off, but it is still not acceptable.

And I'm not talking about very many connections. With more than 100 PPPoE connections it's already possible to see packet loss on a Pentium D 3.4 GHz (with multicore enabled or not).

We have machines with much worse hardware than that running the old 2.9 version, with more than 500 users and 0% packet loss.

I think the best way is to report it to Mikrotik support and pray for a solution as soon as possible.

If everybody who has this problem reports it to Mikrotik support, they will pay more attention to it.

I did, but still no answer. I wanted to check whether others have run into the problem.

Do you use queues?
We have the same problem with packet loss, and one of the causes is that ROS drops a packet every time a queue is created or deleted.
When many users are connecting and disconnecting, you get packet loss.
Maybe that's not the only reason, but it's reproducible and Mikrotik has confirmed this issue.

Yes, I suppose that could be the problem. But in older versions it worked correctly. Maybe MT should consider this.
It's not only packet loss, though: the high response time when pinging localhost is also a problem. Maybe they have a common cause.

I use queues and I agree that it could be related to the problem. But I can't stop using queues!

If this problem is not solved soon, it will make the PPPoE server feature unusable for me and for everybody I know who uses it.

I believe MT is working on this issue.
But there is always a workaround: use static queues or a queue tree. Static queues also have several benefits:
*) a few queues, instead of hundreds (one for each user)
*) easy to manage
*) reduced CPU and memory usage.

You can probably find more.

When I wrote about this, nobody responded :slight_smile:

Some of you even wrote that we (my company) do not know how to use Mikrotik, and other bullshit…

So :) What now? :) Who was right about packet loss in v3.x??? :slight_smile:

The problem really is in MT. From my experience I can tell you that there is no packet loss when you use Mikrotik v3.x versions for other purposes, but when you use it together with PPPoE, packet loss appears immediately.
As the number of PPPoE tunnels grows, packet loss also grows. As the bandwidth through the router grows, packet loss also grows.

Then CPU load goes as high as 90-100% and the router becomes inaccessible.

Static queues are not suitable for at least one situation: when you have a number of bandwidth plans and you set the speed with “Mikrotik-Rate-Limit” radius attribute.

Maybe you can recommend how to set up a speed limit with a predefined static queue based on the RADIUS reply?
All users have dynamic IPs, so I can’t just shape “this group of IPs to 128k, those IPs to 256k, etc”.
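
To get a feel for how many static queues such a setup would need, here is a rough Python sketch that groups RADIUS replies by bandwidth plan. It assumes the rate limit arrives as a string like "128k/256k" (rx/tx, with k and M suffixes) and only reads that first field; the user names and values are made up. It does not solve the dynamic IP problem, it only counts distinct plans.

```python
from collections import Counter

# Assumed suffixes: k = 1000 bit/s, M = 1,000,000 bit/s.
_UNITS = {"k": 1_000, "M": 1_000_000}


def parse_rate(token):
    """Convert a rate such as '128k' or '1M' to bits per second."""
    if token and token[-1] in _UNITS:
        return int(token[:-1]) * _UNITS[token[-1]]
    return int(token)


def plan_of(rate_limit):
    """Return (rx_bps, tx_bps) from the first field of a rate-limit string.

    Burst and priority fields, if present, are ignored in this sketch.
    When only one rate is given it is used for both directions.
    """
    first = rate_limit.split()[0]
    rx, _, tx = first.partition("/")
    rx_bps = parse_rate(rx)
    tx_bps = parse_rate(tx) if tx else rx_bps
    return (rx_bps, tx_bps)


# Hypothetical RADIUS replies: user name -> Mikrotik-Rate-Limit value.
replies = {
    "user1": "128k/256k",
    "user2": "128k/256k",
    "user3": "512k/1M",
    "user4": "1M",
}

plans = Counter(plan_of(value) for value in replies.values())
for (rx, tx), users in sorted(plans.items()):
    print(f"rx={rx} tx={tx}: {users} user(s)")
```

If the number of distinct plans is small, a handful of shared queues might be enough, but each session would still have to be mapped to its plan somehow.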

We terminate up to 700-900 users on one PPPoE server, so I don't think it is possible to manage with static queues. It would only be possible if the services were the same for all users or there were some other restrictions.

It seems to be true. I watched the log and the ping output, and it seems that every time a PPPoE session is created or closed, packet loss occurs.
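
For anyone who wants to check this on their own router, here is a minimal Python sketch of that comparison. It assumes you have already pulled two lists of HH:MM:SS timestamps by hand: one for session connect/disconnect events from the log and one for lost pings. The sample values and the one-second window are made up.

```python
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse an HH:MM:SS timestamp (same day assumed)."""
    return datetime.strptime(text, "%H:%M:%S")


def losses_near_events(loss_times, event_times, window_s=1.0):
    """Return the lost-ping timestamps within window_s seconds of any event."""
    window = timedelta(seconds=window_s)
    events = [parse_ts(t) for t in event_times]
    near = []
    for raw in loss_times:
        t = parse_ts(raw)
        if any(abs(t - e) <= window for e in events):
            near.append(raw)
    return near


# Hypothetical sample data, not taken from a real router.
session_events = ["10:00:05", "10:00:42", "10:01:30"]
ping_losses = ["10:00:05", "10:00:43", "10:01:10"]

matched = losses_near_events(ping_losses, session_events)
print(f"{len(matched)} of {len(ping_losses)} lost pings are within 1 s of a session event")
```

If most losses line up with session events, that points to queue creation and deletion; losses that do not line up would have to come from something else.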

So it seems there are two problems:

  • packet loss when creating/closing a PPPoE session
  • high ping response time to localhost under higher utilization, even when CPU is only at about 50%, so it could be a problem with some queues, such as the interface queues

Using static queues to solve this problem is impossible. The queues are created dynamically from a RADIUS attribute.

It will not help to resolve the problem. We already tried.

If using static queues doesn't solve the problem, then it's not related to the queue creation process.

I have a reply from MT support: they confirmed the problem and they are working on a solution.

Let's cross our fingers and hope they find a solution as soon as possible.