ROS/x86 3.0rc13 bugs (winbox, pppoe server)

  1. Tried to use the filter in interface graphing to show only “vlan*” interfaces. The winbox application closes right after I click the “Filter” button.
  2. Under heavy load (>500 users) ROS suddenly stops accepting new connections and starts to disconnect already active PPPoE users. Winbox disconnects; after reconnecting it doesn’t show anything in the “interfaces” and “pppoe servers” sections.
    After a reboot it looks good again for a while.

We have an L6 license, so >500 users should be no problem.

And one more question: is it possible to rate-limit PADI requests?
On our current FreeBSD servers we have a custom script which listens for PADIs and puts hammering users into a 32 kbit/s pipe to slow down their requests (and, in turn, the server’s requests to RADIUS).
Is it possible to make something like this in ROS?
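
The FreeBSD script itself is not shown in this thread, so the following is only a minimal Python sketch of the detection half of the idea: count PADIs per source MAC in a sliding time window and flag a MAC once it exceeds a threshold. The class name, the thresholds and the way packets reach `observe()` are all hypothetical; capturing PADIs and moving a client into a slow pipe are left out.

```python
from collections import defaultdict, deque


class PadiHammerDetector:
    """Flag source MACs that send too many PADIs within a sliding window."""

    def __init__(self, max_padis=5, window_s=10.0):
        # Hypothetical thresholds, not taken from the thread.
        self.max_padis = max_padis
        self.window_s = window_s
        self._seen = defaultdict(deque)  # MAC -> timestamps of recent PADIs

    def observe(self, mac, now):
        """Record one PADI from `mac` at time `now` (in seconds).

        Returns True when the MAC has exceeded the threshold, i.e. it is
        a candidate for the slow pipe.
        """
        stamps = self._seen[mac]
        stamps.append(now)
        # Forget timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window_s:
            stamps.popleft()
        return len(stamps) > self.max_padis
```

A caller would feed every received PADI into `observe()` and act on a `True` result; a quiet client is never flagged, and a flagged client stops being flagged once its old PADIs age out of the window.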

We had the same problem. With 300+ to 600+ PPPoE tunnels, v3 rc13 is very unstable with a lot of packet loss, even though CPU load is around 50-70%.

Downgrading to 2.9.x fixed the problem : )

@rkorolev:

Can you provide me some help on this FreeBSD script of yours? I’m having the same problem with my BSD servers. I plan to migrate all of them to MT eventually, but not yet (it’s hard to let FreeBSD go like that =). If you prefer to continue off the forum, my e-mail is parrini(at)tdf.com.br.

:smiley: Thanks a lot!

Sent by email, hope you can understand what it’s doing :slight_smile:

Last weekend we upgraded a PPPoE AC (>500 users) from 2.9.46 to 3.0rc13. It’s a Core2 Duo on a D946GZIS and it was a pain. There seem to be serious ACPI issues with this MB, and ROS 3.0rc13 with multi-cpu=yes was very unstable. Only downgrading to 2.9.x fixed the issue.

We constantly get a message on the console that looks like this:

unregistered_netdevice: waiting for ppp326 to become free. Usage count = 23

When this message pops up, the ethernet cards freeze for a few seconds, causing input errors and input discards. Perhaps MT should have a look at the kernel used. Googling around, you may find people saying that it was fixed in kernel 427 (Fedora 2.6.6-1.x).

Today we are going to replace that MB with another one that assigns interrupts correctly, also using a dual-core processor (not a Core 2 Duo). Will post results later.

We need to upgrade to ROS 3.0 because of issues in ROS 2.9.x with TCP MSS (PPPoE and MTU) and with uploading attachments to globo.com webmail.

Edit: We also have >1MB supouts available…

Not even a “blink” from MT about this issue? :confused: Guess it is even more serious than expected… :unamused:


I’ve sent the supouts. We changed the whole hardware, got another MB that puts the cards on the correct IRQs, and nothing! Just put rc13 on a PPPoE server that has a high auth load (I mean people connecting/disconnecting all the time, a normal production server) and you can see the ethernet ifaces freezing randomly. Eventually you have to reboot the server when it becomes inoperable. NO GOOD, about a thousand people calling our help desk. :frowning:

Why shouldn’t I go back and put ROS 2.9.46 up and running? Yes, I could, and I will if I want to drink my cold beer in peace this weekend. BUT it is NOT what we expect from a release candidate, right? Everything else rocks on ROS 3.0, just have a special look at this one, shall we?

Oh, I need more coffee…

I believe they were off for the holiday and just got back; give them time, I’m sure they’re working on it.

I’ve got a reply from ROS Support:
PPPOE crash will be fixed in next release.

oh my pencils! :slight_smile: cheers

Give MT credit, they listen to their peers and users quite often. Proper documentation is mostly what is needed. Supouts are part of it, as are screenshots, etc. I would even go as far as recording a screen capture with audio as a WMV or something so they can see what is happening.

An RC is still only a candidate; the idea is that they feel it is a stable version, so now they are working out the bugs. More and more people will use the RCs, and as they do, more and more will find issues, etc. Only issue here is that its a NOS and it affects more than 1 person like most software.

PPPOE crash will be fixed in next release.

We are looking forward to it… :slight_smile: anxiously.

Give MT credit, they listen to their peers and users quite often.

I do. But sometimes (generally after some kind of technical suffering) I just get mad when there is a little lack of ack from them on the issue in question… No doubt they are busy.

For now I’ve already got a ticket and sent a big report back to Sergejs. Hopefully it will help and they will give us proper feedback. Like rkorolev said, it seems it will be fixed in the next release.

The problem was temporarily solved by downgrading back to 2.9.46. It’s a bug for sure and it needs a fix anyway, anxiously awaited by those who purchased level 6 licenses.

Only issue here is that its a NOS and it affects more than 1 person like most software.

Didn’t get it, sry. :wink:


Thanks
Ozelo

Ozelo,

Do you maybe have an integrated motherboard LAN card enabled on this PPPoE machine which MikroTik 2.9.x doesn’t see?

I’m having the same issue with PPPoE (approx 700 users). CPU goes to 100%, PPPoE process dies. Interfaces window blank, etc etc…

I’ve tried a pre-release 3.0RC14, but still no fix – don’t hold your breath on that one.

I love Mikrotik, I think that they took the linux-based router software to the next level, and I’m not going to bitch. I’ll patiently wait until the problem is solved for good.

but…

so far, all the later 2.9.x series and all the 3.0 RC series are doing the same thing. 3.0 being much worse than 2.9.x.

I’m with rkorolev on this one. There needs to be a way to limit the number of PADI requests that are answered simultaneously. Say, answer 50 requests, wait until they’re authenticated and shaped, then answer more requests.
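
To make the “answer 50, wait, then answer more” idea concrete, here is a small Python sketch of an admission gate that caps how many session setups are in flight at once. It is not how RouterOS behaves internally; the class and method names are hypothetical, and in this variant a slot is freed as each session finishes rather than when a whole batch of 50 is done.

```python
class PadiAdmissionGate:
    """Answer a PADI only while fewer than `max_in_flight` setups are pending."""

    def __init__(self, max_in_flight=50):
        # 50 mirrors the number suggested in the post; it is only a default.
        self.max_in_flight = max_in_flight
        self._pending = set()  # MACs whose sessions are still being set up

    def try_admit(self, mac):
        """Return True if a PADI from `mac` should be answered now."""
        if mac in self._pending:
            # Retransmitted PADI for a setup that already holds a slot.
            return True
        if len(self._pending) >= self.max_in_flight:
            return False  # gate is full: stay silent, the client will retry
        self._pending.add(mac)
        return True

    def complete(self, mac):
        """Call once the session is authenticated and shaped, or has failed."""
        self._pending.discard(mac)
```

Clients that are refused simply retransmit their PADI later, so the gate spreads a reconnect storm over time instead of pushing every authentication at RADIUS at once.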

There’s a way to rate-limit PADIs with bridge filters:

/interface bridge filter
add action=accept chain=input comment="" disabled=no \
    limit=X,Y mac-protocol=0x8863
add action=drop chain=input comment="" disabled=no \
    mac-protocol=0x8863

But it’s not suitable when some clients are “hammering” with PADIs, because well-behaved clients match the same limit and their requests get dropped too. There should be something like per-source-MAC queueing. Maybe it’s possible to program with scripts, but I don’t really know how to do it :frowning:
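
As an illustration of what per-source-MAC limiting would mean (as opposed to the single shared limit in the bridge filter rules above), here is a minimal Python sketch using one token bucket per MAC. It is not RouterOS script and does not correspond to any existing RouterOS feature; the class names and the rate/burst values are hypothetical.

```python
class TokenBucket:
    """Classic token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so a new client is not delayed
        self.last = now

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerMacLimiter:
    """Keep a separate bucket per source MAC so one hammerer cannot starve others."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self._buckets = {}  # MAC -> TokenBucket

    def allow(self, mac, now):
        bucket = self._buckets.get(mac)
        if bucket is None:
            bucket = self._buckets[mac] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

With a shared bucket a hammering client drains the tokens for everyone; here it only drains its own. A real implementation would also need to expire idle buckets, since the dictionary in this sketch grows with every new MAC.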

AFAIK from my emails with MT support, the PADI request limit is hard-coded in RouterOS and it IS 50 requests. And if the server is busy it will not answer the requests one way or another.

Do you maybe have an integrated motherboard LAN card enabled on this PPPoE machine which MikroTik 2.9.x doesn’t see?

Yes, MT ROS 2.9.x does not recognize the onboard LAN adapter on that GZIS motherboard, but ROS 3.0 does.

AFAIK from my emails with MT support, the PADI request limit is hard-coded in RouterOS and it IS 50 requests. And if the server is busy it will not answer the requests one way or another.

Hmm.. I wonder at what point the PPPoE server determines that the previous “batch” of 50 PADI requests has been handled completely… Does it wait until those sessions are authenticated and the queues are set up?

If you’re using 2.9.x, try disabling it and see what happens with stability…

If you’re using 2.9.x, try disabling it and see what happens with stability…

I have NO problems at all with 2.9.x; the problem is with ROS 3.0 rc"X". Unfortunately, prc14 is still the same. ROS 3.0 is definitely very unstable for some setups (high-load setups like 500+ simultaneous PPPoE sessions), eventually leading to a system halt. With simple setups (routerboards handling a wireless AP with no more than 50 simultaneous clients) we have seen absolutely no problems so far.

Just to mention, when it stops working we see the following on the console:

unregistered_netdevice: waiting ppp1 to become free. Usage count = XXX

Where XXX is a number that gradually decreases by one per second once you unplug the ethernet cable that carries the PPPoE sessions.

We didn’t have this kind of problem with high-load setups: 600+ PPPoE.

Just a problem with packet loss for traffic going through this PPPoE router.