PPPoE Client: random no PADO response

Hi everyone,
we run a WISP built almost entirely on MikroTik hardware. Occasionally, and seemingly at random, a client CPE cannot connect to the internet after a reboot. This happens because the PPPoE client on the CPE (SXT or LHG) does not receive a PADO response to its PADI request:


Jun/26/2018 09:41:52 pppoe,ppp,info pppoe-internet: initializing...
Jun/26/2018 09:41:52 pppoe,ppp,info pppoe-internet: connecting...
Jun/26/2018 09:41:52 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:52 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:52 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:52 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:53 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:53 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:53 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:53 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:54 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:54 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:54 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:54 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:55 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:55 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:55 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:55 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:56 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:56 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:56 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:56 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:57 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:57 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:57 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:57 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:58 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:58 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:58 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:58 pppoe,debug,packet     service-name=
Jun/26/2018 09:41:59 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:41:59 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:41:59 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:41:59 pppoe,debug,packet     service-name=
Jun/26/2018 09:42:00 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:42:00 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:42:00 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:42:00 pppoe,debug,packet     service-name=
Jun/26/2018 09:42:01 pppoe,debug,packet wlan1: sent PADI to FF:FF:FF:FF:FF:FF
Jun/26/2018 09:42:01 pppoe,debug,packet     session-id=0x0000
Jun/26/2018 09:42:01 pppoe,debug,packet     host-uniq=0xae001e
Jun/26/2018 09:42:01 pppoe,debug,packet     service-name=
Jun/26/2018 09:42:02 pppoe,ppp,info pppoe-internet: terminating... - disconnected
Jun/26/2018 09:42:02 pppoe,ppp,debug pppoe-internet: LCP lowerdown
Jun/26/2018 09:42:02 pppoe,ppp,debug pppoe-internet: LCP down event in initial state
Jun/26/2018 09:42:02 pppoe,ppp,info pppoe-internet: disconnected

During the same period we logged PPPoE requests on the PPPoE server, and no PADI broadcast from that CPE was received.

The network is bridged: CPE → sectors (generally MANTBOX19) → backbone network → PPPoE server

As I said, this happens at random times and on random clients, each time with different equipment (SXT, SXT AC, LHG) and different RouterOS versions (including the latest available).

One strange thing: the CPE is registered on the sector, and as soon as one of our NOC operators connects to it remotely with MAC-Telnet, the PPPoE client connects immediately!

Any solution?

Thanks in advance.

Up up up

It appears that some PPPoE servers can get into a bad state: a PPPoE session had been established with a PPP session
opened on top of it, then the link is lost for a while, and the MikroTik PPPoE client decides that the session needs
to be re-established and starts sending PADI indefinitely (well, it tries a few times, then issues an error message,
but when the link is always-on it immediately starts trying again).

When the server has not yet noticed that the session was lost (longer timeout?), it will just ignore those incoming
PADI packets. Worse: it will not time out as long as it keeps receiving them. So the link never recovers until the PPPoE
client is disabled for a few minutes; then the server times out, resets the session, and accepts the next PADI.
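
To make that reasoning concrete, here is a small Python toy model of the behaviour I am guessing at. It is only a sketch under my own assumptions: the class StaleSessionServer, the function first_pado_time and the 120-second idle timeout are all made up for illustration and do not describe any real PPPoE server implementation.

```python
from dataclasses import dataclass


@dataclass
class StaleSessionServer:
    """Hypothetical server that still believes an old session is alive."""
    idle_timeout: float          # seconds of silence before the stale session is dropped (assumed)
    last_seen: float = 0.0       # time of the last packet seen from the client
    session_stale: bool = True   # True while the server still holds the old session

    def receive_padi(self, now: float) -> bool:
        """Return True if the server would answer this PADI with a PADO."""
        if self.session_stale and now - self.last_seen >= self.idle_timeout:
            self.session_stale = False   # enough silence: old session is cleared
        if self.session_stale:
            self.last_seen = now         # the PADI itself refreshes the stale session
            return False                 # and is otherwise ignored
        return True


def first_pado_time(retry_interval: float, idle_timeout: float, horizon: float):
    """Time of the first answered PADI, or None if none is answered before `horizon`."""
    server = StaleSessionServer(idle_timeout=idle_timeout)
    now = retry_interval
    while now <= horizon:
        if server.receive_padi(now):
            return now
        now += retry_interval
    return None


if __name__ == "__main__":
    # Retrying every second never lets the assumed 120 s timeout expire.
    print(first_pado_time(retry_interval=1.0, idle_timeout=120.0, horizon=3600.0))
    # Staying silent for 5 minutes lets the server clear the stale session.
    print(first_pado_time(retry_interval=300.0, idle_timeout=120.0, horizon=3600.0))
```

In this model a client that retries faster than the server's idle timeout is locked out for as long as it keeps retrying, while a client that stays silent longer than the timeout gets through on its first attempt.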

So it would be better if the MikroTik PPPoE client sent a PADT at the end of each series of PADI, at the time it gives
up, so the other side knows the session is closed and accepts the PADI on the next attempt, and the session comes up.
Alternatively, there could be a “re-establish timer” field that tells the client how long to wait after a failed PPPoE
session establishment before trying again. Set to 5 minutes or so, it would work because the other side has timed out
by then.
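
For reference, a PADT is a very small frame. The Python sketch below builds one from the header layout in RFC 2516 (EtherType 0x8863, version/type 0x11, code 0xa7, then session ID and payload length). The function name build_padt and the MAC addresses in the usage lines are made up for illustration, and it assumes the client still remembers the session ID of the lost session, which RouterOS may or may not keep around.

```python
import struct

ETHERTYPE_PPPOE_DISCOVERY = 0x8863  # PPPoE discovery stage (RFC 2516)
PPPOE_VER_TYPE = 0x11               # version 1, type 1
CODE_PADT = 0xA7                    # PPPoE Active Discovery Terminate


def build_padt(dst_mac: bytes, src_mac: bytes, session_id: int) -> bytes:
    """Build an Ethernet frame carrying a PADT with no tags (hypothetical helper)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    # 0x0000 means "no session yet" and 0xffff is reserved, so neither can be terminated.
    if not 0 < session_id < 0xFFFF:
        raise ValueError("session_id must identify an established session")
    ethernet_header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_PPPOE_DISCOVERY)
    # ver/type, code, session ID, payload length (0 because no tags are attached)
    pppoe_header = struct.pack("!BBHH", PPPOE_VER_TYPE, CODE_PADT, session_id, 0)
    return ethernet_header + pppoe_header


if __name__ == "__main__":
    # Example addresses and session ID are placeholders, not taken from the logs above.
    frame = build_padt(bytes.fromhex("aabbccddeeff"), bytes.fromhex("112233445566"), 0x1234)
    print(frame.hex())
```

The sketch only produces the bytes; padding to the Ethernet minimum frame size and actually sending the frame are left to the NIC and the operating system.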