PPPoE connection losses - more detailed logging?

Hey Mikrotik community,

Our router (RouterOS v 7.15.3) is connected via PPPoE to a DSL Internet uplink. Every few days, the router reports, that PPPoE connnection is lost and tries to reconnnect. Usually after some minutes (2-10 minutes) PPPoE connection is established again by the router. The DSL modem doesn’t report any disconnection, so I assume, that the DSL connection itself is not affected.

I’ve contacted the ISP, but they say, that they don’t see any problems on their site and I should provide more detailed information. Is there any way to automatically log what happens during such an event more detailed? Or does anyone have ideas how I can „prove“ that the PPPoE server of the ISP may be the problem?

Currently our Mikrotik router logs just messages like this, in case of such an event:

 00:33:16 pppoe,ppp,info pppoe-dsl: terminating... - disconnected 
 00:33:16 pppoe,ppp,info pppoe-dsl: disconnected
 00:33:16 pppoe,ppp,info pppoe-dsl: initializing...
 00:33:16 pppoe,ppp,info pppoe-dsl: connecting...
 00:33:16 pppoe,ppp,info pppoe-dsl: terminating...
 00:33:16 pppoe,ppp,info pppoe-dsl: disconnected
 00:33:16 pppoe,ppp,info pppoe-dsl: initializing...
 00:33:16 pppoe,ppp,info pppoe-dsl: connecting...
 00:33:26 pppoe,ppp,info pppoe-dsl: terminating... - disconnected
 00:33:26 pppoe,ppp,info pppoe-dsl: disconnected
 00:33:26 pppoe,ppp,info pppoe-dsl: initializing...
 00:33:26 pppoe,ppp,info pppoe-dsl: connecting...
 00:33:36 pppoe,ppp,info pppoe-dsl: terminating... - disconnected
 00:33:36 pppoe,ppp,info pppoe-dsl: disconnected
 00:33:36 pppoe,ppp,info pppoe-dsl: initializing...
 00:33:36 pppoe,ppp,info pppoe-dsl: connecting...
 00:33:46 pppoe,ppp,info pppoe-dsl: terminating... - disconnected
 00:33:46 pppoe,ppp,info pppoe-dsl: disconnected
 00:33:47 pppoe,ppp,info pppoe-dsl: initializing...
 00:33:47 pppoe,ppp,info pppoe-dsl: connecting...
 00:33:57 pppoe,ppp,info pppoe-dsl: terminating... - disconnected
 00:33:57 pppoe,ppp,info pppoe-dsl: disconnected
 00:33:58 pppoe,ppp,info pppoe-dsl: initializing...
 00:33:58 pppoe,ppp,info pppoe-dsl: connecting...
 00:34:08 pppoe,ppp,info pppoe-dsl: terminating... - disconnected
 00:34:08 pppoe,ppp,info pppoe-dsl: disconnected
 [...]
 00:40:42 pppoe,ppp,info pppoe-dsl: initializing...
 00:40:42 pppoe,ppp,info pppoe-dsl: connecting...
 00:40:52 pppoe,ppp,info pppoe-dsl: terminating... - disconnected
 00:40:52 pppoe,ppp,info pppoe-dsl: disconnected
 00:41:02 pppoe,ppp,info pppoe-dsl: initializing...
 00:41:02 pppoe,ppp,info pppoe-dsl: connecting...
 00:41:08 pppoe,ppp,info pppoe-dsl: authenticated
 00:41:08 pppoe,ppp,info pppoe-dsl: connected

I can only suggest sniffing on the underlying Ethernet interface. As the sniffing has to record all the traffic right before the disconnect, the amount of sniffed data will be proportional to your total traffic on that PPPoE interface. So unless your router has a USB port and you can sniff to a file on the USB drive, you’ll have to connect a PC running tcpdump or dumpcap and writing into a circular buffer of files, or sniff into a circular buffer in memory and use some script to stop the sniffer as soon as the issue appears. The uplink bandwidth and the router capacity are also important as sniffing is a low-priority process, so a switch chip capable of traffic mirroring, or an external device equipped with such a switch chip, may be necessary to deal with the traffic volume.

If you provide some details regarding all the above (Mikrotik device model, contract download an upload bandwidth), we can suggest the most appropriate way to sniff.

Thank you, sindy! Router is a RB5009, so it has an USB port and plenty of resources. Internet bandwidth is 270 down and 70 up MBit/s, but usually only small amount of bandwidth is actually used.

Currently I found a solution to debug pppoe messages with log level “debug” (before it was log level “info”). Do you think this is maybe unsufficient?

Even in debug, the logging typically shows only control traffic and omits the packets carrying only payload. So sniffing may show you e.g. long periods of silence in one direction and corresponding retransmissions in the other one. Less important, my personal experience is that people tend to accept sniffed traffic as a more reliable evidence than logs from an “obscure” (as in “unknown to them”) device.

I see. I will try now with debug logging and if that doesn’t work or if it is not “enough”, I will use traffic logging to an usb device. I think I will achieve that with Mikrotik docs.
Thanks for explaining possibilities!

Can you post your config of the PPPoE Client? Of course, without the PPPoE credentials, you can do it via /export.
As from the logs, the session is disconnected, which means that either the ISP or the client(your router) has send a PADT packet during the session to terminate it.
Why your router would send a PADT message to ISP?

  1. The PPP interface is either disabled or restarted.
  2. The Ethernet interface on which the PPP interface runs is disabled or restarted.
  3. You have the option “dial on demand” set to yes which makes the router to only connect to internet if there is outgoing traffic from LAN/Router itself.
  4. It received a PADT request from the ISP and it has to acknowledge(PADT ACK) and comply.
    Why your ISP would send a PADT message to close the session is their concern and not a RouterOS problem.
    For more logs you must add a new action and topic, see this: https://help.mikrotik.com/docs/display/ROS/Log
    Add the topic PPPoE & Debug so you could see the LCP messages between peers and find out if either you or the ISP is terminating the session via PADT.
    Store those logs on USB or Memory(RAM), not on the router’s flash because there will be a lot of logging which adds wear to flash types like NAND used in routers.
    Also, with the default logging in RouterOS, you can see if there was a problem with the communcation between you and ISP or the session is closed via PADT Packets which there is a layer 2 problem, not a layer 1(physical) issue. If the router can’t communicate with the ISP it will show a message with “Peer not responding”, if the session is closed via PADT, it simply shows “Disconnected”(as in this case).

Very useful details, thanks a lot!

My pppoe client configuration:

/interface pppoe-client
add add-default-route=yes allow=pap,chap,mschap2 disabled=no interface=etherX name=pppoe-dsl user=XXXX

I am not aware that point 1 or 2 is happening, at least not intentionally. Regarding pt. 3: this options is disabled.

I have configured pppoe & debug logging. Do you know, whether I can skip messages with topic “packet” as this would reduce log size? Or do I need that to monitor PADT requests/responses?

Yes, you need the packets to see the PADT requests. As I can see from your config, I think that the ISP is the one closing the session, does your ethernet interface goes down/up when the PPPoE interface disconnects? You can see this in WINBOX/WebFig noted as “Last link up/Last link down” followed by date and time. If no, then it’s your ISP’s fault, as stated before, the “Disconnect” means the session is intentionally closed by one peer(ISP or router), “Peer not responding” message suggests that the session was closed due to communcation error between your router and ISP’s acces concentrator.
In the second case, “Peer not responding”, your router will send LCP Echo requests to see if the session is ok and if the ISP is responding with LCP Echo replys, if not, the router terminates the session as it thinks the connection is down. LCP Echo requests/replys are only send when there is no other traffic between peers(also called “Adaptive LCP ECHO”), when the session is in idle and there’s no way of probing it is still alive other than sending this LCP packets similary to pinging a host to see if it is up and you can reach it. During PPPoE Established state both peer must respond to LCP ECHO Requests, your ISP too is probably checking your router with this type of packets to see if the session is up.

I see, great explanation. Regarding the underlying ethernet interface status, it’s difficult to tell, because I had to restart the interface today afternoon for some other reason, but I am pretty sure, there was no failure of the ethernet “connection” itself in the past. However, I’ll watch that too.

Thank you very much for sharing your knowledge!