PPPoE + SIP Issue/Bug [Ticket#2018031922003491]

mTwUser · Tue Apr 10, 2018 2:12 pm

Sadly, after more than two weeks of waiting, I have not received a proper response from support. It is also quite frustrating that the very specific description of the problem is being ignored outright: I received various suggestions that I had already said would not work, and support replied with a link to a different thread on this forum about a completely different issue. After I replied on March 29th explaining why it is definitely a different issue, I have not received any response.

The Problem

The problem probably lies in connection tracking. Interestingly, it only happens on a PPPoE connection; on PPTP or plain Ethernet connections the issue can't be reproduced.

In this topic the issue was already described and there is a workaround: reboot the router or create a new PPPoE interface. Every other option (creating the connection manually, deactivating and reactivating session tracking, and so on; please review the thread) has already been tried without success. Various RouterOS versions have been tried!

A short summary with the most important details was provided by this post:
viewtopic.php?f=2&t=127858

Hi Forum Users, I am delighted to know I am not the only person experiencing this.

Issue Summary

* Same issue. When running PPPoE tunnel over VDSL, if VDSL tunnel drops / re-auths, the trunk becomes unreachable until the router has been rebooted.
* The issue is NOT limited to NAT / PBXs on private networks. This also affects systems on PUBLIC IPs.
* All other TCP/UDP traffic remains unaffected and continues to pass.
* SIP Debug (no apparent SIP responses are received by either side, e.g. OPTIONS, INVITE).
* Capture via TCPDump reveals that the packet is being sent by both instances of Asterisk but nothing is being received on the remote end.
* MikroTik Conntrack shows the session but no reply bytes / packets are recorded. This is further reflected by the lack of a 'Seen Reply' flag.
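
For anyone who wants to check the same symptom, here is a minimal sketch of inspecting the tracked SIP connections from the RouterOS terminal. It assumes the trunk signals over UDP port 5060 as in this thread; adjust the port to your setup. An affected entry would be expected to show up without the S (seen-reply) flag.

```
# List tracked UDP connections towards the SIP port (5060 is an assumption taken from this thread)
/ip firewall connection print detail where protocol=udp dst-address~":5060"
# Optionally remove the matching entries so they are rebuilt on the next packet
/ip firewall connection remove [find protocol=udp dst-address~":5060"]
```

Note that, as reported in the list of attempted steps, resetting the conntrack sessions did not bring the trunk back, so this is a diagnostic aid rather than a fix.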

The following steps have been attempted to determine the cause and a workable solution at the customer site WITHOUT REBOOTING. They have NOT worked.

* Reset sessions in MikroTik Conntrack.
* Stop Asterisk for 10 Minutes
* Reboot Asterisk
* Reboot Hypervisor
* SIP ALG on/off (tried both, does not matter).
* Static Default Route (with pref src set).
* PPPoE Dialler profile set to 'Default'.
* Redirected and retargeted 5080 (translated remotely to 5060): the trunk becomes reachable until a subsequent disconnect/reauth.
* Redirected and retargeted 5060: the trunk remains unreachable.
* Added port forward udp::5080->udp::5060

The following steps have been attempted to determine the cause and a workable solution at the provider site. They have NOT worked.

* Added redirect (IPTABLES POSTROUTING) ports from 5080 -> 5060 on Trunk box.
* Changed customer target port to 5080. The trunk becomes reachable until a subsequent disconnect/reauth.
* Changed customer target port back to 5060. The trunk remains unreachable.

Supplemental.

* I had set up the same test conditions at my lab. With RB2011, PPPoE (over true Fibre Optic) with VMWare workstation and a FreeBSD 10.3 / Asterisk 13 Server. I could not reproduce the error.
* Routed a public /30 to the customer.
* Added vlan interface to MikroTik w/ public IP.
* Added vlan interface / portgroup to PBX.
* Assigned public IP to PBX.
* Changed last resort gw to new public /30.
* Removed NAT rules, specific to SIP / VoIP.
* Reconfigured SIP configs to listen/connect via/on new public IP.
* Established bi-directional trunk.
* Forced disconnect/re-auth of PPPoE.
* Trunk becomes unreachable until Reboot.

My thoughts.
* I suspect that, after the disconnect/reauth, the MikroTik's kernel no longer processes the SIP packets, irrespective of the port used prior to the disconnect.
* It appears the session becomes stuck in the kernel, likely because the internal RouterOS interface / session identification no longer exists.

Last but not least: today the problem also occurred after the PPPoE connection was down for only a couple of seconds, so it wasn't the classic 24-hour forced disconnection. Re-initiating and rebooting the PBX didn't help, and clearing the connection manually etc. did not resolve the issue either.

As a VoIP provider that mainly uses MikroTik routers at customer sites, it is really frustrating to manually recreate such interfaces or reboot the router; both kill the internet connection all over again and give us a bad reputation with our customers. For now we have implemented a nightly reboot script to avoid this. Sadly, a short downtime on a DSL line is inevitable, so because of this bug we cannot automate recovery without resetting every connection. Also, various filter and NAT rules are bound to the PPPoE interface, which makes a script that creates a new PPPoE interface not a viable option either.

Please look into this!

rextended · Wed Apr 11, 2018 10:55 pm

RouterBOARD?
RouterOS version?
(RouterBOOT firmware version?)

The SIP device have static or DHCP address?
If DHCP, DHCP Lease time? Fixed DHCP address or not?
SIP ALG detailed settings (default or not?)

mTwUser · Thu Apr 12, 2018 9:52 am

Various devices (RB2011, hEX, hAP ac, CRSxxxx and so on)
Various RouterOS versions (from 6.37.5 up to the newest version)
Firmware version: whichever was the most recent one for the device/RouterOS

We have phones/PBXs with both static and DHCP addresses; it doesn't matter (I also don't see why it should, since this only happens when the PPPoE tunnel goes down and comes back up).
SIP ALG: mostly it's turned off; we tried turning it on but didn't change its settings.

Also, as stated in my post, this thread provides even more information: a SIP provider in Germany and another one in Australia have the same problem. viewtopic.php?f=2&t=127858 Sadly it got marked as "resolved" because the workaround was acceptable for the OP.

Redmor · Wed May 16, 2018 4:40 pm

Try what I wrote in last reply:

viewtopic.php?f=2&t=133907&p=658818#p658818

mTwUser · Fri May 25, 2018 3:38 pm

OK, this seems quite interesting. Could you elaborate a bit further? I think it's not quite the same, since I'm experiencing the problem on a single PPPoE interface.

_____________________

As I discussed with support:
The problem is an old PPPoE + UDP kernel bug in Linux; it will be fixed in RouterOS v7, since that uses a new kernel.

Sadly still no ETA on v7.

Just experienced the problem again with L2TP over PPPoE; sometimes even recreating the interface doesn't work.

mTwUser · Thu Jun 07, 2018 5:30 pm

Our ISP, who also works mainly with MikroTik, has found a workaround: you need to put a script on the PPP profile of your PPPoE interface, in the "On Down" field. In this case pppoe-out1 uses ether1 to establish the connection (change this as needed). A total delay of 5 seconds seems to work very well.

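# Run only if ether1 and pppoe-out1 both exist and are enabled
:if ([/interface ethernet find name=ether1 disabled=no] != "") do={
  :if ([/interface pppoe-client find name=pppoe-out1 disabled=no] != "") do={
    # Take the PPPoE client and its underlying port down
    /interface pppoe-client disable pppoe-out1
    /interface ethernet disable ether1
    :delay 3s
    # Bring the port back first, then the PPPoE client
    /interface ethernet enable ether1
    :delay 2s
    /interface pppoe-client enable pppoe-out1
  }
}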
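
In case it helps, here is a rough sketch of hooking the script into the profile from the terminal instead of WinBox. It assumes the script above has already been saved under /system script as "pppoe-bounce" and that the PPPoE client uses a PPP profile called "pppoe-profile"; both names are hypothetical, so substitute your own.

```
# Assumption: the workaround script is stored under /system script as "pppoe-bounce" (hypothetical name)
# Assumption: the PPPoE client uses a PPP profile named "pppoe-profile" (hypothetical name)
/ppp profile set [find name="pppoe-profile"] on-down="/system script run pppoe-bounce"
# Check which profile the client actually uses and that the hook is set
/interface pppoe-client print detail where name="pppoe-out1"
/ppp profile print detail where name="pppoe-profile"
```

Using a dedicated profile rather than "default" keeps the hook from firing for other PPP connections that share the default profile.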

CZFan · Thu Jun 07, 2018 6:05 pm

Did you get anything back from MT Support?

mTwUser · Fri Jun 08, 2018 10:17 am

"Will be fixed in v7, can't change the fundamentals in v6"

Asked about v7 release because you know, you can always try

but yeah, no ETA
