 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 16, 2008 7:51 pm

~~ Problem ~~
We just implemented a dual-core / 1 GB RAM rack-mount router with ROS v2.9.48, Level 6 license. Because I don't know of a way to load-test PPPoE (if anyone has any ideas on this one, please let me know), I could only test with my laptop to make sure the configs were working for PPPoE authentication. The failure point is this: when the router was installed into a production network with 1000+ users (< 1500 users), the PPPoE server showed that clients were authenticating, RADIUS showed authentication sessions, and we had 9 Mbps of traffic on a 30 Mbps circuit. The next morning our help desk was SLAMMED with user calls saying they couldn't get on the internet.

I checked the outside route servers to make sure our BGP routing was operational, then checked that the network's OSPF routing was operational. All routers in the network were seeing the OSPF routes, and we were able to ping anything we tested, even from a Canadian route server via the public internet. After speaking with our help desk guys (I'm an engineer), I learned that some customers were calling in saying they could get web sites, just extremely SLOWLY. When I checked the IPs that were issued to the customers complaining of slow speeds, I was able to see a little bit of traffic (56k speeds) on their PPPoE interfaces.

Network Setup (Short version):
LAN (Eth 2 -- PPPoE interface) > POP site (Mikrotik Router [Eth 1 -- WAN Interface]) > 30MB fiber to NOC > Cisco Core Routers > Internet BGP feeds
PPPoE auth is done via radius which sits at the NOC on Linux servers.

/------------------------------------/

From what I can tell, it appears that the PPPoE server has trouble passing traffic once it reaches a certain user load. The part that I don't understand is that OSPF is seeing everything on the network, BGP is seeing the IP routes, and the internet (tested from Canada) can see and ping the end-user device... and yet the users can't get out. There are NO firewall / filter rules set up on the ROS system. Everything is public IPs, except our management WAN layer, which is on an isolated VLAN.

/------------------------------------/

Before I pulled the router back out of the network, I logged into it and shut down the PPPoE server, waited for my radius server to clear all the connections, and then re-enabled the PPPoE server. As soon as I re-enabled it, my CPU went to 100% (normal for initial PPPoE requests) and my router seemed to "hang". I went back to the PPPoE server to disable it and it now does not show a PPPoE server. I rebooted the hardware and it did the same thing (3 times). I then went to the site and plugged directly into the switch to see if I could emulate the problems. YUP! same problem for me directly connected to the switch and then directly connected to the router LAN port.

/------------------------------------/

Any help would be appreciated. If I can get this solution working, it will save us 20k in capital expense per site; otherwise I am looking at using Cisco 7200 hardware at the POP sites for the PPPoE servers. Most sites have 1000+ users.

/------------------------------------/

If you would like to see a config of the router please let me know. Thanks for the help.
 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 16, 2008 8:35 pm

Here is the config so that you can see it.
/--------------------------------------/

# Jan 16, 2008 12:31pm by RouterOS 2.9.48
# software id = WZER-R9T
#
/ interface ethernet 
set E1-WAN name="E1-WAN" mtu=1500 mac-address=00:60:E0:42:B0:81 arp=enabled disable-running-check=yes auto-negotiation=yes \
    full-duplex=yes cable-settings=default speed=100Mbps comment="" disabled=no 
set E2-LAN name="E2-LAN" mtu=1500 mac-address=00:60:E0:42:B0:82 arp=enabled disable-running-check=yes auto-negotiation=yes \
    full-duplex=yes cable-settings=default speed=100Mbps comment="" disabled=no 
set E3-MGMT name="E3-MGMT" mtu=1500 mac-address=00:60:E0:42:B0:83 arp=enabled disable-running-check=yes auto-negotiation=yes \
    full-duplex=yes cable-settings=default speed=100Mbps comment="" disabled=no 

/ interface pppoe-server server 
add service-name="pppoe" interface=E2-LAN max-mtu=1480 max-mru=1480 authentication=pap,chap,mschap1,mschap2 \
    keepalive-timeout=10 one-session-per-host=yes max-sessions=0 default-profile=pppoe disabled=no 

/ ip pool 
add name="pool1" ranges=70.xx.116.1-70.xx.116.254 next-pool=pool2 
add name="pool2" ranges=70.xx.117.1-70.xx.117.254 next-pool=pool3 
add name="pool3" ranges=70.xx.118.1-70.xx.118.254 next-pool=pool4 
add name="pool4" ranges=70.xx.119.1-70.xx.119.254 next-pool=pool5 
add name="pool5" ranges=70.xx.126.1-70.xx.126.254 next-pool=pool6 
add name="pool6" ranges=70.xx.127.1-70.xx.127.254 

/ ip dns 
set primary-dns=64.xx.xx.138 secondary-dns=64.xx.xx.139 allow-remote-requests=no cache-size=2048KiB cache-max-ttl=1w 

/ ip address 
add address=64.xx.xx.33/27 network=64.xx.xx.32 broadcast=64.xx.xx.63 interface=E2-LAN comment="Primary LAN (PPPoE) Address" disabled=no 
add address=10.1.56.1/21 network=10.1.56.0 broadcast=10.1.63.255 interface=E2-LAN comment="" disabled=no 
add address=10.1.32.1/21 network=10.1.32.0 broadcast=10.1.39.255 interface=E2-LAN comment="" disabled=no 
add address=10.0.112.1/21 network=10.0.112.0 broadcast=10.0.119.255 interface=E2-LAN comment="" disabled=no 
add address=10.1.24.1/21 network=10.1.24.0 broadcast=10.1.31.255 interface=E2-LAN comment="" disabled=no 
add address=10.0.16.1/21 network=10.0.16.0 broadcast=10.0.23.255 interface=E2-LAN comment="" disabled=no 
add address=10.0.192.1/21 network=10.0.192.0 broadcast=10.0.199.255 interface=E2-LAN comment="" disabled=no 
add address=172.16.10.10/29 network=172.16.10.8 broadcast=172.16.10.15 interface=E1-WAN comment="WAN Uplink \(E1-WAN\)" \
    disabled=no 
add address=192.168.254.200/24 network=192.168.254.0 broadcast=192.168.254.255 interface=E3-MGMT comment="" disabled=no 


/ ip neighbor discovery 
set E1-WAN discover=yes 
set E2-LAN discover=yes 
set E3-MGMT discover=yes 

/ ip route 
add dst-address=0.0.0.0/0 gateway=172.16.10.9 check-gateway=ping distance=1 scope=255 target-scope=10 comment="" disabled=no 

/ ip firewall service-port 
set ftp ports=21 disabled=no 
set tftp ports=69 disabled=no 
set irc ports=6667 disabled=no 
set h323 disabled=yes 
set quake3 disabled=no 
set gre disabled=yes 
set pptp disabled=yes 

/ ip firewall connection tracking 
set enabled=yes tcp-syn-sent-timeout=5s tcp-syn-received-timeout=5s tcp-established-timeout=1d tcp-fin-wait-timeout=10s \
    tcp-close-wait-timeout=10s tcp-last-ack-timeout=10s tcp-time-wait-timeout=10s tcp-close-timeout=10s udp-timeout=10s \
    udp-stream-timeout=3m icmp-timeout=10s generic-timeout=10m tcp-syncookie=no 

/ system logging 
add topics=info prefix="" action=memory disabled=no 
add topics=error prefix="" action=memory disabled=no 
add topics=warning prefix="" action=memory disabled=no 
add topics=critical prefix="" action=echo disabled=no 
add topics=ospf prefix="" action=memory disabled=yes 
add topics=pppoe prefix="" action=memory disabled=yes 
add topics=ppp prefix="" action=memory disabled=yes 

/ system logging action 
set memory name="memory" target=memory memory-lines=100 memory-stop-on-full=no 
set disk name="disk" target=disk disk-lines=100 disk-stop-on-full=no 
set echo name="echo" target=echo remember=yes 
set remote name="remote" target=remote remote=64.xx.xx.253:514 

/ system upgrade mirror 
set enabled=no primary-server=0.0.0.0 secondary-server=0.0.0.0 check-interval=1d user="" 

/ system clock manual 
set time-zone=+00:00 dst-delta=+00:00 dst-start="jan/01/1970 00:00:00" dst-end="jan/01/1970 00:00:00" 

/ system watchdog 
set reboot-on-failure=yes watch-address=none watchdog-timer=yes no-ping-delay=5m automatic-supout=yes auto-send-supout=no 

/ system console 
add port=serial0 term="" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 
set FIXME term="linux" disabled=no 

/ system console screen 
set line-count=25 

/ system identity 
set name="SITE--RTA" 

/ system note 
set show-at-login=yes note="" 

/ system health 
set state-after-reboot=enabled 

/ system routerboard bios 
set 

/ system ntp server 
set enabled=no broadcast=no multicast=no manycast=yes 

/ system ntp client 
set enabled=no mode=unicast primary-ntp=0.0.0.0 secondary-ntp=0.0.0.0 

/ port 
set serial0 name="serial0" baud-rate=9600 data-bits=8 parity=none stop-bits=1 flow-control=hardware 
set serial1 name="serial1" baud-rate=9600 data-bits=8 parity=none stop-bits=1 flow-control=hardware 

/ ppp profile 
set default name="default" use-compression=default use-vj-compression=default use-encryption=default only-one=default \
    change-tcp-mss=yes comment="" 
add name="pppoe" local-address=64.xx.xx.33 remote-address=pool1 use-compression=yes use-vj-compression=yes \
    use-encryption=default only-one=yes change-tcp-mss=default dns-server=64.xx.xx.138,64.xx.xx.139 comment="" 
set default-encryption name="default-encryption" use-compression=default use-vj-compression=default use-encryption=yes \
    only-one=default change-tcp-mss=yes comment="" 

/ ppp aaa 
set use-radius=yes accounting=yes interim-update=0s 

/ queue type 
set default name="default" kind=pfifo pfifo-limit=50 
set ethernet-default name="ethernet-default" kind=pfifo pfifo-limit=50 
set wireless-default name="wireless-default" kind=sfq sfq-perturb=5 sfq-allot=1514 
set synchronous-default name="synchronous-default" kind=red red-limit=60 red-min-threshold=10 red-max-threshold=50 red-burst=20 \
    red-avg-packet=1000 
set hotspot-default name="hotspot-default" kind=sfq sfq-perturb=5 sfq-allot=1514 
add name="default-small" kind=pfifo pfifo-limit=10 

/ queue interface 
set E1-WAN queue=ethernet-default 
set E2-LAN queue=ethernet-default 
set E3-MGMT queue=ethernet-default 

/ user 
add name="admin" group=full address=0.0.0.0/0 comment="system default user" disabled=no 

/ user group 
add name="read" policy=local,telnet,ssh,reboot,read,test,winbox,password,web,sniff,!ftp,!write,!policy 
add name="write" policy=local,telnet,ssh,reboot,read,write,test,winbox,password,web,sniff,!ftp,!policy 
add name="full" policy=local,telnet,ssh,ftp,reboot,read,write,policy,test,winbox,password,web,sniff 

/ user aaa 
set use-radius=no accounting=yes interim-update=0s default-group=read 

/ radius 
add service=ppp called-id="" domain="" address=64.xx.xx.132 secret="^Radius$" authentication-port=1645 \
    accounting-port=1646 timeout=300ms accounting-backup=no realm="" comment="" disabled=no 

/ radius incoming 
set accept=no port=1700 

/ driver 

/ snmp 
set enabled=yes contact="noc@domain.com" location="Site Location" 

/ snmp community 
add name="SNMP-ReadString" address=0.0.0.0/0 read-access=yes 

/ tool bandwidth-server 
set enabled=yes authenticate=yes allocate-udp-ports-from=2000 max-sessions=10 

/ tool mac-server ping 
set enabled=yes 

/ tool e-mail 
set server=0.0.0.0 from="<>" 

/ tool sniffer 
set interface=all only-headers=no memory-limit=10 file-name="" file-limit=10 streaming-enabled=no streaming-server=0.0.0.0 \
    filter-stream=yes filter-protocol=ip-only filter-address1=0.0.0.0/0:0-65535 filter-address2=0.0.0.0/0:0-65535 

/ tool graphing 
set store-every=24hours 

/ tool graphing queue 
add simple-queue=all allow-address=0.0.0.0/0 store-on-disk=yes allow-target=yes disabled=no 

/ tool graphing resource 
add allow-address=0.0.0.0/0 store-on-disk=yes disabled=no 

/ tool graphing interface 
add interface=E1-WAN allow-address=0.0.0.0/0 store-on-disk=yes disabled=no 

/ routing ospf 
set router-id=172.16.10.10 distribute-default=never redistribute-connected=as-type-2 redistribute-static=as-type-1 \
    redistribute-rip=no redistribute-bgp=no metric-default=1 metric-connected=20 metric-static=20 metric-rip=20 metric-bgp=20 

/ routing ospf area 
set backbone area-id=0.0.0.0 type=default translator-role=translate-candidate authentication=none disabled=no 
add name="area1" area-id=0.0.0.1 type=stub translator-role=translate-always authentication=none summary=no default-cost=10 \
    disabled=no 

/ routing ospf network 
add network=172.16.10.8/29 area=backbone disabled=no 

/ routing bgp instance 
set default name="default" as=65530 router-id=0.0.0.0 redistribute-connected=no redistribute-static=no redistribute-rip=no \
    redistribute-ospf=no redistribute-other-bgp=no out-filter="" client-to-client-reflection=yes ignore-as-path-len=no comment="" \
    disabled=no 

/ routing rip 
set distribute-default=never redistribute-static=no redistribute-connected=no redistribute-ospf=no redistribute-bgp=no \
    metric-default=1 metric-static=1 metric-connected=1 metric-ospf=1 metric-bgp=1 update-timer=30s timeout-timer=3m \
    garbage-timer=2m 

/ routing rip interface 
add interface=all receive=v2 send=v2 authentication=none authentication-key="" key-chain="" in-filter="" out-filter="" \
    disabled=no 

/--------------------------------------/
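
A quick capacity check on the pool chain in the export above, as a minimal sketch. It assumes each pool covers host octets .1 to .254 of a single /24 (which is what the masked ranges suggest) and that `next-pool` simply hands over to the next pool when one fills up. The helper name `pool_capacity` is made up for this illustration.

```python
# Last-octet bounds of each pool from the export. The middle octets are
# masked as "xx" in the post, so only the final octet is used here.
POOLS = {
    "pool1": (1, 254),
    "pool2": (1, 254),
    "pool3": (1, 254),
    "pool4": (1, 254),
    "pool5": (1, 254),
    "pool6": (1, 254),
}


def pool_capacity(first_octet: int, last_octet: int) -> int:
    """Number of addresses in an inclusive last-octet range."""
    return last_octet - first_octet + 1


# Total addresses available across the whole next-pool chain.
total = sum(pool_capacity(first, last) for first, last in POOLS.values())
print(f"{len(POOLS)} pools, {total} addresses in total")
```

Under those assumptions the chain holds 1,524 addresses, so a site approaching 1,500 sessions would leave very little headroom.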
 
bbmj
just joined
Posts: 5
Joined: Sat May 27, 2006 12:37 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Thu Jan 17, 2008 8:01 am

Everything IMHO looks fine. The only things I would do differently are to remove all the addresses from E2-LAN, since they are not needed to make PPP work, and to adjust the MTU to 1420.
 
Gerard
Trainer
Posts: 336
Joined: Wed Apr 26, 2006 4:21 am
Location: Kentucky, USA

Re: Complete PPPoE Router Failure -- Looking for ideas...

Thu Jan 17, 2008 8:38 am

Since you are not using any firewall rules disable connection tracking. It will save a good amount of memory and cpu.

I would also turn compression off for your pppoe clients to save extra processing power.

-Gerard
 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Jan 18, 2008 8:01 pm

bbmj wrote: "Everything IMHO looks fine. The only things I would do differently are to remove all the addresses from E2-LAN, since they are not needed to make PPP work, and to adjust the MTU to 1420."
bbmj ... just for reference... the IP addresses on the LAN interface ARE required. They provide our management layer to the CPE devices. They have nothing to do with the PPPoE termination.
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Jan 18, 2008 8:19 pm

Connection tracking is a big thing, if you turn it off, you should save quite a bit of CPU time.

You will also need v3 for dual-core support. It would be a good idea to upgrade to v3 and turn off connection tracking, then see how that does.

Make sure you turn ON multi-processor support in v3 though; it's not on by default.
 
EgyCom
Member Candidate
Posts: 123
Joined: Thu May 31, 2007 9:47 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Jan 18, 2008 10:07 pm

bbmj wrote: "Everything IMHO looks fine. The only things I would do differently are to remove all the addresses from E2-LAN, since they are not needed to make PPP work, and to adjust the MTU to 1420."
sflynn wrote: "bbmj ... just for reference... the IP addresses on the LAN interface ARE required. They provide our management layer to the CPE devices. They have nothing to do with the PPPoE termination."

I agree with bbmj: I wouldn't put 64.xx.xx.33 on interface E2-LAN, and if this IP is required there, I would change the local address in the PPPoE profile to any other free IP.

I would also disable change-tcp-mss, since it creates two dynamic mangle rules for every user.
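
To put a rough number on that, here is a small sketch. It takes the figure of two dynamic mangle rules per session at face value (that figure comes from this post and has not been checked against RouterOS itself), so the rule count simply grows linearly with the session count. The function name is made up for illustration.

```python
# Assumption taken from the post above: change-tcp-mss adds two dynamic
# mangle rules for every connected PPPoE user.
RULES_PER_SESSION = 2


def dynamic_mss_rules(sessions: int) -> int:
    """Estimated number of dynamic mangle rules for a given session count."""
    return sessions * RULES_PER_SESSION


# Session counts in the range mentioned for this site (1000+ but < 1500 users).
for sessions in (500, 1000, 1500):
    print(f"{sessions} sessions -> {dynamic_mss_rules(sessions)} dynamic rules")
```

At 1,500 sessions that would be 3,000 dynamic rules, if the two-per-session figure holds.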
 
rkorolev
newbie
Posts: 43
Joined: Tue Oct 23, 2007 1:49 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Mon Jan 21, 2008 9:01 pm

gmsmstr wrote: "You will also need v3 for dual core support. Good thought to try to upgrade to v3, and turn off connection tracking. See how that does.

Make sure you turn ON multi-processor support in v3 though, it's not on by default."
Multi-processor support on ROS v3 is not working: ROS just stops responding to PPPoE, then hangs or reboots:
20:57:17 system,error,critical router was rebooted without proper shutdown

Under a not-so-heavy load (<200 users), even with multi-CPU turned off, there's a problem with dropped packets (5% or more).

Pity, but 3.0 looks less like a real release and more like another not-so-stable beta ;(
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Mon Jan 21, 2008 9:14 pm

Maybe this is specific to PPPoE. Again, we have clients running 300 Mbit of routing (not PPPoE) on 3.0, and they save LOTS of CPU time by having multi-CPU on!

I understand you are saying it's not working, but I don't know if that is the cause of your issues. I have not had issues with it yet! But of course, most of those clients are not running 1500+ PPPoE sessions, just routing.
 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 23, 2008 5:17 am

Well... just to post a reply to this topic, here is what I've decided to do, with the permission of my CTO.

~~ Hardware ~~
PowerRouter 732
• Dual-Core CPU (3.0GHz)
• 2gb DDR2 Memory
• Mikrotik ROS v3.0 (multi-cpu enabled)

~~ Interface Setup ~~
E1 --> VLAN WAN LINK w/Public IP
E2 --> PPPoE Interface w/VLANs
E3 --> PPPoE Interface w/VLANs
E4 --> PPPoE Interface w/VLANs

Because I can't justify the 10+K that it would take to purchase load-testing equipment for my lab, here is what I'm going to test, one step at a time...

• OSPF -- peering with two Cisco 7206-G1 routers and Cisco 3550 switches (routing mode)
(note: One 7206-G1 router is currently terminating 3000+ PPPoE sessions w/OSPF & Full BGP from two Tier1 internet uplinks)
• Take smallest site (50+ PPPoE sessions) and move the VLAN to the second interface (E2) ... watch load / resources
• Take next smallest site (150+ PPPoE sessions) and move the VLAN to the second interface (E2) ... watch load / resources
• Take next site (400+ PPPoE sessions) and move the VLAN to the second interface (E2) ... watch load / resources
--- How this goes will determine whether I continue. With these settings I will have approx. 600+ PPPoE sessions terminated on the router.
--- This will test two things...
1) 500+ users (already accomplished with v2.9.48, then started seeing problems after 500 users)
2) OSPF peering with Cisco equipment (previous problem in < v2.9.48)
• The big step... take the next site (1500+ PPPoE sessions) and move the VLAN to the third interface (E3) ... watch load/resources ... wait for failure.... keep waiting...
--- we are now looking at 2000+ PPPoE sessions on this device ...
• Take the next site ... (it just keeps going from there!)
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 23, 2008 6:45 am

Let me know if you have a timetable, and I can see about being available to watch progress and provide some support if necessary.

I would try without connection tracking. Multi-CPU has shown a great performance increase. There was also some talk about removing the change TCP MSS value as it creates lots of rules etc.
 
bokili
Member Candidate
Posts: 135
Joined: Wed Aug 16, 2006 8:52 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 23, 2008 8:17 am

gmsmstr wrote: "There was also some talk about removing the change TCP MSS value as it creates lots of rules etc."
This is not a solution. You will have a lot of problems if you turn this off: some websites won't open, MSN Messenger stops working, etc.
rkorolev wrote: "Under a not so heavy load (<200 users) even with multi-cpu turned off there's a problem with dropped packets (5% or more)."
Yes, this is a big problem for V3.
 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 23, 2008 10:50 am

As of this morning... testing has been completed. I will post details once I've had more sleep.
 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Wed Jan 23, 2008 7:54 pm

Well... We completed testing last night with PPPoE load. Using the steps listed above, we moved traffic over until we had 600+ users, with the CPU averaging 20-23% utilization. We then moved over a big site (1000+ PPPoE sessions). The CPU was running well at 33% utilization, memory 90% free, ....

20 min into it, at approx. 1600 PPPoE sessions and a traffic throughput of 20-30 Mbps (at 2:30 AM that was good) ... Winbox and telnet dropped. I checked my rolling pings and still had communication with my router interface; I checked my SNMP monitoring of the switch interfaces and ALL traffic had flat-lined. ... ... oh boy... this is not good.

We were able to get back into the router via telnet and generate a supout file (it took over 20 min to build) ... so now we are back to the testing board.

(supout has been sent to mikrotik for review)
 
rkorolev
newbie
Posts: 43
Joined: Tue Oct 23, 2007 1:49 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Thu Jan 24, 2008 8:57 am

The only response you get from support will be "turn off multi-cpu".
And on the question, when this will be fixed:
"About multi-core - sorry, I can't give you any estimates."
 
sflynn
just joined
Topic Author
Posts: 12
Joined: Sun Jan 13, 2008 4:52 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Thu Jan 24, 2008 9:02 pm

rkorolev wrote: "The only response you get from support will be 'turn off multi-cpu'."
... That is the response we received...
rkorolev wrote: "About multi-core - sorry, I can't give you any estimates."
... and yes, this was the response as well.


.......... Now on to further testing results ...........

The test results are in. We disabled the multi-CPU support on this device and loaded up the system. The test went very well! We were astonished at the results. We loaded up the system with 2,631 PPPoE sessions and 3,464 OSPF routes from two peer routers, while maintaining a 62% CPU load average for 30 minutes. The initial sessions ran for over 2 hours, with constant traffic running through the system.

View the attached graphs / screenshots for the proof!!!

[Attached images: four graphs / screenshots]

----------------------------------------------------------
I want to throw out a BIG thank you to Dennis Burgess with LinkTechs. LinkTechs is the hardware vendor for the PowerRouter 732. He worked with us during this testing process in our network maintenance window. I highly recommend this router appliance as a 1u, stable solution. Check them out for your next router needs. http://www.linktechs.net

Now if we could just get MikroTik to fix the SMP problems, this would be an excellent appliance for terminating PPPoE sessions in a load-balanced cluster solution.
----------------------------------------------------------
 
rkorolev
newbie
Posts: 43
Joined: Tue Oct 23, 2007 1:49 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Jan 25, 2008 8:16 am

sflynn wrote: "We loaded up the system with 2,631 PPPoE sessions, 3,464 OSPF routes from two peer routers, while maintaining a 62% CPU load average for 30 minutes."
Lucky man you are... We have a much heavier load, because we need shaping and filtering.
[Attached image]
 
bokili
Member Candidate
Posts: 135
Joined: Wed Aug 16, 2006 8:52 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Jan 25, 2008 9:30 am

sflynn wrote: "Well... We completed testing last night with PPPoE load. Using the steps listed above we moved over the traffic to 600+ users with CPU running an average of 20-23% utilization. We then moved over a big site (1000+ PPPoE session). The CPU was running good at 33% utilization, memory 90% free, ....

20min into it, approx 1600 PPPoE sessions, traffic throughput of 20-30mb (at 2:30AM that was good) ... the winbox and telnet dropped. Checked my rolling pings and I still had communication with my router interface, checked my SNMP monitoring system of switch interfaces and ALL traffic had flat-lined. ... ... oh boy... this is not good."
I don't buy this. You should:

Enable the Change MSS option in the PPP profiles. As you probably know, there are a lot of problems with PPP if you don't enable this.

You also have just 20-30 M of traffic for 1600 PPPoE connections. Try to get at least 100 M so we can see whether it runs stable : )

After this you should post your CPU usage. Also, please post statistics of LOST packets if you use V3!
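
For scale, a back-of-the-envelope sketch of what those totals mean per session. It assumes "M" means megabits per second, uses decimal units (1 Mbps = 1000 kbps), and spreads the traffic evenly over all sessions, which real traffic never does. The function name is made up for illustration.

```python
def avg_kbps_per_session(total_mbps: float, sessions: int) -> float:
    """Average throughput per session in kbps, assuming an even spread."""
    return total_mbps * 1000 / sessions


# 20 and 30 Mbps are the reported totals; 100 Mbps is the suggested target.
for total_mbps in (20, 30, 100):
    per_session = avg_kbps_per_session(total_mbps, 1600)
    print(f"{total_mbps} Mbps over 1600 sessions = {per_session} kbps each")
```

That works out to 12.5 to 18.75 kbps per session for the reported test, against 62.5 kbps per session at 100 Mbps.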
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Jan 25, 2008 5:25 pm

We pinged a good number of customers from remote off-net IPs and never lost more than a single packet. The loss was well under 1%, nothing like the 15+% that was reported.
 
bokili
Member Candidate
Posts: 135
Joined: Wed Aug 16, 2006 8:52 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sat Jan 26, 2008 3:26 am

gmsmstr wrote: "We pinged a good number of customers from remote off-net IPs and never lost more than a single packet. The loss was well under 1%. Nothing like what was reported of 15+% .."
Which RouterOS version are you running? How many PPPoE tunnels did you have active when you ran the ping test?
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sat Jan 26, 2008 6:48 pm

v3 at the time. 2600 tunnels at once.
 
bokili
Member Candidate
Posts: 135
Joined: Wed Aug 16, 2006 8:52 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sat Jan 26, 2008 6:52 pm

gmsmstr wrote: "v3 at the time. 2600 tunnels at once."
What was the bandwidth? How did you run the ping test?

What is the hardware configuration? Did you have CHANGE MSS enabled in the profiles?

Do you NAT your internet traffic or use public IPs? How many?
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sat Jan 26, 2008 8:29 pm

It was the middle of the night, about 30-35 meg total. All routed, public IPs, 3500 or so OSPF routes. PowerRouter 732 with only one core on, v3. Change TCP MSS was off. Most of that info is above. We could find no issues from turning Change TCP MSS off.

Ping tests were done from the router, and from two remote locations pinging customers' public IPs on their routers.
 
bokili
Member Candidate
Posts: 135
Joined: Wed Aug 16, 2006 8:52 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sat Jan 26, 2008 8:51 pm

gmsmstr wrote: "about 30-35 meg total.. All routed, public IPs, 3500 or so OSPF Routes. PowerRouter 732 with only one core on, v.3. Change TCP MSS was off."
This explains it. But there are a lot of issues with not using Change TCP MSS, so we cannot turn that option off.

Our packet-loss problem in V3 probably lies in Change TCP MSS. As I said, we cannot turn it off because of various problems. And there is no packet loss when using Change TCP MSS in 2.9.x, so there is a bug or something similar in V3.

Try using it and you will notice packet loss, even if you ping manually. The best tool for checking is SmokePing: http://oss.oetiker.ch/smokeping/
 
gmsmstr
Trainer
Posts: 982
Joined: Fri Jun 04, 2004 2:22 am
Location: St. Louis, MO

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Jan 27, 2008 12:11 am

It's not my network, but I will ask. Once we get the SMP thing fixed, the CPU time will go way down! We had less than 40% with dual core, but then the SMP bug showed up.
 
Ozelo
Member
Posts: 338
Joined: Fri Jun 02, 2006 3:56 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Thu Jan 31, 2008 4:27 pm

Aye, that makes a huge difference. We didn't get the lab in time (there's no lab) :) but I believe you're right. Surely you can handle a lot of PPPoE tunnels without the dynamic mangle rules for change-tcp-mss, since that is 2 rules per connection, plus no traffic shaping, no queue rules, nothing. In specific scenarios, though, this may be troublesome.

I would like to point out something:

Could you avoid the dynamic TCP MSS rules if you have a tidy network without a mix of MTUs from 1300 to 1492? Perhaps with a few static rules that group the change-tcp-mss handling? I guess you don't need TCP MSS rules at all if there is only one MTU setting for all applications, no matter how many tunnels.

What about VPNs that set the DF flag inside PPPoE tunnels, without change-tcp-mss or a strict MTU policy? Maybe that's not the target here.

Anyway, there is no doubt that, on the same setup with dynamic TCP MSS on, mangle rules and a QoS queue tree, there is a HUGE difference between ROS 2.9.x and 3.x in terms of processing data.
 
rkorolev
newbie
Posts: 43
Joined: Tue Oct 23, 2007 1:49 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Feb 01, 2008 8:31 am

I got a recommendation from MikroTik support to replace dynamic change-mss with static rules, and we now have these:
[admin@MT-0] /ip firewall mangle> print
Flags: X - disabled, I - invalid, D - dynamic
 0   chain=forward action=accept in-interface=ether1

 1   chain=forward action=accept out-interface=ether1

 2   chain=forward action=change-mss new-mss=1452 tcp-flags=syn protocol=tcp
     tcp-mss=1453-65535
Rules 0 and 1 should pass all packets via Ethernet without modification; rule 2 should match the other packets, those going via the PPP interfaces.
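
For anyone wondering where 1452 comes from, a minimal sketch of the arithmetic. It assumes plain IPv4 and TCP headers of 20 bytes each (no options) and the usual 8 bytes of PPPoE overhead on a 1500-byte Ethernet link. The function names are made up for illustration.

```python
IPV4_HEADER = 20     # bytes, IPv4 header without options
TCP_HEADER = 20      # bytes, TCP header without options
PPPOE_OVERHEAD = 8   # bytes: 6-byte PPPoE header + 2-byte PPP protocol field


def pppoe_mtu(ethernet_mtu: int = 1500) -> int:
    """Largest IP packet that fits in one Ethernet frame once wrapped in PPPoE."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(pppoe_mtu())           # PPPoE MTU on standard Ethernet
print(tcp_mss(pppoe_mtu()))  # MSS matching that MTU
print(tcp_mss(1480))         # MSS for the max-mtu=1480 in the config earlier in the thread
```

With a 1492-byte PPPoE MTU the MSS comes out at 1452, which matches `new-mss=1452` in rule 2. With `max-mtu=1480`, as in the server config posted earlier in the thread, the same arithmetic gives 1440.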
 
macgaiver
Forum Guru
Posts: 1764
Joined: Wed May 18, 2005 5:57 pm
Location: Sol III, Sol system, Sector 001, Alpha Quadrant

Re: Complete PPPoE Router Failure -- Looking for ideas...

Fri Feb 01, 2008 9:13 am

rkorolev wrote: "I got a recommendation from MikroTik support to replace dynamic change-mss with static rules, and we now have these:
[admin@MT-0] /ip firewall mangle> print
Flags: X - disabled, I - invalid, D - dynamic
 0   chain=forward action=accept in-interface=ether1

 1   chain=forward action=accept out-interface=ether1

 2   chain=forward action=change-mss new-mss=1452 tcp-flags=syn protocol=tcp
     tcp-mss=1453-65535
Rules 0 and 1 should pass all packets via Ethernet without modification; rule 2 should match the other packets, those going via the PPP interfaces."

I recommend using the MRRU (PPP Multilink Protocol) option and forgetting about change-mss rules altogether.
 
bokili
Member Candidate
Posts: 135
Joined: Wed Aug 16, 2006 8:52 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Feb 03, 2008 11:33 am

macgaiver wrote: "I recommend to use MRRU (PPP Multilink protocol) option and forget about Change-mss rules at all"
Why do you think this is better?
 
hci
Long time Member
Posts: 674
Joined: Fri May 28, 2004 5:10 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Mon Feb 04, 2008 7:37 am

bokili wrote: "This explain it. But there is a lot of issues by not using Change TCP MSS. So, we cannot turn that option off."
I've been running MikroTik as a PPPoE server for years, terminating 1200+ PPPoE users across a couple of different routers. I have always left the "Change TCP MSS" option OFF and have had no issues.

Matt
 
jcremin
Member
Posts: 360
Joined: Fri May 25, 2007 7:57 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Feb 10, 2008 5:00 am

So do you do anything special regarding MTU issues? I had a huge problem when I first started using PPPoE where users could get to some sites but not others. It didn't seem to matter what I set on the PPPoE server; nothing helped. I ended up manually setting every user's MTU to 1400 on the client end and haven't had a problem since.

Any suggestions?
 
hci
Long time Member
Posts: 674
Joined: Fri May 28, 2004 5:10 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Feb 10, 2008 5:07 am

jcremin wrote: "So do you do anything special regarding MTU issues? I had a huge problem when I first started using PPPoE where users could get to some sites but not others. Didn't seem to matter what I set on the PPPoE server, it didn't seem to help. I ended up manually setting every user's MTU to 1400 on the client end and haven't had a problem since.

Any suggestions?"
Do your users have NAT'ed IPs or public ones?

Matt
 
jcremin
Member
Posts: 360
Joined: Fri May 25, 2007 7:57 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Feb 10, 2008 6:01 pm

They are all NAT'd at the moment. Here's a simple breakdown:

                    /= Wired interface NOT using PPPoE: works fine
ISP ===== My Router -= Wireless interface NOT using PPPoE: works fine
                    \= Wireless interface using PPPoE: only works for certain pages unless MTU is set to the low 1400s
 
hci
Long time Member
Posts: 674
Joined: Fri May 28, 2004 5:10 pm

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Feb 10, 2008 8:15 pm

I wonder if it's due to the NAT. Can you assign a PPPoE user a public IP and see if they still have issues? The way it's supposed to work is that if either end of the connection tries to send, say, a 1500-byte packet, an ICMP packet is supposed to be sent back saying it's too big. Perhaps, because of the NAT, the ICMP packet does not get through.

I have never had these issues in years of using PPPoE, now with close to 1500 users, but I have always used public IPs. I have never used NAT.

Matt
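
The mechanism described above (path MTU discovery) can be sketched as a toy model. This is an illustration only, not how any particular router implements it: it assumes every packet has the DF flag set, that each hop either forwards a packet or reports its own MTU back over ICMP, and that a filtered ICMP reply leaves the sender stuck. The names `hop` and `discover` are made up for this example.

```python
from typing import Optional, Sequence


def hop(packet_size: int, df: bool, link_mtu: int) -> Optional[int]:
    """Return None if the packet passes this hop, otherwise the MTU that
    the hop reports in its ICMP 'fragmentation needed' message."""
    if packet_size <= link_mtu or not df:
        return None  # fits as-is, or may be fragmented
    return link_mtu


def discover(path_mtus: Sequence[int], start: int = 1500,
             icmp_delivered: bool = True) -> Optional[int]:
    """Return the packet size the sender settles on, or None if the ICMP
    replies never arrive and the sender keeps sending oversized packets."""
    size = start
    while True:
        blocked_at = None
        for mtu in path_mtus:
            if hop(size, df=True, link_mtu=mtu) is not None:
                blocked_at = mtu
                break
        if blocked_at is None:
            return size      # packet made it across every hop
        if not icmp_delivered:
            return None      # black hole: sender never learns the smaller MTU
        size = blocked_at    # sender shrinks to the reported MTU and retries


print(discover([1500, 1480, 1500]))                        # ICMP gets through
print(discover([1500, 1480, 1500], icmp_delivered=False))  # ICMP filtered
```

With ICMP delivered, the sender settles on 1480 for this example path. With ICMP filtered it never adapts, which would look much like the "some sites load, others don't" symptom. Whether NAT is actually what blocks the ICMP in this case is only the guess made in the post above.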
 
jcremin
Member
Posts: 360
Joined: Fri May 25, 2007 7:57 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Sun Feb 10, 2008 9:51 pm

I'll try that out. Most sites worked just fine, but Microsoft sites (microsoft.com, msn.com) and many banking sites wouldn't load at all until the MTU was lowered.

I'll try to report back with what I find out.

Thanks,
Joe
 
Ozelo
Member
Posts: 338
Joined: Fri Jun 02, 2006 3:56 am

Re: Complete PPPoE Router Failure -- Looking for ideas...

Tue Feb 12, 2008 11:39 am

The only problem with PPPoE that I haven't been able to solve yet involves customers who also access VPNs that set the DF flag inside a PPPoE connection.
