PCC issue/bug with 3 WAN connections

Everything is set up according to the Wiki example, but with 3 lines:
Three ADSL lines as WAN connections: WAN1 (wlan1), WAN2 (wlan2) and WAN3 (wlan3).

Because of authentication problems on some servers (e.g. banking, web mail), I use the src address as the only classifier.
Previously this worked with 2 WAN connections for some months, with few problem reports from users.
The third line was added one week ago.

add action=mark-connection chain=prerouting comment="PCC Both 3/0" disabled=no dst-address-type=!local in-interface=Local new-connection-mark=wlan1_conn passthrough=yes per-connection-classifier=src-address:3/0
add action=mark-connection chain=prerouting comment="PCC Both 3/1" disabled=no dst-address-type=!local in-interface=Local new-connection-mark=wlan2_conn passthrough=yes per-connection-classifier=src-address:3/1
add action=mark-connection chain=prerouting comment="PCC Both 3/2" disabled=no dst-address-type=!local in-interface=Local new-connection-mark=wlan3_conn passthrough=yes per-connection-classifier=src-address:3/2
add action=mark-routing chain=prerouting comment="" connection-mark=wlan1_conn disabled=no in-interface=Local new-routing-mark=to_wlan1 passthrough=no
add action=mark-routing chain=prerouting comment="" connection-mark=wlan2_conn disabled=no in-interface=Local new-routing-mark=to_wlan2 passthrough=no
add action=mark-routing chain=prerouting comment="" connection-mark=wlan3_conn disabled=no in-interface=Local new-routing-mark=to_wlan3 passthrough=no

I have one test PC set up and yes, on this one it is impossible to log in to Hotmail, but all other normal browsing and downloading works without any problem whatsoever.
When I make a mangle filter that forces this PC onto one routing mark, sending it down one of the WAN connections, the problem disappears! It doesn't matter which routing mark I give it. All three work fine.

add action=mark-connection chain=prerouting comment="Temp test 3-monitor PC!" disabled=yes in-interface=Local new-connection-mark=Pref_adsl_route passthrough=yes src-address=172.25.48.7
add action=mark-routing chain=prerouting comment="Temp test 3-monitor PC!" connection-mark=Pref_adsl_route disabled=no in-interface=Local new-routing-mark=to_wlan3 passthrough=no

(The first filter is set to disabled=yes or disabled=no as I need it for testing.)

Every time, I run a traceroute from that PC to see which route it takes.
For every session I restart my browser.

Since the routing decision is the result of the routing mark set in mangle, the only difference is the use of PCC.
If the routing mark is assigned by the direct filter (see the last rules), there is no problem. If the routing mark is assigned by the PCC 3/2 rule, browsing is OK but logging in to Hotmail is no longer possible.

The direct routing filter rules shown last are placed in front of the PCC filters. I can now easily reproduce the issue over and over by just switching this forced filter on or off. (On = no problem, off = no authentication.)

The PCC classifier classifies on src address only, so connections are grouped on that basis alone and all connections with the same src IP get the same gateway.
What happens is that after hitting the "login" button on the already secure (= https) website, the page takes forever to load, and after a while the browser reports a timeout. I tried both IE and Firefox, with the same result.

I also don't understand why another PC performing the same action (but with a different src address) does not have the same issue.
I have to test this second PC by giving it the same IP as the first PC (while I shut that one down!) and see if the problem can be reproduced. (There is no other way to make a given IP go through the same PCC filter as the other PC. The process uses the src IP, but how exactly I don't know, so I cannot force a given IP to be handled by one particular PCC filter out of the three.)

This issue has kept me busy for 2 days now, and I am forced to manually route all my clients (60+) through one or the other WAN connection. That is exactly what PCC should be doing for me.

I just tested the same issue on another PC. I gave that PC the same IP address as the one with the issue, and now the new PC has the issue too.

When I enable the "force routing mark for that IP" mangle filter, the problem is gone again.

It looks right. Hate to say it, but reboot the router. Then contact support via email.

OK, after three days of brainstorming I finally found the problem.
For those who are interested, I will explain.

My RB1000 also performs QoS on its traffic in the forward chain. For this there are some 30 rules in mangle, which come after the 20 or so prerouting filters.
This means the rules at the bottom of the list (in Winbox) are usually outside my visible window, and since my issue seemed to be all about the prerouting chain, I did not think they were worth looking at for this problem.

BUT, my 3 WANs are basically 3 PPPoE interfaces, and each PPPoE interface automatically generates two "forward" chain rules in mangle that change the MSS value. These rules are placed at the bottom ("last") of the window, so I never really looked at them. All I had noticed at one point was that they were not "counting", but hey, I have more rules that don't always count.

Then I slowly started to remember that, long ago in the history of my network, a tutor once told me about the importance of MTU, and that "strange" things start to happen when it is configured wrongly. So I started to look at these automatically added rules with more interest.
I realized that MSS is closely related to MTU, so my solution could be in these rules. BUT, they didn't work! Or rather, they didn't count!

It took me some hours to realize that these automatically generated rules didn't count because they were added as "last", and my QoS filters had already filtered out all the traffic before it could reach them!

So I manually moved these rules to the top ("first") of all my other "forward" chain rules, and suddenly they started to count (of course). More importantly, my problems disappeared!

To avoid these filters being put back as "last" each time a PPPoE interface goes down and comes back up, I copied them and made them static.
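
For anyone who wants to do the same, static copies of such MSS rules might look roughly like the sketch below. This is not copied from my config: the interface name pppoe-out1 is a placeholder, and the value 1452 assumes a PPPoE MTU of 1492 (1492 minus 20 bytes IP header and 20 bytes TCP header). Check the dynamic rules on your own router and copy their values instead.

```
# Sketch only: interface name and MSS value are assumptions, adjust to your setup.
# Clamp the MSS on TCP SYN packets leaving through the PPPoE interface.
add action=change-mss chain=forward comment="static MSS clamp out" disabled=no new-mss=1452 out-interface=pppoe-out1 passthrough=yes protocol=tcp tcp-flags=syn tcp-mss=1453-65535
# Same for TCP SYN packets arriving from the PPPoE interface.
add action=change-mss chain=forward comment="static MSS clamp in" disabled=no in-interface=pppoe-out1 new-mss=1452 passthrough=yes protocol=tcp tcp-flags=syn tcp-mss=1453-65535
```

One pair like this would be needed per PPPoE interface, and the rules have to sit above any forward chain rules that use passthrough=no, otherwise they will never see the traffic. With passthrough=yes the QoS rules below them still get to process the same packets.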

The system has been running fine since last Saturday without any more problems. What a relief!


P.S.
An extra complication was that I use ADSL from two different ISPs, and at first only two of the lines were PPPoE, while the other one was simply routed towards the ADSL router.

This topic can be considered closed, but I hope someone will learn from it. At least I did!