Help with multiwan

Hello,
I would really appreciate some advice on why the following multiwan setup doesn’t work.
It is PCC, based on https://mum.mikrotik.com/presentations/US12/steve.pdf
By doesn’t work I mean that HTTP connections on the browser spin forever.
Confusingly enough though pings do work.

Here is the data for my setup:
The device is a CRS125-24G-1S and RouterOS level is 6.42.3
The device has 24 ports, ether1 up to ether22 are bridged under ‘bridge1’ (10.20.30.0)
and ‘ether23’ and ‘ether24’ are the interfaces to the WAN uplinks with networks 192.168.1.0 and 192.168.10.0
‘ether23’ and ‘ether24’ have static IP addresses.

‘ether23’ talks directly with a wireless antenna (whose peer is connected to the internet)
and I mark this as ‘noWire’ (and ‘via-noWire’ for the routing) in the mangle rules
while ‘ether24’ is connected to the LAN side of a Speedport 2i ADSL modem/router
and I mark this as ‘OTE’ (and ‘via-OTE’ for the routing).

The router is the DHCP server for the LAN (‘bridge1’) and instructs clients to use itself as the DNS server.
The router itself then uses 8.8.8.8 as the upstream DNS server. I configured it like this to take out of the picture
possible issues with name resolving because of the multiwan setup.


When I disable PCC and test each uplink individually everything works as expected.


MANGLE

chain=prerouting action=accept dst-address=192.168.1.0/24 in-interface=bridge1 log=no log-prefix="" 
chain=prerouting action=accept dst-address=192.168.10.0/24 in-interface=bridge1 log=no log-prefix="" 
chain=prerouting action=mark-connection new-connection-mark=noWire passthrough=yes connection-mark=no-mark in-interface=ether23 
chain=prerouting action=mark-connection new-connection-mark=OTE passthrough=yes connection-mark=no-mark in-interface=ether24 
chain=prerouting action=mark-connection new-connection-mark=noWire passthrough=yes dst-address-type=!local connection-mark=no-mark in-interface=bridge1 
      per-connection-classifier=src-address:2/0 log=no log-prefix="" 
chain=prerouting action=mark-connection new-connection-mark=OTE passthrough=yes dst-address-type=!local connection-mark=no-mark in-interface=bridge1 
      per-connection-classifier=src-address:2/1 log=no log-prefix="" 
chain=prerouting action=mark-routing new-routing-mark=via-noWire passthrough=yes connection-mark=noWire in-interface=bridge1 log=no log-prefix="" 
chain=prerouting action=mark-routing new-routing-mark=via-OTE passthrough=yes connection-mark=OTE in-interface=bridge1 log=no log-prefix="" 
chain=output action=mark-routing new-routing-mark=via-noWire passthrough=yes connection-mark=noWire log=no log-prefix="" 
chain=output action=mark-routing new-routing-mark=via-OTE passthrough=yes connection-mark=OTE log=no log-prefix=""

FILTER

chain=input action=drop connection-state=invalid log=no log-prefix="" 
chain=input action=accept protocol=icmp log=no log-prefix="" 
chain=input action=accept connection-state=established,related 
chain=input action=drop in-interface-list=iflist-WAN log=no log-prefix="" 
chain=forward action=drop connection-state=invalid in-interface-list=iflist-WAN log=no log-prefix="" 
chain=forward action=drop connection-state=new connection-nat-state=!dstnat in-interface-list=iflist-WAN log=no log-prefix="" 
chain=forward action=drop src-address-list=bogons in-interface-list=iflist-WAN log=no log-prefix="" 
chain=forward action=drop src-address-list=!myNetworks dst-address-list=!myNetworks log=yes log-prefix="THREAT:FORWARD_FOREIGN_TRAFFIC"

ROUTES

DST-ADDRESS        PREF-SRC        GATEWAY                   DISTANCE
0.0.0.0/0                          192.168.1.1               1
0.0.0.0/0                          192.168.10.254            1
0.0.0.0/0                          192.168.1.1               1
0.0.0.0/0                          192.168.10.254            2
10.20.30.0/24      10.20.30.1      bridge1                   0
10.20.31.0/24      10.20.31.1      vlan10                    0
192.168.1.0/24     192.168.1.190   ether23                   0
192.168.10.0/24    192.168.10.15   ether24                   0

Although it doesn’t show here, 2 of the 4 default routes have the ‘via-noWire’ and ‘via-OTE’ routing marks, as they should.
Also, ‘vlan10’ is irrelevant to this discussion.

Can anyone share some knowledge for this setup?

The mangle & filter rules you’ve posted seem OK to me. Maybe there is something in the other settings; can you post the complete configuration (/export hide-sensitive)?

Sure I will but tomorrow when I’ll have access to the router again. Thanx!

Below is the full device setup. I think the configuration is OK but it would be great if you could confirm that.
So if the config is OK where should I look for the source of the problem?


# jun/25/2018 09:42:10 by RouterOS 6.42.3
# software id = C4HE-Y0RN
#
# model = CRS125-24G-1S
# serial number = 123456789012
/interface bridge
add admin-mac=E4:8D:8C:D1:63:38 auto-mac=no name=bridge1 protocol-mode=none
/interface ethernet
set [ find default-name=ether1 ] name=ether1-master
/interface vlan
add interface=bridge1 name=vlan10 vlan-id=10
/interface list
add name=iflist-WAN
/interface wireless security-profiles
set [ find default=yes ] supplicant-identity=MikroTik
/ip pool
add name=pool-lan ranges=10.20.30.10-10.20.30.254
/ip dhcp-server
add address-pool=pool-lan disabled=no interface=bridge1 lease-script=dhcp2dns lease-time=8h name=dhcpd-lan
/interface bridge port
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
add bridge=bridge1 interface=ether5
add bridge=bridge1 interface=ether6
add bridge=bridge1 interface=ether7
add bridge=bridge1 interface=ether8
add bridge=bridge1 interface=ether9
add bridge=bridge1 interface=ether10
add bridge=bridge1 interface=ether11
add bridge=bridge1 interface=ether12
add bridge=bridge1 interface=ether13
add bridge=bridge1 interface=ether14
add bridge=bridge1 interface=ether15
add bridge=bridge1 interface=ether16
add bridge=bridge1 interface=ether17
add bridge=bridge1 interface=ether18
add bridge=bridge1 interface=ether19
add bridge=bridge1 interface=ether20
add bridge=bridge1 interface=ether21
add bridge=bridge1 interface=ether22
add bridge=bridge1 interface=ether1-master
add bridge=bridge1 interface=vlan10
/interface list member
add interface=ether23 list=iflist-WAN
add interface=ether24 list=iflist-WAN
/ip address
add address=10.20.30.1/24 comment=defconf interface=bridge1 network=10.20.30.0
add address=10.20.31.1/24 comment="For Gitlab Docker container" interface=vlan10 network=10.20.31.0
add address=192.168.1.190/24 comment="interface to noWire " interface=ether23 network=192.168.1.0
add address=192.168.10.15/24 comment="interface to OTE" interface=ether24 network=192.168.10.0
/ip dhcp-client
add dhcp-options=clientid,hostname interface=ether23
add dhcp-options=clientid,hostname interface=ether24
/ip dhcp-server network
add address=10.20.30.0/24 dns-server=10.20.30.1 domain=my.domain-name.com gateway=10.20.30.1 netmask=24
/ip dns
set allow-remote-requests=yes servers=8.8.8.8
/ip dns static

/ip firewall address-list
add address=0.0.0.0/8 comment="Self-Identification [RFC 3330]" list=bogons
add address=127.0.0.0/8 comment="Loopback [RFC 3330]" list=bogons
add address=169.254.0.0/16 comment="Link Local [RFC 3330]" list=bogons
add address=172.16.0.0/12 comment="Private[RFC 1918] - CLASS B # Check if you need this subnet before enable it" disabled=yes list=bogons
add address=192.0.2.0/24 comment="Reserved - IANA - TestNet1" list=bogons
add address=192.168.0.0/16 comment="Private[RFC 1918] - CLASS C # Check if you need this subnet before enable it" disabled=yes list=bogons
add address=192.88.99.0/24 comment="6to4 Relay Anycast [RFC 3068]" list=bogons
add address=198.18.0.0/15 comment="NIDB Testing" list=bogons
add address=198.51.100.0/24 comment="Reserved - IANA - TestNet2" list=bogons
add address=203.0.113.0/24 comment="Reserved - IANA - TestNet3" list=bogons
add address=224.0.0.0/4 comment="MC, Class D, IANA # Check if you need this subnet before enable it" disabled=yes list=bogons
add address=100.64.0.0/10 comment=RFC6890 list=bogons
add address=192.0.0.0/24 comment=RFC6890 list=bogons
add address=240.0.0.0/4 comment=RFC6890 list=bogons
add address=10.20.0.0/16 list=myNetworks
/ip firewall filter
add action=drop chain=input comment="Drop invalid connections" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMP" disabled=yes protocol=icmp
add action=accept chain=input comment="defconf: accept established,related" connection-state=established,related
add action=drop chain=input comment="defconf: drop all from WAN" in-interface-list=iflist-WAN
add action=drop chain=forward comment="defconf: drop invalid" connection-state=invalid in-interface-list=iflist-WAN
add action=drop chain=forward comment="defconf:  drop all from WAN not DSTNATed" connection-nat-state=!dstnat connection-state=new in-interface-list=iflist-WAN
add action=drop chain=forward comment="Drop all packets from public internet which should not exist in public network" in-interface-list=iflist-WAN src-address-list=bogons
add action=drop chain=forward comment="Drop forwarding of foreign traffic (when both src and destination are valid public IP addresses)." disabled=yes dst-address-list=!myNetworks log=yes log-prefix=\
    THREAT:FORWARD_FOREIGN_TRAFFIC src-address-list=!myNetworks
/ip firewall mangle
add action=accept chain=prerouting dst-address=192.168.1.0/24 in-interface=bridge1
add action=accept chain=prerouting dst-address=192.168.10.0/24 in-interface=bridge1
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether23 new-connection-mark=noWire passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=ether24 new-connection-mark=OTE passthrough=yes
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge1 new-connection-mark=noWire passthrough=yes per-connection-classifier=src-address:2/0
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge1 new-connection-mark=OTE passthrough=yes per-connection-classifier=src-address:2/1
add action=mark-routing chain=prerouting connection-mark=noWire in-interface=bridge1 new-routing-mark=via-noWire passthrough=yes
add action=mark-routing chain=prerouting connection-mark=OTE in-interface=bridge1 new-routing-mark=via-OTE passthrough=yes
add action=mark-routing chain=output connection-mark=noWire new-routing-mark=via-noWire passthrough=yes
add action=mark-routing chain=output connection-mark=OTE new-routing-mark=via-OTE passthrough=yes
/ip firewall nat
add action=masquerade chain=srcnat out-interface=ether23
add action=masquerade chain=srcnat out-interface=ether24
/ip route
add check-gateway=ping disabled=yes distance=1 gateway=192.168.1.1 routing-mark=via-noWire
add check-gateway=ping disabled=yes distance=1 gateway=192.168.10.254 routing-mark=via-OTE
add distance=1 gateway=192.168.1.1
add distance=2 gateway=192.168.10.254
/ip service
set telnet disabled=yes
set ftp disabled=yes
set www address=10.20.30.0/24
set ssh disabled=yes
set api disabled=yes
set winbox address=10.20.30.0/24
set api-ssl disabled=yes
/lcd
set backlight-timeout=never
/lcd interface
add interface=bridge1
/lcd interface pages
set 2 interfaces=sfp1
/system clock
set time-zone-name=Europe/Athens
/system identity
set name=router-name
/system ntp client
set primary-ntp=195.167.30.249 secondary-ntp=193.239.214.226
/system ntp server
set enabled=yes
/system routerboard settings
set silent-boot=no
/system script
add name=dhcp2dns owner=admin policy=read,write source=":local DHCPtag\r\
    \n:set DHCPtag \"#DHCP\"\r\
    \n\r\
    \n:if ( [ :len \$leaseActIP ] <= 0 ) do={ :error \"empty lease address\" }\r\
    \n\r\
    \n:if ( \$leaseBound = 1 ) do=\\\r\
    \n{\r\
    \n  :local ttl\r\
    \n  :local domain\r\
    \n  :local hostname\r\
    \n  :local fqdn\r\
    \n  :local leaseId\r\
    \n  :local comment\r\
    \n\r\
    \n  /ip dhcp-server\r\
    \n  :set ttl [ get [ find name=\$leaseServerName ] lease-time ]\r\
    \n  network \r\
    \n  :set domain [ get [ find \$leaseActIP in address ] domain ]\r\
    \n  \r\
    \n  .. lease\r\
    \n  :set leaseId [ find address=\$leaseActIP ]\r\
    \n\r\
    \n# Check for multiple active leases for the same IP address. It's weird and it shouldn't be, but just in case.\r\
    \n\r\
    \n  :if ( [ :len \$leaseId ] != 1) do=\\\r\
    \n  {\r\
    \n   :log info \"DHCP2DNS: not registering domain name for address \$leaseActIP because of multiple active leases for \$leaseActIP\"\r\
    \n   :error \"multiple active leases for \$leaseActIP\"\r\
    \n  }  \r\
    \n\r\
    \n  :set hostname [ get \$leaseId host-name ]\r\
    \n  :set comment [ get \$leaseId comment ]\r\
    \n  /\r\
    \n\r\
    \n  :if ( [ :len \$hostname ] <= 0 ) do={ :set hostname \$comment }\r\
    \n\r\
    \n  :if ( [ :len \$hostname ] <= 0 ) do=\\\r\
    \n  {\r\
    \n    :log error \"DHCP2DNS: not registering domain name for address \$leaseActIP because of empty lease host-name or comment\"\r\
    \n    :error \"empty lease host-name or comment\"\r\
    \n  }\r\
    \n  :if ( [ :len \$domain ] <= 0 ) do=\\\r\
    \n  {\r\
    \n    :log error \"DHCP2DNS: not registering domain name for address \$leaseActIP because of empty network domain name\"\r\
    \n    :error \"empty network domain name\"\r\
    \n  }\r\
    \n\r\
    \n  :set fqdn \"\$hostname.\$domain\"\r\
    \n  \r\
    \n  /ip dns static\r\
    \n  :if ( [ :len [ find name=\$fqdn and address=\$leaseActIP and disabled=no ] ] = 0 ) do=\\\r\
    \n  {\r\
    \n    :log info \"DHCP2DNS: registering static domain name \$fqdn for address \$leaseActIP with ttl \$ttl\"\r\
    \n    add address=\$leaseActIP name=\$fqdn ttl=\$ttl comment=\$DHCPtag disabled=no\r\
    \n  } else=\\\r\
    \n  {\r\
    \n    :log error \"DHCP2DNS: not registering domain name \$fqdn for address \$leaseActIP because of existing active static DNS entry with this name or address\" \r\
    \n  }\r\
    \n  /\r\
    \n} \\\r\
    \nelse=\\\r\
    \n{\r\
    \n  /ip dns static\r\
    \n  :local dnsDhcpId \r\
    \n  :set dnsDhcpId [ find address=\$leaseActIP and comment=\$DHCPtag ]\r\
    \n\r\
    \n  :if ( [ :len \$dnsDhcpId ] > 0 ) do=\\\r\
    \n  {\r\
    \n    :log info \"DHCP2DNS: removing static domain name(s) for address \$leaseActIP\"\r\
    \n    remove \$dnsDhcpId\r\
    \n  }\r\
    \n  /\r\
    \n}"
/tool graphing
set store-every=hour
/tool graphing interface
add allow-address=10.20.30.0/24 store-on-disk=no
/tool graphing resource
add allow-address=10.20.30.0/24 store-on-disk=no

I still cannot see anything wrong. Just a blind shot as I cannot safely test it anywhere at the moment, try to remove the check-gateway=ping from the routing-marked routes.

Hi again, removing the ping check didn’t resolve the issue.
I’ve tried disabling one of the two ports that are connected to the uplink(s) (that’s ‘ether23’ and ‘ether24’)
while having all other PCC rules enabled and it works. It is only when both of them are active that things go wrong.
Also take a look at the screenshot, the accept mangle rule for ‘ether24’ has constantly zero traffic, isn’t that weird?


Other than that I am out of ideas. :frowning:

It seems to me that the overall topology must have some effect on this. Could it be that the paths through the two WANs merge later in the network, so that the response from the HTTP server is routed back through a different WAN than the one your request left through?

The fact that the second “accept” rule doesn’t count any packets should be easily explainable by no packets matching its dst-address condition, so nothing to do with the PCC operation.

Well I can’t say if the WANs unite later.
They are two independent ISPs , one is the major telco in Greece (OTE) which provides us with an ADSL connection
and the other a local WISP which eventually (AFAIK) uses a pool of DSL connections to serve their clients. I am sure that
their pool contains ADSL connections from OTE but would that be a problem?

I wonder if my firewall rule(s) for which I’ve used an interface list with both WAN interfaces as members is causing any issues (?)

If the second ISP is a client of the first ISP, it could have some impact but it is unlikely that it would be so systematic.

Please enable the PCC configuration which does not work, configure /tool sniffer to sniff on both WAN interfaces simultaneously into a file, start it and try to establish a http connection to some website, let the circle roll for a while, and then stop the sniffer, download the file and publish it somewhere for analysis. As both your WAN addresses are private ones you shouldn’t need anonymization unless some services you use are running over plaintext protocols.
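A minimal sketch of such a capture, assuming the interface names from the export above; the file name multiwan-pcc.pcap is only a placeholder, and any other sniffer filters are assumed to be empty:

```
/tool sniffer
# capture on both WAN uplinks at once and write the packets to a file
set filter-interface=ether23,ether24 file-name=multiwan-pcc.pcap file-limit=4096KiB
start
# ... reproduce the hanging HTTP connection from a LAN client, then:
stop
```

The resulting file shows up under /file and can be downloaded and opened in Wireshark.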

Hi,
I’ve uploaded a capture file here: https://we.tl/IqYEQvyD0p
I definitely saw the browser circle spinning when trying to visit https://gmail.com

Cheers,
Vagelis

The SYN packets do get to the remote servers, as the remote servers respond with SYN,ACK, but the responses do not reach the clients, as the clients send the SYN packets again. Apply e.g. the display filter ip.addr == 212.205.126.11 to see this.

On the other hand some TCP sessions do pass through completely.

So try to sniff again with bridge1 also in the interface list. I’d like to know whether the 'Tik forwards the response packets from WAN to LAN or not. The firewall rules seem not to interfere but maybe they do.

Ok, I uploaded another file at https://we.tl/nZk74zlNHz
This time I changed the rules (sniffer filtering) to capture from my personal workstation IP only.

This time there were infinite browser spins at https://gmail.com and https://www.dropbox.com
while https://wetransfer.com responded with some content (not sure if that was cached in my browser)
but the browser spun forever as well.
BTW I hope I didn’t mess with the experiment but I had changed the PCC rule to ‘both-addresses’
instead of ‘src-address’ since yesterday and forgot to mention that to you.
The rest of the configuration is the same for sure.

The mistake was limiting the sniffer filter to the workstation IP alone: that way the NATed packets on the WAN interfaces are not captured, so you cannot see each packet “before” and “after” NAT. The change of the PCC hash source doesn’t matter. So please try again with filter-ip-address=your.workstation.ip,wan.23.ip,wan.24.ip.

Hi again,

i did another capture as you said.
As before, I visited gmail, dropbox and wetransfer (via HTTPS) and the browser spun endlessly on all of them.
Here’s the capture: https://we.tl/DsR5qstvYU
Many thanx for persisting on it!

Cheers,
Vagelis

I don’t get it - this capture contains packets from both WANs but none from the LAN (including the LAN client → internet server ones). What does /tool sniffer print show?

Other than that, while the dropbox session could not set up at all (the SYN,ACK responses probably haven’t reached the client), the wetransfer sessions worked bidirectionally for a while and then simply hung (the client has acknowledged a packet from the server and no more packets from the server came).

Maybe you could try with everything set up as for PCC, but instead of the two PCC rules use a single rule that looks exactly like a PCC one, minus the per-connection-classifier itself, and assigns one of the connection marks you use. Test first with one mark and then with the other. That way the whole policy routing mechanism is tested, except that the traffic uses one WAN all the time.
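As a rough sketch, the stand-in rule could look like this, assuming the two PCC rules are disabled first and the new rule sits in their position, ahead of the mark-routing rules; the comment text is just a label I made up:

```
/ip firewall mangle
# stand-in for the two PCC rules: same matchers, no per-connection-classifier,
# so every new LAN connection gets the same mark (test noWire first, then OTE)
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-type=!local in-interface=bridge1 new-connection-mark=noWire passthrough=yes comment="test: force noWire"
```

For the second test, change new-connection-mark to OTE and clear the existing connections (or wait for them to time out) so that old marks do not linger.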

This should tell us whether there is an issue in the policy routing implementation or whether the remote servers have a problem with something.

Hi,
I replaced the two rules with a single rule as you said and kept all other PCC machinery in place.
When setting the ‘noWire’ mark I could browse OK. With marking as ‘OTE’ nothing worked.
In both those cases though I couldn’t access an internal server
(which is at another subnet, our LAN is 10.20.30.0/24 and the server at 10.20.31.2)
Here are the two pcaps: https://we.tl/YsTvH2sZXD

Cheers,
Vagelis

The problem with access to the internal server is the easier part. You must either exclude destination addresses of local subnets from policy routing or you must add also routing-marked routes with these subnets as dst-address. Only routing-marked routes are considered when routing routing-marked packets, and as no routing-marked routes for dst-address=10.20.30.0/24 and dst-address=10.20.31.0/24 exist, packets for these subnets use routing-marked default routes. So the requests from your clients to server in 10.20.31.0/24 do not get to the server because packets coming from bridge1 are routing-marked so they get routed out to the WAN gateways. The responses of the server come from vlan10 and thus they would not be routing-marked, but there are currently no responses as no requests reach the server :wink:

So you should add another match condition to the action=mark-routing rules - dst-address=!10.20.30.0/23.
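For illustration, the two prerouting mark-routing rules from the export would then read roughly as follows (a sketch; everything except the added dst-address condition is copied from the posted configuration):

```
/ip firewall mangle
# 10.20.30.0/23 covers both 10.20.30.0/24 (bridge1) and 10.20.31.0/24 (vlan10),
# so traffic between the local subnets keeps using the main routing table
add action=mark-routing chain=prerouting connection-mark=noWire dst-address=!10.20.30.0/23 in-interface=bridge1 new-routing-mark=via-noWire passthrough=yes
add action=mark-routing chain=prerouting connection-mark=OTE dst-address=!10.20.30.0/23 in-interface=bridge1 new-routing-mark=via-OTE passthrough=yes
```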

Plus, if the server in 10.20.31.0/24 should also access the internet using policy routing, you have to create an interface list with members bridge1 and vlan10 and refer to that interface list in the action=mark-routing rules, which currently refer just to in-interface=bridge1.
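A sketch of that variant, assuming a new interface list called iflist-LAN (a name chosen here for illustration, it does not exist in the posted configuration):

```
/interface list
add name=iflist-LAN
/interface list member
add interface=bridge1 list=iflist-LAN
add interface=vlan10 list=iflist-LAN
/ip firewall mangle
# same mark-routing rules, but matching on the list instead of bridge1 alone
add action=mark-routing chain=prerouting connection-mark=noWire dst-address=!10.20.30.0/23 in-interface-list=iflist-LAN new-routing-mark=via-noWire passthrough=yes
add action=mark-routing chain=prerouting connection-mark=OTE dst-address=!10.20.30.0/23 in-interface-list=iflist-LAN new-routing-mark=via-OTE passthrough=yes
```

Presumably the mark-connection rules that match in-interface=bridge1 would need the same change for connections started from vlan10 to get a mark at all; that part is my assumption and not something tested here.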

But in the sniffs you’ve posted no addresses from 10.20.30.0/23 exist again, so once again, can you post the output of /tool sniffer print?

Hi,
for the unreachable LAN issue I was under the impression that the existing PCC rules already exclude local destination addresses:

chain=prerouting action=mark-connection new-connection-mark=noWire passthrough=yes dst-address-type=!local connection-mark=no-mark in-interface=bridge1

Isn’t that the case?

I am sorry, I forgot the tool sniff print, here it is:

only-headers: no
memory-limit: 100KiB
memory-scroll: yes
file-name: multiwan-sniff-noWire.txt
file-limit: 4096KiB
streaming-enabled: no
streaming-server: 0.0.0.0
filter-stream: no
filter-interface: ether23,ether24
filter-mac-address: 
filter-mac-protocol: 
filter-ip-address: 10.20.30.235/32,192.168.1.190/32,192.168.10.15/32
filter-ipv6-address: 
filter-ip-protocol: 
filter-port: 
filter-cpu: 
filter-direction: any
filter-operator-between-entries: or
running: no

Unfortunately it isn’t, because address-type=local doesn’t match whole subnets that contain the Mikrotik’s own addresses, only the Mikrotik’s own addresses themselves.
You’re not the only one to fall into this trap; the explanation on the wiki is accurate, it is just that wishful thinking makes us read it differently :slight_smile:
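To make this concrete: with the addresses from the export, dst-address-type=local matches only 10.20.30.1, 10.20.31.1, 192.168.1.190 and 192.168.10.15, not 10.20.31.2. One way to exclude the whole local subnets already at the connection-marking stage is an address list; the sketch below assumes a list named localNets, which is a made-up name and not part of the posted configuration:

```
/ip firewall address-list
add address=10.20.30.0/24 list=localNets
add address=10.20.31.0/24 list=localNets
/ip firewall mangle
# PCC rules as posted, plus dst-address-list=!localNets so that
# LAN-to-LAN connections are never marked for a WAN
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!localNets dst-address-type=!local in-interface=bridge1 new-connection-mark=noWire passthrough=yes per-connection-classifier=src-address:2/0
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!localNets dst-address-type=!local in-interface=bridge1 new-connection-mark=OTE passthrough=yes per-connection-classifier=src-address:2/1
```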


I wasn’t clear enough - you have added the filter-ip-address value properly but I haven’t explicitly stated that it should be instead of, not in addition to, the filter-interface value. So the filter-ip-address condition did match the packets with your PC’s IP address, but the filter-interface didn’t because these packets only exist on bridge1 which is not permitted by the filter.
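In other words, something along these lines, using the addresses from your sniffer print; filter-interface=all is, as far as I know, the default value and lifts the interface restriction:

```
/tool sniffer
# match by IP address only, so the packets are seen on bridge1 as well as on the WANs
set filter-interface=all filter-ip-address=10.20.30.235/32,192.168.1.190/32,192.168.10.15/32
start
# ... reproduce the problem, then:
stop
```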