WAN Failover with Cradlepoint

Following up on a similar thread here: http://forum.mikrotik.com/t/dual-wan-failover-help-and-advice-needed/135444/1

I’m setting up a CCR in a WAN failover configuration with an Ethernet connection to upstream internet and a Cradlepoint router with a Verizon USB stick attached for failover. All of the failover decisions will be made in the CCR with the /ip route commands:

/ip route
add check-gateway=ping distance=1 gateway=172.30.48.49
add distance=2 gateway=192.168.35.1

I go into the terminal of the CCR and start a ping session to 8.8.8.8, and all is happy. When I disconnect the first gateway, the distance=1 route goes unreachable and the distance=2 route becomes active. After about 15 seconds of timeouts the pings resume, albeit with higher response times, as the primary route is a fiber optic backbone and the secondary route is LTE through the Cradlepoint.

From the LAN side, while connected to the primary network, the endpoints ping outside IPs without fail. When on the LTE through the Cradlepoint, pings to the outside from the LAN (bridge) fail. The LAN endpoint is assigned an IP in the 192.168.1.x range with a default GW of 192.168.1.250.
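
One way to separate the two cases from the router itself, offered as a hedged sketch: a plain terminal ping is normally sourced from the address of whichever interface the active route uses, so it does not exercise the LAN-side path. Forcing the source to the bridge address from the posted config (192.168.1.250) should make the router's own ping go through the srcnat rules much as LAN traffic does. The `count` value is arbitrary.

```
# Default terminal ping: sourced from the active WAN-side address
/ping 8.8.8.8 count=5

# Same ping sourced from the LAN bridge address, so it has to be NATed on the way out
/ping 8.8.8.8 src-address=192.168.1.250 count=5

# Confirm which default route is currently active
/ip route print where dst-address=0.0.0.0/0
```

If the first ping works on LTE and the second does not, the problem is more likely in NAT than in routing.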

/interface bridge port
add bridge=bridge1 hw=no interface=ether5
add bridge=bridge1 hw=no interface=sfp-sfpplus1
add bridge=bridge1 interface=ether1
…. [Some EoIP tunnels]….
add bridge=bridge1 interface=ether2
add bridge=bridge1 interface=ether3
add bridge=bridge1 interface=ether4
/interface list member
add interface=bonding1 list=WAN
add interface=ether6 list=WAN
add interface=bridge1 list=LAN
/ip address
add address=172.30.48.50/28 comment="Outside IP" interface=bonding1 \
    network=172.30.48.48
add address=192.168.1.250/24 comment="Inside IP" interface=bridge1 \
    network=192.168.1.0
add address=192.168.35.2/24 comment="LTE BACKUP" interface=ether6 network=\
    192.168.35.0

Here’s the NAT -

/ip firewall nat
add action=dst-nat chain=dstnat comment="1:1 TEAM ROUTER NETMAP" disabled=yes \
    dst-address=172.30.48.60 to-addresses=192.168.35.2
add action=src-nat chain=srcnat disabled=yes src-address=192.168.35.1 \
    to-addresses=172.30.48.60
# The two rules above are disabled. I enabled them as a test and it still was not working.
add action=masquerade chain=srcnat comment=\
    "Can be used to NAT through One IP" out-interface-list=WAN

And finally the firewall rules - yes, the drops are disabled. It's not like that in production, only in testing mode.

/ip firewall filter
add action=drop chain=input comment="Drop Invalid Connections" \
    connection-state=invalid disabled=yes
add action=drop chain=forward comment="Drop Invalid Connections" \
    connection-state=invalid disabled=yes
add action=accept chain=input comment="Allow Connections From LAN (DNS,..)" \
    src-address=192.168.1.0/24
add action=accept chain=input src-address=192.168.35.0/24
add action=accept chain=input comment="Allow Established Connections" \
    connection-state=established,related
......some miscellaneous services not relevant here ......
add action=drop chain=input comment="Drop Remaining Inputs" disabled=yes log=\
    yes
add action=accept chain=forward src-address=192.168.1.0/24
add action=accept chain=forward src-address=192.168.35.0/24
add action=accept chain=forward comment="Allow Team Router WAN out network" \
    in-interface=ether6 out-interface=bonding1
add action=accept chain=forward comment=\
    "Restrict New Connections to being sourced from LAN only" \
    connection-state=new src-address=192.168.1.0/24
add action=accept chain=forward comment=\
    "Restrict New Connections to being sourced from LAN only" \
    connection-state=new src-address=192.168.35.0/24
add action=accept chain=forward comment="Allow Related Connections" \
    connection-state=related
add action=accept chain=forward comment="Allow Established Connections" \
    connection-state=established
add action=drop chain=forward comment="Drop Remaining Forward Chain" \
    disabled=yes log=yes

Instead of using the interface list on the NAT statement, try creating two NAT rules with explicit out-interface statements: one for the fiber WAN and one for the LTE interface.
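
That suggestion could look roughly like the following. It is a sketch that reuses the interface names from the posted config (bonding1 for the fiber WAN, ether6 for the Cradlepoint link). The comment strings are made up for illustration, and the existing masquerade rule that uses out-interface-list=WAN would be disabled or removed first.

```
/ip firewall nat
# Masquerade LAN traffic leaving via the fiber WAN
add action=masquerade chain=srcnat comment="NAT via fiber WAN" \
    out-interface=bonding1
# Masquerade LAN traffic leaving via the LTE/Cradlepoint link
add action=masquerade chain=srcnat comment="NAT via LTE backup" \
    out-interface=ether6
```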

I’ve done a similar setup numerous times without issues switching from fiber to LTE on the private LAN side.

I split out the NATs as you said, so instead of one rule using the interface list I now have two NAT rules - one for the WAN side, the other for the LTE/Cradlepoint. Still no joy from the CCR LAN.

Looking at the Cradlepoint, I tried a ping from the LTE local subnet (192.168.35.x) to the CCR LAN (192.168.1.x) and it failed. So I added a static route to the LAN IP of the CCR. Still no go.

After capturing packets on the Cradlepoint LTE, I found the issue: the outbound packets from the CCR LAN going out the Ethernet interface to the Cradlepoint carry the WAN IP of the interface going to the fiber WAN as their source address, instead of the IP of the interface facing the Cradlepoint. The Cradlepoint doesn’t know what to do with these packets, so it drops them.
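
To look at the same thing from the CCR side, here is a rough sketch, not a confirmed fix. It sniffs ICMP on ether6 (the Cradlepoint-facing interface in the posted config) to show the source address actually leaving that port, then lists connection-tracking entries from the LAN. One possible cause, which is an assumption and not something established in this thread, is that connections opened while the fiber route was active keep their original NAT mapping after failover. If that is what is happening, removing those entries should let new packets be NATed for the active route.

```
# Watch ICMP leaving toward the Cradlepoint and check the source address
/tool sniffer quick interface=ether6 ip-protocol=icmp

# List tracked connections originating from the LAN subnet
# (loose prefix match; the dots act as regex wildcards)
/ip firewall connection print where src-address~"^192.168.1."

# Assumption: stale entries still map to the fiber WAN address (172.30.48.50).
# Removing them forces affected flows to be tracked and NATed again.
/ip firewall connection remove [find where reply-dst-address~"^172.30.48.50"]
```

If the sniffer shows 192.168.35.2 as the source after the stale entries are cleared, the NAT rules themselves are probably fine and the failover only needs the old connections flushed.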