Apologies for the late reply; it has been a busy week.
No worries about the late reply.
Just to make sure I have established the correct picture of your situation (as it's not your everyday SOHO setup you have running there):
- Traffic originating from 192.168.110.253 destined for 172.18.3.4 is coming in on MT2:eth10|192.168.110.1/24.
- One would expect traffic destined to 172.18.3.x to leave MT2:eth2|192.168.2.9/29 to MT3:eth5|192.168.2.14/29 as per routing table entry for subnet 172.18.2.0/23.
- Instead, it shows up in the MT1 firewall, thus leaving MT2:eth1|192.168.2.6/29 towards MT1:eth?|192.168.2.1/29.
- According to routing table of MT2, MT1 however would normally only route subnets 192.168.[11-14].0/24 as well as the default route 0.0.0.0/0
- In MT1, this traffic is then being marked as internet-bound traffic, given the ISP1_conn, ISP2_conn and ISP3_conn marks?
- Nevertheless, traffic eventually arrives through MT3 allowing control of the access points.
Indeed, not immediately obvious where things are going wrong.
Your assumptions are all right. The path of communication is indeed as you described.
Some questions:
- Is it just traffic from the CRM Point and/or to this subnet where you're seeing this behaviour, or is it more widespread?
- Is there any filtering going on at MT2 that would prevent a direct route to MT3 without looping through MT1?
- I see some VLAN and bridge configurations. No strange loops introduced here?
- Wild guess: Ubiquiti discovery is taking place through L2 UDP broadcast 255.255.255.255:10001 / multicast 233.89.188.1:10001. Could it be that things are going "wrong" here, i.e. untagged multicast traffic traveling through MT1 as rendezvous point?
I have seen this for actually all my Ubiquiti devices managed by the CRM Point.
During the writing of the post I did check again and found NOTHING of this behaviour anymore!!!
Now here is my thought.
Last week I found out that, specifically when I do a scan for multiple 192.168.x.0/24 or 172.18.x.0/23 segments from the UCRM, a lot of traffic is generated towards IP segments I do not use. To lower the load on the CPU and block useless traffic, I added unreachable routes to my routing table (I started with blackholes but found out later that unreachables were better in terms of troubleshooting).
My unreachable rules look like this:
add dst-address=10.0.0.0/8 type=unreachable
add dst-address=172.16.0.0/12 type=unreachable
add dst-address=192.168.0.0/16 type=unreachable
The routing tables on my routers are distilled from this, the OSPF information and directly connected networks.
Could it be that sometimes the OSPF routes get ditched for a short amount of time, and that this caused the traffic to end up at MT1?
Before I added the unreachables, when an OSPF neighbor went down, the 0.0.0.0/0 default route towards MT1 would have taken over for the 172.18.2.0/23 network, and traffic towards the 172.18.3.x clients would end up on MT1.
Now that I have the unreachables, when an OSPF neighbor goes down, the network becomes unreachable, right?
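
To make that reasoning concrete, here is a minimal Python sketch of the route lookup, assuming plain longest-prefix matching and ignoring route distance and any policy routing. The next-hop labels "via MT1" and "via MT3" and the function name are only illustrative; the prefixes are the ones from this thread.

```python
import ipaddress

def lookup(routes, dst):
    """Return (prefix, action) of the longest matching prefix for dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, action in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, action))
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    net, action = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), action

# Illustrative labels only; the real next hops are set by OSPF and static config.
DEFAULT = ("0.0.0.0/0", "via MT1")
OSPF = ("172.18.2.0/23", "via MT3")
UNREACHABLES = [
    ("10.0.0.0/8", "unreachable"),
    ("172.16.0.0/12", "unreachable"),
    ("192.168.0.0/16", "unreachable"),
]

dst = "172.18.3.4"
ospf_up = lookup([DEFAULT, OSPF] + UNREACHABLES, dst)
ospf_down_without = lookup([DEFAULT], dst)
ospf_down_with = lookup([DEFAULT] + UNREACHABLES, dst)

print("OSPF route present:          ", ospf_up)
print("OSPF gone, no unreachables:  ", ospf_down_without)
print("OSPF gone, with unreachables:", ospf_down_with)
```

Under those assumptions, the /23 learned via OSPF wins while it is present. Once it is withdrawn, the lookup falls back to the default route when no unreachables exist, and to the 172.16.0.0/12 unreachable when they do.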