I think I realized where you misunderstood. You’re thinking I’m using the RB to provide services to people who connect periodically, whereas in fact I’m stuck with an RB between my LAN (ether1) and my PPPoE WAN link (pppoe-client1). I’m not even 100% sure how they configured it; they just don’t believe me that this is a NAT connection tracking issue (which, as I have already illustrated, affects pretty much every DSL router out there: Linux is affected, and now MT). I seriously do prefer having a full-blown Linux box on my gateways, where troubleshooting these kinds of problems is not only easier and faster but also more accurate, with less guessing, and I typically solve issues there in a tenth of the time it takes me on MT.
Ok, so after fooling around for a while longer I can now (assuming my internal LAN attached to ether1 is in the 192.168.0.0/24 range) use the following to find all connections that have a reply-dst-address in that range. These are most likely all the connections that I want, but I still need to attach a negated check on the dst-address, ie, match only if the dst-address isn’t also on the LAN:
/ip firewall connection find reply-dst-address~"^192\\.168\\.0\\.[0-9]+:[0-9]+\$"
Imho that is fugly. However, it’ll work. But it’ll also trap all internal connections to the RB, so additional filtering is required to ensure that internal connections are not trapped. That additional filtering is where I’m getting stuck at the moment. The ~ comparison gets us POSIX extended regexes, which don’t have a “doesn’t match” operator, so the checks that are required simply can’t be expressed:
- Check that dst-address is NOT on the LAN.
- Check that reply-src-address is NOT on the LAN.
I’ve tried various placements of the !, eg:
! dst-address~"nasty regex"
dst-address !~ "nasty regex"
dst-address ~! "nasty regex"
dst-address ~ "!(nasty regex)"
None of these worked.
The idea can be expanded to cover the typical “internal” IP ranges quite easily, so that a single script covers things nicely (if you do this tens of times a year it’s often easier to write a single script that you just use everywhere):
/ip firewall connection find reply-dst-address~"^10\\.[0-9]+\\.[0-9]+\\.[0-9]+:[0-9]+\$"
/ip firewall connection find reply-dst-address~"^192\\.168\\.[0-9]+\\.[0-9]+:[0-9]+\$"
/ip firewall connection find reply-dst-address~"^172\\.(1[6-9]|2[0-9]|3[01])\\.[0-9]+\\.[0-9]+:[0-9]+\$"
It’s just the negated check that’s missing to make that function properly. Please don’t be naïve enough to think that the above will never have false positives, even with the negations added. If you use the generic script you will need to attach all three negations to each of the above to prevent false positives. Even then I’m going to say your mileage may vary.
The above may (possibly) be good enough if your INPUT and OUTPUT filter chains are all on ACCEPT and you don’t have any connection-marks in the mix on those connections. Also, when you start doing load-balancing and other smart routing tricks it gets way, way more complicated than this and the above will be extremely inadequate. Fortunately it’s also very difficult to set up routing complex enough on MT for those more complicated cases to trigger.
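To make the intended selection concrete, here is a minimal sketch in Python (not RouterOS script) of the positive match combined with the two negated checks. It assumes the connection table has already been exported as dictionaries keyed by the field names shown in this post; the helper names (is_private, is_stale) and the sample entries are hypothetical, and 203.0.113.7 merely stands in for a public WAN address.

```python
import ipaddress

# The three "internal" ranges covered by the regexes above (RFC 1918).
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_private(addr_port):
    """True if the IP part of 'a.b.c.d:port' falls in one of the private ranges."""
    ip = ipaddress.ip_address(addr_port.rsplit(":", 1)[0])
    return any(ip in net for net in PRIVATE_NETS)

def is_stale(entry):
    """Replies still target an internal address (the flow was never src-NATed),
    while the destination and reply source are NOT internal."""
    return (
        is_private(entry["reply-dst-address"])
        and not is_private(entry["dst-address"])
        and not is_private(entry["reply-src-address"])
    )

# Hypothetical sample entries, for illustration only.
entries = [
    # Never NATed: this is the kind of entry that should be removed.
    {"dst-address": "1.2.3.4:5060", "reply-src-address": "1.2.3.4:5060",
     "reply-dst-address": "192.168.0.10:5060"},
    # LAN host talking to the router itself: must not be trapped.
    {"dst-address": "192.168.0.1:22", "reply-src-address": "192.168.0.1:22",
     "reply-dst-address": "192.168.0.10:40000"},
    # NATed properly: replies go to the WAN address.
    {"dst-address": "1.2.3.4:5060", "reply-src-address": "1.2.3.4:5060",
     "reply-dst-address": "203.0.113.7:5060"},
]

stale = [e for e in entries if is_stale(e)]
```

Working on parsed addresses instead of regexes sidesteps the missing “doesn’t match” operator, but it only models the filter; it does not remove anything from a live router.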
Just to try and express the issue even better, take the SIP example where an internal IP 192.168.0.10 tries to establish SIP to 1.2.3.4 while the link is down. A DROP (which has been suggested as a way to prevent the problem; I prefer using unreachable routes for the private ranges and then OSPF between routers internally to make sure everything routes where it should without accidentally sending private IPs out to the interwebs) doesn’t prevent the connection tracking entry from being created. If the pppoe is down, the system will only have routes to the internal networks. Thus the packet comes in on (eg) ether1, goes through the mangle and dst-nat tables, and then gets dropped at the routing decision. At this point conntrack will still create a connection tracking entry with the following:
dst-address 1.2.3.4:5060
src-address 192.168.0.10:5060
reply-dst-address 192.168.0.10:5060 <-- problem.
reply-src-address 1.2.3.4:5060
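The difference between the link-up and link-down cases can be illustrated with a toy model in Python. This is only a sketch of the behaviour described here, not of how the kernel implements it: the function name track, the wan_addr parameter and the address 203.0.113.7 are hypothetical, and it assumes src-nat keeps the source port unchanged.

```python
def track(src, dst, wan_up, wan_addr="203.0.113.7"):
    """Toy model of the conntrack entry created for the first packet of a flow."""
    port = src.rsplit(":", 1)[1]
    entry = {
        "src-address": src,
        "dst-address": dst,
        "reply-src-address": dst,
        # Until src-nat is applied, replies are expected at the original source.
        "reply-dst-address": src,
    }
    if wan_up:
        # Routing succeeded, so the packet reaches src-nat and the reply
        # destination becomes the WAN address (port assumed unchanged).
        entry["reply-dst-address"] = f"{wan_addr}:{port}"
    return entry

down = track("192.168.0.10:5060", "1.2.3.4:5060", wan_up=False)
up = track("192.168.0.10:5060", "1.2.3.4:5060", wan_up=True)
```

In the link-down case the entry keeps the internal address as reply-dst-address, which is exactly the problem entry listed above; later packets of the same flow match that entry and so never get src-NATed.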
Note that my client in this particular case just shot this down as “we have a static IP, this doesn’t apply”. I wish that were true. Even when you have a static IP on the pppoe, if the link is down the routing won’t happen and the packet will never get to src-nat, which triggers the problem.
The other potential fix may be to somehow wangle the NOTRACK (is this even available on MT?) into the mix - but I can’t possibly think where, because until the routing decision has been made we don’t know that we want NOTRACK, and once the routing decision has been made … well, it’s too late, we will never see the packet in netfilter again.
Come to think of it (yes, I’m ranting here …) isn’t this a kernel bug? Shouldn’t the kernel just be modified to not create a conntrack entry if the packet cannot be forwarded, so that future packets for the flow are also treated as NEW … resulting in new conntrack entries when they eventually do go out? (Keep in mind that according to the iptables man page MASQUERADE will drop conntrack entries for connections on that interface should the interface go down … I suspect this is done by purging all entries where reply-dst-address matches any of the addresses that were assigned to the downed interface, not by actually tracking flows to interfaces.) Then again … even that won’t solve the more complex cases I had in mind above, where routing changes from one interface to another (both of which are NAT’ed, with the peer doing return-path-filtering).