NAT Issues every 10-14 days

I didn't understand a word, but can I ask.

Are you pointing out a config issue or a more serious limitation in RoS where we should all flush them down the toilet and ask for our money back???

Simply written: a single public IP has limits on how many connections can be NATted (and tracked) at the same time.

Simply responded: what makes this person the only person on God's green earth using MT devices to trip over this fact???
(the sceptical llama with occasional heartburn and flatulence)

My guess is based on the fact that it has 3Gbps, just one IPv4 address and several tunnels.
I don’t think he uses those 3Gbps just to keep his nephew calm by hypnotizing him with youtube… or not? :confused:

A fast way to deplete resources: open bittorrent or similar on one or more devices…

However, it remains a hypothesis; there is no certainty.

If he gives that command and everything starts up again…

I see said the blind deaf mute to the horse…
All you needed to say was that it was a stab in the dark, and thus let's wait to see if blood was spilled. :slight_smile:



Can you provide more info about this? I think it is very interesting.

Thank You

Nothing special: TCP and UDP each use 65536 ports, from 0 to 65535 (ignoring exceptions, assigned ports, etc. for now).
Usually the first 1024 ports, from 0 to 1023, are reserved by IANA.
Ports from 1024 to 32767 are usually unofficially reserved for other services, like SQL, remote desktop, proxy, UPnP, etc.,
and the last group, from 32768 to 65535, is used for NAT.
This behaviour can be changed without problems, for example by assigning a port range to each internal IP.

I'll hypothesize here, as I've no idea how NAT actually works in the Linux kernel or ROS. But for the connection tracking machinery it's a quartet of addressing metadata that matters: src_address, src_port, dst_address, dst_port. If any of those changes, it's an entirely different connection (and additionally throw in the protocol type to expand the possibilities). When it comes to NAT, one has to work with a single src_address (in most cases) and a limited number of ports (let's assume @rextended is right about 32767) ... however, we get to work with a plethora of dst_addresses, and theoretically NAT could use the same pair of src_address,src_port to connect to different remote addresses (or even to the same remote address if the remote port number is different). Which would mean that SRC NAT port pool exhaustion is not as likely as @rextended would like us to believe. If I'm right about src_port re-use.
The number of tracked connections is a different beast; I guess it's easy to make those skyrocket. Especially with UDP "connections" (e.g. BitTorrent), which don't have any connection "management" handshakes (unlike TCP, which has the 3-way init handshake and a handful of ways to terminate a connection), so any firewall in the way has to rely on its own inactivity timers.
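
The port re-use argument above can be sketched with a toy model. This is only an illustration of the quartet idea, not how the Linux kernel or RouterOS actually picks ports: the class `ToyNat`, its tiny two-port pool and the 198.51.100.x destinations are all made up for the example.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class ToyNat:
    """Hypothetical source NAT: a public port may be reused as long as the
    translated quartet (public_ip, public_port, dst_ip, dst_port) is unique."""

    def __init__(self, public_ip: str, ports: Iterable[int]) -> None:
        self.public_ip = public_ip
        self.ports = list(ports)
        self.in_use: Set[Tuple[int, str, int]] = set()  # (public_port, dst_ip, dst_port)
        self.table: Dict[Flow, Flow] = {}               # original flow -> translated flow

    def translate(self, flow: Flow) -> Optional[Flow]:
        if flow in self.table:                 # already tracked
            return self.table[flow]
        _, _, dst_ip, dst_port = flow
        for port in self.ports:                # first public port that keeps the quartet unique
            key = (port, dst_ip, dst_port)
            if key not in self.in_use:
                self.in_use.add(key)
                self.table[flow] = (self.public_ip, port, dst_ip, dst_port)
                return self.table[flow]
        return None                            # pool exhausted for this destination


# Assumption: a pool of only two public ports, to make exhaustion visible.
nat = ToyNat("100.64.1.1", ports=[50000, 50001])

# Three internal hosts talk to the SAME remote ip:port -> the third one fails.
same_dst = [nat.translate((f"10.0.0.{i}", 5678, "172.23.23.23", 443)) for i in (1, 2, 3)]

# Five flows to DIFFERENT remotes -> all of them can share public port 50000.
other_dst = [nat.translate(("10.0.0.1", 6000 + i, f"198.51.100.{i}", 443)) for i in range(1, 6)]

print(same_dst)
print(other_dst)
```

In this model the pool only runs out per destination address and port, while the number of tracked entries (`nat.table`) keeps growing with every new flow, which matches the distinction drawn above between port exhaustion and conntrack table size.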

In my example I assume only:
One unique public IP,
Multiple internal IPs,
Only one DNS server used, which gives the same resolved IP to all the machines,
HTTPS always uses the same port.

What conntrack has to distinguish, for example for TCP connections:
for example google.com = 172.23.23.23

On the internal side there are 3 devices, and it can happen that one device uses the same ports as the others, but the internal IPs' port numbers do not count for WAN traffic:
10.0.0.1:5678 → 172.23.23.23:443
10.0.0.1:5679 → 172.23.23.23:443
10.0.0.1:5680 → 172.23.23.23:443
10.0.0.1:5681 → 172.23.23.23:443
10.0.0.1:5682 → 172.23.23.23:443

10.0.0.2:5678 → 172.23.23.23:443
10.0.0.2:5679 → 172.23.23.23:443
10.0.0.2:5680 → 172.23.23.23:443
10.0.0.2:5681 → 172.23.23.23:443
10.0.0.2:5682 → 172.23.23.23:443

10.0.0.3:5678 → 172.23.23.23:443
10.0.0.3:5679 → 172.23.23.23:443
10.0.0.3:5680 → 172.23.23.23:443
10.0.0.3:5681 → 172.23.23.23:443
10.0.0.3:5682 → 172.23.23.23:443

Now NAT takes control:
10.0.0.1:5678 → 172.23.23.23:443 = 100.64.1.1:54879 → 172.23.23.23:443
10.0.0.1:5679 → 172.23.23.23:443 = 100.64.1.1:54880 → 172.23.23.23:443
10.0.0.1:5680 → 172.23.23.23:443 = 100.64.1.1:54881 → 172.23.23.23:443
10.0.0.1:5681 → 172.23.23.23:443 = 100.64.1.1:54882 → 172.23.23.23:443
10.0.0.1:5682 → 172.23.23.23:443 = 100.64.1.1:54883 → 172.23.23.23:443

10.0.0.2:5678 → 172.23.23.23:443 = 100.64.1.1:54884 → 172.23.23.23:443
10.0.0.2:5679 → 172.23.23.23:443 = 100.64.1.1:54885 → 172.23.23.23:443
10.0.0.2:5680 → 172.23.23.23:443 = 100.64.1.1:54886 → 172.23.23.23:443
10.0.0.2:5681 → 172.23.23.23:443 = 100.64.1.1:54887 → 172.23.23.23:443
10.0.0.2:5682 → 172.23.23.23:443 = 100.64.1.1:54888 → 172.23.23.23:443

10.0.0.3:5678 → 172.23.23.23:443 = 100.64.1.1:54889 → 172.23.23.23:443
10.0.0.3:5679 → 172.23.23.23:443 = 100.64.1.1:54890 → 172.23.23.23:443
10.0.0.3:5680 → 172.23.23.23:443 = 100.64.1.1:54891 → 172.23.23.23:443
10.0.0.3:5681 → 172.23.23.23:443 = 100.64.1.1:54892 → 172.23.23.23:443
10.0.0.3:5682 → 172.23.23.23:443 = 100.64.1.1:54893 → 172.23.23.23:443

Do you notice that only the port on the public IP changes, while the public IP, the remote IP and the remote port stay the same?

When one host serves multiple services, and those services rely on the same HTTPS port, only one variable can change…

And this example considers only one service and 3 devices, with just 5 connections each…
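
The table above can be reproduced with a short sketch. It assumes, purely for illustration, that the NAT hands out public ports sequentially starting at 54879 as in the listing; real implementations may choose ports differently. The function name `build_mappings` is made up.

```python
from itertools import count

PUBLIC_IP = "100.64.1.1"
REMOTE = ("172.23.23.23", 443)      # same remote IP and port for every flow
FIRST_PUBLIC_PORT = 54879           # taken from the listing above


def build_mappings(devices, local_ports):
    """Map every (internal_ip, internal_port) to (PUBLIC_IP, next free public port)."""
    next_port = count(FIRST_PUBLIC_PORT)
    mappings = []
    for ip in devices:
        for lport in local_ports:
            mappings.append(((ip, lport), (PUBLIC_IP, next(next_port))))
    return mappings


mappings = build_mappings(["10.0.0.1", "10.0.0.2", "10.0.0.3"], range(5678, 5683))

for (ip, lport), (pub_ip, pub_port) in mappings:
    remote = f"{REMOTE[0]}:{REMOTE[1]}"
    print(f"{ip}:{lport} -> {remote} = {pub_ip}:{pub_port} -> {remote}")

# With a single remote ip:port, every flow consumes its own public port.
public_ports_used = {pub_port for _, (_, pub_port) in mappings}
print(len(public_ports_used), "public ports used for", len(mappings), "flows")
```

With 3 devices and 5 connections each, all to the same remote address and port, this uses 15 distinct public ports, one per flow.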

It doesn’t look like it’s hitting the limit:

Plus the manual says about it:

That print was not taken while the connection was hanging,
but my hypothesis, which remains just that, is about the ports that can be used on a single unique public IP to mask everything behind the NAT.

Surely conntrack has room for a thousand addresses.

Agreed. But if other users connect to e.g. mikrotik at 159.148.147.196, the same port numbers on the WAN IP address can be used again.

In the extreme case (BitTorrent), where a single client communicates with a gazillion peers (tens of peers for every active torrent), it's enough to use one (or a few) port numbers.

Just like DST-NAT, where a single pair of WAN address + port (e.g. 443) is used to serve potentially millions of clients. It doesn't matter which NAT it is (SRC or DST), it's still the address/port quartet that identifies a connection.

There would have to be one extremely popular IP address that everyone is connecting to (and to same port). Then yes, number of those connections would be limited to at most 65k (I’m not sure how many ports RouterOS uses for srcnat) for one local public address. I guess something like 8.8.8.8:53 could do it if too many internal devices used it.

Otherwise there’s no problem with reusing local ports. You can have multiple connections from the local public address and one fixed port for all of them, to different remote addresses and/or ports, and it will work just fine, e.g.:

 0  SAC  s  protocol=tcp src-address=192.168.80.10:57397 dst-address=re.mo.te.138:8291 reply-src-address=re.mo.te.138:8291 
            reply-dst-address=lo.ca.l.134:666 tcp-state=established timeout=23h59m59s orig-packets=3 404 orig-bytes=208 405 orig-fasttrack-packets=0 
            orig-fasttrack-bytes=0 repl-packets=5 503 repl-bytes=7 693 869 repl-fasttrack-packets=0 repl-fasttrack-bytes=0 orig-rate=43.4kbps 
            repl-rate=26.9kbps 

 1  SAC  s  protocol=tcp src-address=192.168.80.10:57400 dst-address=re.mo.te.139:8291 reply-src-address=re.mo.te.139:8291 
            reply-dst-address=lo.ca.l.134:666 tcp-state=established timeout=23h59m59s orig-packets=2 784 orig-bytes=171 781 orig-fasttrack-packets=0 
            orig-fasttrack-bytes=0 repl-packets=4 506 repl-bytes=6 275 963 repl-fasttrack-packets=0 repl-fasttrack-bytes=0 orig-rate=38.8kbps 
            repl-rate=279.5kbps

Same local lo.ca.l.134:666, no problem.


I agree, that’s the reason why on CG-NAT (with tuned conn-track timeouts) a subscriber can be translated to a small range of SRC ports, like 64 ports (that’s the smallest amount in some other vendors’ CG-NAT solutions), and it works OK.
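
As a rough sketch of what 64 ports per subscriber means for one public IP: the usable range of 1024 to 65535 and the contiguous block layout below are assumptions made for the arithmetic, not taken from any particular vendor's CG-NAT. The helper names are hypothetical.

```python
FIRST_PORT = 1024     # assumption: ports below 1024 are left out of the pool
LAST_PORT = 65535
BLOCK_SIZE = 64       # ports per subscriber, as mentioned above


def subscribers_per_public_ip(block_size: int = BLOCK_SIZE) -> int:
    """How many fixed-size port blocks fit into the assumed usable range."""
    return (LAST_PORT - FIRST_PORT + 1) // block_size


def port_block(subscriber_index: int, block_size: int = BLOCK_SIZE) -> tuple:
    """Return the (first, last) public port of a subscriber's contiguous block."""
    if not 0 <= subscriber_index < subscribers_per_public_ip(block_size):
        raise ValueError("no free port block left on this public IP")
    start = FIRST_PORT + subscriber_index * block_size
    return start, start + block_size - 1


print(subscribers_per_public_ip())   # 1008
print(port_block(0))                 # (1024, 1087)
print(port_block(1007))              # (65472, 65535)
```

Under these assumptions one public address covers 1008 subscribers, and each of them is still limited to 64 simultaneous flows only towards the same remote address and port.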

So the consensus seems to be that something is causing all my available nat ports to get used up and then NAT subsequently generally hangs.

And that to mitigate it I can run that command to see what happens when it hangs, or reduce the TCP timeout.

I don’t want to make it too low cause then actual persistent connections that don’t trade much information very often will drop.

What would the best command be to drop it to say 10 hours?

Thanks,
Matt

@rextended wrote the command to shorten the connection tracking timeout in post #20 above.

I wouldn’t care about idle connections too much. Most proper software using persistent connections implements keepalive functionality for exactly this purpose, and everybody is welcome to enable it. So something around 1 hour should be fine (with an app keepalive interval of half an hour to match).

RFC 5382 REQ-5 says the established connection idle-timeout must not be less than 2 hours 4 minutes.

2h4min

I think that’s a good starting point compared with the 1 day default setting.

I had it set to 1h on my 1036 CGNAT router, which has worked well for the past year and a half for 400+ households. But I’ll take 2:04:00 (and round it up to 2:05:00 for good measure).
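
To compare the values mentioned in this thread against the RFC 5382 floor, here is a small sketch. The helper names are made up, and the RouterOS command printed at the end is written from memory of the RouterOS documentation rather than copied from post #20, so treat it as an assumption and check it on your own version before using it.

```python
from datetime import timedelta

# RFC 5382 REQ-5 floor for the established connection idle-timeout.
RFC5382_MIN_ESTABLISHED = timedelta(hours=2, minutes=4)


def routeros_time(value: timedelta) -> str:
    """Format a timedelta as HH:MM:SS."""
    total = int(value.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def meets_rfc5382(value: timedelta) -> bool:
    return value >= RFC5382_MIN_ESTABLISHED


candidates = {
    "default": timedelta(days=1),
    "asked for": timedelta(hours=10),
    "keepalive-based": timedelta(hours=1),
    "RFC floor, rounded up": timedelta(hours=2, minutes=5),
}

for label, value in candidates.items():
    verdict = "ok" if meets_rfc5382(value) else "below RFC 5382 minimum"
    print(f"{label:22s} {routeros_time(value)}  {verdict}")

# Assumed RouterOS syntax, not taken from this thread:
print("/ip firewall connection tracking set tcp-established-timeout="
      + routeros_time(timedelta(hours=2, minutes=5)))
```

Only the 1 hour value falls below the RFC floor; whether that matters in practice depends on how many idle long-lived connections without keepalives you actually have.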