1. Is the above understanding correct, or did I miss something?
Correct.
2. How many ports should you allocate to an "end user" to let him comfortably access the "usual Internet services" (emailing, web surfing, ...)?
Is allocating 100 ports, starting from port 10000, a good practice for 500 users max?
It depends on the type of users, their typical activity, and the applications they use. After a TCP session is terminated peacefully, the client-side port must not be reused for another TCP connection to the same server-side IP and port for 2m30s. Many application programmers do not have a clue about such traps in networking, so some applications close the TCP session after every request-response exchange, while others do it properly and reuse TCP sessions, i.e. close them only after a certain timeout (of minutes) after the last request has been sent. So for plain web browsing it may be fine; for some specialized client-server applications it may create hard-to-find issues.
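A quick back-of-the-envelope check of what the port-reuse quarantine means for a 100-port allocation (a sketch; the 150 s figure is the 2m30s mentioned above, and the worst-case application is one that opens a fresh TCP connection per request):

```python
# Worst case: an application opens a new TCP connection for every
# request-response exchange to a single server IP:port, and each closed
# client-side port is quarantined for 150 s (2m30s) before reuse.
PORTS_PER_USER = 100
REUSE_QUARANTINE_S = 150

# The user can sustain at most this many new connections per second
# to that one destination before running out of usable ports:
max_rate = PORTS_PER_USER / REUSE_QUARANTINE_S
print(f"max sustained new connections/s to one destination: {max_rate:.2f}")
```

Under two connections per three seconds to the same server is plenty for browsing, but a chatty connection-per-request application can exhaust the block quickly.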
And what if you have 5000 users?
Deterministic NAT is a poor man's workaround for small networks where you cannot log connection initiation events to external storage. So for 5000 users sharing a single public IP, it is unusable. It is also unusable (at least alone) if you have dynamic access (like a complimentary WiFi for random customers in a store).
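The arithmetic behind "unusable" is straightforward (a sketch, assuming the whole range above the well-known ports is divided evenly among users):

```python
# Ports available on one public IP above the well-known range (0-1023),
# split evenly among 5000 internal users under deterministic NAT.
TOTAL_PORTS = 65536 - 1024   # 64512 candidate source ports
USERS = 5000

ports_per_user = TOTAL_PORTS // USERS
print(f"{ports_per_user} ports per user")  # 12 - far too few for normal use
```

Twelve ports per user is an order of magnitude below the 100 discussed above, so a single public IP simply cannot cover 5000 users deterministically.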
3. Is it correct that implementing RFC7422 on Linux and derivatives means adding many lines such as [1]:
/ip firewall nat add chain="xxx-$($i / $x)" action=src-nat protocol=tcp src-address=($srcStart + $i) \
to-address=$toAddr to-ports=$prange
Correct. There is no embedded functionality where you would set the number of ports allocated to each internal address and let the magic of RouterOS do the rest for you. You can use scripting, as in the example, to generate those individual rules, but you cannot have a single rule that evaluates an arithmetic formula to calculate the new source port from the original source IP address on the fly.
4. How does it scale with MikroTik hardware if you need Gigabit throughput?
When NAT is used, the load comes from the fact that every single packet needs to be matched against the list of existing connections, and modified before being forwarded if NAT handling is required for that particular connection. The first packet of each (potential) connection is typically seen by many more firewall rules than the mid-connection packets, but the difference between having a single NAT rule and 500 NAT rules will be negligible unless all the clients' connections consist of just a few packets.

5000 rules may be a different thing, so you would need to build some kind of binary tree to reduce the number of rules traversed by any given packet. Say you have 192.168.0.0/19 (8190 usable addresses) for clients. With 8190 individual rules, an average initial packet of a connection will traverse 4095 of them until it gets matched; using the tree approach illustrated below, you can reduce that figure to about 1024 rules per average packet. This is just an example; for smaller source subnets the reduction ratio may not be worth the effort spent to create the configuration.
/ip firewall nat
add chain=srcnat action=jump jump-target=T20.16 src-address=192.168.16.0/20
add chain=srcnat action=jump jump-target=T21.8 src-address=192.168.8.0/21
... individual rules in chain=srcnat for src addresses 192.168.0.1 to 192.168.7.255 ...
add chain=T20.16 action=jump jump-target=T21.24 src-address=192.168.24.0/21
... individual rules in chain=T20.16 for src addresses 192.168.16.0 to 192.168.23.255 ...
... individual rules in chain=T21.8 for src addresses 192.168.8.0 to 192.168.15.255 ...
... individual rules in chain=T21.24 for src addresses 192.168.24.0 to 192.168.31.254 ...
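A rough model of the saving from the jump tree above (a sketch, assuming initial packets are spread evenly over the 8190 hosts and each sub-chain holds roughly 2048 individual rules):

```python
# Average number of NAT rules an initial packet traverses before matching.
HOSTS = 8190  # usable addresses in 192.168.0.0/19

# Flat list: on average a packet matches halfway down the 8190 rules.
flat_avg = HOSTS // 2          # ~4095

# Jump tree: at most 2 jump rules at the top level, then an average walk
# through one of four ~2048-rule chains before the individual rule matches.
tree_avg = 2 + 2048 // 2       # ~1026

print(f"flat: ~{flat_avg} rules/packet, tree: ~{tree_avg} rules/packet")
```

That is the "about 1024" figure from the answer: one level of /20 and /21 jumps quarters the average walk, and deeper trees would reduce it further at the cost of a more elaborate generated configuration.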