I cannot for the life of me get this to work (newer mikrotik user here). For simplicity I wiped all of my firewall configuration and just left the NAT rules.
DST-NAT from external → internal. UDP. UDP responses are NOT getting processed, seen, or forwarded by the MikroTik router.
It is like the UDP session is not tracked in the firewall. The response isn’t even processed, and the packet sniffer running directly on the interface doesn’t even see the response, but the server is responding.
I have replicated this using both Citrix Xen and Proxmox as hypervisor hosts.
PROBLEM FOLLOWS HARDWARE
CCR2116-12G-4S+
CCR2116-1G-12XS-2XQ
Have attempted using the following ROS/Firmware
7.20.2
7.20.6
7.19.2
What connection tracking mechanism or other configuration item could I be missing? The return traffic is failing, repeatably, across hardware and firmware.
I didn't take the time to review everything. Half-covered selective screenshots are not the most natural to process.
From a glance, however, it seems to me that you're under the impression that you need both a dst-nat and src-nat rule. This is not the case.
You are right that in order for the packet flow to complete successfully, both the dst addr of the incoming packets and the src addr of the reply packets have to be changed, but this is implicit in the single dst-nat rule.
A connection can be both dst-nated and src-nated, but that's not for your use case (as far as I understand.)
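To illustrate, here is a minimal sketch of a dst-nat-only setup. The interface name (ether1 as WAN), the internal server address (192.168.88.10) and the UDP port (5060) are placeholders I made up for the example, not values from your config:

```
# Hypothetical values: ether1 = WAN, 192.168.88.10 = internal server, 5060/udp = service
/ip firewall nat
add chain=dstnat action=dst-nat in-interface=ether1 protocol=udp dst-port=5060 \
    to-addresses=192.168.88.10 to-ports=5060 comment="example: dst-nat only"

# Inspect the tracked UDP connections to see whether replies are being registered
/ip firewall connection print detail where protocol=udp
```

Conntrack reverses the translation on the replies from the server, so no extra src-nat rule is needed for the return path.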
Otherwise do a full/proper packet capture to further examine what's happening.
EDIT: The fact that the reply packet doesn't register as replied-to generally means that either the packet is not recognized by conntrack (doesn't match the expected addr-port-protocol 5-tuple) or is otherwise blocked.
The src-nat was implemented as a test to try to get the return path to work. I know the packet is leaving the server and is on the wire, but the response is not registered, not even “reported” by the MikroTik. Where is the packet? Why isn’t the packet sniffer (running on the physical interface of the MikroTik) seeing it? It’s like it gets shoved into hardware and dropped and never reported to the CPU. It is being dropped by the L3 offload function…
If I turn off L3 hardware offload on the switch, it works, so this must be something in the L3 offload on the MikroTik? I just upgraded my ROS and hardware to fix an OVERLOADED CPU, specifically so I could turn on L3 offload. Is there a way to make this work with L3 offload turned on?
It's quite normal for packets that are handled by the hw offload to not show up in the sniffer. (The actual cpu never gets them. Ultimately if you want to sniff the wire, that's what you'll have to do.)
Be careful in how you set up hw offloading. Full l3 offloading doesn't do nat. Flow offloading does.
If (and that's the big if) you have configured everything correctly, and you have rebooted the router, it can only be a bug in the offloading engine/driver. Lots of people are using it for all sorts of heavy traffic scenarios with generally good results, so while possible, it's not all that likely.
Hence why I am here: to find out how to fix the issue. I can troubleshoot all day long, but how do I fix it? I have 20+ years with every other hardware vendor (enterprise class), and only about 6 months with these MikroTik devices (hobby grade). I am used to features just working. I am not accustomed to these devices or how they work. Going to a forum only to be told what I already know (“You have successfully debugged it yourself”)… yes, I know how to troubleshoot. What I am trying to determine is how to actually fix the issue. Turning off L3 offload and dumping all of the load back onto the CPU, which is why I upgraded the devices to begin with, is not a fix. Either connection tracking is broken for UDP with L3 offload, OR I am configuring something incorrectly.
What is Flow offloading? Where is that feature enabled/disabled?
You really don't provide much to go on. Without L3 it works, with it turned on, it doesn't.
Configure offloading for L3 fasttrack (that's what MT calls offloading to the hw flow table) according to the documentation. Pay special attention to per-port and per-switch l3 offload settings - if they are not as they are supposed to be, packets can get eaten. Maybe this is what's happening?
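As a rough sketch of what that can look like (check it against the L3 Hardware Offloading documentation for your model; the upstream port name sfp-sfpplus1 is just an assumption for the example):

```
# Assumption: sfp-sfpplus1 is the upstream/WAN port that sits behind firewall/NAT
/interface ethernet switch
set 0 l3-hw-offloading=yes

# Keep the NAT-facing port out of full L3 offload so new connections reach the CPU firewall
/interface ethernet switch port
set sfp-sfpplus1 l3-hw-offloading=no

# Established connections are then pushed to the hardware flow table via fasttrack
/ip firewall filter
add chain=forward action=fasttrack-connection connection-state=established,related hw-offload=yes
add chain=forward action=accept connection-state=established,related
```

The idea is that the first packets of a connection go through the CPU (conntrack and NAT), and only then is the flow handed to the switch chip.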
When you are still gathering experience, it is probably better to first try the dst-nat of a TCP service like a webserver or SSH server (not FTP). Once you get that working you can try a UDP service like DNS.
Only when you are very familiar with everything should you try a PBX, because VoIP is an order of magnitude more difficult.
Wild guess: there is a protocol helper for sip in Firewall → service ports menu. It usually helps clients to pass the NAT (it patches registration commands, invite commands, and so on), try to disable it.
But +1 for checking TCP DNAT, then UDP DNAT, then SIP. I remember running asterisk behind the NAT without any problem, but I didn’t use hw-offload in that config.
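For reference, a minimal sketch of checking and disabling the SIP helper from the CLI:

```
# List the connection tracking helpers and their state
/ip firewall service-port print

# Disable the SIP helper (re-enable later with disabled=no)
/ip firewall service-port set sip disabled=yes
```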
I was able to fix it using a switch rule. Funny to have to tell the device to push the traffic to the CPU in order for it to process…but it is working finally!!
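For anyone finding this later, a rule along these lines is the general idea. This is a sketch with placeholder values (switch1, ether2 as the server-facing port, 192.168.88.10 as the server), not a copy of my production config:

```
# Hypothetical values: ether2 = port facing the server, 192.168.88.10 = server address
# Redirect the server's UDP replies to the CPU instead of letting the switch chip handle them
/interface ethernet switch rule
add switch=switch1 ports=ether2 mac-protocol=ip protocol=udp \
    src-address=192.168.88.10/32 redirect-to-cpu=yes comment="example: force UDP replies to CPU"
```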
Just wish it was 100% working in hardware instead of having to copy it to the CPU. But our VoIP throughput is not very much, so I am not worried about the CPU tax here. I am wary though… I think I need to look into a different firewall solution instead of attempting to combine ISP customer connections and ISP services (DNS, billing, RADIUS, etc.) on the same device (the core router).
I am the Sr. Network Engineer here… in fact the ONLY engineer here… so it’s fix it or die trying. I tested everything when I migrated the config from 6.x → 7.x and upgraded the hardware. But this was not tested, so it is my fault. I shouldn’t have assumed traffic flow.
Glad you found that you need the helper. A more elegant way to achieve software forwarding is to selectively not apply the fasttrack-connection action to these connections. This way it won't get pushed to the switch chip and the switch rule becomes unnecessary.
The connection is marked for fasttrack in the forward/filter chain. You simply don't mark these connections.
There are lots of ways to skin that cat. One of the common ones is to add the filter condition connection-mark=no-mark to the fasttrack-connection rule and add a mangle/prerouting rule with action=mark-connection new-connection-mark=dont-fasttrack and the filter criteria for your dst-nat rule...
This way the connection won't get fasttracked, won't be pushed to the hw flow table, and won't be intercepted by the switch chip. It's altogether a nice approach to de-fasttrack connections that need any sort of special snowflake handling.
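A minimal sketch of that approach, assuming the same match criteria as a dst-nat rule for UDP port 5060 arriving on ether1 (both values are placeholders for illustration):

```
# Hypothetical match: ether1 = WAN, 5060/udp = the dst-nat'ed service
/ip firewall mangle
add chain=prerouting action=mark-connection new-connection-mark=dont-fasttrack \
    connection-state=new in-interface=ether1 protocol=udp dst-port=5060 passthrough=yes

# Only fasttrack (and hw-offload) connections that carry no connection mark
/ip firewall filter
add chain=forward action=fasttrack-connection connection-state=established,related \
    connection-mark=no-mark hw-offload=yes
add chain=forward action=accept connection-state=established,related
```

Marked connections skip the fasttrack rule and fall through to the plain accept rule, so they stay in the software path.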
You are talking as if “this hobby equipment does not work” and “the professional equipment does not have such issues”, but you know very well that this is not the truth. I have worked with Cisco stuff in the past and they have bugs as well, and you may have a hard time getting a complicated config working there too. E.g. I configured routers to use a VPN between branches over DSL with an ISDN backup path, including having 2 DSL lines at head office with failover/load-balancing, and I can assure you there were many issues that were either resolved later in a software update or required workarounds like turning CEF off. Similar to what you are encountering here.
There are always going to be bugs that you encounter when you have a niche configuration, and with other devices, which already cost more to buy, you may have to pay for support as well. It is your decision whether you want a cheaper solution that may require (in your belief) more tinkering, or go for the “professional” solution that may cost more money and less time, when you are lucky.
There is little point in complaining about it here…
Niche configuration? Wow. L3 offloading of UDP traffic onto an ASIC… is so… 2026. Like… it hasn’t been an industry standard for… 15+ years? Bugs are bugs, yes. I remember when Cisco first released the 6148 blades… they wouldn’t forward < 64byte frames. It broke DHCP and Dell and HPE kickstart functionality. The only way to resolve it was to replace the blades, because the ASICs had a physical hardware limitation. Or back when I found a bug in Red Hat Linux where it would send responses out of its inactive bonded interface, which essentially doubled (duplicated) the traffic coming out of the server, and I had to engage Red Hat to have them write a custom patch to fix their kernel (worked 20 hour days for 14 days straight to fix and patch 10,000+ servers). Bugs are bugs. Excuses are exactly that. Fix the issues, work the problems, collaborate on the forums. It really gets tiring being directly “attacked” for wanting better out of my equipment. I don’t make the buying decisions. If I did, MikroTik wouldn’t be on the menu, period. But here I am, asking for help and being told not to complain… the very definition of asking for “help” is a “complaint something doesn’t work”. I find it fascinating that these devices can do so much with as little hardware as they have. I suppose I will keep my complaints off of this forum and just rely on myself like I always have done. Thanks pe1chl for pointing out how pointless this forum really is… and why I will not use it in the future.