Router getting NAT somehow confused

Hi all, we have a very strange issue that has suddenly come up.

We have two internet connections connected directly to our MikroTik router, and with them two “live” IPs. One is High-Quality (HQ) but costs per gig, and the other is Good-Quality (GQ) and uncapped. So we obviously route our calls via the HQ link and general internet traffic via the GQ link.

Routes:
0.0.0.0/0 via GQ-Eth0
X.X.X.X via HQ-Eth1 (specifying the SIP server IP)
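
For reference, the two routes above would look roughly like this on the RouterOS command line. This is only a sketch: 203.0.113.10 is a placeholder standing in for the SIP server IP (the X.X.X.X above), and GQ-Eth0 / HQ-Eth1 are assumed to be the actual interface names on the router.

```
/ip route
# Default route: general internet traffic leaves via the GQ link
add dst-address=0.0.0.0/0 gateway=GQ-Eth0
# Host route: traffic to the SIP server (placeholder IP) leaves via the HQ link
add dst-address=203.0.113.10/32 gateway=HQ-Eth1
```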

This works perfectly “some of the time”. The problem that is coming up is that at odd times, usually at night, the SIP phones de-register and the NATing becomes “messed up”.

If I look at the Firewall > Connections tab, I can clearly see that the NAT is just wrong… the “Reply Dst. Address” shows the IP of the GQ link (the one used by the 0.0.0.0/0 route), even though the traffic went out via the clearly defined HQ route.

It’s like it’s “forgetting” to correctly NAT the traffic on that route, and marking its reply address as the GQ link’s address. I can solve the issue quickly by selecting the connections in the list and deleting them (the - sign); they then reconnect fairly quickly and all is good again… this is by no means a solution though.
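
The same inspect-and-delete workaround can be done from the terminal instead of the Connections tab. A minimal sketch, assuming 203.0.113.10 as a placeholder for the SIP server IP:

```
# Show the tracked connections towards the SIP server, including the reply-dst-address field
/ip firewall connection print detail where dst-address~"203.0.113.10"
# Drop those entries so the phones re-register and new entries are created
/ip firewall connection remove [find where dst-address~"203.0.113.10"]
```

The `~` operator is a regular-expression match, which is used here because the dst-address of a tracked connection also contains the port.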

How are you identifying and marking your VoIP traffic? I wonder if you aren’t getting some of the VoIP traffic over each link. We identify VoIP traffic based upon either the source or the destination being our VoIP server subnet. However, if I didn’t control the VoIP server subnet then I would probably base it on the traffic coming from the SIP device, if you have dedicated SIP devices. If you’re using soft phones and don’t have control or knowledge of the remote end then you’re probably stuck identifying traffic based upon protocol and port. One idea could be to add the remote IP to an address list when SIP traffic is found, then send any traffic to anything on that address list over the better connection.
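
The address-list idea could be sketched with two mangle rules like the ones below. Everything named here is an assumption for illustration: the LAN interface name `bridge`, the list name `sip-remote`, SIP signalling on UDP port 5060, and a one hour timeout. It also assumes a route with routing-mark=HQ exists that points at the better connection.

```
/ip firewall mangle
# Remember every remote host that LAN devices send SIP signalling to
add chain=prerouting in-interface=bridge protocol=udp dst-port=5060 action=add-dst-to-address-list address-list=sip-remote address-list-timeout=1h
# Send anything from the LAN to those remembered hosts via the HQ routing mark
add chain=prerouting in-interface=bridge dst-address-list=sip-remote action=mark-routing new-routing-mark=HQ passthrough=no
```

Matching on the destination list rather than only on port 5060 means the RTP audio to the same remote host should follow the signalling over the better link.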

If you’re sure your traffic is going through the proper connection then you can try disabling the SIP helper:
/ip firewall service-port disable sip

No marking, no nothing, the router is pretty much “out-the-box”, with just that one additional route in the IP > Routes section.

Ah… that’s the thing I was looking for. I take it this is like “SIP ALG”.

I have disabled it, will see how it goes over the next few days.

The uplink is chosen by the routing function. The means of communication between the firewall functions, which identify the traffic, and the routing functions are routing marks. (For efficiency, and to keep all packets of a connection together, you would want to mark connections and then use the connection marks to add routing marks to the packets that belong to each connection.) At any rate, if you’re not adding a routing mark to the packets then I suspect that you’re actually not reliably sending your VoIP over the high quality link.
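
A rough sketch of the “mark the connection, then mark routing from the connection mark” pattern described above. The names are assumptions: an address list called `VoIP` holding the phones’ internal IPs, a LAN interface called `bridge`, a connection mark `voip-conn`, and a routing mark `HQ` that a matching route uses.

```
/ip firewall mangle
# Tag new connections opened by VoIP devices (only the first packet needs the list lookup)
add chain=prerouting src-address-list=VoIP connection-mark=no-mark action=mark-connection new-connection-mark=voip-conn passthrough=yes
# Give every LAN-side packet of a tagged connection the HQ routing mark
add chain=prerouting in-interface=bridge connection-mark=voip-conn action=mark-routing new-routing-mark=HQ passthrough=no
```

The in-interface match on the second rule is there so that only packets heading out from the LAN get the routing mark, not the replies arriving from the internet.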

These links may be helpful:
http://wiki.mikrotik.com/wiki/Manual:PCC
http://mum.mikrotik.com/presentations/US12/steve.pdf

You’ll have to translate for VoIP traffic but the following explains some concepts for policy based routing:
http://wiki.mikrotik.com/wiki/Policy_Base_Routing

Nope, it’s just happened again: about 10 of the 20 or so phones are not online because the Reply Dst. Address is “wrong”. The connections have a timeout of about an hour, so it seems like it could happen as often as every hour.

Please explain the “correct” way to do this reliably? You mentioned “tagging” the packet/connections/routes?

I’m pretty sure the traffic is flowing “out” the HQ route… if I perform a trace-route it correctly routes the traffic.

Return traffic is sent to whatever source IP the outgoing packets carry after NAT, so if the replies are addressed to the GQ IP I expect the traffic is not flowing out the better connection. I don’t know enough about your traffic to tell you the best way. Here is something that should work for you, but I am making a bunch of assumptions:

  • Create an address list for internal IPs called “VoIP”
  • Mark the routing on any packets from any internal IP that is in the VoIP address list with a mark of “HQ” in the prerouting mangle chain.
  • Create a route with the routing mark of “HQ” with the gateway of the HQ connection interface.

This will look something like:

# Adjust each address to the IP of one of your VoIP devices
/ip firewall address-list add list=VoIP address=192.168.88.55
/ip firewall address-list add list=VoIP address=192.168.88.56
# Match on the address list (src-address-list), not on a literal src-address
/ip firewall mangle add chain=prerouting src-address-list=VoIP action=mark-routing new-routing-mark=HQ
# Adjust the gateway to the proper interface name
/ip route add gateway=ether2 routing-mark=HQ

This will force any traffic from the addresses in the VoIP list through the HQ link, and return traffic will come back over the link it left on. This assumes you’re using NAT. You would probably want to make any DHCP assignments of VoIP equipment static, or give the VoIP equipment a static IP that is not in your DHCP pool. If you’re worried about the ether2 connection going down, you can add a check-gateway option to automatically deactivate the route when it becomes unusable.
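
The two follow-up suggestions could look something like this. It is a sketch under assumptions: the phone currently holds a dynamic lease for 192.168.88.55, and the HQ route’s gateway is given as an IP address that answers pings (check-gateway=ping has nothing to probe if the gateway is only an interface name or ignores ICMP).

```
# Pin the phone's current DHCP lease so its address keeps matching the VoIP address list
/ip dhcp-server lease make-static [find where address=192.168.88.55]
# Deactivate the HQ route automatically while its gateway stops answering pings
/ip route set [find where routing-mark=HQ] check-gateway=ping
```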

Thanks for the help joshaven, I’ll give that a go and let you know.

Last night I came across the “Pref. Source” setting in the Routes section. In reading about it, it seems that this is exactly the setting it is getting “confused” about. It says:

Which of the local IP addresses to use for locally originated packets that are sent via this route. Value of this property has no effect on forwarded packets. If value of this property is set to IP address that is not local address of this router then the route will be inactive. If pref-src value is not set, then for locally originated packets that are sent using this route router will choose one of local addresses attached to the output interface that match destination prefix of the route (an example).

So the part that says “…If pref-src value is not set, then for locally originated packets that are sent using this route router will choose one of local addresses attached to the output interface that match destination prefix of the route…”

This seems to be exactly the behavior that’s happening: it’s choosing an IP address, but choosing the wrong one.

I have set the “Pref. Source” in my static route to the IP of the HQ link, and left it overnight. This morning I checked again, and only 2 connections (of the 20) were wrong! So it seems to have “improved” things, but it’s not 100%.
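
For anyone following along on the terminal, setting Pref. Source on the SIP route would be roughly the first two commands below. The addresses are placeholders: 203.0.113.10 for the SIP server and 198.51.100.2 for the router’s own IP on the HQ link. Since the manual text quoted above says pref-src has no effect on forwarded packets, and the phones’ traffic is forwarded, the last command sketches one possible alternative that was not tried in this thread: an explicit source NAT rule for the HQ interface (HQ-Eth1 is assumed to be the interface name, and the rule would need to sit above any general masquerade rule).

```
# Set and verify the preferred source address on the SIP server route
/ip route set [find where dst-address=203.0.113.10/32] pref-src=198.51.100.2
/ip route print detail where dst-address=203.0.113.10/32
# Untested assumption: pin the NAT source address for everything leaving via the HQ link
/ip firewall nat add chain=srcnat out-interface=HQ-Eth1 action=src-nat to-addresses=198.51.100.2
```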

Let me see what I can do with your suggested technique. Will let you know, thanks.

Right, I have done what you said, still the same problem this morning.

I have the prerouting mark and the Pref. Source, still the same thing!

Does it make any difference that the HQ link is established over PPPoE while the GQ link is an Ethernet (Diginet) connection?

I think I see what the problem is, but I’m not sure how to fix it.

In the IP > Routes section, the routes are correctly defined. However, if I tab over to Nexthops, I notice that the HQ link (the PPPoE link) does not appear. I can’t ping its gateway, but the link sure is established, and DOES work.

I think that at the time the packets are routed, the router checks whether the “next hop” is available, and maybe since it cannot ping it, it doesn’t consider the route reliable.

How do I “assure” it, or should I contact my ISP and ask them to open ICMP?

(BTW, I can’t enable ARP ping to test the gateway because it’s PPPoE, and ARP requires an Ethernet connection.)

For those who stumble across this post - here is the solution:

We had no joy from the ISP; they were not interested in making any modifications. Fair enough. So we got another RB750 to “terminate” the fibre. This way the new RB750 (let’s call it the Fibre Terminator router) was always available and pingable by the main router that had the static routes set on it.

The Fibre Terminator router had the PPPoE connection etc. and just handled that one connection.

Turns out this worked perfectly. The routes never got confused again, and even if the fibre “re-connected” its PPPoE session or whatever the case, the main router continued to route to the Fibre Terminator router.
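
The exact configuration was not posted, so the following is only a guess at the minimum each router needs for this layout. All values are placeholders: 192.168.99.2 as the Fibre Terminator’s address on the cable between the two routers, 203.0.113.10 as the SIP server, and pppoe-out1 as the name of the PPPoE client interface on the Fibre Terminator.

```
# On the main router: reach the SIP server via the Fibre Terminator, an always-pingable Ethernet next hop
/ip route add dst-address=203.0.113.10/32 gateway=192.168.99.2 check-gateway=ping

# On the Fibre Terminator: NAT everything that leaves through the PPPoE session
/ip firewall nat add chain=srcnat out-interface=pppoe-out1 action=masquerade
```

The Fibre Terminator also needs a way back to the phones, either a route to the phone subnet via the main router or a NAT rule on the main router for traffic heading to the Fibre Terminator.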

Yay! :)

PS, I’m not sure if this is possible using just one device, anyone?