I have a working failover setup, and we are using firewall rules. It works for new connections once the failover has occurred, but previously established connections still try the old route through the firewall.
When a failover happens, everything works EXCEPT previously established UDP connections (the ones that went through the now-failed route). What is happening is that the firewall isn’t letting through connections which aren’t established (and rightfully so, because the new connection ISN’T established).
If I manually go to the ip firewall connections and DELETE the established connection (the one that is now-failed, i.e. the “old” active connection), then on the next connection from the device, all works.
So I am thinking I should lower the UDP timeout, but this is being used with a SIP server (Asterisk), and I don’t want the timeout to be so low that the established connection times out when there is no failure, leaving us with no incoming traffic until it gets re-established.
Is there any way to essentially “flush old connections” when the gateway does an auto-failover with routing (i.e. when we have check-gateway=ping and it fails over, flush the established connections on the failed route)?
Options I can think of:
1. Modify the script that changes the routing when the check-gateway ping fails, so that it also flushes the firewall connections after the routes change. (I assume I can modify this? I never made one; I am assuming that one exists and is simply called when I use the ‘check-gateway’ parameter.)
2. Change the firewall UDP connection timeout to some lower value.
3. Change the Asterisk behaviour to send some sort of keep-alive more often.
But #1 really would solve the issue; #2 and #3 would just ‘speed up’ the recovery after failover. So… is anyone else doing this? Don’t you HAVE to if you are using failover with a firewall? Is this a FAQ that I couldn’t find?
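For reference, option #2 maps onto the connection tracking settings. This is only a sketch: the values are placeholders rather than recommendations. As far as I understand it, `udp-timeout` applies to UDP entries that have only seen packets in one direction, while `udp-stream-timeout` applies once replies have been seen, so a registered SIP peer would normally fall under the second one.

```
# Show the current connection tracking timeouts
/ip firewall connection tracking print

# Placeholder values only; tune them against the SIP registration/keep-alive interval
/ip firewall connection tracking set udp-timeout=10s udp-stream-timeout=3m
```

Whatever values are chosen, the stream timeout needs to stay longer than the interval between keep-alives or re-registrations, otherwise the entry expires even when nothing has failed.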
This is what I have for PCC with my failover. The script I created watches the status of the ISP connection. If the connection goes down, it first checks whether the route is still active; if it is, it disables the route and runs this line, which removes every connection marked “outside1_connection” from the connections list.
I have it on a delay of 5 seconds after it sees the down state. The script will exit without clearing the entire table if it runs into an entry it cannot delete; I’m hoping the 5 seconds will take care of that. I wasn’t able to clear the connection table reliably any other way.
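A rough sketch of what such a failover step could look like follows. It is not the exact script described above. It assumes the failed ISP's route carries the comment "ISP1" (a made-up name for illustration) and that a mangle rule already marks its connections as “outside1_connection”.

```
# Sketch only. "ISP1" is a hypothetical route comment; adjust to your own setup.
:local routeId [/ip route find where comment="ISP1" disabled=no]

:if ([:len $routeId] > 0) do={
    # Take the failed route out of service
    /ip route disable $routeId

    # Give the routing change a moment to settle before flushing
    :delay 5s

    # Remove entries one at a time so that a single failure
    # (e.g. an entry that expired in the meantime) does not abort the rest
    :foreach c in=[/ip firewall connection find where connection-mark="outside1_connection"] do={
        :do { /ip firewall connection remove $c } on-error={}
    }
}
```

Removing entries inside a `:do ... on-error` block is one possible way around the "script exits on an entry it cannot delete" problem, since an entry that times out between the `find` and the `remove` is then simply skipped.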