We are forwarding all traffic onto TE tunnels. The issue we are having is that we have redundant routers at every edge of the network, running VRRP for gateway protection.
We build tunnels from the BGP edge (also redundant) to each edge device; that is 2 x 2 = 4 tunnels in each direction.
We then set up static routes on the BGP edge and forward the traffic onto a tunnel, with a second tunnel as the backup gateway. We have “disable running check=no” set on the tunnels.
If one of the edge routers dies for any reason (BGP or customer edge), it can take between 120 and 165 seconds for the other end to realize the tunnel is no longer live.
I read another post that said this timeout is determined by refresh-time * k-factor. Where can I change these settings, and what are the absolute lowest values they should be set to for a stable setup that will survive normal packet loss?
You can configure k-factor and refresh-time in the “/mpls traffic-eng interface” menu. These parameters control how often RSVP Path and Resv messages are sent out of a particular interface. You can think of it as the hello interval in OSPF. The basic idea is this: refresh-time specifies how often messages are sent (that is, the load on routers and the network), while k-factor influences how the timeout is calculated (that is, how much packet loss is tolerated: how many consecutive messages must be lost before the path is considered down). For detailed info on this, see RFC 2205 section 3.7.
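To get a feel for the numbers, here is a rough sketch of the state lifetime rule from RFC 2205 section 3.7, L >= (K + 0.5) * 1.5 * R, where R is the refresh time and K the k-factor. The function name and the candidate values are made up for illustration, and I am assuming the router applies the RFC formula as written and starts from the RFC defaults (R = 30 s, K = 3); check your own configuration before relying on these figures.

```python
def rsvp_state_lifetime(refresh_time_s: float, k_factor: int) -> float:
    """Minimum RSVP state lifetime in seconds, per RFC 2205 section 3.7.

    L >= (K + 0.5) * 1.5 * R
    The 1.5 factor covers the randomized refresh interval, which may
    stretch up to 1.5 * R between two consecutive refresh messages.
    """
    return (k_factor + 0.5) * 1.5 * refresh_time_s


# Hypothetical candidate settings: (refresh-time in seconds, k-factor).
candidates = [(30, 3), (10, 3), (5, 3), (5, 2)]

for refresh, k in candidates:
    lifetime = rsvp_state_lifetime(refresh, k)
    print(f"refresh-time={refresh}s k-factor={k} -> state lifetime {lifetime:.2f}s")
```

With the assumed defaults this gives 157.5 seconds, which is in the same range as the 120 to 165 seconds reported in the question. Lowering refresh-time shrinks the timeout proportionally at the cost of more RSVP messages, while lowering k-factor shrinks it at the cost of tolerating fewer lost refreshes.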
If you are using CSPF, you can also use periodic path reoptimization. Keep in mind, though, that updated path info must have reached the tunnel head-end router by the time reoptimization occurs, and this depends on OSPF timeouts (the OSPF adjacency timing out).
By adjusting RSVP timeouts and routing protocol timeouts you can shorten the response time to a change, but the path change will still not be immediate (as in a matter of milliseconds). The correct way would be to use BFD for RSVP or TE fast reroute. Unfortunately, neither of these is available yet.