Hi guys, sorry for the late reply, I've been busy.
marrold, I'd like to help resolve it as much as I possibly can, however I'm operating in production, so I can't just decide to change things and pull readings etc whenever I want. Production world doesn't work like that.
And to follow up before someone says "you shouldn't do this in production anyway!"... I agree, and we did bench test it as much as was possible. We don't have a spare VoIP setup that matches this exact layout.
This router is doing OSPF, BGP, etc. and is acting as a main gateway on a fiber line. There are too many variables to reproduce, and I have nowhere to put it where it would be in exactly the same configuration. We have gotten it as close as possible to this situation on the bench, but have not been able to go further. (Hence why the cutover went perfectly, despite this one single issue.) My goal this week is to try and get further testing done, and hopefully some *&(*&@ packet captures.
To confirm some of the questions:
- The Cisco ATAs are SPA122 and SPA112. Some are behind a PPPoE connection directly to this router, others are sitting on IP/Ethernet links (passing through radio bridges). The network config doesn't seem to matter. (This makes me think it's not an MTU issue either, which was one of my theories.)
- The configs before and after the upgrade on this router are absolutely identical export/imports. I can flip the firmware between 6.20 and 6.21+ like a switch, and this is the only bug/problem that we encounter.
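
A rough sanity check on the MTU theory, as a sketch only. It assumes G.711 (64 kbit/s) at 20 ms packetization over IPv4 without header options, none of which has been confirmed for this setup, and the function names are just illustrative. PPPoE adds 8 bytes of overhead on a standard 1500-byte Ethernet link, so the question is whether a voice packet comes anywhere near the resulting limit:

```python
# Back-of-the-envelope MTU check for RTP voice packets over PPPoE.
# ASSUMPTIONS (not confirmed in this thread): G.711 at 64 kbit/s,
# 20 ms packetization, IPv4 without options, standard 1500-byte Ethernet MTU.

ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20     # no IP options
UDP_HEADER = 8
RTP_HEADER = 12      # fixed RTP header, no CSRC list or extensions


def pppoe_mtu(ethernet_mtu=ETHERNET_MTU):
    """Largest IP packet that fits in one PPPoE frame."""
    return ethernet_mtu - PPPOE_OVERHEAD


def rtp_packet_size(bitrate_bps, ptime_ms):
    """Size in bytes of one IP packet carrying ptime_ms of audio."""
    payload = bitrate_bps * ptime_ms // 8000  # bits/s * ms -> bytes
    return IPV4_HEADER + UDP_HEADER + RTP_HEADER + payload


def fits(ip_packet_size, mtu):
    """True if the packet can be sent without fragmentation."""
    return ip_packet_size <= mtu


if __name__ == "__main__":
    size = rtp_packet_size(64_000, 20)
    print("PPPoE MTU:", pppoe_mtu())
    print("RTP packet size:", size)
    print("Fits without fragmentation:", fits(size, pppoe_mtu()))
```

Under those assumptions the voice packets are a small fraction of the PPPoE limit, which is consistent with the problem showing up on both the PPPoE and the plain Ethernet links. Large SIP signalling messages are a different story and would need to be measured from an actual capture.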
In the end this could be a VoIP config issue, but I seriously doubt it from what I'm seeing. Perhaps it's an incompatibility with the Cisco units as well, but this seems to occur across all Cisco firmware versions. I confirmed 4 different versions running on the ATAs, including the newest version. I also don't believe this is a Cisco issue or a VoIP config issue, as these ATAs run on every other provider, with almost every crazy configuration possible, without any issues. This is the first time I've seen anything actually "break" them.
As I said, I'm going to try and get some packet captures, but it's going to take time.
I just wish I could get a better idea of what could be triggering this problem between those firmware versions. From what I see, there is nothing in the changelogs that *should* have an effect on this. But something obviously does. Maybe it's not a bug, but it's still something that I need to identify.