Hello! Long time no see. It's been a while since I was last here, but today has been my strangest MikroTik day in some years, so I need to ask you guys for some tips.
I had a major failure at my data center today, for the 3rd time in a year, so I ended up rebuilding my core network a bit. Earlier this Sunday morning (it's always Sundays, right?) the core router connecting to my transit provider went offline, and I had to take a trip to the Orange data center where my equipment is installed. When I arrived, the core router was stuck in a boot loop, rebooting over and over. Last time this happened I suspected a console cable was the cause, but I was not sure. I have an original blue Cisco DB9 (RS232) to RJ45 cable connected directly from my CCR1036 to a Cisco switch, to give me a backdoor for login. I power-cycled the router 4-5 times without success. I then wanted to connect the CCR to my computer using a USB-to-serial cable, but as soon as I unplugged the blue Cisco cable from the CCR, it booted normally. This seemed weird to me, so I plugged the cable back in, but everything was fine. The cable caused no problem once the router had already booted.
My plan for a while has been to run VRRP towards the transit, and the transit provider has had their side configured for this for some time. I had a spare CCR (CCR1009), so I mounted it for my second internet line. This went fine. I then thought I should test the console cable and plugged it into this router as well. Everything seemed fine until I upgraded the software and rebooted the router. Then the same thing happened to this router: it just hangs during boot, times out after some minutes and tries to boot again, without success. OK, strange. Might it be the ROS upgrade? I tested a bit more, and it seems that every time I rebooted the router with the cable connected, it hung before even loading the kernel. To me this is strange behavior. I know there is a boot menu you can reach by connecting a console cable and pressing the right keys during boot, but would the Cisco switch be sending those keystrokes by itself? Does anyone have a clue about this? Anyhow, I don't need the backdoor to the Cisco switch, so I just unplugged the cable and had no more problems with this for the rest of the job. Now it was time to configure VRRP. No problem: it came up like a charm and worked as planned.
Almost. After this job, almost all my WireGuard tunnels stopped working. I had 13 tunnels to different sites, and 11 of them were down. The tunnel to my home was OK, and so was one more going through an LTE device. The 11 others were not working. I tried everything I know to figure out what was wrong, but nothing helped, and I didn't know why 2 of the tunnels were still online. After some research, I figured out that my home line was the only one with a public IP. All the other ones were behind NAT. The second working device had an SSTP tunnel, since the provider of that line blocks all traffic except port 443. So it was running WireGuard through an SSTP tunnel, and that was working.
So I have narrowed the problem down a bit. Public IP to public IP works well, and so does direct IP to IP inside SSTP, but as soon as the traffic comes from a NAT'ed device, the tunnel will not come up. One of the NAT'ed peers is a virtual router in AWS, where all traffic is allowed in without any firewall, and the problem shows up there too.
As a quick fix to get the lines back up (I was running EoIP over the WireGuard tunnels), I changed all lines to SSTP with EoIP running directly over it. Because of the SSTP overhead, though, I want to go back to my WireGuard tunnels.
All the tunnels were working before the change to VRRP, and there are no other changes to the setup. The public/private keys are the same. IP addresses, port numbers, etc. are the same. The only difference is that the link IP now uses VRRP. It is still held by the same router as before, which is the main router, with the new one in backup state. The VRRP IP is just a link-net IP between my transit provider and me, and it is the gateway for my subnets with public IPs. The tunnels connect to an IP from my own net that is assigned to the router itself, so I don't even use the VRRP link IP as the endpoint, but one routed behind it. All other traffic behind the router behaves normally, and nothing else has changed.
In the WireGuard peers window I see the endpoint address of the client, so at least some traffic is arriving. The error I get is the "Handshake for peer did not complete after 5 seconds, retrying" message, so something about the handshake is not working. But why does it work public IP to public IP and SSTP to SSTP, but not public to NAT'ed? Does anyone have any clues here? I can provide more config, but I don't think it will help, as my config was working earlier and I have not changed it.
I tried enabling debug logging for WireGuard, but it gave no additional information.
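One check that could narrow this down, offered as an assumption rather than a confirmed cause: the router may now be answering handshakes from a different source IP than the one the clients send to. A peer on a public IP can still accept such a reply, because WireGuard lets a peer's endpoint roam, while a NAT in front of a client will usually drop a reply that does not come from the address it has a mapping for. The Python sketch below scans a list of captured UDP packets for that pattern. The `Packet` type, the function name, the addresses (taken from documentation ranges) and port 13231 are all hypothetical placeholders. Getting the tuples out of a real capture on the transit-facing interface is not shown.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    """One captured UDP packet, reduced to its address 4-tuple (hypothetical type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def mismatched_replies(packets: List[Packet], endpoint_ip: str,
                       listen_port: int) -> List[Packet]:
    """Return replies sent from the listen port but from an IP other than endpoint_ip."""
    # Every remote (ip, port) that sent something to the configured endpoint.
    clients = {
        (p.src_ip, p.src_port)
        for p in packets
        if p.dst_ip == endpoint_ip and p.dst_port == listen_port
    }
    # Replies go back to one of those clients from the listen port.
    # A source IP different from endpoint_ip is the suspicious case.
    return [
        p for p in packets
        if (p.dst_ip, p.dst_port) in clients
        and p.src_port == listen_port
        and p.src_ip != endpoint_ip
    ]


if __name__ == "__main__":
    # Made-up example: the client targets 198.51.100.1, the reply leaves from 192.0.2.2.
    capture = [
        Packet("203.0.113.10", 51000, "198.51.100.1", 13231),
        Packet("192.0.2.2", 13231, "203.0.113.10", 51000),
    ]
    for p in mismatched_replies(capture, "198.51.100.1", 13231):
        print(f"reply to {p.dst_ip}:{p.dst_port} came from {p.src_ip}, expected 198.51.100.1")
```

An empty result would suggest the reply source address is not the problem, and the search has to continue elsewhere.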
Sorry for the long post, with a lot of detail that may not be needed here, but I had to get some frustration out.