L2TP VPN HP iLo 5 and 4 Ceases to Work After Roughly 1 Minute

Hi Everyone,

I have a CCR2116-12G-4+ that I connect to via L2TP to be able to administer local resources like HP iLo v5 and v4, Esxi, etc. I discovered that I was unable to access the iLo web UI to check some of the server settings. There are 2 servers, one running iLo v4 and the other iLo v5. The iLo v5’s IP is 192.168.10.230 and the iLo v4’s is 192.168.10.7.

After each and every reboot of an iLo instance, I can connect to it through the VPN for about a minute, and then it times out and can’t be accessed again. I can still access the iLo interface from the local LAN, but not over the VPN. I can ping each instance over the VPN with no issue, but anything to do with the web UI doesn’t work.

By contrast, I can access both instances of Esxi over the VPN without issue, and they never time out or have any problems whatsoever. One Esxi server is on 192.168.10.9 and the other is on 192.168.10.231.

Has anyone experienced anything like this previously? How did you fix it? I can’t understand why one set of admin UIs works without problems over the VPN, yet the other works only for a short period after a forced reboot of the iLo service.

Thanks

Duke

My advice is “sniff when it works and sniff when it doesn’t”, as it is totally unclear whether it is an issue of the iLO itself, of the connection tracking in the Mikrotik, or something else.

So I’d set /tool sniffer set file-name=iLO.pcap on the 2116, then run /tool sniffer quick ip-address=192.168.10.7 interface=the-bridge-to-which-the-iLO-is-connected, reboot the iLO, connect to it successfully, wait until it becomes unresponsive, try a new connection, wait until it times out, then stop the /tool sniffer …, download the .pcap and use Wireshark to see what happened. Further steps depend on the result - if there are no SYN packets for the 2nd connection attempt, it is something in the Mikrotik, otherwise it is something in the iLO.
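
To make the "SYN or no SYN" decision concrete, here is a minimal Python sketch of the logic. It assumes the capture has already been decoded into (src_ip, src_port, dst_ip, dst_port, flags) tuples, for example by exporting from Wireshark; that tuple layout and the function name are just illustrative assumptions, not something RouterOS or Wireshark produces directly.

```python
from collections import defaultdict

# TCP flag bits as they appear in the TCP header.
SYN, RST, ACK = 0x02, 0x04, 0x10


def classify_attempts(packets, server_ip):
    """Label every client connection attempt seen at the capture point.

    `packets` is an iterable of (src_ip, src_port, dst_ip, dst_port, flags)
    tuples, a hypothetical pre-decoded form of the sniff file.
    """
    events = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port, flags in packets:
        if dst_ip == server_ip and flags & SYN and not flags & ACK:
            events[(src_ip, src_port)].add("syn")
        elif src_ip == server_ip and flags & SYN and flags & ACK:
            events[(dst_ip, dst_port)].add("synack")
        elif src_ip == server_ip and flags & RST:
            events[(dst_ip, dst_port)].add("rst")

    verdicts = {}
    for client, seen in events.items():
        if "syn" not in seen:
            continue  # reply without a captured SYN: sniffing started too late
        if "synack" in seen:
            verdicts[client] = "established"
        elif "rst" in seen:
            verdicts[client] = "reset by server"
        else:
            verdicts[client] = "unanswered"
    return verdicts
```

An empty result for the second attempt means no SYN reached the sniffing point at all, which points at the Mikrotik; "unanswered" or "reset by server" means the SYN did arrive and the iLO side is the place to look.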

Thanks Sindy,

I’ve done as you suggested, but am not sure what to make of the output. I ran the sniff at the remote location, which is where the two iLo instances are. In both instances, Not Working and Working, I see a lot of TCP Retransmission entries. I’m not sure if this is an error or informational. See attached screenshots to get a clearer picture.

IPs are as follows:
192.168.10.7 > Actual iLo interface
10.100.100.253 > Route to my Mikrotik in my office

Does this point to any issue that you’re aware of?

Thanks for your time in helping me with this…

Duke


Regarding the retransmissions, the timestamps and the mutual order of the packets strongly suggest that you haven’t followed my recommendation to limit the sniffing to a single interface, so every packet made it to the sniff file three times - presumably from the L2TP interface, from the bridge interface, and from the Ethernet port of the bridge. Unfortunately, Mikrotik still keeps using the .pcap format rather than the .pcapng one, so the information about the interface at which the packet has been sniffed is not available in the file.
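
If re-sniffing is not an option, the interface copies can be told apart from real retransmissions after the fact. Below is a rough Python sketch of that idea, under a few assumptions of mine: each frame has been reduced to (timestamp, raw IPv4 packet bytes), copies of one packet taken on different interfaces arrive within about a millisecond of each other, and they may differ only in TTL and header checksum. The 1 ms window and both function names are illustrative, and `build_ipv4` is a hypothetical helper that exists only to produce sample data.

```python
def packet_key(ip_packet):
    """Identify a packet while ignoring TTL and header checksum."""
    ihl = (ip_packet[0] & 0x0F) * 4      # IPv4 header length in bytes
    ident = ip_packet[4:6]               # IP identification field
    proto = ip_packet[9]
    addrs = ip_packet[12:20]             # source + destination address
    return ident, proto, addrs, ip_packet[ihl:]


def drop_interface_duplicates(frames, window=0.001):
    """Keep the first copy of each packet; drop copies seen within `window` s."""
    last_seen = {}
    kept = []
    for ts, data in frames:
        key = packet_key(data)
        prev = last_seen.get(key)
        last_seen[key] = ts
        if prev is not None and ts - prev <= window:
            continue                     # same packet sniffed on another interface
        kept.append((ts, data))
    return kept


def build_ipv4(ttl, ident, payload, src=(10, 100, 100, 253), dst=(192, 168, 10, 7)):
    """Hypothetical helper: a minimal IPv4/TCP-protocol packet, checksum left at zero."""
    header = bytes([0x45, 0]) + (20 + len(payload)).to_bytes(2, "big")
    header += ident.to_bytes(2, "big") + bytes([0, 0, ttl, 6, 0, 0])
    header += bytes(src) + bytes(dst)
    return header + payload
```

A genuine TCP retransmission normally shows up much later than the window, so it survives the filter and can still be judged on its own.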

Regarding the actual contents, the only difference between the two sniffs is that in the Working one, the iLO responds to the SYN packets it receives with TCP reset packets, whereas in the NotWorking one, it just stays silent. It stays silent for the SYNs from port 55781 in Working as well. But there is no successful session establishment even in Working. Did you indeed connect to the iLO’s web GUI successfully during that attempt, and if so, did you start sniffing before making the first connection attempt?

It is also strange that the client sends the SYN packets from ports 55779 and 55780 again after the server has already sent the RST - it suggests that the client hasn’t received those RST.

So all in all there seem to be multiple issues. I suspect that the iLO has some access list preventing it from accepting management access from 10.100.100.253. Is this configurable? If not, I’d suggest using a src-nat (or masquerade) rule on the Mikrotik, so that the iLO would see the client requests coming via L2TP as being sent by the Mikrotik itself. I can imagine a bug making the access list start working only some time after the reboot, which would explain why it works for a short while in the current state.

Hi Sindy,

Apologies, I’ve redone the Packet Sniff and only used the L2TP connection as the interface. The results are quite stark, but my experience with evaluating packets is very limited. I’ve uploaded the pcap files in a zip file here if you want to take a look:

https://cutt.ly/b04TDiQ

There is no access list in either the Mikrotik router or the iLo itself preventing a connection and, as outlined, the connection can be made successfully after an iLo reboot, so I don’t think an access list is in play here. I’ve changed the L2TP tunnel IPs to 10.22.22.0/24 to see if that made any difference, but it didn’t. I got the same issue about a minute after rebooting the iLo interface.

I’m not sure what you mean by your suggestion of creating a src-nat (or masquerade) rule on the Mikrotik to make it seem like the L2TP traffic is coming from the Mikrotik itself. Can you please give me an example?

Sorry to be a pain in the ‘A’. I really appreciate your input.

Thanks

Duke

OK, this time the Working one indeed shows some normal HTTPS conversations.

What attracts attention is that the client sometimes terminates the TLS handshake with

Alert Message
    Level: Fatal (2)
    Description: Certificate Unknown (46)

So I can imagine that the server gets fed up with these messages after a while and gives that client a ban. Maybe the actual difference between “working” and “not working” scenario is not the VPN but the contents of the certificate store at the client PC? Can you connect the laptop you normally use via L2TP directly to the LAN and try again, or vice versa, can you connect one of the PCs that work via L2TP rather than directly?


/ip firewall nat print where chain=srcnat !dynamic
/ip firewall nat add chain=srcnat place-before=0 src-address=10.22.22.253 dst-address=192.168.10.7 action=masquerade

Hi Sindy,

You’re a God! The masquerade rule did the trick. It still doesn’t make sense why this happened, and I have a support request in with HP to see if they have seen something like this and whether it’s a known issue.

Also, before applying the rule, I logged onto another computer on a totally different network and created the VPN directly from a Windows 10 PC as you suggested. The same issue was present, so not isolated to the computer I’m using.

I’ll report back when I hear from HP and whether they have a workaround, in case someone else experiences this issue in the future.

Once again, thanks for taking the time to help me, it’s most appreciated.

Duke

:slight_smile: Všetko najlepšie do nového roku (all the best for the new year) :slight_smile:

My Croatian is really bad! :laughing: :open_mouth:

Hope you have a great New Year too!! Once again, thanks for all the advice…

I know this is an old thread, but I hope this will help someone who gets here via a Google search as I did.
I had exactly the same issue and decided to troubleshoot it properly by taking traces on the Mikrotik side when the problem starts.

Long story short, the culprit is the iLO itself. For reasons unknown to me, it honors ICMP Type 9 messages (IPv4 Mobile IP Router Advertisement) and changes its statically configured default gateway to the value found in the RA. In my case the source of the RAs was a Cisco SG switch in L3 mode that had its own default gateway configured incorrectly and so was unable to forward packets from the iLO when used as its default router.
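
This also seems to explain why the masquerade rule earlier in the thread helped. Here is a small Python sketch of the next-hop choice the iLO has to make for its replies. The /24 mask, the router LAN address 192.168.10.1 and the switch address 192.168.10.254 are assumptions for illustration only; none of them are stated in the thread.

```python
import ipaddress


def next_hop(destination, local_network, default_gateway):
    """Return where a host sends a packet: directly on-link or via its gateway."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(local_network):
        return "on-link"
    return default_gateway


LAN = "192.168.10.0/24"        # assumed iLO subnet mask
ROGUE_GW = "192.168.10.254"    # hypothetical switch address learned from the RA

# Reply to the VPN client's real address: goes to the gateway from the RA.
print(next_hop("10.22.22.253", LAN, ROGUE_GW))

# Reply to a masqueraded request (source rewritten to the router's assumed
# LAN address): stays on-link, so the gateway setting no longer matters.
print(next_hop("192.168.10.1", LAN, ROGUE_GW))
```

With masquerade the iLO only ever talks to an address in its own subnet, the same as for clients on the local LAN, so the overwritten default gateway is never consulted.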

You can find more details and a trace in my blog post here.
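
For anyone who wants to check their own trace for these advertisements, this is a minimal Python sketch that decodes an ICMP Router Advertisement following the RFC 1256 layout (type 9, then address count, entry size in 32-bit words, lifetime, and a list of router address and preference pairs). It assumes you have already extracted the ICMP message bytes from the packet; the function name is my own.

```python
import ipaddress
import struct


def parse_router_advertisement(icmp):
    """Decode an ICMP type 9 message; return None for anything else."""
    if len(icmp) < 8 or icmp[0] != 9:
        return None
    _type, _code, _checksum, num_addrs, entry_words, lifetime = struct.unpack_from(
        "!BBHBBH", icmp, 0
    )
    if entry_words < 2 or len(icmp) < 8 + num_addrs * entry_words * 4:
        return None  # malformed or truncated advertisement
    routers = []
    offset = 8
    for _ in range(num_addrs):
        # Each entry starts with a router address and a signed preference level.
        addr, preference = struct.unpack_from("!Ii", icmp, offset)
        routers.append((str(ipaddress.IPv4Address(addr)), preference))
        offset += entry_words * 4
    return {"lifetime": lifetime, "routers": routers}
```

Any router address that shows up here and is not your intended gateway is a candidate for what the iLO picked up.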