But as I keep saying, I want some actual information on this, not just "it should work".
HOW does it work? I would like information on how devices detect a hotspot in the first place. Not just a brief overview like "they try to connect to a site and if it fails they show you the login page"; that doesn't tell me anything. I want to know which sites they try to access. Does it do an HTTP request? DNS? Ping?
When the device connects to the network, or the browser starts, it makes an HTTP request (not HTTPS) to download a specific file from the vendor's website. That file exists only for captive portal detection. Different vendors use different files (URLs), and sometimes the URL changes between versions. If the device gets the expected file from the vendor's website, it knows there is no captive portal and doesn't bother presenting a login page. If the file is not as expected (different contents), or the HTTP status code doesn't match (Android's probe, for example, expects 204 No Content), the device concludes there is a captive portal.
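To make the comparison concrete, here is a minimal Python sketch of what such a probe does. The probe URLs and expected responses are the commonly cited ones (Android's /generate_204 returning 204 with an empty body, Windows' ncsi.txt returning 200 with the text "Microsoft NCSI"). Treat them as examples, since vendors change these between OS versions. The `Probe` type and function names are made up for illustration.

```python
import http.client
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Probe:
    host: str
    path: str
    expected_status: int
    expected_body: Optional[bytes]  # None means "do not check the body"


# Commonly cited probes; examples only, vendors change these between versions.
PROBES = {
    "android": Probe("clients3.google.com", "/generate_204", 204, b""),
    "windows": Probe("www.msftncsi.com", "/ncsi.txt", 200, b"Microsoft NCSI"),
}


def portal_detected(probe: Probe, status: int, body: bytes) -> bool:
    """True if the answer differs from what the vendor's own server sends."""
    if status != probe.expected_status:
        return True  # e.g. a 302 injected by the hotspot
    if probe.expected_body is not None and body.strip() != probe.expected_body:
        return True  # right status, but someone else's content
    return False


def run_probe(probe: Probe, timeout: float = 5.0) -> bool:
    """Plain HTTP on port 80, no redirect following, like an OS probe."""
    conn = http.client.HTTPConnection(probe.host, 80, timeout=timeout)
    try:
        conn.request("GET", probe.path)
        resp = conn.getresponse()
        return portal_detected(probe, resp.status, resp.read())
    finally:
        conn.close()
```

`run_probe(PROBES["windows"])` needs network access; `portal_detected` holds the actual decision and can be checked offline.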
And how does it know which hotspot login page to send the user to?
Let's say the hotspot is hosted at 220.127.116.11; how does it know to send the user to 220.127.116.11? This is handled behind the scenes by the MikroTik router, but I want to know this information so I can properly troubleshoot it and possibly improve it as much as possible.
The MikroTik hotspot takes any HTTP request from an unauthenticated user destined for the Internet and instead replies with an HTTP 302 (redirect) to http://220.127.116.11/login. The device uses plain HTTP (not HTTPS) to request the captive portal detection page from the browser or device vendor's site (e.g. www.msftncsi.com or clients3.google.com), so the router can answer in the vendor's place. When the device makes this request to www.msftncsi.com, it not only receives the wrong data and the wrong status code, which tells it there is a captive portal, but it also receives the MikroTik's injected HTTP 302 reply ordering it to redirect to http://220.127.116.11/login. Because the device got a 302 redirect it was not expecting, it treats the redirect target as the captive portal address, pops up a browser window, and opens the URL from the 302 response so the user can complete the captive portal login process.
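The interception can be simulated locally. In this Python sketch, `FakeHotspot` is a hypothetical stand-in for the router: it answers every GET with a 302 to the login page, no matter which site the Host header names. `probe_for_login` plays the device side. `LOGIN_URL` uses the example address from the question and is only a placeholder.

```python
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Placeholder login URL (example address from the question).
LOGIN_URL = "http://220.127.116.11/login"


class FakeHotspot(BaseHTTPRequestHandler):
    """Stands in for the hotspot: every GET from an unauthenticated client,
    whatever site it was meant for, gets a 302 pointing at the login page."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", LOGIN_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass


def probe_for_login(host: str, port: int, path: str, probe_host: str) -> Optional[str]:
    """Send a detection probe over plain HTTP and return the redirect
    target if the answer was a redirect, else None."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # The Host header names the vendor site, as in a real intercepted probe.
        conn.request("GET", path, headers={"Host": probe_host})
        resp = conn.getresponse()
        resp.read()
        if resp.status in (301, 302, 303, 307, 308):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()
```

To try it, start `HTTPServer(("127.0.0.1", 0), FakeHotspot)` in a background thread and call `probe_for_login` against its port with `probe_host="www.msftncsi.com"`. The returned URL is what the device would open in its login window.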
So what if IPv6 is also enabled but the hotspot isn't handling IPv6? Does that mean no hotspot notification: sites on IPv6 work fine, but when a user tries an IPv4 site it silently fails (they never got a notification to log in)?
Yes, that is correct. If the hotspot has an IPv6 address, the customer gets IPv6 from the hotspot device, and firewall rules allow them to browse, they will be able to reach any IPv6 site. However, the captive portal detection probe can then go out over IPv6 and come back unmodified, so detection will not work: they will not be prompted to log in, and IPv4 sites will be broken. If users cannot get an IPv6 address when connected to the hotspot, you are safe. The router itself can have IPv6, as long as the interface your hotspot users are on does not. If you really need IPv6 on your hotspot for some reason, it must be blocked until the user authenticates, and only then should they be allowed through over IPv6. Hotspot doesn't do anything with IPv6 out of the box, so you would need to script this somehow with the IPv6 firewall, looking up each user's v6 address through neighbor discovery or something similar. Even if it works, it is a lot of effort, so it is not worth doing unless you really need IPv6.
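If you do go down that road, the core of such a script is joining two tables: the IPv6 neighbor table (address to MAC) and the list of logged-in hotspot users (MAC). This Python sketch shows only that join. How you export those tables from RouterOS and turn the result into IPv6 firewall rules is left open, and the function name and input shapes are assumptions.

```python
from ipaddress import IPv6Address
from typing import Dict, Set


def ipv6_allowed(neighbors: Dict[str, str], authenticated_macs: Set[str]) -> Set[IPv6Address]:
    """Return the IPv6 addresses that may pass the IPv6 firewall.

    neighbors: IPv6 address -> MAC, e.g. taken from the router's IPv6 neighbor table.
    authenticated_macs: MAC addresses currently logged in to the hotspot.
    Anything not returned stays blocked, including every unauthenticated client.
    """
    logged_in = {mac.upper() for mac in authenticated_macs}
    allowed: Set[IPv6Address] = set()
    for addr, mac in neighbors.items():
        ip = IPv6Address(addr)
        # Link-local addresses are never routed to the Internet, so skip them.
        if mac.upper() in logged_in and not ip.is_link_local:
            allowed.add(ip)
    return allowed
```

A client usually has several v6 addresses (privacy addresses rotate), so a scheme like this would have to rerun regularly. That is part of why it is so much work.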
With the default hotspot configuration, captive portal detection works properly for all devices and all vendors. What breaks it is changing the configuration to poke holes through (with walled garden entries). For instance, suppose you want your captive portal login page to look flashy and professional and pull fonts from Google. So you create a walled garden entry to allow fonts.gstatic.com, and now your login page has nice fonts. But if fonts.gstatic.com runs on the same IP address as clients3.google.com, then by allowing the fonts through you have broken captive portal detection for every device that uses clients3.google.com. Even worse, you can't rely on fonts.gstatic.com resolving to a different IP than clients3.google.com: even if it always does for you, that doesn't mean it will for all users. DNS round robin and content distribution networks can change the results, so the two sites may be on the same IP or on different IPs at different times of day, or for different people in different places.
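A quick way to see the problem is to resolve both the walled garden hosts and the detection hosts and look for shared addresses. This Python sketch is a hypothetical diagnostic, not a fix. The resolver is injectable so it can be tested offline. As explained above, a clean result only holds for this resolver at this moment, so an empty result proves nothing, while a hit proves detection can break.

```python
import socket
from typing import Callable, Dict, Iterable, Set

Resolver = Callable[[str], Set[str]]


def system_resolve(host: str) -> Set[str]:
    """Current A/AAAA addresses for host according to the local resolver."""
    infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def overlapping_entries(
    walled_garden: Iterable[str],
    detection_hosts: Iterable[str],
    resolve: Resolver = system_resolve,
) -> Dict[str, Set[str]]:
    """Map each walled garden host to the IPs it shares with any detection host."""
    detection_ips: Set[str] = set()
    for host in detection_hosts:
        detection_ips |= resolve(host)
    clashes: Dict[str, Set[str]] = {}
    for entry in walled_garden:
        shared = resolve(entry) & detection_ips
        if shared:
            clashes[entry] = shared
    return clashes
```

For example, `overlapping_entries(["fonts.gstatic.com"], ["clients3.google.com"])` checks the exact case described above with live DNS.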
To get captive portal detection that just works, all the time, consider Windows users. You need to make sure your captive portal has no walled garden entries for any Microsoft-owned IP addresses, and none for any CDN-delivered hosts that might also serve Microsoft content and could hold a copy of the captive portal detection page. Any IP that is owned by or related in any way to Microsoft could also serve a copy of Microsoft's captive portal detection page. If you have a walled garden entry that lets that IP through, whether it is for a video, some fonts, a map, or the weather, the entry will also let the detection page through unmodified. The end user's device will then incorrectly conclude there is no captive portal and fail to display the login page.
The same goes for Android-based systems, the Chrome and Chromium browsers, and Google. If you have any rule that lets Google Maps through to display a map on the login page, or loads Google Fonts, or shows the weather, or embeds a YouTube video, that rule risks also letting through the unmodified captive portal detection page for Android, Chrome, or Chromium users. Their captive portal detection then breaks, and they are never redirected to the login screen.
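One partial audit is to check walled garden IPs against address blocks you believe belong to a vendor. In this Python sketch the prefix lists are placeholders you would have to fill in yourself (documentation ranges are used in the test). As noted above, CDN-hosted copies can live outside any vendor's own ranges, so passing this check does not make an entry safe.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, Iterable, List


def risky_entries(
    walled_garden_ips: Iterable[str],
    vendor_prefixes: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each walled garden IP to the vendors whose listed prefixes contain it."""
    nets = {vendor: [ip_network(p) for p in prefixes]
            for vendor, prefixes in vendor_prefixes.items()}
    result: Dict[str, List[str]] = {}
    for raw in walled_garden_ips:
        ip = ip_address(raw)
        # An IPv4 address is never "in" an IPv6 network (and vice versa).
        hits = [vendor for vendor, ns in nets.items() if any(ip in n for n in ns)]
        if hits:
            result[raw] = hits
    return result
```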
So how can you pull in external content safely without breaking captive portal detection? The best way is to store all such content on a server you control, and let that server, and only that server, through with a walled garden entry. If the content comes from your own server, you know that neither Microsoft nor Google is storing anything on it, so you can be sure it will not break captive portal detection for any browser or operating system. You could even use that server as a reverse proxy to pull content from vendors like Google in a way that doesn't break captive portal detection. Either way, only a single address is allowed through the walled garden.
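Moving content to your own server also means rewriting the login page so it no longer points at third-party hosts. This Python sketch rewrites absolute external URLs to a single server of yours. `OWN_SERVER` is a hypothetical hostname. The original host is kept as the first path segment, so a reverse proxy on that server could tell where to fetch from. It only rewrites the page itself: stylesheets you copy (font CSS, for instance) can reference further URLs that need the same treatment.

```python
import re
from urllib.parse import urlsplit

# Hypothetical: the one server you control and allow via the walled garden.
OWN_SERVER = "http://assets.hotspot.example"

# Absolute http(s) URLs with a path, as found in attributes or CSS url(...).
EXTERNAL_URL = re.compile(r"https?://[A-Za-z0-9.-]+/[^\s\"'<>)]*")


def localize(html: str, own_server: str = OWN_SERVER) -> str:
    """Point every external URL in the page at own_server."""
    own_host = urlsplit(own_server).netloc

    def swap(match):
        url = match.group(0)
        parts = urlsplit(url)
        if parts.netloc == own_host:
            return url  # already served by us
        rest = parts.path
        if parts.query:
            rest += "?" + parts.query
        if parts.fragment:
            rest += "#" + parts.fragment
        # Original host becomes the first path segment on our server.
        return f"{own_server}/{parts.netloc}{rest}"

    return EXTERNAL_URL.sub(swap, html)
```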
For some people this may be too restrictive. If you would rather open some things up for the login page (e.g. Google Fonts, YouTube videos), a workaround is to give users a website to visit over HTTP when they connect. For instance, the hotel desk clerk can tell the customer "connect to the Wi-Fi and go to hotelconnect.com". You either register that domain or set up an internal DNS entry, and put a server there. The server itself can be almost empty (its content really doesn't matter), but make sure it runs only HTTP, not HTTPS, and responds with some page on port 80. Users hitting it will then be redirected to the login page successfully even if captive portal detection is broken. You might even be able to create a local DNS record with the name you want (e.g. hotelconnect.com) pointing at the MikroTik itself; I have not tried this.
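For the almost-empty server, something as small as this Python sketch would do. The page text is arbitrary. For an unauthenticated user the hotspot intercepts the request and answers with its 302, so only logged-in users ever see this page. It deliberately has no HTTPS listener. `Landing` and `serve` are illustrative names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Arbitrary content: only users who are already logged in will see it.
PAGE = b"<html><body>You are connected.</body></html>"


class Landing(BaseHTTPRequestHandler):
    """Serves one tiny page over plain HTTP for every path."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)


def serve(port: int = 80) -> None:
    # Port 80 usually needs root/administrator rights. No TLS on purpose.
    HTTPServer(("0.0.0.0", port), Landing).serve_forever()
```

Calling `serve()` on the host that hotelconnect.com (or your internal DNS name) points to is all that is needed.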