I need help troubleshooting some problems I've been having recently with the hotspot/web-proxy modules.
I'm using a CCR1036-8G-2S+ as the main router and firewall in the school where I work, and on many of my network's VLANs I use the hotspot module so that users authenticate against a RADIUS server through the captive portal.
I have two main configurations: one used for the student Wi-Fi VLANs, the other for the PC laboratories.
In both configurations I'm using the HTTP PAP and HTTPS login methods, with a certificate generated with Let's Encrypt that contains the DNS name I've set in the Server Profile.
I've used this setup without problems for many years, but since returning from the summer break I'm having many problems with students who are not redirected to the captive portal page, and many of them can't see that page even if they manually enter the DNS name of the hotspot (the CCR randomly doesn't answer the user, and the device hits its page load timeout).
Using the profiler I've noticed that the total CPU usage of the CCR has increased, with roughly the same number of users connected to the captive portal as before the summer break, and with some of the cores fully used by the www module during peak hours. In the previous week I had some situations where total CPU usage reached 60-70%, with an average of 30% during peak hours on the other days.
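For anyone who wants to compare numbers, the per-core figures above can be checked from the terminal roughly like this. It is a minimal sketch, not necessarily the exact commands used for the measurements above:

```
# Show which processes (e.g. www) are using each CPU core
/tool profile cpu=all

# Show the current load of every core
/system resource cpu print
```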
The only thing I've changed since before the summer break is probably the package version, which I upgraded, probably from a 6.47.x version to 6.48.4.
I started digging into the web proxy module by logging to an external syslog server, and I noticed that many of the requests go to the new website that hosts the CA of the Let's Encrypt certificate (some months ago Let's Encrypt started using its own CA, which isn't present as a root certificate in the various operating systems). So I tried putting the sites used by this CA in the hotspot walled garden, hoping to reduce web-proxy usage, but sadly I discovered that even those requests are handled by the web-proxy in that situation, so the proxy usage hasn't changed, and during peak hours the captive portal still doesn't show up on some random devices.
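For reference, sending the web-proxy and hotspot logs to an external syslog server can be set up roughly as follows. This is a minimal sketch: the action name syslogsrv and the address 192.0.2.10 are placeholders, not the real values from this router.

```
# Define a remote logging target (placeholder name and address)
/system logging action
add name=syslogsrv target=remote remote=192.0.2.10 remote-port=514

# Send web-proxy and hotspot log topics to that target
/system logging
add topics=web-proxy action=syslogsrv
add topics=hotspot action=syslogsrv
```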
I'd like help troubleshooting the situation further and understanding why this is happening. Is the internal web-proxy unable to handle the number of HTTPS requests? Is it because, without the Let's Encrypt root certificate, the devices make more requests? Or could it be an issue in the new firmware related to web-proxy resource usage?
Would you suggest switching to the LTS channel, which has the 6.47.x version of the firmware? (I've looked at the changelogs and haven't seen any big changes related to the hotspot/web-proxy modules in 6.48.x.)
Here are some parts of my CCR configuration that may help in understanding the situation.
Let me know if you need any other information.
I hope someone can help me, or suggest how to troubleshoot the situation.
/ip hotspot profile
set [ find default=yes ] dns-name=router.osdb.it login-by=https,http-pap \
    radius-accounting=no ssl-certificate=router.osdb.it.cer_0 use-radius=yes
add dns-name=router.osdb.it https-redirect=no login-by=https,http-pap name=\
    laboratori radius-accounting=no ssl-certificate=router.osdb.it.cer_0 \
    use-radius=yes
/ip hotspot
add addresses-per-mac=unlimited disabled=no idle-timeout=10m interface=\
    "lab_casper (70)" name=lab_casper profile=laboratori
add addresses-per-mac=unlimited disabled=no idle-timeout=3d interface=\
    "lab_cnc (40)" name=lab_cnc profile=laboratori
........
add disabled=no idle-timeout=3d interface="wifi_1aa (310)" name=wifi_1aa
add disabled=no idle-timeout=3d interface="wifi_1as (311)" name=wifi_1as
.......
/ip hotspot walled-garden
add comment="place hotspot rules here" disabled=yes
add dst-host=*.lencr.org
add dst-host=apps.identrust.com
Thanks to all