This is not possible. The entire point of using https is that you cannot see what the user is doing by
observing the traffic along the path to the destination.
Or, better yet, create your own CA and add it to all the PCs on your network.
Then sign the certificates you need (e.g. iaselfserve.gov.bc.ca) and set up HAProxy on a Linux box.
On the router, redirect everything destined for the IPs that iaselfserve.gov.bc.ca resolves to, on TCP port 443, to your HAProxy instance.
In HAProxy you can then check the URLs and block whichever URLs under that domain you don’t want your users to access.
It will listen to port 443 for the https requests (configured with the certificate/private key for the domain you signed under your CA), and it will then proxy those requests (those allowed) to the real server using their real certificate.
Anyone checking the SSL certificate will of course be able to see that it is not issued by a legit CA, but at least they will get no warnings (as long as you install the CA cert in all the PCs’ browsers).
I know, it sounds like a total mess (and it is) but since you have to deal with https, there isn’t a way to decrypt the traffic without having the private key. This way, you fool the browsers that you are the site they are trying to access.
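To make the CA part concrete, here is a minimal sketch with openssl. All the file names and the one-year validity are just placeholders for illustration; in practice you’d protect the CA key and pick sensible lifetimes:

```shell
# 1. Create the private CA key and a self-signed CA certificate.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout myca.key -out myca.crt -days 365 \
  -subj "/CN=My Internal CA"

# 2. Create a key and a signing request for the domain you want to front.
openssl req -newkey rsa:2048 -nodes \
  -keyout site.key -out site.csr \
  -subj "/CN=iaselfserve.gov.bc.ca"

# 3. Sign the request with your CA.
openssl x509 -req -in site.csr -CA myca.crt -CAkey myca.key \
  -CAcreateserial -out site.crt -days 365

# 4. Check that the signed cert validates against the CA.
openssl verify -CAfile myca.crt site.crt
```

site.crt and site.key are what you would load into HAProxy; myca.crt is what you push to every PC’s trust store.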
As for the hosts-file-edit solution: that will not do you much good.
You will be able to alter the IP that a domain will resolve to but that’s just about it.
You can’t define specific urls on the hosts file.
And even if you point the domain at an IP of a box of yours so you can intercept the traffic and block what you need, without a ‘valid’ certificate for that domain the browsers will throw warnings at the users.
Either you have to use self signed certs and accept/install them manually on every PC, or create a CA as mentioned above.
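Just to illustrate the hosts-file limitation, all an entry can express is a hostname-to-IP mapping (the 203.0.113.x address below is a stand-in for your interception box):

```
# /etc/hosts (or C:\Windows\System32\drivers\etc\hosts)
# One hostname, one IP. There is no syntax for a path like
# iaselfserve.gov.bc.ca/some/blocked/page.
203.0.113.10   iaselfserve.gov.bc.ca
```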
But be warned that the above solution will also stop working soon, because browser vendors are
actively working on checking who signed the certificates involved
(i.e. certificate pinning: they will warn the user or block the page when a certificate is signed by a different CA than is normal for that domain).
I think the easiest thing to do, long term, if you want to control what sites users may browse to is to set up a proxy server and force all workstations to configure the proxy server in the browsers.
When the browser knows it’s running through a proxy, it alleviates a lot of this man-in-the-middle certificate stuff. The browser must ask the proxy to take it to such-and-such a url, and the proxy gets a chance to reject the request and deliver an “access denied” page. If the URL is acceptable, then the proxy will proceed with the connection and do the SSL passthrough.
Proxies have an added bonus that they can cache content, which accelerates the web browsing performance at your location if users frequently visit the same sites repeatedly, and it will reduce the load on your internet connection.
Obviously, when you have the proxy up and running, and the browsers configured to use it, you’ll want to block outbound http/https from any internal host other than the proxy and any “privileged” workstations.
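A sketch of that last step, in iptables syntax (192.0.2.10 stands in for the proxy’s address and eth1 for the LAN-facing interface; adapt to your own firewall):

```
# Let only the proxy reach the web directly; drop everything else.
iptables -A FORWARD -i eth1 -s 192.0.2.10 -p tcp -m multiport --dports 80,443 -j ACCEPT
iptables -A FORWARD -i eth1 -p tcp -m multiport --dports 80,443 -j DROP
```

Any “privileged” workstations would get their own ACCEPT rule above the DROP.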
OK, but a proxy for https does not know what URL the user is visiting; it knows only the host name.
Only when the proxy is doing man-in-the-middle decryption can it know the URL.
And man-in-the-middle decryption can only be done using fake certificates.
As mentioned before, if you control the computers your users use to access the Internet, you can pretty much use self-signed certificates. It is basically a man-in-the-middle attack; be aware of this, as I am quite sure that in some places it is illegal, since you’re intercepting and decrypting secure data.
You only need a CA and a web proxy/cache working in transparent mode. Some free web proxies like Squid 3 are now capable of generating “on-the-fly” certificates, so as long as the device has the CA in its trusted root store, no alarms will trigger (for instance due to a “Common Name” mismatch or an untrusted CA). As a matter of fact, this is quite easy if you also have all the devices inside an Active Directory domain, since you can push the CA into the “trusted root” store of all the computers via a GPO.
Some other (paid) proxy options that include this are Palo Alto, Dell SonicWall and Sophos UTM (although Sophos has a free version that protects up to 50 IPs), and in general anything that’s catalogued as an NGFW (Next Generation Firewall).
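For the Squid route, the setup looks roughly like the fragment below. This is only a sketch; directive names and the ssl_bump step syntax vary between Squid versions, and every path and ACL name here is a placeholder:

```
# squid.conf fragment (illustrative; check your Squid version's docs)
http_port 3128 ssl-bump cert=/etc/squid/myca.pem generate-host-certificates=on
sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/squid/ssl_db -M 4MB

# URLs to block, one regex per line in the file
acl blocked_paths url_regex -i "/etc/squid/blocked_urls.txt"

ssl_bump peek step1
ssl_bump bump all

http_access deny blocked_paths
http_access allow localnet
http_access deny all
```

The cert= file is the CA you distributed via GPO; Squid mints a per-host certificate signed by it for each intercepted site.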
This is another option, basically using the proxy in non-transparent mode. However, it was my understanding that in the presence of HTTPS in this mode, the proxy will only tunnel the requests and can merely block whole FQDNs based on their IPs (from DNS records). Can you clarify this, please?
Hmm - interesting. Noted. Thanks for catching that for me.
It would seem that an explicitly configured proxy should require this information before performing the SSL passthrough. It makes a lot of sense for network administration purposes that such control should be possible without a lot of voodoo, since policy enforcement is one of the major reasons for having a proxy in the first place.
It appears that you can configure squid proxy with a feature called “SSL Bump” which does a MITM to gain access to the content itself.
Here’s a howto - but it looks pretty involved, and utilizes a second package called Diladele which manages the filtering rules. (the site is tailored to educational institutions). https://www.howtoforge.com/filtering-https-traffic-with-squid
I’m sure there’s an easier way to get this accomplished with more googling, but I was just curious enough to take a peek into the squid world to see what sort of things were available in this endeavor.
You’re correct and I was misinformed. (you posted while I was writing this post)
However, in browsers like Chrome an alarm will trigger that says “hey, certificates for google.com are normally signed by GeoTrust Global CA not by Yourcompany CA.
Be careful, you are probably a victim of a man-in-the-middle attack”.
When proxies were invented, the normal situation was to view the web using http and to use https only for “secure” things.
Unfortunately, a group who believed that “everything” should be treated as secure, and that even the most visible information has to be encrypted, has gotten its way.
The result (and their motivation) is a loss of control by middlemen like ISPs and nosy company ICT departments.
If it is a good or bad development, to each their own opinion.
A proxy in http mode receives the full URL so it can cache the contents based on that.
In https mode, the client only tells the proxy what host and port it wants to connect (CONNECT host.name.domain:port), the proxy makes that connection
and ties both connections together. Then the client starts the SSL negotiation with the connected service. Once that is complete, it sends the
requested URL to the service over the encrypted connection. The proxy never knows what is being done, unless it performs man-in-the-middle
decryption and re-encryption, something of course never foreseen in the proxy system.
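The whole exchange, from the proxy’s point of view, looks something like this (host name illustrative):

```
Client -> Proxy:   CONNECT secure.example.com:443 HTTP/1.1
                   Host: secure.example.com:443

Proxy  -> Client:  HTTP/1.1 200 Connection established

# From here on the proxy just copies bytes in both directions.
# The TLS handshake and the actual request, e.g.
#   GET /account/login HTTP/1.1
# travel inside the encrypted tunnel, invisible to the proxy.
```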
Yeah - but how simple would it have been to require CONNECT https://secure.example.com/account/login?
I can see how that would expose the GET variables to scrutiny, but it still seems odd to me that a security-related construct such as the proxy wasn’t implemented in such a manner for full control. Oh well, I guess those smart people back in the 70s and 80s can be forgiven for not thinking of this.
Me, I tend to fall on the side of “Death to middleboxes! Vive la resistance!” but that’s just me.
I see, thanks for this information. I haven’t tried it myself with Chrome in this particular environment; most of the time we only use Firefox and Opera. Gotta try it with Chrome though. Just like with hotspots, HTTPS is messing everything up haha.
The proxy would not be able to verify that the client is indeed requesting the URL it sent as part of the CONNECT string, so
it would not take long for clients to appear that send one URL to the proxy and then fetch another over https… it would be similar
to sending a fake Referer:, User-Agent: or Via: header.
I don’t think they thought the same way about security back then. For example, the old “Why An Application Level Proxy?” document does mention the possibility of using a proxy for filtering. But does it sound like the thought of filtering encrypted connections ever crossed the author’s mind? Or even that filtering was considered important? Not to me.