Contemplating web analytics
28th of March, 2021
A couple of weeks ago, I started to rekindle my unfortunately lost writing habit. I set up Google Analytics for this page, mainly because it is an easy way to see simple analytics. I was only interested in the visitor count, and possibly where my readers were coming from. Google Analytics is a massive tool with massive amounts of data going into it. I tried to restrict this collection as much as possible, down to what suited my personal blog's needs.
Then my page rose to the front page of Hacker News, and it started to get a lot of traction. Suddenly there were thousands of readers coming every day to my pesky little page with just a few posts, and I watched the visitor counts rise in my Google Analytics view. That got me thinking about the ethics of this kind of tracking, which ended with me deleting my account and all of its data.
Discomfort of tracking
Before I deleted my data and account from Google Analytics, I first looked for alternatives. I stumbled upon many privacy-oriented and GDPR-compliant analytics platforms, which at first seemed promising. Having good alternatives to the ever-prevalent Google Analytics is also a great thing. But despite these features, they don't remove the uneasiness that mining your users' data causes. After all, we are talking about spying here. Thankfully there are now some restrictions regarding personally identifiable information (PII), at least in the GDPR, which limit the shadiness quite a lot. But that brings new issues in handling this kind of information, since you need to be sure that your software doesn't leak it. Thankfully, opting out entirely from collecting PII in your software is an option.
I understand why people might want to add at least simple tracking to their sites: it can provide helpful information about your content, companies can see how users use their site, and the list goes on. Especially when you combine Google Analytics, or a similar analytics tool, with ads, companies can reap significant benefits from this kind of tracking. But I would guess that nine out of ten sites don't need this. You could argue that most administrators use this tracking only for a dopamine fix and don't utilize the tracked data. And even if they do use it somehow, how do they inform the user? I dare say that information about data usage is almost always written in some shallow boilerplate text, or not at all.
The GDPR mainly highlights four things about data usage:
- It enables EU citizens to have the final say on how their data is used.
- If your company handles PII, there are tighter restrictions on how it can be handled.
- Companies can store and use data only if the person consents to it.
- Users have rights to their data.
Consent is the crucial part here, since many sites fall short on this front. There has been a lot of discussion about what should be considered consent, and about when it is needed at all. Consent is not the only lawful basis for processing: GDPR Art. 6(1)(f) also allows it when "processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party". Legitimate interest is relatively loosely defined, though, and quite a few authorities in Germany, for example, consider that third-party analytics do not fall under "legitimate interest". You can utilize consent management platforms to ensure you have the user's consent before you drop the tracking code into your page. But this then raises the question of what can be considered consent.
Drew DeVault wrote a great post about web analytics and informed consent. Informed consent is a principle from healthcare, but it can still offer significant elements to be utilized, especially in technology and privacy. Drew split up the essential elements of informed consent in tracking into these three points:
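As a rough illustration of gating the tracking code on consent, here is a minimal server-side sketch in Python. Everything in it is an assumption made for the example: the `granted` cookie value, the `/analytics.js` path, and the function names are hypothetical and not taken from any real consent management platform.

```python
from typing import Optional

# Hypothetical tracker tag; a real analytics snippet would go here.
TRACKER_SNIPPET = '<script src="/analytics.js" async></script>'


def has_consented(consent_cookie: Optional[str]) -> bool:
    """Only an explicit opt-in counts; a missing or unknown value means no."""
    return consent_cookie == "granted"


def render_page(body: str, consent_cookie: Optional[str]) -> str:
    """Return the page HTML, embedding the tracker only after an opt-in."""
    tracker = TRACKER_SNIPPET if has_consented(consent_cookie) else ""
    return f"<html><body>{body}{tracker}</body></html>"
```

The point of the sketch is the default: without a recorded opt-in, the page is served with no tracking code at all, and the content itself stays available either way.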
- Disclosure of the nature and purpose of the research and its implications (risks and benefits) for the participant and the confidentiality of the collected information.
- An adequate understanding of these facts on the part of the participant, requiring an accessible explanation in lay terms and an assessment of understanding.
- The participant must exercise voluntary agreement, without coercion or fear of repercussions (e.g. not being allowed to use your website).
Considering these essential elements of informed consent, I think we can agree that most sites that track their users don't follow these guidelines.
Thankfully, basic tracker blocking is already built into many browsers, and you can also install external tools to do it, which makes this issue slightly more bearable. But still, this kind of approach is pretty upside down: the user has to opt out of tracking instead of being asked to opt in.
All kinds of cookies
Unfortunately, ad-tech companies have tried to make blocking cookies harder and harder by constantly evolving them into evercookies, supercookies, and so on. These work by storing harder-to-detect and harder-to-delete cookies in different obscure places in the browser, like Flash storage or HSTS flags. Evercookies were a big thing in the early 2010s, since many sites were using Flash and Silverlight, and those were very exploitable. Today those technologies aren't used anymore, but that doesn't mean the evolution of cookies has stopped. Supercookies, on the other hand, work at the network level of your service provider.
Thankfully, browsers have lately started to tackle these, Firefox being one example. In a post about it, the Firefox team discloses what they had to do to take action against this, and it is wild. To eliminate these pesky cache-based cookies, they had to re-architect the whole connection handling in the browser, which was originally built to improve the user experience by reducing overhead.
Still, browser fingerprinting could be considered the evilest cookie of them all. Browser fingerprinting collects everything it can about your system in order to identify you. Like some cookies, this has real use cases, e.g., preventing fraud in financial institutions. Still, in principle this is just another intrusive way to track people. Thankfully some modern browsers offer at least some ways to avoid this, but not a full-fledged solution (other than disposable systems).
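As far as I understand it, the core idea behind this kind of fix is to partition the browser's caches by the site you are actually visiting, so that an identifier cached while you are on one site cannot be read back on another. Below is a toy Python sketch of that idea. The class names and the example domains are made up for illustration, and this is in no way how Firefox implements it.

```python
from typing import Dict, Optional, Tuple


class SharedCache:
    """One cache for every site: entries are keyed by resource URL only."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, top_level_site: str, resource_url: str, value: str) -> None:
        # The visited site is ignored, so every site shares the same entry.
        self._entries[resource_url] = value

    def get(self, top_level_site: str, resource_url: str) -> Optional[str]:
        return self._entries.get(resource_url)


class PartitionedCache:
    """Entries are keyed by (top-level site, resource URL)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def put(self, top_level_site: str, resource_url: str, value: str) -> None:
        self._entries[(top_level_site, resource_url)] = value

    def get(self, top_level_site: str, resource_url: str) -> Optional[str]:
        return self._entries.get((top_level_site, resource_url))
```

With the shared cache, a tracker embedded on `news.example` can cache an identifier and read it back when it is embedded on `shop.example`. With the partitioned cache, the second lookup comes back empty, at the cost of fetching and storing the same resource once per site.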
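To show why fingerprinting needs no stored cookie at all, here is a small Python sketch that hashes a handful of system attributes into a stable identifier. The attribute names are only examples I picked, and real fingerprinting scripts gather far more signals than this.

```python
import hashlib
from typing import Dict


def fingerprint(attributes: Dict[str, str]) -> str:
    """Combine system attributes into one stable identifier."""
    # Sort the keys so the same attributes always give the same result,
    # no matter in which order they were collected.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example attributes, all hypothetical.
visitor = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Helsinki",
    "fonts": "DejaVu Sans,Noto Serif",
}
visitor_id = fingerprint(visitor)
```

Nothing is written to the visitor's machine, so there is nothing to delete: as long as the attributes stay the same, the same identifier comes out on every visit.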
Future of cookies
Lately, there has been some news about tech giants proposing privacy-friendly substitutes for cookies. Cookies have been a relatively significant privacy issue for decades, and since the ad industry is so large, finding a replacement for them has been hard. We cannot get rid of cookies entirely in the near future, so only time will tell. They might change into something else, maybe some kind of API that uses machine learning to analyze user data. I don't know whether that is better or worse. So I cannot wait! (tin-foil hat tightens)
So what is the conclusion here? Probably nothing. A small-time blogger who had recently started writing got scared by the big numbers coming to his site, and by all kinds of data being collected about them, and ended up stopping this kind of tracking, at least on his own site. For most users and sites, this kind of tracking is just a silly monkey-get-banana dopamine fix.
Don't track unless you need to, and if you do, inform your users about it thoroughly.
If you have any questions or suggestions, write to topi at topikettunen dot com.