Contemplating Web Analytics

Posted on 28th of March 2021 | 1125 words

I started to rekindle my unfortunately lost writing habit a couple of weeks ago. I set up Google Analytics for this page mainly because it is an easy way to see simple analytics. I was only interested in the visitor count and possibly where my readers were coming from. Google Analytics is a massive tool with massive amounts of data going into it. I tried to restrict this collection as much as possible, which suits my personal blog’s needs.

Then my page rose to the front page of Hacker News, and it started to get a lot of traction. Suddenly, thousands of readers came every day to my pesky little page with just a few posts, and I followed the visitor counts rising in my Google Analytics view. That got me thinking about the ethics of this kind of tracking, which ended with me deleting my account and its data.

Discomfort With Tracking

Before I deleted my data and account from Google Analytics, I looked for alternatives. I stumbled upon many privacy-oriented and GDPR-compliant analytics platforms, which at first seemed promising. Having good alternatives to the ever-prevalent Google Analytics is a great thing in itself. But despite these features, they don’t remove the uneasiness that mining your users’ data causes. After all, we are talking about spying here. Thankfully there are now some restrictions on personally identifiable information (PII), at least in the GDPR, which limit the shadiness quite a lot. But that brings new issues in handling this kind of information, since you need to be sure that your software doesn’t leak it. Thankfully, opting out entirely from collecting PII in your software is an option.

I understand why people might want to add at least simplistic tracking to their sites: it can provide helpful information about your content, companies can see how users use their site, and the list goes on. Especially when you combine Google Analytics, or a similar analytics tool, with ads, companies can reap significant benefits from this kind of tracking. But 9 out of 10 sites shouldn’t need this. You could argue that most administrators use this tracking only for dopamine fixes and don’t actually utilize the tracked data. And even if they do use it somehow, how do they inform the user? I dare say that information about data usage is almost always written in some shallow boilerplate text, or not at all.

GDPR highlights mainly four things about data usage:

- It gives EU citizens the final say on how their data is used.
- If your company handles PII, there are tighter restrictions on handling it.
- Companies can store or use data only if the person consents to it.
- Users have rights to their data.

Consent is the crucial part here, since many sites are lacking on this front. There has been a lot of discussion about when consent is actually required. Consent is not the only legal basis for processing: GDPR Art. 6(1)(f) also allows processing that “is necessary for the purposes of the legitimate interests pursued by the controller or by a third party”. Legitimate interest is rather loosely defined, though, and quite a few authorities in Germany, for example, consider that third-party analytics do not fall under “legitimate interest”. You can utilize consent management platforms to ensure the user’s consent before dropping the tracking code on your page. But this then raises the question of what can be considered consent.
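To make the idea of asking before dropping the tracking code a bit more concrete, here is a minimal sketch of gating the script on explicit opt-in. It is not how any particular consent management platform works; the `Consent` class, the `render_page` function, and the analytics URL are all hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

# Hypothetical analytics script URL, used only for illustration.
ANALYTICS_SRC = "https://analytics.example/script.js"


@dataclass(frozen=True)
class Consent:
    """What the visitor has explicitly agreed to. Everything defaults to 'no'."""
    analytics: bool = False


def render_page(body: str, consent: Optional[Consent]) -> str:
    """Return the page HTML, adding the tracking script only on explicit opt-in."""
    parts = ["<html><body>", escape(body)]
    # A missing decision (None) is treated exactly like a refusal.
    if consent is not None and consent.analytics:
        parts.append(f'<script src="{ANALYTICS_SRC}" async></script>')
    parts.append("</body></html>")
    return "".join(parts)
```

The only design choice that matters here is the default: a visitor who has not answered yet gets the same page as one who said no, and the content itself is never withheld.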

Drew DeVault wrote a great post about web analytics and informed consent. Informed consent is a principle from healthcare, but it still offers significant elements that can be utilized in technology and privacy. Drew split the essential elements of informed consent in tracking into these three points:

- Disclosure of the nature and purpose of the research and its implications (risks and benefits) for the participant and the confidentiality of the collected information.
- An adequate understanding of these facts on the part of the participant, requiring an accessible explanation in lay terms and an assessment of understanding.
- The participant must exercise voluntary agreement, without coercion or fear of repercussions (e.g. not being allowed to use your website).

Considering these essential elements of informed consent, I think we can agree that most sites that track their users don’t follow these guidelines.

Thankfully, basic tracker blocking is already built into many browsers, which makes this issue slightly more bearable, and you can also install external tools to do it. But still, this approach is pretty upside down: visitors have to opt out of tracking instead of opting in.

All Kinds of Cookies

Unfortunately, ad-tech companies have tried to make blocking harder and harder by constantly evolving cookies into evercookies, supercookies, etc. These work by storing harder-to-detect and harder-to-delete identifiers in different obscure places in the browser, like Flash storage or HSTS flags. Evercookies were a big thing in the early 2010s, since many sites were using Flash and Silverlight, and those were very exploitable. Today those technologies are practically gone, but that doesn’t mean the evolution of cookies has stopped. Supercookies, on the other hand, work on the network level of your service provider.

Thankfully, Firefox, for example, has lately been able to start tackling these. In their post about it, the Firefox team discloses what they had to do to take some action against this, and it is wild. To eliminate these pesky cache-based cookies, they had to re-architect the whole connection handling in the browser, which was originally designed to improve the user experience by reducing overhead.

Still, browser fingerprinting could be considered the evilest cookie of them all. It collects everything it can about your system and combines it into an identifier. Like some cookies, this has real use cases, e.g., preventing fraud in financial institutions. But in principle, this is just another intrusive way to track people. Thankfully some modern browsers offer at least some ways to avoid this, but not a full-fledged solution (other than disposable systems).
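As a rough illustration of why fingerprinting is so hard to escape, here is a small sketch of the basic idea: take a handful of attributes a browser exposes and hash them into one stable identifier. Real fingerprinting scripts collect far more signals than this; the `fingerprint` function and the attribute names below are assumptions made up for the example, not any actual tracker’s implementation.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
visitor = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Helsinki",
    "language": "en-US",
}

visitor_id = fingerprint(visitor)
```

Nothing is stored on the visitor’s machine, so there is nothing to delete: the same attributes produce the same identifier on every visit, and it only changes when one of the attributes does.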

Future of Cookies

Lately, there has been some news about tech giants working on privacy-friendly substitutes for cookies. Cookies have been a relatively significant privacy issue for decades, and since the ad industry is so large, finding a replacement for them has been hard. We cannot get rid of cookies entirely in the near future. They might change into something else, maybe some kind of API that utilizes machine learning to analyze user data. I don’t know whether that is better or worse, so only time will tell. Cannot wait! (tin-foil hat tightens)

Conclusion

So what is the conclusion here? Probably nothing. A small-time blogger who had recently started writing again got scared by the big numbers coming into his site, which was collecting all kinds of data, and he ended up stopping this kind of collection, at least on his own site. For most users and sites, this kind of tracking is just a silly monkey-get-banana dopamine fix.

Don’t track unless you need to; if you do, inform your users about it thoroughly.