19th of January, 2022
Some time ago I wrote a short post about my feelings towards web analytics,
which were sparked by a spike in visitors on my site (mainly coming from
Hacker News). After that surge, I decided to part ways completely with any
sort of tracking, since for me it was mainly an unnecessary dopamine fix rather
than anything useful.
Today I stumbled upon big news about the legitimacy of web analytics from a
privacy point of view. It turns out, as most suspected, that it's not so
good, at least according to Austria's data protection authority.
This case dates back to the invalidation of the Privacy Shield data sharing
system between the EU and the US, which was struck down because of
overreaching US surveillance. It turns out that many companies in the US have
largely ignored this invalidation, which happened in 2020, and have continued
to transfer data from the EU to the US. The Austrian DPA held that the use
of Google Analytics by an Austrian website provider led to transfers of
personal data to Google LLC in the U.S. in violation of Chapter V of the
GDPR.
Future of Google Analytics in EU
In the long run, there will be two options: either the US changes its
surveillance laws to strengthen its tech businesses, or US providers will
have to host the data of European users in Europe. This kind of transcontinental
transfer is currently (at the time of writing this) illegal only in Austria, but
the Dutch DPA (data protection authority) has stated that Google Analytics "may
soon no longer be allowed".
In any case, this is a great thing for privacy in the EU, and hopefully many more
countries will join Austria in this effort. You can follow which countries
have started to do so at "Is Google Analytics ILLEGAL in your country?"
28th of March, 2021
I started to rekindle my unfortunately lost writing habit a couple of weeks
ago. I set up Google Analytics for this page mainly because it is an easy way to
see simple analytics. I was only interested in the visitor count, and possibly
where my readers were coming from. Google Analytics is a massive tool with
massive amounts of data going into it. I tried to restrict this collection as
much as possible, which suited my personal blog's needs.
Then my page rose to the front page of Hacker News, and it started to get a lot
of traction. Suddenly there were thousands of readers coming every day to my
pesky little page with just a few posts, while I followed the visitor counts
rising in my Google Analytics view. That got me thinking about the ethics of
this kind of tracking, which ended with me deleting my account and its data.
Discomfort of tracking
Before I deleted my data and account from Google Analytics, I first looked for
alternatives. I stumbled upon many privacy-oriented and GDPR-compliant
analytics platforms, which at first seemed promising. Having good alternatives
to the ever-prevalent Google Analytics is a great thing. But despite these
features, they don't remove the uneasiness that mining your users' data
causes. After all, we are talking about spying here. Thankfully there are now
some restrictions regarding personally identifiable information (PII), at least
in the GDPR, which limit the shadiness quite a lot. But that brings new issues in
handling this kind of information, since you need to be sure that your software
doesn't leak it. Thankfully, opting out entirely from collecting
PII in your software is an option.
I understand why people might want to add at least simplistic tracking to their
sites: it can provide helpful information about your content, companies
can see how users use their site, and the list goes on. Especially when you
combine Google Analytics, or a similar analytics tool, with ads, companies can
reap significant benefits from this kind of tracking. But 9 out of 10 sites
shouldn't need this. You could argue that most administrators use this tracking
only for a dopamine fix and don't utilize the tracked data. And even if they do
use it somehow, how do they inform the user? I dare say that information about
data usage is almost always written in some shallow boilerplate text or not at
all.
Informed consent
The GDPR mainly highlights four things about data usage:
It enables EU citizens to have the final say on how their data is used.
If your company handles PII, there are tighter restrictions on how it can be handled.
Companies need a legal basis, most commonly the person's consent, to store or use data.
Users have rights to their data.
Consent is the crucial part here, since many sites fall short on this front.
There has been a lot of discussion about what should be considered consent. GDPR
Art. 6(1)(f) allows processing without consent when "processing is necessary
for the purposes of the legitimate interests pursued by the controller or by a
third party". Legitimate interest is a relatively vague concept, though, and
quite a few authorities in Germany, for example, consider that third-party
analytics do not fall under it. You can use a consent management platform to
make sure you have the user's consent before you drop the tracking code into
your page. But this then raises the question of what can be considered consent.
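
To make the gating idea concrete, here is a minimal sketch. It assumes a hypothetical server-side renderer and a made-up placeholder snippet; the names `has_consented`, `render_page`, and the `analytics.example` URL are illustrative and not taken from any real consent management platform. The point is only that the tracker stays out of the page unless the visitor has explicitly opted in.

```python
from typing import Optional

# Hypothetical placeholder for a third-party analytics snippet (not a real URL).
TRACKING_SNIPPET = '<script src="https://analytics.example/tracker.js"></script>'


def has_consented(consent_cookie: Optional[str]) -> bool:
    """Treat only an explicit "granted" value as consent.

    A missing or unknown value counts as no consent, so the default is
    not to track.
    """
    return consent_cookie == "granted"


def render_page(body: str, consent_cookie: Optional[str]) -> str:
    """Return the page HTML, adding the tracker only after opt-in."""
    scripts = TRACKING_SNIPPET if has_consented(consent_cookie) else ""
    return f"<html><body>{body}{scripts}</body></html>"
```

Defaulting to "no tracking" when the cookie is missing is a deliberate choice in this sketch: silence is not treated as agreement.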
Drew DeVault wrote a great post about web analytics and informed
consent. Informed consent is a principle from healthcare, but it still can offer
significant elements to be utilized, especially in technology and privacy. Drew
split up the essential elements of informed consent in tracking into these three
points:
Disclosure of the nature and purpose of the research and its implications (risks and benefits) for the participant and the confidentiality of the collected information.
An adequate understanding of these facts on the part of the participant, requiring an accessible explanation in lay terms and an assessment of understanding.
The participant must exercise voluntary agreement, without coercion or fear of repercussions (e.g. not being allowed to use your website).
Considering these essential elements of informed consent, I think we can agree
that most sites that track their users don't follow these guidelines.
Thankfully, basic tracker blocking is already supported in many browsers, which
makes this issue slightly more bearable, and you can also install external
tools to do it. But still, this kind of approach is pretty upside down: the user
has to opt out instead of opting in.
All kinds of cookies
Unfortunately, ad-tech companies have tried to make blocking these harder and
harder by constantly evolving cookies into evercookies, supercookies, etc.
The way these have worked is that trackers store these harder-to-detect and
harder-to-delete cookies in obscure places in the browser, like Flash storage or
HSTS flags. Evercookies were a big thing in the early 2010s, since many sites
were using Flash and Silverlight, and those were very exploitable. Today those
technologies aren't used anymore, but that doesn't mean the evolution of cookies
has stopped. Supercookies, on the other hand, work at the network level of your
service provider.
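
The "respawning" trick behind evercookies can be illustrated with a toy model. This is only a sketch under loud assumptions: each dictionary stands in for one storage location, the store names and the `read_id` and `respawn` helpers are made up for illustration, and no real browser API is involved.

```python
from typing import Dict, Optional

# Each inner dict stands in for one storage location in a browser. The names
# are illustrative only; real evercookies used many more hiding places.
Stores = Dict[str, Dict[str, str]]


def read_id(stores: Stores) -> Optional[str]:
    """Return the tracking ID from the first store that still has one."""
    for store in stores.values():
        if "uid" in store:
            return store["uid"]
    return None


def respawn(stores: Stores, new_id: str) -> str:
    """Recover the ID from any surviving copy and write it back everywhere."""
    uid = read_id(stores) or new_id
    for store in stores.values():
        store["uid"] = uid
    return uid


stores: Stores = {"http_cookie": {}, "flash_storage": {}, "hsts_flags": {}}
respawn(stores, "visitor-123")             # first visit: ID copied to every store
stores["http_cookie"].clear()              # the user deletes their regular cookies
restored = respawn(stores, "visitor-456")  # next visit: the old ID is restored
```

As long as a single copy survives, clearing the ordinary cookie achieves nothing, which is why these were so hard to get rid of.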
Thankfully, browsers have lately started to tackle these, Firefox being one
example. In their post about it, the Firefox team discloses what they had to do
to take action against supercookies, and it is wild. To eliminate these pesky
cache-based cookies, they had to re-architect the whole connection handling in
the browser, which was originally designed to improve the user experience by
reducing overhead.
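
A toy model can show why a cache shared across all sites is a problem and how keying it per site helps. This is a conceptual sketch of cache partitioning in general, not Firefox's actual implementation; the class names and the `tracker.example` URL are assumptions made up for illustration.

```python
from typing import Dict, Optional, Tuple


class SharedCache:
    """One cache for the whole browser, keyed by resource URL only."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, top_level_site: str, url: str, value: str) -> None:
        # The visited site is ignored, so every site sees the same entry.
        self._entries[url] = value

    def get(self, top_level_site: str, url: str) -> Optional[str]:
        return self._entries.get(url)


class PartitionedCache:
    """Cache keyed by (top-level site, resource URL)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def put(self, top_level_site: str, url: str, value: str) -> None:
        self._entries[(top_level_site, url)] = value

    def get(self, top_level_site: str, url: str) -> Optional[str]:
        return self._entries.get((top_level_site, url))


TRACKER_URL = "https://tracker.example/pixel.png"  # hypothetical tracker resource

shared = SharedCache()
shared.put("news.example", TRACKER_URL, "id=abc")
leaked = shared.get("shop.example", TRACKER_URL)  # same entry seen on another site

partitioned = PartitionedCache()
partitioned.put("news.example", TRACKER_URL, "id=abc")
isolated = partitioned.get("shop.example", TRACKER_URL)  # nothing carried over
```

The cost of the partitioned design is that the same resource may be fetched once per site instead of once overall, which is the overhead the shared design was trying to avoid.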
Still, browser fingerprinting could be considered the evilest cookie of them
all. It collects everything it can about your system and combines it into an
identifier, without storing anything on your machine. Like some cookies, this
has real use cases, e.g., preventing fraud in financial institutions. Still, in
principle this is just another intrusive way to track people. Thankfully some
modern browsers offer at least some ways to avoid this, but not a full-fledged
solution (other than disposable systems).
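
The core idea can be sketched in a few lines: hash whatever attributes the browser reveals into one stable identifier. This is a simplified illustration, not how any particular fingerprinting library works; the `fingerprint` function and all attribute names and values below are made up.

```python
import hashlib
from typing import Dict


def fingerprint(attributes: Dict[str, str]) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up attribute values, for illustration only.
visit_one = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "2560x1440",
    "timezone": "Europe/Helsinki",
    "fonts": "Arial,Fira Sans",
}
visit_two = dict(visit_one)                    # same browser, cookies cleared
other_device = dict(visit_one, screen="1920x1080")

same_visitor = fingerprint(visit_one) == fingerprint(visit_two)
different_visitor = fingerprint(visit_one) != fingerprint(other_device)
```

Because nothing is stored on the visitor's side, there is nothing to delete, and clearing cookies between the two visits makes no difference to the resulting identifier.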
Future of cookies
Lately, there has been some news about tech giants working on privacy-friendly
substitutes for cookies. Cookies have been a relatively significant privacy
issue for decades, and since the ad industry is so large, finding a replacement
for them has been hard. So only time will tell. We cannot get rid of cookies
entirely in the near future. They might change into something else, maybe
some kind of API that uses machine learning to analyze user data. I don't know
whether that is better or worse. So I cannot wait! (tin-foil hat tightens)
Conclusion
So what is the conclusion here? Probably nothing. A small-time blogger who had
only recently started got scared by the big numbers coming into his site, which
was collecting all kinds of data, and ended up stopping this kind of tracking,
at least on his own site. For most users and sites, this kind of tracking is
just a silly monkey-get-banana dopamine fix.
Don't track unless you need to, and if you do, inform your users thoroughly.