Contemplating Web Analytics
Posted on 28th of March 2021
I started to rekindle my, unfortunately, lost writing habit a couple
of weeks ago. I set up Google Analytics for this page mainly because
it makes simple analytics easy to see. I was only interested in the
visitor count and possibly where my readers were coming from. Google
Analytics is a massive tool with massive amounts of data going into
it. I tried to restrict the collection as much as possible to suit my
personal blog’s needs.
Then my page rose to the front page of Hacker News, and it started to
get a lot of traction. Suddenly, thousands of readers came every day
to my pesky little page with just a few posts, while I watched the
visitor counts rise in my Google Analytics view. That got me thinking
about the ethics of this kind of tracking, which ended with me
deleting my account and data from it.
Discomfort With Tracking
Before I deleted my data and account from Google Analytics, I looked
for alternatives. I stumbled upon many privacy-oriented,
GDPR-compliant analytics platforms, which at first seemed promising.
Having good alternatives to the ever-prevalent Google Analytics is a
great thing. But despite their features, they don’t remove the
uneasiness that mining your users’ data causes. After all, we are
talking about spying here. Thankfully, there are now some restrictions
on personally identifiable information (PII), at least in the GDPR,
which limit the shadiness quite a lot. But that brings new issues in
handling this kind of information, since you need to be sure that your
software doesn’t leak it. Thankfully, opting out of collecting PII in
your software entirely is also an option.
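As a rough illustration of what skipping PII entirely could look like, here is a minimal Python sketch of a counter that keeps only what I originally wanted: counts, and where readers came from. The class and method names are hypothetical, and the sketch assumes the caller never passes in IP addresses, user agents, or cookies. Without any identifier it counts page views rather than unique visitors, which is the trade-off.

```python
from __future__ import annotations

from collections import Counter
from datetime import date
from urllib.parse import urlsplit


class PageViewCounter:
    """Hypothetical counter of page views per (day, path, referrer host).

    Nothing that identifies a visitor (IP address, user agent, cookie)
    is accepted or stored.
    """

    def __init__(self) -> None:
        self.counts: Counter[tuple[date, str, str]] = Counter()

    def record(self, day: date, path: str, referrer: str | None) -> None:
        # Keep only the referring site's host name. The full referrer URL
        # is dropped because its query string may carry personal data.
        host = urlsplit(referrer).hostname if referrer else None
        self.counts[(day, path, host or "direct")] += 1

    def views(self, day: date, path: str) -> int:
        # Total views of one path on one day, summed over all referrers.
        return sum(
            n for (d, p, _), n in self.counts.items() if d == day and p == path
        )
```

For example, one view arriving from `https://news.ycombinator.com/` and one with no referrer are stored as two separate counts, one under `news.ycombinator.com` and one under `direct`, and `views` reports 2 for that page and day.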
I understand why people might want to add at least simple tracking to
their sites: it can provide helpful information about your content,
companies can see how users use their sites, and the list goes on.
Especially when you combine Google Analytics, or a similar analytics
tool, with ads, companies can reap significant benefits from this kind
of tracking. But nine out of ten sites shouldn’t need this. You could
argue that most administrators use this tracking only for a dopamine
fix and don’t actually use the tracked data. And even when they do use
it somehow, how do they inform the user? I dare say that information
about data usage is almost always buried in shallow boilerplate text
or missing entirely.
GDPR highlights four main things about data usage:
1. It gives EU citizens the final say on how their data is used.
2. If your company handles PII, there are tighter restrictions on handling it.
3. Companies generally need the person’s consent, or another lawful basis, to store or use their data.
4. Users have rights to their data.
Consent is the crucial part here, since many sites fall short on this
front. There has been a lot of discussion about what should be
considered consent, and when it is needed at all. Besides consent,
GDPR Art. 6(1)(f) allows processing that “is necessary for the
purposes of the legitimate interests pursued by the controller or by a
third party”. However, legitimate interest is a fairly vague basis,
and quite a few authorities in Germany, for example, consider that
third-party analytics do not fall under “legitimate interest”.
You can use consent management platforms to obtain the user’s consent
before loading the tracking code on your page. But this raises the
question of what can be considered consent.
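The core of "consent before tracking" can be sketched in a few lines. This is a minimal Python sketch assuming a server-rendered page and a hypothetical `Consent` record that your consent banner fills in; the script URL is made up, and real consent management platforms have their own APIs, which this does not model. The important design choice is the default: no recorded choice is treated as a refusal.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical analytics script, used only for illustration.
ANALYTICS_SCRIPT = (
    '<script src="https://analytics.example.com/tracker.js" defer></script>'
)


@dataclass(frozen=True)
class Consent:
    """The visitor's recorded choices. Every purpose defaults to 'no'."""

    analytics: bool = False
    ads: bool = False


def head_scripts(consent: Consent | None) -> str:
    """Return the tracking tags to embed in the page's <head>.

    A missing choice (None) counts as a refusal, so the tracker is only
    included after an explicit opt-in for the analytics purpose.
    """
    if consent is not None and consent.analytics:
        return ANALYTICS_SCRIPT
    return ""
```

Opting in to ads does not switch on analytics here; each purpose is consented to separately.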
Drew DeVault wrote a great post about web analytics and informed consent.
Informed consent is a principle from healthcare, but it also offers
useful lessons for technology and privacy. Drew broke down the
essential elements of informed consent in tracking into these three
points:
1. Disclosure of the nature and purpose of the research and its implications (risks and benefits) for the participant and the confidentiality of the collected information.
2. An adequate understanding of these facts on the part of the participant, requiring an accessible explanation in lay terms and an assessment of understanding.
3. The participant must exercise voluntary agreement, without coercion or fear of repercussions (e.g. not being allowed to use your website).
Considering these essential elements of informed consent, I think we
can agree that most sites that track their users don’t follow these
guidelines.
Thankfully, many browsers already support basic tracker blocking, and
you can also install external tools for it, which makes this issue
slightly more bearable. Still, this approach is pretty upside down:
the burden falls on users to opt out, instead of on sites to ask for
consent.
All Kinds of Cookies
Unfortunately, ad-tech companies have tried to make blocking harder
and harder by constantly evolving cookies into evercookies,
supercookies, etc.
These work by storing hard-to-detect, hard-to-delete identifiers in
obscure places in the browser, like Flash storage or HSTS flags.
Evercookies were a big thing around 2010, since many sites were using
Flash and Silverlight, and those were very exploitable. Today those
technologies are hardly used anymore, but that doesn’t mean the
evolution of cookies has stopped. Supercookies, on the other hand, can
even work at the network level of your service provider.
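To make the HSTS trick concrete, here is a toy Python simulation, not real browser code. It assumes a tracker that controls a set of hypothetical subdomains (`b0.tracker.example` and so on), each storing one bit of an identifier. To write a 1, the tracker loads that subdomain over HTTPS and sends a Strict-Transport-Security header, so the browser remembers to always use HTTPS there. To read, it requests every subdomain over plain HTTP and checks which requests the browser silently upgraded. Clearing ordinary cookies does not touch this state, which is exactly why it was attractive, and why modern browsers have added mitigations against it.

```python
from __future__ import annotations

# Hypothetical tracker subdomains: one per bit of the identifier.
BITS = 8
HOSTS = [f"b{i}.tracker.example" for i in range(BITS)]


class SimulatedBrowser:
    """Models only the HSTS store: hosts that must always use HTTPS."""

    def __init__(self) -> None:
        self.hsts_hosts: set[str] = set()

    def visit(self, url: str, send_hsts: bool) -> str:
        """Load url and return the scheme that was actually used."""
        scheme, host = url.split("://", 1)
        if host in self.hsts_hosts:
            scheme = "https"  # HSTS upgrades the request without asking
        if scheme == "https" and send_hsts:
            # HSTS headers are only honored over HTTPS.
            self.hsts_hosts.add(host)
        return scheme


def write_id(browser: SimulatedBrowser, user_id: int) -> None:
    """Store user_id by setting HSTS on the subdomains for its 1-bits."""
    if not 0 <= user_id < 2**BITS:
        raise ValueError("identifier does not fit in the available bits")
    for i, host in enumerate(HOSTS):
        if (user_id >> i) & 1:
            browser.visit(f"https://{host}", send_hsts=True)


def read_id(browser: SimulatedBrowser) -> int:
    """Recover the identifier from which HTTP requests got upgraded."""
    user_id = 0
    for i, host in enumerate(HOSTS):
        if browser.visit(f"http://{host}", send_hsts=False) == "https":
            user_id |= 1 << i
    return user_id
```

With eight subdomains the tracker can tell apart 256 browsers; every extra subdomain doubles that.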
Thankfully, Firefox, for example, has lately started tackling these.
In their announcement post, the Firefox team discloses what they had
to do to take action against this, and it is wild. To eliminate these
pesky cache-based cookies, they had to re-architect the browser’s
whole connection and cache handling, which had originally been shared
across sites to improve the user experience by reducing overhead.
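A rough sketch of the idea behind this kind of partitioning, under simplifying assumptions: instead of keying a cached resource only by its URL, the key also includes the top-level site you are visiting. The class and site names below are hypothetical, and real browser cache keys contain more than this. With a shared cache, a tracker embedded on two sites sees the same cached entry on both; with a partitioned cache, each site gets its own copy.

```python
from __future__ import annotations


class HttpCache:
    """Toy HTTP cache, optionally partitioned by top-level site."""

    def __init__(self, partitioned: bool) -> None:
        self.partitioned = partitioned
        self.entries: dict[tuple[str, str], bytes] = {}

    def _key(self, top_level_site: str, url: str) -> tuple[str, str]:
        # Without partitioning, every site shares one bucket ("*").
        return (top_level_site if self.partitioned else "*", url)

    def get(self, top_level_site: str, url: str) -> bytes | None:
        return self.entries.get(self._key(top_level_site, url))

    def put(self, top_level_site: str, url: str, body: bytes) -> None:
        self.entries[self._key(top_level_site, url)] = body


TRACKER = "https://tracker.example/id.js"  # hypothetical tracking resource

for partitioned in (False, True):
    cache = HttpCache(partitioned)
    # The tracker serves a unique body while you read news.example...
    cache.put("news.example", TRACKER, b"user-42")
    # ...and later checks whether that body is still cached on shop.example.
    print(partitioned, cache.get("shop.example", TRACKER))
```

Running it prints `False b'user-42'` for the shared cache, where the tracker recognizes you, and `True None` for the partitioned one. The cost is the overhead mentioned above: the same resource may now be downloaded and stored once per site.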
Still, browser fingerprinting could be considered the evilest “cookie”
of them all. Browser fingerprinting collects everything it can from
your system, such as screen resolution, installed fonts, and time
zone, and combines it into an identifier. Like some cookies, this has
real use cases, e.g., preventing fraud in financial institutions.
Still, in principle this is just another intrusive way to track
people. Thankfully, some modern browsers offer at least some ways to
avoid this, but there is no full-fledged solution (other than
disposable systems).
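The combining step is simple enough to sketch. This minimal Python example assumes the attribute values have already been gathered (real fingerprinting scripts collect them in JavaScript, including tricks like canvas rendering, which this does not show); the attribute names and values are made up. Because the attributes are serialized in a fixed order, the same browser yields the same identifier on every visit, with nothing stored on your machine to delete.

```python
from __future__ import annotations

import hashlib
import json


def fingerprint(attributes: dict[str, str]) -> str:
    """Combine browser attributes into one stable identifier.

    Sorting the keys makes the result independent of the order in which
    the attributes were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values for illustration.
visitor = {
    "screen": "2560x1440",
    "timezone": "Europe/Helsinki",
    "fonts": "Arial,DejaVu Sans,Noto Sans",
    "language": "en-US",
}
print(fingerprint(visitor))
```

Changing any single attribute, say the time zone while travelling, produces a different identifier, which is why browser defenses tend to either randomize these values or make many users report identical ones.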
Future of Cookies
Lately, there has been some news about tech giants proposing
privacy-friendly substitutes for cookies.
Cookies have been a significant privacy issue for decades, and since
the ad industry is so large, finding a replacement for them has been
hard. Only time will tell, but we won’t get rid of cookies entirely in
the near future. They might change into something else, maybe an API
like this that uses machine learning to analyze user data. I don’t
know whether that is better or worse. Can’t wait!
tin-foil hat tightens
Conclusion
So what is the conclusion here? Probably nothing. A recently started,
small-time blogger got scared by the big numbers coming into his site
and all the data being collected about them, and ended up stopping the
tracking, at least on his own site. For most users and sites, this
kind of tracking is just a silly monkey-get-banana dopamine fix.
Don’t track unless you need to; if you do, inform your users thoroughly.