Lemmy and GDPR - What is the current state?

NewBrainWhoThis@lemmy.world · 1 year ago

Lemmy and GDPR - What is the current state?

philpo@feddit.de · 1 year ago

Former (small scale) data protection officer here. While I am long out of the data protection game and there are surely a lot more qualified people out there, maybe I can clear up a few misconceptions here and answer a few questions that come up regularly:

(BTW: My first language is not English and all my comments/books on that topic are not in English so excuse me if my translations are sometimes not 100% accurate)

Does the GDPR even apply to an instance hosted outside the European Union? It absolutely does. And in fact it is harder to comply with the GDPR outside of the European Union. The GDPR applies to all data collectors (from now on DCs) that collect data of European citizens. While Art. 2(2)(a) GDPR limits the application of the GDPR to activities within the scope of EU law, the collection of EU citizens’ information clearly falls under EU law as long as the EU citizen is within the EU during the collection process.
So why is it harder to comply with EU law outside of the EU? Because of local laws. A good example is US homeland security law, which contradicts the GDPR (and various other EU laws) and therefore makes it impossible for someone to host EU data in the US while complying with the GDPR. Facebook recently learned this the hard way, at considerable cost. To comply with the GDPR one would need to keep EU citizens out of their service AND defederate all EU instances. More on that later.
Does the GDPR even apply to Lemmy posts? It absolutely does! Art. 4(1) GDPR states clearly that all information relating to an “online identifier” (aka username) is already protected. So the IP addresses etc. collected by the initial server aren’t even the only personal data. This makes the whole topic a clusterfuck in terms of federation.
But what about my small/medium size instance? I am not a business! I make no money. The GDPR does not care a bit about one’s intentions here: it applies to all instances that go beyond “personal or intrafamily” data collection. This basically means that you can absolutely do what you want with the data you collected at the last family reunion. Maybe one can even get away with an invitation-only private instance that only caters to a group of friends who know each other. But any DC running a public instance is, by definition, not a private DC anymore. Therefore the GDPR does absolutely apply.
Can I simply ask the user for permission to use their data indefinitely and however I want? One surely can ask that. But that automatically invalidates the agreement. (Funnily enough this is exactly what reddit does and why reddit is not in compliance. Which might turn out costly.) The consent always has to be revocable, amongst other things.
So what does the GDPR stipulate? There are three main topics we need to look at: data deletion, traceability of data transfers and, connected to this, information about data usage.

Let’s start with traceability. Because that makes the federation a federation!

What does traceability of data transfers mean? It basically means that a DC must record its data transfers to third parties and ensure that data is handled there according to the consent agreement with the user and the GDPR. Usually a data transfer agreement is necessary to ensure the rights of all parties. This is what makes it so difficult for a federated system: In theory an instance would need a data transfer agreement with ALL instances that federate data from it. And these instances would then need to make sure that they either don’t transfer the data onward OR that their transfer partner is covered by the original data transfer agreement as well as their own one. A recipe for a pretty nice clusterfuck.
What does data deletion mean? Under the GDPR every user has the right to have their data deleted by a DC. This does not include data necessary for legal obligations but basically everything else. So the user can at any point revoke their consent and make the instance delete all their data.
Okay, I deleted the data on my instance, do I now comply with the GDPR? Surely I can simply ask the user to go to the other instances and ask them to remove the data? No. And here is another problem: The original DC (the user’s instance) is responsible for the data handled through transfer. That’s why one needs a transfer agreement: to ensure that the data is deleted on all instances it was transferred to. There are two exceptions here: “Involuntary data transfer” is generally seen as not being part of the data handling. But that mainly applies to data scrapers like the web archive and similar cases where the data is transferred through general usage of a page and the DC cannot reasonably prevent it without massively limiting the usage of their service. That would very, very likely not apply to a service that provides a specialised API for the transfer. The other one is a data transfer partner not complying. In that case the user can sue the DC, but the DC can sue the transfer partner for breach of contract.
What does right to information usage mean? Basically a user has a right to know what happened to their data. So in the case of the federation: Which instances was my data transferred to? How did they use it? Did they transfer it onward?
The end: What does that mean for Lemmy? To be honest: I cannot fathom a way to put Lemmy in a position that is fully GDPR compliant. There might be one, but I can’t imagine one that does not entail full defederation. But Lemmy can and must urgently improve its GDPR compliance as far as possible:

We need tooling for administrators to easily remove a user’s personal information from their own instances. Currently this is still very bothersome and time-consuming manual work as far as I know.
We need a tool to federate deletion requests. So once the administrator of the “original instance” deletes the data, a request is sent out to all instances and they then automatically delete the user data.
We need a system to deal with instances that do not follow deletion requests. This could, for example, include a “karma” system: once you are caught not deleting user data, you get bad karma. And with enough bad karma you get defederated by more and more instances.
We need a tool to inform people which instances did federate their data.
We need to optimize data frugality: The less data is collected the better it is.
We should consider data transfer agreements between the instances being set up automatically.
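
To make a few of the points above more concrete, here is a minimal sketch of what such tooling could look like. It is not how Lemmy works today, and every name in it (`TransferLog`, `record_transfer`, `deletion_requests`, the karma threshold of 3, the example hostnames) is made up for illustration. The request dictionaries are only loosely modelled on an ActivityStreams `Delete` activity and are heavily simplified.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TransferLog:
    """Hypothetical per-instance record of where each user's data was sent."""

    # user handle -> set of remote instance hostnames that received their data
    recipients: dict = field(default_factory=lambda: defaultdict(set))
    # remote instance hostname -> number of ignored deletion requests ("bad karma")
    bad_karma: dict = field(default_factory=lambda: defaultdict(int))

    def record_transfer(self, user: str, instance: str) -> None:
        """Note that data of `user` was federated to `instance`."""
        self.recipients[user].add(instance)

    def instances_for(self, user: str) -> list:
        """Answer 'which instances got my data?' for one user."""
        return sorted(self.recipients.get(user, set()))

    def deletion_requests(self, user: str) -> list:
        """Build one simplified deletion request per instance holding the user's data."""
        return [
            {"type": "Delete", "actor": user, "object": user, "to": instance}
            for instance in self.instances_for(user)
        ]

    def report_ignored(self, instance: str) -> None:
        """Record that `instance` was caught not honouring a deletion request."""
        self.bad_karma[instance] += 1

    def should_defederate(self, instance: str, threshold: int = 3) -> bool:
        """Assumed policy: defederate once bad karma reaches the threshold."""
        return self.bad_karma.get(instance, 0) >= threshold


log = TransferLog()
log.record_transfer("alice@home.example", "b.example")
log.record_transfer("alice@home.example", "a.example")
requests = log.deletion_requests("alice@home.example")
print(log.instances_for("alice@home.example"))  # ['a.example', 'b.example']
```

The point of the sketch is only that one transfer record can serve three of the items above at once: it tells the user where their data went, it tells the admin where to send deletion requests, and it gives a place to track which instances ignore them.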

In theory even then someone can sue an instance owner. Even then we are not 100% in compliance. But it is a far better position in court if one can argue that they did basically everything they could to ensure the users’ rights, compared to “I don’t give a f****, your honour”.

Additionally we should lobby for change in the GDPR to include better rules for federated systems. Also because E-Mail as another federated system is not in compliance - that can easily be weaponized as a good point.

hamburglar26@wilbo.tech · 1 year ago

Just wanted to let you know your English is significantly better than many native speakers. Thank you for the great and amazingly detailed response!

philpo@feddit.de · 1 year ago

Thank you. But especially with the legalese English it is sometimes fairly hard to find the proper translation.

hamburglar26@wilbo.tech · 1 year ago

Well I’m not an attorney but from my read you did great. 👍

NewBrainWhoThis@lemmy.world · 1 year ago

Best answer so far thank you!

MBM@lemmy.world · 1 year ago

This is a great answer, you (or someone else) should make sure the devs see this! Maybe as a GitHub issue

HobbitFoot@thelemmy.club · 1 year ago

It isn’t up to Lemmy to be GDPR compliant, but the individual instances.

moreeni@lemm.ee · 1 year ago

People are struggling really bad to understand the concept of software federation

Drunemeton@lemmy.world · 1 year ago

Both ways are a wheel with a hub in the center and spokes out to the wheel. The users are the spoke/wheel location, the “corporation” is the spoke/hub connection

The Old Way was users connecting to a corporation that provided a service. The corporation controls almost everything.

The New Way is that users control almost everything and connect to the hub which allows them to connect with each other.

Lemmy is the hub, instances are the users, and communities are the data shared.

chaorace@lemmy.sdf.org · 1 year ago

Has this actually been court-tested? I get the feeling that this is all really quite grey until something in the Fediverse actually gets sued over this.

For example: when you create something (a comment, a post, a community), the “true” version exists on your home-instance, but copies also get sent and saved across the entire Fediverse. Is an instance really able to be GDPR compliant if it’s constantly “backing up” data to non-compliant instances?

On the one hand, you could make the case that these outside instances are separate entities. Like the equivalent of a webarchive. Simply being public on the internet means other people can save copies and that’s obviously all fair play under the GDPR.

On the other hand, you could make the case that saving copies to the outside instances is a lot like using third-party cookies. It’s not technically “strictly necessary” for the instance to send your data to outside instances, even though it would seriously complicate the underlying design to allow specific users to opt-out of federating their content specifically.

jmcs@discuss.tchncs.de · 1 year ago

There’s no reason why activitypub would be considered any different from email, nntp, or even search engines and internet archives. When a website or email server gets a GDPR request it’s not propagated in any way, and it would be a stretch to expect it to.

chaorace@lemmy.sdf.org · 1 year ago

“There’s no reason why activitypub would be considered any different from email”

Are you sure? Email only sends your message to servers which you explicitly ask it to. If you only trust protonmail, you can choose to only send emails to other protonmail addresses. If protonmail chose to share your emails with other third parties regardless, I can’t help but think maybe that breaches the GDPR.

Lemmy, by design, propagates copies to instances based on opaque factors outside of the user’s control, even when the UI suggests that you are sending content locally. In the case of posting a comment to a community hosted on your home instance: Lemmy will send a copy to whichever servers happen to have users that are currently subscribed to that community. It’s a very opaque outcome and pretty far from the outcome you’d experience when sending an email message to someone using the same email provider.
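
A rough sketch of that fan-out, under the simplifying assumption that the recipient set is just "every remote instance with at least one subscriber to the community". The function name `federation_targets` and the handles are made up for illustration; this is not Lemmy's actual delivery code.

```python
def federation_targets(subscribers, home_instance):
    """Return the remote instances that would receive a copy of a new comment.

    subscribers: iterable of "user@instance" handles subscribed to the community.
    home_instance: hostname of the instance hosting the community.
    """
    targets = set()
    for handle in subscribers:
        # Split on the last "@" so the hostname is everything after it.
        _, sep, instance = handle.rpartition("@")
        # Skip malformed handles and subscribers on the home instance itself.
        if sep and instance and instance != home_instance:
            targets.add(instance)
    return sorted(targets)


subscribers = [
    "a@lemmy.world",
    "b@feddit.de",
    "c@lemmy.world",
    "d@lemmy.sdf.org",
]
targets = federation_targets(subscribers, "lemmy.sdf.org")
print(targets)  # ['feddit.de', 'lemmy.world']
```

The commenter never picks those two hosts; they fall out of who happens to be subscribed at the time, which is the opacity being described.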

“even search engines and internet archives”

Yes, but these are genuinely disconnected entities who come across the data as a user might. Lemmy doesn’t personally phone up Google and send them a copy of your comment as soon as you post it, but that’s basically exactly what happens when Lemmy federates a comment with other instances via ActivityPub.

FWIW: I think Lemmy as a piece of software is actually very aligned with the interests of the EU more generally and I think it would be a bad idea for them to come down on federated social media as a GDPR issue. I nevertheless worry that it represents untested waters and can certainly imagine a reality where it receives a raw deal from regulators.

LoreleiSankTheShip@lemmy.ml · 1 year ago