Getting Some Privacy on the Web
Hi, today we're going to look at privacy.
Privacy on the web, as I'm sure you know, isn't doing great these days.
It often feels like there isn't much that we, as folks who work on web things, can do about it.
But that's not entirely true.
First, we'll look at what privacy is.
Then we'll take a quick look at why we so often fail to respect it on the web, and then we'll see what we can do about it.
I hope that after this session you'll have some hope for the future.
So first let's take a quick step back.
In the late 19th century, when electricity started to become relatively common in households, not everyone wanted to pay for it.
Some people figured that they could hook their households directly to the power grid and just get electricity that way, without signing up officially and, of course, without having to pay any bills.
As you might have guessed, this didn't go unnoticed, and the matter was taken to court.
This might sound surprising, but whether taking electricity directly from the grid counted as theft was not easily decided.
The German Supreme Court, for instance, ruled that electricity wasn't an asset, and that taking it therefore could not be considered theft in the traditional sense.
US courts decided differently.
They decided that electricity could be stolen, but the matter was nevertheless contentious.
In fact, it was still being debated in New York courts until 1978.
Why, you may ask, am I giving you a short lecture on the history of the legal status of electricity as an asset?
Well, simply because we're seeing very similar issues with data today.
Even though it might seem absolutely obvious now that hooking your house directly into the grid is theft, it really wasn't obvious even 50 years ago.
And we're seeing the same kind of issue with data today.
I'm pretty convinced that a few decades from now, some of the things being done with data today will be considered utterly illegal and unthinkable to most people.
But we're not there yet.
But first we need to understand that privacy isn't some kind of newfangled concept.
It rose to prominence recently, but not because it is itself new.
Privacy is something that we consider essential in human relationships.
It has existed as far back as we can tell.
In everyday situations, we treat issues of privacy very straightforwardly.
We consider it to be very easy.
If I read your DMs over your shoulder on the bus, or if I repeat a secret that you told me in confidence, you know that this is a violation of privacy.
You don't need to hire an ethicist.
You don't need to attend a conference session to figure that out.
But in a digital environment, we have a lot less certainty.
A key reason for that is a loss of context.
Offline, you know that you are in the doctor's office, and this conditions the kinds of information flows that can happen.
You also know when you are at the local bar, and that is a different situation; what's right for flows of information differs in each context.
But online, everything looks like a dark slab of plastic.
And so it's very hard to tell how contexts change and also what's going on behind the scenes.
But how do we go about figuring this out?
Well, first, we need to agree about what we mean when we say privacy.
What even is privacy?
Well, there's a definition that we've been using on my team that's actually quite useful.
It is that privacy is a right to appropriate flows of information.
That's quite a lot to unpack in this definition.
And we'll be going into this over several slides.
This definition is due to Helen Nissenbaum, and it's a key part of what is known as the contextual integrity framework.
Most definitions of privacy, the ones that people easily reach for when they're looking for one, like "privacy is secrecy", "privacy is control over your data", or "privacy is consent", are way too simplistic, and they always run into trouble.
The advantage of contextual integrity is that, even though it might require a little more thinking and a little more work to get going, it works in all situations and actually allows us to make pretty well-informed decisions.
So the way to work with contextual integrity is to first understand that different contexts will have different norms, different rules that apply to them.
And for each of these contexts that we wish to analyze, we want to break it down into smaller components so that we can understand what in that context is relevant to privacy.
First, there is the type of context: am I talking to a friend?
Am I visiting a doctor's office, as I was saying earlier?
Am I shopping?
In general I find that using examples from offline familiar situations really helps because we have strong privacy intuitions about them.
And if we can find ways of comparing online situations, which are still somewhat novel even for us tech people, to offline situations that we're very familiar with, then we can figure out what the norms should be and how to treat people respectfully.
And so, in addition to the type of context, there are the actors: who the information is about, who's sending it, who's receiving it. And there are the attributes: what kind of information is being transmitted in the first place?
What is it that we were talking about?
And then there are the transmission principles.
The transmission principles are a bit fuzzy at first, but you get used to them.
The idea really is: what constraints are placed upon the information flow?
So for instance, maybe data is shared, but there are legal constraints on what can then be done with it, or maybe there are technical constraints on what can be done with it.
For instance, encryption might prevent certain people from accessing it afterwards.
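If it helps to see those components written down, here's a minimal sketch, in TypeScript purely as an illustration, of what a contextual-integrity breakdown can look like, using the doctor's office from earlier. The field names are mine, not official framework terminology.

```ts
// A minimal sketch of a contextual-integrity breakdown.
// The field names are illustrative, not official framework terminology.
interface InformationFlow {
  context: string;                  // the type of context (doctor's office, bar, bookshop...)
  subject: string;                  // who the information is about
  sender: string;                   // who is sending it
  recipient: string;                // who is receiving it
  attributes: string[];             // what kind of information is flowing
  transmissionPrinciples: string[]; // constraints on the flow (legal, technical, consent...)
}

// The doctor's office, broken down:
const doctorsVisit: InformationFlow = {
  context: "doctor's office",
  subject: "the patient",
  sender: "the patient",
  recipient: "their doctor",
  attributes: ["symptoms", "medical history"],
  transmissionPrinciples: ["medical confidentiality", "used only for treatment"],
};
```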
But I realize that this is still fairly abstract.
And so we're going to work through a couple of examples.
And so the first example is really basic.
You're walking down the street and the flow of information happens.
The information is about you.
But it's being shared by a random stranger.
Maybe me, maybe someone else.
And they are sending it to another party, maybe a governmental agency or a private company of some kind.
And what they're sharing is pretty intimate.
They're sharing your current health status and your precise geolocation-as precise as they can make it.
And of course you haven't consented to this in any way whatsoever.
Now, ideally, I'd be doing this in front of a live audience.
And I would then ask you a question about this so that we can figure out if we agree as to whether this is appropriate or not.
And people would normally say no [elongated and exaggerated no].
So I really trust you, in front of your computer alone or with others, to answer properly and go "no".
It's part of the ambience right?
So is this OK?
No.
No, of course not.
It's not okay.
And again you don't need to hire an ethicist to tell you that.
You already knew that this was not okay.
You could tell for yourself. However, let's change the context just a little bit to see if it matters to that decision.
Now everything's the same.
Same people, same data, same recipients, still no consent.
But instead of just walking down the street, you're currently bleeding to death on the ground.
And this alone changes everything.
It's not just that it is now okay for this random stranger to share that information.
It would, in fact, not be okay for them not to share it; they have to share it.
That is ethically mandated.
And this is still not a violation of privacy: the privacy of the person bleeding on the ground is not being violated by this information being shared with others.
On the contrary, it would be inappropriate to not share that information.
And so, while that example might feel slightly contrived, even though it's very much a real-world one, it shows that it's important to get all the details of the context right in order to assess it ethically and to know what's appropriate and what's not.
But now let's look at a situation that might be closer to the kind of situation you would encounter in pretty typical web work.
So here we're dealing with a regular site, maybe it's a blog or media site of some kind.
And that website is sharing analytics data, which is to say the full reading history and approximate location of its readers, with a third-party analytics company.
That third-party analytics company, as it happens, has full control over the data, which is to say that yes, they render an analytics service, but they may also use that data to drive any kind of entirely unrelated business.
They might resell it.
They might repackage it.
They might use it to inform a product completely unrelated to the situation the data is being gathered in.
And that analytics arrangement is vaguely mentioned in some legalese on the first-party website, and there's no way to opt out of it.
In my mind, that's not an okay situation, but compared to what's going on on the web most of the time, this is a pretty mild example.
And so let's see if we can intervene on some aspects of the situation, not by changing everything, but by changing a few things.
I'm not claiming that the situation on the right is good.
I'm pretty sure that if we dug into the details here, I would find a number of things that I would like to improve.
And there's some people who would find it still impractical or not matching their expectations, but it does bring forward a number of interesting points.
Perhaps the most important thing to focus on here is that building websites on the modern web is a complex affair.
And it's absolutely normal to need to rely on third parties now and then for this or that service.
It's not very different from using a third-party library, and reusing components is absolutely key to succeeding in this space.
However, it makes a huge difference whether the data that's being collected by the first party remains under the full control of that party, even when it's interacting with others, or whether, when that data is shared, those third parties can do whatever they want with it.
And so that's the first thing.
This idea of sole controllership of the first party is really key to building privacy on the Web.
Less important than sole controllership, but still important, is finding ways of explaining clearly and honestly what is being done with data.
And this doesn't mean that we should expect every user of the website to read everything, but by presenting things clearly and honestly, by trying to explain what is being done with the data, it enables a form of accountability that really helps sustain a trusting relationship.
And ideally, people should be able to opt out. Maybe what's on the right, given a few other changes that we'll get into later, would be perfectly fine for the vast majority of your users, your readers.
But for some of them, for reasons that might be personal to them, it might still not be okay.
And for them it should be easy to opt out.
And ideally they shouldn't have to opt out on every single website that they visit.
They should be able to use something like Global Privacy Control (GPC) to opt out automatically in every context.
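To give a rough idea of what honoring that signal can look like in practice, here's a small sketch. The `navigator.globalPrivacyControl` property and the `Sec-GPC: 1` request header come from the GPC proposal, while `loadAnalytics` is just a hypothetical placeholder for your own first-party-controlled analytics loader.

```ts
// Sketch: honoring Global Privacy Control before loading optional analytics.
// navigator.globalPrivacyControl comes from the GPC proposal; browsers that
// support it also send a `Sec-GPC: 1` request header you can check server-side.
// loadAnalytics() is a hypothetical placeholder, not a real library call.

function userHasOptedOut(): boolean {
  const nav = navigator as Navigator & { globalPrivacyControl?: boolean };
  return nav.globalPrivacyControl === true;
}

function loadAnalytics(): void {
  // Placeholder: inject whatever first-party-controlled analytics you use.
}

if (!userHasOptedOut()) {
  loadAnalytics();
}
```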
Now it's a wild internet out there and there are many reasons why privacy isn't respected.
Almost every single company out there says that they care about your privacy.
They value your privacy.
They really really work hard for your privacy.
And a lot of people even think that they're doing something about it, but in practice, the situation is quite dismal.
So let's take a quick dive to understand the major reasons why this data is being shared in the first place.
A lot of the time, when people start finding out about privacy violations on the web, and in digital in general, they start to ask angry questions.
Like why can't you just remove that thing?
I completely understand the sentiment.
In fact, I think everyone should be angry about privacy online, but questions like "why can't you just do X?" really don't help.
A lot of the time people are doing things for a reason.
It might not be an ideal reason.
It might not be done in the best way, but it is done for a reason.
And understanding that reason is key to fixing it.
And so there are three broad bags of use cases for which people, companies mostly, use personal data online.
And in some cases they use it in ways that violate privacy, but that could be fixed, could be improved, so that the use cases can be saved without the same privacy implications.
So one of these is analytics, improvements, all kinds of first-party things that have to do with operating your own business while keeping the data in house.
Then there's the second area, advertising and marketing, which has justified a lot of bad things but doesn't necessarily have to be this way.
And the third primary reason data is being collected online today is to dominate markets, to build empires, to leverage one's power in one market to extend it into another and keep growing and growing.
We're going to dive a little more deeply into these three.
So looking first at first party purposes, it depends on the kind of site that you're running.
But in a lot of cases, I find that it's a pretty good comparison to try to translate what you're doing with data in that context to what would happen in a cafe, or maybe your local bookshop, or something like that.
Comparing what you're doing with what would happen offline can often be useful and revealing, and help us figure out the right rules and the right laws.
So, one thing that's important to understand is that offline, people are not anonymous.
You have a decent idea of what's happening in your bookshop.
You can't monitor every last detail of what your customers do and neither should you, but you can see what's popular, you can recognize people who come often.
You can see how your customers react.
You can use this to run a business, and there's no reason why the same kind of connection shouldn't be possible online.
And if we figure out ways of limiting data collection and data processing online so that it more or less matches that kind of offline processing, we'll be perfectly fine.
So if we're talking about sole controllership, it's the same thing in the bookshop.
Maybe you will be using some third party software to run inventory management or sales management, something like that.
And that's perfectly acceptable.
So long as the data about your customers isn't being resold, reused, or repurposed for completely different purposes unrelated to those of your bookshop.
It's perfectly fine for you to observe what your customers are doing so long as you eventually forget them.
So that's limited data retention right there.
If someone comes every day, of course, you're going to start recognizing them.
You're going to start knowing all kinds of things about this person.
But if they stop coming after maybe six months, a year, two years, you'll have forgotten who they are, you'll have forgotten what their habits were, what their tastes were, and that's perfectly fine.
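To make "eventually forgetting them" concrete, here's a small sketch of a retention rule. The record shape and the 180-day cutoff are purely illustrative; in practice this would be a scheduled job against whatever store you actually use, applied to backups and derived datasets as well.

```ts
// Sketch: limited data retention, the digital equivalent of the bookshop
// eventually forgetting a customer. The data shapes here are illustrative.

interface VisitRecord {
  visitorId: string; // pseudonymous first-party identifier
  page: string;
  visitedAt: Date;
}

const RETENTION_DAYS = 180; // e.g. "after about six months, you've forgotten them"

function pruneOldVisits(records: VisitRecord[], now: Date = new Date()): VisitRecord[] {
  const cutoff = now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  // Keep only visits that are more recent than the retention cutoff.
  return records.filter((record) => record.visitedAt.getTime() >= cutoff);
}
```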
And of course you should avoid learning things about your customers that might be too sensitive.
You don't want to overstep certain boundaries.
And so you don't want to produce inferences from the information you have that you really shouldn't know in the first place.
And if we consider the situation, it's even possible to think of respectful ways of running an advertising business.
So for instance, in a completely imaginary situation, let's assume that the cheese shop owner down the street really knows that whoever reads metaphysics really likes cheese. They could come to you and say: hey, whenever customers who are into metaphysics come into your bookshop, whether they're buying a metaphysics book right now or not, maybe give them this leaflet for a discount at the cheese shop.
And that's respectful advertising.
It's perfectly normal that you, as the bookshop owner, understand your customers.
The cheese shop owner understands their customers.
They know that their customers like metaphysics, maybe from conversations they have as people are buying cheese.
All of that is perfectly fine.
So long as no specific information about people is being shared.
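Translated into code, the key property of that arrangement is that targeting happens entirely on the first party's side. This is just an illustrative sketch with made-up names: the bookshop hands out the cheese shop's leaflet without any customer data leaving the bookshop.

```ts
// Sketch of the bookshop/cheese-shop arrangement translated to code.
// The advertiser (cheese shop) only tells the first party (bookshop) which
// audience it wants; no data about individual customers ever leaves the
// bookshop. All names here are illustrative.

interface Customer {
  id: string;
  interests: string[]; // known only to the bookshop, the first party
}

interface Campaign {
  advertiser: string;
  targetInterest: string; // e.g. "metaphysics"
  creative: string;       // the leaflet to hand out
}

// Runs entirely on the first party's side.
function leafletsFor(customer: Customer, campaigns: Campaign[]): string[] {
  return campaigns
    .filter((campaign) => customer.interests.includes(campaign.targetInterest))
    .map((campaign) => campaign.creative);
}

// The cheese shop only ever learns aggregate results (how many leaflets
// were handed out), never who received them.
```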
We can actually run this right.
So this is the kind of typical use of data online that today is usually not done in an appropriate way, but that really could be fixed to work well and to work respectfully of people.
And most of us have some control over how that happens, so we could really make it happen.
The second one is more complicated.
Everyone loves to hate advertising, but it's really important to distinguish the genuine value that advertising brings from the really stupid way in which it has often been implemented.
Because even if you assume that every online service switches from an advertising-based model to a paid model, the people running those paid services will still need to advertise them.
Users don't miraculously show up.
They show up because you've advertised somewhere most of the time.
And also, if everyone has to pay for every service that they're getting online, we're building a web for rich people.
And that just doesn't work overall.
So I'm not going to try to convince anyone that they should love advertising, but if we start to think of it as just a basic taxation system over our attention, and we use that tax to finance content and services that are accessible to the greatest number, then we can start to think of ways of doing advertising right.
And so if you're running a taxation system, you want to keep the taxes as low as you can.
You want to make what you do with them as effective as possible.
And so ideally you can start building an advertising system that is privacy-preserving and does not have excessive ad presence, precisely so that you can finance these services.
And this is possible.
This is a lot of work.
We'll give a few pointers later on in the talk.
There's a lot of work going on about privacy-preserving advertising systems and how we can develop towards those.
There are also a number of solutions looking at first-party-only advertising systems that can help make this work in a manner that's effective, but that doesn't imply the kind of broadcasting of data, the kind of privacy violation, that we see today with advertising.
So there's good hope there.
And now the third, as I said, primary reason behind inappropriate flows of information is market domination and empire building.
Now, contrary to the previous two use cases for data sharing, which tend to be justified and tend to have ethical alternatives, this one never is.
There is no ethical variant of violating privacy for the pure reason of building up more power and gaining a bigger stranglehold over an existing market.
But it's important nevertheless for us web people to understand why this is being done and what dynamic is at play here.
And a very simple way of thinking about it is this: you have a company that's already successful with product A, so product A is making money, and you can always use data to gain an extra competitive edge, but at some point the data it's getting from its own operation is maxed out; it can't drag out more data.
But now you look around and you can see that there are other markets with other product types that you could gain data from.
Now, it might be less obvious how to run a profitable business in those other secondary markets.
But what's clear is that you can recognize people if they have the same identity across markets, and you can use the data from those markets to inform and give competitive advantages to your product in this market.
And so what happens is that you use the money you're making with this product to subsidize product B in a data-rich market that doesn't have a clear or obvious business model.
You can make product B free, you can out-compete everyone because you're subsidizing it, and then you take the data and use it to out-compete everyone back in your original market.
And so basically you're creating this sort of vicious loop where you're using money from an already dominated market to dominate another market and kill the competition, even though they may have better products, because no one can beat free.
And then you bring the data back over to keep your absolute dominance in the original market.
This is not a dynamic that's easy to break.
It's not something that we can individually just fix.
We need fixes to tech.
We need fixes to law to make it work, but it's still possible to help from the trenches by understanding this dynamic, by understanding that you might be using your dominance in search to subsidize a browser and then use that browser to track people, to strengthen your dominance in search.
And then again, in advertising, again, in social networking, again, in any number of other markets.
If, working in the trenches, we recognize this dynamic, we can help, if not stop it, then at least contain some of its power.
And we can see if we can shape technology so that it is less conducive to the kind of cross-context recognition that enables anti-competitive dynamics.
Overall, that might seem like a pretty dim view of where we are today in terms of the industry, in terms of where the web is headed.
But there's a fair bit that overall we can do in everyday situations to improve privacy for everyone.
And particularly for our users who aren't necessarily tech savvy.
So it's in our hands and we can do stuff about it.
One of the very first steps to understand is that, as explained early on in the talk, with contextual integrity we can really figure out ways of fixing the situation in increments.
It's very easy to get stuck with privacy issues, feeling that if any data is shared, if any third party is involved, then there's necessarily a privacy violation.
That is not the case.
It can be done in ways that are respectful of people and of their privacy.
And so the first step is to not fall into that trap: to break down the situations using contextual integrity, and to spot what changes can be made to the context as currently implemented to make it work and to make it more respectful of privacy.
And it doesn't have to be a big-bang change.
You can make incremental changes.
My team makes incremental changes pretty much every month.
And I'm not saying that anything is perfect yet, but it's definitely better than it was yesterday, much better than it was last year.
And that's always a good thing.
Also, a lot of what happens on the web and on the websites that we all manage is due to choices that people in marketing or advertising may be making.
And if you don't know them, if you're not familiar with their work and why they're doing all these things, it's very easy to point the finger and blame them: why do they want all these trackers?
Who knows what they even use that stuff for.
Well, it turns out they're not just doing it for the fun of throwing some random JavaScript onto your website.
They're doing this to make money, hopefully to drive some success for you, for your company.
And that probably covers some of your own income.
And you really can't help change their practices if you don't understand what they're doing and why they're doing it, and if you don't have a good relationship with them.
So talk to them, make friends, it turns out they're actually really nice.
And building this mutual understanding of what problems their tools are creating, and why they're using them in the first place, really works.
As an example, working with marketing at The Times, we managed to reduce the volume of data shared with third-party data controllers by over 90% without losing any effectiveness, simply because we talked and figured things out.
Similarly, my team has worked a lot with people who work in advertising to develop new privacy-preserving advertising systems.
And it's been rather fun.
We've really enjoyed that work.
Another thing that's particularly important to keep in mind is that if you build something and you put it on the web, it's your responsibility to respect the users.
So you can't offload to the user the work of figuring out how to have their privacy respected.
A typical big red flag about a company's privacy practices is when you see them start showing off how much control they give users over their data.
This often shows up in things like consent dialogues, cookie consent banners, complex privacy settings, annual privacy checkups, that kind of stuff.
One thing that might feel counterintuitive: if you want to violate people's privacy, why would you give them control?
It feels like this is backwards.
Well it's because everyone actually knows that most users don't know what to do with these controls.
And so they don't use them. And by giving them these controls that you know they're not going to use, because you have the experience, the expertise, and the A/B testing to know that they're just not going to use them, they can then not complain that you're violating their privacy, because if they do, you can blame them.
You can say, "Hey, you had the choice, you have the control, you could have changed it".
That's absolutely backwards.
It's not on users to fight every single last website and understand every single last privacy checkup, every single last complex dialogue, to figure out what the right setting is for them.
It's on you to do things right by them.
And so it's not about giving them control.
It's about giving them respect.
Another thing that's really hurting the work towards better privacy online is that there's a lot of bullshit.
People will justify all kinds of activities, all kinds of behavior, with lies that have become completely entrenched.
One of them is that identifiers can somehow be anonymous, or pseudonymous, or pseudo-anonymous, or whatever it is this week.
But by definition, if you have an identifier, then you can identify someone.
And this has been used in many situations to harm people afterwards, even though it might seem innocuous at first.
Another lie that keeps getting repeated is that reading history is somehow not sensitive.
And it turns out that you can actually learn a lot about people from their reading history and from the coarse geolocation that you get with it.
And what you learn, a lot of the time, will not be useful for any kind of legitimate business, but it can be used for blackmail or direct privacy violations: finding where people live, where they work, etc.
And finally, a third big common lie is that the trade-off is somehow worth it, that by giving up and broadcasting all this data, we're actually getting some value out of it.
And the reason I have avocados here is that there's this great example, one example amongst so many, of a third-party data provider claiming to be really good at this because they collect so much data, but then being really terrible at providing any kind of useful insight from it.
In the case of Avocados From Mexico, what happened is that this company bought targeting data because they wanted to show ads to women between 20 and 55.
That's a very broad demographic, but they were only hitting it about 20% of the time.
Now, that's an abysmally bad number for targeted advertising of any kind, including contextual.
If you took an avocado and threw it randomly in a crowd, you would be more likely than 20% to hit a woman between 20 and 55.
Now, I'm not saying do that with an avocado, but this is a bad use of data, and it's a bad justification for this violation of privacy.
One of the biggest lies being told (and again, I'm sharing these lies and exemplifying them to help you call them out and pull away from them, because they might seem seductive if presented in certain contexts) is that there is some kind of value exchange: that maybe if people accepted it, or if we explained things better, they would understand this value exchange.
And what is meant by value exchange is this idea that you should lose privacy in order to use the internet, to use internet services, and to read internet content.
And there's no reason to believe that.
No user accepts this idea of a value exchange.
It's completely imagined by the data industry.
No one thinks there's a value exchange.
What happens is that people just feel cheated and they hate you in silence.
And so let's not pretend that it's there when it's not what our users actually want.
And as a final lie, or body of issues, to call out: there's an entire body of practice, often from people who really mean well, that claims privacy is primarily about notices (things like cookie banners, terms of service, or privacy policies) and about transparency and choices.
But we know for a fact that none of this works for users.
It just offloads the work to them.
And it then only serves to protect the company against its users if they want to turn around and claim back some rights over that data.
So don't fall for this.
It's entirely fake and the worst offenders in privacy all give you a lot of transparency and choice because they know it doesn't work.
Finally, in this vein of things that we can think about and practice in our everyday work: maybe you personally don't care about respecting your users, or maybe the people you need to convince to change don't care about respecting their users.
But if you don't do it for the people who use your services, who read your website, you can still do it for profit because actually making these changes is better for your business.
One thing that people often don't pay a lot of attention to is that when a third party learns about your users on your site, they also learn about your business, and they can use that data to compete against you.
Now many of those third parties are hugely bigger than your company, than whatever business you're running.
And so it is very tempting to think that they're not competing with you, that you're not really in competition with them, even if they are in the same segment.
But that's not actually the right way to think about it.
If you think of supermarket chains, there's a lot that they can do to compete against bodegas.
And it doesn't matter that each bodega on its own is a tiny little business; the supermarket chain can still take actions that push back against each and every bodega's ability to run a successful business.
And it's the same thing here.
So if any part of your income is derived from advertising, for instance, you are competing against the Googles and the Facebooks of the world.
If you're running any kind of e-commerce, then you are competing against the Amazons of the world.
And it's really important to understand that, normally in the typical business situation, giving your competitors information about your own business is considered pretty stupid.
And that's exactly what's happening here.
These companies are taking data about our businesses and using it to build competing products.
And that's something that even if you don't care about privacy is worth putting a stop to.
One of the best ways of putting a stop to it, even though it's not necessarily enough and can at best slow down that data leakage, is to work with companies that respect your right to sole controllership over your data.
So now, before we part ways, I would like to point to a few areas in which there is work underway to help make the web and the world better places with respect to the privacy issues that we're facing today.
As I said in the beginning, just as it was with electricity, we still haven't figured out the right rules for data.
The way the web works today isn't set in stone at all.
And we need to put in the work to fix these problems that we're seeing.
It's all very much possible.
And the sooner we do it, the better results we'll get.
So, very quickly, because this would be an entire other presentation: there are a number of efforts underway to change the situation and improve it on multiple fronts.
One of the first things I want to point out is that I've gone through this presentation giving a basic introduction to what privacy is, and it's important to document that for the web at large.
And so a group of people within the W3C's Technical Architecture Group, the TAG, is working on a set of privacy principles that are meant to be used by technologists as we design new technology, and by rule makers as they create new laws.
That way we can all push in more or less the same direction.
The idea is to gain really broad consensus around these privacy principles that are extensions of what you've seen in the past 20 minutes.
On the tech side, there's a lot of movement around privacy-preserving advertising.
It was a bit slow to get going because it's an entirely new community.
It's an entirely new body of work, but there's a lot of promising stuff happening in that space right now.
There have been proposals from Apple, from Google, from Microsoft, from The New York Times, and from many others as well, to find ways of doing advertising that do not violate privacy, do not require broadcasting data as current ad tech does, and can be done in ways that are really respectful of people.
And in the same way, there's a lot happening in the regulation space.
And so it's really important to keep thinking about these issues and to keep developing expertise about what's happening and where we're heading next in the privacy space.
And with this, thank you very much for listening.
If you have any questions, please don't hesitate to reach out on Twitter.
Privacy online is absolutely fixable.
We can really help each other out, fix it, and build a web that actually works for its users.
I wish you the best of days.
Wherever you are.
Thank you.