Performance versus security, or Why we can’t have nice things

Performant web sites are critical for your users’ experience. No doubt about that. But keeping our users’ information private and secure is just as critical to maintaining their trust in the web platform and keeping them around. Those two requirements are somewhat at odds.

There are many cases where performance optimizations ended up creating security or privacy holes. There are also many cases where privacy and security restrictions introduce significant performance overhead, or prevent us from getting access to performance-critical information in the wild.

In this talk, we’ll discuss different examples outlining this tension, dig deeper into them, understand the underlying principles behind the web’s security model, and hopefully agree that we need both a performant and safe web to keep our users happy.


Yoav Weiss, Co-chair W3C Web Performance Working Group

Yoav gets a lot of questions of the form “why can’t we just…?” …access more data about our users? …avoid CORS for my specific case?

The reasons are user security and privacy… and people tend to respond “yeah I get that but…” which shows they don’t truly understand all the threat models on the web.

They basically don’t know what browsers are trying to defend against.

Note this talk is not about server-side security, defense in depth, third party tracking or fingerprinting.

Yoav will be talking about the broader categories of attacks, surfaces and vectors; and giving some examples where things went wrong.

Hopefully this will give insights into the constraints browsers operate with; and help answer a few of those “why can’t we just” questions.

Threat categories:

  • History leaks
  • Cross-site leaks
  • Speculative execution attacks (dangerous sub-type of cross-site leaks)

So what are history leaks?

Let’s say you love kittens and you often browse kitten websites – but you don’t want every other website to start sending you kitten-related advertising. To avoid this, the browser has to prevent history data leaking between different websites. This is much harder than it seems…

The oldest history leak is the :visited style – any website could insert a bunch of links to other URLs, then check properties like the computed style to see whether each link had been visited.
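The shape of that attack can be sketched as below. The style lookup is injected as a parameter so the logic can run outside a browser (a real attack would pass `a => getComputedStyle(a).color`); the URLs and the visited colour are illustrative:

```javascript
// Classic :visited history sniff: for each candidate URL, create a link and
// check whether the browser applied the :visited colour to it. getStyle is
// injected so the probe logic is testable outside a browser.
function probeVisited(urls, getStyle, visitedColor = 'rgb(128, 0, 128)') {
  return urls.filter(url => {
    const link = { href: url }; // stands in for document.createElement('a')
    return getStyle(link) === visitedColor;
  });
}
```

This is exactly the kind of computed-style read that Mozilla’s mitigations had to lie about.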

Blog: Plugging the CSS History Leak – Mozilla Security Blog

Mozilla closed a lot of these attacks but it resulted in very limited styling and slow rendering. Yoav feels visited link history really needs to be blocked entirely between sites – users probably wouldn’t have noticed but we’d have had nicer styles and faster rendering!

These links trigger paints that open up timing attacks – attackers can derive state from things like Frame Timing, Paint Timing or Element Timing.

Browsers are mostly defending against this, but it’s adding a lot of complexity to protect something that doesn’t give a lot of value to users.

Caching is great for performance; but it’s not always unicorns and rainbows – the dark side is caching attacks. If visited links were blocked, caching attacks would be a great way to find out about the user’s history.

A site can load a static resource from a target site and time how long it takes to load; if it loads fast (i.e. from cache), it can deduce that you have visited that site. While you can defend against this by not caching anything, that’s not so great.
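A minimal sketch of that probe, with an illustrative (not real) threshold; in a browser the elapsed time would come from timing a `fetch()` with `performance.now()`:

```javascript
// Cache-timing probe: a very fast load suggests the resource came from the
// HTTP cache, i.e. the user has been to the site that serves it.
// The 50 ms threshold is purely illustrative.
const CACHE_THRESHOLD_MS = 50;

function looksCached(loadTimeMs, threshold = CACHE_THRESHOLD_MS) {
  return loadTimeMs < threshold;
}

// In a browser, roughly:
//   const t0 = performance.now();
//   await fetch('https://kittens.example/logo.png', { mode: 'no-cors' });
//   const elapsed = performance.now() - t0;
//   looksCached(elapsed) // → probably visited before
```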

Safari was the first browser to add cache partitioning (or double-key caching) as another defence.

This means the cache key includes both the URL of the cached resource and the top-level site it was loaded from. A resource cached under one top-level site can’t be read back from another, which prevents the leaks.

Partitioned caching also tackled a range of other privacy and security issues. Sadly other browsers are yet to follow, mostly due to performance concerns; although the Chrome team are looking at this again.

Another example of a cross-origin state leak was Service Worker installation state. Resource Timing has an attribute called workerStart, which reveals the time it takes for a service worker to start up. It was possible for a site to load another site in an iframe, inspect the state of its service worker, and figure out whether the user had visited it before.

This was a bug that was fixed in both the specification and implementation.

Cross-site leaks are the next big category. This is where one site can deduce information about you from another site.

For example if you are logged into a social media site, this may be revealed to other sites; and it may even reveal details like the sections of the social site you use.

To prevent these leaks, browsers enforce the Same-Origin Policy (SOP). CORS (Cross-Origin Resource Sharing) is the opt-in mechanism that relaxes it, allowing legitimate sharing by letting specific sites read the data.

This protects us from direct leaks, but there are side channels like resource size. Let’s say you have been shopping on a website that’s running an A/B test of different icons for different age groups – the hypothesis being that older users like bigger icons. The resource size of those icons now reveals your age group.
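Sketched out, the size side channel is just a lookup table; the byte sizes and cohort names here are invented for illustration:

```javascript
// If each A/B cohort ships a different icon build, the icon's byte size
// becomes a key that maps straight back to the cohort – and so to the user.
// Sizes and cohort labels are made up for this sketch.
const ICON_SIZE_TO_COHORT = new Map([
  [1843, 'under-30'], // small-icons variant
  [4102, 'over-60'],  // large-icons variant
]);

function inferCohort(resourceSizeBytes) {
  return ICON_SIZE_TO_COHORT.get(resourceSizeBytes) ?? 'unknown';
}
```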

Cross-site search reveals information because an attacker can send search queries (eg. to your email inbox) to see if a certain keyword returns results. Let’s say it reveals once again that you are into kittens, because a search of your inbox for “kittens” does not return a zero result.
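A sketch of that cross-site search (XS-Search) probe. The observable side channel – response size, here – is injected as a function so the logic runs outside a browser; a real attack would measure a cross-origin request:

```javascript
// XS-Search sketch: probe a victim's search endpoint with candidate keywords
// and keep the ones whose responses look "non-empty" via a side channel such
// as response size or timing. responseSize is injected for testability.
function probeKeywords(keywords, responseSize, emptyResultSize) {
  return keywords.filter(kw => responseSize(kw) > emptyResultSize);
}
```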

So exposing resource size is bad… what kind of idiot would have bugs like that? Uhh… (slide highlighting Yoav’s name on a security bug…) (The ticket lists the detail of the attack)

In a busy to-do list a task about changing resource timing implementation hadn’t made it to the top of the list (it didn’t seem high priority!). The bug coming in certainly provided new motivation to get it fixed, although Yoav also felt pretty bad…

Another place content sizes get exposed is the Cache API. It turns out the API has a quota; and initial implementations took the exact cache size into account while calculating that quota. This revealed the sizes of cached cross-origin resources. So now browsers have to pad the sizes out to arbitrary values to block the attack… which sadly makes things slower than they’d otherwise be.
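The padding mitigation, roughly. Real browsers use more careful (and partly randomised) padding schemes; the fixed 50 KiB bucket here is purely illustrative:

```javascript
// Instead of charging the true response size against the storage quota,
// charge a padded size, so an attacker reading the quota can't recover
// exact cross-origin resource sizes. Bucket size is illustrative.
const PAD_BUCKET = 50 * 1024;

function paddedSize(actualBytes) {
  // Round up to the next bucket; even a zero-byte response is charged a bucket.
  return Math.ceil(actualBytes / PAD_BUCKET) * PAD_BUCKET || PAD_BUCKET;
}
```

All resources between 1 byte and 50 KiB now look identical to the quota – which is the point, and also why it wastes quota.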

Beyond bugs, there are also features that get blocked due to resource size exposure. The Content Performance Policy spec had to be abandoned because there was no way to expose the information that made it useful, without exposing that information to attack.

The Performance Memory API also fell victim to a similar problem. They are reviving some parts of the idea in the performance.measureMemory API.

Other things that give out details:

  • status code
  • processing and rendering timing

This is why Timing-Allow-Origin is another opt-in.
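The TAO check, heavily simplified from the spec’s algorithm (which works on serialized origins): detailed cross-origin resource timings are only exposed when the response’s Timing-Allow-Origin header lists the requesting origin or `*`:

```javascript
// Simplified sketch of the Timing-Allow-Origin check: without an opt-in from
// the response, cross-origin Resource Timing entries get zeroed-out details.
function timingAllowed(taoHeader, requestingOrigin) {
  if (!taoHeader) return false;
  const values = taoHeader.split(',').map(v => v.trim());
  return values.includes('*') || values.includes(requestingOrigin);
}
```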

Speculative execution attacks – Yoav kept the best for last!

It turns out that CPUs also have caches. We’ve seen many attacks but nothing quite as dramatic as Meltdown and Spectre, which shattered previous expectations around multi-tenanted computing.

Basically, when modern CPUs see an IF statement they can speculatively execute ahead – predicting which branch will be taken and running it before the condition is known – which gives big performance gains. It turns out there is an unexpected side effect: speculation leaves traces in the CPU cache that can be observed by other programs running on the same CPU.

Mitigation for this requires keeping processes separated. Chrome was relatively fortunate as they had already launched Site Isolation, which limited the impact on desktop at least. Other browsers didn’t have this in their architecture, although they are working on it now.

CORB (Cross-Origin Read Blocking) came out of this as well.

Spectre attacks also read the CPU cache state through timing attacks. High-resolution timers facilitated this, which led to some being disabled or coarsened to lower resolution (although some have been re-enabled in isolated contexts).
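Coarsening itself is just clamping a timestamp to a grid. The 0.1 ms default here is illustrative – actual resolutions vary per browser and per context:

```javascript
// Timer-coarsening sketch: clamp a high-resolution timestamp down to a
// coarser grid, so cache-probing code loses the precision it needs.
// The default resolution is illustrative, not any browser's real value.
function coarsen(timestampMs, resolutionMs = 0.1) {
  return Math.floor(timestampMs / resolutionMs) * resolutionMs;
}
```

A probe that needs to distinguish a ~1 ms cache hit from a ~5 ms miss is defeated once the clock only ticks every few milliseconds.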

Because these features are pretty useful… is there a way to re-enable them? We can try to create new types of secure or isolated contexts, which limit cross-origin vulnerabilities.

performance.measureMemory and JS Self-Profiling are still risky; but may be ok to expose in isolated contexts.

CORS is a high bar for opting in (there’s a lot of friction) and not something they want to require for isolated contexts. To tackle that…

  • Cross-Origin-Resource-Policy: cross-origin
  • Cross-Origin-Opener-Policy: same-origin
  • Cross-Origin-Embedder-Policy: require-corp
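Applied server-side, the COOP/COEP pair might look like the sketch below (e.g. passed to `res.writeHead()` in a Node server). Note this alone isn’t sufficient: every embedded cross-origin resource must also opt in, via CORS or a CORP header:

```javascript
// Response headers a page might send to opt into a cross-origin isolated
// context (sketch; subresources still need their own CORS/CORP opt-in).
function isolationHeaders() {
  return {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': 'require-corp',
  };
}
```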

How many opt-ins is that? CORS, CORP, TAO, COOP, COEP… how do they all fit together and what should devs be doing?

  • CORS – good for public resources that don’t require credentialled requests, noting there are limits (eg. you can’t CORS-enable CSS background images)
  • CORP – where exposing some details like size are ok; but the content can’t be CORS enabled
  • TAO – can be used when exposing timing doesn’t reveal anything about the user

If it all sounds a bit vague… that’s because it is. There’s work to be done to clarify it all. It needs to be very clear what you are doing with those opt-ins.

To summarise…

  • adding APIs to browsers is hard – particularly because people do so much in the browser now and browsers have a duty to protect their data
  • fundamental changes are coming – cache partitioning, isolated contexts, opt-in rationalisation
  • we can’t have everything – some features just can’t be done safely!