On the Cutting Edge: a Glimpse into the Future of Web Performance

He-low everyone.

See what I did there?

Performance on the web is really, really important, especially because we don't like our videos buffering and our shopping websites taking forever to load.

So today I'll be talking about not only why it's important, but also how we can solve web performance and especially the role that the 'Edge' is going to play in the future of solving all of these problems.

So the standards for performance are really, really high today, right?

And they feel nearly impossible to achieve.

We all want those sweet 100-across-the-board Lighthouse scores - to reach as many users as possible, to improve retention rates.

You know all of the reasons that performance matters, but that doesn't make it easy.

In fact, it's really, really hard.

So, well, what makes it so hard?

Even answering the question of what makes it hard or how to even measure performance reliably is really hard.

At Cloudflare, when customers report any sort of performance degradation, we have an entire process around triaging it.

Because how you measure performance alone is really important to the results you get.

And then, from the moment that a person's eye perceives the page, through how the browser loads their website, across the network, to the server, there are so many different things that can go wrong.

Today, in particular, we'll be focusing on things that can happen around the network and solutions to that.

Even at the network layer, there are a few different ways in which performance can be degraded - from routing, to the protocol that you're using and how efficient it is at transferring information.

People keep blaming things on DNS.

So we can throw that one out there too, right.

But to really frame the problem of web performance and where you can get the most bang for your buck, I think it's really helpful to go through this little thought experiment: So given an unlimited budget, could you transfer data from San Francisco to Melbourne in less than five milliseconds?

It seems like with an unlimited budget you should be able to do just about anything, right?

Unfortunately, it's not so simple.

So the distance between San Francisco and Melbourne is 12,650 kilometers.

The culprit that we're battling against here, though - and it's as fast as humans have been able to get, the fastest thing that we've been able to measure - is the speed of light.

So the speed of light travels at 300,000 kilometers per second.

So even if we're traveling as quickly as we possibly could, we would get from San Francisco to Melbourne in 42 milliseconds.

So unfortunately, even with an unlimited budget, it doesn't seem like 5 milliseconds is a feasible goal.
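
To make that arithmetic concrete, here's a tiny back-of-the-envelope sketch using the numbers above:

// Back-of-the-envelope check: one-way, straight-line, at the speed of light
const distanceKm = 12650              // San Francisco to Melbourne
const speedOfLightKmPerSec = 300000
const minLatencyMs = (distanceKm / speedOfLightKmPerSec) * 1000
console.log(minLatencyMs.toFixed(0))  // ~42 ms - the physical floor, before any real-world routing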

So the thing that we're really up against here is the speed of light.

And if we can't improve on it, then we need to think about ways to work around it.

And even that 42 millisecond measurement is really, really optimistic.

I like to think of the internet as having its own weather and geography, really.

So even in the time it takes you to connect to a server, there are so many things that can go sideways, from ISPs being congested to literal, actual weather knocking a cell tower over, right?

And geography itself is not very straightforward either, even though you might be near a data center that you think you would connect to, routing may end up taking you through another ISP, depending on who the provider peers with and the ISP that you yourself are connected to.

So all of this complicates matters quite a bit, which is why, even though at the speed of light you could cover that distance in 42 milliseconds, in reality the average latency of just a single ping from San Francisco to Melbourne is around 171 milliseconds.

You might get routed around it.

It's a long distance.

There's a lot that can happen.

So the obvious answer is: if we can't beat the speed of light and go faster, well, maybe we can just travel shorter distances, right?

And if you have a user in Melbourne, hopefully they can connect to a data center that's right there.

So if your data center is 45 kilometers away from you, even less than 1 millisecond should be possible.

We figured this out quite a long time ago, but there's been an evolution of getting content closer to the actual end user.

So I'm going to talk through that evolution a bit.

The first generation of getting content closer to the users has been the traditional CDNs, right?

For static assets that rarely change, like images or JavaScript, we realized that once you've traveled all of that long distance once, future users don't necessarily have to do it again.

You can hopefully serve them from the nearest location on any subsequent request.

There is another iteration of that.

So, it's one thing to say, "I want to blindly cache everything", but as we know, one of the hardest problems in computing is cache invalidation.

So you want to be very specific about the things that you cache, and there's additional logic that you might want to move up to the Edge or closer to the user itself.

So there are different ways of doing this.

I can tell you that at Cloudflare, the way we started getting into the business of having more rules or more logic at the edge was by running edge-side code, where engineers such as myself would literally, painstakingly craft a little snippet of code in Lua for every single customer that needed a custom TTL on an asset or wanted to do a geo redirect. But eventually the industry reached the same conclusion.

In order for people to be able to fully express themselves on the Edge, they needed to be provided the capability to - well, how do developers express themselves?

- write code or logic there.

So the third generation is where the network itself is actually becoming the computer, allowing you to run Turing complete code and have access to storage, which we'll talk about in a bit as well.

So a few of the solutions out there that adhere to this model are Cloudflare Workers, which I'll talk about today since that's the product that I work on, but there are other similar solutions out there like Fastly's Compute@Edge, which allows you to run WebAssembly on the edge.

And AWS has recently released CloudFront Functions.

So, how are we able to achieve this?

We don't have massive data centers in the way that centralized compute does, with massive server farms that are able to fit in as many large virtual machines as they can.

The edge by comparison is really small.

When you're distributing code to 200 locations around the world, each of those locations can be pretty small - as small as sometimes a few servers.

So even though these solutions all work differently under the hood, they do have a similar component behind the scenes: they rely on the isolate model.

Isolates, unlike virtual machines, are really, really tiny.

And what allows them to be really tiny is that the only thing that customers have to bring is the application itself.

With previous models - a virtual machine, for example - the host only provided the hardware; everything else you had to bring yourself: the application, the libraries, the language runtime, and the operating system.

So for a VM to start up takes a whole bunch of time and it's really hefty, right?

It has to store an operating system for every single copy that it has.

An iteration and improvement on that were containers, where you can share a single layer of the operating system, but you still have to bring an entire language runtime with you, which is what leads to the gnarly cold starts that you hear people talking about.

But for us, since we're running code for customers who are looking for performance improvement, you don't expect your CDN to take a few seconds to start up every time.

And so by running in the isolate model, we're actually able to provide the runtime itself and only bring in and spin up the application code every single time.
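
Just to give a sense of how small that unit is, here's a minimal sketch of a Worker in the service-worker syntax you'll see in the code later in this talk - the runtime and the platform APIs are provided by the host, and a handler like this is essentially the entire thing the customer brings:

// A minimal Worker: the whole "application" is this one event handler.
// The JavaScript runtime and Web Platform APIs are provided by the host.
addEventListener("fetch", event => {
	event.respondWith(new Response("Hello from the edge!"))
})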

As a product manager, the first question I have whenever someone is talking about the technology is what are the use cases?

How are people using this?

So as far as stateless use cases on the edge go, and these were the first use cases that we started seeing as we put this in people's hands, a few of them include content personalization.

So whether it's based on a cookie, geography, or weather, if you can inspect the request and serve customized content to the end user, they're obviously going to have a better experience.
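
As a rough sketch of what geography-based personalization can look like in a Worker (on Cloudflare, request.cf.country carries the visitor's country code; the greetings table here is just a made-up placeholder):

// Sketch: vary the response by the visitor's country (placeholder content)
addEventListener("fetch", event => {
	event.respondWith(handlePersonalization(event.request))
})

async function handlePersonalization(request) {
	const country = request.cf && request.cf.country  // e.g. "AU", "US"
	const greetings = { AU: "G'day!", US: "Howdy!" }  // hypothetical content table
	if (greetings[country]) {
		return new Response(greetings[country])
	}
	// Otherwise, fall through to whatever the origin would normally serve
	return fetch(request)
}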

Another example is A/B testing and traffic splitting, which I will dive into in greater detail.

Even simple things like header manipulation: if you are maintaining a bunch of different origins, or have wildly different assets that all need customization, managing that through code can be really, really helpful.

Another really great improvement has been authentication on the edge.

So previously, if you authenticated content at your origin, it was really difficult to scale, because it meant that content that had to be authenticated couldn't be cached. But by being able to run logic at the edge, you're now able to authenticate right there, where the content is already cached - which means that's all load taken off your origin and better performance for your customers.
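
As a rough sketch of that pattern - the verifyToken helper here is hypothetical, standing in for whatever check you run (a JWT signature, a session lookup) - the shape is: check credentials at the edge, then serve the already-cached copy:

// Sketch: authenticate at the edge, then serve from the edge cache (verifyToken is hypothetical)
async function handleAuthenticated(event) {
	const request = event.request
	const token = request.headers.get("Authorization")
	if (!token || !(await verifyToken(token))) {
		return new Response("Unauthorized", { status: 401 })
	}
	// Authorized: serve the shared cached copy if we have one, otherwise fetch and cache it
	const cacheKey = new Request(request.url)
	let response = await caches.default.match(cacheKey)
	if (!response) {
		response = await fetch(request)
		event.waitUntil(caches.default.put(cacheKey, response.clone()))
	}
	return response
}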

And this is especially important because it impacts paying customers, right?

They paid for the content, and now they're able to have it served from a cached version.

Bot mitigation - letting only the real users through - with smart identification you can even trick bots into wasting time on your website, right?

Since you can be really clever around the things that you put there.

And lastly, if you have an origin with a rigid CMS running on it, running logic at the edge just gives you so much more control over it.

So let's talk about why it's so beneficial to run these things on the edge.

I think A/B testing is a really great example of that.

So going back to our original diagram, we have the browser, the network and the server.

Now, A/B testing started as something that would run in the browser, but there it encountered a really big problem called render blocking, whereby every single time you accessed the page, the rest of the assets would have to wait before loading - all to solve an earlier problem, the flicker problem of inconsistent experiences, right?

So you would load the webpage and it would load in green.

And then as the experiment kicked in, it would turn into blue, which would impact your Core Web Vitals, right?

That's exactly what we're talking about here.

Things like your time to actual interaction or cumulative layout shift.

Running it at the server also has its own problems.

It's far from the user, which means that it impacts the latency.

It's also harder to update.

So if you are the A/B testing framework itself, you have to rely on the developers updating your SDK consistently.

And so, whereas running it in the browser impacts the experience metrics, running it at the server impacts your Time to First Byte.

And this is where the network is really the best of both worlds, right?

Because it can be fast since it's running close to the user, it's easy to update and it can provide a consistent experience right out of the gate.

So let's walk through what an example looks like in code in real life.

Let's say I have an experiment that's running, and it's called experiment zero.

So here, I'm going to define it and I'm going to have a test response and a control response.

And the first thing that I do here is I'm going to inspect the cookie and see if the cookie for either experiment has been set.

So, if it has, then I can return the corresponding response - control or test - and if it hasn't, then I can pick a group, set the cookie, and return that response.

There are also many different things that we can do to iterate on top of that.

For example, this allows us to send logs somewhere that let us know who encountered which variant, or to rewrite the HTML on the edge for the different experiments.
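
For the logging piece, a rough sketch (the log collector URL is a placeholder, and handleRequest is the A/B handler from this example) is to fire the log off with event.waitUntil so it never delays the response:

// Sketch: record which experiment group was served, without delaying the response
// (https://logs.example.com is a placeholder endpoint)
addEventListener("fetch", event => {
	const cookie = event.request.headers.get("cookie") || ""
	const group = cookie.includes("experiment-0=test") ? "test" : "control"
	event.waitUntil(fetch("https://logs.example.com", {
		method: "POST",
		body: JSON.stringify({ url: event.request.url, group })
	}))
	event.respondWith(handleRequest(event.request))
})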

So here, I'm going to introduce you to a couple of different APIs that are really handy when running operations at the edge.

And I can just show you an example of how it works here.

The Cache API allows you to write data directly into the edge cache.

So here, the first thing that we're going to do is define a cache key - and the way that we're going to do that is by effectively defining it as a request.

We're going to grab the default 'caches' instance.

And when a request comes in, we're going to check that cache instance to see whether we get a match on the cache key or not.

If there's not a match, then we have to generate a new response.

In this instance, we fetch it from the origin and then we adjust the cache control and put it in the cache so that it can be fetched and matched on next time.

But another thing that we could do is run computationally expensive tasks on it, where we don't have to run that computation every single time, because we're able to store it and put it in the cache.

Speaking of computationally expensive things: One other really useful API to have at the edge is the HTMLRewriter, which is a streaming HTML parser that's available to you directly at the edge.

The streaming means that it doesn't have to parse through the entire HTML at once before it starts returning any bytes back to the client - it can do so on the fly.

So here we have an example of using the HTMLRewriter for internationalization - where I look for an element and see if it has the 'data-i18n-key' attribute for internationalization.

I fetch the content from a predefined 'countryStrings' lookup table that I have, right?

And then I set the inner content of the element to the translation, through the HTMLRewriter API that provides this functionality for me.

Here, as a next step, you can see that I determine the language of the request through the language header, which is the 'Accept-Language' header; then I grab the asset from my key-value store, which is a type of storage on the edge that we will again dive into in a bit; and then I'm able to call a new instance of HTMLRewriter to handle the response and transform it with the new translated strings.

So these are all stateless use cases.

And if you thought that stateless use cases on the edge were cool, wait until you hear about state on the edge.

So you have a couple options here.

One that we talked about is caching, right?

And here you have a lot more granularity and control than you ever had before.

You're able to move the content itself closer to the user.

So assuming that it's already there, that leg is going to be fast, but the leg that's always going to be slow is the connection to the database or to the origin itself.

Another option that Cloudflare provides is a key-value store.

Similarly, it has a few locations around the world where it's storing the content so that you don't have to worry about configuring or setting up an origin, but it still relies on you caching the content and it's really great for high read frequency and low write frequency.

So once I write a configuration, like an A/B test variation, everyone else can read from it really, really quickly.
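
As a rough sketch of that read-heavy pattern (CONFIG here is a hypothetical KV namespace binding, and the key names are made up):

// Sketch: read experiment configuration from Workers KV (CONFIG is a hypothetical binding)
addEventListener("fetch", event => {
	event.respondWith(handleConfig(event.request))
})

async function handleConfig(request) {
	// Reads are served from nearby locations - great for config that everyone fetches constantly
	const variants = await CONFIG.get("ab-test-variants", "json")
	// Writes propagate globally and take time to settle, so keep them infrequent, e.g.:
	// await CONFIG.put("ab-test-variants", JSON.stringify({ test: 0.5, control: 0.5 }))
	return new Response(JSON.stringify(variants), {
		headers: { "Content-Type": "application/json" }
	})
}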

But if I'm updating a value such as a counter, the synchronization doesn't allow me to get the outputs that I want as quickly as I need to without incurring additional latency.

But even here we've had really fascinating innovation.

So a really new model of thinking about this, that Cloudflare started out with, but we believe others will adopt as well, is called 'Durable Objects'.

So what's really clever about durable objects is that in reality, data doesn't need to exist in every single edge location at all times.

For example, if I have a user profile on some website, I'm probably most frequently accessing it from a single location.

So for me, it's from San Francisco, or maybe if I'm traveling around my location might move around, but still I'm only accessing it from one place in time.

Maybe someone else will access it as well, but there's almost always a single most common point of entry.

And so what durable objects do is they define the location of the object per the object and move it around depending on what the access patterns are, allowing you to have the fastest performance possible while also being able to maintain the right guarantees that users need.
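
A minimal sketch of what that can look like (the class name and what you key the object on are hypothetical; the shape follows the Durable Objects API, where each object instance gets its own transactional storage):

// Sketch: a per-object counter as a Durable Object (class name is hypothetical)
export class Counter {
	constructor(state, env) {
		this.state = state
	}

	async fetch(request) {
		// This storage belongs to this one object, which lives wherever it is accessed most
		let count = (await this.state.storage.get("count")) || 0
		count += 1
		await this.state.storage.put("count", count)
		return new Response(String(count))
	}
}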

While today you might still need databases in order to run fully fledged applications on the edge, we're iterating on this more and more, and it's going to become an increasingly viable option for running applications.

So, what's next?

I alluded to this a bit just earlier, right?

But, well, this is the diagram that we're so used to where there's the browser, the network, at which level you can do some things, and the server.

With things like durable objects, what we're trending towards more and more is eliminating the server and having the network become the server itself, with capabilities around running the actual logic of your application and storing the data for your users in every single edge location, and being able to provide that really quick latency to every single one of your users.

And again, when we're talking about hundreds of milliseconds - rather than measuring the micro-benchmarks on your server that vary between zero and 5 milliseconds - this is where you're able to actually get the most bang for your buck: not by beating the speed of light, but by running applications even closer to the user.

There's one more thing that I want to point out about this model.

One type of performance that we talk about that we're all here to talk about today is the performance of web applications or applications themselves, right?

But another type of performance that matters a lot is the performance of your engineering teams.

The way that engineering for the most part works today is that you have to develop an MVP, right?

You develop it locally.

You then have to deploy it to a particular region that you choose.

Then, if you want to service millions and millions of users, you have to scale this, right?

And you have to also pick and choose where the users are that are most important to you, literally from the very first decision that you're making.

When you're deploying an application on the cloud today, it asks you: "What region do you want to run on?" Or "Which users do you care about the most?" And then you have to spend time optimizing it, making it faster, through caching, through improving the performance of your application.

What's really powerful about this model is that it turns all of this into a single step of just deploying your application from "Hello, World!" to millions of users around the world.

That way every single user is able to get that perfect 100 Lighthouse score without you having to spend any time optimizing it.

Applications should be really fast out of the box, and we're already starting to see applications trending in that direction more and more.

We'll get there by spending less time optimizing and more time rethinking the entire way in which we deploy applications, given what we now know about how the internet works and where the really big long poles degrading our performance are.

So, thank you so much for taking the time to think about the future of performance and hopefully I'll see you on the edge!

Impossible beauty performance standards

Facepalm emoji

Image of a generic Lighthouse performance interface screen with 100% performance metric rankings

Why is this so hard?

What's slowing down your website?

Three images representing performance in the Browser (represented by a rocket ship icon), the Network (represented by a globe icon overlaid with connection nodes), and the Server (represented by three hardware stacks with large storage capacity numbers overlaid)

Under Browser, a red X appears underneath indicating: X Bloated client-side applications

Under Server, three red X marks appear:
X Cold starts
X Rendering
X Databases are slow

Under Network, three red X marks appear saying:
X Routing
X Protocols
X DNS?
(palms to sky 'shrugs' emoji)

Given an unlimited budget, could you transfer data from San Francisco to Melbourne in less than 5 milliseconds?

Given an unlimited budget, could you transfer data from San Francisco to Melbourne in less than 5 milliseconds?

Timeline calculation attempting to answer the above question

12,650km

Distance from San Francisco to Melbourne

300k

Speed of light is 300,000 km per second

42ms

Taking the speed of light, it would take 42 milliseconds to get from San Francisco to Melbourne

No : (

Unless you invest more heavily in beating the speed of light

Flattened map of the world with a line overlaid traversing the route from San Francisco, U.S.A to Melbourne, Australia, with the label:
42ms
min

Internet weather and geography

Mockup of a hybrid San Francisco weather map but overlaid with a web speed of 133 Mbps and 'weather' notes saying: "Partly congested with a chance of jitter" and "Small Craft Advisory" along with a picture of the sun coming out from behind a cloud with the forecast "11.3 Mbps upload"

Flattened map of the world with a line overlaid traversing the route from San Francisco, U.S.A to Melbourne, Australia, with the label: 42ms
min

Image of the flattened world map overlaid with a zoomed in aerial map of Melbourne showing the closest data centre within 45 kilometres. The zoomed map is bounded by parenthesis indicating that the data centre is approximately 45km away and could therefore be reached in less than 1 millisecond assuming you were travelling at the speed of light

~ 45km
< 1ms minimum

The evolution of the edge and bringing content closer to the user

The journey of content in the edge

Image of three steps in the content journey. These are (from left to right):

Generation one; CDN (represented by a static image icon.)

  • Cache static assets
  • Serve them from the nearest location

Generation two; Smart edge (represented by a nested file stack icon)

  • Edge-side code
  • VCL
  • Rules

Generation three; The network is the computer (represented by a globe icon)

  • Functions (turing-complete language support)
  • Storage
  • Rules

The evolution of the edge

Repeat image of the prior 3 generations slide overlaid with a box on the far right delineating the current evolution of the edge. In the box are the Fastly Compute@Edge, Cloudflare Workers, and AWS CloudFront Functions logos

Isolates

Side by side diagrams representing Virtual machine and Isolate model. Both are made of boxes in a grid layout, each with two elements - grey boxes with a circle inside composed of two curved arrows chasing their tails, and blue boxes with a pair of curly brackets. A key shows the blue boxes represent User code and the grey boxes represent Process overhead. On the Virtual machine diagram, the grid is uniform grey boxes, each inset with a small blue box in one corner, representing user code within the process overhead. In the Isolate model diagram, the grid is uniformly user code boxes but for one process overhead in the bottom right corner.

Table delineating Virtual Machines, Containers, and Isolates and comparing which elements are powered by the guest and which are powered by the host. Powered by Guest elements are colored in orange, and Powered by Host elements are colored in blue. For VMs, the following are 'guest' powered:

  • Application
  • Libraries
  • Language runtime
  • Operating System
...and the following is 'host' powered:
  • Hardware
For Containers, the following are 'guest' powered:
  • Application
  • Libraries
  • Language runtime
...and the following are 'host' powered:
  • Operating System
  • Hardware
For Isolates, the following are 'guest' powered:
  • Application
  • Uncommon Libraries
...and the following are 'host' powered:
  • Web Platform API
  • JS Runtimes
  • Operating System
  • Hardware

Show me the use cases

Stateless use cases

Content personalization

Vary content based on cookie, geography, weather

A/B testing

Split traffic and test different content

Header manipulation

Cache control, CORS, CSP...

Authentication

Caching content that needs to be authorized

Bot mitigation

Only let the real homies through

Uncontrollable origin

Rigid CMSs can be hard to deal with

Case study: A/B Testing

Case study: A/B Testing

Three images representing performance in the Browser (represented by a rocket ship icon), the Network (represented by a globe icon overlaid with connection nodes), and the Server (represented by three hardware stacks with large storage capacity numbers overlaid)

For the browser, a red X appears underneath indicating: Render blocking

A/B Testing

Image of a WebPageTest waterfall chart with a red box overlaid on a vertical line indicating Start Render

A/B Testing

Repeat of the Browser/Network/Image diagram with a new X added under Browser reading: Flicker problem

A/B Testing

Repeat of the Browser/Network/Image diagram with a new X added under Browser reading: Bad web core vitals (LCP, FID, CLS)

A/B Testing

Repeat of the Browser/Network/Image diagram with a new X added under Server reading: Far from user (latency)

A/B Testing

Repeat of the Browser/Network/Image diagram with a new X added under Server reading: Harder to update

A/B Testing

Repeat of the Browser/Network/Image diagram with a new X added under Server reading: Bad TTFB

A/B Testing

Repeat of the Browser/Network/Image diagram with three green check marks added under Network reading:
✓ Fast (close to the user)
✓ Easy to update
✓ Consistent experience

function handleRequest(request) {
	const NAME = "experiment-0"

	// The Responses below are placeholders. You can set up a custom path for each test (e.g. /control/somepath ).
	const TEST_RESPONSE = new Response("Test group")       // e.g. await fetch("/test/somepath", request)
	const CONTROL_RESPONSE = new Response("Control group") // e.g. await fetch("/control/somepath", request)

	// Determine which group this requester is in.
	const cookie = request.headers.get("cookie")
	if (cookie && cookie.includes(`${NAME}=control`)) {
		return CONTROL_RESPONSE
	} else if (cookie && cookie.includes(`${NAME}=test`)) {
		return TEST_RESPONSE
	} else {
		// If there is no cookie, this is a new client. Choose a group and set the cookie.
		const group = Math.random() < 0.5 ? "test" : "control" // 50/50 split
		const response = group === "control" ? CONTROL_RESPONSE : TEST_RESPONSE
		response.headers.append("Set-Cookie", `${NAME}=${group}; path=/`)

		return response
	}
}

addEventListener("fetch", event => {
	event.respondWith(handleRequest(event.request))
})

APIs to know

Cache API

// Cache API

// Construct the cache key from the cache URL
const cacheKey = new Request(cacheUrl.toString(), request)
const cache = caches.default

// Check whether the value is already available in the cache
let response = await cache.match(cacheKey)

if (!response) {

	// If not in cache, get it from origin
	response = await fetch(request)
	// Must use Response constructor to inherit all of response's fields
	response = new Response(response.body, response)
	// Any changes made to the response here will be reflected in the cached value
	// Also good for storing something computationally expensive
	response.headers.append("Cache-Control", "s-maxage=10")

	event.waitUntil(cache.put(cacheKey, response.clone()))
}

HTMLRewriter

// HTMLRewriter (streaming HTML parser)
class ElementHandler {
	constructor(countryStrings) {
		this.countryStrings = countryStrings
	}

	element(element) {
		const i18nKey = element.getAttribute("data-i18n-key")
		if (i18nKey) {
			const translation = this.countryStrings[i18nKey]
			if (translation) {
				element.setInnerContent(translation)
			}
		}
	}
}

// Set the language based on the "Accept-Language" header
const languageHeader = event.request.headers.get("Accept-Language")
const language = parser.pick(["de"], languageHeader)
const countryStrings = strings[language] || {}

// Get the translation from storage (in this case KV)
const response = await getAssetFromKV(event, options)

// Run the transform on the country
// strings using the HTML Rewriter
return new HTMLRewriter().on("[data-i18n-key]",
	new ElementHandler(countryStrings)).transform(response)

What about state?

Caching

Diagram of a rocket representing the browser/user connected with a short bidirectional arrow to a globe in the middle representing the edge network. The network is connected by an additional, much longer bidirectional arrow to a data stack on the far right of the diagram representing the database or origin

Key-value store

Underneath the cache diagram a 'Key-value store' diagram appears. It features the same elements and request times, but the database has been replaced in this diagram by a cloud representing Cloudflare's key-value store

Repeat of the prior slide with a green checkmark underneath the Key-value store diagram indicating:
✓ High read, low write

Durable Objects

Diagram of a rocket representing the server connected with a short bidirectional arrow to a data stack surrounded by network activity representing what Cloudflare conceives of as a durable object

What’s next?

What’s next?

Repeat image from earlier in the presentation with the Browser, Network, and Server.

What’s next?

A large red X mark has covered the server in the diagram

One more thing…

The other kind of performance

Diagrammatic timeline of the way engineering teams typically work. From left to right: Build MVP (represented by a cog icon), Deploy MVP (represented by an arrow pointing to a metric), Scale (represented by a nested stack of screens), and Make it fast (represented by a speedometer dial).

Image of a screen interface reading "Hello, World!" with stars in the backdrop

Image of a generic Lighthouse performance interface screen with 100% performance metric rankings

Thank you!