(upbeat music) - So yeah, so I'm going to talk about HTTP/2 Server Push. Who's heard of Server Push? Okay, good.

That's great.

That's a good start.

So as I mentioned, I work at Fastly.

My Twitter handle is mnot and to start off, I'm gonna go back a fair ways.

Actually now I feel really old after that intro. In the very beginning, before the web, when I was in university, we didn't have web browsers. We downloaded software, and so you had this experience in 1992 and '93 where, if you wanted to do something on the Internet, you downloaded a big chunk of something and then you ran it.

And so there was this delayed gratification. Much of my university years were spent kind of sitting there saying, okay, let's wait for the download to come down.

Oh, now I can do this.

And then everything changed.

Tim came up with the web and we had these fantastic things called web browsers, which allowed you to interact with things in a much more vibrant fashion. All of a sudden you could just click on a link and it seemed like instantly or nearly instantly, you'd be able to do something.

Maybe not the full functionality, but you'd be able to do something and that was really cool. And the reason was that you didn't download the whole application in one chunk.

You downloaded the HTML, which referred to other things, and then you downloaded those things.

And so the browser could be smart about how it presented that to the user.

It could start to show you the page, then fill in the images, then maybe apply the CSS to the page eventually, and so forth and so on, and eventually run the scripts. That's a whole other story.

But this kind of iterative incremental model allowed the page to become functional earlier and that was great.

That gave us a lot of interesting capabilities in the web. Like for example, the browser can prioritise what it's loading.

It can say, I'm going to hold off on loading images right now because I need to get this JavaScript and CSS down first, to make the page interactive more quickly.

And so we have this.

In modern browsers, it can even do things like think to itself, okay.

These images down at the bottom or this material down at the bottom isn't in the viewport whereas these things up here are.

And so as a result, I'm going to prioritise this stuff that's in the viewport.

We have the capability to do that.

Great for performance.

Likewise, resources are often shared between pages. You don't want to download the same CSS and JavaScript and images for every page view if they're shared between the pages.

You want to download them once, keep them in the web cache, and then when they become stale or they change, you want to update them once, not for every page view.

That gets you a lot more value out of caching, especially because the bandwidth that you have between the browser and the server is always going to be constrained.

And every wasted byte is an opportunity cost effectively. Another interesting thing about having this model for how we compose pages in the web is that we allow other views of the data.

So this is the safari reader mode.

You have screen readers, you have lots of different ways of looking at this data and the browser can choose whether or not it downloads those other assets. Likewise for search engines, same sort of thing. That's a really great capability.

I didn't know this by the way, but in Safari, this little thing, you can change the fonts and you can change the colours and everything. And I was inordinately excited about that when I found out yesterday.

This model has allowed us through the history of the web to come to a place where an average page load is a meg and a half on desktop and well north of a meg on mobile.

This is from the HTTP Archive. It's kind of crazy if you go back 20 years: a megabyte was a lot of data, and now we're moving that much just to browse the web. Somehow people are able to get the information they want and not wander away, because we have this efficient loading model.

Does anybody read Hacker News here? Does anybody read the n-gate summary of Hacker News? They just started listing the page weight for each of the articles they review.

I thought that was really cool.

But anyway, it was really fun.

At the same time, our average number of requests to do a page load has steadily gone up over the years as well.

So this is again from HTTP archive and you can see where we're kind of at a steady state now, which is I guess good.

On desktop, it's on average 75 requests to load a page, and on mobile it's 67 requests. That's a lot of requests.

In HTTP/1, requests had a lot of overhead to them. The way that the protocol was engineered, we have what we call head-of-line blocking, and I'm not going to go into the whole thing now because that would effectively be an HTTP/2 talk. But the way that it used TCP to make those requests was highly inefficient.

There were a lot of really nasty trade offs that were made. It was really bad for performance.

HTTP/2 fixed that with multiplexing.

HTTP/2 requests are much lower overhead.

So the fact that we have 75 requests on a page load is not nearly as worrisome as it used to be with HTTP/1. That's great.

However, there's still another problem.

In this model of how we load pages, we have to make requests to find out what requests to make. So you know, my silly illustration, which is cool because it's a little JavaScript library that did this for me.

If you look, you get the HTML page and it comes back, and then the browser parses the HTML and figures out, I need to get some JavaScript and some CSS. So you make those requests and you get the JavaScript and CSS back, the top-level stuff.

But it refers to other images, let's say in the CSS or the JavaScript wants to dynamically load other JavaScript or other resources.

So you make those requests and you get them back and it might take a little while.

And then finally you make the requests for the images on the page and you get those back. So what you're seeing here is a number of round trips that you have to make to load the webpage, and each round trip is an opportunity to introduce more latency.

Another view of this, which might be more familiar, is the waterfall, and you see we have these places in the waterfall where nothing is happening. That's because we've loaded the HTML and we've parsed the HTML and we make those requests, but we don't get those responses back for a round trip. We waste a round trip here, and so we get, this is probably the JavaScript and CSS and some ancillary resources that we've learned about from that front page.

Maybe some images, but then we have another round trip here where we have to wait to load some more stuff, because we've discovered some stuff up here, and so forth and so on.

We have this wasted bandwidth and so that's opportunity cost.

That's adding latency to your page load.

This is what, in protocol design, we call a chatty protocol. There's lots of back and forth, and that's something we want to try and avoid if at all possible, to keep things performant. I'm not gonna talk about why performance is so important on the web.

I hope that everybody's on board with that now; there are lots of studies that show why. But this is one attribute of the protocol that we want to try and engineer out as much as possible, while still making it flexible, while still keeping that fine-grained cacheability there and the ability to compose pages from multiple resources, because that's important.

And so in HTTP/1, a number of different techniques were developed to combat this.

You got CSS spriting, where you combine a bunch of different images into one and then you use CSS to kind of have a window go over that, so that you download all of your icons in one go and then you can show them selectively. And inlining, using data URLs.

So a lot of sites still do this where you can inline a font for example, in a data URL and then you don't have that round trip of, okay, I need to go fetch the font.

Here comes the font.

It's right there.

Of course when you inline, you have to base64-encode it, so that's not terribly efficient, and you have to inline it in every page.

So that means that you don't really get the benefit of the cache.

It has to be downloaded for each page.

Again, not great for performance.
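To see that overhead for yourself, here's a quick sketch in TypeScript on Node; the font path is a placeholder:

```typescript
import { readFileSync } from "node:fs";

// The inlining trade-off in two lines; the font path is a placeholder.
// base64 output is ceil(n / 3) * 4 characters, so roughly 4/3 the raw size.
const font = readFileSync("./my-font.woff2");
const dataUrl = `data:font/woff2;base64,${font.toString("base64")}`;

console.log(`raw: ${font.length} bytes, inlined: ${dataUrl.length} bytes`);
// And because the data URL is embedded in every page's CSS or HTML, that
// inflated copy is shipped on every page view instead of being cached once.
```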

Finally, we have JavaScript and CSS concatenation where you lump it all together as much as possible into one download.

Again, to avoid those round trips.

But again, doing these things takes us back to a world where you download this big, lumpy thing and then you run it.

And that's not great for interactive performance. Flash and Shockwave are one example, I guess JavaScript was another.

Great once everything's loaded, but you have to sit around and wait for that progress bar.

So what we need is a way to avoid this request gap for these deep resources.

For these resources that are referred to by other resources, we need a shortcut.

And so the idea, this is the thinking behind what became Server Push.

It was introduced in SPDY, Google's protocol, in I guess 2009, and then it became part of HTTP/2 in 2015. This is Mike Belshe.

He used to be at Google and is now at a Bitcoin startup. Roberto Peon used to be at Google and is now at Facebook, and Martin Thomson is the editor of the HTTP/2 spec. Roberto and Mike were the authors of SPDY; they were really the people behind SPDY at Google. They did it in their 20% time, and they came along and helped us do HTTP/2 in the IETF.

And so Server Push was part of HTTP/2 and everybody got really excited about it.

We've got this great new capability to do interesting new things and maybe avoid these latencies.

And so Server Push, the idea behind it is pretty simple. It's conceptually the server saying, here's a request and response pair I think you're about to need. It allows you to push a synthetic request and response from the server to the client, and the client effectively puts it in its cache and says, okay, thanks.

And then when it encounters a reference, it looks in its cache and says, oh wait, I don't need that.

You already sent it to me.

I've already got it right here.

And so you avoid that round trip of actually sending the request and getting the response, because it's already in the client's cache. Sounds great, and people got really excited about it. And so this is kind of that silly little diagram again, but for Server Push, where you request the page and the server says, okay, here's your HTML, and I'm going to send you the images and everything else at the same time, rather than waiting for you to discover them.

So it sends back not only the HTML, but the top-level CSS and script, and the images referred to, and whatever else gets pushed.

And then finally the images referred to from the image tags and so forth.

So it gives the server the flexibility to decide when it wants to send the secondary resources. Notice there aren't any actual requests being sent from the client to the server.

The server is sending the response and a synthetic request along with it because you need the request to match what is going to be sent.

So it says, well, I think you're going to send a request that looks kind of like this.

GET this image, or whatever.

And then the client can match that up and say, oh yeah, I was about to send a request like that. Here you go.
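To make that concrete, here's roughly what issuing a push looks like in server code. This is a minimal sketch using Node's http2 module as one example; the certificate paths, the CSS and the HTML are all made up, and under the hood pushStream() emits the frame described next:

```typescript
import * as http2 from "node:http2";
import { readFileSync } from "node:fs";

// A minimal Server Push sketch with Node's http2 module.
const server = http2.createSecureServer({
  key: readFileSync("server-key.pem"),
  cert: readFileSync("server-cert.pem"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/") {
    // The synthetic request: "I think you're about to ask for /style.css".
    stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
      if (err) return; // e.g. the client has disabled push
      pushStream.respond({ ":status": 200, "content-type": "text/css" });
      pushStream.end("body { font-family: sans-serif }");
    });

    // The real response to the real request.
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end('<link rel="stylesheet" href="/style.css"><p>hello</p>');
  }
});

server.listen(8443);
```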

The protocol mechanism inside HTTP/2 to do this is called PUSH_PROMISE.

It's a frame type.

And so it's just this very low level message inside of the HTTP protocol that says, I'm promising you that I'm about to push a response that looks like this.

And as I mentioned before, it has to send a synthetic request along with it.

The response has to be cacheable because effectively you want to keep it around for a little while so you can match it to a subsequent request.

It has to be associated with a previous request. You have to say, this is in relation to that thing over there, so that if multiple things are going on, for example if a proxy is serving multiple clients, it can say, well, I should push this that way, rather than, okay, I don't know where to send this.

And finally, it is hop-by-hop, which means that if there is a proxy in the middle, for example a CDN, it has to decide what to do with it. It doesn't necessarily automatically forward it. It has to be configured to say, I'll forward this, or I'll do something else based on this push that I get.

We do have tooling that lets you see how push is being used. This is loading my website right now in Chrome. You can see in the dev tools, on the network tab, the protocol is HTTP/2.

This was the initial page load, where it was figuring out that the site supports HTTP/2, and then there's the HTML, and then you get pushed a few bits of CSS and JavaScript, I think. Those get listed as Push, and if you dig a little deeper, you get this view, which shows you that yes, indeed, it was pushed, how long it took to receive the push, and how long it took to read the push from the cache.

One of the big questions that comes up when people use Server Push is: what if the client didn't want it? For example, if the client already has it in cache, or it can't match it to any outstanding request, or it knows it's not going to use it because maybe images are turned off or something like that. If the server pushes something, you don't want to waste that bandwidth if at all possible. So there are a couple of answers to this.

One is there's a setting.

There are connection-wide settings in HTTP/2, and there's a setting called SETTINGS_ENABLE_PUSH. The client can explicitly turn off push for the whole connection.

That's somewhat drastic, but if you're maybe a search engine spider, or if you're in reader mode, that might be a sensible thing to do.

A much more fine-grained thing to do: we have a mechanism, another frame type in HTTP/2, called RST_STREAM. This allows either peer to cancel a stream in flight.

So in HTTP/1, if you wanted to cancel a download, the only option you really had was to close the connection. That's the only reliable, clean way to cancel a download. In HTTP/2, you can use RST_STREAM to explicitly shut down just that one request, so the server stops sending you that response. So if you're doing a big download, you can cancel it; or if the server pushes you something and you've already got it in your cache, for example, you can send the RST_STREAM, and once the server receives that, it will stop sending that response to you.
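Here's a sketch of that from the client's side, again using Node's http2 module as an example; the "cache" here is a stand-in:

```typescript
import * as http2 from "node:http2";

// A sketch of the client side: accept pushes, but cancel any push we
// already have. The cache check is a stand-in.
const alreadyCached = new Set(["/style.css"]);

const client = http2.connect("https://example.com", {
  // The drastic, connection-wide option: set enablePush to false here, and
  // SETTINGS_ENABLE_PUSH refuses all pushes up front.
  settings: { enablePush: true },
});

client.on("stream", (pushedStream, requestHeaders) => {
  const path = requestHeaders[":path"] as string;
  if (alreadyCached.has(path)) {
    // The fine-grained option: RST_STREAM with the CANCEL error code, so
    // the server stops sending this one response.
    pushedStream.close(http2.constants.NGHTTP2_CANCEL);
  } else {
    pushedStream.on("data", () => { /* receive and store it */ });
  }
});
```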

And so that's a way the client can retain control over the interaction, much more so than with something like, for example, data URL inlining.

With inlining, it's part of the HTML.

You don't have a choice as the client as to whether you receive it or not.

With Server Push, you can always reset that stream. The downside here is that if you reset a stream, you have one round trip before you see the results, because that RST_STREAM has to go from the browser up to the server, and the server then has to say, oh okay, I should stop sending this.

And then the connection has to drain whatever is in flight. So you have a round trip before the RST_STREAM is actually honoured.

Again, that's not great for the opportunity cost. That round trip of data in flight could have been used for other things.

So this is an option.

It's better than nothing, but it's not ideal. To try and address that, a couple of folks have come up with a proposal called Cache Digest.

And the idea here is quite interesting.

It's: what if the browser sends up a representation of what it has in cache to the server when it first opens the connection, so that the server can examine that and say, oh, I see you've already got this in cache, therefore I won't send that push.

So you'd get an optimal kind of interaction there. The server knows what the client has in cache and it makes decisions based upon that so that it only sends things that the client actually needs. It turns out this is pretty workable.

There's a really interesting little thing.

It's kind of like a Bloom filter, but it's called a Golomb-compressed set, and a GCS gives you a very compact representation of set membership.

So you can say, okay, here are a thousand things I have in cache, run the algorithm on it, and that comes out over the wire as, depending on the parameters you put in, something like 300 to 1,000 bytes, if I remember correctly. And so the idea being that when the browser opens up a new connection, it sends this set up to the server. The server can then use that to train itself about what it should be sending in pushes.
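To give a flavour of how that encoding works, here's a rough sketch; this is illustrative, not the exact encoding in the cache digest draft:

```typescript
import { createHash } from "node:crypto";

// Sketch of a Golomb-compressed set: hash each cached URL into [0, N * 2^p),
// sort, delta-encode, then Rice-code each delta (quotient in unary, p-bit
// remainder). Here we just count the bits to see the wire size.
function hashUrl(url: string, n: number, p: number): number {
  const digest = createHash("sha256").update(url).digest();
  return digest.readUInt32BE(0) % (n * 2 ** p);
}

function gcsWireBytes(urls: string[], p: number): number {
  const n = urls.length;
  const hashes = [...new Set(urls.map((u) => hashUrl(u, n, p)))].sort((a, b) => a - b);

  let bits = 0;
  let prev = 0;
  for (const h of hashes) {
    const delta = h - prev;
    bits += Math.floor(delta / 2 ** p) + 1; // quotient in unary, plus terminator
    bits += p;                              // remainder in binary
    prev = h;
  }
  return Math.ceil(bits / 8);
}

// 1,000 cached URLs at a 1-in-256 false-positive rate (p = 8) costs about
// (8 + 2) bits per entry, roughly 1.2KB; smaller p gives a few hundred bytes.
const urls = Array.from({ length: 1000 }, (_, i) => `https://example.com/a-${i}.css`);
console.log(gcsWireBytes(urls, 8), "bytes");
```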

I put the asterisk there because this isn't supported in any browsers yet.

It's just an idea.

It's a very interesting idea, but you can't use it yet. Sorry.

The other issue that comes up is the server not knowing what the client needs right now. If you're 100 milliseconds away from the server, all of the decisions that the server makes about what to send are based upon what it knew from what the browser did 100 milliseconds ago. That means it can't adjust quickly enough in some circumstances, and it can't send the right information in some circumstances.

Server Push is not magical, and if you take nothing else away from this talk, please, please take this slide away from it. Developers approach it as if, oh, okay, I have this extra channel that I can push things to the client on, and it's free bandwidth.

This is not the case.

Whatever you do in Server Push competes with the requests that the browser is making at the same time.

And there's only so much bandwidth to go around. And so if you look at it, the server is acting on information that is incomplete, because it's not the browser and doesn't know what the browser needs, and that information is at least a round trip time old. Whereas the browser has current and complete information about what it needs.

And so this changes how we thought about Server Push as we thought more and more about it.

So this is a slide from the last IETF meeting we had, in Montreal in July.

We had a presentation from the head of networking for Google Chrome, Brad Lassey, who talked about some experiments they ran with Server Push in Chrome.

They're evaluating how it's working.

And the place they came to was that there's a maximum usefulness of push: if you don't want push to compete with the requests the browser is sending, it means that you only want to fill dead air. You only want to fill times when the browser isn't making requests.

And practically speaking, that's after the browser makes its HTML request and before you send anything in response. So this is the algorithm they came up with: the maximum size of push resources that you should be filling the pipe with is your bandwidth times your round trip time, minus the size of the main resource.

You just want to fill up extra bandwidth that would otherwise be unutilised.
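As a sketch, with made-up numbers rather than anything from the slide:

```typescript
// The Chrome rule of thumb as a function: only push what fits in otherwise
// dead air. The example numbers below are illustrative, not from the talk.
function maxPushBytes(bandwidthBytesPerSec: number, rttSeconds: number, mainResourceBytes: number): number {
  return Math.max(0, bandwidthBytesPerSec * rttSeconds - mainResourceBytes);
}

// e.g. 5 Mbps with a 60 ms round trip is a 37.5KB bandwidth-delay product;
// a 30KB HTML document leaves only about 7.5KB of room for pushes.
console.log(maxPushBytes(5_000_000 / 8, 0.06, 30_000)); // 7500
```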

And so to give some examples which hopefully make this a little more digestible: if you look at the round trip times in places like South Korea and the US and India, and multiply them by the connection speeds, you get very similar numbers for the maximum amount of data you can stuff into one round trip time.

And so this is the maximum amount of data you should be thinking about putting into push. But there's an asterisk there as well, because the real use case for Server Push is when you've just opened a connection.

And if I go way back to that slide with the waterfall, just that one gap there.

When you open up a TCP connection, you have what's called an initial congestion window, which is 10 packets, and that's only about 14KB. And that's really what you want to be targeting, because when you open a connection, TCP paces how much data you can send, and if you want that to be fully utilised and interactive, you want the browser to be able to make its requests at once and get the responses back as quickly as possible. That means you don't want to overstuff it with pushes; once you put your main HTML down, there's not a lot of data left to play around with for push. In other words, don't do this.

Don't go and just merrily push as much as you can with a client.

It's a very bad pattern.

And I think the best way I've heard it summed up was by Patrick McManus, who until recently was the core networking developer for Firefox and is now one of my colleagues at Fastly.

He said, "Push is generally a one rep optimization." You're looking to optimise just that round trip where there is that dead air.

And so Google ran some experiments where they turned off Push in the browser. I won't show you the chart that showed the results for all domains, because one of the issues is that Push's adoption is very, very, very low.

Not many websites send Push at all.

This is filtered to sites that actually use Push. The blue is with Push off and the red is with Push on.

You can see it saves a little bit, but at the 50th percentile you're talking about tens of milliseconds that you're shaving off a page load.

At the 99th percentile, it's bigger, because you're probably accounting for loss there. Some packets get lost and then you have to do retransmits, and so that amplifies the differences that we see.

But generally speaking, for most of your users, there's very little difference when you turn Push on. At the same IETF meeting, Akamai gave a presentation where they measured their deployment of HTTP Server Push.

They have a product, as I understand it, that uses a lot of heuristics and a lot of interesting algorithms to figure out exactly when and where to push. They've put a lot of engineering effort into this, and these were the results they presented. For the blue lines, there was no statistical difference from Push for a given set of origins and the endpoints that were accessing them.

And there were a few where, yes, there was a statistical change with Push, but this represents their best effort to date. So with a lot of work, sometimes it might make a difference, but not a huge one.

Another aspect of Push you need to be aware of is that, for better or worse, when we specified it, we were quite vague about a lot of different parts of it, and as a result there's a lot of drift between the different browsers' implementations. Browser support for Push is good.

All the major browsers support Server Push, but the way that they support it is very different. Jake Archibald from Google wrote an excellent article about why it's a lot harder than you'd think it would be to start with; I'd encourage you to read that if you're interested. We were aware of this in the working group. We had a whole number of discussions about the role of Server Push in HTTP and how we could tighten it up.

But as you can see by the number of replies, we never really got anywhere with those discussions. That's unfortunate, but at this point it's what I would personally call an under-specified mechanism in the protocol.

As a result, some engineers at Google on the Chrome team came up with some rules of thumb for how to use push. These are a bit old.

You can see they're from 2016, but they're a good resource if you are going to use Server Push.

These are the sorts of things you need to be thinking about. As you can see, unfortunately, Server Push does not always improve page load performance. And as I said before, push just enough resources to fill idle network time, and no more.

So to summarise, Server Push is cool because yeah, you can push the deep resources and that's great. You can avoid those extra round trips and you can send a push as soon as the request is received, which is great, but there are a lot of issues. The server doesn't have good information sometimes or a lot of the time.

The push responses can compete with the more important browser requests, or the, shall we say, better-informed browser requests; and while it's supported by a lot of browsers, there are a lot of gotchas to using Push.

At the same time, there have been other developments in web performance, and the most interesting for the purposes of this talk is a specification in the W3C called Preload, by Ilya Grigorik and Yoav Weiss. Yoav is the same one who was working on the stuff I was talking about before at Akamai, getting metrics on Push.

He's just moved over from Akamai to Google, interestingly enough, working for Brad Lassey. I don't know if that's gonna change his position on Push or not; I guess we'll find out.

But the Preload spec is a browser spec, and what it does is give you a way to effectively say, you're probably going to need this thing that I'm linking to. It's not actually sending the response from the server to the client.

It's just giving a hint, saying, hey, here's a link you might need to follow pretty soon. You can do it in the HTML with a link tag and the preload link relation, you can do it in an HTTP Link header, and you can also do it in JavaScript.
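For example, with a placeholder stylesheet URL, the three variants look like this:

```typescript
// The same hint, three ways; the URL is a placeholder.
//
//   In the HTML:        <link rel="preload" href="/app.css" as="style">
//   As an HTTP header:  Link: </app.css>; rel=preload; as=style
//
// And from JavaScript:
const hint = document.createElement("link");
hint.rel = "preload";
hint.href = "/app.css";
hint.as = "style"; // the destination type, so the browser can prioritise it
document.head.appendChild(hint);
```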

And so the browser now gets this little hint. It knows, I'm probably going to need this pretty soon, and it can slot that into its loading regime and figure out when the best time to load it is. So if you have some CSS that references some images, or if you have some JavaScript that references other JavaScript, you can put these preload hints in the HTML to give the browser engine a heads-up about what requests it's going to need to make pretty soon.

But it balances that with the other requests that it's already making.

It can be much more intelligent about it.

And so again, you get your page and you get your HTML back, but it contains these hints.

And so then the browser can decide what requests it's going to make.

You notice it is making requests here and then the responses come back.

But it can use all of these different hints that it gets to figure out how to load those things most efficiently, even if they're not actually referred to by the HTML except by this hint.

The one downside that's notable here is that you still have a round trip time here. You still have to make those requests.

You don't just get the data.

And so it does introduce some latency there. That's the big trade-off of using preload: it's not, strictly speaking, the most optimal way to do it, but the browser has a lot more information about what needs to be loaded, and how, than the server does. And so it's a much more intelligent way to do it. And in practice, what people are finding is that this really works well.

So for example, one of Fastly's customers, Shopify: they said they switched to preloading their fonts, using the preload tag for their fonts. They saw a 50% improvement in time to text paint, and they got rid of the flash of invisible text completely. That's pretty amazing.

That's a really good performance benefit from just adding a couple of tags to your HTML, or a couple of Link headers.

It works really well.

And so, to summarise preload: you can request the deep resources.

The browser is the one making the decisions about what gets fetched and when.

It has this one downside: it only works after the HTML response starts.

There's one more factor to consider here, though: the delay, this penalty that you're paying with that round trip for those requests.

It's not necessarily just the latency on the network. In some cases it can be your server think time. If it takes a long time for your server to generate some HTML, and by a long time I mean a couple hundred milliseconds, that is latency that could have been used for other purposes: to start loading your CSS and your JavaScript and your fonts and whatever else.

And so this becomes a consideration.

It's something we like to try and optimise. If you're using Server Push, the server can push during that think time, and that's great. Preload relies on the HTTP headers or the HTML coming down before it can do anything.

And so if your application can't generate the beginning of an HTTP response, if you need to block on the status code, for example, or you need to block on the HTTP headers, or you need to block on your HTML head, you don't have a way to inform the client that it should start downloading these other things while you're thinking.

So one optimization that we've talked about, and that is published as RFC 8297, an experimental RFC, is by one of my colleagues at Fastly, Kazuho Oku, who is the author of the H2O server, which is fantastic if you haven't checked it out. It's called "An HTTP Status Code for Indicating Hints". It's a new HTTP status code, the 103 status code, and 103 is in the 1XX class of status codes, which aren't as familiar to most people.

They're what we call, in HTTP, non-final responses. Normally in HTTP, you send a request and you get one response.

That's a one to one relationship.

But with the 1XX status codes, it's different. You can send as many of these as you want before sending a final status code.

So like a 200 or 400 or 500.

And so what Kazuho has done in this spec is define a new status code that allows you to send HTTP headers to the client as hints. They're called early hints.

Before you send that final response, you can put Link headers in those 103 responses to hint to the client: hey, you should start downloading this, because I'm not ready to give you a final response, I'm still thinking about it, but I know you're probably going to need these. Let's use this time for something good.
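To make that concrete, here's a server-side sketch using Node's writeEarlyHints helper, which recent versions of Node provide; the paths and the 200 ms think time here are made up:

```typescript
import { createServer } from "node:http";

// A sketch of RFC 8297 in practice: hint first, answer later.
const server = createServer(async (req, res) => {
  // Not ready with a final response yet, but we know what they'll need.
  res.writeEarlyHints({
    link: [
      "</app.css>; rel=preload; as=style",
      "</app.js>; rel=preload; as=script",
    ],
  });

  await new Promise((r) => setTimeout(r, 200)); // stand-in for server think time

  // Now the real, final response.
  res.writeHead(200, { "content-type": "text/html" });
  res.end('<link rel="stylesheet" href="/app.css"><script src="/app.js"></script>');
});

server.listen(8080);
```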

And so this is the interaction that you might think of here. You request the page; the server isn't ready to send the final response back, but it does send this early hints response back with a couple of Link headers that tell the browser to start preloading.

And so the browser says, okay, I'll start making those requests.

Even though you haven't given it the HTML yet.

And it can do that and eventually you can start sending the HTML as well.

Remember, this is HTTP/2, so it's multiplexed. So you can do lots of different things at once, and you're using time that otherwise would have just been spent hanging, waiting for that HTML to come back, for more useful things. That's great.

The one problem with 103: the browsers were initially very excited about it and said, hey, that looks really cool, and we got great engagement.

We got the RFC out really fast.

We actually support it at Fastly, from the origin server to us. But from the browser standpoint, integrating it into their stack turns out to be more complex than they thought at first, because of the way they do loading internally. It's not a slam dunk from their standpoint. They're still interested, and we're still talking about it with them, but it's not available today, unfortunately.

So, assuming that one day we do get 103, we'll be able to request deep resources with preload. The browser can decide the priority and whether to fetch, and with 103 it can do so as soon as the request is received.

It still requires that one round trip for the hint and the requests.

So you can't mitigate that initial network latency, but it is just one round trip.

And some of this is not yet supported in browsers. Again, I take you back to this notion that Patrick came up with: Push is generally a one round trip optimization. And when you're thinking about Server Push and these things, you really do need to think: how much engineering effort and uncertainty am I willing to go through for one round trip?

If you're using a CDN or you're using cloud hosts all over the world, one round trip is probably 20, 30 milliseconds, maybe 50.

If it's a mobile client in an emerging market, then yeah, it's a bit more but it's still by some accounts below the threshold of human perception.

Do you want to go to the engineering effort? And in fact, this was one of the last slides in Brad's presentation, from the Chrome team, back in Montreal in July.

If we destroyed Push, if we stopped supporting it in the browser, would anyone really notice? And I'm hearing this more and more from folks in the community.

Server Push is misunderstood by developers. They think it's free data and it's not.

In the optimal case, it saves you one round trip and in a lot of other cases it's not optimal. And so, despite the hugeness of the companies working on them, browser resources are always constrained. They only have so many developers to work on this fantastically complex piece of software we call a web browser.

And so to their mind, they can be using those resources on more productive, more fruitful things than Server Push. And so this is an open question in the community right now. How much effort are we going to put into actually cleaning up Server Push and making it more usable when the situation around it is fairly doubtful? My advice that I give people right now is go ahead and use preload for those deep resources. It's an easy win.

It gives you some nice performance benefits.

It's available today in browsers.

In many cases, Server Push is not necessary, but if you do use it, really focus it on that one round trip use case after the HTML, no more. Don't try and push everything merrily down the pipe. If you have server think time, that's where it gets a little more interesting. So think about that.

All of this is still evolving.

We don't have the answers now.

Anybody who says Server Push is a best practice, I would have a lot of doubt about right now. We're still feeling our way around all of these aspects of performance, and there are still new mechanisms being talked about and vetted.

But if you do any of this, collect metrics. Base your decisions on data.

That really is the key.

The more data we have, the better.

And if you collect that data, please share it. The data that people on the web bring to us helps inform our decisions about how to evolve the web itself.

It's great that folks like Google come to us and bring lots of data.

But I really don't like basing all of our decisions just on Google.

So please, if you do collect this data, bring it to us. That's all I have.

How'd I go on time? - [Female] Plenty of time left.

- Awesome.

(applause) (upbeat music)