(upbeat music) (audience applauds) - Hi. Thanks, thanks John.

That's correct, uh, Kelly O'Dwyer told me I'm just as Australian as you are, when I got naturalised.

She also told me not to go for Collingwood. So, yeah. (members of audience laugh)

So, I'm gonna talk about QUIC.

John almost got that right.

This is the quick definition of QUIC.

So it's a new internet transport protocol.

It started at Google, and now we're standardising it in the IETF. So, it does involve TLS.

It also involves the transport layer.

When we talk about the internet, we talk about a stack of abstractions.

And we often have what we call the hourglass model, where IP is the waist: everything goes through IP. There's really only one IP, although we'll get to that in a second.

Below that you have a lot of choices about the different physical transports you can operate upon.

You know, whether it's Cat5 or Wi-Fi or whatever. 5G is now the big thing, with tonnes of money behind it.

And then above the waist you've got lots of choices as well. You've got this transport layer with things like TCP and UDP and ICMP, and then above that you've got the actual application layer.

So why do we have this? IP is just packets, it's just routing packets around the internet. Why do we need this layer between that and the applications? And the answer is that there are a bunch of common services. (Does my laser work? Can you see that? Okay.)

Services that applications really need, and that they don't wanna have to reinvent themselves. These are really tedious and quite complex things, like flow control, to make sure that your buffers don't get too big.

And congestion control so that, you know when I'm surfing the web my wife can still use voice over IP or whatever on the same connection.

Reliable delivery, because, you know, the IP layer is unreliable; packets can get lost anywhere along the way. Multiplexing, so we can do different things at the same time. And some sort of abstraction above just "a bunch of packets flung from A to B", whether that's a stream abstraction, a message abstraction, whatever.
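
To make just one of those services concrete, here's a toy sketch, in Python, of what reliable in-order delivery costs on top of an unreliable packet layer. It's purely illustrative; a real transport also needs ACKs, retransmission timers, and flow and congestion control.

```python
# Toy sketch: reliable, in-order delivery on top of an unreliable
# packet service. Illustrative only; a real transport also needs
# ACKs, retransmission timers, and flow/congestion control.

def deliver_in_order(packets):
    """packets: iterable of (sequence_number, data) in arrival order."""
    buffered = {}   # out-of-order packets waiting for a gap to fill
    next_seq = 0    # the sequence number the application needs next
    delivered = []
    for seq, data in packets:
        buffered[seq] = data
        # Drain everything that's now contiguous.
        while next_seq in buffered:
            delivered.append(buffered.pop(next_seq))
            next_seq += 1
    return delivered

# Packet 1 arrives late: 2 and 3 sit in the buffer until it shows up.
print(deliver_in_order([(0, "a"), (2, "c"), (3, "d"), (1, "b")]))
# ['a', 'b', 'c', 'd']
```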

Transport protocols provide this, and the big one is TCP. Who's familiar with TCP? Okay, cool.

TCP is the abstraction that kind of the internet was built upon.

We often see "TCP/IP", and that's seen as one atomic thing, but they're really two different layers.

It's just so common.

Less used is UDP; it's much closer to IP, basically a very thin shim on top of it.

And UDP is used for things like gaming and voice and real-time video, that sort of stuff. ICMP I won't get into now, 'cause I've got 20 minutes.
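
In code, that TCP-versus-UDP difference shows up right at the socket API: TCP gives you a connected byte stream, UDP just flings datagrams. A minimal sketch with Python's standard socket module; the host and addresses are placeholders:

```python
import socket

# TCP: a connected, reliable, in-order byte stream.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("example.com", 80))   # the TCP handshake happens here
tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
tcp.close()

# UDP: no connection, no reliability, no ordering; a thin shim over IP.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"hello", ("192.0.2.1", 9))  # fire and forget; may never arrive
udp.close()
```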

So this is where we're at, kind of. Or where we've been for a long time.

This is how the internet was architected.

Just so you know, I work up at this layer, at the application layer. I'm not a transport person.

The transport people laugh at me when I say things, a lot. But I hang out with 'em a lot, and I've been stretching my comfort zone to work down here more as part of the QUIC work. So if any of you ask any really deep TCP questions, I'm gonna look really dumb.

So just, just so you know.

But I do play with them, so I get to know a little bit about what they're doing.

So, stepping back a little bit: we worked in the HTTP working group on HTTP/2 a while back. (coughs) Excuse me. It started in 2013, finished up in I think 2015, and in that process we wanted to make HTTP more efficient. This is the table of contents for H2, and you'll notice that a lot of the stuff on that previous slide, over here, we had to talk about in HTTP/2.

We had to do flow control.

We had to do a stream mechanism, which is multiplexing. We ended up reinventing a lot of these transport services in the application layer, pushing them up into the application layer. And we did it pretty well, but that duplication made us wonder: why are we doing this? Why is this happening? And at the same time, what we've really developed in the last, oh, five, 10 years, is a different view of the internet hourglass. It's kind of a dual-waisted approach where we have IPv4 and IPv6.

A lot of the IETF graybeards are really adamant that IPv6 is coming.

(audience laughs) But for the purposes of this talk, the more interesting part is just that more and more people are just using HTTP, on top of TLS and TCP, and this effectively forms the waist: everybody is building on top of HTTP. Which is great for me; that's job security. But we can do better as an architecture.

We can start, you know, thinking about, well, where is it appropriate to have these application services? Do we need to recreate them up here or can we do better down here? And that's where QUIC comes in.

So, this is the classic stack that you talk about, where you have, you know, IP, and you build TCP on top of that.

TCP provides us transport services.

TLS is for security.

That gives you encryption, which gives you authentication, integrity, and uh, confidentiality.

And then on top of that we have the application protocol, which was HTTP, which hopefully everybody's very comfortable with. QUIC does something different.

Because we realised we were duplicating a lot of this stuff up here, QUIC is a re-engineering of all the transport services and the security services and some of the application services into one thick layer. We're taking these different abstractions and collapsing them down, and putting it on top of a very small shim of UDP.

Remember, UDP is very lightweight.

But all this stuff that happened over in here is now happening here, and there's a very thin shim on top of it to do the actual application-layer stuff. And in IETF QUIC, we took TLS 1.3 as the basis for the security services.

It's kind of shoe-horned into the side in a really kinda weird way, which I won't get into. But either way, it's interesting.

We're collapsing these layers and, you know, the initial reaction to that is, "Oh, well, layered architectures are good." It also turns out layered architectures can be re-- (phone rings) What? Oh, no, don't call me right now.

Come on. (audience laughs)

Uh, layered architectures are good in the sense that they allow us to not worry about stuff below us, but layers, you know, abstractions, are also leaky. You have stuff that happens down here that affects stuff up here.

When you collapse these layers, you're able to do things much more efficiently, and that turns out to be one of the big purposes for QUIC. So, why are we replacing TCP? Three big reasons I'll get into.

Performance, security, and this thing called ossification. First of all, the performance case for QUIC. This is literally an illustration that I cut out of a presentation I was giving in, I wanna say, 2012 or 2013, about HTTP/2.

HTTP had this problem called head of line blocking. Who's heard of this? Anybody? Okay, a few people.

(coughs) In HTTP/1, you send a request on a connection, and you can't really do anything else with that connection until you get a response back, and so one request blocks further use of that connection until it completes. This is called head of line blocking, and it's a problem because now you have to figure out how to use your resources effectively.

If I have more requests to make, and we all know web pages make lots of requests, do I spool up more connections? That has other problems that we could get into. But it's a sign of inefficiency in the protocol. So one of the major drivers for HTTP/2 was fixing head of line blocking in HTTP/1. And we did that.

We have full multiplexing.

You can send tonnes of requests on the same connection, they don't interfere with each other, it's great. The problem that immediately uncovers is that TCP has head of line blocking too.

So we clean up one layer, and it just shows up one layer down.

Oops.

And that's because, if you think about TCP: when you read data off of a connection, if one packet in a series of packets is lost, you can't read the data from the subsequent packets until the missing packet arrives.

And so, because TCP offers an in-order, reliable abstraction to applications, you have head of line blocking there as well.
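
As a toy model (not real kernel code, just a sketch of the constraint): the receiver may have later segments sitting in its buffer, but the application can't read past the gap, even if those bytes belong to completely different HTTP/2 streams.

```python
# Toy model of TCP head of line blocking: segments carry (seq, data),
# and the application can only read the contiguous prefix.

def readable(arrived):
    """What the app can read: only contiguous data, no gaps."""
    got = dict(arrived)
    out, seq = [], 0
    while seq in got:
        out.append(got[seq])
        seq += 1
    return out

# Segment 1 was lost in transit; 2 and 3 arrived just fine.
arrived = [(0, "stream-1"), (2, "stream-3"), (3, "stream-4")]
print(readable(arrived))   # ['stream-1']: streams 3 and 4 are stuck
```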

And this affects efficiency and performance for TCP. So, I talked about this a bit before.

HTTP/2 encourages one connection per origin, and that's so that you get the most benefit from that protocol.

It turns out that when you have lots of connections open for one web browser, for example, there's contention for resources.

Especially when it comes to congestion.

If I have a bottleneck link, and I have a bunch of connections open from a web browser, and all those different websites slam data down at the same time, it's very likely that you're gonna see congestion and then lost packets, and then retries, and performance goes to hell.

And TCP has built-in mechanisms to try and address that. But when you have so many simultaneous connections open at one time, they're very easily overcome.
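
Here's a deliberately crude sketch of why: each connection grows its own congestion window independently, so the aggregate overshoots a bottleneck far faster than a single connection would. The numbers are made up; real TCP congestion control is much subtler.

```python
BOTTLENECK = 100  # packets per RTT the bottleneck link can actually carry

def packets_in_flight(num_connections, rtts):
    """Aggregate window after some RTTs of slow start across all flows."""
    cwnd = [1] * num_connections       # every flow starts small...
    for _ in range(rtts):
        cwnd = [w * 2 for w in cwnd]   # ...and doubles each RTT (slow start)
    return sum(cwnd)

for n in (1, 6, 30):
    total = packets_in_flight(n, rtts=5)
    status = "overshoot: loss, retries" if total > BOTTLENECK else "ok"
    print(f"{n:2d} connections: {total:4d} packets in flight ({status})")
```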

And there's lots of data to support this, that a lot of performance problems we've seen have been because there's so many connections open for the web at one time.

This is why sharding is no longer a best practise when you talk to a lot of performance people, for example. So, H2 encourages, whoops, bugger.

H2 encourages one connection per origin.

The best practise for HTTP/1 was six connections per origin.

That's basically a very awkward balance between, okay, I don't wanna open too many connections 'cause that's bad performance.

But if I only use one or two connections, that's also bad for performance, because of head of line blocking.

And so, you know, we can hopefully do better. So QUIC is trying to help H2 effectively address those last remaining problems.

The problems we saw with a lot of H2 deployment, and you see a few talks out there about this, are that, yes, it's better for performance if you have a really great connection.

But on very high-latency or lossy connections, HTTP/2 can be not so great.

And the reason is that TCP head of line blocking. When you have a lost packet, it disproportionately affects HTTP/2.

'Cause you only have one connection open.

If you have a lost packet and you're using HTTP/1, you've got six connections per origin open and hey, you know, that's one out of six.

That's not as bad as one out of one.

So, for QUIC, we're getting rid of the head of line blocking in the application and the transport at the same time. We've effectively moved the streams concept from HTTP down into the transport, where it belongs. We also have a bunch of other fun stuff.

There's a zero-round-trip handshake, or a one-round-trip handshake, depending on whether you've talked to a server before, so you can send application data immediately when you open a connection.

And finally there's lots of opportunities for improved congestion control and loss recovery, because we've moved everything up into user space. It's no longer in kernel space, (audience member sneezes) so it can be a lot more agile when we're working with QUIC. So, you're probably wondering how much this really matters.

This is data from the best source we have right now, which is a Google paper they presented last year. They're running QUIC at a fairly large scale now, I think it's an experiment that's Google-wide, and here's what they're seeing.

This is handshake latency. So for example, you see a comparison between TCP, the orange line, and QUIC. All of QUIC is right here, and if you haven't seen the server before, it's right here. So it's, of course, half if you have a one-roundtrip handshake, and almost nothing, just your base latency, for a zero-roundtrip handshake.

So that's good.

Setting up a new connection is a lot cheaper with QUIC than with TCP.
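
As a back-of-the-envelope sketch (simplified: it ignores TCP Fast Open, TLS false start, and similar optimisations), here's the round-trip arithmetic behind that chart:

```python
RTT_MS = 100  # assume a high-latency path, e.g. mobile or intercontinental

# Round trips spent on handshakes before any application data can flow.
handshakes = {
    "TCP + TLS 1.2":      3,  # TCP handshake, then a 2-RTT TLS handshake
    "TCP + TLS 1.3":      2,  # TCP handshake, then a 1-RTT TLS handshake
    "QUIC, new server":   1,  # transport and crypto handshake combined
    "QUIC, known server": 0,  # 0-RTT: data goes out with the first packet
}

for name, rtts in handshakes.items():
    print(f"{name:20s} {rtts} RTT = {rtts * RTT_MS:3d} ms before first byte")
```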

Perhaps more interesting, this is percent reduction in search and video latency.

So this is using QUIC versus using HTTP/2 in the case of search, and HTTP/1 for video. And you can see that high-latency clients, so the 90th, 95th, 99th percentiles, are seeing a pretty significant reduction in latency here: 5, 10, 16%.

You might say, "Well, okay, 5%, 10%, that's okay, it's not mind blowing." But consider this is Google Search.

This is one of the most highly optimised web pages on the planet, and they regularly throw a party if they see a 1% improvement in performance. So these are really big numbers out here at the tail latencies.

Also, it means that people who are in poorly connected places, so emerging markets, or Western Australia, or wherever. I shouldn't, no, (audience laughs) I shouldn't say that.

Um, yes, on the NBN, perhaps.

Yes, just, just Australia, okay.

They're going to benefit more from QUIC than someone who's right next to a data centre. And that's great, that's a good story for the web, I think. Video as well, you see some numbers here.

The more interesting one here: when you're measuring video, one of the most interesting stats is re-buffer rate. That is, how often do you see that re-buffering spinner on a page when you're watching a video?

And again, you see here at the tail latencies, you get really great improvements in re-buffer rates; videos aren't re-buffering nearly as much, because QUIC is much more efficient for delivery than H1 or H2.

Current best practise, most people are saying, is don't use H2 for video at all.

It just isn't well-suited for it.

QUIC is especially well-suited for video.

So, second thing, security.

So, paging back again to 2013 or so: we were kinda halfway through working on HTTP/2, and we were having a debate about whether we were gonna make it HTTPS-only or not. And then something happened.

Probably about two weeks before we were having an IETF meeting in Berlin, Snowden did his first drop of documents and we busily re-jigged our entire schedule to talk about what it meant to secure HTTP. That's a discussion that went on for a long time. We're still playing that out.

The IETF has always taken security very seriously. Obviously this made it much more real for a lot of people. And so there's a continuing discussion about how do we secure internet protocols against pervasive monitoring, which is what we call the attacks that he revealed, as well as bunch of other attacks.

And arguably, I think, in H2 it probably slowed us down a little bit.

This is one slide from that document dump, showing that the NSA was getting into Google by looking at things after SSL was stripped at the Google front-end server. And when this was revealed, we lost a bunch of engineers for a while, who went off to try and fix that instead of coming and working on H2.

We got them back, which was good.

But that wasn't the only thing that was revealed. There were a bunch of attacks, not against the application layer, not against whether you're using SSL or TLS or whatever, but against the actual transport protocol.

So for example, quantum insert, where you're redirecting someone to another place by racing the packets on the network, utilising the actual protocol metadata at the transport layer to perform that attack. And that's possible because in TCP there's all this metadata. The payload is down here below, but you've got all this stuff here: the sequence number, so we know how much data has been sent, and the acknowledgement number, so we know what's been acknowledged.

And all the stuff like: you can reset a connection, you can declare the connection finished.

All this is in the clear.

None of this is encrypted.

Anybody on the path can observe this, and as we saw in quantum insert, you can also manipulate this data on the path, if you're a pervasive attacker.
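
To make "in the clear" concrete: the fixed TCP header is plain bytes at known offsets, so a few lines of code pull out the sequence and acknowledgement numbers and the RST/FIN flags. A sketch using Python's struct module; the header bytes here are fabricated for illustration:

```python
import struct

def parse_tcp_header(raw):
    """Read the unencrypted metadata any on-path observer can see."""
    src, dst, seq, ack, off_flags, window = struct.unpack("!HHIIHH", raw[:16])
    flags = off_flags & 0x01FF
    return {
        "seq": seq,                  # how much data has been sent
        "ack": ack,                  # how much has been acknowledged
        "RST": bool(flags & 0x004),  # forgeable: inject one, kill the flow
        "FIN": bool(flags & 0x001),  # "this connection is finished"
    }

# A fabricated header: ports 443 -> 54321, seq=1000, ack=2000, ACK flag set.
hdr = struct.pack("!HHIIHH", 443, 54321, 1000, 2000, (5 << 12) | 0x010, 65535)
print(parse_tcp_header(hdr))
# {'seq': 1000, 'ack': 2000, 'RST': False, 'FIN': False}
```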

And in fact, we've seen other stuff as well recently. This is a study from I think last year, where they were able to tell what Netflix video you were watching, even though Netflix moved to full TLS.

And that's because they were looking at the protocol metadata at the transport layer. So it's not great to have that there.
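
The rough idea, as a hypothetical sketch (this is the shape of the technique, not the study's actual code): plaintext sequence numbers let an observer recover the size of each burst of application data, and for adaptive-streaming video the sequence of segment sizes is distinctive enough to match against a pre-built library of titles.

```python
def burst_sizes(seq_numbers):
    """Per-burst payload sizes, recovered from plaintext sequence numbers."""
    return [b - a for a, b in zip(seq_numbers, seq_numbers[1:])]

# A made-up fingerprint library: each title's video-segment sizes in bytes.
library = {
    "title_A": [48200, 51900, 47500],
    "title_B": [12000, 91000, 33000],
}

observed = burst_sizes([0, 48200, 100100, 147600])  # what the wiretap sees
match = next((t for t, fp in library.items() if fp == observed), None)
print(match)  # title_A
```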

And they did it with I think, yeah, 99.99% accuracy. That's fun.

That's not good from a protocol designer standpoint. And so in QUIC, we've taken a fairly drastic approach. There's a story behind this.

In HTTP/2 we tended to use Lego Digital Designer to illustrate packet formats, but we thought that was a little bit old, so now we're trying to use Fortnite.

(audience laughs) I don't know if this is a great way to communicate, but I think it serves the purpose just fine. This is the QUIC short packet header. This is the majority of the traffic; there's also a long header, but it's just used for the handshake.

Almost all the data that's sent across QUIC has this packet header on it.

And these fields up here, these are just fixed ones and zeroes.

That one's for key rotation.

These are reserved bits for experimentation. Almost no information there.

And then you get the destination connection ID, which allows you to stitch packets together into a connection. But that rotates on a regular basis; if I change endpoints, it rotates based upon a negotiation behind the scenes. And then you've got the packet number, in the clear. Except it's not in the clear.

We have a packet number encryption scheme that's being put in right now, so that's encrypted as well using a different key. And then you've got the actual protected payload, in metal here.

And that's all encrypted using TLS 1.3, effectively using the output of the handshake. So there.
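
Sketching that header in code (the bit positions here are illustrative; the draft layout has changed between versions, so treat this as a diagram rather than a real parser):

```python
def parse_short_header(raw, dcid_len):
    """Everything a passive observer gets from a QUIC short-header packet."""
    first = raw[0]
    header_form = first >> 7      # 0 = short header
    key_phase = (first >> 2) & 1  # flips on key rotation (position illustrative)
    dcid = raw[1:1 + dcid_len]    # routes packets to the right connection
    rest = raw[1 + dcid_len:]     # encrypted packet number plus protected
                                  # payload: opaque to everyone on the path
    return header_form, key_phase, dcid.hex(), len(rest)

packet = bytes([0x30]) + bytes.fromhex("c0ffee00c0ffee00") + b"\x00" * 32
print(parse_short_header(packet, dcid_len=8))
# (0, 0, 'c0ffee00c0ffee00', 32)
```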

And so all the ACKs, all that protocol-layer metadata that in TCP usually happened up here in the clear, is happening in that protected payload.

It doesn't give anybody on the path really any information at all.

And in fact this has caused a lot of strife because some network operators are used to having access to that data and they wanted to use it to operate their networks or tell you what you can and can't view on the internet. And it turns out that if you hide it all, they can't do that.

Finally, ossification.

So, ossification is a medical term, but in the internet community we use it to mean: when protocols get deployed widely, if there are places where people can interfere on the wire or make assumptions about the format, it can restrict your choices in the future about how you use that protocol.

So if, you know, you use HTTP and somebody assumes that a certain header means something, you can't change that header down the road. This affects our ability to change or improve the protocols over time, and we've seen a lot of this with TCP.

It is very difficult to deploy new TCP extensions or changes to the protocol because there are all sorts of people on the path, "helping" how it works.

So we have these TCP accelerators or firewalls, people doing DPI, (coughs) Excuse me. We've got people making assumptions that, well that bit right there means this and therefore I'm gonna drop that packet because I don't like the look of it.

All of this affects our ability to change the protocol over time.

And as a result, you know, the internet as it was architected, we used to be able to do a lot of these things on the left. You know, sending packets to or from everywhere, using many different transport protocols.

Using end to end addressing, you know, the end to end model. Using IP options as a viable way to signal things on the path.

Assuming the bits aren't altered.

You know, that people don't mess with the bits on the wire. We can't assume any of these things anymore, because people do things in the path.

It's ossified.

And so, fixing ossification, we have two major tools. One's called encryption, which we just talked about. And the other one's grease.

So grease is: if you can't encrypt something on the wire, make sure that it changes often.

Make sure that you exercise that joint so it doesn't lock up.
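
TLS already does this with "GREASE" values (the scheme in RFC 8701): clients sprinkle reserved code points into every handshake, so any middlebox that chokes on unknown values gets caught immediately instead of quietly ossifying the protocol. A sketch of that reserved-value pattern:

```python
import random

# GREASE-style reserved values from the TLS scheme: 0x0a0a, 0x1a1a ... 0xfafa.
GREASE_VALUES = [0x0A0A + i * 0x1010 for i in range(16)]

def cipher_suites_to_offer(real_suites):
    """Prepend a random reserved value, so peers and middleboxes have to
    tolerate unknown code points rather than assume today's list is final."""
    return [random.choice(GREASE_VALUES)] + real_suites

offer = cipher_suites_to_offer([0x1301, 0x1302, 0x1303])  # real TLS 1.3 suites
print([hex(v) for v in offer])  # e.g. ['0x7a7a', '0x1301', '0x1302', '0x1303']
```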

And this is something we're starting to use in a couple different places, but encryption's the big one. So again, if the middlebox, the guy on the wire, can't see any of this stuff, they can't change it, they can't make assumptions based on what they see, and so we have the ability to evolve the protocols much more quickly and much more reliably than we have with TCP.

So, where are we right now.

We have these two things, very awkwardly called gQUIC and iQUIC.

gQUIC is Google QUIC and iQUIC is IETF QUIC. They are converging.

We're talking about calling iQUIC "QUIC 2", just to kinda differentiate them a bit better. We've made a bunch of changes.

We've incorporated TLS 1.3; Google QUIC used its own bespoke crypto, which makes a lot of cryptographers really nervous. (audience laughs) We're chartered to be done in November, and the editors tell me that they really think they can be done in November.

We'll see.

We've still got a few issues open, as you can see. And we're doing interop work.

This is one of the more recent interop matrices. You can see we've got bunches of implementations; I think it's around twelve or fourteen.

Lots of names you might recognise doing implementations. Pretty good interop, but most of this right now is down at the transport layer, so doing the handshake, doing loss recovery, that sort of stuff.

Not quite up to HTTP yet.

We're getting there.

So, uh, takeaways.

We're trying to address the issues of performance, security, and ossification in transport protocols.

We're seeing great perf benefits, especially for poorly-connected networks.

And hopefully, this should remove the downsides for H2 deployment.

Where right now you have to make some judgement calls, hopefully it'll be even easier to make that decision. And I'm really hoping that we'll see implementations emerging roughly in 2019. We'll see how it goes.

I don't want to be negative, but just to end up, a few things that do worry me.

I don't think this is all sunshine and daisies. We're trying to do something really big here. H2 was big.

This is a lot bigger.

It's a big change for the infrastructure of the internet; we'll see how well we're able to do that in one go. UDP's performance is bad on some networks, and I think there's a lot of anecdotal evidence of that. Google assures us that it's not as bad as you might think. We'll see.

I think there's a tremendous amount of specialist knowledge in tuning this stuff to get it right, to get it to perform well.

How well that knowledge gets communicated inside the community is still up for grabs. And in the meantime, with TCP, there's been a lot of research, a lot of work pushing things like TCP Fast Open and BBR congestion control, which means that the differential between TCP and QUIC is gonna close over time. How much? We'll see.

Some people are thinking of QUIC as the next transport protocol, full stop.

Right now most of the work we've done is HTTP-specific. We'll see how much of that can generalise.

And finally, well, there's lots of other stuff to worry about.

(audience laughs) So, this is our homepage.

If you're interested in this stuff, there's a lot more information there.

Hopefully this kinda gave you just a really high-level look at where it's going, and hopefully we'll have a lot more data and a lot more experience to report next year.

Thanks.

(audience applauds) (upbeat music)