Compression: slashing the bytes for faster web apps

(keyboard music) (applause) - Hello, good afternoon, welcome.

That's a lot of redundant words for a talk on compression. Should probably just shorten it to hi and hope that my computer doesn't go to sleep before I've finished.

This is me, Neil. Thanks for the introduction. Let's jump right in.

So, what is compression? Obviously it's about making things smaller and of course it's a checkbox on your list of best practices for performant web apps. But really it's a bet.

It's a bet that your CPU is faster than your network. And this turns out to be a pretty good bet because your CPU is really, really fast compared to just about anything else going on on your computer or phone.

Even for those of you lucky enough to be on our super fast NBN network (audience laughs), sometimes anyway, your CPU is still way faster.

And the key thing about web apps compared to say a normal desktop app or phone app is that the user has to wait for the code to be delivered over the network before they can use it. So how fast you load things really makes a huge difference to how fast your app feels.

And that's why double O seven here is all in on compression. Now that's interesting.

Rather than laboriously say O O 7 I just told you how many Os there are, shortening it to double O seven.

Okay, it's not actually shorter, it's one extra syllable. But if in some hypothetical future MI6 agent numbering system we have one of Bond's successors called O O O O O O O O O O O O seven, I think that's right, we could just shorten that to 12 O seven.

And presuming that you knew I was compressing it like this, you could expand that back to the original title. We've just invented run-length encoding, which is one method of compression.
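As a toy illustration of the idea, here's a minimal run-length encoder and decoder in JavaScript. It's purely a sketch and ignores awkward cases like digits appearing in the input.

```js
// Toy run-length encoding sketch (illustration only; it ignores digits in the input).
function rleEncode(text) {
  let out = '';
  let i = 0;
  while (i < text.length) {
    let j = i;
    while (j < text.length && text[j] === text[i]) j += 1; // walk to the end of the run
    const runLength = j - i;
    out += runLength > 1 ? runLength + text[i] : text[i];
    i = j;
  }
  return out;
}

function rleDecode(encoded) {
  // "12O7" -> twelve Os followed by a 7.
  return encoded.replace(/(\d+)(\D)/g, (_, count, ch) => ch.repeat(Number(count)));
}

console.log(rleEncode('OOOOOOOOOOOO7')); // "12O7"
console.log(rleDecode('12O7'));          // "OOOOOOOOOOOO7"
```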

See, it's not just magic.

Now specifically this is a type of lossless compression. That's where we get back exactly the bits we put in when we decompress it again.

The other type is lossy compression where we actually throw away information and get back something similar but not quite identical when we come to decompress.

This is used in images, audio and especially video of course to massively reduce file sizes.

It relies on a model of the limits of human perception to throw away mostly information that we're not going to notice.

It's fiendishly clever.

It's quite complicated and I don't have time to talk about it right now. So, back to lossless compression.

As we saw with the double O seven example here, lossless compression works by looking for statistical biases in the text and re-expressing those in a shorter manner. The more patterns and repetition there are, the smaller we can compress it.

And this is why not all text can be compressed. A truly random string has no statistical bias we can exploit.

And now I can see you all muttering, "Neil, my site is full of highly compressible, repetitive HTML, CSS and JavaScript. I'm not loading the periods of random decay of radioactive caesium or something," and you're right. Except of course, that when it's actually sent over the network you use, or I hope you use, TLS.

And if your encryption is any good then the encrypted byte string is indistinguishable from random.

And this is why you have to compress first before you can encrypt.

You can't encrypt, then compress.

So you do that and you go, "This is easy," and you move on.

Except, if I can take another slight digression, I just want to note that this combination of compression and encryption needs to be treated quite carefully because it can actually lead to big security flaws. To take a contrived example.

Let's suppose you're building an online bank and you have a page to transfer funds.

The user is logged into your site and you've set a cookie with a secret key up there. It's a random secret.

It's long enough.

You use it to look up in the session database. No-one can guess it.

They can't brute force it.

Only requests that have this cookie will be authenticated to perform actions for the user.

No-one can steal your dollaroos, hooray.

But then you learn about cross-site request forgery and you realise that actually the user could end up on a malicious site which then submits a form back to your server, the browser helpfully adds the cookie for them and it's authenticated.

Damn, this security thing is hard.

Okay, so you think, "I'll fix that by adding the cookie secret to the forms on the page I generate as well." So, looks a bit like this and you can see we've added that in as another input. Now I can check that the form has a value that matches the cookie before I allow it.

That stops the cross-site request forgery 'cause they don't actually know what this is. They're just relying on the cookie being submitted with the form.

The form does a regular POST submission to the server over HTTPS and the server of course only makes the payment with the correct secret key.

But before it does that, it validates the amount property and let's suppose that you submit a value of deadbeef for the amount.

Well that's not a valid amount to transfer to someone, so you're probably gonna get back an error. Might look something like this.

"Sorry, 'deadbeef' is not a valid dollaroos amount. "Please check and try again." Now of course, following best practise all your pages are returned compressed and then encrypted, which is cool, except now we have a security hole. If the attacker can both do form submissions with the user's cookie and somehow see the size of the response.

And that might sound a bit far-fetched, but it's actually quite trivial to set up if someone joins an open wifi network, like the one you're probably all on right now. Any unencrypted site they visit can be intercepted and replaced with your malicious HTML that runs the attack while you observe the network traffic.

It doesn't matter that the response itself is encrypted. So why not? Can anyone see how this attack works? Anyone wanna shout out? Okay, it's because the error message includes the attacker's string, there we are, deadbeef, and the page also includes the secret key, down here. And so the closer this string is to this value, the more statistical redundancy we have in this page and so the smaller it will compress.

And so even though you can't see the encrypted result, you can see how long it is, and as you get closer to the correct value you'll compress more and the page will get shorter. And this is the kind of oracle that lets you guess much, much more quickly what this secret key is than just trying to brute force every value, 'cause it can give you information about how close you are. And before you know it, the evil genius has stolen all your dollaroos.
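Just to make the oracle concrete, here's a rough Node sketch of the idea. The page markup and the secret value are made up for illustration, but the principle is the one above: the more characters the attacker's guess shares with the secret on the page, the smaller the compressed response tends to be.

```js
// Sketch of a compression-length oracle (illustration only; the secret and markup are invented).
const zlib = require('zlib');

function responseLength(attackerAmount) {
  // A fake error page that reflects the attacker-controlled "amount" AND contains the secret.
  const page = `<p>Sorry, '${attackerAmount}' is not a valid dollaroos amount. Please check and try again.</p>
<form><input type="hidden" name="secret" value="X7K9Q2PM"><input name="amount"></form>`;
  return zlib.deflateSync(page).length;
}

// Guesses that overlap more with the secret usually deflate to fewer bytes, so the attacker
// can extend the guess one character at a time just by watching response sizes.
console.log(responseLength('secret" value="A'));    // wrong next character
console.log(responseLength('secret" value="X'));    // correct next character: usually shorter
console.log(responseLength('secret" value="X7K9')); // more overlap: usually shorter still
```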

So, anyway, this is a talk about compression, not security. So let's get back to that.

JavaScript, HTML, CSS, really most text files have lots of repetition and patterns in them so they compress really well and you should definitely do it.

Every browser supports gzip and most support newer standards such as Brotli, which can do an even better job. I mentioned earlier that compression is a bet that your CPU is faster than your network, and this is because, generally, while decompressing is super fast, for example just repeat O twelve times and then there's a seven, compressing the data in the first place can require quite a lot of work searching for the optimum encoding, trying to find all that redundancy in the file. For static assets, it's worth doing this at build time with the maximum, slowest settings.

You do it once and it does slow down your build, so you may want to skip it for debug builds or quick testing, but in production it's worth it.

Then, when the user actually requests the file, the server already has a perfectly compressed copy. It does zero work, it just spits it straight back out. For dynamic HTML pages or API responses you can't do this, because you don't know ahead of time what the data's gonna be. It's still worth compressing the response, but you probably want to use a cheaper, faster mode. The compression's not quite as good, but it will still be way better than uncompressed, you won't hammer your server quite so much, and the response will start coming back quicker 'cause it doesn't have to wait to compress it with a higher setting.
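As a rough sketch of that split, assuming a Node build script and an Express server (both of which are just my example stack), you might pre-compress static assets once at the slowest setting and compress dynamic responses on the fly at a cheap one:

```js
// Build step (sketch): compress static assets once, as hard as possible.
const fs = require('fs');
const zlib = require('zlib');

for (const file of ['app.js', 'app.css']) {
  const source = fs.readFileSync(file);
  // Level 9 is slow, but it only runs at build time; the server later just streams
  // the pre-built .gz file back with Content-Encoding: gzip and does zero work.
  fs.writeFileSync(file + '.gz', zlib.gzipSync(source, { level: 9 }));
}

// Dynamic responses (sketch): compress on the fly with a fast, cheap setting instead,
// for example via the standard Express compression middleware.
// const compression = require('compression');
// app.use(compression({ level: 1 }));
```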

Okay, this is all good, common best practice, but I just mentioned API responses.

What about the actual API request? When you make a PUT or a PATCH or a POST request, you are sending data from the browser to the server, and like any data it can probably be compressed. Now I hear you say, "Isn't that data pretty small, Neil?" And yeah, sometimes it is, but sometimes it isn't, and even when it is small, compressing it can still make a big difference to performance. Networks are slow, but upload is particularly slow. Most consumer network connections, including mobile, are heavily optimised for download bandwidth and have much slower upload speeds.

Upload is slow on mobile.

And it turns out, unfortunately, the browser doesn't give you a way to automatically compress those API requests, the data that you're uploading.

The trouble is, it doesn't know what kind of compression algorithms the server will understand.

It needs to know that 12 O seven should be expanded out and isn't just a strangely specific time to meet for lunch. There is an RFC from 2015, it's number seven six nine four if you want to look it up, that lets the server inform a client that future requests may be compressed.

But as far as I know to date there's no browser support, even though that was four years ago.

In your single page app though, you control both the client and the server. And you know what the server supports.

I know, we can solve this with JavaScript, yes. Or probably WebAssembly as well, but for now we're gonna use JavaScript.

Specifically I'm going to show you how we can solve this with a library called Pako.

Pako is a port of zlib to JavaScript by Andrey Tupitsin and Vitaly Puzrin, and I probably just massacred those names, so apologies to them if by some random chance they end up watching this video.

It's JIT friendly and really fast.

They claim it's almost as fast as the original C implementation. We found it's not quite that fast, but pretty damn good, certainly within reaching distance of that. We only need the compressor bit, not the decompressor bit, and that's 30 kilobytes minified, and, my clicker's stopped working, I'll go to this. There we go.

Or nine kilobytes minified and gzipped, 'cause of course we're going to compress the code that we send as well.

Zlib implements the Deflate compression algorithm, which is also used by gzip.

It's really common.

It's used in a number of different places.

Gzip is probably where you've come across it the most. Deflate uses two techniques to compress text. I haven't got time to go over them in detail, but let's look at them quickly to get the basic idea 'cause it's interesting to see what it's doing under the hood.

So the first technique is that it finds duplicate strings and converts them to pointers.

So if you had var space x, var space y, common JavaScript code, we can see there's a lot of redundancy here. We've got the same sequence of characters, V-A-R space, there and there.

So we can remove that second set of them and just replace that with a pointer back to the beginning, saying copy the next four characters starting from there again.

We can represent this quite compactly, much more compactly than having four individual characters. So that's great.

We just have to find all the duplicate text and convert it into pointers, and we can shave off quite a lot of bytes.
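You can see the effect with a quick experiment using zlib directly in Node (just my illustration): repeating the same text adds almost nothing to the compressed size, because each repeat becomes a short pointer back to the first copy.

```js
// Repeated text costs almost nothing extra once compressed: Deflate replaces each repeat
// with a (distance, length) pointer back to the earlier occurrence.
const zlib = require('zlib');

const once = 'var x = getThing(); ';
const fourTimes = once.repeat(4);

console.log(zlib.deflateSync(once).length);      // baseline size
console.log(zlib.deflateSync(fourTimes).length); // only a few bytes more than the baseline
```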

The second technique is to replace symbols based on frequency.

The basic idea is that some characters are more common than others.

For example in HTML the open angle bracket will probably feature quite a lot or in English text the letter E.

So if we just consider ASCII, to keep life simple: normally each character is represented by one byte, each one is eight bits long, but not all characters are equal.

In most text files some characters are much more common than others and many are not used at all.

Deflate uses a technique called Huffman coding to assign each character a variable-length representation based on the probability of it occurring, that is, its frequency in the document.

So we can have a look at a quick example.

For simplicity I'm just presuming I've got a document that only uses four characters: A, B, C and E. Here, for example, D is not used at all.

We can see in this row at the top the original ASCII representation, eight bits per character.

But we can also see that E is actually 50% of the characters in this document.

A quarter of them are A, and the final quarter is split evenly between B and C. These are very contrived numbers, I picked them to make the Huffman coding perfect and easy, but still the principle is the same.

So we've replaced this eight bit sequence for E with just a single bit zero.

And 50% of the time, that's all we're gonna need. Now of course 'cause that's zero, that means anything else has to start with a one and so they're gonna have to be longer, 'cause you're gonna have to work out what happens after the one.

So A is one zero and then finally B and C are one one zero and one one one.

Our original ASCII took eight bytes, sorry, eight bits per character.

If you multiply the number of bits for each character by its probability and work it out, this now averages only 1.75 bits per character.
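Spelling that arithmetic out with the contrived frequencies above:

```js
// E: 50% -> "0" (1 bit); A: 25% -> "10" (2 bits); B and C: 12.5% each -> "110" / "111" (3 bits).
const averageBits = 0.5 * 1 + 0.25 * 2 + 0.125 * 3 + 0.125 * 3;
console.log(averageBits); // 1.75 bits per character, down from 8 bits of plain ASCII
```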

Okay, let's get back to Pako.

How can we use this? Well, it's really easy.

Presuming we've loaded the library into our page we just replace this, with this.

So you can see I've added two extra lines.

Firstly I'm taking my JSON data and asking Pako to deflate it with level one, which is the fastest, least-compressed setting. And then I'm adding a Content-Encoding: deflate header to let the server know that that's what I've done. The library will return a Uint8Array and you can just pass that straight to the request to submit to the server.
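Put together, the client side looks something like this. It's a sketch rather than the exact code on the slide, and the URL and payload here are just placeholders:

```js
// Sketch of compressing an API request body with pako before sending it.
const body = JSON.stringify({ some: 'API call' }); // placeholder payload

const xhr = new XMLHttpRequest();
xhr.open('POST', '/api/'); // placeholder URL
xhr.setRequestHeader('Content-Type', 'application/json');
// Tell the server what we've done to the body.
xhr.setRequestHeader('Content-Encoding', 'deflate');

// Level 1 is the fastest, least-compressed setting; pako returns a Uint8Array,
// which can be passed straight to send() as the request body.
const compressed = pako.deflate(new TextEncoder().encode(body), { level: 1 });
xhr.send(compressed);
```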

Deflate, by the way, is a standard format, so detecting the Content-Encoding header on the server and decompressing the body is just a few lines of code.
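To give a feel for how few lines, here's a minimal server-side sketch. It assumes a Node and Express stack, which is my assumption for the example; your server will differ:

```js
// Server-side sketch: if the request body is deflate-encoded, inflate it before using it.
const express = require('express');
const zlib = require('zlib');

const app = express();

app.use((req, res, next) => {
  if (req.headers['content-encoding'] !== 'deflate') return next();
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    // "deflate" in HTTP means zlib-wrapped deflate, which is what pako.deflate produces.
    zlib.inflate(Buffer.concat(chunks), (err, buffer) => {
      if (err) return res.status(400).end('Could not decompress request body');
      try {
        req.body = JSON.parse(buffer.toString('utf8'));
      } catch (parseError) {
        return res.status(400).end('Invalid JSON');
      }
      next();
    });
  });
});

app.post('/api/', (req, res) => {
  // req.body is now the decompressed, parsed JSON.
  res.json({ ok: true });
});
```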

And I hope you're now thinking, "Well, that was easy. So what's the catch?" And the catch of course is that, as I mentioned earlier, compression is a trade-off between CPU time and network speed. We profiled it and on desktop machines we're confident that the cost is negligible.

The time taken to compress it, you just won't notice. On slower, cheaper mobile phones though, which as we've heard are quite common, it could make a difference to the perceived performance if you're doing this on the UI thread.

However, since the XHR is asynchronous this is quite easy to fix.

We can just move the compression into a Web Worker and then it doesn't block the UI thread so you still get the responsive UI with the same performance as before, but now you also get the compression and the faster network usage.
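A sketch of that worker set-up, with placeholder file names and a made-up message shape:

```js
// compress-worker.js (sketch) — compress off the main thread; assumes pako is loadable here.
importScripts('pako.min.js');

self.onmessage = (event) => {
  const { id, body } = event.data;
  const compressed = pako.deflate(new TextEncoder().encode(body), { level: 1 });
  // Transfer the underlying buffer instead of copying it back to the main thread.
  self.postMessage({ id, compressed }, [compressed.buffer]);
};

// main.js (sketch) — hand the JSON to the worker, send the compressed bytes when it returns.
const worker = new Worker('compress-worker.js');

worker.onmessage = (event) => {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', '/api/'); // placeholder URL
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.setRequestHeader('Content-Encoding', 'deflate');
  xhr.send(event.data.compressed);
};

worker.postMessage({ id: 1, body: JSON.stringify({ some: 'API call' }) });
```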

Okay, let's do a very quick demo, as I know you all want to get to lunch.

So, here is the FastMail web UI. I've got it in mobile mode so you can see some code on the side as well. And I've set it up to log the size of each API request it makes, both uncompressed and then the compressed size that it's actually sending.

So if I open a folder and something's been marked read, and I'll go through a few, and then maybe go back up and archive something and load another folder, or something like that.

Okay, that's enough.

We can see it's made a bunch of API requests and the compressed size is somewhere between a quarter and a third of the original size in all of these.

It's a massive saving especially on slow networks. Now the eagle-eyed amongst you may have noticed that there is one extra thing happening in the code above compared to the example I spoke about earlier, which is this dictionary line here.

And what's that? It's another trade-off.

We profiled a bunch of API requests from our app to work out what the common strings are and then we built up a two kilobyte string with the most common phrases in it.

The server also has a copy of the same string, and we can initialise both the compressor and the decompressor with that string, and then you can do pointers back into that, if you remember the first technique.

We don't have to do anything else, the compressor automatically does this, so it doesn't have to transmit those strings at all, it can just go, "Oh yeah, you've already got this, reference that." And that shaves off even more bytes.
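With pako that looks roughly like this, assuming a build of pako with dictionary support. The dictionary string here is a made-up stand-in for a real profiled one; both sides have to be initialised with exactly the same bytes:

```js
// Client side (sketch): seed the compressor with a preset dictionary of common API phrases.
const DICTIONARY = '{"method":"","arguments":{"ids":["'; // made-up stand-in, ~2 KB in reality
const dict = new TextEncoder().encode(DICTIONARY);

const body = JSON.stringify({ some: 'API call' }); // placeholder payload
const compressed = pako.deflate(new TextEncoder().encode(body), { level: 1, dictionary: dict });

// Server side (Node sketch): pass the same dictionary when inflating.
const zlib = require('zlib');
const decompressed = zlib.inflateSync(Buffer.from(compressed), {
  dictionary: Buffer.from(DICTIONARY),
});
```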

We found without the dictionary we shaved off about 50%.

With the dictionary as you can see, it's more like two thirds to three quarters. Okay, quick recap then.

Ah, we've had the demo.

Compression isn't magic, it's maths.

Compression is a trade-off of your CPU versus your network. Compression can make your app a lot faster, especially on shitty networks, so make sure you're compressing all your resources. And with a bit of JavaScript, we can compress our uploads too, and make our apps even faster.

That's all.

Thanks so much for listening.

I'm happy to take any questions or I know you all want to get to lunch so come find me afterwards.

I can talk about other interesting gotchas or anything else you might want to talk about then. Thank you.

(applause) (keyboard music)