We all know that compression is essential for delivering fast web apps. But how does it work? Is it just ✨magic✨?
What is compression? Obviously it’s about making things smaller, and to some it’s just a checkbox item you need to tick off. But ultimately compression is a bet that your CPU is faster than your network. This is a pretty good bet, most of the time.
Now if you take James Bond’s number and say ‘double-oh seven’, you’re compressing the number by saying how many zeros come before the seven. Admittedly that doesn’t save much here, but if you add more zeros, then
0000000000007 can be compressed to
12:07 … twelve zeros and a seven.
This is an example of lossless compression, as you get the exact same number back. Many compression systems are lossy, JPEG for example. But we’ll focus on lossless, as we’re dealing with code, which has to come back exactly as it went in.
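The zero-counting trick is a form of run-length encoding, and a minimal sketch of it might look like the code below. The function names and the "count:char" format with comma separators are made up for illustration; they are slightly more verbose than the 12:07 shorthand because every run gets an explicit count, and they only work for text that contains no commas or colons.

```javascript
// Run-length encoding sketch: each run of a repeated character becomes "count:char".
// The "count:char,count:char" format is invented for this example.
function rleEncode(text) {
  const runs = [];
  let i = 0;
  while (i < text.length) {
    let j = i;
    // Advance j to the end of the current run of identical characters.
    while (j < text.length && text[j] === text[i]) j++;
    runs.push(`${j - i}:${text[i]}`);
    i = j;
  }
  return runs.join(",");
}

// Reverse the encoding by repeating each character `count` times.
function rleDecode(encoded) {
  if (encoded === "") return "";
  return encoded
    .split(",")
    .map((run) => {
      const sep = run.indexOf(":");
      const count = Number(run.slice(0, sep));
      const char = run.slice(sep + 1);
      return char.repeat(count);
    })
    .join("");
}

const original = "0".repeat(12) + "7"; // twelve zeros and a seven
const packed = rleEncode(original);    // "12:0,1:7"
const restored = rleDecode(packed);
console.log(packed, restored === original); // lossless: we get the same digits back
```

Decoding gives back exactly the digits that went in, which is what makes this lossless.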
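You have to compress before you encrypt; encrypt-then-compress doesn’t work, because encrypted data looks random and has no repetition left to squeeze out. Compressing before encrypting can also open up security holes, though. If a response reflects a value the attacker controls (in an error message, say) AND contains a secret key, the compressed size shrinks whenever the attacker’s guess matches part of the secret, which makes an attack much faster and more likely to be successful.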
But back to compression… most text files include a lot of repetition and that compresses really well. This does take some crunching however. Static files can be compressed at build time, which makes things nice and efficient. For dynamic responses, you have to compress on the fly with a cheaper/faster form of compression.
We have to remember that many network connections have much slower upload than download speeds, which means the request to an API will be transferred much more slowly than the response will be downloaded. So it would be nice to compress the request and not just the response.
So how does a typical lossless compressor (gzip, for example) shrink text? It does a few things…
(1) Remove duplicates.
var x; var y; →
var x; 4y (point back to the first 4 characters)
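To make the "point back" idea concrete, here is a rough sketch in the style of LZ77, the family of techniques gzip builds on. It is not how a real compressor is implemented: the token shape ({ back, length }), the function names, and the minimum match length of 3 are all assumptions for this example, and the brute-force search is far too slow for real use.

```javascript
// Greedy LZ77-style sketch: emit literal characters, or { back, length } tokens
// meaning "go back `back` characters and copy `length` characters from there".
const MIN_MATCH = 3; // arbitrary cut-off: shorter matches are kept as literals

function lzEncode(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let bestLen = 0;
    let bestBack = 0;
    // Brute force: try every earlier start position and keep the longest match.
    for (let start = 0; start < pos; start++) {
      let len = 0;
      while (pos + len < text.length && text[start + len] === text[pos + len]) len++;
      if (len > bestLen) {
        bestLen = len;
        bestBack = pos - start;
      }
    }
    if (bestLen >= MIN_MATCH) {
      tokens.push({ back: bestBack, length: bestLen });
      pos += bestLen;
    } else {
      tokens.push(text[pos]);
      pos += 1;
    }
  }
  return tokens;
}

function lzDecode(tokens) {
  let out = "";
  for (const token of tokens) {
    if (typeof token === "string") {
      out += token;
    } else {
      // Copy one character at a time so a copy may overlap the text being written.
      for (let i = 0; i < token.length; i++) out += out[out.length - token.back];
    }
  }
  return out;
}

const tokens = lzEncode("var x; var y;");
console.log(tokens);           // the second "var " becomes { back: 7, length: 4 }
console.log(lzDecode(tokens)); // "var x; var y;"
```

The second "var " is replaced by a single token that says "go back 7 characters and copy 4", which is the same move as the 4y shorthand above.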
(2) Replace symbols based on frequency (Huffman Coding)
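Some text characters are used much more than others, so you can replace a common 8-bit character with a much shorter code, sometimes as short as a single bit, while rare characters get longer codes. So if you have a lot of ‘e’ in your document, each ‘e’ is stored in just a few bits instead of a full 8-bit byte.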
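As a rough illustration of frequency-based codes, the sketch below builds a Huffman code table for a string and encodes it as a string of "0" and "1" characters. The function names and the sample text are made up for this example, and a real compressor would pack actual bits and store the table alongside the data; this only shows how the frequent characters end up with the short codes.

```javascript
// Build a Huffman code table (character -> bit string) for the characters in `text`.
function huffmanCodes(text) {
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);

  // Each node is { weight, ch } for a leaf or { weight, left, right } for a branch.
  let nodes = [...counts].map(([ch, weight]) => ({ weight, ch }));
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].ch, "0"]]);

  // Repeatedly merge the two lightest nodes until a single tree remains.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right, ...rest] = nodes;
    nodes = [...rest, { weight: left.weight + right.weight, left, right }];
  }

  // Walk the tree: going left appends "0", going right appends "1".
  const codes = new Map();
  const walk = (node, prefix) => {
    if (node.ch !== undefined) {
      codes.set(node.ch, prefix);
      return;
    }
    walk(node.left, prefix + "0");
    walk(node.right, prefix + "1");
  };
  walk(nodes[0], "");
  return codes;
}

function huffmanEncode(text, codes) {
  return [...text].map((ch) => codes.get(ch)).join("");
}

const sample = "eeeeeeeeee the tree"; // made-up text with a lot of 'e'
const codes = huffmanCodes(sample);
const bits = huffmanEncode(sample, codes);
console.log("code for e:", codes.get("e"));
console.log(bits.length, "bits instead of", sample.length * 8);
```

In this sample ‘e’ is so common that it ends up with a one-bit code, while the rarer letters get longer ones.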
…so now you have Pako (a JavaScript port of zlib), you can compress your fetch data before you send it. Again, this is still a bet between CPU and network, so on low-power devices you’ll want to move the compression to a web worker to avoid blocking the main thread. At Fastmail they added a shared dictionary to speed things up even further.
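Compressing an upload could look something like the sketch below. The helper name buildGzipRequest and the URL in the usage comment are hypothetical, and the whole thing assumes your server is set up to decompress request bodies sent with Content-Encoding: gzip, which many servers do not do by default. The compressor is passed in as a function so the sketch does not depend on how Pako is loaded; pako.gzip fits that shape, as it accepts a string and returns a Uint8Array.

```javascript
// Build the options object for fetch() so the JSON body is sent gzip-compressed.
// `gzip` is any function that takes a string and returns a Uint8Array of gzip
// bytes; on a page that has loaded Pako this could be `pako.gzip`.
function buildGzipRequest(payload, gzip) {
  const json = JSON.stringify(payload);
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: the server knows how to decompress gzip request bodies.
      "Content-Encoding": "gzip",
    },
    body: gzip(json),
  };
}

// Hypothetical usage (placeholder URL, and `pako` must already be loaded):
//   const response = await fetch("/api/example", buildGzipRequest(data, pako.gzip));
```

For small payloads the compression overhead may outweigh the saving, so it is worth measuring before switching it on everywhere.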
Compression isn’t magic, it’s maths. It’s a trade off between CPU and network. It makes things faster, but don’t forget to compress uploads as well as downloads.