Introduction to cryptography on the frontend
I. Introduction
- Overview of cryptography in the browser
- Importance of cryptography in the web computing trust model
II. State of Cryptography in the Browser
- Two main options: WebAssembly and Web Crypto
- Web Crypto is widely supported and available in most browsers
- NodeJS also implements the API for the backend
- The main interface is called SubtleCrypto
- Web Crypto only works with HTTPS
III. Types of Encryption
- Two main forms of encryption: symmetric and asymmetric
- Focus of this talk will be on symmetric encryption (specifically AES)
IV. Encrypting a Message with AES
- Steps required for encryption: encoding the message, creating a key, generating an initialization vector, using the authenticated associated data
- Example of how to encode a message using Web Crypto API
- AES keys can be 128, 192, or 256 bits in length
V. Importance of Initialization Vector
- ECB (electronic code book) is the raw form of AES
- Deterministic encryption has poor security and is vulnerable to chosen plain text attack
- To achieve randomized encryption, an initialization vector (IV) is used.
Welcome to cryptography in the browser. The first question you should probably ask is: why would you even want to do cryptography in the browser?
A good answer to this question comes from considering a model of trust in modern web computing.
In most applications, the web backend is a trusted actor.
While network communications are encrypted, the backend typically has access to un-encrypted information.
So in any security model we must ask, can the web backend be trusted?
By encrypting in the browser with keys controlled by the individual, we can effectively remove the web backend from the trust model.
So what is the state of cryptography in the browser?
There are two main options.
WebAssembly, say using a language like Rust, or the new Web Crypto standard.
In this talk, I'll just be talking about the latter.
Web Crypto is quite well supported now and is available in most browsers.
NodeJS now also implements the API for the backend.
The main interface to the Web Crypto API is called SubtleCrypto.
This is to avoid naming clashes with older, insecure modules that are simply called 'crypto'.
But it's also to remind us that it is easy to make subtle, but nonetheless devastating, mistakes with cryptography.
There be demons.
One small gotcha is that Web Crypto only works in a secure context, which in practice means the page is served over HTTPS, probably for obvious reasons.
So for testing, I typically just navigate to any secure site and use the Chrome dev tools to play around.
Okay.
So let's talk about what we're all here for: how to make things secret.
There are two main forms of encryption: symmetric and asymmetric.
Cryptography is a big topic and we only have around 20 minutes.
So I'm going to just focus on symmetric encryption, but I will go a little deeper than the typical developer talk on encryption.
There are also many flavors of symmetric encryption, but by far the most common and trusted symmetric cipher is the Advanced Encryption Standard, or AES.
Or, as it was originally known, the Rijndael block cipher.
Encrypting a message with AES has three required steps and one additional optional step.
I'll run through how to correctly encode a message, create a key, and safely generate an initialization vector, using the latest Web Crypto API running in the browser.
Then we'll look at how to use the authenticated associated data to add tamper-proof metadata to the message.
Let's start with encoding.
We can't just encrypt a string directly with Web Crypto.
We first have to encode the message into a consistent binary format.
To do that, we'll instantiate a new TextEncoder and pass the message to its encode function.
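As a minimal sketch of the encoding step (the message text is just a placeholder), TextEncoder turns a string into UTF-8 bytes, and TextDecoder reverses it after decryption:

```javascript
// Encode a string into UTF-8 bytes before handing it to Web Crypto.
const encoder = new TextEncoder();
const encoded = encoder.encode("Hello, Bob!"); // a Uint8Array of bytes

// TextDecoder performs the reverse step once a message has been decrypted.
const decoded = new TextDecoder().decode(encoded);
console.log(encoded.length, decoded);
```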
Next we need to either load or generate a key.
AES keys can be 128, 192 or 256 bits in length.
256 bits will be the most secure, but implies more encryption rounds.
So that's 10 for 128 versus 14 for 256.
So that will take longer.
Note that the block size does not change with different key sizes.
So using a larger key doesn't result in a larger ciphertext, it's only really performance you're trading off here.
Also remember that because AES is symmetric, you use the same key to both encrypt and decrypt.
We can use the SubtleCrypto interface to generate a key in the browser.
The generateKey function returns a promise and takes three arguments.
The first is an object specifying the encryption mode, which we'll explain later, and the key size; think of this as the algorithm specifier.
The second argument indicates whether we can export the key for later use.
And the final argument tells the API what operations are allowed with the instance of the key.
For example, we might want to limit use to only encryption or decryption.
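The three arguments can be sketched as follows. The choice of AES-GCM with a 256-bit key, and the helper name createAesKey, are assumptions made for illustration rather than something the talk prescribes at this point:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Hypothetical helper: generates a fresh AES key.
async function createAesKey() {
  return webCrypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 }, // algorithm specifier: mode and key size in bits
    false,                            // extractable: false means the key cannot be exported
    ["encrypt", "decrypt"]            // operations this key instance is allowed to perform
  );
}
```

Passing only ["encrypt"] or only ["decrypt"] as the last argument limits what a given key instance can be used for.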
The next step is to generate an initialization vector.
This is where I see a lot of mistakes.
So first some theory.
AES in its raw form is called ECB or electronic code book.
It converts a plaintext block into a ciphertext block and back again.
If the plaintext is not exactly 16 bytes, the operation will fail.
There are a few things to unpack here.
The first is that AES in electronic code book mode is deterministic, and that isn't very secure.
To understand why let's look at an example.
I have a simple database that stores product purchases along with the encrypted email address of a buyer using a deterministic encryption scheme, for example, AES in ECB mode.
Note that one of the encrypted email addresses repeats.
That means the person who bought a pizza also bought chocolate.
We've learned something about the data.
If an attacker was able to determine the encryption for an email address, they can find all the corresponding purchases.
And this is known as a chosen plain text attack.
And there are many examples of large amounts of data being leaked because of them.
A good way to see why deterministic encryption has poor security is with this visualization.
As you can see with ECB, the image is still quite visible despite being encrypted.
The image on the right uses randomized encryption and is totally unintelligible.
To achieve randomized encryption we need to extend the encrypt function to take an additional piece of data called an initialization vector or IV.
Every time we encrypt a message, we use a new IV, and so multiple encryptions of the same message result in different output ciphertexts.
IVs are sometimes called a nonce or a number used once.
This is because, to be secure, an IV must never repeat.
I often see code with hard coded IVs.
And this is a big problem because the encryption is now deterministic.
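To make the problem concrete, here is a small sketch comparing a hard-coded IV with a fresh random one. It uses AES-CBC because Web Crypto does not expose ECB; the function name compareIvChoices and the sample email address are made up for illustration:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

const toHex = (buffer) =>
  Array.from(new Uint8Array(buffer), (b) => b.toString(16).padStart(2, "0")).join("");

async function compareIvChoices() {
  const key = await webCrypto.subtle.generateKey({ name: "AES-CBC", length: 256 }, false, ["encrypt"]);
  const message = new TextEncoder().encode("alice@example.com");
  const encrypt = async (iv) => toHex(await webCrypto.subtle.encrypt({ name: "AES-CBC", iv }, key, message));

  // Anti-pattern: a hard-coded IV. Same key, same IV, same message gives the same ciphertext.
  const fixedIv = new Uint8Array(16);
  const fixedA = await encrypt(fixedIv);
  const fixedB = await encrypt(fixedIv);

  // A fresh random IV per call makes the two ciphertexts differ.
  const randomA = await encrypt(webCrypto.getRandomValues(new Uint8Array(16)));
  const randomB = await encrypt(webCrypto.getRandomValues(new Uint8Array(16)));

  return { fixedRepeats: fixedA === fixedB, randomRepeats: randomA === randomB };
}
```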
So how do we choose each IV so that it is never repeated?
The first option is to use a counter.
We increment every time we encrypt and this approach is generally fine, but you must be careful to ensure the counter never resets.
Managing counters in distributed or multi-user systems, like the web, can be hard.
So another approach is preferred when using Web Crypto.
So option two is to use a secure, random number for your IV every time you call encrypt.
This is usually a better approach for Web Crypto, but also comes with some gotchas.
If you randomly select a value from a finite set of possible values, eventually you will get the same value twice.
And actually a value will repeat sooner than might be obvious.
To illustrate this, let's look at the birthday paradox.
If there are 23 people at a party, what is the probability that two of them will have the same birthday?
Put another way, what is the probability of a collision given a random selection of 23 numbers from a possible set of 365?
It's actually about one in two.
So the next time you're at a party with at least 22 other people, there is about a 50% chance that two people there share a birthday.
Crazy right?
Well, this fun fact is somewhat of a problem for selecting our IV though.
In general, we can calculate the probability of a repeating IV for an n-bit number using the following formula.
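The formula from the slide is not part of this transcript. A standard birthday-bound approximation, offered here as an assumption that is consistent with the figures quoted next, gives the probability of at least one repeat after k encryptions with random n-bit IVs:

```latex
p(k, n) \approx 1 - e^{-\frac{k(k-1)}{2 \cdot 2^{n}}}
```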
So what about a 32 bit number, which could be represented by the whole component of a number in JavaScript?
After 100,000 encryptions, what's the probability that we'll get a repeating IV?
Well, it's actually over 68.5%.
So a 32 bit nonce really isn't big enough for most applications.
What about a 64 bit number?
Say, using a BigInt in JavaScript?
After a hundred thousand encryptions, there is less than a one in a billion chance of a collision.
However, after a billion encryptions, the probability of a repeated IV jumps to 2.7%.
That sounds like a lot of encryptions, but consider that 1 billion 16 byte blocks is only 16 gigabytes.
If you encrypt a one terabyte SSD using this approach you'd get a lot of repeated IVs, which gives a motivated attacker plenty of advantage.
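These figures can be checked with a few lines of JavaScript. This is a sketch using the standard birthday-bound approximation (an assumption, since the original formula is not in the transcript); collisionProbability is a hypothetical helper name:

```javascript
// Approximate probability of at least one repeat when drawing k values
// uniformly at random from a set of `space` possible values.
function collisionProbability(k, space) {
  const exponent = (k * (k - 1)) / (2 * space);
  return -Math.expm1(-exponent); // 1 - e^(-x), computed accurately for tiny x
}

console.log(collisionProbability(23, 365));       // birthdays: roughly 0.5
console.log(collisionProbability(1e5, 2 ** 32));  // 32-bit IV, 100,000 encryptions: roughly 0.69
console.log(collisionProbability(1e5, 2 ** 64));  // 64-bit IV, 100,000 encryptions: well under 1e-9
console.log(collisionProbability(1e9, 2 ** 64));  // 64-bit IV, a billion encryptions: roughly 0.027
```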
So really, the safest approach is to use a 128 bit secure random IV, which is really pretty easy to do with Web Crypto.
But by now you're probably starting to see that even a relatively simple API like this can lead to major issues if not used correctly.
One trap to avoid is the use of Math.random to generate an IV.
Aside from generating numbers that are too small, this isn't a cryptographically secure random number generator.
In simple terms, that means that attackers can often guess the numbers that it generates.
Non cryptographically secure random number generators also have much higher collision rates and that can lead to data exploits.
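A minimal sketch of generating a 128-bit IV with the cryptographically secure generator instead of Math.random (newIv is a hypothetical helper name):

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Returns 16 random bytes (128 bits) from the secure generator.
// Call it once per encryption; never reuse the result with the same key.
function newIv() {
  return webCrypto.getRandomValues(new Uint8Array(16));
}
```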
The other problem with AES-ECB is that messages must be exactly 16 bytes long.
To address this let's first look at how to deal with short messages.
Short messages can be padded to fill up the 16-byte block size.
However, this isn't as simple as it sounds.
We can't just pad with zeros because there's no way to tell where the message ends and where the padding begins.
But the good news is that we don't really have to worry about it.
There are plenty of standards for padding data properly.
One of the common ones being PKCS#7.
As we'll see later, Web Crypto does the hard work for us, but it can be helpful to know what's going on under the hood.
Some padding schemes have been shown to be vulnerable to attacks like the padding oracle attack, if not implemented correctly.
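For a look under the hood, here is an illustrative sketch of PKCS#7 padding for 16-byte blocks. It is for understanding only: Web Crypto applies this padding itself for AES-CBC, and the helper names pkcs7Pad and pkcs7Unpad are made up for this example.

```javascript
const BLOCK_SIZE = 16;

// Append N bytes, each with value N, so the length becomes a multiple of 16.
// A message that already fills its last block gets a whole extra block of padding.
function pkcs7Pad(bytes) {
  const padLength = BLOCK_SIZE - (bytes.length % BLOCK_SIZE); // always 1..16
  const padded = new Uint8Array(bytes.length + padLength);
  padded.set(bytes);
  padded.fill(padLength, bytes.length);
  return padded;
}

// Read the last byte to learn how many padding bytes to strip, then verify them.
function pkcs7Unpad(padded) {
  if (padded.length === 0 || padded.length % BLOCK_SIZE !== 0) throw new Error("Invalid padding");
  const padLength = padded[padded.length - 1];
  if (padLength < 1 || padLength > BLOCK_SIZE) throw new Error("Invalid padding");
  for (let i = padded.length - padLength; i < padded.length; i++) {
    if (padded[i] !== padLength) throw new Error("Invalid padding");
  }
  return padded.slice(0, padded.length - padLength);
}
```

Because the value of the padding bytes encodes their count, the receiver can always tell where the message ends, which is what zero padding cannot offer.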
We can encrypt messages longer than a single block by breaking them up into 16-byte chunks, and then padding the last chunk if it's smaller than a block.
But if we have any repeated data in the plaintext, we'll get repeated blocks in the ciphertext.
In other words, we'd get deterministic encryption.
We could generate a separate IV for every block, but this would literally double the size of the output ciphertext, not to mention the additional CPU overhead in generating all of the IVs.
Thankfully, cryptographers have solved this problem by using the previous ciphertext block as the IV for the next block.
This is called cipher block chaining or CBC.
And it's the first of what we call a 'mode of operation' for AES.
CBC is supported by the Web Crypto API.
To use it, we first generate a random 128 bit IV.
Then we call the encrypt function, which takes the algorithm object (this time specifying AES-CBC along with the IV), the CryptoKey, and the encoded plaintext.
Decrypting is very similar, but this time we use the decrypt function.
We must use the same IV to decrypt as we used to encrypt.
Now most applications will just store the IV along with the ciphertext, say as a hex-encoded string in a JSON object.
And that's okay because the IV itself isn't secret.
It just can't repeat.
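Put together, CBC encryption and decryption might look like the sketch below. The JSON layout ({ iv, ciphertext } as hex strings) and the helper names encryptCbc and decryptCbc are assumptions for illustration, not a format the talk defines:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

const toHex = (bytes) => Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
const fromHex = (hex) => Uint8Array.from(hex.match(/.{2}/g), (pair) => parseInt(pair, 16));

async function encryptCbc(key, message) {
  const iv = webCrypto.getRandomValues(new Uint8Array(16)); // fresh 128-bit IV per message
  const ciphertext = await webCrypto.subtle.encrypt(
    { name: "AES-CBC", iv },
    key,
    new TextEncoder().encode(message)
  );
  // The IV is not secret, so it travels alongside the ciphertext.
  return JSON.stringify({ iv: toHex(iv), ciphertext: toHex(new Uint8Array(ciphertext)) });
}

async function decryptCbc(key, json) {
  const { iv, ciphertext } = JSON.parse(json);
  const plaintext = await webCrypto.subtle.decrypt(
    { name: "AES-CBC", iv: fromHex(iv) }, // must be the IV used for encryption
    key,
    fromHex(ciphertext)
  );
  return new TextDecoder().decode(plaintext);
}
```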
However, there is still a big problem with CBC and others like it.
The encrypted ciphertexts are not resistant to tampering and this can lead to quite nasty attacks.
Consider Alice who sends an encrypted message to Bob asking him to pay Eve $10.
Eve is able to intercept the message.
And although she can't read it, she is able to modify the encrypted data such that Bob decrypts a valid, but different message.
We can see why CBC is vulnerable to this kind of attack by looking at how it decrypts data.
The process is basically the reverse of encryption, but the IV is XORed with the output of the decryption just before returning.
Therefore tampering with the IV allows an attacker to change the decrypted message, in this case, changing the amount from 1000 to 9,000.
This is an easy attack to perform in JavaScript.
Take a JSON message, containing an amount with value 1000 encrypted using CBC and a random IV.
Through trial and error, we can set the corresponding byte in the IV, such that when the message is decrypted, the amount has now changed to 9,000.
There are only 256 possible values for each byte, so finding the IV byte by trial and error doesn't take very long.
The person decrypting the message would be none the wiser.
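The attack can be sketched as follows. As a shortcut, this version assumes the attacker guesses that the original digit is "1" and computes the IV change with XOR directly rather than trying all 256 values; the key is only used here to play the roles of Alice and Bob, and tamperDemo is a hypothetical name:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

async function tamperDemo() {
  const key = await webCrypto.subtle.generateKey({ name: "AES-CBC", length: 256 }, false, ["encrypt", "decrypt"]);
  const iv = webCrypto.getRandomValues(new Uint8Array(16));
  const message = '{"amount":1000}';

  // Alice encrypts the message.
  const ciphertext = await webCrypto.subtle.encrypt({ name: "AES-CBC", iv }, key, new TextEncoder().encode(message));

  // Eve never touches the key. She flips bits in the IV byte that lines up with
  // the first digit of the amount; CBC XORs the IV into the first decrypted block,
  // so the same bits flip in the plaintext.
  const position = message.indexOf("1000");
  const tamperedIv = iv.slice();
  tamperedIv[position] ^= "1".charCodeAt(0) ^ "9".charCodeAt(0);

  // Bob decrypts with the tampered IV and gets a valid but different message.
  const decrypted = await webCrypto.subtle.decrypt({ name: "AES-CBC", iv: tamperedIv }, key, ciphertext);
  return new TextDecoder().decode(decrypted);
}

tamperDemo().then((result) => console.log(result)); // {"amount":9000}
```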
Now what we want is a way to detect if a message has been tampered with and discard it if so.
So this brings us to our final step.
Adding authentication and authenticated associated data.
Encrypting using GCM mode looks much like the process for CBC mode, except this time, attempting to decrypt a tampered ciphertext or a modified IV will fail with a DOMException.
Web Crypto won't say why it failed, however, as this would potentially give an attacker an advantage.
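A sketch of the same message under AES-GCM, showing that a modified ciphertext is rejected. The 12-byte (96-bit) IV is an assumption following the size commonly recommended for GCM, and gcmTamperDemo is a hypothetical name:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

async function gcmTamperDemo() {
  const key = await webCrypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, false, ["encrypt", "decrypt"]);
  const iv = webCrypto.getRandomValues(new Uint8Array(12)); // assumption: 96-bit IV for GCM
  const algorithm = { name: "AES-GCM", iv };
  const ciphertext = await webCrypto.subtle.encrypt(algorithm, key, new TextEncoder().encode('{"amount":1000}'));

  // An untouched ciphertext decrypts normally.
  const original = new TextDecoder().decode(await webCrypto.subtle.decrypt(algorithm, key, ciphertext));

  // Flip some bits in a copy of the ciphertext, as an attacker might.
  const tampered = new Uint8Array(ciphertext.slice(0));
  tampered[10] ^= 0x08;

  let tamperedOutcome;
  try {
    await webCrypto.subtle.decrypt(algorithm, key, tampered);
    tamperedOutcome = "accepted";
  } catch (error) {
    tamperedOutcome = "rejected"; // the authentication check failed; no reason is given
  }
  return { original, tamperedOutcome };
}
```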
We can use authenticated encryption to tag additional data that isn't necessarily secret, but that we want to be resistant to tampering.
This is called the authenticated associated data or AAD.
You can store anything in the AAD, but a common pattern is to assign a unique ID to the key that was used for encrypting and store it in the AAD, so decrypters know which key to use when they decrypt.
Now decryption will fail if the AAD is tampered with as well.
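The key ID pattern might look like this sketch. In Web Crypto the AAD is passed as the additionalData field of the AES-GCM parameters; the record layout, the 12-byte IV, and the helper names are assumptions for illustration:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

async function encryptWithKeyId(key, keyId, message) {
  const iv = webCrypto.getRandomValues(new Uint8Array(12)); // assumption: 96-bit IV for GCM
  const additionalData = new TextEncoder().encode(keyId);   // authenticated, but not encrypted
  const ciphertext = await webCrypto.subtle.encrypt(
    { name: "AES-GCM", iv, additionalData },
    key,
    new TextEncoder().encode(message)
  );
  // The key ID is stored in the clear next to the ciphertext.
  return { keyId, iv, ciphertext };
}

async function decryptWithKeyId(key, record) {
  // The same AAD must be supplied again; if the stored keyId was changed, decrypt rejects.
  const additionalData = new TextEncoder().encode(record.keyId);
  const plaintext = await webCrypto.subtle.decrypt(
    { name: "AES-GCM", iv: record.iv, additionalData },
    key,
    record.ciphertext
  );
  return new TextDecoder().decode(plaintext);
}
```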
Another advantage of AES-GCM is that it doesn't require padding.
That means that ciphertexts are the same length as the plaintext, excluding the IV, tag, and AAD, and don't have to be a multiple of 16 bytes, which for many applications can save quite a lot of space.
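The size difference is easy to observe. In this sketch (ciphertextLengths is a hypothetical name), note that Web Crypto appends the authentication tag, 16 bytes by default, to the AES-GCM output, so the buffer it returns is the plaintext length plus 16:

```javascript
// Browsers (on secure pages) and recent Node.js expose Web Crypto as globalThis.crypto;
// the require() fallback is only an assumption for older Node.js versions.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

async function ciphertextLengths(message) {
  const data = new TextEncoder().encode(message);
  const gcmKey = await webCrypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, false, ["encrypt"]);
  const cbcKey = await webCrypto.subtle.generateKey({ name: "AES-CBC", length: 256 }, false, ["encrypt"]);

  const gcm = await webCrypto.subtle.encrypt(
    { name: "AES-GCM", iv: webCrypto.getRandomValues(new Uint8Array(12)) },
    gcmKey,
    data
  );
  const cbc = await webCrypto.subtle.encrypt(
    { name: "AES-CBC", iv: webCrypto.getRandomValues(new Uint8Array(16)) },
    cbcKey,
    data
  );

  // gcm includes the 16-byte tag; cbc is padded up to a multiple of 16 bytes.
  return { plaintext: data.length, gcm: gcm.byteLength, cbc: cbc.byteLength };
}
```

For a 5-byte message, the GCM output is 21 bytes (5 plus the tag), while CBC pads the message up to a full 16-byte block.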
AES-GCM is probably really the only mode you should consider for new applications.
If you want to go deeper, I recommend checking out the NIST standard publications for recommendations on key usage, IV guidelines, and the use of authenticated associated data.
A few final thoughts.
Doing crypto is harder than it seems.
Lots can go wrong.
So make sure you follow the best practices.
I recommend getting an experienced cryptography engineer to review your code before going live if you can.
If you want to learn more about Web Crypto, you can find the spec on the W3C GitHub.
So I hope you found this talk interesting and useful.
In case you missed it, I'm the founder at CipherStash and we're building a searchable encryption platform that's easy to use.
Feel free to get in touch.
I'm always up for a chat and thanks a lot.