Protecting data in the browser: the first line of defence
Introduction
Dan Draper begins by asking the audience how often they think about what happens to data after it is submitted through a form. He points out that even seasoned backend professionals may not reflect on this as often as they should.
Data Journey and Vulnerabilities
Using the example of saving an email address, Dan describes the typical data journey from browser processing to backend API and database storage. He highlights potential vulnerabilities at each stage, including man-in-the-middle attacks, application compromises, and insider access.
Limitations of TLS/SSL
While acknowledging the security benefits of HTTPS and TLS, Dan emphasizes that terminating TLS, and therefore decrypting data, at various points creates gaps in protection. He explains how TLS termination at load balancers and intermediate network services can expose data to risks.
Benefits of Encrypting Individual Values
Draper proposes encrypting individual values directly in the browser before transmission. He explains how this provides end-to-end data protection, reduces risk for service providers, and mitigates threats like insider access and SQL injection attacks.
Emerging Technologies and Data Breach Prevention
Dan briefly touches on promising technologies that use encryption as a form of access control and guarantee that every access to data is recorded, two ideas that are central to modern data breach prevention.
Challenges of Browser-Based Encryption: Key Management
Draper explores the challenges of implementing encryption in the browser, focusing on key management. He introduces AES encryption in GCM mode and explains the crucial role of keys in encryption and decryption.
Envelope Encryption as a Solution
To address key management issues, Draper introduces envelope encryption. He describes its two-layer approach, in which a key service generates data keys and wraps them under a key that only the service holds. This lets keys be distributed securely while keeping the key service as the final arbiter of who can access the data.
Demo: Encrypting Data in a Next.js App
Dan demonstrates envelope encryption in a Next.js application. He shows how data is encrypted before leaving the browser, making it unreadable even to someone with database access. He also briefly discusses the code and highlights the use of AWS KMS for key management.
Problem: Storing AWS Credentials in the Browser
Draper points out the security flaw of storing AWS credentials directly in the browser. He emphasizes the importance of secure credential management and avoiding hardcoding sensitive information in client-side code.
Solution: Federation with Security Token Service (STS)
As a solution to credential management, Dan explains the concept of federation using the AWS Security Token Service (STS). This process involves exchanging a user's authenticated token (a JWT) for temporary AWS credentials, enabling secure access to KMS without exposing long-term secrets.
Problem: Searching Encrypted Data
Draper raises the challenge of performing searches on encrypted data. Traditional SQL queries become ineffective when data is encrypted, hindering functionalities like filtering, sorting, and pattern matching.
Solution: Searchable Encryption
To address search limitations, Draper introduces searchable encryption. He highlights its speed compared to homomorphic encryption, which he puts at roughly a million times slower, and its compatibility with existing databases. Searchable encryption enables practical and performant searches on encrypted data.
Conclusion: Practicality and Future of Browser-Based Encryption
Dan Draper concludes by discussing the practicality of browser-based encryption. He encourages the audience to consider this approach for future projects and highlights existing tools like the AWS Encryption SDK and upcoming solutions like CipherStash's Web SDK.
Alrighty, hi everyone.
Can you hear me okay?
Cool.
So the first question I have for you: how often do you think about what happens to data once the form is submitted?
If you're primarily working in the front end, maybe it's not something you reflect on all that often.
And actually, I would wager that even many seasoned back end professionals don't think about this as often as they should.
So let's talk through it a little bit abstractly.
Let's say you're saving an email address.
What happens to that value once you submit it in a form?
There'll be some processing in the browser.
It could be wide and varied in terms of what you might want to do.
But a common thing you might do is validation.
You might serialize it into JSON and so forth.
And you're going to send that over to a back end API.
And then, probably, that backend API will save it to a database.
Maybe there'll be some side effects: it might send an email or a Slack notification or something.
But that's generally the flow that we would see.
Now there's some problems with that though.
There's lots and lots of places along the way that an attacker might be able to compromise.
Thinking about the connection from the browser to the backend API, there's a common network attack called a man in the middle.
Not very gender inclusive, but it's a common phrase, so I'm going to stick with the common language.
That's where an attacker can intercept traffic and read values out.
Or the application itself might be compromised: do a quick search for web security problems and you'll find hundreds of thousands of blog posts on all the different ways your code might be insecure.
And then finally, and unfortunately I think it's one we don't think about enough, there's the idea of insider access.
Who within the organization is looking at the data that's stored in a database or data warehouse?
When we say that, it sounds like we're referring to somebody malicious, but actually that's very rarely the case.
But that doesn't mean insider access is without problems. To give you an example, some of the more recent breaches we've seen in the media have been caused by somebody who had access to lots of data and was themselves compromised.
So it's really important to think about even if there's no malicious intent, who has access to the data in the organization?
You might be thinking, okay, but haven't we got TLS, SSL, HTTPS, all this stuff?
Isn't that serving a security need in this situation?
Yes it is, but let me talk you through how that works and where there are still some limitations.
Let's go through the journey again.
We might decide, or rather we should absolutely decide, to use HTTPS connections to our backend, certainly in production.
And so now, we've got this encrypted channel from our browser to our backend.
Similarly, we could have TLS running on our connection from the backend API or application into the database, so we're protecting that as well.
The problem is there are still quite a few gaps in the model, because we're decrypting, or what we call terminating, the TLS connection at various points along the way, so there are still holes in the defence, if you like.
Now, we've mitigated the man-in-the-middle attack, perhaps, but we haven't really solved anything else.
And what tends to happen is that the little line I've drawn at the top there is, in practice, a whole bunch of different services that we don't get a lot of visibility into.
So often you might find that the TLS is actually terminated at a load balancer.
And then there are all sorts of other network services between that load balancer and your application running on Heroku or Fly.io or any of these services.
So what if we encrypted the individual values, not just using TLS, but the individual values in the browser directly, what would that look like and what benefits would that provide?
First, let's walk through how it might work.
We go from the processing in our browser, where we do our validation and what have you.
And now, before we do anything else, we're going to take that sensitive value, put it through an encryption algorithm, and pop out a ciphertext.
A ciphertext is an encrypted value that an attacker cannot distinguish from random data, so there's no way to tell values apart, or even to tell a real value from noise.
Then we're going to send that value across our TLS connection to the backend API, to the database and so forth.
So how does that change the security model?
I'm not going to say it completely eliminates these issues, because there's no such thing in security, but it certainly does a lot to mitigate the problems we've talked about so far.
The man in the middle was already largely addressed by TLS, but now if an application is compromised, very likely an attacker would only get the encrypted values.
Or similarly, if somebody's got insider access, maybe they even need it to do their job.
But they don't actually have to see the sensitive data in order to, say, manage the backups for a database.
You can now differentiate between access to the infrastructure and access to the underlying data.
So this is pretty neat.
This is a technique I encourage you all to reflect on, and hopefully you'll have some takeaways from this talk.
But why is this good?
Let's talk about that a bit more.
So we've mitigated some attacks, but let's unpack that a little.
You're getting actual end-to-end data protection. I don't know how many of you have seen marketing copy on a website that says "we use end-to-end protection".
But it's bollocks, let me tell you.
Most of the time, it's not really the case.
This approach means reduced risk for service providers, if you work for a company or you're a founder or something.
It's much, much less risk for your business.
It also allows you to have a level of deniability.
So if there is some sort of an issue, you can say "we've never seen the data, we don't have any access to the data", or, in conversations where you're trying to win business, you can say "we never see the data".
And that's often a good thing.
It means you don't have to take on any additional risk.
Similarly, from the client side, clients don't have to fully trust the server.
And this is really important in lots of contexts, but one of the most common is messaging, with protocols like MLS, Messaging Layer Security. If any of you still use Facebook Messenger, you might have noticed that they've done a serious upgrade to their systems lately, and there's a bunch of articles on the web explaining how they do it, but effectively it's a very advanced version of what I've just described: full end-to-end encryption.
So long as they've implemented what they describe in their papers, Facebook actually can't see any of your messages.
A side effect, a nice little bonus, is that this helps to mitigate things like SQL injection attacks.
Which, if you don't know what that is, come see me afterwards and I'll help clarify that for you.
There's also a class of benefits at the forefront of technology that I'm not going to go into in detail today, but there are some very promising new avenues around using encryption as access control, and guaranteeing that whenever data is accessed, we can always record the fact that it was accessed.
These two concepts become very important in modern approaches to data breach prevention.
So hopefully that's enough to convince you that it's good, but then you might be asking, why don't we do this now?
Why don't we see this in the wild? Maybe you're hearing about this for the first time. Why is that the case?
It turns out there's some problems.
And the first problem to think about is the idea of key management.
And so every time you do encryption, you need to have a key.
Now, I'm going to go into a bit of technical detail, a deep dive into a technical topic, for a moment.
If you don't understand all of this, it's not a big deal, but for those that are interested, hopefully this will give you some additional framing.
So when we talk about encryption, there's lots of different kinds of encryption.
I'm specifically talking about AES encryption, Advanced Encryption Standard, and running in a mode that's called GCM mode.
Technically, that's Galois/Counter Mode. You don't need to know what that means, but it's a standard that's been widely adopted and approved by governments around the world.
Very briefly, we're going to talk about what that encryption algorithm does.
It takes a plaintext, which you can see in the middle here, "Love coffee"; an initialization vector, which is just a random string of bytes; and an encryption key, which is 32 bytes long.
We feed those into a function, E for encryption, which takes my plaintext, my key and my initialization vector, and pops out a ciphertext and a tag.
You can ignore the tag for the time being.
The ciphertext is the encrypted value that we really care about.
Then we store the whole result, my IV, my ciphertext and my tag, and that complete set of bytes is my encrypted output.
Decryption looks very similar, but just the other way around.
So we take the key, the same initialization vector, and now the ciphertext and the tag, pump them into a decryption algorithm, and, with a bit of luck, we get our plaintext back out.
Love coffee.
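To make the E and D functions concrete, here's a minimal sketch of AES-256-GCM using the Web Crypto API that browsers provide (Node 19+ exposes the same `crypto.subtle`). The function names are illustrative, not taken from the talk's demo. It stores the output as IV, ciphertext and tag concatenated, with a 12-byte IV, the usual size for GCM.

```typescript
// Sketch: AES-256-GCM with the Web Crypto API.
// Stored layout: IV (12 bytes) || ciphertext || tag (16 bytes).
const IV_BYTES = 12; // 96-bit IV, the size recommended for GCM

// Random 256-bit (32-byte) key. Non-extractable, so the raw bytes
// can't be exported back out through the API.
async function generateAesKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, false, [
    "encrypt",
    "decrypt",
  ]);
}

// E(plaintext, key, IV) -> ciphertext + tag.
// Web Crypto returns the ciphertext with the 16-byte tag already appended.
async function encryptValue(key: CryptoKey, plaintext: string): Promise<Uint8Array> {
  const iv = crypto.getRandomValues(new Uint8Array(IV_BYTES)); // fresh IV per message
  const ciphertextAndTag = new Uint8Array(
    await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, new TextEncoder().encode(plaintext)),
  );
  const output = new Uint8Array(IV_BYTES + ciphertextAndTag.length);
  output.set(iv, 0);
  output.set(ciphertextAndTag, IV_BYTES);
  return output;
}

// D(ciphertext + tag, key, IV) -> plaintext.
// Rejects if the key is wrong or any byte of the IV, ciphertext or tag changed.
async function decryptValue(key: CryptoKey, stored: Uint8Array): Promise<string> {
  const iv = stored.slice(0, IV_BYTES);
  const ciphertextAndTag = stored.slice(IV_BYTES);
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertextAndTag);
  return new TextDecoder().decode(plaintext);
}
```

Encrypting "Love coffee" (11 bytes) this way produces 12 + 11 + 16 = 39 bytes. Because a fresh random IV is used each time, encrypting the same value twice gives two different outputs, and decrypting with any other key rejects rather than returning garbage.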
So it's actually not that complicated at a high level.
There's a few different moving parts.
But there's a big challenge, a big issue with this approach that becomes very apparent when we're looking at the browser specifically.
And the issue you may have picked up on is that those encryption operations all need to use K, the key, and it has to be the same key every time, otherwise decryption fails.
So if you're doing encryption in the browser and you want to encrypt data that multiple clients need to access, you need to give them all the same key.
And that's a problem because how do you get that key to them?
Start to think through the mechanics of that and it actually becomes quite a tricky problem.
All the clients get the same key, which means there's no way to differentiate access.
And key revocation is not possible, or at least very hard.
So let's say client 3 was no longer trusted, or they'd just finished their session and shouldn't have access to the data anymore: how do you take that key away from them?
They've already got it.
Now, if it's stored in a cookie or something, maybe you can clear the cookies, but there's no guarantee that an individual or a malicious actor hasn't actually saved a copy of it somewhere.
That's a huge problem, and you should never do that, by the way.
You should never try to store one of those 32 byte AES keys in a browser cookie.
Please don't do that.
But there is a good solution to this.
It's the idea of envelope encryption.
I'll go through the mechanics of it, but in very simple terms, envelope encryption is the idea that you've got two layers of encryption.
You've got one layer of encryption that is protecting keys, and then another layer of encryption which is using those keys to protect the actual data.
Thinking about the network pattern, or the interaction a client has to perform to do envelope encryption: you'd have a key service, and examples of key services are things like Amazon's KMS; at CipherStash we've developed one called ZeroKMS.
A client would request a data key from that service, and to make that request it would need to authorize itself.
So there's a step to validate that the client is allowed to do what it wants to do.
And then in return, we're going to send two things back.
We're going to send a data key, which we can use to encrypt or decrypt values, and what's called a wrapped data key.
That wrapped data key is an encrypted version of that key, but under a key that only the key service can actually see.
So the key service is still the final arbiter of trust.
So I'll step through this very quickly because it's not really the thrust of this talk.
These slides were taken from a talk I did at BSides San Francisco a couple of weeks ago.
So that's just me being efficient by reusing them.
So we're going to take an initialization vector once again, but this time we're going to take a data key as our plaintext.
I'm going to feed it into our encryption algorithm like we did before, but now the plain text is actually another key rather than some sensitive value.
That output becomes what we call our wrapped data key, and we return it together with the raw data key as a pair.
Then what we're going to do is take that data key and our plain text and do the same thing we did a moment ago.
So you've got two stages to the process.
The reverse is equally simple: when we want to decrypt something, we need the data key, so we send the wrapped data key to the key service and get the original key back.
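In the same E notation used for AES-GCM, and writing $K_R$ for the root key that never leaves the key service, $K_D$ for the data key and $P$ for the sensitive value (symbol names chosen here for illustration), the two stages are:

$$
\begin{aligned}
W &= \mathrm{E}(K_R,\ IV_1,\ K_D) && \text{stage 1, in the key service: wrap the data key} \\
C &= \mathrm{E}(K_D,\ IV_2,\ P) && \text{stage 2, in the browser: encrypt the value}
\end{aligned}
$$

Each output carries its own IV and tag, as before. Only $W$ and $C$ are stored; the raw $K_D$ is discarded after use, so reading $P$ again means asking the key service to recover $K_D$ from $W$.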
So that's fairly straightforward.
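The whole round trip can be sketched in TypeScript with the Web Crypto API. `LocalKeyService` is a hypothetical in-memory stand-in for a real key service such as AWS KMS or ZeroKMS: it holds a non-extractable root key, hands out a data key together with its wrapped form, and unwraps on request. A real service would authenticate and authorize every call over the network; that part is omitted here.

```typescript
// Hypothetical stand-in for a remote key service (AWS KMS, ZeroKMS, ...).
class LocalKeyService {
  // Root key: non-extractable and never leaves the "service".
  #rootKey: CryptoKey;

  private constructor(rootKey: CryptoKey) {
    this.#rootKey = rootKey;
  }

  static async create(): Promise<LocalKeyService> {
    const rootKey = await crypto.subtle.generateKey(
      { name: "AES-GCM", length: 256 }, false, ["encrypt", "decrypt"]);
    return new LocalKeyService(rootKey);
  }

  // Stage 1: a fresh 32-byte data key, plus that key encrypted under the root key.
  async generateDataKey(): Promise<{ dataKey: Uint8Array; wrappedDataKey: Uint8Array }> {
    const dataKey = crypto.getRandomValues(new Uint8Array(32));
    const wrappedDataKey = await aesGcmSeal(this.#rootKey, dataKey);
    return { dataKey, wrappedDataKey };
  }

  // Reverse of stage 1: recover the raw data key from its wrapped form.
  async unwrapDataKey(wrappedDataKey: Uint8Array): Promise<Uint8Array> {
    return aesGcmOpen(this.#rootKey, wrappedDataKey);
  }
}

// AES-GCM with a fresh 12-byte IV; output is IV || ciphertext || tag.
async function aesGcmSeal(key: CryptoKey, plaintext: Uint8Array): Promise<Uint8Array> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const body = new Uint8Array(await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext));
  const out = new Uint8Array(iv.length + body.length);
  out.set(iv);
  out.set(body, iv.length);
  return out;
}

async function aesGcmOpen(key: CryptoKey, sealed: Uint8Array): Promise<Uint8Array> {
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: sealed.slice(0, 12) }, key, sealed.slice(12));
  return new Uint8Array(plaintext);
}

// Stage 2 (in the browser): encrypt with the data key, wipe the raw key bytes,
// and keep only the wrapped key next to the ciphertext.
async function envelopeEncrypt(service: LocalKeyService, value: string) {
  const { dataKey, wrappedDataKey } = await service.generateDataKey();
  const key = await crypto.subtle.importKey("raw", dataKey, "AES-GCM", false, ["encrypt"]);
  dataKey.fill(0);
  const ciphertext = await aesGcmSeal(key, new TextEncoder().encode(value));
  return { wrappedDataKey, ciphertext };
}

// Reverse: ask the key service to unwrap the data key, then decrypt locally.
async function envelopeDecrypt(
  service: LocalKeyService,
  record: { wrappedDataKey: Uint8Array; ciphertext: Uint8Array },
): Promise<string> {
  const rawKey = await service.unwrapDataKey(record.wrappedDataKey);
  const key = await crypto.subtle.importKey("raw", rawKey, "AES-GCM", false, ["decrypt"]);
  rawKey.fill(0);
  return new TextDecoder().decode(await aesGcmOpen(key, record.ciphertext));
}
```

With AWS KMS, the two service calls correspond roughly to its `GenerateDataKey` and `Decrypt` operations. Because every read has to go back through the service to unwrap a key, the service can refuse a client it no longer trusts, which is the revocation property a single shared key lacks.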
I'm going to show you a demo of how this works in the browser.
Fun fact: there's a problem, which I intentionally introduced, to show you the challenges of managing credentials with someone like Amazon.
And I accidentally, or semi-intentionally, pushed it to GitHub, and my security team is so good they revoked access within about five minutes, so I've now reinstated my service.
I don't actually know if this is going to work, so we'll see how it goes.
So this is a really simple Next.js app.
It's using "use client", so it's mostly running in the browser itself, and I want to run an add-user flow, so capturing some personal information here.
This is definitely not rocket surgery looking at it from a user experience point of view.
I don't know, pick on John.
Web Directions. So I want to save it and see if it works.
Cool, it worked.
We saved a thing to a table in a database.
So what?
Let me show you what's happening under the hood.
Inside my database, lost in tabs.
Does anyone else have this problem?
I get way too many tabs.
I'll connect to Postgres here, which is running behind the scenes.
And I'm going to select star from web direction users.
And that's all just mumbo jumbo.
So everything's been encrypted.
So that means if I'm an insider working in the organization, I actually can't see that sensitive information now.
I can still access the database, as you can see.
I can do things that don't require me to access sensitive data, but all this data is encrypted.
Conscious of time, so I won't labour this too much, but I'll give you a bit of a taste of how this actually works.
In my form here, the create user form, you can see I've got AWS credentials here.
Please, do not do this.
It's very bad.
I'm putting it there to illustrate a point, which I'm going to talk about in a moment.
There's a bunch of commented-out stuff; you can ignore that as well.
This is just getting a token from Auth0, and we're going to use that in a moment.
It's got an on submit handler for the form, and then it's got a function here called encryptObject.
In this case, encryptObject is using AWS, so AWS KMS.
And there's a bunch of setup and ceremony you have to do, but effectively it's just using this encrypt function.
And as you can see in the form here, we're using "use client", so this is all running in the browser.
So the values are encrypted before they ever leave your device, effectively.
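A function like `encryptObject` might look roughly like the sketch below. This is not the demo's code, which goes through AWS KMS; it is a hypothetical helper that assumes an AES-GCM `CryptoKey` has already been obtained (for example via envelope encryption) and posts to a made-up `/api/users` endpoint. Each field is encrypted separately, and the field name is bound in as GCM additional authenticated data so a ciphertext can't be silently moved into a different field.

```typescript
// Hypothetical helper in the spirit of the demo's encryptObject.
// Encrypts every field of a form payload before it leaves the browser.

function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return btoa(binary);
}

async function encryptFields(
  key: CryptoKey,
  fields: Record<string, string>,
): Promise<Record<string, string>> {
  const encrypted: Record<string, string> = {};
  for (const [name, value] of Object.entries(fields)) {
    const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh IV per field
    const ciphertextAndTag = new Uint8Array(
      await crypto.subtle.encrypt(
        // The field name is authenticated (not encrypted): an "email" ciphertext
        // fails to decrypt if someone moves it into the "name" field.
        { name: "AES-GCM", iv, additionalData: new TextEncoder().encode(name) },
        key,
        new TextEncoder().encode(value),
      ),
    );
    const packed = new Uint8Array(iv.length + ciphertextAndTag.length);
    packed.set(iv);
    packed.set(ciphertextAndTag, iv.length);
    encrypted[name] = bytesToBase64(packed); // IV || ciphertext || tag, base64-encoded
  }
  return encrypted;
}

// Example submit handler body: only ciphertexts are put on the wire.
// "/api/users" is a made-up endpoint for illustration.
async function submitUser(key: CryptoKey, user: Record<string, string>): Promise<Response> {
  const body = JSON.stringify(await encryptFields(key, user));
  return fetch("/api/users", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
  });
}
```

The backend and database then only ever handle base64 ciphertexts, which is what the `select` against Postgres in the demo shows.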
All right.
Still getting used to Canva.
Okay.
Second problem.
So I showed you those AWS credentials a moment ago.
As it turns out, anyone can very easily see those credentials if you store them in the browser, so that's something you want to avoid.
How do we get around that?
Let me show you another way to approach the problem.
Clearly this is a bad idea, don't do that.
I've said it three times now, four times might not be enough yet, so we'll say it again.
We want to take the credentials of a user that we trust, and find a way to convince the key management service that these credentials can be used to access the key management service.
And this is a process called federation.
And Amazon supports this, various other cloud providers do as well.
But in the AWS context, what's going to happen is you take a client that has gone through the authentication ceremony and got a JWT.
A JWT, or JSON Web Token, for those who aren't familiar, is a signed token that says who the user is and gives them some right to perform an action.
A very common way of authenticating people.
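For reference, a JWT is three base64url-encoded segments, `header.payload.signature`, and the payload carries claims such as `sub` (the user), `aud` (the intended audience) and `exp` (expiry time). The sketch below only reads those claims; it deliberately does not verify the signature, which is the job of whichever service accepts the token (in this flow, STS). The function names are illustrative.

```typescript
// Decode one base64url segment (RFC 4648 "URL and filename safe" alphabet,
// padding stripped) into a UTF-8 string.
function base64UrlDecode(segment: string): string {
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  return new TextDecoder().decode(Uint8Array.from(binary, (c) => c.charCodeAt(0)));
}

// Read (NOT verify) the claims in a JWT's payload segment.
function readJwtClaims(token: string): Record<string, unknown> {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a JWT: expected header.payload.signature");
  const payload = parts[1] ?? "";
  const claims: unknown = JSON.parse(base64UrlDecode(payload));
  if (typeof claims !== "object" || claims === null) {
    throw new Error("JWT payload is not a JSON object");
  }
  return claims as Record<string, unknown>;
}
```

Anyone holding the token can read these claims, which is why it's the signature check on the receiving side, not secrecy of the payload, that makes the token trustworthy.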
I'm going to get one of those tokens from my IdP, say Auth0 or Okta, and send it to a service in Amazon called the Security Token Service.
It must be secure, because it's got security in the name.
And I'll use a very memorable API call called AssumeRoleWithWebIdentity.
I've been using it for years and I still can't bloody remember that name.
That gives me back another token, an STS token, which is a temporary credential usable only by that user and only for a short period of time, but which allows that user to access KMS.
So now we can reuse the user's credentials, which we already trust, to talk to the key management service, without sharing any key material or any common credentials.
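The exchange can be sketched against the STS Query API directly. AssumeRoleWithWebIdentity is one of the STS actions that doesn't need to be signed with AWS credentials; the IdP's JWT is the proof of identity. The role ARN and session name in any real call are placeholders you'd supply, and `WebIdentityRequest` and both function names are hypothetical. In practice you'd more likely use the AWS SDK for JavaScript, which wraps this same call (for example `fromWebToken` in `@aws-sdk/credential-providers`).

```typescript
// Hypothetical request shape for the STS AssumeRoleWithWebIdentity action.
interface WebIdentityRequest {
  roleArn: string;          // IAM role whose trust policy accepts your IdP
  roleSessionName: string;  // free-form label, e.g. the user's id
  webIdentityToken: string; // the JWT from Auth0, Okta, etc.
  durationSeconds?: number; // optional lifetime of the temporary credentials
}

// Build the form-encoded body for the STS Query API (API version 2011-06-15).
function buildAssumeRoleWithWebIdentityBody(req: WebIdentityRequest): URLSearchParams {
  const params = new URLSearchParams({
    Action: "AssumeRoleWithWebIdentity",
    Version: "2011-06-15",
    RoleArn: req.roleArn,
    RoleSessionName: req.roleSessionName,
    WebIdentityToken: req.webIdentityToken,
  });
  if (req.durationSeconds !== undefined) {
    params.set("DurationSeconds", String(req.durationSeconds));
  }
  return params;
}

// POST the request. The XML response contains AccessKeyId, SecretAccessKey,
// SessionToken and Expiration: short-lived credentials scoped by the role,
// which the browser can then use to call KMS.
async function fetchTemporaryCredentials(req: WebIdentityRequest): Promise<string> {
  const res = await fetch("https://sts.amazonaws.com/", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildAssumeRoleWithWebIdentityBody(req),
  });
  if (!res.ok) throw new Error(`STS returned HTTP ${res.status}`);
  return res.text();
}
```

What the user is allowed to do with KMS is then decided by the IAM role's policies, not by anything shipped in the client bundle.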
So this actually pretty much solves both of those problems.
By the way, in CipherStash ZeroKMS you can federate your JWTs directly without jumping through that extra hoop, and that came out of the frustration pretty much everyone on the team felt after dealing with the way AWS does it for so many years.
Cool.
One more problem, and that's search.
This is actually the thing I'm super passionate about, and the thing we're solving for at CipherStash.
But let me try to articulate to you what the problem actually is.
Remember that even if we're driving these queries through the front end, with some React components or what have you doing search, sorting values or filtering by some field, at the end of the day it's just an SQL query somewhere down the stack.
And so thinking about our SQL queries, if the values in the database are encrypted, these queries won't work anymore.
So I want to do a lookup with a condition.
So I want to get the name value from the users table where email equals Grace Hopper.
That won't work if the email column is encrypted.
Similarly, I can't order by it.
Technically I can, but it's going to be ordered by the randomized encryption values, so it's not useful at all.
So that doesn't really work either.
And you can forget things like the LIKE operator and fuzzy matches; they're just not going to work at all.
However, there's an idea that I'd call emerging, although it's actually been around for a while.
It's been studied in academia for many years, and it's gradually becoming known to more and more industry players.
This is the idea of searchable encryption.
And at least in the case of the technology we're developing at CipherStash, searchable encryption supports most SQL queries, including simple conditions, sorting, range queries and some others.
It's very fast.
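CipherStash's scheme isn't described in the talk, so as an illustration of the general idea only, here's one of the simplest searchable-encryption building blocks: a keyed blind index. Alongside each randomized AES-GCM ciphertext you store an HMAC-SHA-256 of the normalized value. Equal plaintexts produce equal indexes, so an equality lookup against an index column (say, a hypothetical `email_index`) works again without the database ever seeing the email. It supports exact matches only, not ordering, ranges or LIKE, and it does reveal which rows share a value.

```typescript
// Import a 32-byte secret as an HMAC-SHA-256 key. The key must stay with the
// client or key service; whoever holds it can compute indexes for guesses.
async function importIndexKey(raw: Uint8Array): Promise<CryptoKey> {
  return crypto.subtle.importKey("raw", raw, { name: "HMAC", hash: "SHA-256" }, false, ["sign"]);
}

// Deterministic, keyed index for equality search. Normalizing first means
// " Grace.Hopper@Example.com " and "grace.hopper@example.com" match.
async function blindIndex(key: CryptoKey, value: string): Promise<string> {
  const normalized = value.trim().toLowerCase();
  const mac = new Uint8Array(
    await crypto.subtle.sign("HMAC", key, new TextEncoder().encode(normalized)),
  );
  // Hex-encode the 32-byte MAC so it fits an ordinary text column.
  return Array.from(mac, (b) => b.toString(16).padStart(2, "0")).join("");
}
```

To search, the client computes `blindIndex` for the value it's looking for and sends only that hex string in the query; the encrypted column itself is untouched.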
There are some other technologies you may have heard of.
I don't know if anyone here has heard of homomorphic encryption by any chance?
Hands up who's heard of homomorphic encryption?
I think maybe two or three or four.
So just to give you a sense of it, homomorphic encryption is about a million times slower than searchable encryption.
And so if you were to run a query over a table with a few hundred thousand records encrypted using homomorphic encryption, it would take about five hours.
Whereas searchable encryption typically adds less than ten milliseconds of overhead to your existing plaintext queries.
So it's very practical.
And, most importantly I think, at least for industry adoption, it still works with existing databases, or most of them anyway.
And you don't have to have some exotic, new kind of system, you don't have to migrate all your data to a new database, it works with the technology that you've got available today.
The final question.
Is encryption in the browser practical now?
Kinda, almost.
We're nearly there.
I think it depends on your use case, and I'd encourage you to think about it.
I wouldn't necessarily say to you, go out and implement this in your application tomorrow, but as you start to think about projects you're working on, or security requirements, or what have you over the next few months, or the next 12 months, keep this in your mind as an approach you might want to consider.
If you've got relatively simple requirements and you only need to look up records say by an ID, then the example that I've shared with you would be very appropriate.
That means using Amazon's KMS, their Encryption SDK and the AssumeRoleWithWebIdentity call I talked about.
On the other hand, if you need to query the data as well, and if all of that sounds way too hard, which it certainly did to me, then you could wait for the CipherStash Web SDK, which is coming soon.
Obviously I'm plugging my own stuff, but there are a couple of other companies around that are doing this.
Baffle is another example you might want to check out.
But I certainly think this is the way the industry's going.
Being aware of it, and starting to think about how it might be useful in whatever projects you work on, is, I think, a sensible approach.
So I've shared some resources, there's a little camera emoji there, so if you want to take a photo of this, if you want to learn more, I'll leave it up there for a few seconds.
There's a link there to the AWS Encryption SDK, which is a really great set of libraries.
You can learn about how to federate JWTs in AWS, which actually has lots of other use cases beyond this.
Dealing with IAM is not for the faint-hearted.
It's table-flipping worthy.
And then I've pushed my sample code that I've shown you today to a repo at cipherstash slash wd 2024 demo.
I'm buying time so you can take a photo.
Thank you very much and please don't be a stranger.
I'd love to talk to you more and answer any questions you might have.
If you are interested in this kind of stuff and want to stay ahead of what we're building at CipherStash, please follow us on GitHub at github.com/CipherStash.
You can connect with me on LinkedIn.
I've given up on Twitter.
Yes, I still call it Twitter.
Or you can email me, good old fashioned email.
Love to chat to you.
Thanks very much.