Random

Lachlan Hunt at Summit '19

Transcript

(upbeat tempo music) - I am Lachlan.

I'm going to talk to you today about Random. So, in the physical world, there are lots of ways you can generate a random number. You can flip a coin, roll some dice, pick some numbers from a bag.

But have you ever wondered how do computers generate random numbers? What goes on behind the scenes when you call Math.random in your webpage? And is it really as random as it appears? These are questions I hope to answer with this talk as I take a deep dive into randomness and how it works into computers.

So computers are inherently deterministic.

In normal circumstances, given the same input they will produce exactly the same output.

So the question is, how do you get random output? The way you do that, is by gathering unpredictable input. This unpredictable input is called entropy and typically comes from various hardware sources. In practise, this comes from a combination of timings, signals and other measurements from hardware components.

Data from the entry pool is used to provide the seed value.

It's the initial value that is then processed by the algorithm to produce a stream of computationally random output.

For some applications, the seed value is frequently updated as new values are produced and additional data becomes available from the entry pool. There are many mathematical algorithms designed to manipulate the seed in ways to produce a stream of statistically random numbers. These are known as random number generators, or RNGs. There are many different types of RNGs that suit different needs.

Physicists will tell you the only true source of randomness occurs in quantum mechanics and thus any random number generator must be based on quantum measurements.

But from my understanding of quantum physics, it requires sacrificing too many cats.

(laughs) More practical RNGs are known as Pseudorandom Number Generators or PRNGs.

PRNGs rely on mathematical algorithms.

Some are considered weak, some are strong, some are fast, some are slow.

Different algorithms and implementations are designed to meet different needs for randomness. For example, random images on your webpage doesn't have the same requirements as generating your SSH key.

For the latter, that's a class of PRNG known as Cryptographically Secure Random Number Generators. These fundamentally follow the same underlying principles as other PRNGs, but they're designed specifically with the needs of cryptography in mind.

PRNGs may be classified according to many different criteria.

Some of these include their predictability, how fast they can produce random numbers and whether or not the output is verifiable or reproducible. Depending on the application, other criteria may also apply. So let's take a look at a simple random number generator just to illustrate these principles.

The middle-square method was first presented about 70 years ago at a mathematics conference.

It is not state of the art today, but it's sufficient for the basic understanding of how they work.

So given the initial seed value in this case a four-digit number, it can be any length but that's sufficient. We take the square of this number and then we extract the middle digits by truncating it from either end.

So this is our next random value, it also becomes the seed for the next value in the sequence. So square it again, truncate the value, and that's our next value.

So if we repeat this pattern over and over again, you get a stream of random numbers, or seemingly random numbers.

But as you can see it's not quite perfect.

Those last four digits I've highlighted in red will repeat over and over again if you square 2100 you get 4100 and so on.

This is called the periodicity it's going to repeat over and over again.

All random number generators have a periodicity. The period of a PRNG is the number of values it can produce before it begins to repeat.

All good PRNGs are designed to have a very long period. So that's a terrible PRNG, don't use it.

(laughs) So some PRNGs are biassed toward outputting certain values more than others. A good PRNG will output values with approximately equal distribution, but some are certainly better than others.

Some PRNGs are also better at concealing their internal state.

So the middle-square method used the internal state and the output value was the same thing.

Others have a separate internal state that's distinct from the output.

The most common PRNG in Javascript is exposed via Math.random.

This has been available since the beginning so let's take a look at how it works.

You invoke it in the console and it will just output a random binary number between 0 and 1.

The ECMAScript classification defines that returns a number with the positive sign greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm. Uniform distribution basically means this, it includes 0 and all the possible values that Math.random can produce are uniformly distributed between 0 and 1, but 1 is not a possible value.

Prior to 2015, different JavaScript engines used different algorithms.

Many of these were very weak, and suffered from many of the same problems I outlined with the middle-square method.

In November 2015, an article by Mike Malone on a medium called Today I Fucked Up by using Math.random, he gave a fascinating tale about how they were using Math.random to generate what they thought were unique IDs.

But due to serious flaws with Chrome's algorithm, the chances of clashes were significantly increased. It turned out that due to the way they were converting the floating point number to a integer, it effectively only gave them 16 bits of randomness which was not enough and gave it a very short period.

Just a few weeks later in December 2015, the V8 blog announced major changes to their algorithm. They acknowledged the issues from the article and took it as an opportunity to completely rethink how their Math.random implementation worked. They picked an algorithm that was much better suited to the task at hand but it was not only the V8 team that was paying attention. Over the next few weeks all of the major browser engines had switched their implementations to use an algorithm called Xorshift128+.

The Xorshift generators are among the fastest non-cryptographically-secure random number generators, requiring very small code and state.

It has a maximum period of 2 to the power of 128, reducing the chance of repeating sequences. It's very fast due to its use of bitwise operations, and it passes many of the randomness tests that are used to verify how uniform the distribution is, and other statistical measurements to check for randomness. All JavaScript engines take their initial seed from the operating system.

On Mac OS this is dev/urandom, similarly on Linux, and on Windows it would be other crypto APIs. And in browsers it's also reseeded every page load. But I always find the best way to learn about how algorithms work, is to try to implement it myself. So I wanted to implement Math.random in JavaScript without using Math.random.

So, the way it works, is it stores two internal 64-bit state variables.

It uses very fast bitwise XOR and shift operations and it adds the two internal state variables together to produce a final output value on each iteration. To implement the feed, I needed a way to represent 64-bit integers. Normal JavaScript numbers don't allow me to represent 64-bit numbers. So JavaScript has recently introduced a big integer type. So the type of 1n is the "bigint".

So bigint numbers are represented by appending the suffix "n" to it.

So to represent 64-bit number in a normal case, it's an inaccurate result, but if we do the same calculation with the big integers we get an accurate result.

Also we need to know the AND XOR and the Left Shift and the Right Shift operations. With these concepts we have everything we need to implement our own XOR Shift algorithm in JavaScript. Let's take a look at how it works on a higher level. So we start with our 2 internal state variables, state 0 and state 1.

When invoked, these state variables are assigned S1 and S0 respectively.

Next, we apply some Bitwise XOR and Shift operations to these variables.

Then we uptake our internal state and finally we add together our state to produce a single, 64-bit output value.

If we implement this in JavaScript, we start by first defining some constants for the bit masks. These are used to constrain the output to 64 bits. Next we define our state variables and initialise them to some random value.

So to get random seed functions, not shown, but in a real implementation this is where the browser would read from the operating system dev/urandom.

Then we define our xorshift128Plus function. At the end of the function we're going to add together the 2 state variables, state 1 plus state 0 and take the result modular 2 to the power of 64, which then constrains the value. So if it exceeds 2 to the power of 64, it will wrap around to 0 and so on.

We assign state 0 to s1, and state 1 to s0. Then we perform some XOR and Shift operations for these numbers.

So the constants there 23, 17 and 26, they're defined in the Xorshift specification and those are the values that are used by the browsers. There are other options, but since they're the ones used by other browsers, that what I used.

So that's the core of the function where you're manipulating the state s1 gets updated and s0 was the value of s1.

Finally these are then reassigned to our global state0 and state1 variables, ready for the next iteration, and that's pretty much it. If we now run xorshift function in the console, we start getting our random numbers output. But you might be thinking these are integers, Math.random is supposed to return a floating point number between 0 and 1.

You're right.

All we need to do, is convert from unsigned 64-bit integer into a floating point value in the correct range. There are a couple of different strategies to use here, with slightly different results.

The approach used by Firefox produces up to 53 bits of information, whereas Chrome provides only 52 bits of information. But the approach used by Firefox is slightly easier to explain so that's the one I'll do. So, we take our 64-bit unsigned integer represented here by the green boxes.

The first 11 bits are always going to be 0, they mask the value to 0.

So the minimum value 0 is represented by all 0s obviously. The maximum value is represented by setting the lower 53 bits to 1.

All other possible values are going to be randomly distributed within this range and appear somewhere on that number line.

If we then divide that number by 2 to the power of 53, we'll get a result between 0 and 1.

If we take for example this number, about 1.8 quadrillion. If we divide this number by 2 to the power of 53, we get a result of about 0.203.

Just for comparison, the maximum possible value output is about 9 quadrillion, will result in a random number of about 0.9999. It's not possible to get 1.0 out of Math.random. So now that we've implemented our own Math.random implementation, let's see if we can use that to then predict random number values from Math.random.

An article published in May 2016, "Hacking the JavaScript Lottery", did just this. This was published by Douglas Goddard and titled "Hacking the JavaScript Lottery".

It describes a simulation lottery in which it was using Math.random to generate the lottery numbers.

So the idea is that by capturing a small number of random numbers from Math.random and running it through a mathematical algorithm, we can then calculate what the initial seed values must have been.

So the code provided in the article was written in python and used a utility provided by Microsoft Research called "Z3 prover".

It's basically a mathematical logic prover system it's beyond the scope of this talk of how it works but it's fascinating.

We start by taking the value from Math.random, we multiply it by 2 to the power of 53 which gives us our initial number that Firefox is using, the non-floating-point version, but we need 3 of those values.

We then pass that into the Z3 solver algorithm and then that will output to state 0 and state 1 values that we can then use to pass into our own Xorshift implementation.

We're going to do a bit of a demo.

On the left, we have Firefox which will just output some Math.random, this is the raw values from Math.random.

On the right, we have my own implementation of... My own implementation of Math.random which I can pass my own state values into.

Every time I get a new value it shows me the new internal state.

By taking 3 values from Firefox and passing that into my python utility, we read from the clipboard, pipe it into the python script, and then pipe the output back into the clipboard. So we run this...

As you can see it took my input values, 0.26, whatever, and this runs anywhere from 20 seconds to a minute. So this is going to be a bit random.

(laughs) Okay, let's hope it works.

Usually it's around 20 seconds.

(mumbles speech) So, here we have our two state values, I don't know if you can see that but they've been copied into the clipboard for me. So if I paste those values here, and get seeded random value, we'll see that the values that Chrome is now giving me, match exactly what Firefox gave me.

(clapping from audience) It gets better, so if we keep going, we then predict what Firefox will then give after that. Three five, six seven one, five six four.

(laughing) Yeah, so Math.random is predictable given sufficient input. (mumbles speech) So, what if you need a cryptographically secure random number generator? For situations where you need a more robust source of randomness, there are alternatives to Math.random.

In Node.js there's a crypto module that you can import, in browsers there's the Web Crypto API which among other things provides crypto.getRandomValues and other more specific functions for cryptographic applications.

Unlike Math.random which merely takes its initial seed value from the operating system, this reads all values directly from the operating system's random source which has been designed to be much more secure. The benefits of the API, unlike Math.random, it's not possible to predict future output with the current level of technology.

All that is read directly from the OS from dev/urandom, on Linux, on Mac and whatever, and they also provide much more random data with the ability to populate up to 64 kibibytes of data. Using the API is simple.

We start by creating a typed array, an array of 8-bit integers.

Then we call crypto.getRandomValues which populates the typed array and then for the example we can log it to the console which shows nice random numbers that we can then use for our application.

So how is the CSPRNG more secure? The key is in entropy accumulation.

When I said earlier...

I talked earlier about the entropy source, the operating system is constantly gathering new random input from various measurements. So it could be thermal measurements, it could be your keyboard timing, your other user input. It could be all sorts of thing that are then being combined within the system to continually update that entropy and use that to reseed. It's also built using much more rigorous cryptography-based functions like AES and Charter 56 for example.

The different operating systems use different algorithms but we're not going to go into that.

So when it comes to choosing randomness, not all random functions are created equal. Different applications will have different requirements. We've covered the two functions that are widely available in JavaScript, two web applications Math.random and crypto.getRandomValues.

Understanding the difference between these two can help you to decide which one, if any, is right for your application.

Or perhaps you'll need to find some other features provided by third-party libraries such as explicit seeding or weighted selection. If there's one thing for certain, your choice of random function is one thing that should not be random.

Carefully consider what you want out of your application and the features are important to you.

But if you only get one important takeaway from this, you should at the very least understand why you should never use Math.random for security. Thank you.

(audience clapping) (upbeat tempo music)