Machine learning in the browser

Machine learning has really come of age thanks to ever-increasing compute power and access to troves of data. What if we could respect users' privacy and run it on their device without sending data to a central server? Introducing TensorFlow.js, a JavaScript implementation of Google's very popular ML library: you can run it in inference mode, or even train a model if you'd like.

In this talk we’ll explore what’s currently possible and some potential use cases.

(cheerful bouncy music) – Yeah so, I don’t have caticorns, or cute dogs to show.

But, my pants do match my slides (laughing) So yeah, I’m gonna talk about machine learning in a browser. So this is kind of a new area for client-side machine learning.

And there are some really interesting, fun things we can do, and there are practical things that I'll show as well. These are all possible today.

I’m Ryan Seddon, you can find me on all the online places under @ryanseddon.

if you wanna tweet me — I'm gonna share all my slides and stuff on Twitter after this. Just a quick overview of what we'll cover: I'm gonna look at machine learning at a high level, just to level set for everyone so we know what we're talking about when I refer to machine learning.

A lot of people have heard the term but don’t specifically know what it means and what’s involved.

Also, there's an article by one of the speakers, Charlie, on Smashing Mag — a really good primer on machine learning in the browser that you should read. And I'll share that link after as well.

And then we'll look at some demos and what's possible today. What's not covered: we're not gonna talk about training your own models — that's creating a whole new machine learning model that learns something new — or transfer learning, which is about taking an existing model and adding new capabilities on top of it.

I'm gonna be talking mostly about TensorFlow.js, which is Google's machine learning library for the browser.

I'm not gonna go into specifics of the API, 'cause it gets pretty complicated and math heavy, and I'm not gonna show any slides of code. So it's gonna be high level.

So yeah, so machine learning has a pretty high barrier to entry.

You’ll see a lot of terms like linear algebra, gradient descent and polynomial.

If you try to read a beginner article on it, you kind of get that reaction — you're like, I don't get it. It's really hard to get into.

I know everyone’s really quite interested in it. I was really quite curious.

And reading all those kinds of terms without a heavy math background, you just look at it strangely and you're not sure how to progress from there to actually using the library.

So I had to understand how it all works.

And so yeah, it’s math all the way down.

If you really want to dig into it, you have to understand some of the math terminology. I like this tweet, because a lot of people talk about how they have AI or machine learning in their products.

And really, it's just revealed as a bunch of if-else control flow — master marketing, that is AI. But if you look at the visualisation there, this is a very simple visualisation of a certain type of machine learning algorithm called a deep neural network.

And I say simple, because if it were a real one, there wouldn't be enough real estate on the screen to show it — it'd probably just be a black blob with all these lines connected.

That's kind of the process.

So if you look at the input layer, you can think of each node as, say, one pixel in an image.

And then the hidden layer is where the machine learning magic happens.

It's a lot of linear algebra and calculations happening across all those pixels, and it figures out, for example, whether there's a cat in the image. The output would be a series of probabilities, or a single probability saying, yeah, we're pretty confident this is a cat, or we're not very confident this is a cat. There are different kinds of outputs you can get from a machine learning algorithm.
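That input → hidden → output flow can be sketched in a few lines of plain JavaScript. The weights and biases below are invented for illustration — a real network learns them from data:

```javascript
// A toy feed-forward pass: inputs -> one hidden layer -> one output probability.
// Weights and biases here are made up; training is what would find real ones.
const sigmoid = x => 1 / (1 + Math.exp(-x));

function dense(inputs, weights, biases) {
  // Each output node is a weighted sum of every input, plus a bias.
  return weights.map((row, i) =>
    sigmoid(row.reduce((sum, w, j) => sum + w * inputs[j], biases[i]))
  );
}

// Pretend these four numbers are pixel intensities.
const pixels = [0.9, 0.1, 0.8, 0.4];

// 2 hidden nodes, each connected to all 4 inputs.
const hidden = dense(
  pixels,
  [[0.5, -0.2, 0.8, 0.1], [-0.3, 0.9, 0.4, -0.5]],
  [0.1, -0.1]
);

// 1 output node: "probability this is a cat".
const [catProbability] = dense(hidden, [[1.2, -0.7]], [0.05]);

console.log(catProbability); // a number between 0 and 1
```

A real image classifier does exactly this, just with millions of weights and many layers.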

I like this XKCD comic.

You pour your data in, you stir it around, and if you don't get the right answers, you keep stirring.

So if we try to level set on what machine learning is: it's a process where computers learn by discovering relationships in data.

So you have a lot of data from somewhere — a big company like Google has lots and lots of data that they collect.

So they can create really interesting models from that. And the model is a description — it's kind of the output of those discovered relationships, the packaged-up machine learning thing. And then you can take that model and do what's called inference.

And that's applying new data to that model, and you should get something out of it. So for example, you could ask, does this image contain a cat? The machine learning model will break apart that image and analyse it, and then give you a probability, or a distribution, of whether it's a cat or not, and you get the output from that.

And if we try to compare it to classical programming, so, classical programming is what we do today. This is a really good diagram.

So for the classical one, you've got rules and data. You programme around those, and you get answers out of that.

If we think about machine learning, we actually feed in the data and the answers — we know the answers ahead of time — put it into our machine learning black box, and we get rules out of it.

So then when you feed in new data, you should get the answers you expect from it. The idea is that you train it — you give it the answers, and it learns what the right answer is.

And the rules come out in the form of what we call the model.
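That contrast fits in a few lines. Below, a hand-written rule versus a "rule" recovered from data and answers — a deliberately tiny stand-in for real training, where the only thing being learnt is a threshold:

```javascript
// Classical programming: we write the rule ourselves.
const isHotClassical = tempC => tempC >= 30;

// "Machine learning", minimally: given data and answers, search for the rule
// (here just the best cut-off point) rather than writing it by hand.
function learnThreshold(data, answers) {
  let best = { threshold: 0, errors: Infinity };
  for (let t = 0; t <= 50; t++) {
    const errors = data.filter((x, i) => (x >= t) !== answers[i]).length;
    if (errors < best.errors) best = { threshold: t, errors };
  }
  return x => x >= best.threshold;
}

const temps  = [10, 25, 28, 31, 35, 40];
const labels = [false, false, false, true, true, true]; // the known answers

const isHotLearned = learnThreshold(temps, labels);

console.log(isHotLearned(33)); // → true
console.log(isHotLearned(20)); // → false
```

Real training replaces the brute-force loop with gradient descent over millions of parameters, but the shape is the same: data plus answers in, rules out.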

So that seems pretty complicated, but don't despair — TensorFlow.js makes it fairly easy. And there are a few other libraries that make it even easier than TensorFlow itself to start consuming pre-trained models. They have a lot of pre-trained stuff that you can start using now.

And you don’t need a PhD in math for the really complicated stuff.

So we’ll have a look at some examples.

So we take this scenario.

We want to know whether the following image is a cat. We can get a probability of whether it's a cat or not, and we can also get a distribution over what species of cat it is. So, this is running in the browser now.

I've precomputed it ahead of time, so it doesn't crash.

You can see here it’s given, a distribution of probabilities of cats.

So the top one, two, three, four, five predictions are all cats, and it's about 92% sure that's a cat.

So it's pretty confident it's a cat, and then it tries to give you a probability of which species it is, from a predefined set of labels. So it's about 70% sure it might be an Egyptian cat.

I don’t actually know what cat that is.

I assume it’s an Egyptian cat.

Hopefully, but the machine learning’s only as good as what it’s trained on.

So if it's only trained on a small subset of data, you won't get results as fine-grained as what we're seeing here.
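Image classifiers like the one in this demo typically hand back an array of label/probability pairs — MobileNet-style `{className, probability}` objects (the exact numbers below are invented to mirror the demo, not real model output). Picking out and formatting the top results is plain JavaScript:

```javascript
// A classifier's raw output: one probability per label.
// Invented values for illustration.
const predictions = [
  { className: 'Egyptian cat',  probability: 0.70 },
  { className: 'tabby',         probability: 0.15 },
  { className: 'tiger cat',     probability: 0.07 },
  { className: 'lynx',          probability: 0.05 },
  { className: 'space shuttle', probability: 0.03 },
];

// Sort by confidence and keep the k most likely labels.
function topK(preds, k) {
  return [...preds]
    .sort((a, b) => b.probability - a.probability)
    .slice(0, k)
    .map(p => `${p.className}: ${(p.probability * 100).toFixed(0)}%`);
}

console.log(topK(predictions, 3));
// → [ 'Egyptian cat: 70%', 'tabby: 15%', 'tiger cat: 7%' ]
```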

Another cool one that Google provides is creating the bokeh effect — if you've got portrait mode on your phone and you take a selfie, it blurs out the background. We can do that in the browser, using one of the machine learning models, called BodyPix. Oops, I scrolled ahead there.

Yeah, okay, so BodyPix does a lot more than just blurring your background. Basically, it segments the human body in an image or a video — so it can do live video — and tells you where the person is, where their arms are, where the head or the torso is. For our demo, we just want the outline.

We just wanna get the outline.

So you can see here, this is a GIF they've got on their GitHub page, where you can get the JavaScript to do that. You can see it's blacking out the body, and then it's showing the segments — it can pick out the forearms, the whole arm, the torso, the right and left legs and the head. So there's more interesting stuff than just blurring. You can create a mask around a person, or blur out people in an image.

Do pretty cool stuff.

So — I think the slide's out of order — we've got an image here. It's a pretty famous image of Buzz Aldrin on the moon, from Apollo 11.

And then we’ve got the mask.

That's created by the machine learning model — it's gone and detected where it thinks the human is, and outlined that. It looks a bit janky there, but it's roughly a correct outline. So then we try to blur the background.

So you can see there, I can take that mask, and I can apply that over the image and kind of do that blurred out background and focus on just the astronaut.

And that works pretty well.

Even though he's wearing a big helmet — you can't really tell whether that's a head or not — it's done a pretty good job of figuring that out.
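The compositing step itself is simple once you have the mask. Here's a sketch with plain arrays, where a segmentation model (BodyPix, in the talk) would supply the per-pixel person/background mask:

```javascript
// Per-pixel compositing: keep the person sharp, swap in the blurred background.
// `mask[i]` is 1 where the model thinks a person is, 0 elsewhere.
function composite(original, blurred, mask) {
  return original.map((px, i) => (mask[i] === 1 ? px : blurred[i]));
}

// Toy 2x2 "images" as flat arrays of grey values.
const original = [200, 180, 90, 40];
const blurred  = [150, 150, 100, 100];
const mask     = [1, 1, 0, 0]; // person in the top row only

console.log(composite(original, blurred, mask)); // → [200, 180, 100, 100]
```

In the real demo the same idea runs per RGBA pixel on a canvas, with the blur done by a filter before compositing.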

And that's all right in the browser — it's not sending data up to a server; it runs locally, using the TensorFlow.js library. Okay, another really cool one that I really like is called style transfer.

And this is using a library on top of TensorFlow.js, called ml5.js, which makes it even easier to work with machine learning models.

And it’s really beginner friendly.

It gives you kind of jQuery-like functions, but for machine learning. What style transfer does is take a famous painting, or some style of photo, and try to apply it to an input image — a photo of yourself or of something. It takes that and applies the style to it.

But just a side note: about three years ago, I remember playing around with something similar. It's not quite style transfer, but it was called Deep Dream.

And Google was trying to visualise what was happening in their machine learning models. They weren't trying to create images, but they ended up creating these really funky-looking images.

I remember I had to load up an Amazon Web Services EC2 instance with a dedicated GPU, and I had to set up all this Python code.

And it was like, I had no idea what I was doing.

But somehow I got it working.

And it ended up costing me $15 for like seven minutes' work, because it's quite expensive to do that. And that was kind of the output I got — that was 2016.

So, it's a really cool image. If you can't quite tell, that's Buzz Aldrin on the moon — the same image I had before.

With style transfer, you can take a famous painting — everyone knows The Scream, everybody's seen it. (Someone pointed out the other day that it kind of looks like a poorly painted spaniel dog — look at the ears, and the nose — you can't unsee it. It could be a screaming person, it could be a dog. I can only see a dog now that someone's pointed that out.)

So we take that input image, the Apollo 11 photo, and we transfer over that style. You can see it's transposed The Scream's artistic impression, which the machine learning model has learnt, onto that image, in the browser. And it actually looks like a painting now — the ground kinda looks like brushstrokes, it's got the colours in the background — it's a really effective interpretation of that image. And that's all in the magic of the machine learning box.

So I guess there are three points on why you would do machine learning client side. The first, to me, is privacy: you're not sending your photos or video to some random server, where you don't know what their data-retention policies are or who they're selling your data to.

And there have been a lot of incidents — there have been some Chinese apps and some Russian apps where you take a selfie, and it seems like harmless fun you can share with friends, but then you find out they're actually selling it to advertisers and facial recognition systems and all that kind of nasty, nefarious stuff. Whereas if you do it on the client, there's no talking to a network — you can switch off your WiFi and your cell connection if you're really paranoid, and it will still work.

Another benefit of having it run on the client is that you get performance and latency improvements. You're not doing a round trip over the network, and it's not doing all the computation on a computer out in the cloud somewhere and then sending it back — you get that fast feedback loop on the client.

And I think the cool thing about the web is that it's cross-platform: as long as it can run a browser, you can run that machine learning model — on a mobile phone, on a laptop, Windows, Linux, Mac, it all works. But one thing to point out: TensorFlow.js models are quite large by web standards, and we're used to trying to keep applications under a certain size. These models will blow that away straight away. But it's all in the delivery — you wouldn't necessarily bundle a model up with your package; you'd choose the right moment to download it and tell people ahead of time.

So if you look at a network trace of just the BodyPix demo, you can see it's got this model.json.

And that then tells it to download one of many binary files, which actually contain all the model weights for the machine learning to work.

And that’s roughly about 5 megabytes.
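That size comes from the weight specs listed in model.json. In the TensorFlow.js format, each entry in `weightsManifest` describes tensors by shape and dtype, so you can estimate the download before fetching any shards. The manifest below is a made-up miniature for illustration, not BodyPix's real one:

```javascript
// Estimate total weight size from a TensorFlow.js-style model.json manifest.
const BYTES_PER_DTYPE = { float32: 4, int32: 4, uint8: 1 };

function totalWeightBytes(manifest) {
  return manifest
    .flatMap(group => group.weights)
    .reduce((sum, w) => {
      // Bytes = number of elements (product of the shape) x bytes per element.
      const elements = w.shape.reduce((a, b) => a * b, 1);
      return sum + elements * BYTES_PER_DTYPE[w.dtype];
    }, 0);
}

// A tiny invented manifest: one conv layer's weights and biases.
const weightsManifest = [
  {
    paths: ['group1-shard1of1.bin'],
    weights: [
      { name: 'conv1/weights', shape: [3, 3, 3, 16], dtype: 'float32' },
      { name: 'conv1/bias',    shape: [16],          dtype: 'float32' },
    ],
  },
];

console.log(totalWeightBytes(weightsManifest)); // → 1792 (432 + 16 floats, 4 bytes each)
```

Scale the same arithmetic up to millions of float32 weights and you land in the megabytes quickly.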

This is actually one of the smaller ones — they can get up to about 100 to 200 megabytes. If you were to do speech recognition, for instance, that could be gigabytes, and you obviously wouldn't run that on the client, because it's a lot of information — you need to understand the whole English language, let alone more languages.

So there are use cases where it makes sense to bring it down to the client.

And use cases where you have to go onto a cloud service to do it.

It's still very early days. TensorFlow.js started out as this thing called deeplearn.js, just as an experiment to get it running, and then it got folded into the TensorFlow umbrella.

So it's still fairly new and still maturing. They're on a stable version now — I think version one point something, maybe even two — and they're trying to keep it in lockstep with the actual TensorFlow library that people use for serious machine learning. Also, Google just recently announced a platform called AutoML, which is kind of like infrastructure for training machine learning models, and it can export to TensorFlow.js.

So, they've had some really good examples of people using it in the real world, for genuine use cases where connectivity isn't great — they have a browser running a locally cached model, and then they can do image detection in remote areas. There are some pretty good examples of that.

And so, soon you'll get all the cool models that people are working on that you can't quite run on the client yet.

And you’d be able to do AutoML, then export it to TensorFlow.js and start experimenting with it.

Where this could shine: you could do a progressive web app — a PWA — using service workers, and cache that five-megabyte model offline. So it's a one-off hit for the user initially, then it works offline from there, and they can continue to get that benefit locally in their browser.
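A sketch of that idea — a service worker serving model files cache-first, so the shards download once and come from the cache after that. The URL patterns and cache name here are assumptions for illustration:

```javascript
// Decide whether a request is for a model asset we want cached offline.
// Matches TensorFlow.js-style files: model.json and binary weight shards.
function isModelAsset(url) {
  return url.endsWith('model.json') || /shard\d+of\d+\.bin$/.test(url);
}

// Cache-first fetch handler; only wires up inside a service worker context.
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  self.addEventListener('fetch', event => {
    if (!isModelAsset(event.request.url)) return;
    event.respondWith(
      caches.open('model-cache-v1').then(cache =>
        cache.match(event.request).then(
          cached =>
            cached ||
            fetch(event.request).then(response => {
              cache.put(event.request, response.clone()); // pay the ~5 MB cost once
              return response;
            })
        )
      )
    );
  });
}

console.log(isModelAsset('https://example.com/model.json'));          // → true
console.log(isModelAsset('https://example.com/group1-shard1of4.bin')); // → true
console.log(isModelAsset('https://example.com/app.js'));              // → false
```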

And the other cool thing is you can train a new model off an existing one, that’s called transfer learning. Google’s got a really cool website called Teachable Machine.

If you just Google that, it should probably be the first result.

And that shows you how to take a machine learning model and then train it with webcam input.

So you can train it to steer a thing around by moving your head, and it kind of learns what left is.

It takes a few snapshots of, say, left and right, and you can train a new model on that. And you can start to do interesting stuff.

More recently, Google also released a thing called Creatability, which is about making creative tools more accessible — for people who may not be able to use a mouse, or people who can't hear. They've got stuff around visualising sounds, so people with partial or complete deafness can visualise and see what the impact of sound is.

In particular, they've got ways you can track your nose and play musical instruments.

So you turn your webcam on, you can turn your head and start playing like a piano that’s displayed on the screen.

It's really cool to see that, and it's fun for anyone to use.

But it's also got a tangible benefit for people who might not otherwise get that creative outlet.

And you can see a GIF there.

So, if you were to load up a demo, you’d be able to play sounds and it kind of tracks your nose.

And you can then move it up and down, to change the pitch and change what note to play. And that's all done locally, using TensorFlow.js.

But a warning if you do try that: I think the model to do that is about 100 megabytes.

It’s quite large.

Beyond TensorFlow: TensorFlow is kind of the dominant machine learning library that a lot of people use. At Zendesk, where I work, the very smart people with PhDs use TensorFlow to do a lot of the machine learning stuff for our products. But the other big tech firms are also getting together.

And they don't want Google to have a monopoly. So they're collaborating on what's called the Open Neural Network Exchange, where you can run multiple different machine learning formats consistently across different platforms.

It essentially contains all the big tech firms except Google.

So, you know, Amazon, Facebook, Apple, Intel — those kinds of people. They've also got a browser version, called ONNX.js, but it's not as polished and the documentation is not as good.

Interestingly, they, ship with the ability to run the machine learning in a web worker.

Now, TensorFlow.js has recently added this too, but if you run any sort of machine learning in the browser, because it's so computationally heavy, it actually locks up the main thread, and you can't scroll or do anything until it's finished. Whereas if they run it in a web worker, they can put it on a separate thread.

And that's kind of rolling out to all the libraries as they look to improve performance.

Another interesting thing I found is what's called the Web Neural Network API. This is a proposal by Intel to bring machine learning inference natively to the browser. At the moment, all the libraries rely on a hack on top of WebGL: they spin up a hidden WebGL canvas.

And then they do all the complicated matrix-multiplication math using shaders from there, because it's similar to graphics math.
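The core operation being offloaded to the GPU is just matrix multiplication at large scale. In plain JavaScript it looks like this — and running millions of these multiply-adds on the main thread is exactly why it's slow there:

```javascript
// Naive matrix multiply: the workhorse operation of neural network layers,
// which libraries push to the GPU (via WebGL shaders) for speed.
function matmul(a, b) {
  const rows = a.length, cols = b[0].length, inner = b.length;
  const out = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++)
    for (let j = 0; j < cols; j++)
      for (let k = 0; k < inner; k++)
        out[i][j] += a[i][k] * b[k][j];
  return out;
}

console.log(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]));
// → [ [ 19, 22 ], [ 43, 50 ] ]
```

A GPU does the same arithmetic, but computes every output cell in parallel instead of three nested loops.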

So Intel want to make it a lot faster and be able to run machine learning models natively in the browser.

They have a custom build of Chromium with the API enabled, which is obviously more performant.

And they also have a polyfill that you can use today.

And the benefit of that is you can use a lot of machine learning models that haven't been converted to TensorFlow.js in the browser, so you can do more interesting things than what they're starting to show with TensorFlow. I'll share this later, but you can find it on GitHub — the group is called webmachinelearning, and it's fairly new.

Some resources: tensorflow.org/js; ml5.js, which is the library I used for the style transfer — a friendlier abstraction of TensorFlow.js that makes it a little more approachable for beginners. And I've got some demos as well.

A link to the ONNX.js library, and this notebook, which will make sense in a second. Let me escape out of there.

I'm using a tool called Observable, which is kind of like — if you've ever used Python notebooks, where you can do computation on data — this is the browser version of that. So, as per usual, I thought I should build my slides in it, because I'm insane.

So you can write code and have visualisations of it. And if you wanna know more about it, come chat to me after. That's it, thanks.

(audience applauding) (bouncy music)
