How to AI in JS?
(upbeat electronic music) (audience applauds) - Hello, is this on? Yeah? Hi, hello, hello everyone.
Hi.
All good.
So welcome to this talk.
To How to AI in JS.
Thank you very much for the introduction.
My clicker, again, not working.
Oh, no it is.
Yeah Asim Hussain, you can find me on Twitter as Jawache.
Not Jowashe, Jawache.
I blog about Angular, JavaScript and other things on my website codecraft.tv.
Asim.dev is almost ready, about to be released. I'm a cloud developer advocate at Microsoft, working on the (speech too fast) We are the open source advocates, so you can find us all via this link there. Aaron, who spoke to you before.
He's also one of our wonderful advocates, based in Australia. Which is the country that I'm in right now. Anybody here heard of Udemy? Yeah, any of my students on Udemy? Come on.
(audience laughs) Not even one.
There's usually one, like I'm not the most popular instructor on there.
But early last year, they added automatic subtitling to Udemy. (audience laughs) And so, they correctly transcribed my name to Awesome. For a while I changed my ID in our instant messaging platform to Awesome. So if you had to message me, you had to type @Awesome.
But my boss at the time quickly got me to change that. And I also co-organize, or co-run a meetup group in London called AI JavaScript with my friend Eleanor Haproff.
Early last year, we basically went on a hike. We both have an interest in machine learning and JavaScript.
There were some interesting things happening at that time.
We saw that and thought, why don't we start a meetup group? And that's what we did.
And very soon after it some very very interesting things happened. And it's kind of grown a lot.
So I think now we're at like 1700, 1800.
I can't remember.
There were quite a few meetups over the last year. And what we discovered in those meetups, is people would often come to us afterwards and go, "hey Elle, Asim, there's this really cool demo online that demonstrates kind of AI machine learning in the browser or with JavaScript." And we said, "Oh this is really great, thanks." And eventually we thought well why don't we just stick all these things together and put it in a website? And that's what we did.
So we launched a website called aijs.rocks. So if you go to aijs.rocks, what you'll find is a whole bunch of different applications. All written in JavaScript, all expressing some sort of feature of machine learning or AI.
And it's all in JavaScript, so most of it is still stuff in the browser so you can always kind of play around with it. And then at the very least you'll have some code you can look at so you can see how it was built, or hopefully a blog article. Or maybe even a video, or something like that. So the goal of the website was to kind of inspire you with what's possible.
And then give you a journey to kind of figure out how it was built and learn a bit more about it. And this is how I've kind of understood more and more about the space, was finding something cool, and going, "how the hell did they build that?". And then through that journey, learning and figuring out, right.
And that's what I'm doing in this talk today. In this talk what we're going to talk about is four different... four? Where did that come from? Three different applications- I am massively jet-lagged, it is unreal right now.
I woke up this morning and I honestly couldn't remember what country I was in.
It was really a surprise, shocking.
But, so it's three applications and I'm going to go through them.
I'm gonna show you how they work, I'm gonna dig into them a little bit deeper. And hopefully, though the journey of today you're gonna learn a little bit more, little bit more, little bit more about what's possible.
Hopefully give you a bit of an overview.
And then you can learn more yourself.
All right, lets get started.
First application is actually one I built.
It's called TheMojifier.
It's not trademarked.
It's just TheMojifier, so it kind of makes sense. And it's, if I do say so myself, revolutionary. What it does is, what it does is...
Okay, I give up.
I'll use this thing.
What it does is, if you give it an image and it's got some faces in it, it will detect the faces, any faces in the image. It will then detect the emotion in the face. And then it will, any guesses? (audience mumbles) What was that? - [Audience Member] Emoji your emotion.
- It will emoji, yes it will basically find the appropriate emoji and place it on the image. (audience laughs) That's what it will do.
So as I said, revolutionary.
I am open to an investment.
So it's available in many different formats right now. Well this current incarnation is as a Slack app. And if you actually go to TheMojifier.com today, click Add to Slack.
It will actually add this to your Slack workspaces at work. This is kind of how it works, so you basically just- there's my son.
You basically just take a picture, /mojify, put it in.
Boom, mojified it.
It works with multiple faces.
(audience laughs) /mojifier there you go.
(audience laughs) And it has finally answered an age-old question. That has plagued ... and there you go.
(audience laughs) That is the Mona Lisa smile.
Thank you very much, thank you.
Wow, first time I got a round of applause for that. Okay so it's doing quite a lot of different things. Quite a lot of different things.
But I think the thing that is the most impressive about it (I wrote it) is how it calculates emotion.
How do you calculate emotion? That's kind of like crazy really.
How do you calculate emotion on a face? And what I can tell you is actually it's very very easy. Incredibly easy to solve this problem.
You just need to do two things.
Two things, right? Number one, detect the facial features.
Right, so you just need to have some algorithm which can basically detect the eyes and nose, this thing, I've forgotten what this thing is called. It's got a word.
And this thing.
Wow, yeah, chin.
Detect the different facial features, there's various different algorithms you can use to grab that.
Here's one link that will take you to one algorithm that lets you grab it.
And those are the main facial features of a face. Okay, then next up is incredibly easy.
Incredibly easy.
You just have to use a neural network.
(audience laughs) And that's it, just two steps.
And you've got it right? Just two steps.
Yeah? Oh, do you need me to explain a neural network? Yeah? Okay.
(mumbles) Okay, so a neural network is based on biology. This is a neuron.
You have a few of these in your head.
It has some dendrites going into a body.
A cell body.
And an axon, or some axons, going out. I think one axon, going out.
And if enough electricity flows into the dendrites the body activates and pumps electricity out through an axon.
Okay, that's it.
That's the neuron.
If you have enough of these in your brain, then you have a brain.
You can think.
Yeah.
And if I asked you to code this up in JavaScript, or whatever your favourite language is. It would be JavaScript at this conference.
Wouldn't it? (audience laughs) - [Audience Member] CSS.
- Or CSS, either one, either one.
Then you'd probably try and, like, remember what it was like back when you studied computer science.
I couldn't remember, cause you'd have to create nodes, yeah.
Edges, remember that term? Edges, and they're just going out.
Yeah, graph theory, hello? I think what you'd do is basically just code something like this. And you need some sort of features, some input, some electricity flowing in and out. The features of something, perhaps the number of days so far in this year; we're at the 23rd day of the year.
And the current temperature, all right.
Maybe that's some input.
And you want to have something that will then tell you the number of decibels or cricket chirps, or something like that.
Then what you do is for each of the edges, you need to know kind of what is the importance of that input into the final results.
So you have like a weight, okay? And that's what you do, you basically just have some weights, right. And then you kind of multiply all of it together, kind of to represent the body. And its triggering of electricity out through the axon.
So you have an activation function.
And you just multiply those numbers together, pump them into the activation function.
Whatever number that pumps out, you pump it out. And that's it.
That's how you create an artificial neuron. Just to show you, maybe hopefully a little bit clearer. This is it.
Some inputs, multiply them by some weights. Sum them up together, pass them through a function. And whatever number that pumps out, you pump out. When I give workshops (I do workshops on machine learning) I take people through this whole process. They do reach a point where they're like, "this is just multiplication?" And you're like, "yeah, it's actually just multiplication." Like a large part of it is, well, it's not just multiplication.
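In code, that's genuinely all it is. Here's a minimal sketch of a single artificial neuron in JavaScript, with made-up inputs and weights:

```js
// One artificial neuron: multiply inputs by weights, sum,
// pass through an activation function. Values are illustrative.
const inputs = [23, 31];      // e.g. day of year, temperature
const weights = [0.4, -0.2];  // how important each input is

// A simple step activation: below zero pumps out 0, above pumps out 1.
const activate = (x) => (x < 0 ? 0 : 1);

// Multiply and sum (the "body"), then activate (the "axon").
const sum = inputs.reduce((total, input, i) => total + input * weights[i], 0);
const output = activate(sum);

console.log(output); // 1 (23 * 0.4 + 31 * -0.2 = 3, and 3 > 0)
```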
A large part of it is these simple mathematical constructs. Right.
And there's no ifs, all right.
Forget that meme that AI is just some if statements. It's not.
In fact, if you threw an if statement in there it just wouldn't work. Cause you need something you can take a derivative of.
So, this is like a simple activation function. Anything below zero is zero, anything above zero is one.
There's lots of different activation functions. Your activation function is something you need to choose appropriately for the problem you're trying to solve. And as I said, to create a neural network, all you do is grab a whole bunch of these together and join them up.
Right, so this is a simple four layer, feed-forward, densely connected neural network. With one input layer, one output layer.
Two hidden layers.
And all it is is those nodes connect to every single other node going forward.
And if you're creating it, all you'd then do is, for each weight, just do Math.random(). Just initialise them as random numbers, all right. And that's it, so all you've got is some nodes and edges with some random numbers assigned to them.
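Here's a sketch of what that wiring-up might look like in plain JavaScript; the layer sizes are made up, and every weight really is just Math.random():

```js
// A densely connected network as plain data: one weight matrix per
// layer, one row of weights per node, all initialised randomly.
// Sizes (2 inputs, two hidden layers of 3, 1 output) are made up.
const layerSizes = [2, 3, 3, 1];

const network = [];
for (let l = 1; l < layerSizes.length; l++) {
  const layer = [];
  for (let node = 0; node < layerSizes[l]; node++) {
    // One random weight per incoming edge.
    layer.push(Array.from({length: layerSizes[l - 1]}, () => Math.random()));
  }
  network.push(layer);
}

// Feed the inputs forward: every node multiplies, sums, activates.
const relu = (x) => Math.max(0, x);
const feedForward = (inputs) =>
  network.reduce(
    (activations, layer) =>
      layer.map((weights) =>
        relu(weights.reduce((sum, w, i) => sum + w * activations[i], 0))),
    inputs,
  );

console.log(feedForward([23, 31])); // [some meaningless number], for now
```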
We're talking about supervised machine learning here, you get a data set.
And that is some sort of data, and if we're trying to calculate emotion, that would probably be a bunch of pictures. Of people's faces.
Or features, and where you've extracted the features. And then what you need to do, you need to pump those in as the inputs.
As the electricity.
For instance it might be the positions of the eyes, the nose, something like that.
You feed them in as inputs.
And all that's happening is that multiplication, all the way through to pump some number at the end. That's all a neural network is doing.
Right, that's all it's doing.
The question is, well A, what does three mean? Let's just say we've decided for this example that zero is unhappy, and ten is very happy. And this is kind of pumped out three.
But we know that this image is eight, cause a human being has gone and said, "that's an eight. That person is eight level happiness." So the neural network is wrong, of course it's wrong.
We initialised it with random numbers.
It's not gonna be right, okay.
So it's wrong.
And this is kind of where the real magic comes along. Or the real challenge comes along. We know this neural network is off by five. What we do is something called back propagation. Which is a process of, given how wrong it is, how do you tune these numbers? So next time you go through, it's less wrong. Okay.
And that's it, that's this algorithm called back propagation.
And you can have like a read of it, you can Google it online.
There's some very good resources which kind of try to explain the different algorithms you can use.
And this is kind of where a lot of the power and magic of neural networks is.
How do you tune those numbers? You basically do it again and again, and again and again, with the same data set. And then it slowly tunes those numbers till it gets better and better and better.
Until hopefully, eventually, when you pump in that image, it eventually gives you eight.
That's it.
That's all that's going on in neural networks. Pretty simple.
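To make that "tune it so it's less wrong" loop concrete, here's a toy sketch with a single weight and a single labelled example; the real thing does this for every weight in every layer, using the chain rule:

```js
// One weight, one labelled example: a human said this face is an 8.
let weight = Math.random();       // starts random, so it starts wrong
const input = 1;
const target = 8;                 // the human label
const learningRate = 0.1;

for (let step = 0; step < 100; step++) {
  const output = weight * input;          // forward pass
  const error = output - target;          // how wrong is it?
  weight -= learningRate * error * input; // nudge it to be less wrong
}

console.log(weight * input); // ≈ 8, after enough rounds of tuning
```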
There is actually a famous data set out there which has a whole bunch of images.
Which people have already gone through, and labelled the emotion of.
So if you wanted to, you could go ahead and train this whole thing yourself.
What's been happening over the last couple of years, is kind of a commoditization of very common machine learning problems.
And so a lot of the different cloud providers, are now providing a bunch of APIs that'll enable you to do machine learning, common machine learning problems very easily (speech too fast) And that's actually what I use, I work at Microsoft.
I don't know If I, did I mention that? I think I mentioned that, right? Yes, I mentioned it, excellent.
I forgot to mention it yesterday.
And that's what I used here; I used basically the Face API.
Which is one of the APIs we have.
And it gives you a whole tonne of information about an image of the face.
To use it is very very simple.
You just basically pass it that URL, and you do a POST request.
And it gives you like a subset, it gives you an array of things.
What do you call those things? Objects, entries, one per face.
This is actually a subset of all the information that you get.
Just to fit it on a screen.
So for each face in the image, it'll give you the rectangle.
So you can obviously place an emoji.
It's the most important part of this whole process. It also gives you face attributes, with emotion scores for anger, contempt, disgust, fear, happiness, neutral, sadness and surprise, each from zero to one.
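For reference, the request looks roughly like this; the endpoint region and the key are placeholders for your own Azure resource, and here we only ask for the emotion attributes:

```js
// A sketch of calling the Face API with fetch.
const endpoint =
  'https://westus.api.cognitive.microsoft.com/face/v1.0/detect' +
  '?returnFaceAttributes=emotion';

const response = await fetch(endpoint, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '<your-face-api-key>', // placeholder
  },
  body: JSON.stringify({url: 'https://example.com/photo-with-faces.jpg'}),
});

const faces = await response.json();
// One object per detected face, shaped something like:
// [{ faceRectangle: { top, left, width, height },
//    faceAttributes: { emotion: { anger: 0, happiness: 0.99, ... } } }]
```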
Okay, who here has got a beard? You, sir, yes, yes, okay.
Turns out you can't be 100% happy if you have a beard. (audience laughs) You can only ever be 99% happy.
So, I don't know where that one percent goes, but there it is.
And that's basically what I used for TheMojifier. This is it.
This is a weird little algorithm which tries to map emotions to emojis.
It's actually quite a clever algorithm, but anyway. And then just placing the emojis on the faces. And that's basically how it works.
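My actual algorithm is a bit cleverer, but as a hypothetical sketch of the idea: give each emoji a target emotion vector and pick whichever is closest to the scores the API returned.

```js
// Hypothetical emoji "signatures", using the same keys (and 0-1
// range) as the Face API's emotion scores.
const emojiProfiles = {
  '😠': {anger: 1, happiness: 0, sadness: 0, surprise: 0},
  '😄': {anger: 0, happiness: 1, sadness: 0, surprise: 0},
  '😢': {anger: 0, happiness: 0, sadness: 1, surprise: 0},
  '😲': {anger: 0, happiness: 0, sadness: 0, surprise: 1},
};

// Pick the emoji whose profile is closest (Euclidean distance)
// to the emotion scores returned for one face.
function pickEmoji(emotion) {
  let best = null;
  let bestDistance = Infinity;
  for (const [emoji, profile] of Object.entries(emojiProfiles)) {
    const distance = Math.sqrt(
      Object.keys(profile)
        .map((key) => (emotion[key] ?? 0) - profile[key])
        .reduce((sum, diff) => sum + diff * diff, 0),
    );
    if (distance < bestDistance) {
      bestDistance = distance;
      best = emoji;
    }
  }
  return best;
}

// The 99%-happy bearded gentleman from earlier:
console.log(pickEmoji({anger: 0, happiness: 0.99, sadness: 0, surprise: 0.01})); // 😄
```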
So just to summarise, Neural networks are really powerful, incredibly powerful.
Don't be scared of them; I was scared of them when I started looking at this stuff.
Cause conceptually they're actually quite simple to understand.
Like I explained the basics to you in like five, ten minutes.
And the other thing I would just explain, cause when I give these workshops, often times I'm teaching people how to build quite complex neural networks. And they'll come up to me afterwards with, "Oh Asim, I'm trying to do this." And I'm like, "Ah, there's 18 APIs available that know how to do that for you, right?" So step one, don't get too involved.
Like have a search, maybe there's already a bunch of APIs.
There's very very advanced ones now that will solve the problem for you.
Yeah.
All right, demo number one done.
Demo number two.
TensorFlow, MobileNet and I'm Fine.
So I mentioned I run these workshops in machine learning. And after one of these workshops, one of my students built something and sent it to me. I love it when students do that right, when you give a workshop and they build something and they send it to you afterwards.
It's really exciting.
So he did, as a CodePen.
What it does is, at the bottom, for each of these things, it's actually doing a search on Unsplash, which is an image API, for "I'm Fine". It's gonna be returning the image for that. And then it's printing, same kind of thing, what's in the image. This is a dog, a terrier. So it's gotta kind of guess that a little bit. This one it's guessing is 70% a jersey t-shirt.
The important thing to understand about this demo is the only API call that's being made, is to Unsplash for the image.
The actual detection of what's in the image, this kind of percentage thing here, that's all happening in the browser in JavaScript. Pretty cool, right? The title might have given it away, but it's actually based on a technology called TensorFlow. Who's heard of TensorFlow? Who's used TensorFlow? A few, excellent.
TensorFlow is a technology that was open sourced by Google in 2015, I think.
And it allows you to.... it does a bunch of things.
But it allows you to basically train neural networks on multiple CPUs, GPUs. Really scale out these calculations.
It's written in C++, which is kind of very compute friendly. But, you know, you know. It's written in C++, so it's not really good for us JavaScript developers. Early last year though, they announced TensorFlow.js.
Don't we love it when they add the .js after things. We love that don't we? We're like, "yes, we can do something now." (audience laughs) When they first released it though, I thought it was like bindings, like node bindings. I thought it was like you had to install TensorFlow on your computer, and then you could use TensorFlow, or interact with it from JavaScript.
But, no its not.
It's actually TensorFlow rewritten from the ground up, in JavaScript.
Right, that's the important distinction that you should take from this.
And so what that means is if you want to use it, that's all you need to do.
These are the only dependencies that you need, if you want to play around with this in JavaScript in the browser.
So either an import, or I like to use script tags.
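In the browser, that looks something like this; the CDN URL is just one common way to pull it in:

```html
<!-- Everything you need to run TensorFlow.js in the browser -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script>
  // tf is now a global; prove it's alive with a tiny tensor.
  tf.tensor([1, 2, 3]).print();
</script>
```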
Just a quick aside: if you're using the node version of TensorFlow.js, it does have dependencies. It actually does install proper TensorFlow in your node modules; you'll find a couple of hundred megabytes taken up by a file, which is actually TensorFlow. Node TensorFlow is actually closer to full TensorFlow.
You can do a whole bunch of things with TensorFlow.js.
You can do what I just explained to you earlier on. You can train models: that whole process of getting all those neurons, passing in datasets, and having to tune those numbers, that whole thing you can do with TensorFlow.js. What you can also do is load a model that's been trained up previously and just use it. Okay, that's what we like doing, usability. If you actually go to the TensorFlow.js website, you'll find a whole bunch of pre-trained models that you can start using.
And it's a great way of getting started.
It's how I start teaching it.
And what Oliver used, he actually uses a model called MobileNet.
Now, MobileNet is an image detection model. The clue is in the title: mobile. It's optimised for use on mobiles. It can detect one of a thousand things. Which is why it probably wasn't very good.
Can it really detect a thousand things? There's more than a thousand things in the world. But that's it: it's only four lines of code in JavaScript in order to get that functionality.
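Those four lines look roughly like this; the image id and file name are assumptions, and the model script comes from the same CDN:

```html
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"></script>
<img id="img" src="terrier.jpg" />
<script>
  window.addEventListener('load', async () => {
    const img = document.getElementById('img');
    const model = await mobilenet.load();          // downloads the weights
    const predictions = await model.classify(img); // top guesses
    console.log(predictions);
    // e.g. [{className: 'Norfolk terrier', probability: 0.7}, ...]
  });
</script>
```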
But you saw right, it's pretty incredible.
Pretty incredible right? So I think ten years ago, we couldn't tell if an image was a cat or a dog.
Now in JavaScript we can do, in the browser we can do this. I think that's pretty incredible.
But, it's mobile.
It's MobileNet, so it's not that great for really detecting what's inside an image. Okay, cause it's been optimised, size wise, to be able to be something that can be downloaded via the browser.
And if you really really really want to know what's inside an image, you need to use a much much larger model that's been trained up from a much bigger data set. And that's the kind that you can't really download effectively onto your computer.
And that's why certain companies out there, make it available via APIs.
So, there's another one here called Computer Vision. And it does a whole bunch of stuff, it's scary what it does.
Not scary, not scary at all.
(audience laughs) It's all good, its all good.
But one of the things that it does, that is definitely not scary, is that given an image it will give you a human sentence to describe the image.
As if a human being had written it.
And so my friend, Sarah Drasner, last year, she thought wouldn't that be good for generating alt text.
Cause you know we're all supposed to put alt texts for accessibility in image tags.
But some of us forget.
Wouldn't it be great if there was this tool which can just look at the image and go, "oh, that's what it is", and put it as the alt text.
And so as a proof of concept she kind of wrote this CodePen. And you can just upload your image, and it will basically ... the text at the top is what it returns to describe the image.
Does anybody know this TV show? Yeah, Halt and Catch Fire? Absolutely amazing TV show, stunningly good. It's about the start of the entire industry, it really starts in 1985, and they're trying to build a laptop? The first laptop? This is basically Mackenzie Davis, the actress; she's in the centre.
So it's "Mackenzi Davis, et al, "standing in front of a building." That's what the API has returned to describe the image.
Pretty cool, right? Obviously, well she released this on Twitter. And as you know on Twitter people are very friendly, very supportive.
(audience laughs) You know, they'll very kindly let you know that you're wrong.
And so once she released on Twitter, some people basically you know, let her know that it doesn't work all the time.
So, this is one that it failed at.
So for this image it said, "a star filled sky". The text says four something.
No, that's not right.
The next one, now we could be right.
We could be right, you know.
(audience laughs) Yeah, I mean, they might be dead cats.
You don't know, do you? You can't be certain.
And I like, I really like the next one.
Because we're actually half right.
Half correct.
(audience laughs) Yeah, yeah.
50% is a pass mark where I come from.
(Asim laughs) Don't know.
I think it's the ... dogs don't normally wear... What are those things? Things.
So yeah, just to summarise, probably the best takeaway, the main takeaway you should take from this:
TensorFlow.js doesn't have any dependencies. So if you wanna start playing with it, just import it and you can start doing some machine learning.
It's pretty cool.
I really recommend (in my workshop I start off with MobileNet) just starting off with one of the pre-made models. Just start experimenting with it; use it in what's called inference mode.
Have some fun with it.
But you know, things like MobileNet, it's kind of fun. It is useful for other things, but to really tell what's inside an image you probably do need to use an API. All the different cloud providers have APIs. We have one, it's called Azure Computer Vision. Give that a go.
All right, last demo, the favourite.
A fairly good one, can't build it up enough really. Right here we go, it's called Image2Image.
And I'll just show you.
(audience laughs) So this is running in the browser.
It's running in the browser, if you go to aijs.rocks you'll find it.
Or in fact if you just go to this link you'll find it. It's made by a gentleman called Zaid.
I think he's still in university.
Very very very clever, clever guy.
What it does, is if you draw the outline of a cat, it will fill in the rest of the cat.
That's what it's doing.
And it's using an algorithm called Pix2pix, which is an implementation of something called a GAN. Which is a generative adversarial network. Which is a fairly new technique; it's only been around for maybe just over two years. And I think it's a really really really interesting type of algorithm.
And I'm just going to explain to you how it works. So it uses basically two neural networks, and we all know what they are now.
Cause I taught you it.
It uses two neural networks, that are competing with each other, right? Adversarial, they're competing with each other. You have one called the generator, and you have another one called the discriminator. The generator's job, is given some input.
And the input we're giving it is a set of images, which are just outlines.
Okay, given some inputs, its job is to generate, or fill in the outline and generate a cat image. But, remember when we first created a neural network, we just initialised it with random numbers. So it's not gonna do a good job.
Of course it's not, it's just random right? So it's probably just going to generate noise. We take all those images that it generated, and we combine them with a set of real cat images. Okay.
And the discriminators job is to figure out if an image is a real cat image, or a generated cat image. That's the discriminator's job.
So what happens is, if the discriminator got it wrong. And the discriminator, of course it's gonna get it wrong, cause it's just randomised weights to begin with. Then it's gonna do that thing I described before, it's gonna kind of retune its weights with how wrong it is, you know.
So it becomes better.
If however, if it got it right, well then that means the generator's not doing a good job of generating good enough cat images. So, the generator then tunes up.
This is a kind of very simplified version of what's going on. But essentially that's what's happening, is that as the generator gets better, the discriminator gets better, the generator gets better.
And they're fighting with each other all the way through, until eventually the generator is doing such a good job of generating cat images the discriminator can't really tell anymore, what's a real cat image and what's a fake cat image. And then you throw away the discriminator.
Well, probably not throw it away; you just keep the generator, maybe. Or you export the generator, that's to say. And then you can export it in a JSON format so it's kind of usable in the browser. And that's basically ... and then you can start using it in the browser, and that's what Zaid did.
And that's it, that's how it works.
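Here's a heavily simplified TensorFlow.js sketch of that generator-versus-discriminator loop. It feeds the generator random noise rather than outline images (real Pix2pix conditions on the outline and uses convolutional layers), and every size here is made up:

```js
import * as tf from '@tensorflow/tfjs';

// Generator: noise in, fake "image" (a flat 64-number vector) out.
const generator = tf.sequential({
  layers: [
    tf.layers.dense({inputShape: [16], units: 32, activation: 'relu'}),
    tf.layers.dense({units: 64, activation: 'tanh'}),
  ],
});

// Discriminator: image in, probability that it's a real one out.
const discriminator = tf.sequential({
  layers: [
    tf.layers.dense({inputShape: [64], units: 32, activation: 'relu'}),
    tf.layers.dense({units: 1, activation: 'sigmoid'}),
  ],
});
discriminator.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});

// Combined model: the generator feeding a frozen discriminator, so
// training the combination only retunes the generator's weights.
discriminator.trainable = false;
const noise = tf.input({shape: [16]});
const gan = tf.model({
  inputs: noise,
  outputs: discriminator.apply(generator.apply(noise)),
});
gan.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});

async function trainStep(realImages) { // a [batch, 64] tensor of real cats
  const batch = realImages.shape[0];
  const fakeImages = generator.predict(tf.randomNormal([batch, 16]));

  // 1. The discriminator learns: real images are 1, generated are 0.
  await discriminator.trainOnBatch(realImages, tf.ones([batch, 1]));
  await discriminator.trainOnBatch(fakeImages, tf.zeros([batch, 1]));

  // 2. The generator learns: label its fakes as "real", and the error
  //    flowing back through the frozen discriminator retunes it.
  await gan.trainOnBatch(tf.randomNormal([batch, 16]), tf.ones([batch, 1]));
}

// When the discriminator can't tell any more, keep just the generator:
// await generator.save('downloads://cat-generator') gives you a
// model.json you can later load with tf.loadLayersModel().
```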
The generator is a GAN.
All right, it's like cool.
It can generate cat images from outlines.
Who cares, how useful is that in this world? Well, it doesn't just have to be cat images, okay. So this is another GAN.
These outlines are the inputs.
And these are the actual outputs.
All right? But this isn't Pix2pix, this is Vid2vid.
All right.
This is generated from this.
This is actually not running in JavaScript yet. But, it is running.
You also can just have one input, and multiple outputs.
Again these are all generated from the input outline image. (mumbles)
The input doesn't have to be outlines of things. The input can be whatever you want it to be. Whatever you want it to be, right.
So here's an example where the input is what's called a segmented image.
So the different colours give an idea of depth, and what that allows you to do is perhaps something like this.
So she's not dancing, that is a generated image from that. Could do this for my wedding video or something. (audience laughs) It would be a lot better than what it actually looked like. It actually looked a little bit like that.
And anyway, here's another one of a street scene, of a car, it's a game.
So top left is the input, these two are different algorithms, so you can see they're not very good.
This is the Vid2vid algorithm.
And you can see that that's all generated.
That entire scene is generated from the top left. And again, as I said, the input can be anything you want it to be.
Anything you want it to be.
It doesn't just have to be outlines, doesn't just have to be segmented images.
It can actually just be text.
The inputs can be texts.
There's an example where the text at the top is the input to the generator.
And the generator's job is to generate an image, to represent the text.
So, "this flower has long thin yellow petals, "and a lot of yellow anthers in the centre". Stage one is after 600 loops.
Stage two is after 1200 loops, and you can see, pretty good.
That's not so good.
The rest of them are pretty good.
This is another one for birds.
So "this bird is white, black and brown in colour, "with a brown beak".
Uh, 600 isn't so good.
But 1200 is really good, right.
Pretty good.
Apart from that one.
Oh, and that one.
(audience laughs) They're mostly pretty good.
How far away do you think we are from somebody just being able to type, "create me an e-commerce website, blue-ish. I want to use PayPal, four pages, an about page ..." And it just generates it for you.
I gave this talk once, and somebody in the back went "we're doing that".
So, you know, I don't think we're that far off. To be honest with you.
Yeah, that's where we're at.
To summarise, so GANs, they learn to generate new images. They're generative.
And that's what's so exciting about them, generative. Discriminators, which is how most models kind of work these days, that people train up.
They can just tell you if something is something or isn't something, or they categorise something. GANs, they're generative and actually generate stuff. Which is why they're so exciting cause they create.
And you see a lot of these examples if you look online. There's a lot of stuff being done in the art and music space with GANs.
Cause they can generate it from something.
They require a tonne of compute to train, like a lot.
More than perhaps a normal one.
But you can take the model, and if you're very very careful with how you optimise and export it, you can actually make it small enough to run in a browser.
Which is how Zaid did all his examples.
Yeah, that's it.
If you want to learn a little bit more about TensorFlow, I am considering writing a book.
All I've done so far is made a cover.
(audience laughs) It's a good cover. But if you go to this URL and you sign up with your email. If I get 100 people to sign up, I'm going to do a Kickstarter or something. See if people are interested.
So yeah, if you want to do that, you can do that.
And if you want to learn a little bit more about how to use one of the API solutions, this is kind of a nice introduction.
You actually create your own mojifier.
I've got like a whole tutorial. If you click on that link, it'll take you through everything step by step, to build your own.
I see people taking pictures, that's nice.
(mumbles) And yeah that's it, if you want to follow me you can follow me on @Jawache.
And yeah, thank you very much for your time. (audience applauds) (upbeat music)