Going to be looking at three applications:
The app takes an image, detects faces, detects the emotion of the faces, then applies appropriate emoji over the faces. We can finally answer the age-old question of whether the Mona Lisa is smiling or not!
— Meggan Turner @ Web Directions ✨ (@megganeturner) June 20, 2019
But how does it calculate emotion on a face?
- detect facial features
- use a neural network
…so these are just incredibly easy. Right? Right. No?
Neural networks are based on biology. Neurons:
dendrites → body → axons …easy to code, right?
We create a graph with nodes and edges, replicating the idea of signals going into a body and passing along an effect. Signals are weighted; an activation function is called to crunch the numbers; and it returns the result. Yes, a lot of it is simple maths like multiplication. Forget the meme about if statements! There are no ifs! Choosing the right activation function and calculations can be tricky, of course.
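The weighted-signals-into-a-body idea can be sketched in a few lines of plain JS (a hand-rolled illustration with made-up weights, not any particular library's API):

```javascript
// A single artificial neuron: weighted inputs, a bias, and an
// activation function. Just multiplication and addition -- no ifs.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function neuron(inputs, weights, bias) {
  // Weighted sum of the incoming signals ("dendrites -> body").
  const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], bias);
  // The activation function decides what travels down the "axon".
  return sigmoid(sum);
}

console.log(neuron([1, 0], [0.6, -0.4], 0.1)); // ≈ 0.668
```

The sigmoid squashes any weighted sum into a 0–1 signal, which is what makes the outputs composable into layers.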
The inputs in a real scenario are the seed data. Facial recognition takes in a set of example face images that have been classified by humans. In a supervised learning process, you'll initialise with random numbers; compare the initial outputs with human-determined scores; and use back propagation to tune the numbers. Once the outputs match what a human would assign, the network is tuned.
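That initialise-compare-tune loop can be sketched on a single neuron (a toy, hand-labelled OR-style dataset and a made-up learning rate, purely for illustration):

```javascript
// Toy version of the tuning loop: start from random numbers, compare the
// output with a human-assigned label, and nudge the weights to shrink
// the error -- the essence of back propagation, on one neuron.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Tiny "human-labelled" dataset: inputs -> label a human assigned.
const data = [
  { inputs: [0, 0], label: 0 },
  { inputs: [1, 0], label: 1 },
  { inputs: [0, 1], label: 1 },
  { inputs: [1, 1], label: 1 },
];

let weights = [Math.random(), Math.random()]; // random initialisation
let bias = Math.random();
const rate = 0.5; // learning rate

for (let epoch = 0; epoch < 2000; epoch++) {
  for (const { inputs, label } of data) {
    const out = sigmoid(inputs[0] * weights[0] + inputs[1] * weights[1] + bias);
    // Gradient of the squared error through the sigmoid.
    const grad = (out - label) * out * (1 - out);
    weights = weights.map((w, i) => w - rate * grad * inputs[i]);
    bias -= rate * grad;
  }
}
// After tuning, the outputs round to the human-assigned labels.
```

Real networks do the same thing across thousands of weights and layers, which is where back propagation (pushing the error gradient backwards through the graph) earns its name.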
There is already a library of human-rated emotion images. There is in fact a commoditisation of machine learning problems. Azure provides the Face API, which takes an image and returns JSON with face attributes including emotion. That's what Asim used to create TheMojifier.
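Once the JSON comes back, picking the emoji is just a matter of finding the highest-scoring emotion for each face. A sketch, using an abbreviated, illustrative response shape (the real response carries many more attributes per face):

```javascript
// Pick the dominant emotion from a Face API-style response.
// The sample below is abbreviated and illustrative, not a real capture.
const response = [
  {
    faceRectangle: { top: 10, left: 20, width: 100, height: 100 },
    faceAttributes: {
      emotion: { anger: 0.0, happiness: 0.92, neutral: 0.07, sadness: 0.01 },
    },
  },
];

function dominantEmotion(face) {
  const scores = face.faceAttributes.emotion;
  // Return the emotion name with the highest score.
  return Object.keys(scores).reduce((best, e) =>
    scores[e] > scores[best] ? e : best
  );
}

console.log(dominantEmotion(response[0])); // "happiness"
```

Map each emotion name to an emoji, position it with `faceRectangle`, and you have the bones of a Mojifier.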
Neural networks are incredibly powerful, but simple to understand. Don't assume you have to build everything at this point; do some searching to see what's already available.
TensorFlow, MobileNet & I'm fine
TensorFlow announced TensorFlow.js in 2018 and it really is TensorFlow written in JS (there is a version you can load off a CDN). This lets you train models, or load pre-trained models right in the browser. It’s a great way to get started.
So you can set up image recognition in four lines of JS. That’s pretty incredible.
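Something like the following, based on the documented `load()`/`classify()` API of the MobileNet model for TensorFlow.js (the CDN URLs follow the project's published pattern, and the image file is a placeholder):

```html
<!-- Load TensorFlow.js and the pre-trained MobileNet model from a CDN -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/mobilenet"></script>
<img id="img" src="cat.jpg" />
<script>
  // The four lines: grab an image, load the model, classify it, log it.
  const img = document.getElementById('img');
  mobilenet.load()
    .then((model) => model.classify(img))
    .then((predictions) => console.log(predictions));
</script>
```

`predictions` is an array of `{ className, probability }` guesses, sorted by confidence — no server, no training, all in the browser.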
Azure provides an image description tool, Computer Vision …which Sarah Drasner used to create an automatic ALT text demo. Unfortunately the accuracy wasn’t fantastic, which the internet in its usual way was kind enough to very politely and respectfully point out.
TensorFlow.js doesn’t have any dependencies; MobileNet is a simple way to analyse images; give it a go. For bigger applications you’ll want something with an API and a larger training pool backing it up.
It uses a Generative Adversarial Network (GAN) to take one image and generate another.
This uses a generator to create an image and a discriminator to decide if the image is a real cat (against a library of cat pics) or a sketch-derived cat pic. The two fight it out with each other (hence adversarial) until the discriminator can no longer tell if the image it's been given is real or generated. So over time, as the two get better, the result gets better.
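The tug-of-war can be sketched with a deliberately tiny 1-D toy: each "network" is a single parameter, and the learning rate and data are made up for the example. Real GANs are deep networks and take vastly more compute, but the alternating structure is the same:

```javascript
// Toy 1-D GAN sketch. "Real" data sits near 3; the generator's single
// parameter mu shifts noise toward the real data, while the
// discriminator D(x) = sigmoid(w * x + b) learns to tell real from fake.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

let w = 0, b = 0; // discriminator parameters
let mu = 0;       // generator parameter: G(z) = mu + z
const lr = 0.01;  // made-up learning rate

for (let step = 0; step < 1000; step++) {
  const real = 3 + (Math.random() - 0.5);  // sample of "real" data
  const fake = mu + (Math.random() - 0.5); // generated sample
  const dReal = sigmoid(w * real + b);
  const dFake = sigmoid(w * fake + b);

  // Discriminator step: push D(real) toward 1 and D(fake) toward 0.
  w -= lr * (-(1 - dReal) * real + dFake * fake);
  b -= lr * (-(1 - dReal) + dFake);

  // Generator step: move mu so the discriminator scores fakes as real.
  mu += lr * (1 - dFake) * w;
}
```

As `mu` drifts toward the real data, the discriminator's outputs for real and fake samples converge and its gradients shrink — the "can no longer tell" equilibrium described above.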
It’s not just pictures though, you can generate video; generate multiple outputs; etc.
The input can be anything – not just outlines or segmented images. It can just be text describing an image.
How long until you just write “I want an ecommerce site, blue, using paypal…”?
GANs learn to generate new images, although they take a lot of compute to train.
land.asim.dev/tfjs – Asim is considering writing a book, sign up if you are interested.
Mojifier tutorial: aka.ms/mojifier