Strike a pose – Gesture recognition in JavaScript with Machine Learning & Arduino

Most of our interactions with technology aren’t really intuitive. We’ve had to adapt to it by learning to type, swipe, and execute specific voice commands. But what if we could train technology to adapt to us?

Programming hardware in JavaScript has already been made accessible by frameworks like Johnny-Five, but by combining it with machine learning we have the opportunity to create new, smarter interactions.

In this presentation, I will talk about how to build a simple gesture recognition system using JavaScript, an Arduino and machine learning.

The goal is to play Street Fighter using nothing but body movements. Why? Why is not the point!


  • an Arduino with Wi-Fi
  • an IMU hardware sensor wired to the Arduino, providing accelerometer and gyroscope readings
  • a button to distinguish gameplay gestures from other movement (there are other ways to do this, but this was simple)
  • JavaScript: TensorFlow.js, the Johnny-Five library, and a Node.js server

Step 1: collect the data. What you get back is raw data: six numbers on every line (three accelerometer axes and three gyroscope axes).
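A minimal sketch of the recording step, assuming a Johnny-Five-compatible board and an MPU6050 IMU (the sensor model, environment-variable guard, and `toSample` helper are my assumptions, not the talk’s actual code):

```javascript
// Pack one accelerometer + gyro reading into the six-number line
// format described above.
function toSample(acc, gyro) {
  return [acc.x, acc.y, acc.z, gyro.x, gyro.y, gyro.z];
}

// Hardware wiring, only attempted when a board is actually attached.
if (process.env.BOARD_CONNECTED) {
  const five = require("johnny-five"); // assumes johnny-five is installed
  const board = new five.Board();
  board.on("ready", () => {
    const imu = new five.IMU({ controller: "MPU6050" }); // assumed sensor
    imu.on("data", function () {
      // One comma-separated line per reading — the raw training data.
      console.log(toSample(this.accelerometer, this.gyro).join(","));
    });
  });
}

module.exports = { toSample };
```

In practice these lines would be appended to a file per gesture, so each recording can be labelled later.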

Step 2: process the data with TensorFlow.js and train it with lots of samples. The gestures are labelled with numbers to suit TensorFlow: 0, 1 and 2 are the punch, uppercut and hadoken. The raw readings are mapped into ‘features’, in TensorFlow terms, then converted into tensors, and the data is divided into training and test sets.
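The labelling and train/test split can be sketched in plain JavaScript (the helper names and the 80/20 ratio are my assumptions; in the real pipeline the resulting arrays would then be turned into tensors):

```javascript
// Map each recorded gesture name to a numeric label, TensorFlow-style.
const LABELS = { punch: 0, uppercut: 1, hadoken: 2 };

// A "feature" here is one recording: its six-number samples flattened
// into a single numeric array, paired with the gesture's label.
function toExample(recording) {
  return {
    features: recording.samples.flat(),
    label: LABELS[recording.gesture],
  };
}

// Shuffle, then split into training and test sets (80/20 assumed).
function splitData(examples, trainRatio = 0.8) {
  const shuffled = [...examples].sort(() => Math.random() - 0.5);
  const cut = Math.floor(shuffled.length * trainRatio);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}
```

With TensorFlow.js, the `features` arrays become a `tf.tensor2d` and the labels a one-hot tensor before training.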

Step 3: create a machine learning model. This step is still more art than science: you try different models and see what works for you. The literal output is a data file (JSON).

Step 4: predict. This is where you start feeding in new data and see whether the model gets the answers right. For the game, this means taking in new data and predicting which move was made.
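The prediction step boils down to picking the most probable class. A sketch of that last mile in plain JavaScript — the move names are from the talk, the function itself is illustrative:

```javascript
const MOVES = ["punch", "uppercut", "hadoken"]; // labels 0, 1, 2

// Given the model's output probabilities, return the predicted move.
function toMove(probabilities) {
  let best = 0;
  for (let i = 1; i < probabilities.length; i++) {
    if (probabilities[i] > probabilities[best]) best = i;
  }
  return MOVES[best];
}

// In the real pipeline this would be fed by something like:
//   const probs = model.predict(tf.tensor2d([liveSample])).dataSync();
//   sendToGame(toMove(probs));
```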

For the demo, Charlie switched hardware to a Google Daydream remote, as it produces the same data. That changes only the data source, not the rest of the code.

Live demo! It did work despite tempting the demo gods!

Once you know how to record and detect gestures, you can record anything.

Next demo: a Harry Potter spell-casting app. (There was a lot of wireless interference in the room, so it didn’t work.) The point is that the code is the same; you just feed it different gestures.

What else can you do with this? Well, the inputs are the same as the sensors in your phone! So whatever you can do with these sensors, you will be able to do with phones once those APIs become mainstream. Or you might use any one of a range of other hardware with a gyroscope and accelerometer.
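In the browser, the same six numbers are already available via the `devicemotion` event. A sketch — the handler name is mine, and on some platforms real readings require user permission first:

```javascript
// Pull the same six-axis sample out of a DeviceMotionEvent.
function motionToSample(event) {
  const a = event.accelerationIncludingGravity; // m/s², x/y/z
  const r = event.rotationRate; // deg/s, alpha/beta/gamma
  return [a.x, a.y, a.z, r.alpha, r.beta, r.gamma];
}

// Browser-only wiring; guarded so the file also runs under Node.
if (typeof window !== "undefined") {
  window.addEventListener("devicemotion", (event) => {
    console.log(motionToSample(event).join(","));
  });
}
```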

The key steps to understand:

  • record data – literally getting readings from a source
  • process the data – preparing it so it can be used
  • splitting – cutting the data down to what your algorithm needs
  • training – improving the accuracy of predictions
  • predicting – using live data to make new predictions
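Those five steps hang together as one small pipeline. A stubbed-out skeleton, where every function body is a placeholder standing in for the corresponding step above:

```javascript
// Each stub stands in for one of the five steps.
const recordData = () => [[0, 0, 0, 0, 0, 0]];        // read sensor lines
const processData = (raw) => raw;                      // features + labels
const splitData = (data) => ({ train: data, test: data }); // train/test split
const trainModel = (sets) => (sample) => 0;            // returns a predictor
const predictMove = (model, live) => model(live);      // classify live data

const model = trainModel(splitData(processData(recordData())));
console.log(predictMove(model, [1, 2, 3, 4, 5, 6])); // → 0
```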

Also, it doesn’t matter that this isn’t a practical example; the point is to learn how to use hardware and machine learning. If we don’t learn fun new things, we’ll never be able to do cool new stuff!