Speaking with Machines: Conversation as an Interaction Model

The past few years have been filled with chatbot experiments—some brilliant, many not—but the future has yet to be experienced. As artificial intelligence capabilities advance, conversation will become the next major interaction model, not just a messenger experience.

This talk will explain why conversation will play such a large role in the future, define how it will happen, and suggest how you can integrate conversation into your product roadmap.

Joe Toscano

Why is conversation the next step in human/computer interaction?

A short history of HCI: we started with the true primitive, binary code. We moved on to the command line, which was better but still not accessible to the mainstream. It was GUIs and point-and-click that really made computers usable for the average person. Next came touch and gesture on phones and tablets.

The next step is conversation and voice. It’s the first time in computing history that the machine is learning our language, instead of the other way around. Of course that means we need to really understand language.

  • Starting with a letter: a letter is a symbol whose meaning is set by the collective agreement of a language community. Once we learn a letter, we can recognise that pattern even when the typeface changes.
  • Words: groups of letters whose pronunciation the majority of a language community agree on.
  • Sentences: combinations of words. This is where machines really struggle, because sentences carry phrasing, context, slang and all kinds of nuance that computers find hard to parse.

There are 46.2 quadrillion ways to create a five-word sentence.
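The scale of that figure can be sanity-checked with a little arithmetic. As a rough sketch (the talk doesn't state the vocabulary size; roughly 2,150 words is an assumption chosen to land near the quoted number):

```python
# Rough reconstruction of the "46.2 quadrillion five-word sentences" figure.
# ASSUMPTION: a working vocabulary of about 2,150 words -- illustrative only.
VOCAB_SIZE = 2_150
SENTENCE_LENGTH = 5

combinations = VOCAB_SIZE ** SENTENCE_LENGTH
print(f"{combinations:.2e}")  # on the order of 10**16, i.e. tens of quadrillions
```

Even a modest vocabulary explodes combinatorially, which is why pattern-matching every possible sentence is hopeless and machines have to model language statistically instead.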

Personifying conversation is another challenge. Bots have personalities; even deciding a bot should have 'no personality' is itself a form of personality. We attribute personality to everything around us. It's innate.

Defining voice is the first step in defining personality. In audio you can use pitch, range, volume and speed; in both text and audio you have word choice, content choice, punctuation and grammar.

We can define personalities, and even infer where people are from, based on the words they use: the choice between coke, pop and soda will tell you roughly where an American grew up. Women tend to be more polite than men.

Character development comes next. Look to popular culture – radio and podcast hosts define a personality with nothing more than voice/sound. TV does it too.

Just as TV set cultural norms, conversational interfaces will play a huge role in the way children grow up. Kids are growing up to be more demanding. Could there be a link to using conversational interfaces?

Batman is a great character. Back story, quotes, voice… all highly recognisable.

Designing conversations…

  1. initiate
  2. listen
  3. respond
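This three-step loop can be sketched in a few lines. Everything below (the wake-word set, the toy keyword matching, the canned replies) is illustrative, not from the talk:

```python
# Minimal sketch of the initiate -> listen -> respond loop.
WAKE_WORDS = {"hey siri", "hey google"}  # illustrative wake words

def initiate(utterance: str) -> bool:
    """Step 1: only engage once a wake word is heard."""
    return utterance.lower().strip() in WAKE_WORDS

def listen(utterance: str) -> dict:
    """Step 2: reduce free-form speech to an intent (toy keyword matching)."""
    if "weather" in utterance.lower():
        return {"intent": "weather"}
    return {"intent": "unknown"}

def respond(parsed: dict) -> str:
    """Step 3: answer briefly -- quantity, relevance, clarity."""
    if parsed["intent"] == "weather":
        return "It's sunny today."
    return "Sorry, could you rephrase that?"

if initiate("hey google"):
    print(respond(listen("what's the weather like?")))
```

Real assistants replace each toy step with speech recognition, intent classification and response generation, but the control flow stays the same.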

Initiating isn’t too hard at the moment ('hey siri', 'hey google'), but it will get trickier as devices move toward always listening, so that you simply speak to them in context.

One odd impact of voice is that character-based brand names don’t work. Brands built on tricky spelling lose their distinctiveness when read aloud: they just sound like ‘tumbler’ and ‘net flicks’.

Another problem is that you may need a lot of information to do something like book a flight, and people don’t speak the way the query needs to be constructed. In reality the bot has to go back and forth to clarify things, get details and prompt for more information.
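That back-and-forth is often implemented as slot-filling: the bot keeps asking clarifying questions until every required detail is known. A minimal sketch, assuming three required slots and prompts of my own invention:

```python
# Slot-filling sketch for a flight booking. Slot names and prompts are
# illustrative assumptions, not from the talk.
from typing import Optional

REQUIRED_SLOTS = ("origin", "destination", "date")

PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to go?",
    "date": "What day do you want to travel?",
}

def next_prompt(filled: dict) -> Optional[str]:
    """Return the clarifying question for the first missing slot, or None."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot]
    return None  # all details collected -- ready to search for flights

# A user rarely supplies everything at once:
state = {"destination": "Lisbon"}
print(next_prompt(state))  # asks for the origin first
```

Each user reply fills another slot, and the loop ends when `next_prompt` returns `None`, which mirrors the clarification dialogue described above.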

Responding has to be considered as well: quantity, relevance and clarity. Don’t give us a five-minute monologue.