[Image: It's time to talk]

Voice and other conversational interfaces are the latest mainstream technical innovation that could have a major impact on your business. I therefore present you with 10 easy questions – which might be a bit tougher to answer – to help you arrive at a sound Voice Strategy.

1. Where are you now? Getting your bearings

Voice interfaces are being adopted faster than any communication technology before them:

[Image: Smart speaker market penetration]

Besides dedicated speakers – like the Amazon Echo and Google Home – voice assistants are built into cars, mobile phones, earphones, appliances (think smart TVs), and more.

It’s estimated that there will be 8 billion digital voice assistants in use by 2023.

And recent Microsoft research shows that voice assistance is becoming the norm: 72% of surveyed respondents reported using a digital assistant in the past six months.

2. Do you need a voice strategy? Or 20/20 hindsight

[Image: Hindsight]

Did you need an Internet Strategy in 1990?

Did you need a Search Strategy in 2000?

Did you need a Mobile Strategy in 2010?

Even if you don’t have a vision for the future of voice interfaces yourself, others do. So keeping tabs on developments in voice (and knowing what those visions are) is a strategy as well.

Keep an eye out for remarks and terminology like this:

[Image: Voice use quotes and terminology]

3. Do you have an open mind? Possible paradigm shift

Voice is not the “faster horse”, so don’t naively compare it with previous interfaces. Take a look at this graphic by voice visionary Brian Roemmele:

[Image: Interface graph]

Each era had its respective winners – from IBM to Microsoft, Google, and Apple. Who will dominate the voice interface era (which, according to the aforementioned Brian Roemmele, might be the last interface), and in what way, is still very much undecided.

4. Does your business interact with humans? Voice is the most natural

Our brains are evolutionarily wired for voice. Voice is the human I/O.

Everything you type and read is the work product of a “silent voice” in your brain.

These brain processes are visualised in this graphic, again by Brian Roemmele:

[Image: Brain processing]

And ~100% of the information from this phonological loop/speech is retained for ~400 seconds and may be processed.

In comparison:

  • ~97% of the information in the visual cortex becomes exformation (i.e. it is filtered out). This means it’s lost immediately!

  • ~75% of the information in the auditory cortex becomes exformation. Again lost!

5. How does a user invoke your product/service? By asking!

Expect, in the near future, that a user will simply state an intent to the nearest available, trusted voice assistant and have it fulfilled according to his or her previously uttered or learned preferences.

DILBERT © Scott Adams. Used by permission of ANDREWS MCMEEL SYNDICATION. All rights reserved.

This may indeed seem awkward now, but in order to be or stay part of this flow you need to ‘deconstruct’ your users’ needs into intents, and learn which wordings (utterances) are used and in which context – see the sketch below.
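To make ‘deconstructing needs into intents’ concrete, here is a minimal sketch in Swift – all names are hypothetical, this is not any vendor’s actual API – of an intent paired with sample utterances, plus a naive matcher that picks the best-fitting intent for a spoken input:

```swift
// Hypothetical sketch: an intent with its sample utterances, plus a
// naive word-overlap matcher. All names here are made up.
struct Intent {
    let name: String
    let sampleUtterances: [String]  // wordings users actually say
}

let intents = [
    Intent(name: "CheckTravelPlans",
           sampleUtterances: ["what are my travel plans",
                              "when is my next trip",
                              "show my upcoming flights"]),
    Intent(name: "BookHotel",
           sampleUtterances: ["book me a hotel",
                              "find a room for tonight"])
]

// Score an intent by the best word overlap among its utterances.
func score(_ words: Set<Substring>, _ intent: Intent) -> Int {
    intent.sampleUtterances
        .map { Set($0.split(separator: " ")).intersection(words).count }
        .max() ?? 0
}

// Pick the intent that best matches the user's spoken input.
func match(_ input: String) -> Intent? {
    let words = Set(input.lowercased().split(separator: " "))
    return intents.max { score(words, $0) < score(words, $1) }
}

print(match("when is my next trip to Berlin")?.name ?? "no match")
// → CheckTravelPlans
```

Real platforms (Alexa, Dialogflow, SiriKit) provide far more sophisticated natural language understanding, but the modelling exercise – which intents, which utterances, in which context – is yours either way.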

In the short term you will need to claim your voice search queries; in the long run, expect users to switch assistants and/or queries only very reluctantly.

6. Is the interaction near- or far-field? They are different use cases

Near-field and far-field refer to the distance between the speaker/ear(s) and microphone/mouth(s):

From near-field (left) to far-field (right)

So for near-field use cases, think of users with Apple AirPods with built-in ‘Hey Siri’ detection, or the rumoured Amazon Echo Earbuds. A voice assistant in a car with a single occupant can also be seen as near-field. The point is that the interaction can be considered confidential, because it cannot be overheard.

In the case of headphones/earphones you should also keep in mind the future possibility of gesture control (i.e. taps and head movements) for ‘silent’ user input or feedback.

The remaining far-field use cases are those where the interaction is out in the open: the conversation may be overheard or is intentionally shared between different users.

7. How smart is your AI? Claim your domain knowledge

Artificial General Intelligence is still a long way off, but that doesn’t mean you can’t master a specific domain already.

[Image: AI definitions]

Start by defining a narrow domain and feeding your artificial intelligence, all while managing user expectations.

AI improves with iterations and variations, so start generating as many as possible – see the sketch below.
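As an illustration – a hypothetical sketch, not a production NLU pipeline – a handful of phrasing templates combined with domain slot values already yields a multiple of training utterances:

```swift
import Foundation

// Combine phrasing templates with domain slot values to generate
// utterance variations as training input for a narrow-domain system.
let templates = [
    "what's the status of my {item}",
    "where is my {item}",
    "track my {item}"
]
let slotValues = ["order", "package", "delivery"]

let variations = templates.flatMap { template in
    slotValues.map { template.replacingOccurrences(of: "{item}", with: $0) }
}

variations.forEach { print($0) }  // 3 templates x 3 slots = 9 utterances
```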


8. Are you making preparations? Start ‘dogfooding’

In order to get your voice/chat service and the underlying AI to a minimum viable level, you have to start feeding and testing your system. An internal version of your service (i.e. ‘eat your own dog food’) is essential.

Kayak app with ‘Add to Siri’ button, which allows you to link a dedicated spoken intent like ‘My travel plans’ to the array of commands.

The ideal place to start is with your customer care agents (which also makes it an opportunity, not a cost).

Each customer inquiry and resolution is ‘free’ input for your system. And the agent can be the (temporary) controlling and mitigating interface between your fledgling conversational system and the end user.

You can already take advantage of some low-hanging fruit, for example by adding the “Add to Siri” button to your mobile app, so users can start to become familiar with the possibilities of voice control – a minimal sketch follows below. For more information on the potential, see this article on ‘Siri Shortcuts’.
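As a sketch of what this looks like under the hood – the activity type string and invocation phrase below are illustrative, not Kayak’s actual implementation – an iOS app donates a shortcut-eligible NSUserActivity that Siri can then suggest and the user can bind to a spoken phrase:

```swift
import UIKit
import Intents

// Donate a shortcut-eligible activity so Siri can suggest it and the
// user can attach a spoken phrase like "My travel plans" to it.
func donateTravelPlansShortcut(on viewController: UIViewController) {
    let activity = NSUserActivity(activityType: "com.example.app.travelplans")
    activity.title = "My travel plans"
    activity.isEligibleForSearch = true
    activity.isEligibleForPrediction = true           // iOS 12+: Siri may suggest it
    activity.suggestedInvocationPhrase = "My travel plans"
    viewController.userActivity = activity            // donated when the screen appears
}
```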


9. Do you have screen/human follow-up in place? You must

It’s not voice only, but voice first. This means that in some situations the response to a spoken request needs to be visual: e.g. an overview, a (long) list of options, or an image on an app/watch/speaker screen. This is called multi-modal interaction.

One reason for this is that humans speak faster than they can type, but conversely we read (and visually scan/compare multiple data points) faster than we can listen. However, this should be purposeful use of the screen, not just a fallback for limitations in the voice assistant’s context awareness – see the sketch below.
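Here is a hedged sketch of that decision – the types and the screen flag are hypothetical – where a short result is spoken and a longer list is pushed to a screen:

```swift
// Hypothetical multi-modal response selection: short results are
// spoken, longer lists go to a companion app/watch/speaker screen.
enum Response {
    case spoken(String)
    case onScreen(title: String, items: [String])
}

func respond(options: [String], deviceHasScreen: Bool) -> Response {
    // A short answer fits in speech; a long list is faster to scan visually.
    if options.count <= 3 || !deviceHasScreen {
        let summary = options.prefix(3).joined(separator: ", ")
        return .spoken("Your top options are: \(summary)")
    }
    return .onScreen(title: "All \(options.count) options", items: options)
}
```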

[Image: Handover]

Also, in many cases a human respondent is for now still more intelligent and therefore faster (with the bot doing the mundane and preparatory tasks). So the handover from bot assistant 🤖 to human support 👨‍💻 needs to be seamless – one common pattern is sketched below.
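That pattern – sketched here with hypothetical types and a made-up threshold – is confidence-based escalation: the bot answers what it is sure about and otherwise hands the full transcript to a human agent, so the user never has to repeat themselves:

```swift
// Hypothetical handover logic: below a confidence threshold the bot
// escalates, passing the transcript so the human has full context.
struct BotReply {
    let text: String
    let confidence: Double  // 0.0 ... 1.0, e.g. from the NLU layer
}

func handle(reply: BotReply, transcript: [String]) {
    if reply.confidence >= 0.8 {
        print("🤖 \(reply.text)")                    // bot handles it
    } else {
        print("👨‍💻 Agent takes over; context: \(transcript)")
    }
}
```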

See for example Audible, Amazon’s audiobook company, which is offering live customer support through Amazon Echo devices.

10. Do you have the right expertise? Voice and Conversation Design Skills

Even though voice is ‘yet another’ digital interface, the skills for designing one are quite different from those for web and mobile.

[Image: Mindmap]

Especially the psychology and dynamics behind a good dialogue are new. Not to mention the NLP and AI components.

So if you have trouble formulating the right answers to the previous questions, get in touch via almar@virvie.com and we’ll schedule some time to talk!
