Voice First, Screen Second: Designing for Voice in Adobe XD

Prototypr Editors · Published in Prototypr · Dec 5, 2018 · 18 min read

Ever since the first iPhone came about, the screen has been the primary medium we design for, and the main way we interact with digital media. Apart from our screens gradually increasing in size over the last few years, we have also seen a gradual shift away from them, towards different types of immersive interfaces for consuming media. One to take notice of, in particular, is the voice user interface (VUI). Gartner predicts that within a year, around 30% of all searches will be done through audio-centric technologies, without the use of a screen.

As voice recognition technology improves, it’s becoming clear that the screen is no longer going to be the primary medium we design for, as it’s no longer the only way users interact with products and services. Or as Jason Amunwa puts it in The UX of Voice:

“Voice interaction represents the biggest UX challenge since the birth of the smartphone.”

Comparable to the adoption of mobile-first design during the rise of mobile and tablet devices, we’re beginning to see a transition from screen-first to voice-first design as VUIs become an accepted way to interact with technology.

Whilst the area is still taking shape, it hasn’t been easy for creators to experiment with VUIs during the design process. Recently though, Adobe XD released a voice prototyping feature, becoming the first design platform to enable voice prototyping. In this article, we’ll look at the shift towards voice-first design, best practices for prototyping for voice, and how to do it using the features available in XD.

Contents

[Part 1] 🤖The Transition from Screen-First to Voice-First
1.1: The Adoption Rate of Voice UIs
1.2: 3 Methods of Voice Interaction
[Part 2] 🗣️Designing for Voice
2.1: The role of the screen in voice-first design
2.2: Display Cards and the Alexa Presentation Language
2.3: Guidelines: When Voice UIs are useful
2.4: Methodologies for Voice Design
[Part 3] 📻Building Voice Prototypes in XD for Amazon Alexa
3.1: Working on Dialog Flows
3.2: Adding Visuals to Voice-first Prototypes in XD
[Part 4] 🔬Conclusion: Testing Prototypes and More

Part 1.

The Transition from Screen-First to Voice-First

Over the last few years, voice interfaces have emerged on a range of devices. We first saw them creep onto our mobiles and tablets through voice commands in our apps (e.g. “Hey, Siri”), and more recently we saw voice give rise to a new device altogether: the smart speaker. Underneath the interface though, the products and services offered don’t change too much: voice can be seen as another means to interact, solving the same underlying issues as other user interfaces.

Image adapted from Bert Brautigam: The New Skeuomorphism is in Your Voice Assistant

There’s no doubt that voice applications are still far from perfect, though, with Cooper reporting the following user retention issues for Alexa Skills and chatbots:

  • Alexa Skill user retention is just 3% after 2 weeks (compared with 10–11% for typical mobile apps)
  • 40% of users abandon chatbots after a single interaction.

That may make you wonder why Adobe would add a voice prototyping feature to XD this early on. However, looking at the number of voice-enabled devices out there, it becomes clear that we need a way to design for voice interfaces today:

The Adoption Rate of Voice UIs

Voice design agency Rain say there is a range of voice interaction devices available, but the actual usage of voice UIs differs from device to device:

People who own each device and have used its voice assistant:

Image adapted from https://rain.agency

Smart speakers, phones, cars, computers, TVs, and watches all have some form of voice UI available. However, as seen in the chart above, whilst voice features are available on all of these devices, the smart speaker is by far the largest catalyst in the adoption of the technology: 93.3% of smart-speaker owners actually use its voice features.

To take this further, looking at the adoption rate of the smart speaker is an even bigger eye-opener:

Image adapted from Active Forecast

The blue curve on the left shows that 50% of the US population adopted smart speakers within 5 years of their commercial introduction, approximately 3–4 times faster than the adoption of the radio, the Internet, and the computer. For a more concrete example, Rain also worked with Headspace to create a voice-first app, and the results are impressive:

“In just 2 months, Headspace reached the same number of monthly active users via voice as it took to amass on mobile over 40 months”

With those results, it becomes evident that the poor retention rates of VUIs shown in the previous section could well be down to the design of our voice apps. For instance, voice technologist Kaeya reported a retention rate of 21%, far higher than the 3% average reported by Cooper.

Together with the adoption rate of VUIs, this helps explain why Adobe has added voice prototyping to XD. In all, we could say that voice is starting to hit mainstream adoption, as seen here:

Adapted from Stephen Gay

3 Methods of Voice Interaction

As Intercom state, voice isn’t a new UI paradigm or its own platform, but a new interface to design and deliver for. Whilst voice brings a new way to interact with products, it needn’t be used in isolation. For example, voice interfaces are often combined with screens in various ways. The Nielsen Norman Group highlight 3 of these combinations:

  1. 📱Screen-first Interaction: Here, we start with an application designed primarily for screen, and voice controls are added afterwards to enhance the experience.
  2. 🗣 Voice-only Interaction: Here there is no screen at all, and input and output are based on sound, as with a smart speaker.
  3. 💬Voice-first Interaction: This is where an app designed primarily for voice is enhanced through the addition of a screen to output information.

Each approach combines screen and voice UIs in different ways. Here are examples of each, and the times at which they emerged:

All of these interaction methods bring their own benefits and limitations. The NN Group also explain that because screen and voice are two very different interaction models, the integration of the two presents design challenges that have prevented us from realising the full benefits of doing so. They go on to highlight that:

  • Voice is most efficient for inputting information.
  • Screens are most efficient for outputting information.

The most effective approach to combining voice and screen so far appears to be the voice-first approach taken by Amazon’s screen-equipped smart speakers: the Echo Show and Echo Spot. This is the approach we’ll take for prototyping, and we’ll explore it next:

Part 2.

Designing for Voice

As outlined in the previous section, voice is fairly limited when it comes to outputting information. Therefore, screens can be used to augment a voice-first experience. Here’s how the Echo Show and Echo Spot have been designed to enable this:

The role of the screen in voice-first design

The Echo Show and Echo Spot are essentially smart speakers that also have displays. The primary user interface remains voice, and the screen is used to enhance the limited output of voice UIs, creating a more immersive experience.

It’s important to note that the smart speaker’s screen only displays content to complement voice responses from Alexa, so all interactions remain led by voice commands. If the screen were removed, use of the application would not be inhibited. As Amazon put it:

“Although the display component may enhance the user experience considerably, voice continues to be the primary interaction method with Alexa.”

Display Cards and the Alexa Presentation Language

Just as a mobile-first approach limits elements to the most essential, UI elements and interactions are pared down when designing voice-first. In fact, Amazon have created templates and ‘Display Cards’ that limit displays to the following elements:

  • Body templates: display content including text and images
  • List templates: display content as horizontal or vertical lists
  • AudioPlayer or Video: appears when the skill plays audio/videos
  • Touch selection events: List items and action links may be activated by touch if the skill is coded to support that

To get a better idea, here’s an example ‘Body template’ filled out in response to the voice command “Who is Usain Bolt?”:

“Alexa, who is Usain Bolt?”

As well as these Display Cards, Amazon provides its own Alexa Presentation Language, which is also available as a UI Kit for Adobe XD:

Download the Adobe XD Alexa UI Kit

However, before jumping into that XD prototype, it’s useful to consider when it’s actually necessary to design for voice and some best practices to do so:

Guidelines: When Voice UIs are useful

Have a compelling reason to use voice…The solution should present a compelling reason to use voice over existing habits (i.e. touch, click, etc.) ~ Voysis

Despite the popularity of Voice UIs, it doesn’t always make sense to design voice-first, or even to design for voice at all. We need to consider whether a Voice UI would actually be more useful than a Visual UI. Will it solve problems in a better way for the user?

Before designing a VUI, it’s therefore worth checking out existing guidelines available. For example, Microsoft’s design guidelines for Cortana encourage us to ask questions such as:

Does the design reduce the number of steps needed to solve a user’s problem?

Will the design solve the user’s problem better, easier, or faster than any of the alternative experiences?

Is the design intuitive?

Does the design avoid complexity?

If the answers are positive, then it’s a good time to adopt a voice interface. Microsoft go on to provide some excellent suggestions for situations in which Voice UIs would benefit users:

When the user’s hands are busy

  • 🚗 Scenario: When a user is driving.
    🎙 Example Question: “What is the fastest route to the petrol station”.
  • 🍳 Scenario: When a user is cooking.
    🎙 Example Question: “Was that 2 eggs or 3?”

When using voice is an easier way to interact

  • 🛌 Scenario: If a user is in bed.
    🎙 Example Question: “Turn the music and lights off”.

When using voice is more efficient

  • 🎵Scenario: Playing the most popular Oasis song.
    🎙Example Question: “Play the most popular song from Oasis”.

More principles and guidelines can be found in this excellent resource from Clearleft:

Now that we’ve gone over how voice can work well with screens, let’s have a look at some methodologies and frameworks that can help us get started with voice design:

Methodologies for Voice Design

When it comes down to actually making a voice app, the helpful guidelines from the likes of Amazon, Google, and Microsoft are a great place to start. Each provides design guidelines for creating voice-first products:

As highlighted by VUI designer, Pavel Gvay, each approach consists of similar steps such as identifying users, scripting dialogues and iterating on designs. For a more general approach, you can consider Pavel’s own outline:

  1. Identify people from the audience
  2. Filter scripts
  3. Create a character
  4. Write example dialogues
  5. Build a dialogue tree

In this outline, it’s notable that scripted chats and sample dialogues play a key part in shaping the conversational experience. They help us take into account the many variants that can occur in the course of a conversation, thus helping us outline the different paths a user could take to achieve their goal within our voice app. Lyndon Cerejo provides another 5-step process for designing for voice, which also ends with the creation of a dialogue tree.

Therefore, if you’re creating your own prototype whilst following this, it’s well worth working on your voice assistant’s conversational flow before moving on to higher-fidelity prototypes. Here’s another useful resource on designing those chat dialogues:

As it happens, Adobe XD is also a useful tool in designing your conversational flows as we’ll see in the next section where we’ll build a voice prototype in XD for Amazon Alexa.

Part 3.

Building Voice Prototypes in XD for Amazon Alexa

Let’s see how easy it is to make a voice prototype in XD. Once you have explored a dialog flow (or during your exploration), it’s a good time to bring it into XD.

Working on Dialog Flows

A dialog flow (or dialog map) is a visualisation of the overall conversation between a user and a machine. It maps out user inputs and the assistant’s responses, enabling us to design the conversational experience. There are lots of mapping tools that can help do this, but XD itself is also useful here.

You might start by bringing your dialog flow into XD by adding plain text to artboards to indicate user input and Alexa’s responses, and connecting the artboards to show the conversational flow. Here is a rough one created in XD for a fictional music recommendation app as an example:

Music Recommendation App

A simplified script for this dialogue can look something like this:

Example script: Music recommendation app

🙍[User] Alexa, recommend me some music
🤖[Alexa] Okay, what activity are you doing? I'll suggest music.
🙍[User] I'm going running.
🤖[Alexa] Last time you went running, you listened to 'Call on Me' by Eric Prydz, and 80s dance music. Which do you prefer?
🙍[User] 80s dance music, please.
🤖[Alexa] Now playing 80s dance music. Enjoy.
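If you want to take the same flow beyond XD later on, the script can also be captured as a small dialogue tree in code. The sketch below is a hypothetical Python representation (not part of XD or the Alexa SDK): each node stores Alexa’s prompt and maps possible user replies to the next node, mirroring the branching artboards we’ll connect next.

```python
# Hypothetical sketch: the music-recommendation script as a dialogue tree.
# Each node holds Alexa's prompt and maps user replies to the next node key;
# nodes with no branches end the conversation.
from typing import Optional

DIALOGUE_TREE = {
    "start": {
        "alexa": "Okay, what activity are you doing? I'll suggest music.",
        "branches": {"i'm going running.": "running"},
    },
    "running": {
        "alexa": ("Last time you went running, you listened to 'Call on Me' "
                  "by Eric Prydz, and 80s dance music. Which do you prefer?"),
        "branches": {
            "call on me.": "play_call_on_me",
            "80s dance music, please.": "play_80s",
        },
    },
    "play_call_on_me": {
        "alexa": "Now playing 'Call on Me' by Eric Prydz. Enjoy.",
        "branches": {},
    },
    "play_80s": {
        "alexa": "Now playing 80s dance music. Enjoy.",
        "branches": {},
    },
}


def next_node(current: str, user_says: str) -> Optional[str]:
    """Return the key of the next node for a user utterance, or None if unrecognised."""
    return DIALOGUE_TREE[current]["branches"].get(user_says.lower().strip())
```

For example, next_node("start", "I'm going running.") returns "running": exactly the kind of branching that the artboard connections in XD will represent visually.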

Using Artboards for Flows

Now, to create a dialogue flow of this conversation (including alternate user responses), we can use XD’s artboards in the following way:

  • 🙍 User input is added on the bottom half of each artboard (in grey text)
  • 🤖 Alexa's response is added on the top half (in blue text)

Using XD’s Prototyping Mode, arrows between elements and artboards can then be created, showing the many possible conversational avenues:

For example, where the user may have a choice of responses, a separate connection can be created between that input and the next response. The conversation flow therefore changes depending on what the user says. This is shown on the left, where the user has 3 options as indicated inside the pink box.

🗣️ Adding Voice Triggers

Furthermore, we can add voice interactions to our flows in XD’s Prototyping Mode, where voice triggers can now be used instead of a regular click or drag. This can help us test voice commands early on. Here’s how to create a voice trigger:

🙍 Creating Voice Triggers

  1. Add the trigger: To add a voice trigger, drag a connector from an element in one artboard to another (see image below).
  2. Add the voice command: Next, from the dialog that opens, select 'Voice' from the trigger menu, and add some text in the command field. This is the command the user must say to trigger XD to move to the next screen in play mode:

As seen above, the text in our artboards can simply be pasted in as voice commands, making it available for testing, whilst also remaining a visible element of our overall dialogue flow.

In addition to adding voice triggers for user input, you can also make use of ‘Speech Playback’ to simulate Alexa’s response:

🤖 Adding Alexa's Response

  1. Use a Time Trigger: Select the main anchor of the second artboard and choose 'Time' as the trigger.
  2. Set up Speech Playback: Set the action to 'Speech Playback', and enter the text Alexa is going to say (e.g. paste in the blue text).

The result is a very simple voice prototype. Here’s a video:

You can download this example dialogue flow here:

For more information on how to create voice triggers and set up speech playback, check out this video.

From doing this, we can already see that XD can be used to test conversational flows without any visuals, simply by using commands and responses. This is also useful when it comes to creating voice-only prototypes for smart speakers without screens (such as Google Home), as illustrated by Susse Sønderby:

Susse even goes a step further by adding images of the actual Google Home smart speaker, using XD’s Auto Animate feature to simulate the device’s lights.

Adding Visuals to Voice-first Prototypes in XD

As seen throughout this article, combining voice and screen is an important area for designers to explore, and Amazon’s Echo Show and Echo Spot are the most prevalent use cases today. Accordingly (and as mentioned in Part 2), XD provides us with a VUI Kit for those devices, made in line with the Alexa Presentation Language (APL):

“This VUI kit empowers designers by giving them the best starting point when venturing into voice-enabled design in XD, because the kit includes APL.”

You can download it here. Devices include:

  • Echo Spot
  • Echo Show
  • Fire TV

In the previous section, we created a quick voice prototype from a simple dialog for a music playlist recommender app. Here’s how you might make use of the Echo Show screen to enhance that voice-only experience using the Alexa VUI Kit:

Using the VUI Kit

As well as providing all the building blocks you need for the available devices, there’s a useful example prototype at the bottom of the VUI Kit to help you get started. You can press play to try the example out for yourself, and also dig into the design to get familiar with how the voice triggers have been added:

To move on quickly from the voice-only dialogue prototype created in the previous section, you can copy and paste template artboards from the VUI Kit into your own project and recreate your flow with extra visuals for the screen. For example, here is the before and after of the music recommendation prototype:

Left: Voice-only dialogue, Right: Voice-first prototype with screen

Part 4.

Conclusion: Testing Prototypes and More

In all, we’ve now got an overview of why voice prototyping is important, how voice interfaces and screens can work together, and a great way to start designing for voice with Adobe XD.

Although we’ve highlighted useful methodologies and best practices, there are still some areas we haven’t covered, such as testing prototypes early with users, and how to choose appropriate visual elements when adding a visual user interface to a voice-first prototype. To conclude, here are further resources to improve your process:

Testing with Wizard of Oz

A great way to test voice prototypes, as suggested by Ben Sauer, is the ‘Wizard of Oz’ testing method, which he highlights as having a renewed importance with the rise of voice interfaces.

Wizard of Oz testing involves a user interacting with what they think is a real voice assistant, when really a human ‘Wizard’ is hidden away behind the scenes, typing responses back to the user. It’s especially useful early on to shape conversational flows. Here are some resources to get you started with this technique:

Selecting Visual Elements

Visual elements were the last thing to be considered in this article, and received the least focus, since perfecting the dialogue flow is the primary concern for voice UIs, as indicated by Microsoft:

“Visual elements should not be required to communicate the intent, they should only support the intent.”

With that in mind, to help you decide which elements to include in your voice prototype, check out these excellent blog posts by Paul Cutsinger, which contain tips on selecting visual elements for voice-first prototypes:

It’s also worth checking out Microsoft’s guidelines, and the following article in Smashing Magazine that talks about combining GUIs and VUIs:

Overall, when it comes to combining a voice UI with a visual UI, the voice-first approach taken throughout is definitely beneficial, but at the same time we must consider how much the functionality of the screen actually benefits users, as opposed to frustrating them. The NN Group state:

Deliberately handicapping the functionality of a screen in the name of ‘pure’ voice interaction unnecessarily limits the usefulness of the device and increases users’ cognitive load and frustration.

Whilst voice is becoming mainstream, it’s also clear that best practices for combining voice and screen are still improving, along with the new devices that emerge each year. This actually makes it a great time to experiment and find out what works best through tools like Adobe XD.

Words by Graeme | Illustrations by María ✏️ | Follow us for more 🙂

Build with us

If you’re working on a voice project of your own, feel free to drop a word in the Projects category of the Prototypr Makerspace Community to get feedback on your ideas.
