What is Voice User Interface (VUI)?



We are all familiar with User Interfaces (UI). Computer programs, pieces of software, and operating systems should all be designed to be as easy to use as possible. Good UI is crucial to creating a good product that people will want to use again. On the other hand, if there are systemic problems with some element of the UI, whether in the user’s flow through it or simply in how information is presented, they will seriously damage how users feel about the product. That overall impression is known as the User Experience (UX). Voice activated software has become far more prevalent in new tech in recent years, although actual usage has lagged behind. Designers are working hard to improve the UX of virtual assistants, and optimising the interface to respond accurately to natural language is vital. This kind of interface is known as a Voice User Interface (VUI).

 

VUI - Voice User Interface

  Think about any voice activated program you have used before. Did you manage to resolve your problem? Did some aspect of it frustrate you? Would you use it again? Good VUI follows the same principles as all UI. Users want a problem solved, whether it is a query, a function, or a complaint. They expect any UI to give them relevant content in a timely manner, so that they can resolve the problem as quickly as possible. With that in mind, the role of a VUI designer is to identify the commands that already exist for moving from one element to another within the structure of the program or system, and then to consider how people might articulate each request. Voice activated systems have the equivalent of a homepage, which users can navigate to and from. Because there is no Graphical User Interface, designers have to accommodate a wide range of commands and queries, which may vary in length, complexity, dialect, and word choice. The designer must look at each existing command, for example ‘Navigate from homepage to account settings’, and think about all the ways that someone might say it: ‘change password’, ‘how do I change my name?’, ‘Account settings’.
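As a rough illustration of mapping many phrasings onto one existing command, here is a minimal Python sketch. It assumes a simple word-overlap score, which is a stand-in for the trained language-understanding models a real product would use. The intent names, the example phrasings beyond those quoted above, and the `match_intent` function are all hypothetical.

```python
import re

# Hypothetical commands and example phrasings (assumptions for illustration).
INTENT_EXAMPLES = {
    "account_settings": [
        "account settings",
        "change password",
        "how do I change my name",
    ],
    "opening_times": [
        "when are you open",
        "opening hours",
    ],
}


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, min_overlap=0.5):
    """Return the command whose example phrasing best overlaps the utterance.

    The score is the share of an example's words found in the utterance.
    Returns None when no example reaches min_overlap.
    """
    words = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            example_words = tokenize(example)
            score = len(words & example_words) / len(example_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= min_overlap else None


print(match_intent("I want to change my password"))
print(match_intent("What time are you open?"))
print(match_intent("Tell me a joke"))
```

Returning `None` for an unmatched utterance gives the designer a clear point at which to ask the user a follow-up question rather than guess.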

 

Linguistic theory behind successful VUI

  To fully understand some of the challenges that VUI designers face, you have to understand some of the philosophy of conversation. H. P. Grice, one of the intellectual powerhouses of British linguistic philosophy, has played a formative role in voice recognition theory, and consequently in VUI itself. He proposed a theory of conversation known as the ‘Cooperative Principle’. This theory states that any two people using language are agreeing to cooperate with each other, whether they are having an argument or working together to solve a problem. We are conversationally in agreement, whether or not we are socially in agreement. If the two participants do not cooperate, they are unable to communicate. He identified four maxims of cooperation, which are ‘rules’ only in the sense that they describe conversation, rather than govern it.

  1. Maxim of quantity - Give as much information as is needed, and no more. When you ask someone whether they want tea, they don’t give you an exhaustive list of all the teas that they do not like.
  2. Maxim of quality - Information is true (or not knowingly false). We expect the people we are talking to to tell the truth.
  3. Maxim of relation - Conversation follows a continuous stream of related material. If I ask you whether you want to eat lunch with me, you do not reply with the temperature on this day 50 years ago.
  4. Maxim of manner - Information is as brief, clear, and ordered as possible. When you read an informational book, you expect the contents to be at the beginning, before the introduction, rather than on page 98.

  Grice calls these maxims and not rules because we do ‘break’ them, on a fairly regular basis. One way that we might break a maxim is ‘covertly’; for example, we might violate the maxim of quality by lying. This is still a form of cooperation, because the intent is to convey a given meaning, but it is socially antagonistic. More importantly, at least when it comes to VUI, we can also break maxims by ‘flouting’ them. Think back to the maxim of quantity. Your friend might be telling you all about the tea she doesn’t like because she knows that when you go to the kitchen you will make a ginger tea, something that she hates, and she would prefer a different tea from your wide variety. She is adding information that a simple ‘yes’ or ‘no’ would not convey, even though it is not strictly an answer to the question. The important part is that it is obvious to both sides of the conversation that a maxim has been broken. Similarly, the maxim of quality can be flouted in jokes, irony, and sarcasm. Given that these are all parts of conversation that we would probably want a good bot to respond to, it is important that VUI designers are able to incorporate them properly into a bot’s lexicon. The risk otherwise is that they end up creating a product that feels unresponsive and limited, giving users a frustrating experience and leaving them wishing they had been able to talk to a person instead.

 

Examples of Gricean Implicature and ‘Flouting’ Conversation

  Perhaps one of the trickiest cases where we break a maxim is when there is a pre-established convention. A response about the weather 50 years ago is not relevant to my question ‘Do you want to have lunch together?’, so it breaks the maxim of relation. But you might equally reply ‘I’ve already eaten.’ This is not a ‘standard’ relevant response either, because it does not directly answer my question. It is a flagrant flouting of the maxim rather than a covert violation: we both understand the implication that, given that you have already eaten, you do not want to eat again. (Of course you could be breaking the maxim covertly too, and fully intend to eat a second lunch with Jan from accounts.)

 


This demonstrates the great difficulty of VUI. Users can and will say practically anything, and expect to get information this way. They also expect to be able to have as natural a conversation as possible. Another philosopher of language, Ludwig Wittgenstein, wrote that ‘if a lion could speak we could not understand him.’ This slightly cryptic claim strikes at the heart of what makes VUI so difficult. We have enough trouble articulating our own thoughts in coherent sentences that are generally intelligible. Getting a machine to make sense of them is a mammoth task, and a poor VUI will lead to a poor user experience.

 

Voice Activated Design Principles

  VUI developers are responsible for making sure that nobody using the product is frustrated by how it operates. The voice activated phone lines that we all hate using are frustrating because they ignore some basic principles of design, leading to a poor user experience.

  • Following the maxim of quantity, responses should be as concise as possible. This means that just reading out a long list of services, followed by numbers you can press to activate them, will not be very effective. This is especially true if users are unable to interrupt the menu when they hear the option they want. Nothing is more likely to frustrate a user than having to listen to a long pre-recorded menu of options that they will not need.
  • The level of formality used is also very important. In speech we make far readier use of contractions like ‘don’t’ and ‘won’t’, which might not be considered appropriate in written text. VUI designers must select the best words for speech, rather than for writing, and this takes skill and care. This, of course, can go too far. If the actor you have chosen to voice your chatbot sounds a bit too cheery when directing people on how to report illegal fly-tipping in their area, users might feel frustrated. In UX, it is very important that tone and formality work in conjunction with the other linguistic elements you want to control, to create a natural sounding bot that people feel comfortable communicating with.
  • AI innovations offer an exciting step forward in VUI design. Many chatbots are programmed to understand someone asking for repetition, but few are able to cope with requests for clarification. This requires a great deal more from the bot: it must be able to understand the sentiment of the speaker, find a more appropriate way of asking the same thing, and be prepared to try different ways of phrasing a question.
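The difference between a request for repetition and a request for clarification can be sketched in a few lines of Python. This is a minimal illustration that assumes simple keyword cues; a real bot would need proper language understanding and sentiment detection. The cue lists, the phrasings, and the `respond` function are hypothetical.

```python
# Hypothetical cue phrases (assumptions, not taken from any real product).
REPEAT_CUES = ("say that again", "repeat that", "pardon", "didn't catch")
CLARIFY_CUES = ("what do you mean", "what you mean", "don't understand")

# Hypothetical phrasings of one question, from the first attempt onward.
PHRASINGS = [
    "Which shop would you like to hear about?",
    "Which branch would you like to know about?",
    "Where are you based? I'll tell you the closest branch.",
]


def respond(user_reply, attempt):
    """Return (next_prompt, attempt) for the bot's next turn.

    A clarification request moves on to the next phrasing, a repetition
    request repeats the current one, and anything else is treated as an
    answer (next_prompt is None).
    """
    reply = user_reply.lower()
    if any(cue in reply for cue in CLARIFY_CUES):
        # Stay on the last phrasing once the alternatives run out.
        attempt = min(attempt + 1, len(PHRASINGS) - 1)
        return PHRASINGS[attempt], attempt
    if any(cue in reply for cue in REPEAT_CUES):
        return PHRASINGS[attempt], attempt
    return None, attempt


print(respond("Sorry, can you repeat that?", 0))
print(respond("I don't know what you mean.", 0))
```

Keeping the attempt counter outside the function means the caller decides what happens when every phrasing has failed, for example handing the conversation to a person.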

 

Alex is a customer looking to find out the opening times of a shop. He rings up and is directed to a bot. ‘When are you open?’ he asks. However, there are multiple branches of this shop, so the bot asks him ‘Which shop would you like to hear about?’ Alex does not know that there are several branches, so he says ‘I don’t know what you mean.’ A well optimised bot might reformulate the question to ask ‘Which branch would you like to know about?’, or even ‘Where are you based? I’ll tell you the closest branch.’ If it were using AI, it could then try out different phrasings of the same question with multiple callers and work out which one was most likely to be understood by customers. Don Norman, a cognitive scientist who worked for Apple and was behind many of the Macintosh-era innovations that are now standard practices in UI design, says that people tend to treat both bots and computers as if they were people. Any virtual assistant or other software that uses speech recognition will need human-like conversational ability if we want it to be effective. Graphical user interfaces allow users to play around with functions and explore in their own way.

 


Of course VUI is more challenging, because it is harder to incorporate extra or ‘hidden’ functions into the design. The only way that users can know what is available is to be told, in one way or another. But as Grice has shown, we rely on implicature in so much of our everyday language that it can be extremely challenging to give users what they want, not least because they might not know themselves. Conversational ambiguity and flouted maxims are intelligible to humans because we are experts in understanding mixed messages. VUI designers must have many strings to their bow, being part linguist, part social theorist, and of course part designer. The most important thing to bear in mind is that the end user will always be the priority. If they are not having a good experience using your VUI, then it is not well designed. Good customer service VUI is particularly challenging because of the variety and complexity of queries. It requires the right balance between being friendly and helpful and showing sincere concern when required. We Build Bots are currently developing this kind of sophisticated VUI, and have worked with companies and government departments to provide a cost-effective and responsive way of focusing on exceptional customer service.