Voice User Interface Design

This book is a comprehensive and authoritative guide to voice user interface (VUI) design. The VUI is perhaps the most critical factor in the success of any automated speech recognition (ASR) system, determining whether the user experience will be satisfying or frustrating, or even whether the customer will remain one. This book describes a practical methodology for creating an effective VUI design. The methodology is scientifically based on principles in linguistics, psychology, and language technology, and is illustrated here by examples drawn from the authors' work at Nuance Communications, the market leader in ASR development and deployment.

The book begins with an overview of VUI design issues and a description of the technology. The authors then introduce the major phases of their methodology. They first show how to specify requirements and make high-level design decisions during the definition phase. They next cover, in great detail, the design phase, with clear explanations and demonstrations of each design principle and its real-world applications. Finally, they examine problems unique to VUI design in system development, testing, and tuning. Key principles are illustrated with a running sample application. A companion Web site provides audio clips for each example: www.VUIDesign.org.

The cover photograph depicts the first ASR system, Radio Rex: a toy dog who sits in his house until the sound of his name calls him out. Produced in 1911, Rex was among the few commercial successes in earlier days of speech recognition. Voice User Interface Design reveals the design principles and practices that produce commercial success in an era when effective ASRs are not toys but competitive necessities.
Contents
Introduction to Voice User Interfaces
1.1 What Is a Voice User Interface?
1.1.1 Auditory Interfaces
1.1.2 Spoken Language Interfaces
1.2 Why Speech?
1.3 Where Do We Go from Here?
Overview of Spoken Language Technology
2.1 Architecture of a Spoken Language System
2.1.2 Recognition
2.1.3 Other Speech Technologies
2.2 The Impact of Speech Technology on Design Decisions
2.2.1 Performance Challenges
2.2.2 Problem Solving
2.2.3 Definition Files
2.3 Conclusion
Overview of the Methodology
3.1.1 User Input
3.1.2 Integrated Business and User Needs
3.1.4 Conversational Design
3.1.5 Context
3.2 Steps of the Methodology
3.2.2 High-Level Design
3.2.5 Testing
3.3 Applying the Methodology to Real-World Applications
3.3.2 Dealing with Real-World Budget and Time Constraints
Requirements and High-Level Design Methodology
4.1.1 Understanding the Business
4.1.2 Understanding the Users
4.1.3 Understanding the Application
4.2 High-Level Design
4.2.2 Dialog Strategy and Grammar Type
4.2.5 Metaphor
4.2.7 Nonverbal Audio
4.3 Conclusion
High-Level Design Elements
5.2 Pervasive Dialog Elements
5.2.2 Universals
5.3 Conclusion
Creating Persona by Design
6.1 What Is Persona?
6.2 Where Does Persona Come From?
6.3 A Checklist for Persona Design
6.3.2 Brand and Image
6.3.4 Application
6.5 Conclusion
Sample Application: Requirements and High-Level Design
7.1 Lexington Brokerage
7.2.1 Understanding the Business Goals and Context
7.2.2 Understanding the Caller
7.2.3 Understanding the Application
7.3 High-Level Design
7.3.3 Pervasive Dialog Elements
7.3.4 Recurring Terminology
7.3.6 Persona
7.3.7 Nonverbal Audio
Detailed Design Methodology
8.1 Anatomy of a Dialog State
8.2 Call Flow Design
8.3 Prompt Design
8.3.1 Conversational Design
8.3.2 Auditory Design
8.4.1 Formal Usability Testing
8.4.2 Card Sorting
8.6 Conclusion
Minimizing Cognitive Load
9.1 Conceptual Complexity
9.1.1 Constancy
9.1.2 Consistency
9.1.3 Context Setting
9.2 Memory Load
9.2.1 Menu Size
9.3 Attention
9.4 Conclusion
Designing Prompts
10.1 Conversation as Discourse
10.2 Cohesion
10.2.1 Pronouns and Time Adverbs
10.2.2 Discourse Markers
10.3 Information Structure
10.4 Spoken Versus Written English
10.4.1 Pointer Words
10.4.2 Contraction
10.4.3 Must and May
10.4.4 Will Versus Going To
10.4.5 Romans Perspire, Anglo-Saxons Sweat
10.5 Register and Consistency
10.6 Jargon
10.7 The Cooperative Principle
10.8 Conclusion
Planning Prosody
11.1 What Is Prosody?
11.2 Functions of Prosody
11.3 Stress
11.4 Intonation
11.4.2 Contours in Context
11.5 Concatenating Phone Numbers
11.5.2 Concatenation Digit-by-Digit
11.5.3 Concatenation by Groups
11.6 Minimizing Concatenation Splices
11.7 Pauses
11.8 TTS Guidelines
11.8.1 Analyze Application Usage
11.8.4 Make Content Easy to Understand
11.8.5 Use Appropriate Formats
11.9 Conclusion
Maximizing Efficiency and Clarity
12.1 Efficiency
12.1.1 Don't Lose Work
12.1.4 Use Caller Modeling to Save Steps
12.2 Clarity
12.2.1 Mental Models for Natural Language Understanding
12.2.2 Navigational Clarity Through Landmarking
12.3 Balancing Efficiency and Clarity
12.3.2 Taper Prompts
12.4 Conclusion
Optimizing Accuracy and Recovering from Errors
13.1 Measuring Accuracy
13.2 Dialog Design Guidelines for Maximizing Accuracy
13.3 Recovering from Errors
13.3.2 Recovering from Rejects and Timeouts
13.4 Conclusion
Sample Application: Detailed Design
14.1.1 The Login Subdialog
14.1.2 The Quotes Subdialog
14.1.3 The Trading Subdialog
14.2 Prompt Design
14.3 User Testing
14.4 Conclusion
Development, Testing, and Tuning Methodology
15.1.1 Application Development
15.1.3 Audio Production
15.2.2 Recognition Testing
15.2.3 Evaluative Usability Testing
15.3.1 Dialog Tuning
15.3.2 Recognition Tuning
15.4 Conclusion
Creating Grammars
16.1 Grammar Development
16.1.2 Developing Grammars for Statistical Language Models
16.1.3 Developing Robust Natural Language Grammars
16.1.4 Developing Statistical Natural Language Grammars
16.2.1 Testing Rule-Based Grammars
16.2.2 Testing Statistical Language Models
16.3 Grammar Tuning
16.3.2 Tuning Statistical Language Models
16.4 Conclusion
Working with Voice Actors
17.1 Scripting for Success
17.1.2 Scripting Tips
17.2 Choosing Your Voice Actor
17.2.2 Coachability
17.3 Running a Recording Session
17.3.2 Voice Coaching
17.4 Conclusion
Sample Application: Development, Testing, and Tuning
18.1.2 Grammar Development
18.1.3 Audio Production
18.2 Testing
18.2.1 Evaluative Usability Testing
18.3 Tuning
18.3.2 Recognition Tuning
18.3.3 Grammar Tuning
18.3.4 User Survey
Conclusion
Appendix
Voice User Interface Design, by Michael Harris Cohen, James P. Giangola, and Jennifer Balogh (2004)