Voice User Interface Design

Addison-Wesley Professional, 2004 - Computers - 336 pages

This book is a comprehensive and authoritative guide to voice user interface (VUI) design. The VUI is perhaps the most critical factor in the success of any automated speech recognition (ASR) system, determining whether the user experience will be satisfying or frustrating, or even whether the customer will remain one. This book describes a practical methodology for creating an effective VUI design. The methodology is scientifically based on principles in linguistics, psychology, and language technology, and is illustrated here by examples drawn from the authors' work at Nuance Communications, the market leader in ASR development and deployment.

The book begins with an overview of VUI design issues and a description of the technology. The authors then introduce the major phases of their methodology. They first show how to specify requirements and make high-level design decisions during the definition phase. They next cover, in great detail, the design phase, with clear explanations and demonstrations of each design principle and its real-world applications. Finally, they examine problems unique to VUI design in system development, testing, and tuning. Key principles are illustrated with a running sample application.

A companion Web site provides audio clips for each example: www.VUIDesign.org

The cover photograph depicts the first ASR system, Radio Rex: a toy dog who sits in his house until the sound of his name calls him out. Produced in 1911, Rex was among the few commercial successes in the earlier days of speech recognition. Voice User Interface Design reveals the design principles and practices that produce commercial success in an era when effective ASRs are not toys but competitive necessities.

Contents

Introduction to Voice User Interfaces
1.1 What Is a Voice User Interface?
1.1.1 Auditory Interfaces
1.1.2 Spoken Language Interfaces
1.2 Why Speech?
1.3 Where Do We Go from Here?
Overview of Spoken Language Technology
2.1 Architecture of a Spoken Language System
2.1.2 Recognition
2.1.3 Other Speech Technologies
2.2 The Impact of Speech Technology on Design Decisions
2.2.1 Performance Challenges
2.2.2 Problem Solving
2.2.3 Definition Files
2.3 Conclusion
Overview of the Methodology
3.1.1 User Input
3.1.2 Integrated Business and User Needs
3.1.4 Conversational Design
3.1.5 Context
3.2 Steps of the Methodology
3.2.2 High-Level Design
3.2.5 Testing
3.3 Applying the Methodology to Real-World Applications
3.3.2 Dealing with Real-World Budget and Time Constraints
Requirements and High-Level Design Methodology
4.1.1 Understanding the Business
4.1.2 Understanding the Users
4.1.3 Understanding the Application
4.2 High-Level Design
4.2.2 Dialog Strategy and Grammar Type
4.2.5 Metaphor
4.2.7 Nonverbal Audio
4.3 Conclusion
High-Level Design Elements
5.2 Pervasive Dialog Elements
5.2.2 Universals
5.3 Conclusion
Creating Persona by Design
6.1 What Is Persona?
6.2 Where Does Persona Come From?
6.3 A Checklist for Persona Design
6.3.2 Brand and Image
6.3.4 Application
6.5 Conclusion
Sample Application: Requirements and High-Level Design
7.1 Lexington Brokerage
7.2.1 Understanding the Business Goals and Context
7.2.2 Understanding the Caller
7.2.3 Understanding the Application
7.3 High-Level Design
7.3.3 Pervasive Dialog Elements
7.3.4 Recurring Terminology
7.3.6 Persona
7.3.7 Nonverbal Audio
Detailed Design Methodology
8.1 Anatomy of a Dialog State
8.2 Call Flow Design
8.3 Prompt Design
8.3.1 Conversational Design
8.3.2 Auditory Design
8.4.1 Formal Usability Testing
8.4.2 Card Sorting
8.6 Conclusion
Minimizing Cognitive Load
9.1 Conceptual Complexity
9.1.1 Constancy
9.1.2 Consistency
9.1.3 Context Setting
9.2 Memory Load
9.2.1 Menu Size
9.3 Attention
9.4 Conclusion
Designing Prompts
10.1 Conversation as Discourse
10.2 Cohesion
10.2.1 Pronouns and Time Adverbs
10.2.2 Discourse Markers
10.3 Information Structure
10.4 Spoken Versus Written English
10.4.1 Pointer Words
10.4.2 Contraction
10.4.3 Must and May
10.4.4 Will Versus Going To
10.4.5 Romans Perspire, Anglo-Saxons Sweat
10.5 Register and Consistency
10.6 Jargon
10.7 The Cooperative Principle
10.8 Conclusion
Planning Prosody
11.1 What Is Prosody?
11.2 Functions of Prosody
11.3 Stress
11.4 Intonation
11.4.2 Contours in Context
11.5 Concatenating Phone Numbers
11.5.2 Concatenation Digit-by-Digit
11.5.3 Concatenation by Groups
11.6 Minimizing Concatenation Splices
11.7 Pauses
11.8 TTS Guidelines
11.8.1 Analyze Application Usage
11.8.4 Make Content Easy to Understand
11.8.5 Use Appropriate Formats
11.9 Conclusion
Maximizing Efficiency and Clarity
12.1 Efficiency
12.1.1 Don't Lose Work
12.1.4 Use Caller Modeling to Save Steps
12.2 Clarity
12.2.1 Mental Models for Natural Language Understanding
12.2.2 Navigational Clarity Through Landmarking
12.3 Balancing Efficiency and Clarity
12.3.2 Taper Prompts
12.4 Conclusion
Optimizing Accuracy and Recovering from Errors
13.1 Measuring Accuracy
13.2 Dialog Design Guidelines for Maximizing Accuracy
13.3 Recovering from Errors
13.3.2 Recovering from Rejects and Timeouts
13.4 Conclusion
Sample Application: Detailed Design
14.1.1 The Login Subdialog
14.1.2 The Quotes Subdialog
14.1.3 The Trading Subdialog
14.2 Prompt Design
14.3 User Testing
14.4 Conclusion
Development, Testing, and Tuning Methodology
15.1.1 Application Development
15.1.3 Audio Production
15.2.2 Recognition Testing
15.2.3 Evaluative Usability Testing
15.3.1 Dialog Tuning
15.3.2 Recognition Tuning
15.4 Conclusion
Creating Grammars
16.1 Grammar Development
16.1.2 Developing Grammars for Statistical Language Models
16.1.3 Developing Robust Natural Language Grammars
16.1.4 Developing Statistical Natural Language Grammars
16.2.1 Testing Rule-Based Grammars
16.2.2 Testing Statistical Language Models
16.3 Grammar Tuning
16.3.2 Tuning Statistical Language Models
16.4 Conclusion
Working with Voice Actors
17.1 Scripting for Success
17.1.2 Scripting Tips
17.2 Choosing Your Voice Actor
17.2.2 Coachability
17.3 Running a Recording Session
17.3.2 Voice Coaching
17.4 Conclusion
Sample Application: Development, Testing, and Tuning
18.1.2 Grammar Development
18.1.3 Audio Production
18.2 Testing
18.2.1 Evaluative Usability Testing
18.3 Tuning
18.3.2 Recognition Tuning
18.3.3 Grammar Tuning
18.3.4 User Survey
Conclusion
Appendix
Bibliography
Index