Voice User Interface Design

Michael Harris Cohen, James P. Giangola, Jennifer Balogh

Addison-Wesley Professional, 2004 - Computers - 336 pages

This book is a comprehensive and authoritative guide to voice user interface (VUI) design. The VUI is perhaps the most critical factor in the success of any automated speech recognition (ASR) system, determining whether the user experience will be satisfying or frustrating, or even whether the customer will remain one. This book describes a practical methodology for creating an effective VUI design. The methodology is scientifically based on principles in linguistics, psychology, and language technology, and is illustrated here by examples drawn from the authors' work at Nuance Communications, the market leader in ASR development and deployment.

The book begins with an overview of VUI design issues and a description of the technology. The authors then introduce the major phases of their methodology. They first show how to specify requirements and make high-level design decisions during the definition phase. They next cover, in great detail, the design phase, with clear explanations and demonstrations of each design principle and its real-world applications. Finally, they examine problems unique to VUI design in system development, testing, and tuning. Key principles are illustrated with a running sample application.

A companion Web site provides audio clips for each example: www.VUIDesign.org

The cover photograph depicts the first ASR system, Radio Rex: a toy dog who sits in his house until the sound of his name calls him out. Produced in 1911, Rex was among the few commercial successes in the early days of speech recognition. Voice User Interface Design reveals the design principles and practices that produce commercial success in an era when effective ASRs are not toys but competitive necessities.

Contents

Introduction to Voice User Interfaces
1.1 What Is a Voice User Interface?
1.1.1 Auditory Interfaces
1.1.2 Spoken Language Interfaces
1.2 Why Speech?
1.3 Where Do We Go from Here?

Overview of Spoken Language Technology
2.1 Architecture of a Spoken Language System
2.1.2 Recognition
2.1.3 Other Speech Technologies
2.2 The Impact of Speech Technology on Design Decisions
2.2.1 Performance Challenges
2.2.2 Problem Solving
2.2.3 Definition Files
2.3 Conclusion

Overview of the Methodology
3.1.1 User Input
3.1.2 Integrated Business and User Needs
3.1.4 Conversational Design
3.1.5 Context
3.2 Steps of the Methodology
3.2.2 High-Level Design
3.2.5 Testing
3.3 Applying the Methodology to Real-World Applications
3.3.2 Dealing with Real-World Budget and Time Constraints

Requirements and High-Level Design Methodology
4.1.1 Understanding the Business
4.1.2 Understanding the Users
4.1.3 Understanding the Application
4.2 High-Level Design
4.2.2 Dialog Strategy and Grammar Type
4.2.5 Metaphor
4.2.7 Nonverbal Audio
4.3 Conclusion

High-Level Design Elements
5.2 Pervasive Dialog Elements
5.2.2 Universals
5.3 Conclusion

Creating Persona by Design
6.1 What Is Persona?
6.2 Where Does Persona Come From?
6.3 A Checklist for Persona Design
6.3.2 Brand and Image
6.3.4 Application
6.5 Conclusion

Sample Application: Requirements and High-Level Design
7.1 Lexington Brokerage
7.2.1 Understanding the Business Goals and Context
7.2.2 Understanding the Caller
7.2.3 Understanding the Application
7.3 High-Level Design
7.3.3 Pervasive Dialog Elements
7.3.4 Recurring Terminology
7.3.6 Persona
7.3.7 Nonverbal Audio

Detailed Design Methodology
8.1 Anatomy of a Dialog State
8.2 Call Flow Design
8.3 Prompt Design
8.3.1 Conversational Design
8.3.2 Auditory Design
8.4.1 Formal Usability Testing
8.4.2 Card Sorting
8.6 Conclusion

Minimizing Cognitive Load
9.1 Conceptual Complexity
9.1.1 Constancy
9.1.2 Consistency
9.1.3 Context Setting
9.2 Memory Load
9.2.1 Menu Size
9.3 Attention
9.4 Conclusion

Designing Prompts
10.1 Conversation as Discourse
10.2 Cohesion
10.2.1 Pronouns and Time Adverbs
10.2.2 Discourse Markers
10.3 Information Structure
10.4 Spoken Versus Written English
10.4.1 Pointer Words
10.4.2 Contraction
10.4.3 Must and May
10.4.4 Will Versus Going To
10.4.5 Romans Perspire, Anglo-Saxons Sweat
10.5 Register and Consistency
10.6 Jargon
10.7 The Cooperative Principle
10.8 Conclusion

Planning Prosody
11.1 What Is Prosody?
11.2 Functions of Prosody
11.3 Stress
11.4 Intonation
11.4.2 Contours in Context
11.5 Concatenating Phone Numbers
11.5.2 Concatenation Digit-by-Digit
11.5.3 Concatenation by Groups
11.6 Minimizing Concatenation Splices
11.7 Pauses
11.8 TTS Guidelines
11.8.1 Analyze Application Usage
11.8.4 Make Content Easy to Understand
11.8.5 Use Appropriate Formats
11.9 Conclusion

Maximizing Efficiency and Clarity
12.1 Efficiency
12.1.1 Don't Lose Work
12.1.4 Use Caller Modeling to Save Steps
12.2 Clarity
12.2.1 Mental Models for Natural Language Understanding
12.2.2 Navigational Clarity Through Landmarking
12.3 Balancing Efficiency and Clarity
12.3.2 Taper Prompts
12.4 Conclusion

Optimizing Accuracy and Recovering from Errors
13.1 Measuring Accuracy
13.2 Dialog Design Guidelines for Maximizing Accuracy
13.3 Recovering from Errors
13.3.2 Recovering from Rejects and Timeouts
13.4 Conclusion

Sample Application: Detailed Design
14.1.1 The Login Subdialog
14.1.2 The Quotes Subdialog
14.1.3 The Trading Subdialog
14.2 Prompt Design
14.3 User Testing
14.4 Conclusion

Development, Testing, and Tuning Methodology
15.1.1 Application Development
15.1.3 Audio Production
15.2.2 Recognition Testing
15.2.3 Evaluative Usability Testing
15.3.1 Dialog Tuning
15.3.2 Recognition Tuning
15.4 Conclusion

Creating Grammars
16.1 Grammar Development
16.1.2 Developing Grammars for Statistical Language Models
16.1.3 Developing Robust Natural Language Grammars
16.1.4 Developing Statistical Natural Language Grammars
16.2.1 Testing Rule-Based Grammars
16.2.2 Testing Statistical Language Models
16.3 Grammar Tuning
16.3.2 Tuning Statistical Language Models
16.4 Conclusion

Working with Voice Actors
17.1 Scripting for Success
17.1.2 Scripting Tips
17.2 Choosing Your Voice Actor
17.2.2 Coachability
17.3 Running a Recording Session
17.3.2 Voice Coaching
17.4 Conclusion

Sample Application: Development, Testing, and Tuning
18.1.2 Grammar Development
18.1.3 Audio Production
18.2 Testing
18.2.1 Evaluative Usability Testing
18.3 Tuning
18.3.2 Recognition Tuning
18.3.3 Grammar Tuning
18.3.4 User Survey

Conclusion

Appendix

Bibliography

Index

Bibliographic information

Title	Voice User Interface Design O'Reilly Online Learning
Authors	Michael Harris Cohen, James P. Giangola, Jennifer Balogh
Edition	illustrated
Publisher	Addison-Wesley Professional, 2004
ISBN	0321185765, 9780321185761
Length	336 pages
Subjects	Computers › Software Development & Engineering › General; Computers / Computer Engineering; Computers / Human-Computer Interaction (HCI); Computers / Software Development & Engineering / General; Computers / Speech & Audio Processing; Computers / User Interfaces
