Language Modeling for Information Retrieval
Bruce Croft, John Lafferty
Springer Science & Business Media, May 31, 2003 - 246 pages

A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. Such a definition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined categories.

The first statistical language modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how well simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural text. The ability of language models to be quantitatively evaluated in this way is one of their important virtues.

Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy limit in terms of their predictive power. However, this has not kept them from being useful for a variety of text processing tasks, and it can moreover be viewed as encouragement that there is still great room for improvement in statistical language modeling.
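The quantitative evaluation the blurb describes can be sketched concretely. The toy Python snippet below (not from the book) estimates the cross-entropy, in bits per word, of a unigram model on held-out text; add-one smoothing is an assumed simplification so that unseen words get nonzero probability. Lower cross-entropy means better prediction, and equivalently better compression of the test text.

```python
import math
from collections import Counter

def unigram_cross_entropy(train_text, test_text):
    """Cross-entropy (bits/word) of an add-one-smoothed unigram model,
    trained on train_text and evaluated on test_text."""
    train = train_text.split()
    test = test_text.split()
    counts = Counter(train)
    vocab = set(train) | set(test)          # smoothing vocabulary
    total, V = len(train), len(vocab)
    # H(p_test, q) ~= -(1/N) * sum over test words of log2 q(w)
    return -sum(math.log2((counts[w] + 1) / (total + V)) for w in test) / len(test)

train = "the cat sat on the mat the dog sat on the rug"
test = "the cat sat on the rug"
print(round(unigram_cross_entropy(train, test), 3))  # -> 2.612
```

A better model (say, a bigram model with sensible smoothing) would typically achieve a lower value on the same held-out text, which is exactly the sense in which language models can be compared on the road toward the entropy limit.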