Towards a Better Understanding of Language Model Information Retrieval

Heijden, M. van der

Files

heijdenvdmma-thesis.pdf (119.07 KB)

Authors

Heijden, M. van der

Issue Date

2008-08-20

Language

en

URI

http://theses.ubn.ru.nl/handle/123456789/170

Abstract

Language models form a class of successful probabilistic models in information retrieval. However, knowledge of why some methods perform better than others in a particular situation remains limited. In this study we analyze which language model factors influence information retrieval performance. Starting from popular smoothing methods, we review which data features they use. Document length and a measure of the document's word distribution turn out to be the important factors, together with the distinction between estimating the probability of seen and of unseen words. We propose a class of parameter-free smoothing methods that admits multiple specific instances. Instead of parameter tuning, however, an analysis of data features should be used to choose a specific instance. Finally, we discuss some initial experiments.

Supervisor

Sprinkhuizen-Kuyper, I.G.

Weide, Th.P. van der

Faculty

Faculteit der Sociale Wetenschappen

Programme

Artificial Intelligence

Specialisation

Master Artificial Intelligence

Collections

Faculteit der Sociale Wetenschappen
