Towards a Better Understanding of Language Model Information Retrieval
Issue Date
2008-08-20
Language
en
Abstract
Language models form a class of successful probabilistic models in information
retrieval. However, knowledge of why some methods perform better than others in
a particular situation remains limited. In this study, we analyze which language model
factors influence information retrieval performance. Starting from popular smoothing
methods, we review which data features have been used. Document length and a
measure of document word distribution turned out to be the important factors, in
addition to the distinction between estimating the probabilities of seen and unseen words. We
propose a class of parameter-free smoothing methods, of which multiple specific
instances are possible. Instead of parameter tuning, however, an analysis of data
features should be used to choose a specific method. Finally, we discuss some
initial experiments.
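
The abstract does not name the smoothing methods it reviews. As an illustration only, the sketch below assumes two widely used ones, Jelinek-Mercer and Dirichlet prior smoothing, and shows where document length and the seen/unseen distinction enter the estimate. The function names, the toy collection, and the parameter values (lam, mu) are hypothetical and are not taken from the study.

```python
from collections import Counter

def jelinek_mercer(word, doc_tokens, coll_counts, coll_len, lam=0.5):
    """Linear interpolation: (1 - lam) * tf/|d| + lam * P(w|C)."""
    tf = Counter(doc_tokens)[word]            # 0 for unseen words
    p_ml = tf / len(doc_tokens) if doc_tokens else 0.0
    p_coll = coll_counts[word] / coll_len     # collection model P(w|C)
    return (1 - lam) * p_ml + lam * p_coll

def dirichlet(word, doc_tokens, coll_counts, coll_len, mu=2000.0):
    """Dirichlet prior: (tf + mu * P(w|C)) / (|d| + mu).

    The document length |d| controls how much weight the collection
    model receives: short documents are smoothed more heavily.
    """
    tf = Counter(doc_tokens)[word]
    p_coll = coll_counts[word] / coll_len
    return (tf + mu * p_coll) / (len(doc_tokens) + mu)

# Toy collection (hypothetical data, for illustration only).
docs = [
    ["language", "model", "retrieval"],
    ["retrieval", "smoothing", "smoothing", "model"],
]
coll_counts = Counter(token for doc in docs for token in doc)
coll_len = sum(coll_counts.values())

# "smoothing" is unseen in docs[0] but still gets non-zero probability.
p_unseen_jm = jelinek_mercer("smoothing", docs[0], coll_counts, coll_len)
p_unseen_dir = dirichlet("smoothing", docs[0], coll_counts, coll_len, mu=7.0)
# "model" is seen in docs[1]; its count is combined with the prior.
p_seen_dir = dirichlet("model", docs[1], coll_counts, coll_len, mu=7.0)
```

Both estimators depend on a tuning parameter (lam or mu), which is the kind of parameter the proposed parameter-free methods are meant to avoid.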
Faculty
Faculteit der Sociale Wetenschappen