Evaluating hallucinations and repair in open-domain dialogue systems.
This study investigates the repair strategies employed by large language models during conversations with human interactors and the influence these strategies have on the human interactor’s perception. A corpus of 1123 conversations was collected and analysed, and a survey of 14 respondents was conducted. The results indicate that the chatbot is limited in its ability to resolve conversational errors and that hallucinations had no adverse influence on user experience. This research has implications for the development of open-domain dialogue systems and conversational agents: it offers evaluation metrics that can be used to form a realistic understanding of the capabilities of this technology.
Faculteit der Letteren