Automatically summarizing Dutch human-machine dialogues using transfer learning approaches
Interest in automatic dialogue summarization is increasing, as is the availability of raw dialogue data due to the use of online messenger services, chatbots, and other human-machine interaction systems. Summaries are a natural way of presenting the key information in a dialogue. However, the dialogue summarization corpora currently available are primarily in English. To address this, this project extends two Dutch human-machine dialogue data sets to form a Dutch dialogue summarization data set. This data set is used to fine-tune two transformer-based automatic dialogue summarizers. Although this technique is already used in abstractive summarization, applying it to dialogue summarization is a novel use of the technique. Automatic evaluation of the generated summaries shows that the transformer-based models generate more fluent summaries than an extractive baseline. However, further manual evaluation shows that, when summarizing the dialogues, the transformer-based models produce misleading sentences by introducing fabricated words and by combining words from the dialogue incorrectly. Future research should therefore focus on mitigating such errors by creating larger data sets and by finding an automatic evaluation metric that accurately captures these errors. Finally, note that this project focuses on the technical feasibility of an automatic dialogue summarization system.
Faculty of Social Sciences (Faculteit der Sociale Wetenschappen)