The Role of Talker-Specific Prosody in Predictive Speech Perception

Severijnen, Giulio

Files

s4294467_Severijnen,G_MSc_Thesis_2020.pdf (1.25 MB)

Authors

Severijnen, Giulio

Issue Date

2020-01-13

Language

en

URI

https://theses.ubn.ru.nl/handle/123456789/14669

Abstract

One challenge listeners face in speech perception is dealing with the large segmental and suprasegmental variability in the acoustic signal between talkers. Most studies have focused on how listeners deal with segmental variability. In this EEG experiment, we investigated how listeners learn about between-talker variability in suprasegmental cues in order to recognize spoken words. Participants learned non-word minimal stress pairs (e.g., USklot/usKLOT) and the objects to which the non-words referred (e.g., USklot referring to a “lamp”, usKLOT referring to a “train”). The non-words were produced by two talkers, each of whom used only one acoustic cue to signal lexical stress (e.g., Talker A used only F0 and Talker B used only amplitude). This allowed participants to learn the correct item-to-object mappings and, through perceptual learning, which cue each talker used. At test, participants heard semantically constraining sentences, spoken by either talker, with one of these non-words in sentence-final position. The sentence-final word was produced with either the talker's own cue (e.g., Talker A using F0; control condition) or the other talker's cue (e.g., Talker A using amplitude; cue-switch condition). If participants had learned the talker-specific cues, they would be able to predict upcoming talker-matching word-forms (e.g., USklot cued by F0 only). We hypothesized that sentences in the cue-switch condition would lead to longer RTs and elicit a larger N200 response than sentences in the control condition. Sentences in the cue-switch condition did indeed lead to longer RTs than those in the control condition, which suggests that they created a mismatch between the predicted and the perceived word-form based on the talker-specific cues. In contrast, the N200 amplitude was not modulated by the cue-switch manipulation. We conclude that these results illustrate talker-specific prediction of suprasegmental cues, acquired through perceptual learning during previous encounters.