Mapping and modelling word-final n-deletion in Dutch using Twitter data
The pronunciation of word-final -en constitutes a notable discrepancy between spoken and written Dutch for many speakers. Whether speakers pronounce the final -n depends on their region of origin, among a wide array of other factors. The rise of Twitter as a sociolinguistic data source provides a new opportunity to study the geographic distribution of word-final n-deletion and the non-geographic factors that influence its prevalence. The main research question guiding this study is: what can the examination of Twitter data tell us about how word-final n-deletion is distributed across the Netherlands and Flanders, and to what degree do internal and external linguistic factors influence its prevalence? The secondary research question is: how useful is Twitter data for mapping individual phonological features, especially in terms of the quality of the results? These questions were answered by automatically searching a large corpus of Dutch-language tweets and submitting the resulting data to logistic regression and random forest classifier models. We hypothesized that the resulting maps, and the results for the non-geographic factors, would mirror what is known about word-final n-deletion in spoken language. Instead, we found evidence that word-final n-deletion as used on Twitter constitutes a separate phenomenon. Consequently, Twitter data did not prove fruitful for the study of phonetic features per se. It does, however, open the door to further research on this newly discovered form of word-final n-deletion.
Faculteit der Letteren