Transformer as a computational model of human language processing: An exploratory study comparing the modelling capacities of the Transformer and the Gated Recurrent Unit, using evidence from reading times.
The Transformer was introduced by Vaswani et al. (2017) as an artificial neural network whose operation relies solely on the attention mechanism. In the current study, the cognitive modelling abilities of the Transformer and the GRU (a type of gated recurrent neural network) were compared, and were expected to differ owing to the conceptual distinctions between the attention and (gated) recurrence mechanisms. Furthermore, the manner in which these networks process and handle information (i.e., the preceding words in a sequence) may carry implications for human sentence processing. Methodologically, modelling ability was indicated by the goodness of fit between surprisal estimates computed by the GRU and the Transformer and the self-paced reading times and gaze durations from human behavioural data sets. To compare the models' abilities to account for the human behavioural data, these goodness-of-fit estimates were then fitted with Generalized Additive Models as a function of language model accuracy. Our findings indicate that the Transformer outperformed the GRU on both processing measures (i.e., reading time and gaze duration) in terms of modelling capacity. We reason that the divergent ways in which the GRU and the Transformer use previous material to predict upcoming words could account for their differing performance. Moreover, because next-word prediction is a task in which hierarchical structure may be unimportant, the recurrence mechanism held no advantage over attention.

Keywords: Transformer; attention mechanism; recurrent neural networks; surprisal estimates; cognitive modelling; self-paced reading time; gaze duration.
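The surprisal measure that links model predictions to reading behaviour is simply the negative log probability a language model assigns to a word given its context. A minimal sketch of this computation is given below; the probability values are illustrative placeholders, not outputs of the GRU or Transformer models used in the study:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(word | preceding context)."""
    return -math.log2(prob)

# Illustrative conditional probabilities a language model might
# assign to each upcoming word given its preceding words.
word_probs = [0.25, 0.5, 0.0625]

surprisals = [surprisal(p) for p in word_probs]
# Less probable (more surprising) words receive higher surprisal,
# which is expected to correspond to longer reading times.
```

Goodness of fit between such per-word surprisal estimates and human reading times or gaze durations is what the study uses to quantify each model's cognitive modelling capacity.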
Faculteit der Letteren (Faculty of Arts)