Predicting Textual Complexity for Elementary School Students
Issue Date
2019-06-11
Language
en
Abstract
In this thesis, the viability of predicting textual complexity from short texts written by
primary school students was investigated. Linguistic features were extracted from
texts in the BasiScript corpus using T-Scan and analyzed using Multiple Linear
Regression and Principal Component Analysis. Although the regression results for
individual features cannot be interpreted reliably because of collinearity, a strong
effect size was found both for the full set of features (R² = .68) and for a subset of
50 features (R² = .55). That is, approximately 68 percent of the variability in textual
complexity can be predicted from the full feature set, and approximately 55 percent
from the subset. Multiple Linear Regression on only five selected Principal
Components showed a moderate effect size (R² = .43). Additionally, the
highest-contributing features of the first few Principal Components showed a clear
structure: features related to word complexity, concreteness, relational cohesion and
relational coherence contributed relatively strongly. These results suggest that text
complexity can be predicted to a certain extent. A follow-up study should investigate
how best to select features so that collinearity is removed while predictive power
is retained. The results of this study indicate that this is a feasible and logical
next step.
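
The comparison described above (regression on all features versus regression on a few Principal Components) can be illustrated with a minimal sketch. It is not the thesis's actual analysis: the data are synthetic stand-ins for T-Scan features and complexity scores, and the names `r_squared` and `pca_scores`, the sample sizes and the noise levels are all assumptions made for illustration.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def pca_scores(X, n_components):
    """Project standardized features onto the first n principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


# Synthetic, deliberately collinear data (hypothetical, not BasiScript):
# 40 observed features are noisy mixtures of 5 latent factors.
rng = np.random.default_rng(0)
n_texts, n_latent, n_features = 200, 5, 40
latent = rng.normal(size=(n_texts, n_latent))
loadings = rng.normal(size=(n_latent, n_features))
X = latent @ loadings + 0.3 * rng.normal(size=(n_texts, n_features))
y = latent @ rng.normal(size=n_latent) + rng.normal(size=n_texts)

r2_all = r_squared(X, y)                 # all features, collinear
r2_pcs = r_squared(pca_scores(X, 5), y)  # five uncorrelated components
print(f"R^2, all features: {r2_all:.2f}")
print(f"R^2, 5 components: {r2_pcs:.2f}")
```

Because the component scores are linear combinations of the original features, the R² of the component model can never exceed that of the full model on the same data. The components are mutually uncorrelated, however, so their individual coefficients can be interpreted, which is the trade-off the abstract describes.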
Faculty
Faculteit der Sociale Wetenschappen