Accuracy of an imputation task on human cognitive data
Issue Date
2025-06-18
Language
en
Abstract
Missing data is a problem in virtually every research domain. Handling missingness
appropriately and recognizing the type of missing data helps to avoid bias, maximize
the use of available data, and maintain statistical power. Data imputation can
significantly support these goals. It is defined as estimation that incorporates the
uncertainty of making predictions and accounts for the variability that naturally occurs
within the variables being predicted. This work focuses on exploring the effects of imputation
on the classification task. Three imputation algorithms are considered: stochastic
regression imputation, Bayesian imputation, and imputation via the joint model. The
study consists of two phases: simulation and application. During the first stage, the
accuracy of the algorithms’ estimations was compared. On the RMSE scale of 0-250,
the stochastic regression imputation scored 4.74, the Bayesian imputation 73.31, and the
joint model 174. In the application phase, the classification quality of the imputed data
was compared against the classifier trained on a complete dataset. It was shown that
imputation significantly affects the result of the classification task. Classification
accuracy can either remain at the level of the classifier trained on complete
data or worsen. Moreover, the classifiers are not consistent with each other, resulting in
high McNemar score values. Additionally, it was discovered that the classifiers in this study
were heavily biased, favouring the larger group. The study is not free from
limitations; nonetheless, it was concluded that the final choice of classifier
depends on the individual goal, since it influences further analysis.
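
As an illustration of the quantities named in the abstract, the following is a minimal sketch of stochastic regression imputation, RMSE, and a McNemar statistic. It is not the code used in the study. The simulated data, the single-predictor linear model, the 30% missingness rate, and all function names are assumptions made for this example, and the continuity-corrected form of the McNemar statistic is one common variant that the abstract does not specify.

```python
import numpy as np


def stochastic_regression_impute(x, y, rng):
    """Fill NaNs in y from a linear regression on x plus random residual noise."""
    observed = ~np.isnan(y)
    # Fit y = a + b * x by least squares on the observed cases only.
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    residuals = y[observed] - design @ coef
    # Residual standard deviation (two estimated parameters).
    sigma = np.sqrt(residuals @ residuals / (observed.sum() - 2))
    imputed = y.copy()
    n_missing = int((~observed).sum())
    # Prediction plus a random draw, so imputed values keep natural variability.
    noise = rng.normal(0.0, sigma, n_missing)
    imputed[~observed] = coef[0] + coef[1] * x[~observed] + noise
    return imputed


def rmse(truth, estimate):
    """Root mean squared error between true and estimated values."""
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))


def mcnemar_statistic(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square for two classifiers' hit/miss vectors."""
    b = int(np.sum(correct_a & ~correct_b))  # A right, B wrong
    c = int(np.sum(~correct_a & correct_b))  # A wrong, B right
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


# Simulated example (hypothetical data, not the study's cognitive dataset).
rng = np.random.default_rng(0)
x = rng.normal(100, 15, 500)
y_true = 20 + 0.8 * x + rng.normal(0, 5, 500)

# Remove roughly 30% of y completely at random (an assumed mechanism).
mask = rng.random(500) < 0.3
y_missing = y_true.copy()
y_missing[mask] = np.nan

y_imputed = stochastic_regression_impute(x, y_missing, rng)
error = rmse(y_true[mask], y_imputed[mask])
print(f"RMSE on imputed entries: {error:.2f}")
```

Because the noise term is drawn at random, the RMSE of a stochastic imputation is expected to exceed that of a plain regression prediction. The trade-off is that the imputed variable retains a realistic variance.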
Faculty
Faculteit der Sociale Wetenschappen
