Automatic Quality Assessment of Datasets for Machine Learning

dc.contributor.advisor: Kwisthout, Johan
dc.contributor.advisor: Vos, Maarten
dc.contributor.author: Bruin, Laurens de
dc.date.issued: 2025-04-15
dc.description.abstract: This project explores data quality and its impact on machine learning performance. We introduce an assessment framework to automatically quantify the data quality of a given dataset based on the data quality dimensions completeness, consistency, and accuracy. Datasets for the project were synthesized by systematically introducing quality issues of varying severity. Training a range of machine learning models on these datasets reveals the large impact of the completeness and accuracy dimensions on model performance, and a lower impact associated with the consistency of data. The framework effectively assesses data quality in dataset versions where a single dimension is degraded, but its performance can be improved on versions with multiple degraded dimensions. Future work includes refining quality scores for the dimensions and extending the framework to include more machine learning tasks and models.
dc.identifier.uri: https://theses.ubn.ru.nl/handle/123456789/18915
dc.language.iso: en
dc.thesis.faculty: Faculteit der Sociale Wetenschappen
dc.thesis.specialisation: specialisations::Faculteit der Sociale Wetenschappen::Artificial Intelligence::Master Artificial Intelligence
dc.thesis.studyprogramme: studyprogrammes::Faculteit der Sociale Wetenschappen::Artificial Intelligence
dc.thesis.type: Master
dc.title: Automatic Quality Assessment of Datasets for Machine Learning

Files

Original bundle
Name: Bruin, de, L. s-1002199-MSc-MKI94-Thesis-2025.pdf
Size: 4.27 MB
Format: Adobe Portable Document Format