Automatic Quality Assessment of Datasets for Machine Learning
| dc.contributor.advisor | Kwisthout, Johan | |
| dc.contributor.advisor | Vos, Maarten | |
| dc.contributor.author | Bruin, Laurens de | |
| dc.date.issued | 2025-04-15 | |
| dc.description.abstract | This project explores data quality and its impact on machine learning performance. We introduce an assessment framework to automatically quantify the data quality of a given dataset based on the data quality dimensions completeness, consistency, and accuracy. Datasets for the project were synthesized by systematically introducing quality issues of varying severity. Training a range of machine learning models on these datasets reveals the large impact of the completeness and accuracy dimensions on model performance, and a lower impact associated with the consistency of data. The framework effectively assesses data quality in dataset versions where a single dimension is degraded, but its performance can be improved on versions with multiple degraded dimensions. Future work includes refining quality scores for the dimensions and extending the framework to include more machine learning tasks and models. | |
| dc.identifier.uri | https://theses.ubn.ru.nl/handle/123456789/18915 | |
| dc.language.iso | en | |
| dc.thesis.faculty | Faculteit der Sociale Wetenschappen | |
| dc.thesis.specialisation | specialisations::Faculteit der Sociale Wetenschappen::Artificial Intelligence::Master Artificial Intelligence | |
| dc.thesis.studyprogramme | studyprogrammes::Faculteit der Sociale Wetenschappen::Artificial Intelligence | |
| dc.thesis.type | Master | |
| dc.title | Automatic Quality Assessment of Datasets for Machine Learning |
Files
Original bundle
- Name:
- Bruin, de, L. s-1002199-MSc-MKI94-Thesis-2025.pdf
- Size:
- 4.27 MB
- Format:
- Adobe Portable Document Format
