Optimizing fairness in machine learning systems through random repair of a biased dataset.

Issue Date
2019-11-25
Language
en
Abstract
Machine learning systems are trained on datasets that may contain biases. These biases can lead to unwanted discrimination in decision making against groups defined by sensitive attributes such as race, gender, and sexual orientation [5][7]. A possible way to address this problem is a preprocessing method called random repair, introduced in [3] in 2018. In this thesis we provide insights into the effects that random repair has on fairness and classifier accuracy in machine learning systems. We applied random repair to the Adult income dataset and the COMPAS recidivism dataset, both of which are known to be biased [5][13]. For each dataset, we compared the fairness and accuracy of a logistic regression classifier and a random forest classifier before and after preprocessing with random repair. We measure fairness as demographic parity, quantified by the disparate impact index. Our results show that increasing fairness through random repair achieves the desired level of fairness for both classifiers on both datasets. However, increasing fairness through random repair also decreases the accuracy of both classifiers.
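To make the two ingredients of the study concrete, the sketch below gives a minimal one-dimensional illustration of random repair in the spirit of [3] and of the disparate impact index. It assumes a single numeric feature repaired via quantile averaging (the 1-D Wasserstein barycenter of the group distributions); the function and variable names are illustrative and not taken from the thesis.

import numpy as np

def total_repair_1d(x, group):
    """Map each group's values onto the 1-D Wasserstein barycenter.

    For 1-D distributions the barycenter is obtained by averaging the
    groups' quantile functions, weighted by group proportions.
    """
    x = np.asarray(x, dtype=float)
    group = np.asarray(group)
    repaired = x.copy()
    groups, counts = np.unique(group, return_counts=True)
    weights = counts / counts.sum()
    qs = np.linspace(0.0, 1.0, 101)  # shared quantile grid
    # Barycenter quantile function: weighted average of group quantiles.
    bary_q = sum(w * np.quantile(x[group == g], qs)
                 for g, w in zip(groups, weights))
    for g in groups:
        mask = group == g
        # Quantile level of each point within its own group.
        ranks = np.argsort(np.argsort(x[mask])) / max(mask.sum() - 1, 1)
        # Move the point to the barycenter value at the same level.
        repaired[mask] = np.interp(ranks, qs, bary_q)
    return repaired

def random_repair_1d(x, group, lam, rng=None):
    """Random repair: each point is repaired with probability lam.

    lam = 0 keeps the original data, lam = 1 gives total repair.
    """
    rng = np.random.default_rng(rng)
    repaired = total_repair_1d(x, group)
    use_repaired = rng.random(len(np.asarray(x))) < lam
    return np.where(use_repaired, repaired, np.asarray(x, dtype=float))

def disparate_impact(y_pred, group, protected, favorable=1):
    """Disparate impact: P(Y=favorable | protected) / P(Y=favorable | rest)."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    p_prot = (y_pred[group == protected] == favorable).mean()
    p_rest = (y_pred[group != protected] == favorable).mean()
    return p_prot / p_rest

A disparate impact index near 1 indicates demographic parity, while values below 0.8 are commonly read as evidence of disparate impact (the "four-fifths rule"); the repair parameter lam trades off how far each group's distribution is pushed toward the barycenter against how much of the original data is preserved.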
Faculty
Faculteit der Sociale Wetenschappen