The importance of equality in deepfake detection datasets

Issue Date
2023-01-26
Language
en
Abstract
Deepfake technology is developing rapidly, and deepfakes are likely to become indistinguishable from reality in the near future. Deepfake detection will then be the only way to tell real images from fake ones. However, the accuracy of detection models is far from perfect, and the datasets used to train them predominantly contain White people. There is some evidence that these biased datasets also lead to a gap in detection performance between deepfakes of people from different demographic groups. Most biased datasets arise because their authors do not take the problem seriously, as shown by the number of words the accompanying papers devote to the demographic distribution and by the kind of words they use. In addition, authors often sample videos from sources whose demographic distribution is unknown or likely to be skewed towards a certain ethnic group. I propose three categories into which the datasets proposed so far, as well as yet-to-be-created datasets, can fall. To ensure that no ethnic group is disadvantaged, a finer-grained classification of people is needed than simply labelling them White or Black. In the absence of such a classification, the only way to avoid introducing bias is to sample manually from a source so that all groups are represented equally. This means relying on people’s own judgments; nevertheless, two examples show that it is possible to create unbiased deepfake detection datasets.
Faculty
Faculteit der Sociale Wetenschappen