The importance of equality in deepfake detection datasets
Issue Date
2023-01-26
Language
en
Abstract
Deepfake technology is developing rapidly, and deepfakes will likely become
indistinguishable from reality in the near future. The only way to distinguish real from
fake images will then be deepfake detection. However, the accuracy of detection models
is far from perfect, and the datasets used to train these models predominantly
contain White people. There is some evidence that these biased datasets also lead
to a gap in detection performance between deepfakes of different types of people. Most
biased datasets are created because authors do not take this problem seriously, which is
evident from the number of words they devote to the demographic distribution and the kind
of language they use in their accompanying papers. In addition, they often sample
videos from sources where the distribution is unknown or likely to be skewed towards
a certain ethnic group. I propose three categories into which existing datasets
fall and into which yet-to-be-created datasets can fall. A finer-grained way of
distinguishing between types of people than simply calling them White
or Black is needed to ensure that no ethnic group is biased against. In its absence, the
only procedure that ensures no bias is introduced is to sample manually from a source
in such a way that all types of people are represented equally. This means relying
on people's own judgments; however, two examples have shown that it is possible to
create unbiased deepfake detection datasets.
Faculty
Faculteit der Sociale Wetenschappen