Approximating Black-Box Deep Neural Networks using Active Learning as a Proxy Measurement for Robustness
Issue Date
2021-03-23
Language
en
Abstract
Machine learning models are part of our everyday lives, whether in software that controls self-driving cars or in medical imaging techniques used in magnetic resonance imaging (MRI). In terms of visibility, these models fall into two main categories. The first is the "white-box" model, whose internal workings are accessible to the public. The second is the "black-box" model, which the public can access only through its limited interfaces. The general expectation is that these models behave reliably in all situations, regardless of the input they receive; in other words, they should behave robustly. However, this is often not the case for inputs that have been altered using adversarial crafting techniques. Furthermore, verifying the robustness of black-box models is especially hard because their internal workings are not accessible.
This research concentrates on approximating black-box machine learning models using five different active learning techniques. The approximating model need not share the architecture of the original model, but active learning is used to make its behavior similar enough that conclusions can be drawn about the robustness of the original black-box model.
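The approximation idea can be illustrated with a minimal pool-based active learning loop. This is a sketch under loud assumptions: `black_box_predict` is a hypothetical stand-in that exposes only hard labels, a small softmax-regression substitute replaces VGG16, and the uncertainty strategy is implemented as least-confidence selection, which may differ from the exact uncertainty measure used in this research. All function names are illustrative.

```python
import numpy as np


def black_box_predict(x: np.ndarray) -> np.ndarray:
    """Hypothetical black box: returns hard labels only, no scores or gradients."""
    # Fixed 3-class linear rule over 2-D inputs, purely for illustration.
    w = np.array([[1.0, 0.0], [-0.5, 0.9], [-0.5, -0.9]])
    return np.argmax(x @ w.T, axis=1)


def softmax(logits: np.ndarray) -> np.ndarray:
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def train_softmax(x, y, n_classes, epochs=300, lr=0.5):
    """Substitute model: softmax regression fitted by full-batch gradient descent."""
    w = np.zeros((n_classes, x.shape[1]))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = (softmax(x @ w.T + b) - onehot) / len(x)  # d(mean CE)/d(logits)
        w -= lr * grad.T @ x
        b -= lr * grad.sum(axis=0)
    return w, b


def predict_proba(model, x):
    w, b = model
    return softmax(x @ w.T + b)


def least_confidence(model, candidates, k):
    """Uncertainty sampling: indices of the k candidates with the lowest top-class probability."""
    return np.argsort(predict_proba(model, candidates).max(axis=1))[:k]


def approximate(pool, n_classes=3, seed_size=10, batch=10, rounds=5, rng=None):
    """Grow a labeled set by querying the black box, retraining the substitute each round."""
    rng = np.random.default_rng(0) if rng is None else rng
    labeled = [int(i) for i in rng.choice(len(pool), size=seed_size, replace=False)]
    for _ in range(rounds):
        x = pool[labeled]
        model = train_softmax(x, black_box_predict(x), n_classes)  # query labels only
        unlabeled = np.setdiff1d(np.arange(len(pool)), labeled)
        picked = unlabeled[least_confidence(model, pool[unlabeled], batch)]
        labeled.extend(int(i) for i in picked)
    x = pool[labeled]
    return train_softmax(x, black_box_predict(x), n_classes), labeled
```

For example, `approximate(np.random.default_rng(1).normal(size=(500, 2)))` returns the fitted substitute and the 60 queried pool indices. Only labels are taken from the black box, never probabilities or gradients, which matches the restricted-interface setting described above; swapping `least_confidence` for random selection gives the random baseline.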
The experiments include five deep neural networks of varying complexity, trained on three datasets: MNIST, CIFAR-10, and GTSRB.
The VGG16 architecture serves as the main architecture for approximating the other network architectures, using five active learning strategies: random, uncertainty, K-center, DFAL, and a combination of DFAL and K-center. The approximations are evaluated with two metrics. The first is the overall agreement on a hold-out test set; the second is a transferability score that evaluates whether adversarial examples crafted with the FGSM attack also apply to the original black-box model.
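The two evaluation metrics can be sketched as follows, assuming a softmax-linear substitute so that the FGSM input gradient has a closed form. The function names are hypothetical, and the transferability score here uses one common definition, the fraction of inputs whose black-box label flips under FGSM examples crafted on the substitute; this research may define it differently, for instance by counting only inputs the black box originally classifies correctly.

```python
import numpy as np


def agreement(substitute_labels: np.ndarray, black_box_labels: np.ndarray) -> float:
    """Share of hold-out inputs on which the substitute and the black box agree."""
    return float(np.mean(substitute_labels == black_box_labels))


def fgsm_linear_softmax(x, y, w, b, eps):
    """FGSM against a substitute with logits z = x @ w.T + b.

    x_adv = x + eps * sign(grad_x CE), where grad_x CE = (softmax(z) - onehot(y)) @ w.
    """
    z = x @ w.T + b
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    grad_x = (p - np.eye(w.shape[0])[y]) @ w
    return x + eps * np.sign(grad_x)


def transferability(black_box, x, w, b, eps):
    """Share of inputs whose black-box label changes under FGSM examples crafted on the substitute."""
    y = black_box(x)  # attack the labels the black box itself assigns
    x_adv = fgsm_linear_softmax(x, y, w, b, eps)
    return float(np.mean(black_box(x_adv) != y))
```

The black box is only ever queried for labels, both on the clean inputs and on the adversarial ones; all gradient information comes from the substitute, which is what makes the score a proxy for the black box's robustness.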
This research shows that black-box deep neural networks can be approximated efficiently using a combination of the DFAL and K-center strategies, but only if the original black-box model has been trained to a high accuracy.
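The abstract does not say how DFAL and K-center are combined. One plausible scheme, shown here purely as a hypothetical sketch, first shortlists the unlabeled points with the smallest DeepFool distance to the substitute's decision boundary (the DFAL criterion) and then applies greedy K-center selection to that shortlist so the chosen points are also spread out. For brevity the substitute is an affine classifier, for which the DeepFool distance is exact, and K-center runs on raw inputs rather than learned embeddings; all function names and parameters are illustrative.

```python
import numpy as np


def deepfool_distance_linear(x, w, b):
    """Minimal L2 distance to the decision boundary of an affine classifier z = x @ w.T + b.

    For predicted class y this equals min_{k != y} (z_y - z_k) / ||w_y - w_k||,
    which is what DeepFool computes exactly when the model is linear.
    """
    z = x @ w.T + b
    y = z.argmax(axis=1)
    rows = np.arange(len(x))
    gaps = z[rows, y][:, None] - z  # z_y - z_k, shape (n, K)
    norms = np.linalg.norm(w[y][:, None, :] - w[None, :, :], axis=2)
    norms[rows, y] = 1.0  # placeholder to avoid 0/0; masked out below
    ratio = gaps / norms
    ratio[rows, y] = np.inf  # k == y is not a competing class
    return ratio.min(axis=1)


def k_center_greedy(features, center_idx, k):
    """Greedy K-center: repeatedly pick the point farthest from its nearest center.

    center_idx must be non-empty; those points act as the initial centers.
    """
    centers = features[center_idx]
    dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    chosen = []
    for _ in range(k):
        i = int(np.argmax(dist))
        chosen.append(i)
        dist = np.minimum(dist, np.linalg.norm(features - features[i], axis=1))
    return chosen


def dfal_k_center(pool, labeled_idx, w, b, k, n_candidates):
    """Hypothetical combination: DFAL shortlist, then K-center diversity within the shortlist."""
    unlabeled = np.setdiff1d(np.arange(len(pool)), labeled_idx)
    boundary_dist = deepfool_distance_linear(pool[unlabeled], w, b)
    shortlist = unlabeled[np.argsort(boundary_dist)[:n_candidates]]
    # Labeled points come first in `subset` so they serve as the initial K-center centers.
    subset = np.concatenate([np.asarray(labeled_idx, dtype=int), shortlist])
    local = k_center_greedy(pool[subset], list(range(len(labeled_idx))), k)
    return [int(subset[i]) for i in local]
```

The shortlist size `n_candidates` trades off the two criteria: a small shortlist behaves like pure DFAL, while a shortlist covering the whole unlabeled pool reduces to pure K-center.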
Faculty
Faculteit der Sociale Wetenschappen (Faculty of Social Sciences)