Impact of reverberation through deep neural networks on adversarial perturbations
Abstract
The vulnerability of Deep Neural Network (DNN) models
to maliciously crafted adversarial perturbations is a
critical concern given their ongoing large-scale deployment.
In this work, we explore a phenomenon that occurs when an image
is reinjected multiple times into a DNN, following a procedure
called reverberation, first proposed in cognitive psychology to
mitigate catastrophic forgetting, and we examine its impact
on adversarial perturbations. We describe reverberation in
vanilla autoencoders and propose a new reverberant architecture
combining a classifier and an autoencoder that allows
the joint observation of the logits and reconstructed
images. We experimentally measure the impact of reverberation
on adversarial perturbations in an adversarial example detection
scenario. The results show that clean and adversarial examples,
even with small perturbation levels, behave very differently
throughout reverberation.
While computationally efficient (reverberation
relies only on inference passes), our approach yields promising
results for adversarial example detection that are consistent
across datasets, adversarial attacks, and DNN architectures.
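To make the reverberation procedure concrete, the sketch below illustrates one plausible reading of it: an image is repeatedly passed through an autoencoder, and the classifier logits and reconstructions are recorded at each step. This is a minimal illustration assuming PyTorch modules, not the authors' implementation; the function name `reverberate` and all parameters are hypothetical.

```python
import torch

def reverberate(x, autoencoder, classifier, n_steps=10):
    """Hypothetical sketch of reverberation: repeatedly reinject an image
    through an autoencoder and record the classifier logits and the
    reconstructions at each step (inference only, no parameter updates)."""
    logits_trace, recon_trace = [], []
    current = x
    with torch.no_grad():
        for _ in range(n_steps):
            current = autoencoder(current)   # reconstruct the current image
            logits = classifier(current)     # classify the reconstruction
            recon_trace.append(current.clone())
            logits_trace.append(logits.clone())
    return logits_trace, recon_trace
```

Under this reading, the trajectories of logits and reconstructions over reverberation steps would be the signal used to separate clean from adversarial inputs.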