Deep neural network (DNN) classifiers are powerful tools that drive a broad
spectrum of important applications, from image recognition to autonomous
vehicles. Unfortunately, virtually all state-of-the-art DNNs are known to be
vulnerable to adversarial attacks. These attacks make small, imperceptible
modifications to inputs that are sufficient to cause the DNN to produce the
wrong classification.
In this paper, we propose a novel, lightweight mechanism for correcting and/or
detecting adversarial inputs to image classifiers that relies on undervolting
(running a chip at a voltage slightly below its safe margin). Controlled
undervolting of the chip that runs the inference process introduces a limited
number of compute errors. We show that these errors disrupt
the adversarial input in a way that can be used either to correct the
classification or detect the input as adversarial. We evaluate the proposed
solution on an FPGA design and through software simulation. Across 10 attacks
on two popular DNNs, we show average detection rates of 80% to 95%.
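
The detection idea can be illustrated with a minimal software sketch. It is not the paper's implementation: the linear classifier, the fault model (a random fraction of multiply results dropped to zero), and the names `predict`, `flag_adversarial`, `error_rate`, and `trials` are all assumptions made for illustration. An input is flagged when error-injected inference disagrees with error-free inference.

```python
import numpy as np


def predict(weights, x, error_rate=0.0, rng=None):
    """Classify x with a linear model (hypothetical stand-in for a DNN).

    When error_rate > 0, a random fraction of the multiply results is
    corrupted to mimic undervolting-induced compute errors.
    """
    products = weights * x  # shape: (num_classes, num_features)
    if error_rate > 0.0:
        rng = rng or np.random.default_rng()
        faulty = rng.random(products.shape) < error_rate
        # Assumed fault model: a faulty multiply yields zero.
        products = np.where(faulty, 0.0, products)
    return int(np.argmax(products.sum(axis=1)))


def flag_adversarial(weights, x, error_rate=0.05, trials=20, seed=0):
    """Return (flagged, reference_label).

    The input is flagged if any error-injected run disagrees with the
    error-free reference prediction.
    """
    rng = np.random.default_rng(seed)
    reference = predict(weights, x)
    disagreements = sum(
        predict(weights, x, error_rate, rng) != reference
        for _ in range(trials)
    )
    return disagreements > 0, reference
```

In this sketch the reference run stands in for inference at nominal voltage; a hardware realization would choose the disagreement threshold and error rate empirically.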