Widely deployed deep neural network (DNN) models have been shown to be
vulnerable to adversarial perturbations in many applications (e.g., image,
audio, and text classification). To date, only a few adversarial attacks have
been proposed to mislead the DNN models in video recognition systems, and they
simply inject 2D perturbations into video frames. However, such attacks
may overly perturb the videos without learning the spatio-temporal features
(across temporal frames), which are commonly extracted by DNN models for video
recognition. To the best of our knowledge, we propose the first black-box attack
framework that generates universal 3-dimensional (U3D) perturbations to subvert
a variety of video recognition systems. U3D has several advantages: (1) as
a transfer-based attack, U3D can universally attack multiple DNN models for
video recognition without access to the target DNN model; (2) the high
transferability of U3D makes such a universal black-box attack easy to launch,
and the attack can be further enhanced by querying the target model when
necessary; (3) U3D remains imperceptible to humans; (4) U3D can bypass
existing state-of-the-art defense schemes; (5) U3D can be efficiently generated
with a few pre-learned parameters, and then immediately injected to attack
real-time DNN-based video recognition systems. We have conducted extensive
experiments to evaluate U3D on multiple DNN models and three large-scale video
datasets. The experimental results demonstrate its superiority and
