The adversarial attack literature contains a myriad of algorithms for
crafting perturbations which yield pathological behavior in neural networks. In
many cases, multiple algorithms target the same tasks and even enforce the same
constraints. In this work, we show that different attack algorithms produce
adversarial examples which are distinct not only in their effectiveness but
also in how they qualitatively affect their victims. We begin by demonstrating
that one can determine the attack algorithm that crafted an adversarial
example. Then, we leverage recent advances in parameter-space saliency maps to
show, both visually and quantitatively, that adversarial attack algorithms
differ in which parts of the network and image they target. Our findings
suggest that prospective adversarial attacks should be compared not only via
their success rates at fooling models but also via the deeper downstream
effects they have on their victims.
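
To make the parameter-space saliency idea concrete, the sketch below shows one simple way such a signal could be computed: the mean absolute gradient of the loss with respect to each parameter group of the network for a given input. This is an illustrative approximation, not the authors' exact formulation; the model, aggregation, and normalization choices are assumptions.

```python
import torch
import torch.nn.functional as F

def parameter_saliency(model, x, y):
    """Per-parameter-group saliency for a single (batched) input.

    Illustrative sketch only: the per-tensor mean |dL/dtheta| aggregation
    here is an assumption, not the paper's precise definition.
    """
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    saliency = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            # Average gradient magnitude over the whole parameter tensor.
            saliency[name] = param.grad.detach().abs().mean().item()
    return saliency
```

Comparing the saliency profile of a clean image with that of its adversarial counterpart, for each attack algorithm under study, gives one way to see which parts of the network a given attack disturbs most.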
