This list contains published, open-sourced white-box defenses to adversarial examples, along with open-sourced third-party analyses and security evaluations of those defenses.
Submit a new defense or analysis.
Defense | Venue | Dataset | Threat Model | Natural Accuracy | Claims | Analyses |
---|---|---|---|---|---|---|
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (Papernot et al.) (code) | S&P 2016 | MNIST | $$\ell_0 (\epsilon = 112)$$ | 99.51% accuracy | 0.45% adversary success rate in changing classifier's prediction | |
Deflecting Adversarial Attacks with Pixel Deflection (Prakash et al.) (code) | CVPR 2018 | ImageNet | $$\ell_2 (\epsilon = 0.05)$$ | 98.9% accuracy (on images originally classified correctly by underlying model) | 81% accuracy (on images originally classified correctly) | |
Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser (Liao et al.) (code) | CVPR 2018 | ImageNet | $$\ell_\infty (\epsilon = 4/255)$$ | 75% accuracy | 75% accuracy | |
Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 8/255)$$ | 87% accuracy | 46% accuracy | |
Provable defenses against adversarial examples via the convex outer adversarial polytope (Wong & Kolter) (code) | ICML 2018 | MNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 98.2% accuracy | 94.2% accuracy | |
Mitigating Adversarial Effects Through Randomization (Xie et al.) (code) | ICLR 2018 | ImageNet | $$\ell_\infty (\epsilon = 10/255)$$ | 99.2% accuracy (on images originally classified correctly by underlying model) | 86% accuracy (on images originally classified correctly) | |
Thermometer Encoding: One Hot Way To Resist Adversarial Examples (Buckman et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 8/255)$$ | 90% accuracy | 79% accuracy | |
Countering Adversarial Images using Input Transformations (Guo et al.) (code) | ICLR 2018 | ImageNet | $$\ell_2 (\epsilon = 0.06)$$ | 75% accuracy | 70% accuracy with average normalized $$\ell_2$$ perturbation of 0.06 | |
Stochastic Activation Pruning for Robust Adversarial Defense (Dhillon et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 4/255)$$ | 83% accuracy | 51% accuracy | |
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples (Song et al.) (code) | ICLR 2018 | CIFAR-10 | $$\ell_\infty (\epsilon = 8/255)$$ | 90% accuracy | 70% accuracy | |
Towards the first adversarially robust neural network model on MNIST (Schott et al.) (code) | NeurIPS SECML 2018 | MNIST | $$\ell_2 (\epsilon = 1.5)$$ | 99% accuracy | 80% accuracy | |
Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code) | AISTATS 2019 | MNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 98.81% accuracy | 96.42% accuracy (empirical), 96.37% accuracy (certified) | |
Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code) | AISTATS 2019 | FMNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 85.50% accuracy | 73.4% accuracy (empirical), 69.3% accuracy (certified) (on first 1000 test points) | |
Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code) | AISTATS 2019 | GTS | $$\ell_2 (\epsilon = 0.2)$$ | 84.65% accuracy | 67.9% accuracy (empirical), 66.8% accuracy (certified) (on first 1000 test points) | |
Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models (Sinha et al.) (code) | IJCAI 2019 | CIFAR-10 | $$\ell_\infty (\epsilon = 0.03)$$ | 87.8% accuracy | 53.82% accuracy | |
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks (Andriushchenko & Hein) (code) | NeurIPS 2019 | MNIST | $$\ell_\infty (\epsilon = 0.3)$$ | 97.32% accuracy | 87.54% accuracy (certified) | |
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks (Andriushchenko & Hein) (code) | NeurIPS 2019 | FMNIST | $$\ell_\infty (\epsilon = 0.1)$$ | 85.85% accuracy | 76.83% accuracy (certified) | |
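
For readers unfamiliar with the threat-model notation in the table, the sketch below (plain NumPy; it is not taken from any of the listed codebases, and the function names are illustrative only) shows what an $$\ell_\infty$$ budget such as $$\epsilon = 8/255$$ means: an adversarial input counts against a defense's claim only if every pixel stays within $$\epsilon$$ of the clean input.

```python
import numpy as np

def within_linf_ball(x_clean, x_adv, epsilon):
    """True iff ||x_adv - x_clean||_inf <= epsilon (pixel values in [0, 1])."""
    return float(np.max(np.abs(x_adv - x_clean))) <= epsilon

def project_linf(x_clean, x_adv, epsilon):
    """Clip a perturbed image back into the epsilon-ball around the clean image,
    then into the valid pixel range, as iterative attacks typically do."""
    return np.clip(np.clip(x_adv, x_clean - epsilon, x_clean + epsilon), 0.0, 1.0)

# Example with the CIFAR-10 budget used by several entries above:
#   x_adv_valid = project_linf(x_clean, x_adv, 8 / 255)
#   assert within_linf_ball(x_clean, x_adv_valid, 8 / 255)
```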