Published Adversarial Example Defenses

This list contains published white-box defenses to adversarial examples that have been open-sourced, along with third-party analyses / security evaluations that have been open-sourced.

Submit a new defense or analysis.

Defense	Venue	Dataset	Threat Model	Natural Accuracy	Claims	Analyses
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (Papernot et al.) (code)	S&P 2016	MNIST	$$\ell_0 (\epsilon = 112)$$	99.51% accuracy	0.45% adversary success rate in changing classifier’s prediction	3.6% accuracy [CW16] (code)
Deflecting Adversarial Attacks with Pixel Deflection (Prakash et al.) (code)	CVPR 2018	ImageNet	$$\ell_2 (\epsilon = 0.05)$$	98.9% accuracy (on images originally classified correctly by underlying model)	81% accuracy (on images originally classified correctly)	0% accuracy [AC18] (code)
Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser (Liao et al.) (code)	CVPR 2018	ImageNet	$$\ell_\infty (\epsilon = 4/255)$$	75% accuracy	75% accuracy	0% accuracy [AC18] (code)
Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al.) (code)	ICLR 2018	CIFAR‑10	$$\ell_\infty (\epsilon = 8/255)$$	87% accuracy	46% accuracy
Provable defenses against adversarial examples via the convex outer adversarial polytope (Wong & Kolter) (code)	ICML 2018	MNIST	$$\ell_\infty (\epsilon = 0.1)$$	98.2% accuracy	94.2% accuracy
Mitigating Adversarial Effects Through Randomization (Xie et al.) (code)	ICLR 2018	ImageNet	$$\ell_\infty (\epsilon = 10/255)$$	99.2% accuracy (on images originally classified correctly by underlying model)	86% accuracy (on images originally classified correctly)	0% accuracy [ACW18] (code)
Thermometer Encoding: One Hot Way To Resist Adversarial Examples (Buckman et al.) (code)	ICLR 2018	CIFAR‑10	$$\ell_\infty (\epsilon = 8/255)$$	90% accuracy	79% accuracy	30% accuracy [ACW18] (code)
Countering Adversarial Images using Input Transformations (Guo et al.) (code)	ICLR 2018	ImageNet	$$\ell_2 (\epsilon = 0.06)$$	75% accuracy	70% accuracy on ImageNet with average normalized $\ell_2$ perturbation of 0.06	0% accuracy [ACW18] (code)
Stochastic Activation Pruning for Robust Adversarial Defense (Dhillon et al.) (code)	ICLR 2018	CIFAR‑10	$$\ell_\infty (\epsilon = 4/255)$$	83% accuracy	51% accuracy	0% accuracy [ACW18] (code)
PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples (Song et al.) (code)	ICLR 2018	CIFAR‑10	$$\ell_\infty (\epsilon = 8/255)$$	90% accuracy	70% accuracy	9% accuracy [ACW18] (code)
Towards the first adversarially robust neural network model on MNIST (Schott et al.) (code)	NeurIPS SECML 2018	MNIST	$$\ell_2 (\epsilon = 1.5)$$	99% accuracy	80% accuracy
Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code)	AISTATS 2019	MNIST	$$\ell_\infty (\epsilon = 0.1)$$	98.81% accuracy	96.42% accuracy (empirical), 96.37% accuracy (certified)
Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code)	AISTATS 2019	FMNIST	$$\ell_\infty (\epsilon = 0.1)$$	85.50% accuracy	(on first 1000 test points) 73.4% accuracy (empirical), 69.3% accuracy (certified)
Provable Robustness of ReLU networks via Maximization of Linear Regions (Croce et al.) (code)	AISTATS 2019	GTS	$$\ell_2 (\epsilon = 0.2)$$	84.65% accuracy	(on first 1000 test points) 67.9% accuracy (empirical), 66.8% accuracy (certified)
Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models (Sinha et al.) (code)	IJCAI 2019	CIFAR‑10	$$\ell_\infty (\epsilon = 0.03)$$	87.8% accuracy	53.82% accuracy
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks (Andriushchenko & Hein) (code)	NeurIPS 2019	MNIST	$$\ell_\infty (\epsilon = 0.3)$$	97.32% accuracy	87.54% accuracy (certified)
Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks (Andriushchenko & Hein) (code)	NeurIPS 2019	FMNIST	$$\ell_\infty (\epsilon = 0.1)$$	85.85% accuracy	76.83% accuracy (certified)