This list contains open-source implementations of proposed white-box defenses against adversarial examples, along with open-source third-party analyses / security evaluations of those defenses.

Submit a new defense or analysis.

| Defense | Dataset | Threat Model | Natural Accuracy | Claims | Analyses |
| --- | --- | --- | --- | --- | --- |
| Bandlimiting Neural Networks Against Adversarial Attacks (Lin et al.) (code) | ImageNet | $\ell_\infty$ ($\epsilon = 8/255$) | 77.32% | 76.06% accuracy | |
| Bandlimiting Neural Networks Against Adversarial Attacks (Lin et al.) (code) | CIFAR-10 | $\ell_\infty$ ($\epsilon = 8/255$) | 92.55% | 88.41% accuracy | |
| Adversarial Logit Pairing (Kannan et al.) (code) | ImageNet | $\ell_\infty$ ($\epsilon = 16/255$) | 72% | 27.9% accuracy | |
| Combatting and detecting FGSM and PGD adversarial noise (James Gannon) (code) | MNIST | $\ell_\infty$ ($\epsilon = 0.1$) | 98.2% | 78.2% accuracy | |
| Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks (Mustafa et al.) (code) | CIFAR-10 | $\ell_\infty$ ($\epsilon = 8/255$) | 90.62% | 32.32% accuracy | |
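
The Threat Model column specifies the perturbation budget an attacker is allowed: under an $\ell_\infty$ threat model, an adversarial input $x'$ must satisfy $\|x' - x\|_\infty \le \epsilon$ for the stated $\epsilon$. As a rough illustration of how the accuracy columns are typically measured, the sketch below estimates robust accuracy with a PGD attack inside that $\ell_\infty$ ball. This is a minimal sketch, not the evaluation code of any listed defense or analysis; `model` (assumed to return logits and already in eval mode), `loader`, and the attack hyperparameters `alpha` and `steps` are hypothetical placeholders.

```python
# Minimal sketch (not the code of any listed analysis): estimate clean and
# PGD-robust accuracy under an l_inf threat model with radius eps.
import torch
import torch.nn.functional as F


def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20):
    """PGD attack constrained to an l_inf ball of radius eps around x."""
    # Random start inside the epsilon ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the ball and into [0, 1].
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv


def accuracy(model, loader, eps=None, device="cpu"):
    """Clean accuracy if eps is None, otherwise PGD robust accuracy at radius eps."""
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if eps is not None:
            x = pgd_linf(model, x, y, eps=eps)
        with torch.no_grad():
            correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```

In this sketch, `accuracy(model, test_loader)` would correspond to the Natural Accuracy column and `accuracy(model, test_loader, eps=8/255)` to a baseline measurement of a claim under the $\ell_\infty$ ($\epsilon = 8/255$) threat model; published analyses typically use stronger or defense-adaptive attacks than this baseline.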