ML-LOO: Detecting Adversarial Examples with Feature Attribution

Authors: Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael Jordan

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "As demonstrated in extensive experiments, our method achieves superior performances in distinguishing adversarial examples from popular attack methods on a variety of real data sets compared to state-of-the-art detection methods."
Researcher Affiliation | Academia | 1 University of California, Davis; 2 University of California, Berkeley; 3 University of California, Los Angeles
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | "The code for ML-LOO is available at our Github page. We comment here that our proposed framework of adversarial detection via feature attribution is generic to popular feature attribution methods. As an example, we show the performance of Integrated Gradients (Sundararajan, Taly, and Yan 2017) for adversarial detection in the supplementary material at https://github.com/Jianbo-Lab/ML-LOO." (An illustrative leave-one-out attribution sketch follows the table.)
Open Datasets | Yes | "on three data sets: MNIST, CIFAR-10 and CIFAR-100, with the standard train/test split (Chollet and others 2015)." (A dataset-loading sketch follows the table.)
Dataset Splits | No | The paper mentions a "standard train/test split" but does not explicitly specify a validation split or its size for model training. It does describe the training data for the detection methods: "1,000 adversarial images with the corresponding 1,000 natural images were used for the training process of LID, Mahalanobis and our method." (A detector-training sketch follows the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments (e.g., specific GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions "Keras" but does not provide specific version numbers for it or any other software dependencies, making it difficult to reproduce the software environment.
Experiment Setup | Yes | "We set the confidence parameter c = 0 for C&W-LC and c = 50 for C&W-HC. For mixed-confidence C&W attack, we generate adversarial images from C&W attack with the confidence parameter in Equation (2) randomly selected from {1, 3, 5, ..., 29}... For mixed-confidence ℓ∞-PGD attack, we generated adversarial images from ℓ∞-PGD with different confidence levels by randomly selecting the constraint ε in Equation (3) from {1, 2, 3, 4, 5, 6, 7, 8}/255. The loss is minimized with Adam (Kingma and Ba 2014)." (The attack parameter grids are sketched after the table.)
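
The Open Source Code row notes that the detection framework is generic to feature attribution methods; the sketch below is a minimal reconstruction of the leave-one-out (LOO) attribution that gives ML-LOO its name, not the authors' released code. The `model_fn` signature, the zero baseline, and the interquartile-range statistic are illustrative assumptions.

```python
import numpy as np

def loo_attribution(model_fn, x, baseline=0.0):
    """Leave-one-out attribution: mask one feature at a time with `baseline`
    and record the drop in probability of the originally predicted class.
    `model_fn` maps a batch of inputs to class probabilities (illustrative
    signature, not the authors' API)."""
    probs = model_fn(x[None])[0]
    top_class = int(np.argmax(probs))
    original_score = probs[top_class]

    flat = x.reshape(-1)
    attributions = np.zeros(flat.size)
    for i in range(flat.size):
        perturbed = flat.copy()
        perturbed[i] = baseline                       # leave feature i out
        masked_probs = model_fn(perturbed.reshape(x.shape)[None])[0]
        attributions[i] = original_score - masked_probs[top_class]
    return attributions

def iqr_statistic(attributions):
    """Dispersion of the attribution map; the paper reports that adversarial
    examples tend to show larger attribution dispersion than natural ones."""
    q75, q25 = np.percentile(attributions, [75, 25])
    return q75 - q25
```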
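The Open Datasets row cites the standard Keras train/test splits (Chollet and others 2015). A loading sketch under that assumption follows; the use of `tensorflow.keras` is a guess, since the paper does not pin a Keras version.

```python
from tensorflow.keras.datasets import cifar10  # mnist and cifar100 load analogously

# Standard Keras split for CIFAR-10: 50,000 training / 10,000 test images.
# No separate validation split is specified in the paper.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0    # scale pixels to [0, 1]
x_test = x_test.astype("float32") / 255.0
```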
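The Dataset Splits row quotes a balanced detector training set of 1,000 adversarial and 1,000 corresponding natural images. The sketch below shows one way such a set could be assembled; the stand-in features and the logistic-regression detector are assumptions for illustration, not details taken from the quoted text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_detection_set(natural_feats, adversarial_feats, n_pairs=1000, seed=0):
    """Balanced detector training set: n_pairs adversarial examples with the
    corresponding n_pairs natural ones, as in the quoted setup. Inputs are
    per-example feature vectors (e.g., attribution dispersion statistics)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(natural_feats), size=n_pairs, replace=False)
    X = np.concatenate([natural_feats[idx], adversarial_feats[idx]])
    y = np.concatenate([np.zeros(n_pairs), np.ones(n_pairs)])  # 0 = natural, 1 = adversarial
    return X, y

# Illustrative usage with random stand-in features of dimension 10.
natural_feats = np.random.rand(5000, 10)
adversarial_feats = np.random.rand(5000, 10)
X, y = build_detection_set(natural_feats, adversarial_feats)
detector = LogisticRegression(max_iter=1000).fit(X, y)
```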
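The Experiment Setup quote fixes c = 0 (C&W-LC) and c = 50 (C&W-HC) and draws the mixed-confidence parameters from discrete grids. The snippet below only reproduces those parameter grids; the attacks themselves (C&W, ℓ∞-PGD) are not implemented here, and the uniform sampling is an assumption consistent with "randomly selected".

```python
import numpy as np

rng = np.random.default_rng(0)

CW_LOW_CONFIDENCE = 0    # C&W-LC
CW_HIGH_CONFIDENCE = 50  # C&W-HC

# Mixed-confidence C&W: confidence drawn from {1, 3, 5, ..., 29}.
cw_confidence_grid = np.arange(1, 30, 2)
sampled_confidence = rng.choice(cw_confidence_grid)

# Mixed-confidence l_inf-PGD: epsilon drawn from {1, 2, ..., 8} / 255.
pgd_epsilon_grid = np.arange(1, 9) / 255.0
sampled_epsilon = rng.choice(pgd_epsilon_grid)

print(sampled_confidence, sampled_epsilon)
```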