Adversarial Attacks on the Interpretation of Neuron Activation Maximization
Authors: Geraldin Nanfack, Alexander Fulleringer, Jonathan Marty, Michael Eickenberg, Eugene Belilovsky
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide evidence of the success of this manipulation on several pre-trained models for the classification task with ImageNet. ... Experiments and Results We now describe the experimental setup and the results obtained after running attacks. For all of our attacks, we use the ImageNet (Deng et al. 2009) training set as D. We use the PyTorch (Paszke et al. 2019) pretrained AlexNet (Krizhevsky, Sutskever, and Hinton 2012) for our analysis. In Appx. G and H we provide an ablation study on EfficientNet (Tan and Le 2019), ResNet-50 (He et al. 2016), and ViT-B/32 (Dosovitskiy et al. 2020) with similar findings. (A minimal setup sketch is given after the table.) |
| Researcher Affiliation | Academia | Geraldin Nanfack (1,2)*, Alexander Fulleringer (1,2)*, Jonathan Marty (3), Michael Eickenberg (4), Eugene Belilovsky (1,2); (1) Concordia University, (2) Mila Quebec AI Institute, (3) Princeton University, (4) Flatiron Institute |
| Pseudocode | No | The paper describes its attack framework and loss functions using mathematical equations (e.g., Eq. 1, 2, 3, 4, 5) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For all of our attacks, we use the ImageNet (Deng et al. 2009) training set as D. ... For the fairwashing attack, we use the ImageNet People Subtree dataset (Yang et al. 2020), which is a set of 14k images with labeled demography (gender, race, and age), derived from ImageNet-21k. |
| Dataset Splits | Yes | For all of our attacks, we use the ImageNet (Deng et al. 2009) training set as D. ... The final validation performance was 56.2%, a drop of less than half a percent. ... Table 2: Accuracy/fairness measures (DDI/DEO) computed respectively on the ImageNet val. set and on the annotated testing set. ... For the fairwashing attack, we use the ImageNet People Subtree dataset (Yang et al. 2020), which is a set of 14k images with labeled demography (gender, race, and age), derived from ImageNet-21k. We use the 75/25% split for training and testing sets, and D^0_attack and D^1_attack are binary groups (w.r.t. protected attribute) from the training set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch (Paszke et al. 2019) and other tools like CLIP (Radford et al. 2021) and MILAN (Hernandez et al. 2022), but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | More technical details regarding hyperparameters for all the attacks can be found in Appx. B. |
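
The experimental setup quoted above (a PyTorch pretrained AlexNet analyzed on ImageNet) can be outlined with a minimal sketch. This is not code from the paper: the torchvision weight enum, the preprocessing transforms, and the choice of hooked layer and unit are assumptions for illustration only.

```python
# Minimal sketch (assumed, not from the paper): load the PyTorch pretrained
# AlexNet referenced in the evidence and hook a unit whose activation
# maximization could be inspected. Requires a recent torchvision (>= 0.13).
import torch
from torchvision import models, transforms

# Pretrained AlexNet (Krizhevsky, Sutskever, and Hinton 2012) via torchvision.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

# Standard ImageNet preprocessing (assumed; the paper does not list exact transforms).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Capture activations of a chosen layer with a forward hook.
activations = {}
def save_activation(name):
    def hook(_module, _inp, out):
        activations[name] = out.detach()
    return hook

# Hypothetical choice: the last convolutional layer of AlexNet's feature extractor.
model.features[10].register_forward_hook(save_activation("conv5"))

# Dummy forward pass to confirm the hook fires (the paper instead uses ImageNet images as D).
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    model(x)
print(activations["conv5"].shape)  # expected: torch.Size([1, 256, 13, 13])
```

The remaining details needed for reproduction, such as the attack hyperparameters referenced in Appx. B and the exact preprocessing, are not specified in the main text and would have to be confirmed against the paper's appendices.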