Adversarial Attacks on the Interpretation of Neuron Activation Maximization
Authors: Geraldin Nanfack, Alexander Fulleringer, Jonathan Marty, Michael Eickenberg, Eugene Belilovsky
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide evidence of the success of this manipulation on several pre-trained models for the classification task with ImageNet. ... Experiments and Results We now describe the experimental setup and the results obtained after running attacks. For all of our attacks, we use the ImageNet (Deng et al. 2009) training set as D. We use the PyTorch (Paszke et al. 2019) pretrained AlexNet (Krizhevsky, Sutskever, and Hinton 2012) for our analysis. In Appx. G and H we provide an ablation study on EfficientNet (Tan and Le 2019), ResNet-50 (He et al. 2016), and ViT-B/32 (Dosovitskiy et al. 2020) with similar findings. (A minimal setup sketch is given after the table.) |
| Researcher Affiliation | Academia | Geraldin Nanfack (1,2)*, Alexander Fulleringer (1,2)*, Jonathan Marty (3), Michael Eickenberg (4), Eugene Belilovsky (1,2); (1) Concordia University, (2) Mila Quebec AI Institute, (3) Princeton University, (4) Flatiron Institute |
| Pseudocode | No | The paper describes its attack framework and loss functions using mathematical equations (e.g., Eq. 1, 2, 3, 4, 5) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For all of our attacks, we use the ImageNet (Deng et al. 2009) training set as D. ... For the fairwashing attack, we use the ImageNet People Subtree dataset (Yang et al. 2020), which is a set of 14k images with labeled demography (gender, race, and age), derived from ImageNet-21k. |
| Dataset Splits | Yes | For all of our attacks, we use the ImageNet (Deng et al. 2009) training set as D. ... The final validation performance was 56.2%, a drop of less than half a percent. ... Table 2: Accuracy/fairness measures (DDI/DEO) computed respectively on the ImageNet val. set and on the annotated testing set. ... For the fairwashing attack, we use the ImageNet People Subtree dataset (Yang et al. 2020), which is a set of 14k images with labeled demography (gender, race, and age), derived from ImageNet-21k. We use the 75/25% split for training and testing sets, and D^0_attack and D^1_attack are binary groups (w.r.t. protected attribute) from the training set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch (Paszke et al. 2019) and other tools like CLIP (Radford et al. 2021) and MILAN (Hernandez et al. 2022), but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | More technical details regarding hyperparameters for all the attacks can be found in Appx. B. |
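
The experimental setup quoted above (a PyTorch pretrained AlexNet analyzed on ImageNet) can be outlined with a minimal sketch. This is not code from the paper: the torchvision weight enum, the preprocessing transforms, and the choice of hooked layer and unit are assumptions for illustration only.

```python
# Minimal sketch (assumed, not from the paper): load the PyTorch pretrained
# AlexNet referenced in the evidence and hook a unit whose activation
# maximization could be inspected. Requires a recent torchvision (>= 0.13).
import torch
from torchvision import models, transforms

# Pretrained AlexNet (Krizhevsky, Sutskever, and Hinton 2012) via torchvision.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

# Standard ImageNet preprocessing (assumed; the paper does not list exact transforms).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Capture activations of a chosen layer with a forward hook.
activations = {}
def save_activation(name):
    def hook(_module, _inp, out):
        activations[name] = out.detach()
    return hook

# Hypothetical choice: the last convolutional layer of AlexNet's feature extractor.
model.features[10].register_forward_hook(save_activation("conv5"))

# Dummy forward pass to confirm the hook fires (the paper instead uses ImageNet images as D).
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    model(x)
print(activations["conv5"].shape)  # expected: torch.Size([1, 256, 13, 13])
```

The remaining details needed for reproduction, such as the attack hyperparameters referenced in Appx. B and the exact preprocessing, are not specified in the main text and would have to be confirmed against the paper's appendices.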