Fooling Neural Network Interpretations via Adversarial Model Manipulation
Authors: Juyeon Heo, Sunghwan Joo, Taesup Moon
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results are validated both by visually showing the fooled explanations and by reporting quantitative metrics that measure deviations from the original explanations. |
| Researcher Affiliation | Academia | Juyeon Heo(1), Sunghwan Joo(1), and Taesup Moon(1,2); (1) Department of Electrical and Computer Engineering, (2) Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Korea, 16419; heojuyeon12@gmail.com, {shjoo840, tsmoon}@skku.edu |
| Pseudocode | No | The paper describes methods and formulas but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/rmrisforbidden/Fooling_Neural_Network-Interpretations. |
| Open Datasets | Yes | For all our fooling methods, we used the ImageNet training set [30] as our D and took three pretrained models, VGG19 [31], ResNet50 [32], and DenseNet121 [33], for carrying out the foolings. |
| Dataset Splits | Yes | We show the fooled explanation generalizes to the entire validation set, indicating that the interpretations are truly fooled, not just for some specific inputs, in contrast to [11, 13, 14]. [...] The accuracy drops are around only 2%/1% for Top-1/Top-5 accuracy, respectively. Table 3: Accuracy of the pre-trained models and the manipulated models on the entire ImageNet validation set. [...] Figure 4(a) shows the average AOPC curves on 10K validation images for the original and manipulated DenseNet121 (Top-k fooled with Grad-CAM) models |
| Hardware Specification | No | The paper does not specify any hardware used for the experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | For all our fooling methods, we used the ImageNet training set [30] as our D and took three pretrained models, VGG19 [31], ResNet50 [32], and DenseNet121 [33], for carrying out the foolings. For the Active fooling, we additionally constructed Dfool with images that contain two classes, {c1 = African Elephant, c2 = Firetruck}, by constructing each image by concatenating two images from each class in the 2×2 block. [...] We empirically defined Rf as [0, 0.2], [0, 0.3], [0.1, 1], and [0.5, 2] for Location, Top-k, Center-mass, and Active fooling, respectively. |
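The Active-fooling setup above builds each D_fool image by tiling two images from the two chosen classes into a 2×2 block. A minimal sketch of that construction is below; the paper only states that the two classes are concatenated in a 2×2 block, so the specific diagonal placement (a, b on top; b, a on the bottom) and the `make_two_class_composite` helper name are assumptions for illustration.

```python
import numpy as np

def make_two_class_composite(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Tile two equally sized H x W x C images into a 2x2 block so the
    composite contains both classes (e.g. African Elephant and Firetruck),
    as in the Active-fooling dataset construction."""
    assert img_a.shape == img_b.shape, "both images must share one shape"
    top = np.concatenate([img_a, img_b], axis=1)     # [ a | b ]
    bottom = np.concatenate([img_b, img_a], axis=1)  # [ b | a ]
    return np.concatenate([top, bottom], axis=0)     # stack rows vertically

# Example: two dummy 4x4 RGB images yield an 8x8 RGB composite.
a = np.zeros((4, 4, 3), dtype=np.uint8)
b = np.full((4, 4, 3), 255, dtype=np.uint8)
composite = make_two_class_composite(a, b)
```

In practice the composites would be resized back to the network's input resolution (e.g. 224×224) before fine-tuning, though the quoted setup does not specify that step.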