Fooling Neural Network Interpretations via Adversarial Model Manipulation

Authors: Juyeon Heo, Sunghwan Joo, Taesup Moon

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results are validated by both visually showing the fooled explanations and reporting quantitative metrics that measure the deviations from the original explanations.
Researcher Affiliation | Academia | Juyeon Heo (1), Sunghwan Joo (1), and Taesup Moon (1,2); (1) Department of Electrical and Computer Engineering, (2) Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Korea, 16419; heojuyeon12@gmail.com, {shjoo840, tsmoon}@skku.edu
Pseudocode | No | The paper describes methods and formulas but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/rmrisforbidden/Fooling_Neural_Network-Interpretations.
Open Datasets | Yes | For all our fooling methods, we used the ImageNet training set [30] as our D and took three pretrained models, VGG19 [31], ResNet50 [32], and DenseNet121 [33], for carrying out the foolings.
Dataset Splits | Yes | We show the fooled explanation generalizes to the entire validation set, indicating that the interpretations are truly fooled, not just for some specific inputs, in contrast to [11, 13, 14]. [...] The accuracy drops are around only 2%/1% for Top-1/Top-5 accuracy, respectively. Table 3: Accuracy of the pre-trained models and the manipulated models on the entire ImageNet validation set. [...] Figure 4(a) shows the average AOPC curves on 10K validation images for the original and manipulated DenseNet121 (Top-k fooled with Grad-CAM) models.
Hardware Specification | No | The paper does not specify any hardware used for the experiments (e.g., GPU models, CPU types).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | For all our fooling methods, we used the ImageNet training set [30] as our D and took three pretrained models, VGG19 [31], ResNet50 [32], and DenseNet121 [33], for carrying out the foolings. For the Active fooling, we additionally constructed Dfool with images that contain two classes, {c1 = African elephant, c2 = Firetruck}, by constructing each image by concatenating two images from each class in the 2x2 block. [...] We empirically defined Rf as [0, 0.2], [0, 0.3], [0.1, 1], and [0.5, 2] for Location, Top-k, Center-mass, and Active fooling, respectively.
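The Active-fooling setup quoted above builds each Dfool image by tiling crops of the two target classes into a 2x2 block. The following is a minimal NumPy sketch of one plausible arrangement; the function name, the (H, W, C) layout, and the exact placement of the two classes within the grid are assumptions, not the authors' code.

```python
import numpy as np

def make_two_class_grid(img_a, img_b):
    """Tile two same-sized (H, W, C) crops into a 2x2 composite image.

    Sketch of the Dfool construction: each synthetic input contains both
    classes (e.g. an African elephant crop and a firetruck crop). The
    alternating layout here is illustrative; the paper's layout may differ.
    """
    top = np.concatenate([img_a, img_b], axis=1)     # a | b
    bottom = np.concatenate([img_b, img_a], axis=1)  # b | a
    return np.concatenate([top, bottom], axis=0)     # stack rows

# Example: two 112x112 crops yield one 224x224 composite input.
a = np.ones((112, 112, 3))
b = np.zeros((112, 112, 3))
grid = make_two_class_grid(a, b)
assert grid.shape == (224, 224, 3)
```

Feeding such composites to the model while penalizing the explanation for highlighting the wrong class is what drives the Active fooling objective.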
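The Dataset Splits row cites AOPC (Area Over the Perturbation Curve) curves on 10K validation images. AOPC averages the model-score drop after successively perturbing the regions an explanation ranks most relevant. A hedged sketch of that average, assuming per-image score curves have already been collected (the function name and array interface are illustrative, not from the paper):

```python
import numpy as np

def aopc(score_curves):
    """Compute the average AOPC curve from per-image score trajectories.

    score_curves: shape (n_images, n_steps + 1); column j is the class
    score after perturbing the j most relevant regions (column 0 is the
    unperturbed score). AOPC at step K is the mean of f(x_0) - f(x_j)
    over j = 0..K, averaged over images.
    """
    drops = score_curves[:, :1] - score_curves          # score drop per step
    steps = np.arange(1, drops.shape[1] + 1)
    cum = np.cumsum(drops, axis=1) / steps              # running mean of drops
    return cum.mean(axis=0)                             # average over images
```

A manipulated model whose explanations are truly fooled should show a visibly flatter AOPC curve than the original, which is the comparison Figure 4(a) reports.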