Fooling Explanations in Text Classifiers
Authors: Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the attribution robustness estimation performance of TEF on five sequence classification datasets, utilizing three DNN architectures and three transformer architectures for each dataset. TEF can significantly decrease the correlation between unchanged and perturbed input attributions, which shows that all models and explanation methods are susceptible to TEF perturbations. |
| Researcher Affiliation | Collaboration | Adam Ivankay, IBM Research Zurich, Rüschlikon, Switzerland, aiv@zurich.ibm.com; Ivan Girardi, IBM Research Zurich, Rüschlikon, Switzerland, ivg@zurich.ibm.com; Chiara Marchiori, IBM Research Zurich, Rüschlikon, Switzerland, chi@zurich.ibm.com; Pascal Frossard, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, pascal.frossard@epfl.ch |
| Pseudocode | Yes | Algorithm 1 Text Explanation Fooler (TEF). Input: input sentence s with predicted class l, classifier F, attribution A, attribution distance d, number of synonyms N, maximum perturbation ratio max. Output: adversarial sentence s_adv |
| Open Source Code | No | The paper does not provide a specific link or an explicit statement about releasing the source code for the methodology described in the paper. |
| Open Datasets | Yes | Our TEF attack is evaluated on five commonly used public sequence classification datasets: AG's News (Zhang et al., 2015), MR reviews (Zhang et al., 2015), IMDB Movie Reviews (Maas et al., 2011), Fake News Dataset and Yelp (Asghar, 2016). |
| Dataset Splits | No | The paper mentions the use of datasets but does not specify exact training, validation, or test splits (e.g., percentages, counts, or references to predefined splits with details). It notes that samples are grouped into bins based on perturbation ratios for analysis, but this does not describe the original dataset splits for model training and evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It does not mention any cloud resources or computing clusters with hardware specifications. |
| Software Dependencies | No | The paper mentions the software used, such as 'PyTorch (Paszke et al., 2019) with Captum (Kokhlikyan et al., 2020)', the 'Huggingface Transformers library (Wolf et al., 2020)', and the 'spaCy (Honnibal et al., 2020) tokenizer'. However, it does not provide specific version numbers for these software components, which are necessary for reproducible dependency descriptions. |
| Experiment Setup | No | The paper describes the models, datasets, and evaluation metrics used, and parameters for the TEF attack like N=15 for candidate selection. However, it does not provide specific hyperparameters for training the deep neural networks (e.g., learning rates, batch sizes, number of epochs, optimizer details) or other system-level training configurations, which are crucial for reproducing the model training phase of the experiments. |
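The pseudocode row above summarizes Algorithm 1 (TEF) as a greedy word-substitution search that perturbs at most a fixed ratio of the input words, each time choosing among N synonym candidates the replacement that most changes the attribution map. The following is a minimal, self-contained sketch of that greedy loop; the attribution function, synonym source, and all names here are illustrative stand-ins, not the paper's implementation, and the real attack additionally constrains the classifier's predicted class to stay unchanged.

```python
from typing import Callable, Dict, List


def toy_attribution(words: List[str]) -> List[float]:
    # Stand-in for a real attribution method (e.g. saliency):
    # scores each word by its length, purely for illustration.
    return [len(w) / 10.0 for w in words]


def attribution_distance(a: List[float], b: List[float]) -> float:
    # L2 distance between two attribution vectors of equal length.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def tef_sketch(words: List[str],
               synonyms: Dict[str, List[str]],
               attribution: Callable[[List[str]], List[float]],
               rho_max: float = 0.2) -> List[str]:
    """Greedy sketch of a TEF-style attack: perturb up to rho_max of the
    words, each step taking the single substitution that maximizes the
    distance to the original attribution map."""
    original_attr = attribution(words)
    adv = list(words)
    budget = max(1, int(rho_max * len(words)))
    for _ in range(budget):
        best_gain, best_edit = 0.0, None
        for i, word in enumerate(adv):
            for cand in synonyms.get(word, []):
                trial = adv[:i] + [cand] + adv[i + 1:]
                gain = attribution_distance(original_attr, attribution(trial))
                if gain > best_gain:
                    best_gain, best_edit = gain, (i, cand)
        if best_edit is None:
            break  # no substitution changes the attribution map
        i, cand = best_edit
        adv[i] = cand
    return adv
```

A toy call such as `tef_sketch(["the", "movie", "was", "good"], {"good": ["excellent", "fine"]}, toy_attribution, rho_max=0.25)` swaps the one word whose candidate substitution moves the (toy) attribution vector furthest from the original.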