Fooling Explanations in Text Classifiers
Authors: Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the attribution robustness estimation performance of TEF on five sequence classification datasets, utilizing three DNN architectures and three transformer architectures for each dataset. TEF can significantly decrease the correlation between unchanged and perturbed input attributions, which shows that all models and explanation methods are susceptible to TEF perturbations. |
| Researcher Affiliation | Collaboration | Adam Ivankay, IBM Research Zurich, Rüschlikon, Switzerland, aiv@zurich.ibm.com; Ivan Girardi, IBM Research Zurich, Rüschlikon, Switzerland, ivg@zurich.ibm.com; Chiara Marchiori, IBM Research Zurich, Rüschlikon, Switzerland, chi@zurich.ibm.com; Pascal Frossard, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, pascal.frossard@epfl.ch |
| Pseudocode | Yes | Algorithm 1 Text Explanation Fooler (TEF). Input: input sentence s with predicted class l, classifier F, attribution A, attribution distance d, number of synonyms N, maximum perturbation ratio max. Output: adversarial sentence s_adv |
| Open Source Code | No | The paper does not provide a specific link or an explicit statement about releasing the source code for the methodology described in the paper. |
| Open Datasets | Yes | Our TEF attack is evaluated on five commonly used public sequence classification datasets: AG's News (Zhang et al., 2015), MR reviews (Zhang et al., 2015), IMDB Movie Reviews (Maas et al., 2011), Fake News Dataset and Yelp (Asghar, 2016). |
| Dataset Splits | No | The paper mentions the use of datasets but does not specify exact training, validation, or test splits (e.g., percentages, counts, or references to predefined splits with details). It notes that samples are grouped into bins based on perturbation ratios for analysis, but this does not describe the original dataset splits for model training and evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It does not mention any cloud resources or computing clusters with hardware specifications. |
| Software Dependencies | No | The paper mentions the software used, such as 'PyTorch (Paszke et al., 2019) with Captum (Kokhlikyan et al., 2020)', the 'Huggingface Transformers library (Wolf et al., 2020)', and the 'spaCy (Honnibal et al., 2020) tokenizer'. However, it does not provide specific version numbers for these software components, which are necessary for reproducible dependency descriptions. |
| Experiment Setup | No | The paper describes the models, datasets, and evaluation metrics used, and parameters for the TEF attack like N=15 for candidate selection. However, it does not provide specific hyperparameters for training the deep neural networks (e.g., learning rates, batch sizes, number of epochs, optimizer details) or other system-level training configurations, which are crucial for reproducing the model training phase of the experiments. |
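The pseudocode row above summarizes Algorithm 1 (TEF) as a greedy word-substitution search that perturbs at most a fixed ratio of the input words, each time choosing among N synonym candidates the replacement that most changes the attribution map. The following is a minimal, self-contained sketch of that greedy loop; the attribution function, synonym source, and all names here are illustrative stand-ins, not the paper's implementation, and the real attack additionally constrains the classifier's predicted class to stay unchanged.

```python
from typing import Callable, Dict, List


def toy_attribution(words: List[str]) -> List[float]:
    # Stand-in for a real attribution method (e.g. saliency):
    # scores each word by its length, purely for illustration.
    return [len(w) / 10.0 for w in words]


def attribution_distance(a: List[float], b: List[float]) -> float:
    # L2 distance between two attribution vectors of equal length.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def tef_sketch(words: List[str],
               synonyms: Dict[str, List[str]],
               attribution: Callable[[List[str]], List[float]],
               rho_max: float = 0.2) -> List[str]:
    """Greedy sketch of a TEF-style attack: perturb up to rho_max of the
    words, each step taking the single substitution that maximizes the
    distance to the original attribution map."""
    original_attr = attribution(words)
    adv = list(words)
    budget = max(1, int(rho_max * len(words)))
    for _ in range(budget):
        best_gain, best_edit = 0.0, None
        for i, word in enumerate(adv):
            for cand in synonyms.get(word, []):
                trial = adv[:i] + [cand] + adv[i + 1:]
                gain = attribution_distance(original_attr, attribution(trial))
                if gain > best_gain:
                    best_gain, best_edit = gain, (i, cand)
        if best_edit is None:
            break  # no substitution changes the attribution map
        i, cand = best_edit
        adv[i] = cand
    return adv
```

A toy call such as `tef_sketch(["the", "movie", "was", "good"], {"good": ["excellent", "fine"]}, toy_attribution, rho_max=0.25)` swaps the one word whose candidate substitution moves the (toy) attribution vector furthest from the original.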