Rethinking Attention-Model Explainability through Faithfulness Violation Test
Authors: Yibing Liu, Haoliang Li, Yangyang Guo, Chenqi Kong, Jing Li, Shiqi Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with three groups of explanation methods across tasks, datasets, and model architectures. We find that, consistently, most tested methods are limited by the faithfulness violation issue. |
| Researcher Affiliation | Academia | City University of Hong Kong, Hong Kong; National University of Singapore, Singapore; The Hong Kong Polytechnic University, Hong Kong. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code for model implementation and data processing is made available at https://github.com/BierOne/Attention-Faithfulness |
| Open Datasets | Yes | Sentiment Analysis. We use Stanford Sentiment Treebank (SST) (Socher et al., 2013) and Yelp datasets. ...Topic Classification. We utilize AGNews and 20News datasets... Paraphrase Detection. We adopt the Quora Question Paraphrase (QQP) dataset (Wang et al., 2019)... Natural Language Inference. We utilize the SNLI (Bowman et al., 2015)... Question Answering. We make use of the bAbI-1 dataset (Weston et al., 2016)... Visual Question Answering (VQA). We use VQA 2.0 (Goyal et al., 2017) and GQA (Hudson & Manning, 2019) datasets... |
| Dataset Splits | No | Table 1. Task and dataset statistics. We provide more details for each dataset in Appendix A. (The table includes a #Train column but no explicit #Validation column, and the paper makes no general statement about validation splits across datasets.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the models and embeddings used (e.g., GloVe, LSTM, CNN, VisualBERT, LXMERT), but it does not specify programming-language or library versions (e.g., for PyTorch or TensorFlow). |
| Experiment Setup | No | The paper details the tasks, datasets, models, explanation methods, and evaluation metrics used in the experimental setup. However, it does not explicitly provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or optimizer configurations. |
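For context, the paper's titular test asks whether an explanation weight agrees in sign with the actual effect of erasing the corresponding feature: attention weights are non-negative, so a "violation" occurs when a highly weighted feature turns out to suppress the prediction. Below is a minimal, hypothetical sketch of that idea; `predict_proba`, the toy linear model, and erasure-by-zeroing are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a faithfulness violation check, assuming a generic
# binary classifier and non-negative explanation weights.
# All names here are illustrative, not the authors' API.
import numpy as np

def predict_proba(x):
    # Stand-in model: a fixed linear scorer over a feature vector.
    w = np.array([2.0, -1.5, 0.5, 0.0])  # hidden "true" feature effects
    logit = float(x @ w)
    return 1.0 / (1.0 + np.exp(-logit))  # probability of the positive class

def violation_ratio(x, weights):
    """Fraction of positively weighted features whose erasure actually
    *raises* the predicted probability, contradicting the weight's sign."""
    base = predict_proba(x)
    violations, explained = 0, 0
    for i, a in enumerate(weights):
        if a <= 0:                        # only test features marked as important
            continue
        x_erased = x.copy()
        x_erased[i] = 0.0                 # erase feature i
        impact = base - predict_proba(x_erased)  # >0: feature supported prediction
        explained += 1
        if impact < 0:                    # positive weight but suppressive feature
            violations += 1
    return violations / max(explained, 1)

x = np.array([1.0, 1.0, 1.0, 1.0])
attn = np.array([0.4, 0.3, 0.2, 0.1])     # attention-style weights (all non-negative)
print(f"violation ratio: {violation_ratio(x, attn):.2f}")  # 0.25 for this toy model
```

In this toy example the second feature has a positive attention weight but a negative true effect, so erasing it increases the predicted probability and counts as one violation out of four explained features.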