Rethinking Attention-Model Explainability through Faithfulness Violation Test
Authors: Yibing Liu, Haoliang Li, Yangyang Guo, Chenqi Kong, Jing Li, Shiqi Wang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with three groups of explanation methods across tasks, datasets, and model architectures. We find that, consistently, most tested methods are limited by the faithfulness violation issue. |
| Researcher Affiliation | Academia | City University of Hong Kong, Hong Kong; National University of Singapore, Singapore; The Hong Kong Polytechnic University, Hong Kong. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code for model implementation and data processing is made available at https://github.com/BierOne/Attention-Faithfulness |
| Open Datasets | Yes | Sentiment Analysis. We use Stanford Sentiment Treebank (SST) (Socher et al., 2013) and Yelp datasets. ...Topic Classification. We utilize AGNews and 20News datasets... Paraphrase Detection. We adopt the Quora Question Paraphrase (QQP) dataset (Wang et al., 2019)... Natural Language Inference. We utilize the SNLI (Bowman et al., 2015)... Question Answering. We make use of the bAbI-1 dataset (Weston et al., 2016)... Visual Question Answering (VQA). We use VQA 2.0 (Goyal et al., 2017) and GQA (Hudson & Manning, 2019) datasets... |
| Dataset Splits | No | Table 1. Task and dataset statistics. We provide more details for each dataset in Appendix A. (The table includes a #Train column but no explicit #Validation column, and the paper makes no general statement about validation splits across datasets.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the models and embeddings used (e.g., GloVe, LSTM, CNN, VisualBERT, LXMERT), but it does not specify programming-language or library versions (e.g., for PyTorch or TensorFlow). |
| Experiment Setup | No | The paper details the tasks, datasets, models, explanation methods, and evaluation metrics used in the experimental setup. However, it does not explicitly provide specific hyperparameter values such as learning rates, batch sizes, number of epochs, or optimizer configurations. |
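For context, the paper's titular test asks whether an explanation weight agrees in sign with the actual effect of erasing the corresponding feature: attention weights are non-negative, so a "violation" occurs when a highly weighted feature turns out to suppress the prediction. Below is a minimal, hypothetical sketch of that idea; `predict_proba`, the toy linear model, and erasure-by-zeroing are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a faithfulness violation check, assuming a generic
# binary classifier and non-negative explanation weights.
# All names here are illustrative, not the authors' API.
import numpy as np

def predict_proba(x):
    # Stand-in model: a fixed linear scorer over a feature vector.
    w = np.array([2.0, -1.5, 0.5, 0.0])  # hidden "true" feature effects
    logit = float(x @ w)
    return 1.0 / (1.0 + np.exp(-logit))  # probability of the positive class

def violation_ratio(x, weights):
    """Fraction of positively weighted features whose erasure actually
    *raises* the predicted probability, contradicting the weight's sign."""
    base = predict_proba(x)
    violations, explained = 0, 0
    for i, a in enumerate(weights):
        if a <= 0:                        # only test features marked as important
            continue
        x_erased = x.copy()
        x_erased[i] = 0.0                 # erase feature i
        impact = base - predict_proba(x_erased)  # >0: feature supported prediction
        explained += 1
        if impact < 0:                    # positive weight but suppressive feature
            violations += 1
    return violations / max(explained, 1)

x = np.array([1.0, 1.0, 1.0, 1.0])
attn = np.array([0.4, 0.3, 0.2, 0.1])     # attention-style weights (all non-negative)
print(f"violation ratio: {violation_ratio(x, attn):.2f}")  # 0.25 for this toy model
```

In this toy example the second feature has a positive attention weight but a negative true effect, so erasing it increases the predicted probability and counts as one violation out of four explained features.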