Trade-off Between Efficiency and Consistency for Removal-based Explanations
Authors: Yifan Zhang, Haowei He, Zhiquan Tan, Yang Yuan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical findings indicate that the proposed methods achieve a substantial reduction in interpretation error, up to 31.8 times lower when compared to alternative techniques. In Section 5, we demonstrate that on datasets like IMDb and ImageNet, our algorithms achieve up to 31.8x lower interpretation error compared with other methods. |
| Researcher Affiliation | Academia | Yifan Zhang¹, Haowei He¹, Zhiquan Tan², Yang Yuan¹˒³˒⁴; ¹IIIS, Tsinghua University; ²Department of Math, Tsinghua University; ³Shanghai Artificial Intelligence Laboratory; ⁴Shanghai Qizhi Institute. {zhangyif21,hhw19,tanzq21}@mails.tsinghua.edu.cn, yuanyang@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Harmonica [27]... Algorithm 2: Harmonica-anchor... Algorithm 3: Harmonica-anchor-constrained |
| Open Source Code | Yes | Code is available at https://github.com/trusty-ai/efficient-consistent-explanations. |
| Open Datasets | Yes | The two language tasks we select are the SST-2 [55] dataset for sentiment analysis and the IMDb [40] dataset for movie review classification. The vision task is ImageNet [33] for image classification. ... applied to the same ground-truth image segmentation provided by the MS-COCO dataset [35]. |
| Dataset Splits | No | The paper uses well-known datasets but does not explicitly provide the train/validation/test splits (percentages, counts, or a citation to a defined split) needed to reproduce the experiments; it only reports a test accuracy for the trained model. |
| Hardware Specification | Yes | All the experiments are run on a server with 4 Nvidia 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch, Adam optimizer, GloVe, and Captum.attr.Lime, but it does not specify their version numbers within the text. It states that 'More information about the run-time Python environment and implementation details can be found in our code.' |
| Experiment Setup | Yes | For the two language tasks, i.e., SST-2 and IMDb, we use the same CNN neural network. ... The word embedding layer is pre-trained by GloVe [46] and the maximum word number is set to 25,000. Besides the embedding layer, the network consists of several convolutional kernels with different kernel sizes (3, 4, and 5). After that, we use several fully connected layers, non-linear layers, and pooling layers to process the features. A Sigmoid function is attached to the tail of the network to ensure that the output can be seen as a probability distribution. Our networks are trained with an Adam [32] optimizer with a learning rate of 1e-2 for 5 epochs. (A hedged sketch of this setup follows the table.) |
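
The Experiment Setup row describes the language-task classifier only in prose. The PyTorch sketch below is a hedged reconstruction of that description, not the authors' released code (linked in the Open Source Code row): the kernel sizes (3, 4, 5), the GloVe-initialized embedding, the sigmoid output, and the Adam optimizer with learning rate 1e-2 for 5 epochs come from the quoted setup, and the 25,000 "maximum word number" is interpreted here as the vocabulary size. The embedding dimension, filter count, hidden width, and TextCNN-style max-pooling are assumptions for illustration.

```python
# Minimal sketch of the CNN text classifier described in the Experiment Setup row.
# Only kernel sizes (3, 4, 5), the GloVe-initialized embedding, the sigmoid output,
# and Adam with lr=1e-2 for 5 epochs are stated in the paper; the remaining
# hyperparameters below are illustrative guesses.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=25_000, embed_dim=300, num_filters=100,
                 kernel_sizes=(3, 4, 5), hidden_dim=128,
                 pretrained_embeddings=None):
        super().__init__()
        # Embedding layer; the paper initializes it from pre-trained GloVe vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        if pretrained_embeddings is not None:
            self.embedding.weight.data.copy_(pretrained_embeddings)
        # One 1-D convolution per kernel size (3, 4, 5), as described in the paper.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        # Fully connected layers with a non-linearity, ending in a sigmoid so the
        # scalar output can be read as a probability.
        self.classifier = nn.Sequential(
            nn.Linear(num_filters * len(kernel_sizes), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer indices into the vocabulary.
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each convolutional feature map over the sequence dimension.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1)).squeeze(1)

model = TextCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # lr from the paper
criterion = nn.BCELoss()
# Training-loop sketch: 5 epochs over a DataLoader yielding (token_ids, labels).
# for epoch in range(5):
#     for token_ids, labels in train_loader:
#         optimizer.zero_grad()
#         loss = criterion(model(token_ids), labels.float())
#         loss.backward()
#         optimizer.step()
```

For the exact architecture, training data pipeline, and run-time Python environment, the authors point to their repository at https://github.com/trusty-ai/efficient-consistent-explanations.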