Trade-off Between Efficiency and Consistency for Removal-based Explanations
Authors: Yifan Zhang, Haowei He, Zhiquan Tan, Yang Yuan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical findings indicate that the proposed methods achieve a substantial reduction in interpretation error, up to 31.8 times lower when compared to alternative techniques. In Section 5, we demonstrate that on datasets like IMDb and ImageNet, our algorithms achieve up to 31.8x lower interpretation error compared with other methods. |
| Researcher Affiliation | Academia | Yifan Zhang¹, Haowei He¹, Zhiquan Tan², Yang Yuan¹˒³˒⁴; ¹IIIS, Tsinghua University; ²Department of Math, Tsinghua University; ³Shanghai Artificial Intelligence Laboratory; ⁴Shanghai Qizhi Institute. {zhangyif21,hhw19,tanzq21}@mails.tsinghua.edu.cn, yuanyang@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Harmonica [27]... Algorithm 2: Harmonica-anchor... Algorithm 3: Harmonica-anchor-constrained |
| Open Source Code | Yes | Code is available at https://github.com/trusty-ai/efficient-consistent-explanations. |
| Open Datasets | Yes | The two language tasks we select are the SST-2 [55] dataset for sentiment analysis and the IMDb [40] dataset for movie review classification. The vision task is ImageNet [33] for image classification. ... applied to the same ground-truth image segmentation provided by the MS-COCO dataset [35]. |
| Dataset Splits | No | The paper uses well-known datasets but does not explicitly provide the train/validation/test splits (percentages, counts, or a citation to a defined split) needed to reproduce the experiments; it only reports a test accuracy for the trained model. |
| Hardware Specification | Yes | All the experiments are run on a server with 4 Nvidia 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch, Adam optimizer, GloVe, and Captum.attr.Lime, but it does not specify their version numbers within the text. It states that 'More information about the run-time Python environment and implementation details can be found in our code.' |
| Experiment Setup | Yes | For the two language tasks, i.e., SST-2 and IMDb, we use the same CNN neural network. ... The word embedding layer is pre-trained by GloVe [46] and the maximum word number is set to 25,000. Besides the embedding layer, the network consists of several convolutional kernels with different kernel sizes (3, 4, and 5). After that, we use several fully connected layers, non-linear layers, and pooling layers to process the features. A Sigmoid function is attached to the tail of the network to ensure that the output can be seen as a probability distribution. Our networks are trained with an Adam [32] optimizer with a learning rate of 1e-2 for 5 epochs. (A hedged sketch of this setup follows the table.) |
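
The Experiment Setup row describes the language-task classifier only in prose. The PyTorch sketch below is a hedged reconstruction of that description, not the authors' released code (linked in the Open Source Code row): the kernel sizes (3, 4, 5), the GloVe-initialized embedding, the sigmoid output, and the Adam optimizer with learning rate 1e-2 for 5 epochs come from the quoted setup, and the 25,000 "maximum word number" is interpreted here as the vocabulary size. The embedding dimension, filter count, hidden width, and TextCNN-style max-pooling are assumptions for illustration.

```python
# Minimal sketch of the CNN text classifier described in the Experiment Setup row.
# Only kernel sizes (3, 4, 5), the GloVe-initialized embedding, the sigmoid output,
# and Adam with lr=1e-2 for 5 epochs are stated in the paper; the remaining
# hyperparameters below are illustrative guesses.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=25_000, embed_dim=300, num_filters=100,
                 kernel_sizes=(3, 4, 5), hidden_dim=128,
                 pretrained_embeddings=None):
        super().__init__()
        # Embedding layer; the paper initializes it from pre-trained GloVe vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        if pretrained_embeddings is not None:
            self.embedding.weight.data.copy_(pretrained_embeddings)
        # One 1-D convolution per kernel size (3, 4, 5), as described in the paper.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        # Fully connected layers with a non-linearity, ending in a sigmoid so the
        # scalar output can be read as a probability.
        self.classifier = nn.Sequential(
            nn.Linear(num_filters * len(kernel_sizes), hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer indices into the vocabulary.
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-pool each convolutional feature map over the sequence dimension.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1)).squeeze(1)

model = TextCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # lr from the paper
criterion = nn.BCELoss()
# Training-loop sketch: 5 epochs over a DataLoader yielding (token_ids, labels).
# for epoch in range(5):
#     for token_ids, labels in train_loader:
#         optimizer.zero_grad()
#         loss = criterion(model(token_ids), labels.float())
#         loss.backward()
#         optimizer.step()
```

For the exact architecture, training data pipeline, and run-time Python environment, the authors point to their repository at https://github.com/trusty-ai/efficient-consistent-explanations.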