Learning Groupwise Explanations for Black-Box Models

Authors: Jingyue Gao, Xiting Wang, Yasha Wang, Yulan Yan, Xing Xie

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on six datasets demonstrate the effectiveness of our method." and "Finally, we conduct both quantitative experiments and experiments with real users to demonstrate the effectiveness of our method."
Researcher Affiliation | Collaboration | (1) Peking University, (2) Microsoft Research Asia, (3) Microsoft; {gaojingyue1997, wangyasha}@pku.edu.cn, {xitwan, yulanyan, xing.xie}@microsoft.com
Pseudocode | No | The paper describes the GIME framework and its optimization process in Section 4 but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Source code: https://github.com/jygao97/GIME" and "Codes are provided in the supplementary material to facilitate reproduction of the experimental results."
Open Datasets | Yes | "Datasets. We use six real-world benchmark datasets. The first three are textual datasets and the last three are tabular ones. Specifically, Polarity [Maas et al., 2011] contains highly polar movie reviews and the task is to classify their sentiment. Subjectivity [Pang and Lee, 2004] includes processed sentences that are labeled as either subjective or objective. 20 Newsgroup (http://qwone.com/~jason/20Newsgroups/) is a collection of news articles." and "Auto MPG concerns predicting fuel consumption based on attributes of cars. Wine Quality predicts wine quality based on physicochemical tests. Communities enables predicting community crimes based on socio-economic data."
Dataset Splits | Yes | Table 1 lists the Train, Valid/Test, and Features counts for each dataset, e.g., Polarity (TE): 7,000 / 1,500 / 43,548. The paper also states: "We train f and learn explanations on the training set, tune hyperparameters by using the validation set, and evaluate explanations on the test set."
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper mentions models like BERT and SVR, but does not provide specific version numbers for any software libraries or dependencies used in the experiments.
Experiment Setup | Yes | "If not specifically mentioned, K is set to 20 for large datasets (Polarity and Subjectivity), 10 for middle-sized datasets (20 Newsgroup, Wine Quality, Communities), and 4 for small datasets (Auto MPG). We ensure that all explanations have the same number of nonzero features (5 for tabular data and 50 for textual datasets)."
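
The experiment setup quoted above can be written down as a small configuration sketch. The values of K and the nonzero-feature budgets come from the quoted text; everything else is an assumption made for illustration. In particular, the names DEFAULT_K, NONZERO_FEATURES, and keep_top_features are hypothetical and not taken from the GIME repository, and the summary does not say how the paper enforces the feature budget, so truncating by weight magnitude is only one plausible way to do it.

```python
import numpy as np

# Number of groups K per dataset, as quoted in the Experiment Setup row.
DEFAULT_K = {
    "Polarity": 20,
    "Subjectivity": 20,
    "20 Newsgroup": 10,
    "Wine Quality": 10,
    "Communities": 10,
    "Auto MPG": 4,
}

# Nonzero features allowed per explanation, by data type (from the same row).
NONZERO_FEATURES = {"textual": 50, "tabular": 5}


def keep_top_features(weights, n_nonzero):
    """Zero all but the n_nonzero largest-magnitude weights.

    Hypothetical helper: magnitude-based truncation is an assumption,
    not the procedure documented for the paper.
    """
    weights = np.asarray(weights, dtype=float)
    if n_nonzero >= weights.size:
        return weights.copy()
    sparse = np.zeros_like(weights)
    if n_nonzero > 0:
        # Indices of the n_nonzero entries with the largest absolute value.
        top = np.argsort(np.abs(weights))[-n_nonzero:]
        sparse[top] = weights[top]
    return sparse
```

For a tabular explanation, a call such as keep_top_features(w, NONZERO_FEATURES["tabular"]) would leave at most five nonzero weights, so explanations from different methods could be compared at the same sparsity level.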