Learning Groupwise Explanations for Black-Box Models
Authors: Jingyue Gao, Xiting Wang, Yasha Wang, Yulan Yan, Xing Xie
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on six datasets demonstrate the effectiveness of our method. Finally, we conduct both quantitative experiments and experiments with real users to demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | (1) Peking University, (2) Microsoft Research Asia, (3) Microsoft. Emails: {gaojingyue1997, wangyasha}@pku.edu.cn, {xitwan, yulanyan, xing.xie}@microsoft.com |
| Pseudocode | No | The paper describes the GIME framework and its optimization process in Section 4 but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code: https://github.com/jygao97/GIME. Code is provided in the supplementary material to facilitate reproduction of the experimental results. |
| Open Datasets | Yes | Datasets. We use six real-world benchmark datasets. The first three are textual datasets and the last three are tabular ones. Specifically, Polarity [Maas et al., 2011] contains highly polar movie reviews and the task is to classify their sentiment. Subjectivity [Pang and Lee, 2004] includes processed sentences that are labeled as either subjective or objective. 20 Newsgroup (http://qwone.com/~jason/20Newsgroups/) is a collection of news articles. Auto MPG concerns predicting fuel consumption based on attributes of cars. Wine Quality predicts wine quality based on physicochemical tests. Communities enables predicting community crimes based on socio-economic data. |
| Dataset Splits | Yes | Table 1 reports the Train, Valid/Test, and Features counts for each dataset, e.g., Polarity (TE): Train 7,000, Valid/Test 1,500, Features 43,548. We train f and learn explanations on the training set, tune hyperparameters by using the validation set, and evaluate explanations on the test set. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions models like BERT and SVR, but does not provide specific version numbers for any software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | If not specifically mentioned, K is set to 20 for large datasets (Polarity and Subjectivity), 10 for middle-sized datasets (20 Newsgroup, Wine Quality, Communities), and 4 for small datasets (Auto MPG). We ensure that all explanations have the same number of nonzero features (5 for tabular data and 50 for textual datasets). |
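
The settings quoted in the Experiment Setup row can be written down as a small lookup, which may be a convenient starting point for a reproduction. This is only a sketch: the names `K_BY_DATASET`, `NONZERO_BY_MODALITY`, `setup_for`, and `keep_top_features` are hypothetical and do not come from the paper or its repository. The row states that all explanations have the same number of nonzero features but not how that is enforced, so the magnitude-based truncation below is an assumption.

```python
# Values are taken from the Experiment Setup and Open Datasets rows above.
# All identifiers here are hypothetical, not taken from the GIME code base.
K_BY_DATASET = {
    "Polarity": 20,        # large datasets
    "Subjectivity": 20,
    "20 Newsgroup": 10,    # middle-sized datasets
    "Wine Quality": 10,
    "Communities": 10,
    "Auto MPG": 4,         # small dataset
}

# The first three datasets are textual, the last three tabular.
TEXTUAL_DATASETS = {"Polarity", "Subjectivity", "20 Newsgroup"}

# Number of nonzero features allowed in each explanation.
NONZERO_BY_MODALITY = {"textual": 50, "tabular": 5}


def setup_for(dataset):
    """Return the default K and nonzero-feature budget for a dataset."""
    k = K_BY_DATASET[dataset]  # raises KeyError for unknown dataset names
    modality = "textual" if dataset in TEXTUAL_DATASETS else "tabular"
    return {"K": k, "modality": modality, "nonzero": NONZERO_BY_MODALITY[modality]}


def keep_top_features(weights, budget):
    """Zero all but the `budget` largest-magnitude weights.

    Assumption: this truncation rule is illustrative only; the table does
    not say how the paper fixes the number of nonzero features.
    """
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:budget])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```

For example, `setup_for("Auto MPG")` yields K = 4 with a budget of 5 nonzero features, and `keep_top_features` can then trim any explanation vector to that budget so that methods are compared at equal sparsity.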