reproducibilityindex.ai

Learning Groupwise Explanations for Black-Box Models

Authors: Jingyue Gao, Xiting Wang, Yasha Wang, Yulan Yan, Xing Xie

IJCAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"Experiments on six datasets demonstrate the effectiveness of our method." and "Finally, we conduct both quantitative experiments and experiments with real users to demonstrate the effectiveness of our method."
Researcher Affiliation	Collaboration	(1) Peking University, (2) Microsoft Research Asia, (3) Microsoft; {gaojingyue1997, wangyasha}@pku.edu.cn, {xitwan, yulanyan, xing.xie}@microsoft.com
Pseudocode	No	The paper describes the GIME framework and its optimization process in Section 4 but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	"Source code: https://github.com/jygao97/GIME" and "Codes are provided in the supplementary material to facilitate reproduction of the experimental results."
Open Datasets	Yes	"Datasets. We use six real-world benchmark datasets. The first three are textual datasets and the last three are tabular ones. Specifically, Polarity [Maas et al., 2011] contains highly polar movie reviews and the task is to classify their sentiment. Subjectivity [Pang and Lee, 2004] includes processed sentences that are labeled as either subjective or objective. 20 Newsgroup is a collection of news articles." (footnote: http://qwone.com/~jason/20Newsgroups/) and "Auto MPG concerns predicting fuel consumption based on attributes of cars. Wine Quality predicts wine quality based on physicochemical tests. Communities enables predicting community crimes based on socio-economic data."
Dataset Splits	Yes	Table 1 lists Train, Valid/Test, and Features counts for each dataset, e.g., Polarity (TE): 7,000 / 1,500 / 43,548. The paper also states: "We train f and learn explanations on the training set, tune hyperparameters by using the validation set, and evaluate explanations on the test set."
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies	No	The paper mentions models like BERT and SVR, but does not provide specific version numbers for any software libraries or dependencies used in the experiments.
Experiment Setup	Yes	If not specifically mentioned, K is set to 20 for large datasets (Polarity and Subjectivity), 10 for middle-sized datasets (20 Newsgroup, Wine Quality, Communities), and 4 for small datasets (Auto MPG). We ensure that all explanations have the same number of nonzero features (5 for tabular data and 50 for textual datasets).
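
The settings quoted in the Experiment Setup row can be collected into a small configuration for anyone attempting a rerun. The sketch below is illustrative only: the dictionary names and the `keep_top_features` helper are hypothetical and are not taken from the GIME repository, and truncating to the largest-magnitude weights is an assumed way of fixing the number of nonzero features, not necessarily how the paper enforces it.

```python
import numpy as np

# K per dataset, as quoted in the Experiment Setup row.
K_BY_DATASET = {
    "Polarity": 20,
    "Subjectivity": 20,
    "20 Newsgroup": 10,
    "Wine Quality": 10,
    "Communities": 10,
    "Auto MPG": 4,
}

# Modality per dataset, as quoted in the Open Datasets row
# (first three textual, last three tabular).
MODALITY = {
    "Polarity": "textual",
    "Subjectivity": "textual",
    "20 Newsgroup": "textual",
    "Auto MPG": "tabular",
    "Wine Quality": "tabular",
    "Communities": "tabular",
}

# Nonzero features per explanation, as quoted in the Experiment Setup row.
NONZERO_BY_MODALITY = {"tabular": 5, "textual": 50}


def keep_top_features(weights, n_nonzero):
    """Hypothetical helper (an assumption, not the paper's procedure):
    zero out all but the n_nonzero largest-magnitude weights."""
    weights = np.asarray(weights, dtype=float)
    if n_nonzero <= 0:
        return np.zeros_like(weights)
    if n_nonzero >= weights.size:
        return weights.copy()
    keep = np.argsort(np.abs(weights))[-n_nonzero:]
    sparse = np.zeros_like(weights)
    sparse[keep] = weights[keep]
    return sparse


def settings_for(dataset):
    """Look up K and the nonzero-feature budget for a dataset name."""
    return K_BY_DATASET[dataset], NONZERO_BY_MODALITY[MODALITY[dataset]]
```

For example, `settings_for("Auto MPG")` returns the pair of K and the feature budget for that dataset, and `keep_top_features(w, 5)` would trim a tabular explanation vector `w` to five nonzero entries under the stated assumption.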