Explanations of Black-Box Models based on Directional Feature Interactions

Authors: Aria Masoomi, Davin Hill, Zhonghui Xu, Craig P. Hersh, Edwin K. Silverman, Peter J. Castaldi, Stratis Ioannidis, Jennifer Dy

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We apply our bivariate method on Shapley value explanations, and experimentally demonstrate the ability of directional explanations to discover feature interactions. We show the superiority of our method against state-of-the-art on CIFAR10, IMDB, Census, Divorce, Drug, and gene data." |
| Researcher Affiliation | Academia | "1 Northeastern University, Department of Electrical and Computer Engineering, Boston, MA, USA. 2 Brigham and Women's Hospital, Channing Division of Network Medicine, Boston, MA, USA" |
| Pseudocode | Yes | "Algorithm 1: Approximate Graph G with Shapley Sampling Algorithm" |
| Open Source Code | Yes | "All source code is publicly available." (Footnote 3: https://github.com/davinhill/BivariateShapley) |
| Open Datasets | Yes | "We evaluate our methods on COPDGene (Regan et al., 2010), CIFAR10 (Krizhevsky, 2009) and MNIST (LeCun & Cortes, 2010) image data, IMDB text data, and on three tabular UCI datasets (Drug, Divorce, and Census) (Dua & Graff, 2017)." |
| Dataset Splits | No | Table 3, "Summary of the datasets and models in our investigation", provides train/test sample counts (e.g., 1,641/407 for COPD) but specifies neither a separate validation split nor a cross-validation methodology. |
| Hardware Specification | Yes | "All experiments are performed on an internal cluster with Intel Xeon Gold 6132 CPUs and Nvidia Tesla V100 GPUs." |
| Software Dependencies | No | The paper mentions several software packages and libraries, such as NetworkX, scikit-network, KernelSHAP, PyTorch Geometric, NLTK, GloVe, Adam, and XGBoost, but does not specify version numbers for any of them (e.g., "We use the package NetworkX (Schult, 2008)"). |
| Experiment Setup | Yes | Section G.1.3 provides experimental setup details for each dataset and model. For example, for COPDGene: "We use a neural network with 4 fully-connected layers of 200 hidden units, batch normalization, and relu activation. The model is trained using Adam (Kingma & Ba, 2017) with learning rate 10^-3 for 800 epochs." |
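The pseudocode entry above refers to the paper's Algorithm 1, which approximates an interaction graph G via Shapley sampling. The paper's bivariate, directional construction is not reproduced here; as a point of reference, the sketch below shows only generic permutation-based Shapley value sampling (the univariate building block), with an illustrative `value_fn` interface that is our assumption, not the authors' API.

```python
import random

def shapley_sampling(value_fn, n_features, n_samples=2000, seed=0):
    """Estimate Shapley values by sampling random feature permutations.

    value_fn: maps a frozenset of feature indices (a coalition) to a scalar
    payoff, e.g. a model's output with the remaining features masked out.
    Returns one estimated Shapley value per feature.
    """
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_samples):
        perm = list(range(n_features))
        rng.shuffle(perm)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for i in perm:
            # Marginal contribution of feature i given the features before it.
            coalition.add(i)
            cur = value_fn(frozenset(coalition))
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]
```

For an additive game (payoff = sum of per-feature weights), every marginal contribution of feature i equals its weight, so the estimate recovers the weights exactly even with few samples.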
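The COPDGene setup quoted in the Experiment Setup row (4 fully-connected layers of 200 hidden units, batch normalization, ReLU, Adam with learning rate 10^-3 for 800 epochs) can be sketched in PyTorch as follows. The input dimension, number of classes, and function name are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

def make_copd_model(n_features: int, n_classes: int, hidden: int = 200) -> nn.Sequential:
    """4 fully-connected layers of 200 hidden units with batch norm and ReLU,
    followed by a linear output head (the head size is our assumption)."""
    layers = []
    in_dim = n_features
    for _ in range(4):
        layers += [nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
        in_dim = hidden
    layers.append(nn.Linear(in_dim, n_classes))
    return nn.Sequential(*layers)

# Hypothetical sizes for illustration; COPDGene's true dimensions differ.
model = make_copd_model(n_features=100, n_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate 10^-3 per the paper
# Training loop over 800 epochs omitted.
```

The 800-epoch training loop itself is standard and is left out for brevity.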