Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability
Authors: Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on semi-synthetic and real-world datasets, and show that DiET produces models that (1) closely approximate the original black-box models they are intended to explain, and (2) yield explanations that match approximate ground truths available by construction. |
| Researcher Affiliation | Academia | Usha Bhalla* Harvard University usha_bhalla@g.harvard.edu Suraj Srinivas* Harvard University ssrinivas@seas.harvard.edu Himabindu Lakkaraju Harvard University hlakkaraju@hbs.edu |
| Pseudocode | Yes | Algorithm 1 Distractor Erasure Tuning |
| Open Source Code | Yes | Our code is made public here. |
| Open Datasets | Yes | Hard MNIST: The first is a harder variant of MNIST... Chest X-ray: Second, we consider a semi-synthetic chest x-ray dataset for pneumonia classification [27]. CelebA: The last dataset is a subset of CelebA [28] for hair color classification... Models were trained on the original train/test split given by https://github.com/jayaneetha/colorized-MNIST for Hard MNIST and [27] for the Chest X-ray dataset and with a random 80/20 split for CelebA. |
| Dataset Splits | Yes | Models were trained on the original train/test split given by https://github.com/jayaneetha/colorized-MNIST for Hard MNIST and [27] for the Chest X-ray dataset and with a random 80/20 split for CelebA. ... The original model is trained on 8835 samples from the train split. DiET is finetuned on 1500 samples from a separate unlabeled validation split. (See the split sketch below the table.) |
| Hardware Specification | Yes | We ran all experiments on a single A100 80 GB GPU with 32 GB memory. |
| Software Dependencies | No | The paper mentions software components such as 'Adam' and 'SGD', and implies the use of a deep learning framework (e.g., to implement 'ResNet18'). However, it does not specify version numbers for any of these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Baseline models were trained with Adam for 10 epochs with learning rate 1e-4 and batch size 256. ... The model distillation and data distillation terms are weighted with λ1 = λ2 = 1. ... We learn our masks with SGD (lr=300, batch size = 128) and our robust models with Adam (lr=1e-4, batch size = 128). (See the training-config sketch below the table.) |
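
The "Dataset Splits" row quotes an 80/20 random train/test split for CelebA plus a separate unlabeled subset of 1500 samples used for DiET fine-tuning. The sketch below shows one way such splits could be constructed; the function name, seed, use of `torch.utils.data.random_split`, and the decision to carve the fine-tuning subset out of the training portion are illustrative assumptions, not the authors' released code.

```python
import torch
from torch.utils.data import random_split

# Minimal sketch of the splits described in the "Dataset Splits" row.
# Only the 80/20 ratio and the 1500-sample fine-tuning subset come from the
# paper's text; everything else (seed, helper name, split order) is assumed.
def make_splits(dataset, train_frac=0.8, finetune_size=1500, seed=0):
    generator = torch.Generator().manual_seed(seed)
    n_train = int(train_frac * len(dataset))
    # 80/20 random train/test split (as reported for CelebA).
    train_plus_val, test_set = random_split(
        dataset, [n_train, len(dataset) - n_train], generator=generator
    )
    # Hold out a small subset, treated as unlabeled, for DiET fine-tuning.
    finetune_set, train_set = random_split(
        train_plus_val,
        [finetune_size, len(train_plus_val) - finetune_size],
        generator=generator,
    )
    return train_set, finetune_set, test_set
```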
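The "Experiment Setup" row quotes the optimizers, learning rates, batch sizes, epoch count, and loss weights. The sketch below collects those hyperparameters in one place; the ResNet18 architecture, class count, and per-pixel mask shape are placeholders, and only the quoted values are taken from the paper.

```python
import torch
from torch import nn, optim
from torchvision.models import resnet18

# Hyperparameters quoted in the "Experiment Setup" row; all other choices
# (architecture, class count, mask shape) are illustrative assumptions.
NUM_EPOCHS = 10            # baseline training epochs
BASELINE_BATCH_SIZE = 256  # baseline batch size
DIET_BATCH_SIZE = 128      # batch size for mask and robust-model updates
LAMBDA1 = LAMBDA2 = 1.0    # weights on the model- and data-distillation terms

# Baseline black-box model: Adam with lr 1e-4.
baseline = resnet18(num_classes=10)  # placeholder architecture/class count
baseline_opt = optim.Adam(baseline.parameters(), lr=1e-4)

# DiET fine-tuning: masks learned with SGD (lr=300),
# robust model learned with Adam (lr=1e-4).
masks = nn.Parameter(torch.ones(1500, 1, 28, 28))  # illustrative per-pixel masks
robust_model = resnet18(num_classes=10)
mask_opt = optim.SGD([masks], lr=300.0)
model_opt = optim.Adam(robust_model.parameters(), lr=1e-4)
```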