Reliable Post hoc Explanations: Modeling Uncertainty in Explainability

Authors: Dylan Slack, Anna Hilgard, Sameer Singh, Himabindu Lakkaraju

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation with multiple real-world datasets and user studies demonstrating the efficacy of the proposed framework. |
| Researcher Affiliation | Academia | Dylan Slack (UC Irvine, dslack@uci.edu); Sophie Hilgard (Harvard University, ash798@g.harvard.edu); Sameer Singh (UC Irvine, sameer@uci.edu); Himabindu Lakkaraju (Harvard University, hlakkaraju@hbs.edu) |
| Pseudocode | Yes | Algorithm 1: Focused sampling for local explanations |
| Open Source Code | Yes | Project page: https://dylanslacks.website/reliable/index.html |
| Open Datasets | Yes | Our first structured dataset is COMPAS [27]... The second structured dataset is the German Credit dataset from the UCI repository [28]... We also include popular image datasets MNIST [29] and Imagenet [30]. |
| Dataset Splits | No | We create 80/20 train/test splits for these two datasets [COMPAS, German Credit]... The paper does not explicitly state a separate validation split or the methodology for one. |
| Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU/CPU models, memory, cloud resources) used to run the experiments. |
| Software Dependencies | No | We train a random forest classifier (sklearn implementation with 100 estimators)... The paper mentions sklearn but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | We create 80/20 train/test splits... and train a random forest classifier (sklearn implementation with 100 estimators)... For the MNIST... we train a 2-layer CNN... For generating explanations, we use standard implementations of the baselines LIME and Kernel SHAP with default settings... For our framework, the desired level of certainty is expressed as the width of the 95% credible interval... We use S = 200 as the initial number of perturbations... During focused sampling, we set the batch size B to 50. |
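The tabular portion of the Experiment Setup row can be sketched in a few lines. This is a minimal sketch, not the authors' code: a synthetic binary-classification task stands in for COMPAS and German Credit (which must be obtained separately), while the 80/20 split and the 100-estimator sklearn random forest match what the paper reports.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: the paper uses COMPAS and German Credit; a synthetic
# binary task is substituted here so the sketch is self-contained.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 80/20 train/test split, as reported in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Random forest classifier, sklearn implementation with 100 estimators.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
```

Since the report notes that no sklearn version is pinned, exact numbers from the paper may not be reproducible bit-for-bit even with matching hyperparameters.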
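The constants quoted in the last row (S = 200 initial perturbations, batch size B = 50, stopping once the 95% credible interval is narrow enough) suggest the shape of the focused-sampling loop named in Algorithm 1. The sketch below only illustrates that control flow: `fit_bayesian_explainer` is a hypothetical stand-in whose interval width shrinks as 1/sqrt(n), and the `desired_width` value is illustrative; the paper's actual explainer and its uncertainty-guided (rather than random) perturbation choice are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_bayesian_explainer(perturbations):
    # Hypothetical stand-in for fitting the Bayesian explanation and
    # reading off its 95% credible interval width.
    return 1.0 / np.sqrt(len(perturbations))

S, B = 200, 50        # initial perturbation count and batch size (paper values)
desired_width = 0.05  # user-specified credible interval width (illustrative)

# Draw S initial perturbations of a 10-dimensional input.
perturbations = list(rng.normal(size=(S, 10)))
width = fit_bayesian_explainer(perturbations)

while width > desired_width:
    # Add B perturbations per round; the paper focuses these where
    # explanation uncertainty is highest (simplified to random here).
    perturbations.extend(rng.normal(size=(B, 10)))
    width = fit_bayesian_explainer(perturbations)
```

The loop terminates as soon as the estimated interval width reaches the user's requested certainty level, which is the trade-off the framework exposes: tighter intervals cost more perturbations.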