Robust and Stable Black Box Explanations
Authors: Himabindu Lakkaraju, Nino Arsov, Osbert Bastani
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation with real-world and synthetic datasets demonstrates that our approach substantially improves robustness of explanations without sacrificing their fidelity on the original data distribution. |
| Researcher Affiliation | Academia | Harvard University; Macedonian Academy of Sciences and Arts; University of Pennsylvania. |
| Pseudocode | No | The paper discusses algorithmic approaches but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We analyzed three real-world datasets from criminal justice, healthcare, and education domains (Lakkaraju et al., 2016). Our first dataset contains bail outcomes from two different state courts in the U.S. between 1990 and 2009. It includes criminal history, demographic attributes, information about current offenses, and other details on 31K defendants who were released on bail. Our second dataset contains academic performance records of about 19K students who were set to graduate high school in 2012 from two different school districts in the U.S. It includes information about grades, absence rates, suspensions, and tardiness scores from grades 6 to 8 for each of these students. Our third dataset contains electronic health records of about 22K patients who visited hospitals in two different counties in California between 2010 and 2012. It includes demographic information, symptoms, current and past medical conditions, and family history of each patient. Each patient is assigned a class label which indicates whether the patient has been diagnosed with diabetes. |
| Dataset Splits | No | The paper mentions using training data and 'shifted data' for evaluation, but does not provide specific train/validation/test splits (e.g., percentages or counts) or explicitly define a validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | In the case of LIME, SHAP, ROPE logistic multi, and ROPE dset multi, there is a parameter K which corresponds to the number of local explanations that need to be generated; K can also be thought of as the number of subgroups in the data. We use the Bayesian Information Criterion (BIC) to choose K. For a given dataset, we use the same K for all these techniques to ensure they construct explanations of the same size. For MUSE, we set all the parameters using the procedure in Lakkaraju et al. (2019b); to ensure these explanations are similar in size to the others, we fix the number of outer rules to be K. Finally, when using ROPE to construct rule-based explanations, there is a term λ in our objective (Eq. 10); we fix λ = 5. ... Here, we present results for a 5-layer DNN... |
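The setup above says K, the number of local explanations (i.e., subgroups), is selected with BIC, but the paper excerpt does not specify which model family the BIC scores are computed from. Below is a minimal sketch of BIC-based selection of K, assuming a scikit-learn Gaussian mixture fit over the feature matrix; `choose_k_by_bic` and the synthetic data are illustrative placeholders, not the authors' code or datasets.

```python
# Hedged sketch: pick the number of subgroups K by minimizing BIC over
# candidate Gaussian mixture fits (one possible reading of the paper's
# "use BIC to choose K"; the authors' exact procedure is not specified).
import numpy as np
from sklearn.mixture import GaussianMixture

def choose_k_by_bic(X, k_candidates=range(1, 11), seed=0):
    """Return the K whose Gaussian mixture fit has the lowest BIC on X."""
    best_k, best_bic = None, np.inf
    for k in k_candidates:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = gmm.bic(X)  # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

# Usage with synthetic features standing in for one of the tabular datasets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(200, 5)) for c in (0.0, 3.0, 6.0)])
K = choose_k_by_bic(X)
print(f"Selected K = {K}")
```

Per the quoted setup, the same selected K would then be reused across LIME, SHAP, and the ROPE variants so that all methods produce explanations of comparable size.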