Robust and Stable Black Box Explanations
Authors: Himabindu Lakkaraju, Nino Arsov, Osbert Bastani
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation with real-world and synthetic datasets demonstrates that our approach substantially improves robustness of explanations without sacrificing their fidelity on the original data distribution. |
| Researcher Affiliation | Academia | Harvard University; Macedonian Academy of Sciences and Arts; University of Pennsylvania. |
| Pseudocode | No | The paper discusses algorithmic approaches but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We analyzed three real-world datasets from criminal justice, healthcare, and education domains (Lakkaraju et al., 2016). Our first dataset contains bail outcomes from two different state courts in the U.S. between 1990 and 2009. It includes criminal history, demographic attributes, information about current offenses, and other details on 31K defendants who were released on bail. Our second dataset contains academic performance records of about 19K students who were set to graduate high school in 2012 from two different school districts in the U.S. It includes information about grades, absence rates, suspensions, and tardiness scores from grades 6 to 8 for each of these students. Our third dataset contains electronic health records of about 22K patients who visited hospitals in two different counties in California between 2010 and 2012. It includes demographic information, symptoms, current and past medical conditions, and family history of each patient. Each patient is assigned a class label which indicates whether the patient has been diagnosed with diabetes. |
| Dataset Splits | No | The paper mentions using training data and 'shifted data' for evaluation, but does not provide specific train/validation/test splits (e.g., percentages or counts) or explicitly define a validation set for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | In the case of LIME, SHAP, ROPE logistic multi, and ROPE dset multi, there is a parameter K which corresponds to the number of local explanations that need to be generated; K can also be thought of as the number of subgroups in the data. We use the Bayesian Information Criterion (BIC) to choose K. For a given dataset, we use the same K for all these techniques to ensure they construct explanations of the same size. For MUSE, we set all the parameters using the procedure in Lakkaraju et al. (2019b); to ensure these explanations are similar in size to the others, we fix the number of outer rules to be K. Finally, when using ROPE to construct rule-based explanations, there is a term λ in our objective (Eq. 10); we fix λ = 5. ... Here, we present results for a 5-layer DNN... |
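The setup above says K, the number of local explanations (i.e., subgroups), is selected with BIC, but the paper excerpt does not specify which model family the BIC scores are computed from. Below is a minimal sketch of BIC-based selection of K, assuming a scikit-learn Gaussian mixture fit over the feature matrix; `choose_k_by_bic` and the synthetic data are illustrative placeholders, not the authors' code or datasets.

```python
# Hedged sketch: pick the number of subgroups K by minimizing BIC over
# candidate Gaussian mixture fits (one possible reading of the paper's
# "use BIC to choose K"; the authors' exact procedure is not specified).
import numpy as np
from sklearn.mixture import GaussianMixture

def choose_k_by_bic(X, k_candidates=range(1, 11), seed=0):
    """Return the K whose Gaussian mixture fit has the lowest BIC on X."""
    best_k, best_bic = None, np.inf
    for k in k_candidates:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = gmm.bic(X)  # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

# Usage with synthetic features standing in for one of the tabular datasets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(200, 5)) for c in (0.0, 3.0, 6.0)])
K = choose_k_by_bic(X)
print(f"Selected K = {K}")
```

Per the quoted setup, the same selected K would then be reused across LIME, SHAP, and the ROPE variants so that all methods produce explanations of comparable size.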