Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
When Stability meets Sufficiency: Informative Explanations that do not Overwhelm
Authors: Ronny Luss, Amit Dhurandhar
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these claims, both qualitatively and quantitatively, with experiments that show the benefit of PSEM across three modalities (image, tabular and text) as well as versus other path explanations. A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model. |
| Researcher Affiliation | Industry | Ronny Luss EMAIL IBM Research, Yorktown Heights Amit Dhurandhar EMAIL IBM Research, Yorktown Heights |
| Pseudocode | Yes | Algorithm 1 Path-Sufficient Explanations Method (PSEM) |
| Open Source Code | No | The text states: "The PSEM implementation adapts CEM-PP code from https://github.com/IBM/AIX360." This indicates they adapted existing open-source code but does not explicitly state that their specific PSEM implementation or its modifications are made publicly available or provide a direct link to their own code repository. |
| Open Datasets | Yes | The HELOC dataset FICO (2018) contains credit applicant data... The Celeb A (Liu et al., 2015) dataset contains images... The MNIST dataset is comprised of handwritten digit images... The 20 Newsgroups dataset contains text documents... |
| Dataset Splits | No | The paper mentions various datasets (HELOC, Celeb A, MNIST, 20 Newsgroups) and notes test accuracy for some models, but it does not provide specific train/test/validation split percentages, sample counts, or references to predefined splits needed to reproduce the data partitioning for any of the datasets. |
| Hardware Specification | No | The paper states that "All experiments used 1 GPU and up to 16 GB RAM" but does not specify the GPU model, CPU, or other hardware details needed for full reproducibility. |
| Software Dependencies | No | The paper mentions adapting CEM-PP code and implementing IR, but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 5: Parameters used for various experiments. MNIST: β ∈ {0.0001, 0.001, 0.01, 0.1, 1.0}, η = 10.0, N = 5, κ = 0.75; HELOC: β ∈ {0.00001, 0.0001, 0.001, 0.01, 0.1}, η = 30.0, N = 5, κ = 0.2; Celeb A: β ∈ {0.001, 0.005, 0.01, 0.05}, η = 0.01, N = 4, κ = 0.02; 20 Newsgroups: β ∈ {0.0001, 0.0005, 0.001, 0.005, 0.1}, η = 50.0, N = 5, κ = 0.5 |
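For anyone attempting a reproduction, the Table 5 hyperparameters can be transcribed into a small configuration mapping. This is a convenience sketch only; the key names (`beta`, `eta`, `N`, `kappa`) are our own transliterations of the paper's symbols, not identifiers from the authors' code:

```python
# Hedged transcription of Table 5 from the paper (PSEM experiments).
# Key names are our own; values are copied from the reported table.
PSEM_PARAMS = {
    "MNIST": {
        "beta": [0.0001, 0.001, 0.01, 0.1, 1.0],
        "eta": 10.0, "N": 5, "kappa": 0.75,
    },
    "HELOC": {
        "beta": [0.00001, 0.0001, 0.001, 0.01, 0.1],
        "eta": 30.0, "N": 5, "kappa": 0.2,
    },
    "CelebA": {
        "beta": [0.001, 0.005, 0.01, 0.05],
        "eta": 0.01, "N": 4, "kappa": 0.02,
    },
    "20Newsgroups": {
        "beta": [0.0001, 0.0005, 0.001, 0.005, 0.1],
        "eta": 50.0, "N": 5, "kappa": 0.5,
    },
}
```

Note that β is reported as a sweep (a set of candidate values) while η, N, and κ are fixed per dataset, which is why `beta` is a list and the others are scalars.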