Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

When Stability meets Sufficiency: Informative Explanations that do not Overwhelm

Authors: Ronny Luss, Amit Dhurandhar

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate these claims, both qualitatively and quantitatively, with experiments that show the benefit of PSEM across three modalities (image, tabular and text) as well as versus other path explanations. A user study depicts the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by a model.
Researcher Affiliation | Industry | Ronny Luss EMAIL IBM Research, Yorktown Heights; Amit Dhurandhar EMAIL IBM Research, Yorktown Heights
Pseudocode | Yes | Algorithm 1 Path-Sufficient Explanations Method (PSEM)
Open Source Code | No | The text states: "The PSEM implementation adapts CEM-PP code from https://github.com/IBM/AIX360." This indicates that the authors adapted existing open-source code, but the paper neither states that the PSEM implementation or its modifications are publicly available nor links to a repository of their own.
Open Datasets | Yes | The HELOC dataset FICO (2018) contains credit applicant data... The Celeb A (Liu et al., 2015) dataset contains images... The MNIST dataset is comprised of handwritten digit images... The 20 Newsgroups dataset contains text documents...
Dataset Splits | No | The paper mentions various datasets (HELOC, Celeb A, MNIST, 20 Newsgroups) and notes test accuracy for some models, but it does not provide the train/test/validation split percentages, sample counts, or references to predefined splits needed to reproduce the data partitioning for any of the datasets.
Hardware Specification | No | All experiments used 1 GPU and up to 16 GB RAM.
Software Dependencies | No | The paper mentions adapting CEM-PP code and implementing IR, but it does not specify version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Table 5: Parameters used for various experiments. MNIST: β ∈ {0.0001, 0.001, 0.01, 0.1, 1.0}, η = 10.0, N = 5, κ = 0.75; HELOC: β ∈ {0.00001, 0.0001, 0.001, 0.01, 0.1}, η = 30.0, N = 5, κ = 0.2; Celeb A: β ∈ {0.001, 0.005, 0.01, 0.05}, η = 0.01, N = 4, κ = 0.02; 20 Newsgroups: β ∈ {0.0001, 0.0005, 0.001, 0.005, .1}, η = 50.0, N = 5, κ = 0.5.
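
For anyone re-running the experiments, the Table 5 values quoted in the Experiment Setup row can be written down as a small configuration mapping. The sketch below is only a transcription of those numbers into Python. The names `Table5Params`, `TABLE5_PARAMS` and `beta_values` are hypothetical, and nothing here reflects the actual PSEM or AIX360 code. This summary does not define β, η, N or κ, so the code attaches no meaning to them.

```python
from typing import Dict, List, TypedDict


class Table5Params(TypedDict):
    """One row of Table 5 (field names are illustrative, not from the paper's code)."""
    beta: List[float]   # the set of beta values listed for the dataset
    eta: float
    N: int
    kappa: float


# Values transcribed from the Experiment Setup row (Table 5).
# The ".1" in the 20 Newsgroups row is kept as the float 0.1.
TABLE5_PARAMS: Dict[str, Table5Params] = {
    "MNIST": {"beta": [0.0001, 0.001, 0.01, 0.1, 1.0], "eta": 10.0, "N": 5, "kappa": 0.75},
    "HELOC": {"beta": [0.00001, 0.0001, 0.001, 0.01, 0.1], "eta": 30.0, "N": 5, "kappa": 0.2},
    "Celeb A": {"beta": [0.001, 0.005, 0.01, 0.05], "eta": 0.01, "N": 4, "kappa": 0.02},
    "20 Newsgroups": {"beta": [0.0001, 0.0005, 0.001, 0.005, 0.1], "eta": 50.0, "N": 5, "kappa": 0.5},
}


def beta_values(dataset: str) -> List[float]:
    """Return a copy of the beta values listed for `dataset` in Table 5."""
    return list(TABLE5_PARAMS[dataset]["beta"])


if __name__ == "__main__":
    for name, params in TABLE5_PARAMS.items():
        print(name, params)
```

In every row as transcribed, the number of listed β values equals N; the test for this block checks that as a guard against transcription slips.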