Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Addressing Leakage in Concept Bottleneck Models
Authors: Marton Havasi, Sonali Parbhoo, Finale Doshi-Velez
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results showcasing the efficacy of our proposed modifications to concept bottleneck models (CBM). |
| Researcher Affiliation | Academia | Marton Havasi, School of Engineering and Applied Sciences, Harvard University; Sonali Parbhoo, Department of Electrical Engineering, Imperial College London; Finale Doshi-Velez, School of Engineering and Applied Sciences, Harvard University |
| Pseudocode | Yes | Pseudocode for the interventions is shown in Appendix D. |
| Open Source Code | Yes | Our code is available at https://github.com/dtak/addressing-leakage. |
| Open Datasets | Yes | MIMIC-III EWS (Johnson et al., 2016); Caltech-UCSD Birds 2011 (Wah et al., 2011) |
| Dataset Splits | Yes | MIMIC-III EWS: The dataset contains records from 17,289 patients over a combined N=796,250 time steps (split into 530,802 training and 265,448 test examples, while ensuring that no patient appears in both sets). Caltech-UCSD Birds: This dataset contains N=11,788 images (5,994 training and 5,794 test) of 200 bird species native to North America. |
| Hardware Specification | Yes | The times are recorded on a V100 GPU. |
| Software Dependencies | No | The paper mentions models and optimizers such as the Inception v3 network and Adam, but does not specify software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA version). |
| Experiment Setup | Yes | For MIMIC-III EWS, the concept predictor is a two-layer feed-forward neural network with a hidden layer of size 100, and the label predictor is a two-layer network with a hidden layer of size 50. In the autoregressive case, a small, two-layer network (hidden layer size 20) predicts each concept... We use M = 200 Monte-Carlo samples for prediction. For training hyperparameters, see Appendix C. |
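The Dataset Splits row reports a patient-level split for MIMIC-III EWS: time steps are divided roughly two to one (530,802 training vs. 265,448 test, summing to 796,250), and no patient appears in both sets. A minimal sketch of such a grouped split follows. The function name `group_split`, the `(patient_id, record)` input format, and holding out one third of *patients* are illustrative assumptions, not the authors' preprocessing code.

```python
import random
from collections import defaultdict


def group_split(records, test_fraction=1 / 3, seed=0):
    """Split (patient_id, record) pairs so that no patient is in both sets.

    Whole patients, not individual time steps, are shuffled and assigned to
    the test set; `test_fraction` is a fraction of patients (an assumption).
    """
    by_patient = defaultdict(list)
    for patient_id, record in records:
        by_patient[patient_id].append(record)

    patients = sorted(by_patient)  # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    test_ids = set(patients[: round(len(patients) * test_fraction)])

    train, test = [], []
    for pid in patients:
        bucket = test if pid in test_ids else train
        bucket.extend((pid, r) for r in by_patient[pid])
    return train, test


if __name__ == "__main__":
    # Toy data: 30 hypothetical patients with 1 to 5 time steps each.
    records = [(pid, t) for pid in range(30) for t in range(pid % 5 + 1)]
    train, test = group_split(records)
    overlap = {p for p, _ in train} & {p for p, _ in test}
    print(len(train), len(test), overlap)  # overlap is always an empty set
```

Because patients contribute different numbers of time steps, a patient-level fraction only approximates the resulting time-step ratio; splitting by time step instead would let records from one patient leak between training and test.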
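The Experiment Setup row mentions an autoregressive concept predictor and M = 200 Monte-Carlo samples at prediction time. One way to read this is sketched below, under the assumption that concepts are binary, sampled one at a time from the concept predictor, and fed to the label predictor, whose class probabilities are averaged over the M samples. The callables `concept_prob` and `label_predictor` are hypothetical stand-ins for the trained networks; this is not the code from https://github.com/dtak/addressing-leakage.

```python
import random


def sample_concepts(concept_prob, num_concepts, rng):
    """Draw one binary concept vector, one concept at a time.

    `concept_prob(k, prefix)` returns P(c_k = 1 | x, c_<k). An independent
    concept predictor simply ignores `prefix`; an autoregressive one uses it.
    """
    concepts = []
    for k in range(num_concepts):
        p = concept_prob(k, list(concepts))
        concepts.append(1 if rng.random() < p else 0)
    return concepts


def mc_predict(concept_prob, label_predictor, num_concepts, num_samples=200, seed=0):
    """Average label_predictor(c) over num_samples sampled concept vectors c.

    This is a Monte-Carlo estimate of E_{c ~ p(c | x)}[p(y | c)].
    """
    rng = random.Random(seed)
    totals = None
    for _ in range(num_samples):
        probs = label_predictor(sample_concepts(concept_prob, num_concepts, rng))
        totals = list(probs) if totals is None else [t + q for t, q in zip(totals, probs)]
    return [t / num_samples for t in totals]


if __name__ == "__main__":
    # Hypothetical toy models: every concept fires with probability 0.3, and
    # the label predictor outputs class 1 exactly when concept 0 is on.
    def toy_concept_prob(k, prefix):
        return 0.3

    def toy_label_predictor(c):
        return [1.0 - c[0], float(c[0])]

    print(mc_predict(toy_concept_prob, toy_label_predictor, num_concepts=2))
```

Passing the already-sampled prefix to `concept_prob` lets one sampling loop cover both the independent case (the prefix is ignored) and the autoregressive case (each concept is conditioned on the ones drawn before it).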