Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stochastic Concept Bottleneck Models
Authors: Moritz Vandenhirtz, Sonia Laguna, Ričards Marcinkevičs, Julia Vogt
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically on synthetic tabular and natural image datasets that our approach improves intervention effectiveness significantly. Notably, we showcase the versatility and usability of SCBMs by examining a setting with CLIP-inferred concepts, alleviating the need for manual concept annotations. |
| Researcher Affiliation | Academia | Moritz Vandenhirtz, Sonia Laguna, Ričards Marcinkevičs, Julia E. Vogt; Department of Computer Science, ETH Zurich, Switzerland |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available here: https://github.com/mvandenhi/SCBM. |
| Open Datasets | Yes | As a natural image classification benchmark, we evaluate on the Caltech-UCSD Birds-200-2011 dataset (Wah et al., 2011), comprised of bird photographs from 200 distinct classes. [...] Additionally, we explore another natural image classification task on CIFAR-10 (Krizhevsky et al., 2009) with 10 classes. |
| Dataset Splits | Yes | We set N = 50,000, p = 1,500, and C = 100, with a 60%-20%-20% train-validation-test split. |
| Hardware Specification | Yes | Resource Usage: For the experiments of the main paper, we used a cluster of mostly GeForce RTX 2080s with 2 CPU workers. |
| Software Dependencies | Yes | All methods were implemented using PyTorch (v2.1.1) (Ansel et al., 2024). |
| Experiment Setup | Yes | All models are trained for 150 epochs for the synthetic and 300 epochs for the natural image datasets with the Adam optimizer (Kingma & Ba, 2015) with a learning rate of $10^{-4}$ and a batch size of 64. |
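
The 60%-20%-20% split quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_indices`, the shuffling step, and the seed are assumptions made only for illustration. Only N = 50,000 and the split percentages come from the table above.

```python
import random


def split_indices(n, percents=(60, 20, 20), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    `percents` are integer percentages; the test set takes whatever
    remains after the train and validation cuts. The seed and the use
    of shuffling are illustrative assumptions, not taken from the paper.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, reproducible
    n_train = n * percents[0] // 100  # integer arithmetic avoids float rounding
    n_val = n * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# N = 50,000 as reported for the synthetic dataset.
train_idx, val_idx, test_idx = split_indices(50_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

With N = 50,000 this yields 30,000 training, 10,000 validation, and 10,000 test indices. The paper's repository should be consulted for the actual splitting procedure and seeds.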