Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Label-free Concept Bottleneck Models
Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present three main results on evaluating the accuracy and interpretability of the Label-free CBM in this section. [...] Datasets. To evaluate our approach, we train Label-free CBMs on 5 datasets. These are CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), CUB (Wah et al., 2011), Places365 (Zhou et al., 2017) and ImageNet (Deng et al., 2009). |
| Researcher Affiliation | Collaboration | Tuomas Oikarinen UCSD CSE EMAIL Subhro Das MIT-IBM Watson AI Lab, IBM Research EMAIL Lam M. Nguyen IBM Research EMAIL Tsui-Wei Weng UCSD HDSI EMAIL |
| Pseudocode | No | The paper describes its method in detailed steps and equations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured code-like procedures. |
| Open Source Code | Yes | Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM. |
| Open Datasets | Yes | Datasets. To evaluate our approach, we train Label-free CBMs on 5 datasets. These are CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), CUB (Wah et al., 2011), Places365 (Zhou et al., 2017) and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | We optimize L(Wc) using the Adam optimizer on training data D, with early stopping when similarity on validation data starts to decrease. Finally to make sure our concepts are truthful, we drop all concepts j with sim(tj, qj) < 0.45 on validation data after training Wc. |
| Hardware Specification | Yes | All models are trained on a single Nvidia Tesla P100 GPU, and the full training run takes anywhere from a few minutes to 20 hours depending on the dataset size. |
| Software Dependencies | No | The paper mentions software components like 'GPT-3', 'OpenAI API', 'CLIP ViT-B/16', 'all-mpnet-base-v2', 'Adam optimizer', and 'GLMSAGA solver', but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We optimize L(Wc) using the Adam optimizer on training data D, with early stopping when similarity on validation data starts to decrease. [...] We optimize Equation (2) using the GLMSAGA solver created by (Wong et al., 2021). For the sparse models, we used α = 0.99 and λ was chosen such that each model has 25 to 35 nonzero weights per output class. |
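The two training-setup excerpts above describe two concrete steps: dropping candidate concepts whose validation similarity sim(t_j, q_j) falls below 0.45, and fitting the final layer under an elastic-net penalty with α = 0.99. A minimal sketch of both, assuming NumPy and hypothetical helper names (`filter_concepts`, `elastic_net_penalty` are not from the paper's codebase):

```python
import numpy as np

def filter_concepts(similarities, threshold=0.45):
    """Keep only concept indices j whose validation similarity
    sim(t_j, q_j) is at least the threshold (0.45 in the paper);
    all others are dropped after training W_c."""
    return [j for j, s in enumerate(similarities) if s >= threshold]

def elastic_net_penalty(W, lam, alpha=0.99):
    """Elastic-net regularizer of the form
    lam * (alpha * ||W||_1 + (1 - alpha) * 0.5 * ||W||_F^2),
    the kind of penalty the GLMSAGA solver minimizes; the paper
    uses alpha = 0.99 and tunes lam so each class keeps
    25 to 35 nonzero weights."""
    W = np.asarray(W, dtype=float)
    return lam * (alpha * np.abs(W).sum() + (1 - alpha) * 0.5 * (W ** 2).sum())

# Hypothetical validation similarities for 6 candidate concepts.
sims = [0.62, 0.31, 0.48, 0.45, 0.12, 0.77]
print(filter_concepts(sims))  # concepts 1 and 4 fall below 0.45 and are dropped
```

The λ value itself is not reported; per the excerpt it was chosen indirectly, by targeting 25 to 35 nonzero weights per output class in the resulting sparse model.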