Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Label-free Concept Bottleneck Models

Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We present three main results on evaluating the accuracy and interpretability of the Label-free CBM in this section. [...] Datasets. To evaluate our approach, we train Label-free CBMs on 5 datasets. These are CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), CUB (Wah et al., 2011), Places365 (Zhou et al., 2017) and ImageNet (Deng et al., 2009).
Researcher Affiliation Collaboration Tuomas Oikarinen, UCSD CSE, EMAIL; Subhro Das, MIT-IBM Watson AI Lab, IBM Research, EMAIL; Lam M. Nguyen, IBM Research, EMAIL; Tsui-Wei Weng, UCSD HDSI, EMAIL
Pseudocode No The paper describes its method in detailed steps and equations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured code-like procedures.
Open Source Code Yes Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM.
Open Datasets Yes Datasets. To evaluate our approach, we train Label-free CBMs on 5 datasets. These are CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), CUB (Wah et al., 2011), Places365 (Zhou et al., 2017) and ImageNet (Deng et al., 2009).
Dataset Splits Yes We optimize L(Wc) using the Adam optimizer on training data D, with early stopping when similarity on validation data starts to decrease. Finally to make sure our concepts are truthful, we drop all concepts j with sim(tj, qj) < 0.45 on validation data after training Wc.
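The quoted procedure validates each learned concept dimension on held-out data and drops any concept j with sim(tj, qj) < 0.45. A minimal sketch of that filtering step is below; it is not the authors' code, and it assumes sim is a centered cosine similarity between the learned projection's activations and the target concept scores (the paper's exact similarity function may differ). The array names `proj_acts` and `clip_sims` are hypothetical.

```python
import numpy as np

def filter_concepts(proj_acts, clip_sims, threshold=0.45):
    """Keep only concepts whose learned projection still matches its
    target concept on validation data.

    proj_acts: (n_val, n_concepts) activations of the learned concept layer
    clip_sims: (n_val, n_concepts) target concept scores on the same images
    Returns the list of concept indices to keep.
    """
    keep = []
    for j in range(proj_acts.shape[1]):
        # Centered cosine similarity between the two validation profiles.
        a = proj_acts[:, j] - proj_acts[:, j].mean()
        b = clip_sims[:, j] - clip_sims[:, j].mean()
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if sim >= threshold:
            keep.append(j)
    return keep
```

Concepts that fall below the threshold are removed entirely, so the final bottleneck only contains dimensions whose meaning was verified on data not used for training.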
Hardware Specification Yes All models are trained on a single Nvidia Tesla P100 GPU, and the full training run takes anywhere from a few minutes to 20 hours depending on the dataset size.
Software Dependencies No The paper mentions software components like 'GPT-3', 'OpenAI API', 'CLIP ViT-B/16', 'all-mpnet-base-v2', 'Adam optimizer', and 'GLMSAGA solver', but it does not provide specific version numbers for these software dependencies.
Experiment Setup Yes We optimize L(Wc) using the Adam optimizer on training data D, with early stopping when similarity on validation data starts to decrease. [...] We optimize Equation (2) using the GLMSAGA solver created by (Wong et al., 2021). For the sparse models, we used α = 0.99 and λ was chosen such that each model has 25 to 35 nonzero weights per output class.
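The quoted setup trains the sparse final layer with an elastic-net penalty (α = 0.99, so almost pure L1) and picks λ so that each output class ends up with 25 to 35 nonzero weights. The sketch below only illustrates that penalty and selection criterion; it is not the GLMSAGA solver the paper uses, and the single proximal step shown here stands in for a full optimization run.

```python
import numpy as np

def elastic_net_prox(W, lam, alpha=0.99, step=1.0):
    """Proximal step for the elastic-net penalty
    lam * (alpha * ||W||_1 + (1 - alpha) / 2 * ||W||_2^2)."""
    thresh = step * lam * alpha
    # L1 part: soft-threshold zeroes out small weights.
    W = np.sign(W) * np.maximum(np.abs(W) - thresh, 0.0)
    # L2 part: uniform shrinkage of the survivors.
    return W / (1.0 + step * lam * (1 - alpha))

def pick_lambda(W, lambdas, alpha=0.99, lo=25, hi=35):
    """Return the first lambda for which every output class (row of W)
    retains between lo and hi nonzero weights, mimicking the paper's
    selection criterion."""
    for lam in lambdas:
        nz = np.count_nonzero(elastic_net_prox(W, lam, alpha), axis=1)
        if np.all((nz >= lo) & (nz <= hi)):
            return lam
    return None
```

With α close to 1 the penalty is dominated by the L1 term, which is what drives most concept weights exactly to zero and makes each class's prediction depend on only a few dozen interpretable concepts.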