A Closer Look at the Intervention Procedure of Concept Bottleneck Models
Authors: Sungbin Shin, Yohan Jo, Sungsoo Ahn, Namhoon Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our findings through comprehensive evaluations, not only on the standard real datasets, but also on synthetic datasets that we generate based on a set of different causal graphs. We experiment with three datasets: (1) CUB (Wah et al., 2011), the standard dataset used to study CBMs; (2) SkinCon (Daneshjou et al., 2022b), a medical dataset used to build interpretable models; and (3) Synthetic, the synthetic datasets we generate based on different causal graphs to conduct a wide range of controlled experiments. |
| Researcher Affiliation | Collaboration | POSTECH, South Korea; Amazon Alexa AI, USA. |
| Pseudocode | Yes | Algorithm 1 Generating synthetic data |
| Open Source Code | Yes | Our code is available at https://github.com/ssbin4/Closer-Intervention-CBM. |
| Open Datasets | Yes | CUB (Wah et al., 2011) is the standard dataset used to study CBMs in previous works (Koh et al., 2020; Zarlenga et al., 2022; Havasi et al., 2022; Sawada & Nakamura, 2022). SkinCon (Daneshjou et al., 2022b) is a medical dataset which can be used to build interpretable machine learning models. |
| Dataset Splits | Yes | Since training and test sets are not specified in the SkinCon dataset, we randomly split the dataset into 70%, 15%, and 15% training, validation, and test sets, respectively. We likewise randomly divide the generated synthetic examples into 70% training, 15% validation, and 15% test sets (a minimal split sketch appears below the table). |
| Hardware Specification | Yes | τ_i ≈ 0.7, τ_g ≈ 18.7 × 10⁻³, and τ_f ≈ 0.03 × 10⁻³ are acquired by measuring the inference time with an RTX 3090 GPU and taking the average of 300 repetitions (a timing sketch appears below the table). |
| Software Dependencies | No | The paper mentions using Inception-v3, but it does not specify software versions for the libraries or frameworks used (e.g., Python or PyTorch versions) beyond the model architecture. |
| Experiment Setup | Yes | We used λ = 0.01 for JNT and JNT+P, with the value taken directly from Koh et al. (2020). For the experiments without majority voting (Figure 30 in Appendix H), we use Inception-v3 pretrained on ImageNet for g and a 2-layer MLP with a hidden dimensionality of 200 for f so that it can describe more complex functions. We searched for the best hyperparameters for both g and f over the same sets of values as in Koh et al. (2020). Specifically, we tried initial learning rates of [0.01, 0.001], a constant learning rate or decaying the learning rate by 0.1 every [10, 15, 20] epochs, and weight decays of [0.0004, 0.00004] (the grid is enumerated in the sketch below the table). |
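
The Dataset Splits row describes a random 70%/15%/15% partition for SkinCon and for the generated synthetic examples. The sketch below shows one way such a split could be implemented; the function name, seed, and use of NumPy are assumptions for illustration, not details from the authors' released code.

```python
# Illustrative sketch of a random 70/15/15 train/validation/test split,
# as described in the Dataset Splits row. Names and the seed are assumptions.
import numpy as np

def random_split(num_examples: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_examples)        # shuffle example indices
    n_train = int(0.70 * num_examples)
    n_val = int(0.15 * num_examples)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]           # remaining ~15%
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = random_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```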
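The Hardware Specification row reports per-component times (τ_i, τ_g, τ_f) obtained by averaging 300 inference runs on an RTX 3090. The PyTorch sketch below illustrates how such a measurement is typically done; the function name and the synchronization pattern are assumptions, not the authors' actual benchmarking code.

```python
# Illustrative sketch: average the wall-clock inference time of a model
# over a fixed number of repetitions, synchronizing the GPU so that the
# measured interval covers the whole forward pass.
import time
import torch

def mean_inference_time(model, example_input, repetitions: int = 300) -> float:
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(repetitions):
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            start = time.perf_counter()
            model(example_input)
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Example (hypothetical names): tau_g = mean_inference_time(g, image_batch)
```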
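The Experiment Setup row lists the hyperparameter grid searched for g and f. A minimal enumeration of that grid is sketched below; the schedule encoding and variable names are illustrative, and the training loop itself is omitted.

```python
# Illustrative enumeration of the hyperparameter grid from the Experiment Setup
# row: initial learning rates, learning-rate schedules (constant, or step decay
# by 0.1 every 10/15/20 epochs), and weight decays.
from itertools import product

learning_rates = [0.01, 0.001]
schedules = ["constant", "step-10", "step-15", "step-20"]  # step-k: multiply lr by 0.1 every k epochs
weight_decays = [0.0004, 0.00004]

grid = list(product(learning_rates, schedules, weight_decays))
print(len(grid), "configurations")  # 2 * 4 * 2 = 16

for lr, schedule, wd in grid:
    # Train g (Inception-v3) and f (2-layer MLP) with this configuration,
    # then select the best configuration by validation performance.
    pass
```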