Learning to Receive Help: Intervention-Aware Concept Embedding Models
Authors: Mateo Espinosa Zarlenga, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Zohreh Shams, Mateja Jamnik
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions, demonstrating the effectiveness of our approach. In this section, we evaluate IntCEMs by exploring the following research questions... |
| Researcher Affiliation | Collaboration | Mateo Espinosa Zarlenga, University of Cambridge (me466@cam.ac.uk); Katherine M. Collins, University of Cambridge (kmc61@cam.ac.uk); Krishnamurthy (Dj) Dvijotham, Google DeepMind (dvij@google.com); Adrian Weller, University of Cambridge and Alan Turing Institute (aw665@cam.ac.uk); Zohreh Shams, University of Cambridge (zs315@cam.ac.uk); Mateja Jamnik, University of Cambridge (mateja.jamnik@cl.cam.ac.uk) |
| Pseudocode | No | The paper describes the architecture and training procedure in text and diagrams (e.g., Figure 2) but does not present formally structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All of our code, including configs and scripts to recreate results shown in this paper, has been released as part of CEM's official public repository found at https://github.com/mateoespinosa/cem. |
| Open Datasets | Yes | We consider five vision tasks: (1) MNIST-Add, a task inspired by the UMNIST dataset [29] where one is provided with 12 MNIST [30] images containing digits in {0, ..., 9}, as well as their digit labels as concept annotations... (3) CUB [31]... (5) CelebA, a task on the Celebrity Attributes Dataset [32]... |
| Dataset Splits | Yes | We monitor the validation loss by sampling 20% of the training set to make our validation set for each task. This validation set is also used to select the best-performing models whose results we report in Section 4. |
| Hardware Specification | Yes | All our non-ablation experiments were run in a shared GPU cluster with four Nvidia Titan Xp GPUs and 40 Intel(R) Xeon(R) E5-2630 v4 CPUs (at 2.20GHz) with 125GB of RAM. In contrast, all of our ablation experiments were run on a separate GPU cluster with 4x Nvidia A100-SXM-80GB GPUs, where each GPU is allocated 32 CPUs. |
| Software Dependencies | Yes | Our implementation of IntCEM is built on top of that repository using PyTorch 1.12 [40]... their code was written in TensorFlow, and as such, we converted it to PyTorch. All numerical plots and graphics have been generated using Matplotlib 3.5... |
| Experiment Setup | Yes | We always train with stochastic gradient descent with batch size B (selected to well-utilise our hardware), initial learning rate r_initial, and momentum 0.9. Following the hyperparameter selection reported in [5], we use r_initial = 0.01 for all tasks with the exception of the CelebA task, where this is set to r_initial = 0.005. Finally, to help find better local minima in our loss function during training, we decrease the learning rate by a factor of 0.1 whenever the training loss plateaus for 10 epochs. |
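
The "Dataset Splits" row quotes a simple hold-out protocol: 20% of each task's training set is sampled to form the validation set used for model selection. Below is a minimal PyTorch sketch of such a split; the placeholder dataset, sizes, and fixed seed are illustrative assumptions, not the authors' released code.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset; the real tasks (MNIST-Add, CUB, CelebA, ...) are not reproduced here.
full_train = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

# Hold out 20% of the training set as a validation set, as described in the paper.
# The fixed seed is an assumption for reproducibility, not taken from the paper.
val_size = int(0.2 * len(full_train))
train_set, val_set = random_split(
    full_train,
    [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(42),
)
# val_set would then be used to monitor validation loss and pick the best checkpoint.
```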
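The "Experiment Setup" row likewise maps onto standard PyTorch components: SGD with momentum 0.9, an initial learning rate of 0.01 (0.005 for CelebA), and a 0.1 learning-rate decay whenever the training loss plateaus for 10 epochs. The sketch below shows one way to wire that configuration together; the model, synthetic data, and epoch budget are placeholders, and this is not the training loop from the released repository.

```python
import torch

# Placeholder model and data; the real IntCEM architecture and datasets are not reproduced here.
model = torch.nn.Linear(16, 2)
x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))
loss_fn = torch.nn.CrossEntropyLoss()

# SGD with momentum 0.9 and initial LR 0.01 (0.005 for CelebA, per the quoted setup).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Decay the learning rate by a factor of 0.1 when the training loss plateaus for 10 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10
)

for epoch in range(50):  # epoch budget is illustrative, not from the paper
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # plateau detection on the training loss
```

For the exact batch sizes and per-task configurations, the released configs in the repository linked in the "Open Source Code" row are the authoritative reference.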