Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off

Authors: Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Liò, Mateja Jamnik

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments demonstrate that Concept Embedding Models (1) attain better or competitive task accuracy w.r.t. standard neural models without concepts, (2) provide concept representations capturing meaningful semantics including and beyond their ground truth labels, (3) support test-time concept interventions whose effect in test accuracy surpasses that in standard concept bottleneck models, and (4) scale to real-world conditions where complete concept supervisions are scarce.
Researcher Affiliation Collaboration Mateo Espinosa Zarlenga (University of Cambridge); Pietro Barbiero (University of Cambridge); Gabriele Ciravegna (Université Côte d'Azur, Inria, CNRS, I3S, Maasai, Nice, France); Giuseppe Marra (KU Leuven); Francesco Giannini (University of Siena); Michelangelo Diligenti (University of Siena); Zohreh Shams (Babylon Health; University of Cambridge); Frederic Precioso (Université Côte d'Azur, Inria, CNRS, I3S, Maasai, Nice, France); Stefano Melacci (University of Siena); Adrian Weller (University of Cambridge; Alan Turing Institute); Pietro Liò (University of Cambridge); Mateja Jamnik (University of Cambridge)
Pseudocode No The paper describes the architecture and methods in prose and diagrams (e.g., Figure 2) but does not include formal pseudocode or an algorithm block.
Open Source Code Yes We uploaded a zip file with our code and documentation in the supplemental material and made our code available in a public repository: https://github.com/mateoespinosa/cem/
Open Datasets Yes Furthermore, we evaluate our methods on two real-world image tasks: the Caltech-UCSD Birds-200-2011 dataset (CUB, [16]), preprocessed as in [9], and the Large-scale CelebFaces Attributes dataset (CelebA, [25]).
Dataset Splits Yes For CUB, we use a 70/15/15 train/val/test split and for CelebA, an 80/10/10 train/val/test split.
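Splits like these can be reproduced by shuffling the example indices and cutting them at the stated fractions. The sketch below is a framework-agnostic illustration, not the authors' code; the `split_indices` helper name and the fixed seed are assumptions:

```python
import random

def split_indices(n, fracs=(0.70, 0.15, 0.15), seed=42):
    """Shuffle the indices 0..n-1 and partition them into
    train/val/test chunks according to the given fractions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    n_train = round(fracs[0] * n)
    n_val = round(fracs[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

# e.g. a 70/15/15 split over a dataset with 100 examples
train, val, test = split_indices(100)
```

In practice one would pass `len(dataset)` and feed the resulting index lists to the data loader of choice (e.g. via `torch.utils.data.Subset`).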
Hardware Specification Yes Computational resources: We used an NVIDIA DGX Station with 8 NVIDIA V100 GPUs and 256 GB of RAM.
Software Dependencies Yes We implemented our models in PyTorch [39] (version 1.10.0), using scikit-learn [40] (version 1.0.2) for the k-Medoids clustering algorithm.
Experiment Setup Yes All models were trained for 250 epochs using Adam [38] with a learning rate of 1e-3 and a batch size of 256.
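For reference, the Adam update rule (Kingma & Ba, [38]) used with the paper's learning rate of 1e-3 can be written out as below. This is an illustrative from-scratch re-implementation of the optimizer step, not the authors' training code; in practice one would simply use `torch.optim.Adam(model.parameters(), lr=1e-3)`:

```python
def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over flat lists of scalar parameters and gradients."""
    out_p, out_m, out_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g       # first-moment (mean) estimate
        vi = beta2 * vi + (1 - beta2) * g * g   # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)           # bias-corrected moments
        v_hat = vi / (1 - beta2 ** t)
        out_p.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
        out_m.append(mi)
        out_v.append(vi)
    return out_p, out_m, out_v

# Illustration: minimize f(x) = x**2 starting from x = 1.0,
# taking 250 steps to mirror the paper's 250 epochs.
params, m, v = [1.0], [0.0], [0.0]
for t in range(1, 251):
    grads = [2.0 * p for p in params]           # df/dx = 2x
    params, m, v = adam_step(params, grads, m, v, t)
```

Because Adam normalizes each step by the gradient's running second moment, every step here moves roughly `lr` toward the minimum, so 250 steps shrink `x` from 1.0 to about 0.75.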