From Causal to Concept-Based Representation Learning

Authors: Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on synthetic data, multimodal CLIP models, and large language models supplement our results and show the utility of our approach.
Researcher Affiliation | Academia | 1 Machine Learning Dept., Carnegie Mellon University, Pittsburgh, USA; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 University of Chicago, Chicago, USA; 4 ELLIS Institute, Tübingen, Germany
Pseudocode | Yes | Algorithm 1: Rejection sampling for controllable generative modeling
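The paper's Algorithm 1 itself is not reproduced in this report; as a hedged illustration only, generic rejection sampling for steering a generative model toward a target concept value can be sketched as follows (the generator, the concept extractor, and all names here are hypothetical stand-ins, not the paper's models):

```python
import random

def concept_value(x):
    # Hypothetical stand-in for a learned concept extractor:
    # here, simply the mean of the latent vector.
    return sum(x) / len(x)

def generate():
    # Hypothetical stand-in for a pretrained generative model:
    # draws a 4-dimensional latent sample.
    return [random.gauss(0.0, 1.0) for _ in range(4)]

def rejection_sample(target, tol=0.1, max_tries=10000):
    """Draw samples, accepting the first whose concept value
    lies within tol of the target; reject all others."""
    for _ in range(max_tries):
        x = generate()
        if abs(concept_value(x) - target) <= tol:
            return x
    raise RuntimeError("no accepted sample within budget")

random.seed(0)
x = rejection_sample(target=0.0, tol=0.2)
```

Any accepted sample satisfies the concept constraint by construction, which is the core of the controllability argument.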
Open Source Code | Yes | The code, along with instructions on how to run it, is attached in the supplementary material.
Open Datasets | Yes | We embed images from the 3D Shapes dataset [16] with known factors of variation into the latent space of two different pretrained CLIP models.
Dataset Splits | No | We split the embedded images into training and test sets of equal size. The paper mentions only training and test sets and does not provide an explicit validation split or its percentages/counts.
Hardware Specification | Yes | The preprocessing to calculate the CLIP image embeddings required a few hours on an A100 GPU... We train for 100 epochs, on a single A6000 GPU... The experiments are performed on eight A6000 GPUs.
Software Dependencies | No | We use the open-source large language model LLaMA [119] with 7 billion parameters (the open-sourced version from Hugging Face) and the sentence transformer SBERT [97] for the sentence embedding. The paper names these software components but does not provide specific version numbers for them.
Experiment Setup | Yes | For the contrastive algorithm, we choose the architecture to be either linear or a nonlinear 2-layer MLP with 32 hidden neurons in each layer... We train for 100 epochs... with η = 0.0001 and use the Adam optimizer with learning rates 0.5 for the parametric layer and 0.005 for the non-parametric layer, with a cosine annealing schedule [72].
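As a minimal sketch of the cosine annealing schedule cited above, applied to the two quoted learning rates over the quoted 100 epochs (a floor of lr_min = 0 and one step per epoch are assumptions here; the paper's exact schedule parameters are not quoted):

```python
import math

def cosine_annealed_lr(lr_max, lr_min, t, t_max):
    """Cosine annealing: decay the learning rate from lr_max
    down to lr_min over t_max steps along a half cosine wave."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / t_max))

epochs = 100
# 0.5 for the parametric layer, 0.005 for the non-parametric layer,
# as quoted above; lr_min = 0 is an assumption.
parametric = [cosine_annealed_lr(0.5, 0.0, t, epochs) for t in range(epochs + 1)]
nonparametric = [cosine_annealed_lr(0.005, 0.0, t, epochs) for t in range(epochs + 1)]
```

The schedule starts at the quoted rate, passes through half the rate at the midpoint, and decays smoothly to the floor by the final epoch.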