Disentanglement of Latent Representations via Causal Interventions

Authors: Gaël Gendron, Michael Witbrock, Gillian Dobbie

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.
Researcher Affiliation | Academia | Gaël Gendron, Michael Witbrock, Gillian Dobbie. University of Auckland. ggen187@aucklanduni.ac.nz, m.witbrock@auckland.ac.nz, g.dobbie@auckland.ac.nz
Pseudocode | No | The paper includes figures illustrating the architecture and the modes of inference (Figure 2), but it does not present explicit pseudocode or labelled algorithm blocks.
Open Source Code | Yes | Our code and data are available here: https://github.com/Strong-AI-Lab/ct-vae.
Open Datasets | Yes | The Cars3D dataset [Reed et al., 2015] contains 3D CAD models of cars with 3 factors of variation: the type of the car, camera elevation, and azimuth. The Shapes3D dataset [Kim and Mnih, 2018] contains generated scenes representing an object standing on the floor in the middle of a room with four walls; the scene has 6 factors of variation: the floor, wall, and object colours, the scale and shape of the object, and the orientation of the camera in the room. The Sprites dataset [Reed et al., 2015] contains images of animated characters, with 9 variant factors corresponding to character attributes such as hair or garments. The DSprites dataset [Higgins et al., 2017] contains 2D sprites generated from 6 factors: the colour, shape, and scale of the sprite, its x and y location, and its rotation. All the datasets described above are synthetic, and all of their generative factors of variation are labelled. We also apply our model to real-world data: the CelebA dataset [Liu et al., 2015] is a set of celebrity faces labelled with 40 attributes including gender, hair colour, and age. (A loading sketch for dSprites appears after this table.)
Dataset Splits | No | The paper mentions the datasets used and how transitions are built from them, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts) needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used (e.g., Python, PyTorch, TensorFlow, specific solvers).
Experiment Setup | No | The paper describes the two-stage training process (pre-training the MCQ-VAE, then training the CT layer) and the three operating modes, but it does not provide specific experimental setup details such as hyperparameter values (learning rates, batch sizes, number of epochs) or optimizer settings. (A hedged sketch of such a two-stage loop follows this table.)
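
As context for the Open Datasets row: dSprites is distributed as a single NumPy archive in DeepMind's public release (https://github.com/deepmind/dsprites-dataset). The sketch below shows one way to load it; the file name and array keys follow that release, while the local path is an assumption.

```python
# Minimal sketch: loading dSprites from DeepMind's public release.
# Assumes the .npz file has been downloaded from
# https://github.com/deepmind/dsprites-dataset into the working directory.
import numpy as np

# File name as published in the DeepMind repository.
data = np.load("dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz",
               allow_pickle=True)

imgs = data["imgs"]                # (737280, 64, 64) binary images
factors = data["latents_values"]   # (737280, 6) ground-truth factor values
classes = data["latents_classes"]  # (737280, 6) integer factor indices

print(imgs.shape, factors.shape)   # sanity check: 737280 samples, 6 factors
```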
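
For the Experiment Setup row: since the paper outlines the two-stage procedure but omits hyperparameters, the following is a minimal PyTorch-style sketch of a generic two-stage schedule. The model interfaces (`vae` returning `(recon, z)`, `vae.encode`, `ct_layer`), the plain MSE losses, the Adam optimiser, and all numeric values are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch of a two-stage schedule matching the paper's outline:
# stage 1 pre-trains the MCQ-VAE; stage 2 freezes it and trains the CT layer
# on latent transitions. Interfaces and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def train_two_stage(vae, ct_layer, recon_loader, transition_loader,
                    epochs=10, lr=1e-3):
    # Stage 1: pre-train the VAE on single images (reconstruction term only;
    # the paper's full objective is not reproduced here).
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    for _ in range(epochs):
        for x in recon_loader:
            recon, _ = vae(x)
            loss = F.mse_loss(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: freeze the VAE; train the CT layer to map the latent code of
    # x_t to that of x_t1, where (x_t, x_t1) differ by one factor change.
    for p in vae.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(ct_layer.parameters(), lr=lr)
    for _ in range(epochs):
        for x_t, x_t1 in transition_loader:
            z_pred = ct_layer(vae.encode(x_t))
            loss = F.mse_loss(z_pred, vae.encode(x_t1))
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Anyone attempting a faithful reproduction should take the actual objectives, architectures, and hyperparameters from the released repository (https://github.com/Strong-AI-Lab/ct-vae) rather than from this sketch.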