High Fidelity Image Counterfactuals with Probabilistic Causal Models

Authors: Fabio De Sousa Ribeiro, Tian Xia, Miguel Monteiro, Nick Pawlowski, Ben Glocker

ICML 2023

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental
LLM Response: Our experiments demonstrate that our proposed mechanisms are capable of accurate abduction and estimation of direct, indirect and total effects as measured by axiomatic soundness of counterfactuals. We present 3 case studies on counterfactual inference of high-dimensional structured variables. To quantitatively evaluate our deep SCMs, we measure effectiveness and composition, which are axiomatic properties of counterfactuals that hold true in all causal models (Pearl, 2009; Monteiro et al., 2023).

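A minimal sketch of how the composition and effectiveness checks mentioned above could be computed, assuming hypothetical abduct/generate methods for the SCM's abduction and generation steps and a separately trained measurer network; the metrics actually used in the paper follow Monteiro et al. (2023):

    import torch

    def composition(model, x, parents, cycles=1):
        # Composition: a null intervention (keeping the observed parents) should
        # reproduce the observation; report the mean absolute error after `cycles`
        # repeated abduct-then-regenerate passes.
        x_cf = x
        for _ in range(cycles):
            u = model.abduct(x_cf, parents)    # infer exogenous noise (abduction)
            x_cf = model.generate(u, parents)  # regenerate under the same parents
        return torch.abs(x - x_cf).mean().item()

    def effectiveness(model, measurer, x, parents, do_key, do_value):
        # Effectiveness: after an intervention do(pa_k = v), a separately trained
        # predictor of pa_k applied to the counterfactual should recover v.
        u = model.abduct(x, parents)
        cf_parents = {**parents, do_key: do_value}
        x_cf = model.generate(u, cf_parents)
        return torch.abs(measurer(x_cf) - do_value).mean().item()
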
Researcher Affiliation: Collaboration
LLM Response: (1) Imperial College London, (2) Microsoft Research Cambridge, UK.

Pseudocode: No
LLM Response: The paper does not contain structured pseudocode or algorithm blocks. It describes methods in prose.

Open Source Code: Yes
LLM Response: We present 3 case studies on counterfactual inference of high-dimensional structured variables (footnote: https://github.com/biomedia-mira/causal-gen).

Open Datasets: Yes
LLM Response: For our Morpho-MNIST experiments, we construct a similar scenario to Pawlowski et al. (2020) using the Morpho-MNIST (Castro et al., 2019) dataset. We randomly split the full dataset into subsets of 19,466 training, 3,500 validation and 3,500 test samples. We further extend the proposed approach to the MIMIC-CXR dataset (Johnson et al., 2019). Finally, we split the dataset into 62,336 subjects for training, 9,968 for validation and 30,535 for testing.

Dataset Splits: Yes
LLM Response: We randomly split the full dataset into subsets of 19,466 training, 3,500 validation and 3,500 test samples. We further ensure no overlapping subjects exist between the training and evaluation datasets. Finally, we split the dataset into 62,336 subjects for training, 9,968 for validation and 30,535 for testing.

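To illustrate the non-overlapping-subject constraint, here is a hedged sketch of a subject-level split; the exact procedure and random seed used by the authors are not specified in the quoted text:

    import numpy as np

    def split_by_subject(subject_ids, n_train_subjects, n_val_subjects, seed=0):
        # Assign every image of a subject to exactly one split, so training and
        # evaluation subjects never overlap. `subject_ids` holds one ID per image;
        # the seed is an assumption, not a value reported in the paper.
        rng = np.random.default_rng(seed)
        subjects = rng.permutation(np.unique(subject_ids))
        train_s = set(subjects[:n_train_subjects])
        val_s = set(subjects[n_train_subjects:n_train_subjects + n_val_subjects])
        train_idx = [i for i, s in enumerate(subject_ids) if s in train_s]
        val_idx = [i for i, s in enumerate(subject_ids) if s in val_s]
        test_idx = [i for i, s in enumerate(subject_ids)
                    if s not in train_s and s not in val_s]
        return train_idx, val_idx, test_idx

    # e.g. the reported MIMIC-CXR subject counts: 62,336 training and 9,968
    # validation subjects, with the remaining 30,535 subjects used for testing:
    # train_idx, val_idx, test_idx = split_by_subject(subject_ids, 62336, 9968)
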
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.

Software Dependencies: No
LLM Response: The paper mentions "Pyro and PyTorch" and "Torchvision", but does not specify their version numbers.

Experiment Setup: Yes
LLM Response: We trained our HVAEs for 1M steps using a batch size of 32 and the AdamW optimizer (Loshchilov & Hutter, 2017). We used an initial learning rate of 1e-3 with 100 linear warmup steps, β1 = 0.9, β2 = 0.9 and a weight decay of 0.01. We set gradient clipping to 350 and a gradient update skipping threshold of 500 (based on the L2 norm). For data augmentation, we applied zero-padding of 4 on all borders and randomly cropped to 32×32 resolution. Pixel intensities were rescaled to [-1, 1], and validation/test images were zero-padded to 32×32.
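A minimal sketch of how the quoted optimization and augmentation settings map onto PyTorch and Torchvision; the stand-in model and placeholder loss are assumptions for illustration only (the paper trains HVAEs for 1M steps at batch size 32, and the authors' implementation is in the linked repository):

    import torch
    from torchvision import transforms

    # Stand-in model; the paper trains HVAEs, whose ELBO replaces the placeholder
    # loss below. Hyperparameter values are the ones quoted above.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 1))

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                                  betas=(0.9, 0.9), weight_decay=0.01)
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=1e-3, total_iters=100)  # 100 linear warmup steps

    train_transform = transforms.Compose([
        transforms.Pad(4),                       # zero-pad 4 pixels on all borders
        transforms.RandomCrop(32),               # randomly crop back to 32x32
        transforms.ToTensor(),
        transforms.Lambda(lambda x: 2 * x - 1),  # rescale intensities to [-1, 1]
    ])

    def training_step(batch):
        loss = model(batch).pow(2).mean()        # placeholder loss (ELBO in the paper)
        optimizer.zero_grad()
        loss.backward()
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 350)
        if grad_norm < 500:                      # skip update if L2 grad norm exceeds 500
            optimizer.step()
        warmup.step()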