Invariant Causal Representation Learning for Out-of-Distribution Generalization

Authors: Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard Schölkopf

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on both synthetic and real-world datasets demonstrate that our approach outperforms a variety of baseline methods." (Section 5, Experiments: "We compare our approach with a variety of methods on both synthetic and real-world datasets. In all comparisons, unless stated otherwise, we average performance over ten runs.")
Researcher Affiliation | Collaboration | ¹University of Cambridge, ²MPI for Intelligent Systems, ³Stanford University, ⁴Google Research, ⁵The Alan Turing Institute
Pseudocode | Yes | Algorithm 1: Invariant Causal Representation Learning (iCaRL). Phase 1: We first learn an NF-iVAE model, including the decoder and its corresponding encoder, by optimizing the objective function in (10) on the data {X, Y, E}. Then, we use the mean of the NF-iVAE encoder to infer the latent variables Z from the observations {X, Y, E}; the latent variables are guaranteed to be identified up to a permutation and a simple transformation. Phase 2: After inferring Z, we first run the PC algorithm to learn a Markov equivalence class of DAGs, and then discover the direct causes (parents) of Y among its neighbors by testing all pairs of latent variables with (conditional) independence tests, i.e., finding a set of latent variables in which each pair Zi and Zj satisfies that the dependence between them increases after additionally conditioning on Y. Phase 3: Having obtained Pa(Y), we can solve (11) to learn the invariant classifier w. In a new environment, we first infer Pa(Y) from X by solving (12) and then leverage the learned w for prediction.
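
To make the three phases concrete, here is a minimal runnable sketch on toy data. It is a sketch under strong simplifications, not the paper's implementation: Phase 1 is stubbed (latents Z are drawn directly instead of training NF-iVAE and taking its encoder mean), the PC step of Phase 2 is omitted, and the hypothetical helpers `partial_corr` and `find_parents` with threshold `eps` and a logistic-regression classifier stand in for the paper's (conditional) independence tests and invariant classifier.

```python
# Minimal sketch of Algorithm 1 (iCaRL) on toy data; all names here are
# illustrative stand-ins, not the paper's code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def partial_corr(a, b, cond):
    """Correlation of a and b after regressing out the columns of cond."""
    X = np.column_stack([np.ones(len(a)), cond])
    ra = a - X @ np.linalg.lstsq(X, a, rcond=None)[0]
    rb = b - X @ np.linalg.lstsq(X, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

def find_parents(Z, y, eps=0.05):
    """Phase 2 (simplified, PC step omitted): keep pairs (Zi, Zj) whose
    dependence increases once we additionally condition on Y -- the
    collider signature of the parents Pa(Y)."""
    n, d = Z.shape
    parents, empty = set(), np.empty((n, 0))
    for i in range(d):
        for j in range(i + 1, d):
            r_marginal = abs(partial_corr(Z[:, i], Z[:, j], empty))
            r_given_y = abs(partial_corr(Z[:, i], Z[:, j], y[:, None]))
            if r_given_y > r_marginal + eps:
                parents.update({i, j})
    return sorted(parents)

# Phase 1 (stubbed): pretend these are the identified latents from NF-iVAE.
Z = rng.standard_normal((2000, 5))
y = (Z[:, 0] + Z[:, 1] + 0.1 * rng.standard_normal(2000) > 0).astype(float)

pa = find_parents(Z, y)  # expected: [0, 1]

# Phase 3: learn the invariant classifier w on the discovered parents.
w = LogisticRegression().fit(Z[:, pa], y)
print("Pa(Y) =", pa, " train accuracy =", w.score(Z[:, pa], y))
```

In this toy setup Y is generated from Z0 and Z1, so conditioning on Y induces extra dependence between exactly that pair, and the classifier is fit on those two latents only.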
Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | "We use the exact same environment as in Arjovsky et al. (2019)," who propose creating environments for training to classify digits in the MNIST data. "We modify the Fashion-MNIST dataset in a manner similar to the MNIST digits dataset." The paper also reports results on two widely used realistic datasets for OOD generalization: VLCS (Fang et al., 2013) and PACS (Li et al., 2017a).
Dataset Splits | Yes | For CMNIST: "There are three environments (two training containing 30,000 points each, one test containing 10,000 points)." "We add noise to the preliminary label (y = 0 if the digit is between 0-4 and y = 1 if the digit is between 5-9) by flipping it with 25 percent probability to construct the final labels." For VLCS and PACS: "We used the exact experimental setting that is described in Gulrajani & Lopez-Paz (2020)," training over every train/test environment combination with train-domain validation, one of the commonly used hyperparameter-tuning procedures.
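
The quoted CMNIST construction can be made concrete with a short sketch. The 25% label flip and the environment sizes follow the quoted text; the per-environment color-flip probabilities (0.1 and 0.2 for training, 0.9 for test) follow Arjovsky et al. (2019), while the torchvision loading and the helper name `colored_env` are assumptions for illustration.

```python
# Sketch of the CMNIST environment construction (after Arjovsky et al., 2019).
import torch
from torchvision import datasets

def colored_env(imgs, digits, color_flip_prob):
    """One CMNIST environment: binarize the digit label, flip it with 25%
    probability, then color the image so the color disagrees with the
    final label with probability `color_flip_prob`."""
    y = (digits >= 5).float()                               # preliminary label
    y = torch.abs(y - (torch.rand(len(y)) < 0.25).float())  # 25% label noise
    c = torch.abs(y - (torch.rand(len(y)) < color_flip_prob).float())
    x = torch.stack([imgs, imgs], dim=1)                    # two color channels
    x[c == 1, 0] = 0                                        # zero out one channel
    x[c == 0, 1] = 0                                        # according to color
    return x, y

train = datasets.MNIST("~/data", train=True, download=True)
test = datasets.MNIST("~/data", train=False, download=True)
imgs, digits = train.data.float() / 255.0, train.targets

envs = [
    colored_env(imgs[:30000], digits[:30000], 0.1),             # training env 1
    colored_env(imgs[30000:], digits[30000:], 0.2),             # training env 2
    colored_env(test.data.float() / 255.0, test.targets, 0.9),  # test env
]
```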
Hardware Specification | No | The paper does not provide specific details of the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions implementing parts of the model in PyTorch (Paszke et al., 2019) or TensorFlow (Abadi et al., 2015), but does not specify the exact versions of these dependencies used for the experiments.
Experiment Setup | Yes | (Appendix N, Hyperparameters and Architectures) "In this section, we describe the hyperparameters and architectures of different models used in different experiments. Unless stated otherwise, we have λ1 = 1 and λ2 = 1, both of which are selected on training/validation data." For synthetic data (N.1): "We used Adam optimizer for training with learning rate set to 1e-3 and batch size set to 128." For CMNIST and CFMNIST (N.2): "the batch size is set to 256, and the learning rate is 10⁻⁴."
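
As a sketch, the reported settings translate into roughly the following training configuration. Only the optimizer choice, learning rates, batch sizes, and λ values come from the quoted Appendix N; the model, data, and loss below are hypothetical placeholders, not the paper's architecture.

```python
# Sketch of the reported training configuration (Appendix N); model, data,
# and loss are placeholders.
import torch

LAMBDA_1, LAMBDA_2 = 1.0, 1.0  # selected on training/validation data

CONFIG = {
    "synthetic": {"lr": 1e-3, "batch_size": 128},  # Appendix N.1
    "cmnist":    {"lr": 1e-4, "batch_size": 256},  # Appendix N.2
}

cfg = CONFIG["synthetic"]
model = torch.nn.Linear(10, 2)  # placeholder architecture
optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])

dataset = torch.utils.data.TensorDataset(
    torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
loader = torch.utils.data.DataLoader(
    dataset, batch_size=cfg["batch_size"], shuffle=True)

for x, y in loader:  # one epoch on the placeholder task
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```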