A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Authors: Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Nan Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in the two-variable case validate the proposed ideas and theoretical results.
Researcher Affiliation | Academia | "Yoshua Bengio (1, 2, 5), Tristan Deleu (1), Nasim Rahaman (4), Nan Rosemary Ke (3), Sébastien Lachapelle (1), Olexa Bilaniuk (1), Anirudh Goyal (1), Christopher Pal (3, 5). Mila, Montreal, Quebec, Canada. 1: Université de Montréal; 2: CIFAR Senior Fellow; 3: École Polytechnique Montréal; 4: Max-Planck Institute for Intelligent Systems, Tübingen; 5: Canada CIFAR AI Chair."
Pseudocode | Yes | Algorithm 1: "Meta-learning algorithm for learning the structural parameter". (A minimal sketch of this outer loop appears after the table.)
Open Source Code | Yes | "The source code for the experiments is available here: https://bit.ly/2M6X1al."
Open Datasets | Yes | "In order to get a set of initial parameters, we first train all 4 modules on a training distribution ($p$ in the main text). This distribution corresponds to a fixed choice of $\pi_A^{(1)}$ and $\pi_{B|a}$ (for all $N$ possible values of $a$). The superscript in $\pi_A^{(1)}$ emphasizes the fact that this defines the distribution prior to an intervention, with the mechanism $p(B \mid A)$ being unchanged by the intervention. These probability vectors are sampled randomly from a uniform Dirichlet distribution: $\pi_A^{(1)} \sim \mathrm{Dirichlet}(\mathbf{1}_N)$ (65) and $\pi_{B|a} \sim \mathrm{Dirichlet}(\mathbf{1}_N)$ for all $a \in [1, N]$ (66). In the continuous case, $A \sim p_\mu(A) = \mathcal{N}(\mu, \sigma^2 = 4)$ (70) and $B := f(A) + N_B$, where $N_B \sim \mathcal{N}(0, 1)$." (See the data-generation sketch after the table.)
Dataset Splits | No | The paper does not provide the specific training/validation/test dataset splits needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "In our experiment, all the MLPs have only one hidden layer with H = 8 hidden units, with a ReLU non-linearity, and the output layer has a softmax non-linearity. The conditional distributions p(B | A) and p(A | B) are parametrized as 2-layer Mixture Density Networks (MDNs; Bishop, 1994), with 32 hidden units and 10 components. The marginal distributions p(A) and p(B) are parametrized as Gaussian Mixture Models (GMMs), also with 10 components. In our experiment, we used T = 20 datapoints. In our experiment, d = 100. In our experiment, θ_D = π/4 is fixed for all our observation and intervention datasets. We choose K = 8 points in our experiments." (See the module sketch after the table.)
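Algorithm 1 is only referenced by name above; the following is a minimal, hypothetical Python sketch of its outer loop in the two-variable case, assuming the paper's regret objective R(γ) = -log[σ(γ) e^{L_{A→B}} + (1 - σ(γ)) e^{L_{B→A}}]. The helper run_inner_adaptation is a stand-in for the inner loop (adapting both candidate models on transfer data), not code from the paper.

```python
# Minimal sketch of Algorithm 1's outer loop (two-variable case).
# run_inner_adaptation() is a hypothetical stand-in that would adapt both
# candidate models on transfer data and return their accumulated online
# log-likelihoods L_{A->B} and L_{B->A}.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_inner_adaptation():
    # Fake log-likelihoods in which the causal hypothesis (A -> B)
    # fits the transfer data slightly better on average.
    base = rng.normal(-100.0, 5.0)
    return base + 2.0, base + rng.normal(0.0, 1.0)

gamma, lr = 0.0, 1.0  # structural parameter and step size (assumed values)

for episode in range(500):
    loglik_AB, loglik_BA = run_inner_adaptation()

    # Regret: R(gamma) = -log( s * e^{L_AB} + (1 - s) * e^{L_BA} ),
    # with s = sigmoid(gamma). Its gradient w.r.t. gamma reduces to
    # s - posterior_AB, where posterior_AB is the responsibility of the
    # A -> B hypothesis given the transfer data.
    log_mix = np.logaddexp(np.log(sigmoid(gamma)) + loglik_AB,
                           np.log(sigmoid(-gamma)) + loglik_BA)
    posterior_AB = np.exp(np.log(sigmoid(gamma)) + loglik_AB - log_mix)
    gamma -= lr * (sigmoid(gamma) - posterior_AB)

print(sigmoid(gamma))  # drifts toward 1, i.e., belief that A -> B is causal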
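The Dirichlet sampling quoted in the Open Datasets row (Eqs. 65-66) is simple to reproduce. Below is a small NumPy sketch of that discrete data-generating process; N is an illustrative choice rather than a value from the paper.

```python
# Sketch of the discrete ground-truth model A -> B from Eqs. (65)-(66):
# the marginal pi_A^(1) and each conditional pi_{B|a} are drawn from a
# uniform Dirichlet prior. N is illustrative, not the paper's setting.
import numpy as np

rng = np.random.default_rng(0)
N = 10

pi_A = rng.dirichlet(np.ones(N))             # pi_A^(1) ~ Dirichlet(1_N)
pi_B_given_a = rng.dirichlet(np.ones(N), N)  # one conditional per value of a

def sample_pair():
    """Draw one (A, B) pair from the ground-truth causal direction A -> B."""
    a = rng.choice(N, p=pi_A)
    b = rng.choice(N, p=pi_B_given_a[a])
    return a, b

# An intervention on the cause resamples pi_A while, as the quoted text
# notes, leaving the mechanism p(B | A) unchanged:
pi_A_intervened = rng.dirichlet(np.ones(N))
```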
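For concreteness, here is a hedged PyTorch sketch of the module parameterization described in the Experiment Setup row: a one-hidden-layer MLP with H = 8 ReLU units and a softmax output, one such module per conditional in the discrete case. The value of N and the one-hot input encoding are assumptions for illustration.

```python
# Hypothetical sketch of one discrete-case module: a single-hidden-layer MLP
# with H = 8 ReLU units and a softmax output, as described in the paper.
# N (number of categories) and the one-hot input encoding are assumptions.
import torch
import torch.nn as nn

N, H = 10, 8

conditional_B_given_A = nn.Sequential(
    nn.Linear(N, H),     # one-hot encoding of the conditioning variable A
    nn.ReLU(),
    nn.Linear(H, N),
    nn.Softmax(dim=-1),  # categorical distribution over B
)

# Example: the distribution p(B | A = 3)
a_onehot = torch.nn.functional.one_hot(torch.tensor(3), N).float()
p_B = conditional_B_given_A(a_onehot)
```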