A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
Authors: Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Nan Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in the two-variable case validate the proposed ideas and theoretical results. |
| Researcher Affiliation | Academia | Yoshua Bengio (1, 2, 5), Tristan Deleu (1), Nasim Rahaman (4), Nan Rosemary Ke (3), Sébastien Lachapelle (1), Olexa Bilaniuk (1), Anirudh Goyal (1), Christopher Pal (3, 5). Mila, Montreal, Quebec, Canada. (1) Université de Montréal; (2) CIFAR Senior Fellow; (3) École Polytechnique Montréal; (4) Max-Planck Institute for Intelligent Systems, Tübingen; (5) Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1: Meta-learning algorithm for learning the structural parameter. (A hedged sketch of this update appears below the table.) |
| Open Source Code | Yes | The source code for the experiments is available here: https://bit.ly/2M6X1al. |
| Open Datasets | Yes | In order to get a set of initial parameters, we first train all 4 modules on a training distribution (p in the main text). This distribution corresponds to a fixed choice of π_A^(1) and π_{B|a} (for all N possible values of a). The superscript in π_A^(1) emphasizes the fact that this defines the distribution prior to an intervention, with the mechanism p(B | A) being unchanged by the intervention. These probability vectors are sampled randomly from a uniform Dirichlet distribution: π_A^(1) ~ Dirichlet(1_N) (Eq. 65) and π_{B|a} ~ Dirichlet(1_N) for all a ∈ [1, N] (Eq. 66). In the continuous setting, A ~ p_µ(A) = N(µ, σ² = 4) (Eq. 70) and B := f(A) + N_B with N_B ~ N(0, 1). (A data-generation sketch appears below the table.) |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In our experiment, all the MLPs have only one hidden layer with H = 8 hidden units, with a ReLU non-linearity, and the output layer has a softmax non-linearity. The conditional distributions p(B | A) and p(A | B) are parametrized as 2-layer Mixture Density Networks (MDNs; Bishop, 1994), with 32 hidden units and 10 components. The marginal distributions p(A) and p(B) are parametrized as Gaussian Mixture Models (GMMs), also with 10 components. In our experiment, we used T = 20 datapoints. In our experiment, d = 100. In our experiment, θ_D = π/4 is fixed for all our observation and intervention datasets. We choose K = 8 points in our experiments. (A minimal model sketch appears after the table.) |
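The Algorithm 1 referenced in the Pseudocode row adapts both candidate models (A → B and B → A) on post-intervention transfer data and then updates a structural parameter γ, where σ(γ) is the belief that A → B is the correct causal direction. The sketch below is a minimal illustration, assuming the regret objective R = -log[σ(γ)·L_{A→B} + (1-σ(γ))·L_{B→A}] described in the paper; the function name and the `online_loglik_*` arguments (accumulated online log-likelihoods of the transfer data under each adapted model) are hypothetical, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_structural_param(gamma, online_loglik_AtoB, online_loglik_BtoA, lr=0.1):
    """One meta-update of the structural parameter gamma (sketch).

    Assumes the regret R = -log(sigmoid(gamma) * L_AtoB + (1 - sigmoid(gamma)) * L_BtoA),
    where L_* = exp(online log-likelihood of the transfer data under each adapted model).
    The gradient of R w.r.t. gamma reduces to sigmoid(gamma) minus the posterior
    probability of the A -> B hypothesis.
    """
    p = sigmoid(gamma)
    # Posterior P(A -> B | transfer data), computed in log-space for stability.
    log_joint_AtoB = np.log(p) + online_loglik_AtoB
    log_joint_BtoA = np.log(1.0 - p) + online_loglik_BtoA
    posterior_AtoB = np.exp(log_joint_AtoB - np.logaddexp(log_joint_AtoB, log_joint_BtoA))
    # Gradient step on the regret.
    grad = p - posterior_AtoB
    return gamma - lr * grad

# Toy usage: the A -> B model adapted better (higher online log-likelihood),
# so gamma should move upward, increasing sigmoid(gamma).
gamma = 0.0
gamma = update_structural_param(gamma, online_loglik_AtoB=-50.0, online_loglik_BtoA=-60.0)
print(gamma)  # positive
```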
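The appendix text quoted in the Open Datasets row specifies how the synthetic discrete data are generated: the marginal of the cause and each conditional are drawn from uniform Dirichlet priors, and an intervention resamples only the marginal of A while leaving p(B | A) unchanged. A minimal NumPy sketch under those assumptions (all names are my own, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mechanisms(N):
    """Sample the training distribution: pi_A ~ Dirichlet(1_N) and, for each
    value a, pi_{B|a} ~ Dirichlet(1_N)."""
    pi_A = rng.dirichlet(np.ones(N))
    pi_B_given_A = rng.dirichlet(np.ones(N), size=N)  # row a is pi_{B|a}
    return pi_A, pi_B_given_A

def sample_data(pi_A, pi_B_given_A, T):
    """Draw T pairs (a, b) with a ~ pi_A and b ~ pi_{B|a}."""
    a = rng.choice(len(pi_A), size=T, p=pi_A)
    b = np.array([rng.choice(pi_B_given_A.shape[1], p=pi_B_given_A[ai]) for ai in a])
    return a, b

N = 10   # illustrative value; the paper varies N
T = 20   # T = 20 transfer datapoints, as in the experiment setup
pi_A, pi_B_given_A = sample_mechanisms(N)
train_a, train_b = sample_data(pi_A, pi_B_given_A, T=1000)

# Intervention on the cause: only the marginal of A is resampled,
# the mechanism p(B | A) is left unchanged.
pi_A_intervened = rng.dirichlet(np.ones(N))
transfer_a, transfer_b = sample_data(pi_A_intervened, pi_B_given_A, T)
```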
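For the discrete-case setup quoted above (one-hidden-layer MLPs with H = 8 hidden units, a ReLU non-linearity, and a softmax output), a hedged PyTorch sketch of one conditional module p(B | A) might look as follows; the class name and the one-hot input encoding are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalModule(nn.Module):
    """Hypothetical sketch of a conditional module p(B | A) for the discrete case:
    one hidden layer with H = 8 units, ReLU, and a (log-)softmax output over the
    N possible values of B, matching the quoted experiment setup."""

    def __init__(self, N, H=8):
        super().__init__()
        self.fc1 = nn.Linear(N, H)   # input: one-hot encoding of A (assumed)
        self.fc2 = nn.Linear(H, N)   # output: logits over the N values of B

    def forward(self, a):
        x = F.one_hot(a, num_classes=self.fc1.in_features).float()
        h = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(h), dim=-1)

# Negative log-likelihood of observed pairs (a, b) under this module.
module = ConditionalModule(N=10)
a = torch.randint(0, 10, (20,))
b = torch.randint(0, 10, (20,))
nll = F.nll_loss(module(a), b)
```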