Towards a Learning Theory of Cause-Effect Inference

Authors: David Lopez-Paz, Krikamol Muandet, Bernhard Schölkopf, Ilya Tolstikhin

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct an array of experiments to test the effectiveness of a simple implementation of the presented causal learning framework. Given the use of random embeddings (14) in our classifier, we term our method the Randomized Causation Coefficient (RCC). Throughout our simulations, we featurize each sample S = {(x_i, y_i)}_{i=1}^n as ν(S) = (μ_{k,m}(P_S^x), μ_{k,m}(P_S^y), μ_{k,m}(P_S^{xy})), (15) where the three elements of (15) are the low-dimensional representations (14) of the empirical kernel mean embeddings of {x_i}_{i=1}^n, {y_i}_{i=1}^n, and {(x_i, y_i)}_{i=1}^n, respectively. The representation (15) is motivated by the typical conjecture in causal inference about the existence of asymmetries between the marginal and conditional distributions of causally related pairs of random variables (Schölkopf et al., 2012). Each of these three embeddings has random features sampled to approximate the sum of three Gaussian kernels (2) with hyper-parameters 0.1γ, γ, and 10γ, where γ is found using the median heuristic. In practice, we set m = 1000, and observe no significant improvements when using larger amounts of random features. To classify the embeddings (15) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git. The number of trees is chosen from {100, 250, 500, 1000, 5000} via cross-validation.
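As a concrete illustration of the featurization in Eq. (15), here is a minimal Python sketch: random Fourier features (Rahimi & Recht) approximating the sum of three Gaussian kernels, averaged over the sample to produce the empirical mean embeddings. This is not the authors' code (that lives in the repository linked below); the function names, the median-heuristic normalization, and the kernel parameterization k(x, y) = exp(-γ‖x − y‖²) are assumptions.

```python
import numpy as np

def rff_mean_embedding(X, gammas, m=1000, seed=0):
    """Empirical mean embedding of sample X under a sum of Gaussian
    kernels k(x, y) = sum_g exp(-g * ||x - y||^2), approximated with
    m random Fourier features per kernel (an assumption on the form)."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    feats = []
    for g in gammas:
        # frequencies for exp(-g * ||x - y||^2) have covariance 2g * I
        W = rng.randn(d, m) * np.sqrt(2.0 * g)
        b = rng.uniform(0, 2 * np.pi, m)
        Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)  # (n, m) random features
        feats.append(Z.mean(axis=0))              # average -> mean embedding
    return np.concatenate(feats)

def median_heuristic(X, max_pairs=1000, seed=0):
    """gamma = 1 / (2 * median pairwise squared distance); one common
    convention -- the paper does not spell out the normalization."""
    idx = np.random.RandomState(seed).choice(len(X), min(len(X), max_pairs),
                                             replace=False)
    D2 = ((X[idx, None, :] - X[None, idx, :]) ** 2).sum(-1)
    return 1.0 / (2.0 * np.median(D2[D2 > 0]))

def featurize_sample(x, y, m=1000):
    """nu(S) = (mu(P_x), mu(P_y), mu(P_xy)), as in Eq. (15)."""
    xy = np.column_stack([x, y])
    gamma = median_heuristic(xy)
    gammas = [0.1 * gamma, gamma, 10 * gamma]
    return np.concatenate([
        rff_mean_embedding(x[:, None], gammas, m),
        rff_mean_embedding(y[:, None], gammas, m),
        rff_mean_embedding(xy, gammas, m),
    ])
```

Concatenating one feature block per bandwidth is one simple way to realize a sum of kernels, since the inner products of the blocks add up.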
Researcher Affiliation Collaboration David Lopez-Paz1,2 DAVID@LOPEZPAZ.ORG; Krikamol Muandet1 KRIKAMOL@TUEBINGEN.MPG.DE; Bernhard Schölkopf1 BS@TUEBINGEN.MPG.DE; Ilya Tolstikhin1 ILYA@TUEBINGEN.MPG.DE. 1Max-Planck-Institute for Intelligent Systems; 2University of Cambridge
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Our experiments can be replicated using the source code at https://github.com/lopezpaz/causation_learning_theory.
Open Datasets Yes The Tübingen cause-effect pairs is a collection of heterogeneous, hand-collected, real-world cause-effect samples (Zscheischler, 2014). URL http://webdav.tuebingen.mpg.de/cause-effect/. ... The cause-effect challenges organized by Guyon (2014) provided N = 16,199 training causal samples S_i, each drawn from the distribution of (X_i, Y_i), and labeled either X_i → Y_i, X_i ← Y_i, X_i ← Z_i → Y_i, or X_i ⊥ Y_i. URL https://www.codalab.org/competitions/1381.
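For readers who want to pull the Tübingen pairs directly, a hypothetical loader along these lines should work. The pairNNNN.txt naming and the whitespace-separated two-column layout reflect the public distribution, but some pairs are multivariate, so treat this as a sketch rather than a complete reader.

```python
import urllib.request
import numpy as np

BASE = "http://webdav.tuebingen.mpg.de/cause-effect/"  # URL from the paper

def load_pair(i):
    """Fetch pair number i and return its first two columns as (x, y).
    Assumes the pairNNNN.txt naming of the public distribution."""
    url = BASE + "pair%04d.txt" % i
    data = np.loadtxt(urllib.request.urlopen(url))
    return data[:, 0], data[:, 1]
```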
Dataset Splits Yes The number of trees is chosen from {100, 250, 500, 1000, 5000} via cross-validation.
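The quoted model selection amounts to a grid search over the number of trees. A minimal sketch with a modern scikit-learn (the paper used sklearn-0.16-git, where GridSearchCV lived in sklearn.grid_search); the feature matrix V, the label vector, and the 5-fold setting are assumptions, since the paper does not state the fold count.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# V: (N, 3m) featurized samples from Eq. (15); labels: causal direction.
# Both names are hypothetical; the grid matches the one quoted above.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 250, 500, 1000, 5000]},
    cv=5,       # fold count is an assumption
    n_jobs=-1,
)
grid.fit(V, labels)
clf = grid.best_estimator_
```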
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments.
Software Dependencies Yes To classify the embeddings (15) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git.
Experiment Setup Yes In practice, we set m = 1000, and observe no significant improvements when using larger amounts of random features. To classify the embeddings (15) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git. The number of trees is chosen from {100, 250, 500, 1000, 5000} via cross-validation. ... A cause vector (x̂_{ij})_{j=1}^n is sampled from a mixture of Gaussians with c components. The mixture weights are sampled from U(0, 1) and normalized to sum to one. The mixture means and standard deviations are sampled from N(0, σ₁) and N(0, σ₂), respectively, accepting only positive standard deviations. ... A noise vector (ε̂_{ij})_{j=1}^n is sampled from a centered Gaussian, with variance sampled from U(0, σ₃). ... A mapping mechanism f̂_i is conceived as a spline fitted using a uniform grid of d_f elements from min((x̂_{ij})_{j=1}^n) to max((x̂_{ij})_{j=1}^n) as inputs, and d_f normally distributed outputs. ... We set n = 1000 and N = 10,000.
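The synthetic-data recipe above translates almost line by line into code. A sketch under stated assumptions: the default hyper-parameter values are placeholders (the paper samples them from its own ranges), a cubic spline stands in for the unspecified spline type, and taking the absolute value of a N(0, σ₂) draw is equivalent to the quoted rejection step for positive standard deviations.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sample_pair(n=1000, c=5, sigma1=5.0, sigma2=5.0, sigma3=1.0,
                df=10, rng=None):
    """One synthetic cause-effect sample (x, y) following the generative
    recipe quoted above. Hyper-parameter defaults are placeholders."""
    rng = rng or np.random.RandomState(0)
    # cause: mixture of c Gaussians with random weights, means, scales
    w = rng.uniform(0, 1, c)
    w /= w.sum()                                  # normalize to sum to one
    means = rng.randn(c) * sigma1
    stds = np.abs(rng.randn(c) * sigma2)          # half-normal = rejection
    comp = rng.choice(c, size=n, p=w)
    x = means[comp] + stds[comp] * rng.randn(n)
    # noise: centered Gaussian, variance drawn once from U(0, sigma3)
    eps = rng.randn(n) * np.sqrt(rng.uniform(0, sigma3))
    # mechanism: spline through df random outputs on a uniform input grid
    grid = np.linspace(x.min(), x.max(), df)
    f = CubicSpline(grid, rng.randn(df))
    y = f(x) + eps
    return x, y
```

Repeating this N = 10,000 times, once per label configuration, yields the training set described in the paper's simulations.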