Consistency of Causal Inference under the Additive Noise Model
Authors: Samory Kpotufe, Eleni Sgouritsa, Dominik Janzing, Bernhard Schölkopf
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Current insights into this last question are mostly empirical. Various works (Shimizu et al., 2006; Hoyer et al., 2009; Peters et al., 2011a) have successfully validated procedures based on the ANM (outlined in Section 1.1 below) on a mix of artificial and real-world datasets where the causal structure to be inferred is clear. However, on the theoretical side, it remains unclear whether these procedures can infer causality from samples in general situations where the ANM is identifiable. In the particular case where the functional relation between X and Y is linear, Hyvärinen et al. (2008) proposed a successful method shown to be consistent. Two recent arXived results, Bühlmann et al. (2013); Nowzohour & Bühlmann (2013), show the consistency of maximum log-likelihood approaches to causal inference under the multi-variable network extension of Peters et al. (2011b). |
| Researcher Affiliation | Academia | Samory Kpotufe SAMORY@TTIC.EDU Toyota Technological Institute-Chicago; Eleni Sgouritsa ELENI.SGOURITSA@TUEBINGEN.MPG.DE Max Planck Institute for Intelligent Systems; Dominik Janzing DOMINIK.JANZING@TUEBINGEN.MPG.DE Max Planck Institute for Intelligent Systems; Bernhard Schölkopf BS@TUEBINGEN.MPG.DE Max Planck Institute for Intelligent Systems |
| Pseudocode | No | The paper describes a 'meta-procedure' and 'family of inference procedures' in prose, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | Figure 1 describes the generation of 'Simulated data' as 'Y = b·X³ + X + η. X is sampled from a uniform distribution on the open interval (−2.5, 2.5), while η is sampled as \|N\|^q·sign(N) where N is a standard normal.' However, it does not specify public availability or provide access information for this simulated data. |
| Dataset Splits | Yes | Definition 4 (Decoupled-estimation). f_n and g_n are learned on half of the sample {(X_i, Y_i)}_{i=1}^n, and the entropy estimates H_n(η_{Y,f_n}) and H_n(η_{X,g_n}) are learned on the other half of the sample (w.l.o.g. assume n is even). (A code sketch of this split follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used for running the simulations or experiments. |
| Software Dependencies | No | The paper mentions 'kernel regression (KR)', 'kernel ridge regression (KRR)', and, for entropy estimation, a 'resubstitution estimate using a kernel density estimator', but does not name any software packages or version numbers. |
| Experiment Setup | Yes | Figure 1. Plots of the difference between the complexity measures (C_{Y→X} − C_{X→Y}) for coupled and decoupled-estimation in various scenarios. Simulated data is generated as Y = b·X³ + X + η. X is sampled from a uniform distribution on the open interval (−2.5, 2.5), while η is sampled as \|N\|^q·sign(N) where N is a standard normal. b controls the strength of the nonlinearity of the function and q controls the non-Gaussianity of the noise: q = 1 gives a Gaussian, while q > 1 and q < 1 produce super-Gaussian and sub-Gaussian distributions, respectively. For entropy estimation we employ a resubstitution estimate using a kernel density estimator tuned against log-likelihood (Beirlant et al., 1997), and for the regression estimator we use kernel regression (KR). For every combination of the parameters, each experiment was repeated 10 times, and average results for (C_{Y→X} − C_{X→Y}) are reported along with the standard deviation across repetitions. Plot (a): increasing the kernel bandwidth of the regressor geometrically (by factors of ℓ = 1.5), i.e. decreasing the richness of the algorithm. Plot (b): increasing sample size (bandwidth of KR tuned by cross-validation). Plot (c): increasing q, i.e. the tail of the noise is made sharper (KR tuned by cross-validation). (Code sketches of the data generation and the decoupled split follow this table.) |
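
For orientation, here is a minimal Python sketch of the simulated-data generation quoted above. The function name `simulate_anm`, the seed, and the specific values of `n`, `b`, and `q` are illustrative assumptions; the paper sweeps b and q rather than fixing single values.

```python
import numpy as np

def simulate_anm(n, b, q, seed=None):
    """Draw (X, Y) pairs per Figure 1: Y = b*X^3 + X + eta.

    X is uniform on (-2.5, 2.5); eta = |N|^q * sign(N) for N standard normal,
    so q = 1 gives Gaussian noise, q > 1 super-Gaussian, q < 1 sub-Gaussian.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.5, 2.5, size=n)
    n_std = rng.standard_normal(n)
    eta = np.abs(n_std) ** q * np.sign(n_std)
    y = b * x**3 + x + eta
    return x, y

# Illustrative parameters (not fixed by the paper): moderate nonlinearity,
# super-Gaussian noise.
X, Y = simulate_anm(n=500, b=1.0, q=2.0, seed=0)
```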
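
The decoupled-estimation split from Definition 4 (quoted under Dataset Splits) can be sketched in the same spirit. This is a hedged approximation rather than the paper's implementation: scikit-learn's `KernelRidge` stands in for the paper's kernel regressors, SciPy's `gaussian_kde` with its default bandwidth rule replaces the likelihood-tuned density estimator, and the score H(cause) + H(residual) is one common entropy-based ANM complexity measure assumed here; the paper's exact C_{X→Y} may differ.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

def entropy_resub(samples):
    """Resubstitution entropy estimate: -mean log p_hat(s_i), where p_hat is
    a KDE. (Default Scott bandwidth; the paper tunes against log-likelihood.)"""
    kde = gaussian_kde(samples)
    return -np.log(kde(samples)).mean()

def complexity(cause, effect):
    """Decoupled estimate of a complexity score C_{cause -> effect}: fit the
    regression on the first half of the sample, then estimate entropies on
    the second half (w.l.o.g. n is even, as in Definition 4)."""
    n = len(cause) // 2
    reg = GridSearchCV(KernelRidge(kernel="rbf"),
                       {"alpha": [1e-3, 1e-2, 1e-1],
                        "gamma": [0.1, 1.0, 10.0]})
    reg.fit(cause[:n, None], effect[:n])
    residual = effect[n:] - reg.predict(cause[n:, None])
    # Assumed score: H(cause) + H(residual); the paper's measure may differ.
    return entropy_resub(cause[n:]) + entropy_resub(residual)

# Infer X -> Y when C_{X->Y} < C_{Y->X}, using (X, Y) from the sketch above.
C_xy, C_yx = complexity(X, Y), complexity(Y, X)
print("inferred direction:", "X -> Y" if C_xy < C_yx else "Y -> X")
```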