Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Causal Discovery with Continuous Additive Noise Models
Authors: Jonas Peters, Joris M. Mooij, Dominik Janzing, Bernhard Schölkopf
JMLR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide practical algorithms for finitely many samples: RESIT (regression with subsequent independence test) and two methods based on an independence score. We prove that RESIT is correct in the population setting and provide an empirical evaluation. Keywords: causal inference, structural equation models, additive noise, identifiability, causal minimality, Bayesian networks... The following subsections report some empirical performance of the described methods. 5.1 Experiments on Synthetic Data |
| Researcher Affiliation | Academia | Seminar for Statistics, ETH Zürich Rämistrasse 101, 8092 Zürich Switzerland; Institute for Informatics, University of Amsterdam Postbox 94323, 1090 GH Amsterdam The Netherlands; Institute for Computing and Information Sciences, Radboud University Nijmegen Postbox 9010, 6500 GL Nijmegen The Netherlands; Max Planck Institute for Intelligent Systems Spemannstraße 38, 72076 Tübingen Germany |
| Pseudocode | Yes | Algorithm 1 Regression with subsequent independence test (RESIT) |
| Open Source Code | Yes | Code for the proposed methods is provided on the first and second authors' homepages. |
| Open Datasets | Yes | We consider recordings of average temperature T, average duration of sunshine DS and the altitude A at 349 German weather stations (Deutscher Wetterdienst, 2008)... We have tested the performance of additive noise models on a collection of various cause-effect pairs, an extended version of the Cause-effect pairs data set described in Mooij and Janzing (2010)... The complete data set and a more detailed description of each pair can be obtained from http://webdav.tuebingen.mpg.de/cause-effect. |
| Dataset Splits | No | For varying sample size n and number of variables p we compare the described methods. For each pair of variables (Xi, Yi), with i = 1, . . . , 86, we test the two possible additive noise models that correspond with the two different possible causal directions, Xi → Yi and Yi → Xi. If we take a decision for all pairs, 72 ± 6% of the decisions are correct, significantly more than random guessing. The paper does not specify traditional training/test/validation splits for its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. It mentions synthetic data and real-world data experiments but lacks any explicit mention of CPU, GPU models, memory, or specific computing environments. |
| Software Dependencies | No | As a regression method we choose linear regression, gam regression (R package mgcv) or Gaussian process regression (R package gptk). We use the GPML toolbox (Rasmussen and Nickisch, 2010). The paper mentions software packages and toolboxes like 'R package mgcv', 'R package gptk', and 'GPML toolbox' but does not specify their version numbers. |
| Experiment Setup | Yes | For a linear and a nonlinear setting we report the average structural Hamming distance... Given a value of p, we randomly choose an ordering of the variables with respect to the uniform distribution and include each of the p(p−1)/2 possible edges with a probability of 2/(p−1). This results in an expected number of p edges and can be considered as a (modestly) sparse setting. The coefficients βjk are uniformly chosen from [−2, −0.1] ∪ [0.1, 2] and the noise variables Nj are independent and distributed according to Nj = Kj · sign(Mj) · \|Mj\|^αj with Mj iid ∼ N(0, 1), Kj iid ∼ U([0.1, 0.5]) and αj iid ∼ U([2, 4]). We choose an additive structure as in equation (8) and sample the functions from a Gaussian process with bandwidth one. The noise variables Nj are independent and normally distributed with a uniformly chosen variance. As a regression method we choose linear regression, gam regression (R package mgcv) or Gaussian process regression (R package gptk). For the regularization parameter λ we propose to use log(0.05) − log(0.01). |
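The synthetic-data recipe quoted in the Experiment Setup row (random ordering, edge probability 2/(p−1), coefficients from [−2, −0.1] ∪ [0.1, 2], noise Kj · sign(Mj) · |Mj|^αj) can be sketched directly. This is an illustrative Python/NumPy reconstruction of the paper's linear setting, not the authors' code (which is in R/MATLAB); the function name `sample_linear_sem` and the column-convention of `B` are our own choices.

```python
import numpy as np

def sample_linear_sem(p, n, rng=None):
    """Sample n observations from a random linear SEM as described in the
    paper's synthetic experiments: a uniformly random causal ordering,
    each of the p(p-1)/2 possible edges included with probability
    2/(p-1) (expected number of edges: p), coefficients beta_jk drawn
    uniformly from [-2, -0.1] u [0.1, 2], and non-Gaussian noise
    N_j = K_j * sign(M_j) * |M_j|**alpha_j with M_j ~ N(0, 1),
    K_j ~ U([0.1, 0.5]), alpha_j ~ U([2, 4])."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(p)            # random causal ordering
    B = np.zeros((p, p))                  # B[j, k]: coefficient of X_k in X_j
    for i in range(p):
        for l in range(i):                # only edges respecting the ordering
            if rng.random() < 2.0 / (p - 1):
                j, k = order[i], order[l]
                B[j, k] = rng.choice([-1.0, 1.0]) * rng.uniform(0.1, 2.0)
    X = np.zeros((n, p))
    for i in range(p):                    # fill variables in causal order
        j = order[i]
        M = rng.standard_normal(n)
        K = rng.uniform(0.1, 0.5)
        alpha = rng.uniform(2.0, 4.0)
        noise = K * np.sign(M) * np.abs(M) ** alpha
        X[:, j] = X @ B[j, :] + noise     # parents are already generated
    return X, B
```

Drawing K_j and α_j once per variable (rather than per sample) follows the most natural reading of "Kj iid U([0.1, 0.5])"; the sign-and-power transform makes the noise non-Gaussian, which is what renders the linear model identifiable.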
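The Pseudocode row refers to Algorithm 1, RESIT: repeatedly identify a sink variable by regressing each candidate on the remaining variables and testing whether the residual is independent of the predictors, then remove the sink. The following is a heavily simplified sketch of that loop, not the paper's algorithm: it substitutes linear least squares for the paper's nonparametric regression and a crude correlation-of-squares proxy for the HSIC independence test, and it omits the pruning phase. The function name `resit_order` is ours.

```python
import numpy as np

def resit_order(X):
    """Simplified RESIT-style ordering. In each round, for every remaining
    candidate j, regress X_j on the other remaining variables (linear
    least squares here; the paper uses nonparametric regression) and score
    dependence between the residual and each predictor via the absolute
    correlation of their squares (a crude stand-in for the paper's HSIC
    test). The least-dependent candidate is declared a sink and removed.
    Returns a causal order with sources first."""
    n, p = X.shape
    remaining = list(range(p))
    sinks = []
    while len(remaining) > 1:
        best_j, best_dep = None, np.inf
        for j in remaining:
            others = [k for k in remaining if k != j]
            A = np.column_stack([X[:, others], np.ones(n)])
            coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            resid = X[:, j] - A @ coef
            # dependence proxy: residual variance tracking predictor magnitude
            dep = max(abs(np.corrcoef(resid ** 2, X[:, k] ** 2)[0, 1])
                      for k in others)
            if dep < best_dep:
                best_dep, best_j = dep, j
        sinks.append(best_j)
        remaining.remove(best_j)
    sinks.append(remaining[0])
    return sinks[::-1]  # reverse: sources first
```

Note that with linear regression the residual is uncorrelated with the predictors by construction, which is why the proxy looks at squared values; any real use should swap in a proper independence test such as HSIC, as the paper does.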