On the Identifiability and Estimation of Causal Location-Scale Noise Models
Authors: Alexander Immer, Christoph Schultheiss, Julia E Vogt, Bernhard Schölkopf, Peter Bühlmann, Alexander Marx
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare LOCI to state-of-the-art bivariate causal inference methods and study the benefit of modelling location-scale noise as well as a post-hoc independence test. The performance of bivariate causal inference methods is assessed in terms of accuracy and area under the decision rate curve (AUDRC). (A hedged AUDRC sketch appears below the table.) |
| Researcher Affiliation | Academia | (1) Department of Computer Science, ETH Zurich, Switzerland; (2) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (3) Seminar for Statistics, ETH Zurich, Switzerland; (4) AI Center, ETH Zurich, Switzerland. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks are present. Algorithms are described in text, e.g., "In Supplementary Material A, we describe an algorithm to estimate the model parameters." |
| Open Source Code | Yes | For reproducibility, we make our code publicly available and provide all proofs in the Supplementary Material. https://github.com/AlexImmer/loci |
| Open Datasets | Yes | Datasets. To assess the performance of our method on datasets where the model assumptions are only slightly violated, we consider the five synthetic datasets proposed by Tagasovska et al. (2020), which consist of additive (AN, ANs), location-scale (LS, LSs), and multiplicative (MNU) noise models. The datasets with suffix "s" use an invertible sigmoidal function, making identification more difficult. Further, we assess the performance of our approach on a wide variety of common benchmarks where our assumptions are most likely violated. We consider the Net dataset constructed from random neural networks, the Multi datasets using polynomial mechanisms and various noise settings (including post-nonlinear noise), and the Cha dataset used for the cause-effect pair challenge (Guyon et al., 2019). Further, we consider the SIM and Tübingen datasets of the benchmark by Mooij et al. (2016). |
| Dataset Splits | No | Even though our estimator is suitable for Gaussian LSNMs if we perform sample splitting, we observe substantially improved performance when using the full dataset for training and independence testing. All considered datasets were standardized to zero mean and unit variance for our methods, and the recommended pre-processing was used for the baselines. The paper does not provide specific dataset splits (percentages or counts) for training, validation, or testing. While it mentions using "standard cause-effect benchmark datasets," it does not explicitly detail the splits used for its own experiments. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or memory) are provided for the experimental setup. |
| Software Dependencies | No | For our estimators LOCI_M and LOCI_H, we use spline feature maps (Eilers & Marx, 1996) of order 5 with 25 knots, implemented in scikit-learn (Pedregosa et al., 2011). The parameters are optimized by full-batch gradient descent on the log-likelihood using Adam (Kingma & Ba, 2014) with an initial learning rate of 10^-2 that is decayed to 10^-6 using a cosine learning rate schedule over 5,000 steps. The paper mentions `scikit-learn` and `Adam` but does not provide version numbers for these software dependencies, nor does it list other key components such as the Python or deep learning framework version. (A hedged configuration sketch appears below the table.) |
| Experiment Setup | Yes | For the neural network based estimators, NN-LOCI_M and NN-LOCI_H, we use a neural network with a single hidden layer of width 100 and tanh activation function. The first output of the network is unrestricted and the second output applies an exponential function to ensure a positive output, as described in Eq. 3 and App. A.2. The parameters are optimized by full-batch gradient descent on the log-likelihood using Adam (Kingma & Ba, 2014) with an initial learning rate of 10^-2 that is decayed to 10^-6 using a cosine learning rate schedule over 5,000 steps. (A hedged model-and-training sketch appears below the table.) |
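
Below is a minimal sketch of the AUDRC metric mentioned in the Research Type row, assuming the common definition in which, at each decision rate, only the most confident fraction of cause-effect decisions is evaluated and the resulting accuracies are averaged; the function name and the discrete averaging are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def audrc(correct, confidence):
    """Discrete area under the decision-rate curve: the mean accuracy obtained
    when only the k most confident cause-effect decisions are kept, k = 1..n."""
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(-np.asarray(confidence, dtype=float))  # most confident first
    correct = correct[order]
    n = correct.size
    acc_at_rate = np.cumsum(correct) / np.arange(1, n + 1)  # accuracy at rate k/n
    return float(acc_at_rate.mean())

# Toy usage: four cause-effect pairs, three decided correctly.
print(audrc(correct=[True, True, False, True], confidence=[0.9, 0.7, 0.6, 0.2]))
```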
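
The spline feature maps and optimizer settings quoted in the Software Dependencies row might be configured roughly as follows; the use of scikit-learn's `SplineTransformer` (reading "order 5" as `degree=5`), the PyTorch optimizer/scheduler pairing, and all variable names are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np
import torch
from sklearn.preprocessing import SplineTransformer

# Spline feature map with 25 knots; degree=5 is one reading of "order 5"
# (the order-vs-degree convention here is an assumption, not from the paper).
x = np.random.randn(500, 1)
phi = torch.tensor(
    SplineTransformer(n_knots=25, degree=5).fit_transform(x), dtype=torch.float32)

# Linear parameters on top of the spline features: one column for the mean,
# one for the log-scale of the noise.
theta = torch.zeros(phi.shape[1], 2, requires_grad=True)

# Adam with an initial learning rate of 1e-2, decayed to 1e-6 by a cosine
# schedule over 5,000 full-batch steps; the training loop itself would follow
# the pattern in the next sketch.
optimizer = torch.optim.Adam([theta], lr=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=5_000, eta_min=1e-6)
```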
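
Similarly, the neural-network estimator described in the Experiment Setup row (single hidden layer of width 100, tanh activation, unrestricted mean head and exponentiated scale head, full-batch Adam with cosine decay) could be sketched as below; the class and function names are hypothetical, and the Gaussian negative log-likelihood is a minimal stand-in for the paper's likelihood objective.

```python
import torch
import torch.nn as nn

class LSNMNet(nn.Module):
    """Single hidden layer of width 100 with tanh; two outputs: an unrestricted
    mean and an exp-transformed (hence positive) scale."""
    def __init__(self, hidden=100):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        return out[:, 0], torch.exp(out[:, 1])  # exp keeps the scale positive

def fit(x, y, steps=5_000):
    """Full-batch maximum-likelihood training with Adam and a cosine schedule
    decaying the learning rate from 1e-2 to 1e-6."""
    model = LSNMNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps, eta_min=1e-6)
    for _ in range(steps):
        opt.zero_grad()
        mean, scale = model(x)
        nll = (0.5 * ((y - mean) / scale) ** 2 + torch.log(scale)).mean()
        nll.backward()
        opt.step()
        sched.step()
    return model

# Toy usage on a synthetic location-scale pair.
x = torch.randn(500, 1)
y = x.squeeze() ** 2 + 0.5 * torch.abs(x.squeeze()) * torch.randn(500)
model = fit(x, y)
```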