Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Inferring Cause and Effect in the Presence of Heteroscedastic Noise
Authors: Sascha Xu, Osman A Mian, Alexander Marx, Jilles Vreeken
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a thorough empirical evaluation on synthetic and real-world data, comparing to a wide range of methods for bi-variate causal inference. These results show that our method, HECI, performs as well as the strongest competitor whenever noise is homoscedastic, and is the strongest whenever noise does depend on the cause. In this section, we empirically evaluate HECI on both synthetic data and the real-world Tübingen cause and effect pairs (Mooij et al., 2016) benchmark. |
| Researcher Affiliation | Academia | 1CISPA Helmholtz Center for Information Security, Saarbrücken, Germany 2ETH Zürich & ETH AI Center, Zürich, Switzerland. |
| Pseudocode | Yes | Algorithm 1: HECI(X, Y, ) |
| Open Source Code | Yes | HECI is implemented in Python and we provide the source code as well as the synthetic data for research purposes: https://eda.mmci.uni-saarland.de/heci/ |
| Open Datasets | Yes | We test HECI on two different settings. First, we generate synthetic data according to our assumed causal model in Eq. (1). Next, we use the synthetic data of Gaussian processes provided by Tagasovska et al. (2020) over different noise settings. Last, we benchmark on the real-world Tübingen Cause Effect pairs dataset. |
| Dataset Splits | No | The paper describes how synthetic data is generated and mentions using the Tübingen dataset, but it does not specify explicit train/validation/test dataset splits or cross-validation setups for model training and evaluation. |
| Hardware Specification | Yes | All experiments were executed on a 4-core Intel i7 machine with 16 GB RAM, running Windows 10. |
| Software Dependencies | No | The paper states |
| Experiment Setup | Yes | We initiate the binning algorithm with b equal-width bins that partition the domain of X. A local function is fitted inside a single bin or over multiple, neighboring bins. In our experiments, we set = 0.05, with which the best performance was achieved. We therefore require a min support of 10 unique data points per bin. To choose the polynomial degree, we use BIC and minimize |
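
The experiment-setup quote above outlines the core fitting procedure: partition the domain of X into equal-width bins, fit a local polynomial inside a bin (or over neighboring bins) subject to a minimum support of 10 unique points, and choose the polynomial degree by minimizing BIC. The sketch below illustrates that binning and BIC-based degree-selection step only; it is not the authors' implementation, and the helper names, the candidate degree range, and the Gaussian-noise BIC formula are assumptions for illustration.

```python
import numpy as np

def fit_poly_bic(x, y, max_degree=3):
    """Fit polynomials of increasing degree to one bin and return the
    degree minimizing BIC (assuming Gaussian residuals).
    Hypothetical helper, not taken from the HECI source code."""
    n = len(x)
    best = None
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, deg=d)
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        # Gaussian log-likelihood up to constants: n * log(RSS / n);
        # BIC adds the complexity penalty k * log(n) with k = d + 1 parameters.
        bic = n * np.log(rss / n + 1e-12) + (d + 1) * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, d, coeffs)
    return best  # (bic, degree, coefficients)

def equal_width_bins(x, b):
    """Return index arrays for b equal-width bins over the domain of x."""
    edges = np.linspace(x.min(), x.max(), b + 1)
    ids = np.clip(np.digitize(x, edges[1:-1]), 0, b - 1)
    return [np.where(ids == i)[0] for i in range(b)]

# Usage sketch: fit one local polynomial per bin, skipping bins that fall
# below the minimum support of 10 unique data points mentioned in the paper.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y = 0.5 * x ** 2 + rng.normal(scale=0.3 + 0.2 * np.abs(x), size=500)  # heteroscedastic noise
for idx in equal_width_bins(x, b=10):
    if len(np.unique(x[idx])) < 10:
        continue
    bic, degree, _ = fit_poly_bic(x[idx], y[idx])
    print(f"bin of size {len(idx)}: degree {degree}, BIC {bic:.1f}")
```

The merging of neighboring bins and the final causal-direction decision are not shown here; for those details, refer to Algorithm 1 in the paper and the released source code.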