Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing

Authors: Xiangyu Sun, Oliver Schulte

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.
Researcher Affiliation | Academia | Xiangyu Sun, Simon Fraser University, EMAIL; Oliver Schulte, Simon Fraser University, EMAIL
Pseudocode | Yes | Algorithm 1 CAREFL-H ... Algorithm 2 CAREFL-M ... Algorithm 3 CAREFL-H (Between Residuals)
Open Source Code | Yes | The code and scripts to reproduce all the results are given online: https://github.com/xiangyu-sun-789/CAREFL-H
Open Datasets | Yes | Experiments with 580 synthetic and 99 real-world datasets are given in Section 7. The code and scripts to reproduce all the results are given online. ... We compare CAREFL-M and CAREFL-H against the SIM benchmark suite [14]. ... The Tübingen Cause-Effect Pairs benchmark [14] is commonly used to evaluate cause-effect inference algorithms [11, 30, 9].
Dataset Splits | Yes | We use both splitting methods: (i) CAREFL(0.8): 80% as training and 20% as testing. (ii) CAREFL(1.0): training = testing = 100%.
Hardware Specification | Yes | The running time is measured on a computer running Ubuntu 20.04.5 LTS with an Intel Core i7-6850K 3.60GHz CPU and 32 GB of memory. No GPUs are used.
Software Dependencies | No | The paper mentions the 'Adam optimizer [12]' but does not provide specific version numbers for programming languages (e.g., Python), frameworks (e.g., PyTorch, TensorFlow), or other libraries used for implementation.
Experiment Setup | Yes | The flow estimator T is parameterized with 4 sub-flows (alternatively: 1, 7 and 10). For each sub-flow, f, g, h and k are modelled as four-layer MLPs with 5 hidden neurons in each layer (alternatively: 2, 10 and 20). The prior distribution is Laplace (alternatively: a Gaussian prior). The Adam optimizer [12] is used to train each model for 750 epochs (alternatively: 500, 1000 and 2000). The L2-penalty strength is 0 by default (alternatively: 0.0001, 0.001, 0.1).
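The robustness claim quoted under Research Type rests on comparing residual independence across the two candidate causal directions. Below is a minimal toy sketch of that idea on synthetic data, not the paper's CAREFL-H procedure: it substitutes ordinary least squares for the flow model and a crude squared-correlation statistic for a proper independence test (both are illustrative assumptions).

```python
import numpy as np

def dependence_score(u, r):
    # Crude dependence proxy: correlation between squared input and squared
    # residual (a real pipeline would use an independence test such as HSIC).
    # Near 0 when u and r are independent.
    return abs(np.corrcoef(u ** 2, r ** 2)[0, 1])

def infer_direction(x, y):
    # Fit least squares in both directions and prefer the direction whose
    # residual looks independent of its input.
    slope_xy = np.cov(x, y)[0, 1] / np.var(x)  # regression y ~ x
    slope_yx = np.cov(x, y)[0, 1] / np.var(y)  # regression x ~ y
    res_xy = y - slope_xy * x                  # residual if x -> y
    res_yx = x - slope_yx * y                  # residual if y -> x
    s_xy = dependence_score(x, res_xy)
    s_yx = dependence_score(y, res_yx)
    return ("x->y" if s_xy < s_yx else "y->x"), s_xy, s_yx

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 20000)
y = 2.0 * x + rng.uniform(-1.0, 1.0, 20000)  # linear model, non-Gaussian noise
direction, s_xy, s_yx = infer_direction(x, y)
print(direction)  # "x->y" for this sample
```

In the anti-causal fit the residual remains entangled with the target, so its dependence score is much larger than in the causal fit, and the true direction is recovered even though both fits have uncorrelated residuals by construction.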
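For concreteness, the default sub-network size in the Experiment Setup row (a four-layer MLP with 5 hidden neurons per layer, one such network for each of f, g, h and k in a sub-flow) can be sketched as follows. This is a NumPy stand-in under assumed details ("four-layer" read as four hidden layers, tanh activations, scalar input and output); the authors' actual implementation is in the linked repository.

```python
import numpy as np

def make_mlp(sizes, rng):
    # One (weights, bias) pair per layer; small random init.
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    # tanh on hidden layers, linear output layer.
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.tanh(h)
    return h

rng = np.random.default_rng(0)
# 5 hidden neurons in each of four hidden layers, scalar in/out, standing in
# for one of the f, g, h, k networks of a single sub-flow.
net = make_mlp([1, 5, 5, 5, 5, 1], rng)
out = mlp_forward(net, np.linspace(-1.0, 1.0, 8).reshape(-1, 1))
print(out.shape)  # (8, 1)
```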