Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing

Authors: Xiangyu Sun, Oliver Schulte

NeurIPS 2023

Reproducibility variables, with the extracted result and the supporting LLM response for each:
Research Type: Experimental
LLM Response: "However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance."
Researcher Affiliation: Academia
LLM Response: "Xiangyu Sun, Simon Fraser University, xiangyu_sun@sfu.ca; Oliver Schulte, Simon Fraser University, oschulte@cs.sfu.ca"
Pseudocode: Yes
LLM Response: "Algorithm 1 CAREFL-H ... Algorithm 2 CAREFL-M ... Algorithm 3 CAREFL-H (Between Residuals)" (a simplified sketch of the CAREFL-H selection rule is given below)
Open Source Code: Yes
LLM Response: "The code and scripts to reproduce all the results are given online." Repository: https://github.com/xiangyu-sun-789/CAREFL-H
Open Datasets: Yes
LLM Response: "Experiments with 580 synthetic and 99 real-world datasets are given in Section 7. The code and scripts to reproduce all the results are given online. ... We compare CAREFL-M and CAREFL-H against the SIM benchmark suite [14]. ... The Tübingen Cause-Effect Pairs benchmark [14] is commonly used to evaluate cause-effect inference algorithms [11, 30, 9]."
Dataset Splits: Yes
LLM Response: "We use both splitting methods: (i) CAREFL(0.8): 80% as training and 20% as testing. (ii) CAREFL(1.0): training = testing = 100%." (a sketch of the two splitting schemes is given below)
Hardware Specification: Yes
LLM Response: "The running time is measured on a computer running Ubuntu 20.04.5 LTS with Intel Core i7-6850K 3.60GHz CPU and 32 GB memory. No GPUs are used."
Software Dependencies: No
LLM Response: The paper mentions 'Adam optimizer [12]' but does not provide specific version numbers for programming languages (e.g., Python), frameworks (e.g., PyTorch, TensorFlow), or other libraries used for implementation.
Experiment Setup: Yes
LLM Response: "The flow estimator T is parameterized with 4 sub-flows (alternatively: 1, 7 and 10). For each sub-flow, f, g, h and k are modelled as four-layer MLPs with 5 hidden neurons in each layer (alternatively: 2, 10 and 20). Prior distribution is Laplace (alternatively: Gaussian prior). Adam optimizer [12] is used to train each model for 750 epochs (alternatively: 500, 1000 and 2000). L2-penalty strength is 0 by default (alternatively: 0.0001, 0.001, 0.1)." (these defaults are collected in a configuration sketch below)
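
The full procedures referenced under the Pseudocode entry are given in the paper and repository. As a rough illustration of the CAREFL-H selection rule (prefer the causal direction whose standardized residuals are independent of the putative cause, as opposed to CAREFL-M's likelihood comparison), here is a minimal NumPy sketch. The polynomial location-scale fit and the median-heuristic HSIC statistic are simplified stand-ins for the affine normalizing flow and the independence test used by the authors; this is not their implementation.

```python
import numpy as np

def rbf_gram(v):
    """RBF-kernel Gram matrix with a median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
    return np.exp(-d2 / bandwidth)

def hsic(a, b):
    """Biased HSIC statistic between two 1-D samples; larger means more dependent."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / (n - 1) ** 2

def lsnm_residuals(cause, effect, deg=3):
    """Crude location-scale fit effect = f(cause) + g(cause) * noise.
    Polynomial fits stand in for the affine flow used in the paper."""
    f = np.poly1d(np.polyfit(cause, effect, deg))
    raw = effect - f(cause)
    log_scale = np.poly1d(np.polyfit(cause, np.log(np.abs(raw) + 1e-8), deg))
    return raw / np.exp(log_scale(cause))

def infer_direction(x, y):
    """CAREFL-H-style rule: prefer the direction whose standardized residuals
    are more independent of the putative cause (CAREFL-M would instead
    compare the log-likelihoods of the two fitted models)."""
    dep_xy = hsic(x, lsnm_residuals(x, y))  # dependence score for X -> Y
    dep_yx = hsic(y, lsnm_residuals(y, x))  # dependence score for Y -> X
    return "X -> Y" if dep_xy < dep_yx else "Y -> X"

# Toy location-scale data generated in the X -> Y direction with Laplace noise.
rng = np.random.default_rng(0)
x = rng.laplace(size=500)
y = np.tanh(x) + 0.5 * (1.0 + 0.5 * np.abs(x)) * rng.laplace(size=500)
print(infer_direction(x, y))  # expected to print "X -> Y" on most draws
```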
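
For the Dataset Splits entry, the two schemes could be realized as follows, assuming a NumPy array of paired observations; the shuffling and seeding details are assumptions, not taken from the repository.

```python
import numpy as np

def carefl_split(data, train_frac=0.8, seed=0):
    """CAREFL(0.8): disjoint 80%/20% train/test split.
    CAREFL(1.0): training set = test set = the full dataset."""
    if train_frac >= 1.0:
        return data, data
    idx = np.random.default_rng(seed).permutation(len(data))
    cut = int(train_frac * len(data))
    return data[idx[:cut]], data[idx[cut:]]

# Usage with a toy array of paired observations (shape: n_samples x 2).
pairs = np.random.default_rng(1).normal(size=(100, 2))
train_08, test_08 = carefl_split(pairs, train_frac=0.8)  # 80 / 20 rows
train_10, test_10 = carefl_split(pairs, train_frac=1.0)  # 100 / 100 rows
```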
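
The defaults and sweep values quoted under Experiment Setup can be collected into a single configuration object for reference. The key names below are illustrative only; they are not the repository's actual configuration or command-line argument names.

```python
# Reported defaults and the alternative values swept in the sensitivity analysis.
DEFAULT_CONFIG = {
    "n_sub_flows": 4,        # alternatives: 1, 7, 10
    "mlp_layers": 4,         # f, g, h, k are modelled as four-layer MLPs
    "hidden_units": 5,       # per layer; alternatives: 2, 10, 20
    "prior": "laplace",      # alternative: "gaussian"
    "optimizer": "adam",
    "epochs": 750,           # alternatives: 500, 1000, 2000
    "l2_penalty": 0.0,       # alternatives: 0.0001, 0.001, 0.1
}
```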