Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing

Authors: Xiangyu Sun, Oliver Schulte

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.
Researcher Affiliation | Academia | Xiangyu Sun, Simon Fraser University, EMAIL; Oliver Schulte, Simon Fraser University, EMAIL
Pseudocode | Yes | Algorithm 1 CAREFL-H ... Algorithm 2 CAREFL-M ... Algorithm 3 CAREFL-H (Between Residuals)
Open Source Code | Yes | The code and scripts to reproduce all the results are given online: https://github.com/xiangyu-sun-789/CAREFL-H
Open Datasets | Yes | Experiments with 580 synthetic and 99 real-world datasets are given in Section 7. The code and scripts to reproduce all the results are given online. ... We compare CAREFL-M and CAREFL-H against the SIM benchmark suite [14]. ... The Tübingen Cause-Effect Pairs benchmark [14] is commonly used to evaluate cause-effect inference algorithms [11, 30, 9].
Dataset Splits | Yes | We use both splitting methods: (i) CAREFL(0.8): 80% as training and 20% as testing. (ii) CAREFL(1.0): training = testing = 100%.
Hardware Specification | Yes | The running time is measured on a computer running Ubuntu 20.04.5 LTS with an Intel Core i7-6850K 3.60GHz CPU and 32 GB of memory. No GPUs are used.
Software Dependencies | No | The paper mentions the 'Adam optimizer [12]' but does not provide specific version numbers for programming languages (e.g., Python), frameworks (e.g., PyTorch, TensorFlow), or other libraries used for implementation.
Experiment Setup | Yes | The flow estimator T is parameterized with 4 sub-flows (alternatively: 1, 7 and 10). For each sub-flow, f, g, h and k are modelled as four-layer MLPs with 5 hidden neurons in each layer (alternatively: 2, 10 and 20). The prior distribution is Laplace (alternatively: a Gaussian prior). The Adam optimizer [12] is used to train each model for 750 epochs (alternatively: 500, 1000 and 2000). The L2-penalty strength is 0 by default (alternatively: 0.0001, 0.001, 0.1).
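The robustness claim quoted under Research Type rests on comparing residual independence across the two candidate causal directions. Below is a minimal toy sketch of that idea on synthetic data, not the paper's CAREFL-H procedure: it substitutes ordinary least squares for the flow model and a crude squared-correlation statistic for a proper independence test (both are illustrative assumptions).

```python
import numpy as np

def dependence_score(u, r):
    # Crude dependence proxy: correlation between squared input and squared
    # residual (a real pipeline would use an independence test such as HSIC).
    # Near 0 when u and r are independent.
    return abs(np.corrcoef(u ** 2, r ** 2)[0, 1])

def infer_direction(x, y):
    # Fit least squares in both directions and prefer the direction whose
    # residual looks independent of its input.
    slope_xy = np.cov(x, y)[0, 1] / np.var(x)  # regression y ~ x
    slope_yx = np.cov(x, y)[0, 1] / np.var(y)  # regression x ~ y
    res_xy = y - slope_xy * x                  # residual if x -> y
    res_yx = x - slope_yx * y                  # residual if y -> x
    s_xy = dependence_score(x, res_xy)
    s_yx = dependence_score(y, res_yx)
    return ("x->y" if s_xy < s_yx else "y->x"), s_xy, s_yx

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 20000)
y = 2.0 * x + rng.uniform(-1.0, 1.0, 20000)  # linear model, non-Gaussian noise
direction, s_xy, s_yx = infer_direction(x, y)
print(direction)  # "x->y" for this sample
```

In the anti-causal fit the residual remains entangled with the target, so its dependence score is much larger than in the causal fit, and the true direction is recovered even though both fits have uncorrelated residuals by construction.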
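For concreteness, the default sub-network size in the Experiment Setup row (a four-layer MLP with 5 hidden neurons per layer, one such network for each of f, g, h and k in a sub-flow) can be sketched as follows. This is a NumPy stand-in under assumed details ("four-layer" read as four hidden layers, tanh activations, scalar input and output); the authors' actual implementation is in the linked repository.

```python
import numpy as np

def make_mlp(sizes, rng):
    # One (weights, bias) pair per layer; small random init.
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    # tanh on hidden layers, linear output layer.
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.tanh(h)
    return h

rng = np.random.default_rng(0)
# 5 hidden neurons in each of four hidden layers, scalar in/out, standing in
# for one of the f, g, h, k networks of a single sub-flow.
net = make_mlp([1, 5, 5, 5, 5, 1], rng)
out = mlp_forward(net, np.linspace(-1.0, 1.0, 8).reshape(-1, 1))
print(out.shape)  # (8, 1)
```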