Bivariate Causal Discovery using Bayesian Model Selection
Authors: Anish Dhir, Samuel Power, Mark van der Wilk
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the usefulness of our approach, we use a Gaussian process latent variable model (GPLVM) (Titsias and Lawrence, 2010), which has the ability to model a wide range of densities. We test this on a range of benchmark datasets with various data generating assumptions. We also compare against previously proposed methods, both those which rely on strict restrictions, and those which are more flexible, but lack formal identifiability guarantees. and, from Section 6 (Experiments): Having laid out our method, we now test it on a mixture of real and synthetic datasets. |
| Researcher Affiliation | Academia | 1Imperial College London 2University of Bristol 3University of Oxford. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide an explicit statement of code release for the authors' own methodology (GPLVM) or a direct link to a source code repository. |
| Open Datasets | Yes | CE-Cha: A mixture of synthetic and real-world data. Taken from the cause-effect pairs challenge (Guyon et al., 2019). and CE-Multi (Goudet et al., 2018): Synthetic data with effects generated with varying noise relationships. and CE-Gauss (Mooij et al., 2016): Synthetic data generated with random noise distributions E1, E2 defined in (Mooij et al., 2016). and CE-Tueb (Mooij et al., 2016): Contains 105 real cause-effect pairs taken from the UCI dataset. |
| Dataset Splits | No | The paper does not explicitly provide information about validation dataset splits (e.g., percentages, sample counts, or predefined splits for validation). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or type of computing cluster) used to run its experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam' and 'BFGS' (optimizers) and refers to 'the authors' code' for SLOPPY, but it does not provide specific version numbers for any key software dependencies (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | For GPLVM-closed form, we use the sum of an RBF and a linear kernel... The model was first trained using Adam with a learning rate of 0.1. After 20,000 epochs, the model was trained using BFGS... For GPLVM-stochastic... The model was trained with Adam with a learning rate of 0.05. The model stopped training if the value of the ELBO plateaued, else it ran for a maximum of 100,000 epochs. In our experiments, we only use GPLVM-stochastic for CE-Tueb as it had a few datasets that had a large number of samples... We use 200 inducing points for all experiments... As GPLVMs are known to suffer from local optima issues, we use 20 random restarts of hyperparameter initialisations, and choose the highest estimate of the approximate marginal likelihood as the final score. For the various hyperparameters, the sampling procedures were: 1. The kernel variances were always set to 1. 2. The likelihood variances were sampled by first sampling κ ∼ Uniform(10, 100), and then setting σ²_likelihood = 1/κ². 3. The kernel lengthscales were sampled by first sampling ψ ∼ Uniform(1, 100), then setting λ_lengthscale = 1/ψ. |
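The hyperparameter-sampling and random-restart scheme quoted in the Experiment Setup row can be sketched as follows. This is a minimal, hypothetical NumPy re-implementation, not the authors' code: `score_fn` is a stand-in for the approximate marginal-likelihood estimate of a fitted GPLVM, and the function names are our own.

```python
import numpy as np

def sample_hyperparameters(rng):
    """Draw one GPLVM hyperparameter initialisation following the
    quoted scheme (hypothetical re-implementation, not the paper's code)."""
    kernel_variance = 1.0                 # kernel variances always set to 1
    kappa = rng.uniform(10.0, 100.0)      # kappa ~ Uniform(10, 100)
    likelihood_variance = 1.0 / kappa**2  # sigma^2_likelihood = 1 / kappa^2
    psi = rng.uniform(1.0, 100.0)         # psi ~ Uniform(1, 100)
    lengthscale = 1.0 / psi               # lambda_lengthscale = 1 / psi
    return kernel_variance, likelihood_variance, lengthscale

def best_of_restarts(score_fn, n_restarts=20, seed=0):
    """Run n_restarts random initialisations and keep the highest
    (approximate) marginal-likelihood score, as the paper describes."""
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(n_restarts):
        params = sample_hyperparameters(rng)
        best = max(best, score_fn(params))
    return best
```

In practice `score_fn` would train the GPLVM (Adam, then BFGS for the closed-form variant) from the sampled initialisation and return the resulting marginal-likelihood estimate; here it is left abstract.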