Optimal Transport for Causal Discovery
Authors: Ruibo Tu, Kun Zhang, Hedvig Kjellstrom, Cheng Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method demonstrated state-of-the-art results on both synthetic and causal discovery benchmark datasets. We demonstrate and evaluate our method on the synthetic and real-world cause-effect pair data (Mooij et al., 2016). |
| Researcher Affiliation | Collaboration | Ruibo Tu, KTH Royal Institute of Technology (ruibo@kth.se); Kun Zhang, Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence (kunz1@cmu.edu); Hedvig Kjellström, KTH Royal Institute of Technology & Silo AI (hedvig@kth.se); Cheng Zhang, Microsoft Research (Cheng.Zhang@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: DIVOT: divergence measure with optimal transport for causal direction determination. |
| Open Source Code | No | The paper does not explicitly state that the code for their method is open-source or provide a direct link to a repository. |
| Open Datasets | Yes | We demonstrate and evaluate our method on the synthetic and real-world cause-effect pair data (Mooij et al., 2016). We apply DIVOT to the Tübingen cause-effect pair dataset (Mooij et al., 2016). |
| Dataset Splits | No | The paper discusses sample sizes (e.g., '10, 25, 50, 100, 200, and 500') and compares results on benchmark datasets, but does not specify explicit training, validation, and testing splits for reproducibility, nor does it refer to predefined splits from external sources for its own experiments. |
| Hardware Specification | Yes | The experiments are based on Mac Book Pro (15-inch, 2018) with 2.9 GHz 6-Core Intel Core i9. |
| Software Dependencies | No | Our implementation is based on JAX (Bradbury et al., 2018) which uses Apache License and the running time is measured with the command %timeit in JAX. While JAX is mentioned, a specific version number for the library used in the experiments is not explicitly stated. |
| Experiment Setup | Yes | Noise data generation: the first step of computing the divergence measure. To compute the divergence measure, we need to know the velocity field v as defined in the time evolution equation (5). It requires the couplings of the data of x_0 = [E_x, E_y] and x_T = [X, Y]. But in the bivariate causal discovery task, only the data of x_T are given. Therefore, as shown in Line 11 of Alg. 1, we first deal with the issue due to the lack of the noise data of x_0, denoted by {(e_x^i, e_y^i)}. To obtain the noise data, we may assume a multivariate probability distribution of x_0 with the density p_0(E_x, E_y) and then sample data from it, represented by (e_x^i, e_y^i) ~ p_0(E_x, E_y). Fortunately, due to the FCM constraints, we know that p_0(E_x, E_y) = p_0(X, E_y) = p(X)p(E_y). So we only need to assume the probability distribution of E_y and parameterize it with θ, denoted by p(E_y; θ). Suppose that the dataset of x_T with N samples is given, denoted by {(x^i, y^i)}_N. We first sample a data set of E_y with the sample size N, denoted by {e_y^i}_N; e.g., in the experiments of this work, we use the simplified reparameterization trick e_y^i = f_θ^noise(e_y^source) = θ · e_y^source, with e_y^source ~ N(0, 1) or U(0, 1) (Eq. 9). As for the synthetic data experiments, we used gradient descent for finding the optimal θ. The gradient jax.grad(loss) is computed with the autograd in JAX (Bradbury et al., 2018). We update θ by specifying a step size sz and setting θ := θ − sz · jax.grad(loss). If after the update θ < 0 (which has never happened), we set the value of θ to a positive number close to zero. We used sz = 1 for all the synthetic data experiments. For the synthetic data experiments in Sec. 6, we use the batch size (as a fraction of the dataset) 0.4 for the datasets with sample size 10; 0.2 for the datasets with sample sizes 25 and 50; 0.15 for the datasets with sample sizes 100 and 200; and 0.05 for the datasets with sample size 500. |
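
The noise-generation and update steps quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper computes `jax.grad` of an OT-based divergence, whereas here a hypothetical quadratic loss stands in so the gradient is analytic, and NumPy replaces JAX. The reparameterization `e_y = θ · e_source` and the update rule `θ := θ − sz · grad(loss)` with the positivity safeguard follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_noise(theta, n, source="gaussian", rng=rng):
    """Reparameterization trick (Eq. 9): e_y = theta * e_source,
    with e_source drawn from N(0, 1) or U(0, 1)."""
    e_source = rng.standard_normal(n) if source == "gaussian" else rng.uniform(size=n)
    return theta * e_source

# Hypothetical stand-in for the paper's OT divergence loss; its gradient is
# analytic here, whereas the paper uses jax.grad(loss).
def grad_loss(theta):
    return theta - 2.0  # gradient of the toy loss 0.5 * (theta - 2.0) ** 2

theta, sz = 1.0, 1.0    # the paper reports step size sz = 1 for synthetic data
for _ in range(50):
    theta = theta - sz * grad_loss(theta)
    if theta < 0:        # safeguard from the paper (reportedly never triggered)
        theta = 1e-6

print(theta)             # converges to the toy optimum theta = 2.0
e_y = sample_noise(theta, 100)
```

With the optimized θ, `sample_noise` produces the {e_y^i} sample that Alg. 1 pairs with the observed data of x_T.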