Optimal Transport for Causal Discovery
Authors: Ruibo Tu, Kun Zhang, Hedvig Kjellstrom, Cheng Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method demonstrated state-of-the-art results on both synthetic and causal discovery benchmark datasets. We demonstrate and evaluate our method on the synthetic and real-world cause-effect pair data (Mooij et al., 2016). |
| Researcher Affiliation | Collaboration | Ruibo Tu, KTH Royal Institute of Technology (ruibo@kth.se); Kun Zhang, Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence (kunz1@cmu.edu); Hedvig Kjellström, KTH Royal Institute of Technology & Silo AI (hedvig@kth.se); Cheng Zhang, Microsoft Research (Cheng.Zhang@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: DIVOT: divergence measure with optimal transport for causal direction determination. |
| Open Source Code | No | The paper does not explicitly state that the code for their method is open-source or provide a direct link to a repository. |
| Open Datasets | Yes | We demonstrate and evaluate our method on the synthetic and real-world cause-effect pair data (Mooij et al., 2016). We apply DIVOT to the Tübingen cause-effect pair dataset (Mooij et al., 2016). |
| Dataset Splits | No | The paper discusses sample sizes (e.g., '10, 25, 50, 100, 200, and 500') and compares results on benchmark datasets, but does not specify explicit training, validation, and testing splits for reproducibility, nor does it refer to predefined splits from external sources for its own experiments. |
| Hardware Specification | Yes | The experiments are based on Mac Book Pro (15-inch, 2018) with 2.9 GHz 6-Core Intel Core i9. |
| Software Dependencies | No | Our implementation is based on JAX (Bradbury et al., 2018) which uses Apache License and the running time is measured with the command %timeit in JAX. While JAX is mentioned, a specific version number for the library used in the experiments is not explicitly stated. |
| Experiment Setup | Yes | Noise data generation: the first step of computing the divergence measure. To compute the divergence measure, we need to know the velocity field v as defined in the time evolution equation (5). It requires the couplings of the data of x_0 = [E_x, E_y] and x_T = [X, Y]. But in the bivariate causal discovery task, only the data of x_T are given. Therefore, as shown in Line 11 of Alg. 1, we first deal with the issue due to the lack of the noise data of x_0, denoted by {(e_x^i, e_y^i)}. To obtain the noise data, we may assume a multivariate probability distribution of x_0 with the density p_0(E_x, E_y) and then sample data from it, represented by (e_x^i, e_y^i) ~ p_0(E_x, E_y). Fortunately, due to the FCM constraints, we know that p_0(E_x, E_y) = p_0(X, E_y) = p(X)p(E_y). So we only need to assume the probability distribution of E_y and parameterize it with θ, denoted by p(E_y; θ). Suppose that the dataset of x_T with N samples is given, denoted by {(x^i, y^i)}_N. We first sample a data set of E_y with the sample size N, denoted by {e_y^i}_N; e.g., in the experiments of this work, we use the simplified reparameterization trick e_y^i = f_θ^noise(e_y^source) = θ · e_y^source, with e_y^source ~ N(0, 1) or U(0, 1) (Eq. 9). As for the synthetic data experiments, we used gradient descent for finding the optimal θ. The gradient jax.grad(loss) is computed with the autograd in JAX (Bradbury et al., 2018). We update θ by specifying a step size sz and setting θ := θ − sz · jax.grad(loss). If after the update θ < 0 (which has never happened), we set the value of θ to a positive number close to zero. We used sz = 1 for all the synthetic data experiments. For the synthetic data experiments in Sec. 6, we use the batch size (as a fraction of the dataset) 0.4 for the datasets with sample size 10; 0.2 for the datasets with sample sizes 25 and 50; 0.15 for the datasets with sample sizes 100 and 200; and 0.05 for the datasets with sample size 500. |
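
The noise-generation and update steps quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper computes `jax.grad` of an OT-based divergence, whereas here a hypothetical quadratic loss stands in so the gradient is analytic, and NumPy replaces JAX. The reparameterization `e_y = θ · e_source` and the update rule `θ := θ − sz · grad(loss)` with the positivity safeguard follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_noise(theta, n, source="gaussian", rng=rng):
    """Reparameterization trick (Eq. 9): e_y = theta * e_source,
    with e_source drawn from N(0, 1) or U(0, 1)."""
    e_source = rng.standard_normal(n) if source == "gaussian" else rng.uniform(size=n)
    return theta * e_source

# Hypothetical stand-in for the paper's OT divergence loss; its gradient is
# analytic here, whereas the paper uses jax.grad(loss).
def grad_loss(theta):
    return theta - 2.0  # gradient of the toy loss 0.5 * (theta - 2.0) ** 2

theta, sz = 1.0, 1.0    # the paper reports step size sz = 1 for synthetic data
for _ in range(50):
    theta = theta - sz * grad_loss(theta)
    if theta < 0:        # safeguard from the paper (reportedly never triggered)
        theta = 1e-6

print(theta)             # converges to the toy optimum theta = 2.0
e_y = sample_noise(theta, 100)
```

With the optimized θ, `sample_noise` produces the {e_y^i} sample that Alg. 1 pairs with the observed data of x_T.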