An Asymptotic Test for Conditional Independence using Analytic Kernel Embeddings
Authors: Meyer Scetbon, Laurent Meunier, Yaniv Romano
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments showing that our new test outperforms state-of-the-art methods in terms of both type-I and type-II errors, even in the high-dimensional setting. The goal of this section is threefold: (i) to investigate the effects of the parameters J and p on the performance of our method, (ii) to validate our theoretical results depicted in Propositions 3.3 and 3.7, and (iii) to compare our method with those proposed in the literature. In more detail, we first compare the performance of our method, in terms of both power and type-I error, by varying the hyperparameters J and p. |
| Researcher Affiliation | Collaboration | ¹CREST, ENSAE, France; ²Facebook AI Research, Paris, France; ³Université Paris-Dauphine, France; ⁴Departments of Electrical and Computer Engineering and of Computer Science, Technion, Israel. |
| Pseudocode | No | The paper describes its statistical procedure and approximations in detail, including mathematical formulations, but it does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | Yes | The code is available at https://github.com/meyerscetbon/lp-ci-test. |
| Open Datasets | No | To evaluate the type-I error, we generate data that follows the model: X = f1(ε_x), Y = f2(ε_y), and Z ∼ N(0_{d_z}, I_{d_z}); to compare the power of the tests, we also consider the model: X = f1(ε_x + 0.8 ε_b), Y = f2(ε_y + 0.8 ε_b). The paper uses synthetic data generated according to specified models rather than pre-existing, publicly accessible datasets with direct download links or DOIs. |
| Dataset Splits | No | The paper does not explicitly provide details about specific train/validation/test dataset splits. It describes data generation functions for synthetic data and mentions using a batch of observations for hyperparameter optimization without detailing a formal split for model evaluation. |
| Hardware Specification | No | Software packages of all the above tests are freely available online and each experiment was run on a single CPU. This mentions a general type of hardware (single CPU) but lacks specific details such as CPU model, memory, or GPU specifications. |
| Software Dependencies | No | Our code requires a slight modification of the Gaussian Process Regression implemented in scikit-learn (Pedregosa et al., 2011) to limit the number of iterations involved in the optimization procedure. The paper mentions scikit-learn but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | The first category includes both the choice of the locations ((t_x, t_z)_j, (t_y)_j)_{j=1}^J on which differences between the mean embeddings are computed and the choice of the kernels k_X and k_Y. ... we restrict ourselves to one-dimensional kernel bandwidths σ_X, σ_Y, and σ_Z for the kernels k_X, k_Y, and k_Z, respectively. More precisely, we select the median of {‖x_i − x_j‖_2}_{1≤i<j≤n}, {‖y_i − y_j‖_2}_{1≤i<j≤n}, and {‖z_i − z_j‖_2}_{1≤i<j≤n} for σ_X, σ_Y, and σ_Z, respectively. ... we run this method only on a batch of 200 observations selected at random, and we perform only 10 iterations for choosing the hyperparameters involved in the RLS problems. |
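The synthetic models quoted above (X = f1(ε_x), Y = f2(ε_y), Z ∼ N(0, I) under the null; a shared noise term 0.8·ε_b coupling X and Y under the alternative) can be sketched as follows. This is a minimal illustration, not the paper's exact generator: the noise distributions and the functions f1, f2 are not specified in the excerpt, so standard-normal noise and identity maps are assumptions here.

```python
import numpy as np

def generate_ci_data(n, dx=1, dy=1, dz=1, dependent=False, rng=None):
    """Sketch of the paper's synthetic conditional-independence models.

    Null (X independent of Y given Z):  X = f1(eps_x), Y = f2(eps_y), Z ~ N(0, I).
    Alternative (power experiments):    X = f1(eps_x + 0.8*eps_b),
                                        Y = f2(eps_y + 0.8*eps_b).
    Standard-normal noise and identity f1, f2 are assumptions of this sketch.
    """
    rng = np.random.default_rng(rng)
    f1 = f2 = lambda e: e  # placeholder: f1, f2 are unspecified in the excerpt
    eps_x = rng.standard_normal((n, dx))
    eps_y = rng.standard_normal((n, dy))
    if dependent:
        # shared confounder eps_b induces dependence between X and Y given Z
        eps_b = rng.standard_normal((n, 1))
        X, Y = f1(eps_x + 0.8 * eps_b), f2(eps_y + 0.8 * eps_b)
    else:
        X, Y = f1(eps_x), f2(eps_y)
    Z = rng.standard_normal((n, dz))  # Z ~ N(0_dz, I_dz)
    return X, Y, Z
```

Setting `dependent=True` reproduces the structure of the power experiments: X and Y share the confounding noise ε_b, so they are dependent even after conditioning on Z.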
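The bandwidth rule quoted in the setup row is the standard median heuristic: each σ is set to the median of the pairwise Euclidean distances within the corresponding sample. A minimal numpy sketch (the function name is ours, not from the paper's code):

```python
import numpy as np

def median_bandwidth(S):
    """Median heuristic: median of {||s_i - s_j||_2 : 1 <= i < j <= n},
    as used for sigma_X, sigma_Y, and sigma_Z in the quoted setup."""
    # pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(S ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (S @ S.T)
    iu = np.triu_indices(len(S), k=1)  # keep only i < j pairs
    return np.median(np.sqrt(np.maximum(d2[iu], 0.0)))
```

For example, the points 0, 1, 3 on the line have pairwise distances {1, 2, 3}, so the heuristic returns 2.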