An Asymptotic Test for Conditional Independence using Analytic Kernel Embeddings

Authors: Meyer Scetbon, Laurent Meunier, Yaniv Romano

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of experiments showing that our new test outperforms state-of-the-art methods in terms of both type-I and type-II errors, even in the high-dimensional setting. The goal of this section is threefold: (i) to investigate the effects of the parameters J and p on the performance of our method, (ii) to validate the theoretical results stated in Propositions 3.3 and 3.7, and (iii) to compare our method with those proposed in the literature. In more detail, we first study the performance of our method, in terms of both power and type-I error, as the hyperparameters J and p vary.
Researcher Affiliation | Collaboration | 1 CREST, ENSAE, France; 2 Facebook AI Research, Paris, France; 3 Université Paris-Dauphine, France; 4 Departments of Electrical and Computer Engineering and of Computer Science, Technion, Israel.
Pseudocode | No | The paper describes its statistical procedure and approximations in detail, including mathematical formulations, but it does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | The code is available at https://github.com/meyerscetbon/lp-ci-test.
Open Datasets | No | To evaluate the type-I error, we generate data that follows the model X = f1(ε_x), Y = f2(ε_y), with Z ~ N(0_{d_z}, I_{d_z}). To compare the power of the tests, we also consider the model X = f1(ε_x + 0.8 ε_b), Y = f2(ε_y + 0.8 ε_b). The paper uses synthetic data generated according to these specified models rather than pre-existing, publicly accessible datasets with direct download links or DOIs.
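As an illustration only (not code from the paper), the two quoted models could be simulated as below; the noise distributions and the maps f1, f2 are not specified in the quoted text, so standard normal noise and f1 = f2 = tanh are assumptions here.

```python
import numpy as np

def generate_synthetic(n, d_z, null=True, f1=np.tanh, f2=np.tanh, rng=None):
    """Sketch of the quoted synthetic models.

    Null model (conditional independence holds):
        X = f1(eps_x), Y = f2(eps_y), Z ~ N(0_{d_z}, I_{d_z})
    Alternative model (shared noise eps_b couples X and Y given Z):
        X = f1(eps_x + 0.8 * eps_b), Y = f2(eps_y + 0.8 * eps_b)
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal((n, d_z))        # Z ~ N(0, I_{d_z})
    eps_x = rng.standard_normal((n, 1))      # assumed standard normal noise
    eps_y = rng.standard_normal((n, 1))
    if null:
        X, Y = f1(eps_x), f2(eps_y)
    else:
        eps_b = rng.standard_normal((n, 1))  # common noise term
        X, Y = f1(eps_x + 0.8 * eps_b), f2(eps_y + 0.8 * eps_b)
    return X, Y, Z
```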
Dataset Splits | No | The paper does not explicitly provide specific train/validation/test dataset splits. It describes generation procedures for synthetic data and mentions using a batch of observations for hyperparameter optimization, without detailing a formal split for model evaluation.
Hardware Specification | No | Software packages of all the above tests are freely available online and each experiment was run on a single CPU. This mentions a general type of hardware (a single CPU) but lacks specifics such as the CPU model, memory, or GPU configuration.
Software Dependencies | No | Our code requires a slight modification of the Gaussian Process Regression implemented in scikit-learn (Pedregosa et al., 2011) to limit the number of iterations involved in the optimization procedure. The paper mentions scikit-learn but does not provide version numbers for it or for any other software dependency.
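For context, one way to cap the hyperparameter optimization of scikit-learn's Gaussian Process Regression is to pass a custom optimizer callable rather than patching the library; the sketch below illustrates that idea and is not necessarily the authors' exact modification (the kernel choice and the iteration cap are assumptions).

```python
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def capped_optimizer(obj_func, initial_theta, bounds, max_iter=10):
    # L-BFGS-B on the (negative) log-marginal likelihood, stopped after
    # at most max_iter iterations instead of running to convergence.
    res = minimize(obj_func, initial_theta, method="L-BFGS-B", jac=True,
                   bounds=bounds, options={"maxiter": max_iter})
    return res.x, res.fun

# Kernel choice here is a placeholder, not taken from the paper.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               optimizer=capped_optimizer)
```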
Experiment Setup | Yes | The first category includes both the choice of the locations {((t_x, t_z)_j, (t_y)_j)}_{j=1}^J on which differences between the mean embeddings are computed and the choice of the kernels k_X and k_Y. ... we restrict ourselves to one-dimensional kernel bandwidths σ_X, σ_Y, and σ_Z for the kernels k_X, k_Y, and k_Z, respectively. More precisely, we select the median of {||x_i - x_j||_2}_{1<=i<j<=n}, {||y_i - y_j||_2}_{1<=i<j<=n}, and {||z_i - z_j||_2}_{1<=i<j<=n} for σ_X, σ_Y, and σ_Z, respectively. ... we run this method only on a batch of 200 randomly selected observations and we perform only 10 iterations for choosing the hyperparameters involved in the RLS problems.
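The median-heuristic bandwidths quoted above can be computed directly from pairwise Euclidean distances; the snippet below is a small illustrative sketch rather than the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist

def median_bandwidth(samples):
    """Median heuristic: the median of {||a_i - a_j||_2 : 1 <= i < j <= n}
    over the rows of `samples` (an (n, d) array)."""
    return np.median(pdist(samples, metric="euclidean"))

# sigma_X = median_bandwidth(X); sigma_Y = median_bandwidth(Y); sigma_Z = median_bandwidth(Z)
```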