Nearest-Neighbor Sampling Based Conditional Independence Testing

Authors: Shuai Li, Ziqi Chen, Hongtu Zhu, Christina Dan Wang, Wang Wen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate the efficiency of our proposed test in both synthetic and real data analyses.
Researcher Affiliation | Academia | 1 School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China; 2 Departments of Biostatistics, Statistics, Computer Science, and Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, USA; 3 Business Division, New York University Shanghai, Shanghai, China; 4 School of Mathematics and Statistics, Central South University, Changsha, China
Pseudocode | Yes | Algorithm 1: 1-Nearest-Neighbor sampling (1NN(V1,V2,n)) and Algorithm 2: Nearest-Neighbor sampling based conditional independence test (NNSCIT)
Open Source Code | Yes | The source code of NNSCIT is available at https://github.com/LeeShuaikenwitch/NNSCIT.
Open Datasets | No | The synthetic data sets are generated by using the post-nonlinear model similar to those in Zhang et al. (2011); Doran et al. (2014); and Bellot and van der Schaar (2019). While generation methods and citations are given, no specific link or access information to a publicly available dataset is provided.
Dataset Splits | No | The paper specifies training and testing splits (e.g., "U1 := {Xtrain, Ytrain, Ztrain} with sample size n - n/3 and U2 := {Xtest, Ytest, Ztest} with sample size n/3"), but does not explicitly mention a separate validation set for hyperparameter tuning.
Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory) used for experiments is mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming language versions, library versions) are mentioned.
Experiment Setup | Yes | Set M = 500 and k = 3. Consider the following four scenarios: Scenario I. Set f, g and h to be the identity functions, inducing linear dependencies, Z ~ N(0.7, 1), and X ~ N(0, 1) under H1. Scenario II. Set f, g and h as in Scenario I, but use a Laplace distribution to generate Z. Scenario III. Set f, g and h as in Scenario I, but use Uniform[-2.5, 2.5] to generate Z. Scenario IV. Set f, g and h to be randomly sampled from {x^2, x^3, tanh(x), cos(x)}. Set Z ~ N(0, 1), and X ~ N(0, 1) under H1. The significance level is set at α = 0.05 and the sample size is fixed at n = 1000.
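To make the four scenarios and the train/test split concrete, here is a minimal NumPy sketch of how such data could be generated. The summary above fixes only the distributions of Z and X, the choice of f, g, h, and the n/3 test split. Everything else is an assumption made for illustration: the post-nonlinear structural equations (X = f(Z·a_f + noise), Y = g(Z·a_g + noise) under H0, and Y = h(Z·a_g + 2X + noise) under H1), the unit-norm loadings, the noise scale 0.25, the dimension `d_z = 5`, the standard Laplace parameters, and the function names `sample_z`, `generate`, and `split`. The authors' repository linked above remains the authoritative reference.

```python
import numpy as np

# Nonlinearities named in Scenario IV.
NONLINEAR = [np.square, lambda v: v ** 3, np.tanh, np.cos]


def identity(v):
    return v


def sample_z(scenario, n, d_z, rng):
    """Draw the conditioning variable Z for scenarios I-IV."""
    if scenario == "I":
        return rng.normal(0.7, 1.0, size=(n, d_z))
    if scenario == "II":
        # Assumption: standard Laplace (location 0, scale 1).
        return rng.laplace(0.0, 1.0, size=(n, d_z))
    if scenario == "III":
        return rng.uniform(-2.5, 2.5, size=(n, d_z))
    if scenario == "IV":
        return rng.normal(0.0, 1.0, size=(n, d_z))
    raise ValueError(f"unknown scenario: {scenario}")


def generate(scenario, n=1000, d_z=5, h1=False, rng=None):
    """Hypothetical post-nonlinear generator; h1=True makes Y depend on X."""
    rng = np.random.default_rng() if rng is None else rng
    Z = sample_z(scenario, n, d_z, rng)
    if scenario == "IV":
        picks = rng.integers(0, len(NONLINEAR), size=3)
        f, g, h = (NONLINEAR[i] for i in picks)
    else:
        f = g = h = identity
    # Assumption: random unit-norm loadings and N(0, 0.25^2) noise.
    a_f, a_g = rng.uniform(size=(2, d_z))
    a_f, a_g = a_f / np.linalg.norm(a_f), a_g / np.linalg.norm(a_g)
    eps_x, eps_y = rng.normal(0.0, 0.25, size=(2, n))
    if h1:
        X = rng.normal(0.0, 1.0, size=n)
        # Assumption: coefficient 2.0 on X in the alternative.
        Y = h(Z @ a_g + 2.0 * X + eps_y)
    else:
        X = f(Z @ a_f + eps_x)
        Y = g(Z @ a_g + eps_y)
    return X, Y, Z


def split(X, Y, Z):
    """Training set of size n - n//3 and test set of size n//3."""
    n_train = len(X) - len(X) // 3
    train = (X[:n_train], Y[:n_train], Z[:n_train])
    test = (X[n_train:], Y[n_train:], Z[n_train:])
    return train, test
```

With n = 1000, `split` returns 667 training rows and 333 test rows. Calling `generate("III", h1=True)` would produce a sample under the alternative for Scenario III, again only under the assumptions listed above.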