reproducibilityindex.ai

Nearest-Neighbor Sampling Based Conditional Independence Testing

Authors: Shuai Li, Ziqi Chen, Hongtu Zhu, Christina Dan Wang, Wang Wen

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we demonstrate the efficiency of our proposed test in both synthetic and real data analyses.
Researcher Affiliation	Academia	(1) School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China; (2) Departments of Biostatistics, Statistics, Computer Science, and Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, USA; (3) Business Division, New York University Shanghai, Shanghai, China; (4) School of Mathematics and Statistics, Central South University, Changsha, China
Pseudocode	Yes	Algorithm 1: 1-Nearest-Neighbor sampling (1NN(V1,V2,n)) and Algorithm 2: Nearest-Neighbor sampling based conditional independence test (NNSCIT)
Open Source Code	Yes	The source code of NNSCIT is available at https://github.com/LeeShuaikenwitch/NNSCIT.
Open Datasets	No	The synthetic data sets are generated by using the post nonlinear model similar to those in Zhang et al. (2011); Doran et al. (2014); and Bellot and van der Schaar (2019). While generation methods and citations are given, no specific link or access information to a publicly available dataset is provided.
Dataset Splits	No	The paper specifies training and testing splits (e.g., "U1 := {Xtrain, Ytrain, Ztrain} with sample size n − n/3 and U2 := {Xtest, Ytest, Ztest} with sample size n/3"), but does not explicitly mention a separate validation set for hyperparameter tuning.
Hardware Specification	No	No specific hardware (e.g., GPU/CPU models, memory) used for experiments is mentioned in the paper.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., programming language versions, library versions) are mentioned.
Experiment Setup	Yes	Set M = 500 and k = 3. Consider the following four scenarios: Scenario I. Set f, g and h to be the identity functions, inducing linear dependencies, Z ∼ N(0.7, 1), and X ∼ N(0, 1) under H1. Scenario II. Set f, g and h as in Scenario I, but use a Laplace distribution to generate Z. Scenario III. Set f, g and h as in Scenario I, but use Uniform[−2.5, 2.5] to generate Z. Scenario IV. Set f, g and h to be randomly sampled from {x^2, x^3, tanh(x), cos(x)}. Set Z ∼ N(0, 1), and X ∼ N(0, 1) under H1. The significance level is set at α = 0.05 and the sample size is fixed at n = 1000.
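
The scenario settings and the train/test split quoted above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the names `SETUP`, `sample_z`, `pick_links`, and `split_indices` are hypothetical, the dimension of Z is a free parameter, the Laplace location and scale (0 and 1) are assumptions because the summary does not state them, and n/3 is rounded down because n = 1000 is not divisible by 3. The post-nonlinear model that combines Z with f, g and h is not reproduced here since the summary does not spell it out.

```python
import numpy as np

# Settings quoted in the Experiment Setup row.
SETUP = {"M": 500, "k": 3, "alpha": 0.05, "n": 1000}

# Candidate link functions for Scenario IV.
NONLINEAR_LINKS = [
    lambda x: x ** 2,
    lambda x: x ** 3,
    np.tanh,
    np.cos,
]


def identity(x):
    return x


def sample_z(scenario, n, d_z, rng):
    """Draw the conditioning variable Z for Scenarios I-IV."""
    if scenario == "I":
        return rng.normal(0.7, 1.0, size=(n, d_z))
    if scenario == "II":
        # Assumption: location 0 and scale 1; the summary only says "Laplace".
        return rng.laplace(0.0, 1.0, size=(n, d_z))
    if scenario == "III":
        return rng.uniform(-2.5, 2.5, size=(n, d_z))
    if scenario == "IV":
        return rng.normal(0.0, 1.0, size=(n, d_z))
    raise ValueError(f"unknown scenario: {scenario}")


def pick_links(scenario, rng):
    """Return (f, g, h): identity in Scenarios I-III, random draws in IV."""
    if scenario in ("I", "II", "III"):
        return identity, identity, identity
    idx = rng.integers(len(NONLINEAR_LINKS), size=3)
    return tuple(NONLINEAR_LINKS[i] for i in idx)


def split_indices(n, rng):
    """Shuffle indices and split into n - n/3 training and n/3 test points."""
    n_test = n // 3  # assumption: round n/3 down
    perm = rng.permutation(n)
    return perm[: n - n_test], perm[n - n_test:]
```

With n = 1000 this split yields 667 training and 333 test observations; for example, `split_indices(SETUP["n"], np.random.default_rng(0))` returns two disjoint index arrays of those lengths.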