Nearest-Neighbor Sampling Based Conditional Independence Testing
Authors: Shuai Li, Ziqi Chen, Hongtu Zhu, Christina Dan Wang, Wang Wen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the efficiency of our proposed test in both synthetic and real data analyses. |
| Researcher Affiliation | Academia | 1 School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China; 2 Departments of Biostatistics, Statistics, Computer Science, and Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, USA; 3 Business Division, New York University Shanghai, Shanghai, China; 4 School of Mathematics and Statistics, Central South University, Changsha, China |
| Pseudocode | Yes | Algorithm 1: 1-Nearest-Neighbor sampling (1NN(V1,V2,n)) and Algorithm 2: Nearest-Neighbor sampling based conditional independence test (NNSCIT) |
| Open Source Code | Yes | The source code of NNSCIT is available at https://github.com/LeeShuaikenwitch/NNSCIT. |
| Open Datasets | No | The synthetic data sets are generated by using the post nonlinear model similar to those in Zhang et al. (2011); Doran et al. (2014); and Bellot and van der Schaar (2019). While generation methods and citations are given, no specific link or access information to a publicly available dataset is provided. |
| Dataset Splits | No | The paper specifies training and testing splits (e.g., "U1 := {Xtrain, Ytrain, Ztrain} with sample size 2n/3 and U2 := {Xtest, Ytest, Ztest} with sample size n/3"), but does not explicitly mention a separate validation set for hyperparameter tuning. |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory) used for experiments is mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming language versions, library versions) are mentioned. |
| Experiment Setup | Yes | Set M = 500 and k = 3. Consider the following four scenarios. Scenario I: set f, g and h to be the identity functions, inducing linear dependencies, with Z ∼ N(0.7, 1) and X ∼ N(0, 1) under H1. Scenario II: set f, g and h as in Scenario I, but use a Laplace distribution to generate Z. Scenario III: set f, g and h as in Scenario I, but use Uniform[−2.5, 2.5] to generate Z. Scenario IV: set f, g and h to be randomly sampled from {x², x³, tanh(x), cos(x)}, with Z ∼ N(0, 1) and X ∼ N(0, 1) under H1. The significance level is set at α = 0.05 and the sample size is fixed at n = 1000. |
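The Pseudocode row refers to Algorithm 1, a 1-nearest-neighbor sampling routine. The core idea can be sketched as follows: for each observation, borrow the Y-value of its nearest neighbor in Z-space (excluding itself), which approximates a fresh draw from the conditional law of Y given Z. This is a minimal illustrative sketch, not the paper's implementation; the function name and the brute-force distance computation are assumptions.

```python
import numpy as np

def one_nn_sample(y, z):
    """Illustrative 1-NN sampling sketch (not the authors' code).

    For each z_i, copy the y-value of its nearest neighbor in Z-space
    (excluding the point itself), approximating a draw from Y | Z = z_i.
    """
    y = np.asarray(y)
    z = np.asarray(z, dtype=float).reshape(len(y), -1)
    # Brute-force pairwise squared distances in Z-space.
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)   # a point may not be its own neighbor
    nn = d.argmin(axis=1)         # index of each point's nearest neighbor
    return y[nn]
```

For large n, a KD-tree (e.g., `scipy.spatial.cKDTree`) would replace the O(n²) distance matrix, but the brute-force version keeps the logic transparent.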
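The Experiment Setup row describes synthetic data from a post-nonlinear model. A hedged sketch of Scenario I (identity f, g, h, hence linear dependencies) is below; the noise scales and the exact form of the H1 coupling are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def generate_scenario_i(n=1000, h1=False, seed=0):
    """Illustrative post-nonlinear generator for Scenario I (assumed details).

    Under H0, X and Y each depend on Z only, so X is conditionally
    independent of Y given Z. Under H1, X ~ N(0, 1) and Y depends on
    both Z and X (the coupling here is an assumption).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.7, 1.0, size=n)              # Z ~ N(0.7, 1)
    if h1:
        x = rng.normal(0.0, 1.0, size=n)          # X ~ N(0, 1) under H1
        y = z + x + rng.normal(0, 0.5, size=n)    # assumed H1 coupling
    else:
        x = z + rng.normal(0, 0.5, size=n)        # X = f(Z + noise), f = id
        y = z + rng.normal(0, 0.5, size=n)        # Y = g(Z + noise), g = id
    return x, y, z
```

Scenarios II and III would only swap the distribution of Z (Laplace, Uniform[−2.5, 2.5]); Scenario IV would apply nonlinearities drawn from {x², x³, tanh(x), cos(x)} in place of the identity.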