Rates of Convergence for Large-scale Nearest Neighbor Classification

Authors: Xingye Qiao, Jiexin Duan, Guang Cheng

NeurIPS 2019

Reproducibility assessment. Each variable below is listed with its result, followed by the LLM response that supports it.

Research Type: Experimental
LLM Response: Numerical studies have verified the theoretical findings. All numerical studies are conducted on HPC clusters... We repeat the simulation 1,000 times... In Table 1, we compare the average empirical risk (test error), the empirical CIS (classification instability), and the speedup of bigNN relative to the oracle kNN, over 500 replications.

Researcher Affiliation: Academia
LLM Response: Xingye Qiao, Department of Mathematical Sciences, Binghamton University, New York, USA (qiao@math.binghamton.edu); Jiexin Duan, Department of Statistics, Purdue University, West Lafayette, Indiana, USA (duan32@purdue.edu); Guang Cheng, Department of Statistics, Purdue University, West Lafayette, Indiana, USA (chengg@purdue.edu).

Pseudocode: No
LLM Response: No. The paper describes the methods textually and mathematically, e.g., the majority-voting bigNN classifier $g_{n,k,s}(x) = \mathbf{1}\{\frac{1}{s}\sum_{j=1}^{s} g_{n,k}^{(j)}(x) > 1/2\}$, but does not include any structured pseudocode or algorithm blocks.

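For concreteness, here is a minimal sketch of this majority-vote rule in Python, assuming binary 0/1 labels and using scikit-learn's KNeighborsClassifier as the base classifier. This is not the authors' code (their experiments are in R), and the function name big_nn_predict is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def big_nn_predict(X_train, y_train, X_test, s, k, seed=0):
    """Majority-vote bigNN: split the training data into s disjoint
    subsamples, fit a kNN classifier on each, and predict 1 when
    (1/s) * sum_j g_{n,k}^{(j)}(x) > 1/2."""
    idx = np.random.default_rng(seed).permutation(len(X_train))
    votes = np.zeros(len(X_test))
    for part in np.array_split(idx, s):   # s near-equal random subsamples
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train[part], y_train[part])
        votes += clf.predict(X_test)      # accumulate 0/1 votes
    return (votes / s > 0.5).astype(int)  # threshold the vote fraction

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1200, 2))
y = (X[:, 0] > 0).astype(int)
labels = big_nn_predict(X[:1000], y[:1000], X[1000:], s=4, k=5)
```
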
Open Source Code: No
LLM Response: No. The paper does not provide an explicit statement or a link to its own source code for the described methodology.

Open Datasets: Yes
LLM Response: We have retained benchmark data sets HTRU2 [34], Gisette [22], Musk 1 [16], Musk 2 [17], Occupancy [8], Credit [45], and SUSY [4] from the UCI machine learning repository [33].

Dataset Splits: Yes
LLM Response: Parameters in kNN and OWNN (optimally weighted nearest neighbors) are tuned using cross-validation, and the parameter k in bigNN for each subsample is the optimally chosen k for the oracle kNN divided by s.

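That tuning rule lends itself to a short sketch, shown below. It assumes scikit-learn's GridSearchCV for the cross-validation step (the paper does not say which CV procedure was used); tune_k and the toy data are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def tune_k(X, y, k_grid=range(1, 51, 2), folds=5):
    """Choose k for the oracle kNN by cross-validated grid search."""
    search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": list(k_grid)}, cv=folds)
    search.fit(X, y)
    return search.best_params_["n_neighbors"]

rng = np.random.default_rng(0)            # toy data, illustration only
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

s = 5                                     # number of subsamples
k_oracle = tune_k(X, y)                   # optimal k for the oracle kNN
k_sub = max(1, round(k_oracle / s))       # per-subsample k for bigNN
```
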
Hardware Specification: Yes
LLM Response: All numerical studies are conducted on HPC clusters with two 12-core Intel Xeon Gold Skylake processors and four 10-core Xeon-E5 processors, with memory between 64 and 128 GB.

Software Dependencies: No
LLM Response: No. The paper states 'The R environment is used in this study' but does not provide specific version numbers for R or any other software libraries or dependencies.

Experiment Setup: Yes
LLM Response: Set $k = k_o n^{2\alpha/(2\alpha+1)} s^{1/(2\alpha+1)}$, where $k_o$ is a constant. We choose the split coefficient γ = 0.0, 0.1, ..., 0.9 and N = 1000 × (1, 2, 3, 4, 8, 9, 16, 27, 32). The number of neighbors k is chosen as $k_o n^{2\alpha/(2\alpha+1)} s^{1/(2\alpha+1)}$, as stated in the theorems, with $k_o = 1$ and truncated at 1. We fix the number of neighbors k = 5, let γ range from 0 to 0.7, and let N = 1000 × (1, 2, 4, 8, 10, 12, 16, 20, 32). We set N = 27000, d = 8, the pre-training split coefficient γ = 0.2, 0.3, the number of prediction subsampling repeats I = 5, 9, 13, 17, 21, and the prediction subsample size coefficient θ = 0.1, 0.2, ..., 0.7.

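As a worked example of that k schedule, the hypothetical helper below computes $k = k_o n^{2\alpha/(2\alpha+1)} s^{1/(2\alpha+1)}$ truncated at 1, reading the split coefficient as s = N^γ with subsample size n = N/s (an interpretation; the quoted setup does not define these symbols explicitly) and using an illustrative α = 1.

```python
import math

def neighbors_per_subsample(N, gamma, alpha=1.0, k_o=1.0):
    """k = k_o * n^(2a/(2a+1)) * s^(1/(2a+1)), truncated at 1.
    Assumes s = N^gamma subsamples of size n = N/s; alpha is an
    illustrative default, while k_o = 1 matches the quoted setup."""
    s = max(1, round(N ** gamma))    # number of subsamples
    n = N / s                        # subsample size
    k = k_o * n ** (2 * alpha / (2 * alpha + 1)) * s ** (1 / (2 * alpha + 1))
    return s, max(1, math.floor(k))  # truncate k at 1

for N in (1000, 8000, 27000):
    print(N, neighbors_per_subsample(N, gamma=0.3))
```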