Rates of Convergence for Large-scale Nearest Neighbor Classification

Authors: Xingye Qiao, Jiexin Duan, Guang Cheng

NeurIPS 2019

Reproducibility assessment. Each variable below is listed with its result, followed by the LLM response that supports it.

Research Type: Experimental
LLM Response: Numerical studies have verified the theoretical findings. All numerical studies are conducted on HPC clusters... We repeat the simulation 1,000 times... In Table 1, we compare the average empirical risk (test error), the empirical CIS (classification instability), and the speedup of bigNN relative to the oracle kNN, over 500 replications.

Researcher Affiliation: Academia
LLM Response: Xingye Qiao, Department of Mathematical Sciences, Binghamton University, New York, USA (qiao@math.binghamton.edu); Jiexin Duan, Department of Statistics, Purdue University, West Lafayette, Indiana, USA (duan32@purdue.edu); Guang Cheng, Department of Statistics, Purdue University, West Lafayette, Indiana, USA (chengg@purdue.edu).

Pseudocode: No
LLM Response: No. The paper describes the methods textually and mathematically, e.g., the majority-voting bigNN classifier $g_{n,k,s}(x) = \mathbf{1}\{\frac{1}{s}\sum_{j=1}^{s} g_{n,k}^{(j)}(x) > 1/2\}$, but does not include any structured pseudocode or algorithm blocks.

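For concreteness, here is a minimal sketch of this majority-vote rule in Python, assuming binary 0/1 labels and using scikit-learn's KNeighborsClassifier as the base classifier. This is not the authors' code (their experiments are in R), and the function name big_nn_predict is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def big_nn_predict(X_train, y_train, X_test, s, k, seed=0):
    """Majority-vote bigNN: split the training data into s disjoint
    subsamples, fit a kNN classifier on each, and predict 1 when
    (1/s) * sum_j g_{n,k}^{(j)}(x) > 1/2."""
    idx = np.random.default_rng(seed).permutation(len(X_train))
    votes = np.zeros(len(X_test))
    for part in np.array_split(idx, s):   # s near-equal random subsamples
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train[part], y_train[part])
        votes += clf.predict(X_test)      # accumulate 0/1 votes
    return (votes / s > 0.5).astype(int)  # threshold the vote fraction

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1200, 2))
y = (X[:, 0] > 0).astype(int)
labels = big_nn_predict(X[:1000], y[:1000], X[1000:], s=4, k=5)
```
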
Open Source Code: No
LLM Response: No. The paper does not provide an explicit statement or a link to its own source code for the described methodology.

Open Datasets: Yes
LLM Response: We have retained benchmark data sets HTRU2 [34], Gisette [22], Musk 1 [16], Musk 2 [17], Occupancy [8], Credit [45], and SUSY [4] from the UCI machine learning repository [33].

Dataset Splits: Yes
LLM Response: Parameters in kNN and OWNN (optimally weighted nearest neighbors) are tuned using cross-validation, and the parameter k in bigNN for each subsample is the optimally chosen k for the oracle kNN divided by s.

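That tuning rule lends itself to a short sketch, shown below. It assumes scikit-learn's GridSearchCV for the cross-validation step (the paper does not say which CV procedure was used); tune_k and the toy data are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def tune_k(X, y, k_grid=range(1, 51, 2), folds=5):
    """Choose k for the oracle kNN by cross-validated grid search."""
    search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": list(k_grid)}, cv=folds)
    search.fit(X, y)
    return search.best_params_["n_neighbors"]

rng = np.random.default_rng(0)            # toy data, illustration only
X = rng.normal(size=(500, 4))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

s = 5                                     # number of subsamples
k_oracle = tune_k(X, y)                   # optimal k for the oracle kNN
k_sub = max(1, round(k_oracle / s))       # per-subsample k for bigNN
```
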
Hardware Specification: Yes
LLM Response: All numerical studies are conducted on HPC clusters with two 12-core Intel Xeon Gold Skylake processors and four 10-core Xeon-E5 processors, with memory between 64 and 128 GB.

Software Dependencies: No
LLM Response: No. The paper states 'The R environment is used in this study' but does not provide specific version numbers for R or any other software libraries or dependencies.

Experiment Setup: Yes
LLM Response: Set $k = k_o n^{2\alpha/(2\alpha+1)} s^{1/(2\alpha+1)}$, where $k_o$ is a constant. We choose the split coefficient γ = 0.0, 0.1, ..., 0.9 and N = 1000 × (1, 2, 3, 4, 8, 9, 16, 27, 32). The number of neighbors k is chosen as $k_o n^{2\alpha/(2\alpha+1)} s^{1/(2\alpha+1)}$, as stated in the theorems, with $k_o = 1$ and truncated at 1. We fix the number of neighbors k = 5, let γ range from 0 to 0.7, and let N = 1000 × (1, 2, 4, 8, 10, 12, 16, 20, 32). We set N = 27000, d = 8, the pre-training split coefficient γ = 0.2, 0.3, the number of prediction subsampling repeats I = 5, 9, 13, 17, 21, and the prediction subsample size coefficient θ = 0.1, 0.2, ..., 0.7.

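As a worked example of that k schedule, the hypothetical helper below computes $k = k_o n^{2\alpha/(2\alpha+1)} s^{1/(2\alpha+1)}$ truncated at 1, reading the split coefficient as s = N^γ with subsample size n = N/s (an interpretation; the quoted setup does not define these symbols explicitly) and using an illustrative α = 1.

```python
import math

def neighbors_per_subsample(N, gamma, alpha=1.0, k_o=1.0):
    """k = k_o * n^(2a/(2a+1)) * s^(1/(2a+1)), truncated at 1.
    Assumes s = N^gamma subsamples of size n = N/s; alpha is an
    illustrative default, while k_o = 1 matches the quoted setup."""
    s = max(1, round(N ** gamma))    # number of subsamples
    n = N / s                        # subsample size
    k = k_o * n ** (2 * alpha / (2 * alpha + 1)) * s ** (1 / (2 * alpha + 1))
    return s, max(1, math.floor(k))  # truncate k at 1

for N in (1000, 8000, 27000):
    print(N, neighbors_per_subsample(N, gamma=0.3))
```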