Rates of Convergence for Large-scale Nearest Neighbor Classification
Authors: Xingye Qiao, Jiexin Duan, Guang Cheng
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical studies have verified the theoretical findings. All numerical studies are conducted on HPC clusters... We repeat the simulation 1000 times... In Table 1, we compare the average empirical risk (test error), the empirical CIS, and the speedup of bigNN relative to the oracle kNN, over 500 replications. |
| Researcher Affiliation | Academia | Xingye Qiao, Department of Mathematical Sciences, Binghamton University, New York, USA, qiao@math.binghamton.edu; Jiexin Duan, Department of Statistics, Purdue University, West Lafayette, Indiana, USA, duan32@purdue.edu; Guang Cheng, Department of Statistics, Purdue University, West Lafayette, Indiana, USA, chengg@purdue.edu |
| Pseudocode | No | No. The paper describes the methods textually and mathematically (e.g., the majority-vote rule $g_{n,k,s}(x) = \mathbb{1}\{\frac{1}{s}\sum_{j=1}^{s} g^{(j)}_{n,k}(x) > 1/2\}$) but does not include any structured pseudocode or algorithm blocks (a minimal code sketch of this voting rule appears after the table). |
| Open Source Code | No | No. The paper does not provide an explicit statement or a link to its own source code for the described methodology. |
| Open Datasets | Yes | We have retained benchmark data sets HTRU2 [34], Gisette [22], Musk 1 [16], Musk 2 [17], Occupancy [8], Credit [45], and SUSY [4], from the UCI machine learning repository [33]. |
| Dataset Splits | Yes | Parameters in kNN and OWNN are tuned using cross-validation, and the parameter k in bigNN for each subsample is the optimally chosen k for the oracle kNN divided by s. |
| Hardware Specification | Yes | All numerical studies are conducted on HPC clusters with two 12-core Intel Xeon Gold Skylake processors and four 10-core Xeon-E5 processors, with memory between 64 and 128 GB. |
| Software Dependencies | No | No. The paper states 'The R environment is used in this study.' but does not provide specific version numbers for R or any other software libraries or dependencies. |
| Experiment Setup | Yes | Set $k = k_o n^{2\alpha/(2\alpha+1)} s^{-1/(2\alpha+1)}$, where $k_o$ is a constant. We choose the split coefficient γ = 0.0, 0.1, ..., 0.9 and N = 1000 × (1, 2, 3, 4, 8, 9, 16, 27, 32). The number of neighbors k is chosen as $k_o n^{2\alpha/(2\alpha+1)} s^{-1/(2\alpha+1)}$ as stated in the theorems with $k_o = 1$, truncated at 1. We fix the number of neighbors k = 5, let γ range from 0 to 0.7, and let N = 1000 × (1, 2, 4, 8, 10, 12, 16, 20, 32). We set N = 27000, d = 8, the pre-training split coefficient γ = 0.2, 0.3, the number of prediction subsampling repeats I = 5, 9, 13, 17, 21, and the prediction subsample size coefficient θ = 0.1, 0.2, ..., 0.7 (a sketch of this k schedule appears after the table). |
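The majority-vote rule quoted in the Pseudocode row is straightforward to illustrate. Below is a minimal Python/NumPy sketch of the divide-and-conquer (bigNN) idea, not the paper's own implementation (the paper worked in R; the names `knn_classify` and `bignn_classify` and the toy data are illustrative assumptions): the N training points are split into s disjoint subsamples, a k-nearest-neighbor vote is taken within each, and the final label is 1 exactly when more than half of the s subsample classifiers say 1.

```python
import numpy as np

def knn_classify(X_train, y_train, X_test, k):
    """Plain kNN majority vote within one subsample (labels in {0, 1})."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    return (y_train[nn].mean(axis=1) > 0.5).astype(int)

def bignn_classify(X_train, y_train, X_test, s, k, rng=None):
    """bigNN sketch: split the data into s disjoint subsamples, run kNN
    on each, and take the majority vote over the s subsample classifiers,
    i.e. g_{n,k,s}(x) = 1{ (1/s) * sum_j g_j(x) > 1/2 }."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X_train))
    votes = np.zeros(len(X_test))
    for part in np.array_split(idx, s):  # s subsamples of size ~ N/s
        votes += knn_classify(X_train[part], y_train[part], X_test, k)
    return (votes / s > 0.5).astype(int)

# Toy usage on synthetic two-class Gaussian data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) + np.repeat([[0, 0], [1, 1]], 500, axis=0)
y = np.repeat([0, 1], 500)
X_new = rng.normal(size=(5, 2))
print(bignn_classify(X, y, X_new, s=4, k=5, rng=1))
```

Splitting by a random permutation mirrors the paper's random division of the full sample into s subsamples of size n = N/s.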
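The choice of k in the Experiment Setup row can likewise be sketched. The snippet below assumes the reconstruction given in the table: with split coefficient γ, the data are divided into s = N^γ subsamples of size n = N/s, and k = k_o n^(2α/(2α+1)) s^(-1/(2α+1)), truncated at 1. The negative exponent on s is consistent with the Dataset Splits row, since this expression equals the oracle choice k_o N^(2α/(2α+1)) divided by s. The helper name `neighbors_per_subsample` and the value α = 1 are illustrative assumptions, not from the paper.

```python
import math

def neighbors_per_subsample(N, gamma, alpha, k_o=1.0):
    """Hypothetical helper for the k schedule described in the table:
    s = N^gamma subsamples of size n = N/s, with
    k = k_o * n^(2a/(2a+1)) * s^(-1/(2a+1)), truncated at 1.
    Algebraically this equals the oracle k_o * N^(2a/(2a+1)) divided by s."""
    s = max(1, round(N ** gamma))  # number of subsamples
    n = N // s                     # subsample size
    expo = 2 * alpha / (2 * alpha + 1)
    k = k_o * n ** expo * s ** (-1 / (2 * alpha + 1))
    return s, n, max(1, math.floor(k))

# Grid from the quoted setup: gamma = 0.0, 0.1, ..., 0.9 at N = 1000
for gamma in [g / 10 for g in range(10)]:
    s, n, k = neighbors_per_subsample(N=1000, gamma=gamma, alpha=1.0)
    print(f"gamma={gamma:.1f}  s={s:<3d} n={n:<5d} k={k}")
```

As γ grows, s increases, n and k shrink, and the truncation at 1 eventually binds, which is why the quoted setup caps the split coefficient well below 1.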