Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions
Authors: Aryeh Kontorovich, Sivan Sabato, Roi Weiss
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that this algorithm is strongly Bayes-consistent in metric spaces with finite doubling dimension the first consistency result for an efficient nearest-neighbor sample compression scheme. Rather surprisingly, we discover that this algorithm continues to be Bayes-consistent even in a certain infinite-dimensional setting, in which the basic measure-theoretic conditions on which classic consistency proofs hinge are violated. This is all the more surprising, since it is known that k-NN is not Bayes-consistent in this setting. We pose several challenging open problems for future research. |
| Researcher Affiliation | Academia | Aryeh Kontorovich Department of Computer Science Ben-Gurion University of the Negev EMAIL Sivan Sabato Department of Computer Science Ben-Gurion University of the Negev EMAIL Roi Weiss Department of Computer Science and Applied Mathematics Weizmann Institute of Science EMAIL |
| Pseudocode | Yes | Algorithm 1 KSU: 1-NN compression-based algorithm |
| Open Source Code | No | The paper does not provide any information about open-source code for the described methodology. There are no links or explicit statements about code availability. |
| Open Datasets | No | The paper focuses on theoretical analysis and proofs. It does not mention specific datasets used for training or provide access information for any public datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe computational experiments that would require specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe computational experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical aspects and does not describe specific experimental setups, hyperparameters, or training settings for practical implementation. |