On Convergence of Nearest Neighbor Classifiers over Feature Transformations
Authors: Luka Rimanic, Cedric Renggli, Bo Li, Ce Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate that both properties have an impact on the kNN convergence on 30 feature transformations with 6 benchmark datasets spanning from the vision to the text domain. We highlight the usefulness and validate our novel theoretical understanding by conducting a thorough experimental evaluation ranging over 6 real-world datasets from two popular machine learning modalities, and 30 different feature transformations. |
| Researcher Affiliation | Academia | Luka Rimanic ETH Zurich luka.rimanic@inf.ethz.ch; Cedric Renggli ETH Zurich cedric.renggli@inf.ethz.ch; Bo Li UIUC lbo@illinois.edu; Ce Zhang ETH Zurich ce.zhang@inf.ethz.ch |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | whereas the source code for reproducing the numbers can be found in the supplementary files. |
| Open Datasets | Yes | We perform the evaluation on two data modalities which are ubiquitous in modern machine learning. The first group consists of visual classification tasks, including MNIST, CIFAR10 and CIFAR100. The second group consists of standard text classification tasks, where we focus on IMDB, SST2 and YELP. |
| Dataset Splits | Yes | Table 1 (Dataset Statistics) reports train/test sizes: MNIST 60K/10K, CIFAR10 50K/10K, CIFAR100 50K/10K, IMDB 25K/25K, SST2 67K/872, YELP 500K/50K. (A loading sketch for the standard splits follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow Hub and PyTorch Hub" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For kNN, we restrict ourselves to the Euclidean distance, the most commonly used distance function. Furthermore, we set $k = 1$ for this entire section, whereas the empirical analysis of the influence of $k > 1$ is conducted in Section B of the supplementary material. More precisely, we examine $g(x) = \sigma(\langle w, x \rangle)$, with $L_g = \|w\|_2$. When dealing with multi-class tasks, we use the softmax function for $g$, whilst reporting $\|W\|_F$, the Frobenius norm of the weights, in place of $\|w\|_2$. (A minimal sketch of this setup appears after the table.) |
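
Below is a minimal sketch of how the standard train/test splits quoted in the "Dataset Splits" row can be obtained for the vision datasets. It assumes `torchvision` is installed; the paper's own data-handling code lives in its supplementary files and is not reproduced here, and the `root` path is hypothetical.

```python
# Sketch (assumption: torchvision is available). Fetches the standard
# train/test splits whose sizes are quoted in the "Dataset Splits" row.
from torchvision import datasets

root = "./data"  # hypothetical local path

mnist_train = datasets.MNIST(root, train=True, download=True)
mnist_test = datasets.MNIST(root, train=False, download=True)
cifar10_train = datasets.CIFAR10(root, train=True, download=True)
cifar10_test = datasets.CIFAR10(root, train=False, download=True)

print(len(mnist_train), len(mnist_test))      # 60000 10000
print(len(cifar10_train), len(cifar10_test))  # 50000 10000
```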
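The "Experiment Setup" row can likewise be illustrated with a short sketch. It assumes scikit-learn and pre-computed feature arrays `X_train`, `X_test` (e.g. embeddings produced by a TensorFlow Hub or PyTorch Hub model) with labels `y_train`, `y_test`; the function and variable names are illustrative, not taken from the authors' code.

```python
# Sketch of the quoted setup: 1-NN with Euclidean distance, plus a linear
# model g whose weight norm (the Lipschitz constant L_g) is reported.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def knn_accuracy(X_train, y_train, X_test, y_test, k=1):
    """kNN with Euclidean distance; the quoted setup fixes k = 1."""
    clf = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)

def weight_norm(X_train, y_train):
    """Fit g(x) = sigma(<w, x>) (softmax in the multi-class case) and
    return the norm of its weights: ||w||_2 for binary tasks, ||W||_F
    (Frobenius norm) for multi-class tasks."""
    g = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # coef_ has shape (1, d) for binary tasks, (n_classes, d) otherwise;
    # the Frobenius norm of a single-row matrix equals the vector 2-norm,
    # so one computation covers both cases.
    return np.linalg.norm(g.coef_)
```

The single `np.linalg.norm` call works for both regimes because, for a one-row weight matrix, $\|W\|_F = \|w\|_2$.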