Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices

Authors: Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct systematic empirical evaluation of ProtoNN on a variety of supervised learning tasks (binary, multi-class, multi-label classification) and show that it gives nearly state-of-the-art prediction accuracy on resource-scarce devices while consuming several orders lower storage, and using minimal working memory. Finally, we conduct extensive experiments to benchmark ProtoNN against existing state-of-the-art methods for various learning tasks.
Researcher Affiliation | Collaboration | (1) Microsoft Research, India; (2) Carnegie Mellon University, Pittsburgh; (3) University of Michigan, Ann Arbor; (4) IIT Delhi, India.
Pseudocode | Yes | Algorithm 1 ProtoNN: Train Algorithm
Open Source Code | Yes | We have implemented ProtoNN as part of an open source embedded device ML library and it can be downloaded online (footnote 2: https://github.com/Microsoft/ELL).
Open Datasets | Yes | Experimental Settings: Datasets: Table 3 in Appendix 9.1 lists the binary, multiclass and multilabel datasets used in our experiments. (Mentions common benchmark datasets such as mnist, usps, cifar, letter-26, and curet-61 in subsequent tables.)
Dataset Splits | Yes | For baselines the optimal hyper-parameters are selected through cross-validation. We use the above parameter settings for all binary, multiclass datasets except for binary versions of usps, character and eye which require 5-fold cross validation. ProtoNN values obtained with the above hyper-parameters are reported for all datasets, except usps and character recognition which require 5-fold cross validation.
Hardware Specification | Yes | The Arduino Uno has an 8 bit, 16 MHz Atmega328P microcontroller, with 2 kB of SRAM and 32 kB of read-only flash.
Software Dependencies | No | The paper mentions implementing an 'integer version' of ProtoNN and notes properties of a microcontroller, but does not specify software components (e.g., Python, TensorFlow, PyTorch) with their version numbers.
Experiment Setup | Yes | Hyperparameters: In all our experiments, we fix the no. of alternating minimization iterations (T) to 150. Each such iteration does e-many epochs each over the 3 parameters, W, B, and Z. For small binary and multiclass datasets we do GD with e set to 20. For multilabel and large multiclass (aloi) datasets, we do SGD with e set to 5, batch size to 512. Kernel parameter γ is computed after initializing B, W as 2.5 median(D)... Binary: d̂ = 10, s_Z = s_B = 0.8. m = 40 if s_W = 1.0 gives model larger than 16 kB. Else, s_W = 1.0 and m is increased to reach 16 kB model. Multiclass: d̂ = 15, s_Z = s_B = 0.8. m = 5/class if s_W = 1.0 gives model larger than 64 kB. Else, m is increased to reach 64 kB model.
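The binary-classification settings quoted under Experiment Setup can be read as a size-budget check: fix d̂, s_Z, and s_B, then see whether the model stays under 16 kB. The sketch below is only an illustration of that arithmetic and is not taken from the paper or from the ELL library. It assumes that W is d̂ × d, B is d̂ × m, and Z is L × m, that a dense matrix stores one 4-byte value per entry, and that a sparse matrix stores a 4-byte value plus a 4-byte index per non-zero. The class name, function name, and the input dimension d = 256 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProtoNNConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are illustrative)."""
    d_hat: int             # projection dimension (d-hat)
    m: int                 # number of prototypes
    s_W: float             # fraction of entries kept in W
    s_B: float             # fraction of entries kept in B
    s_Z: float             # fraction of entries kept in Z
    iterations: int = 150  # alternating minimization iterations T
    epochs: int = 20       # e, epochs per parameter in each iteration (GD setting)


def estimated_model_bytes(cfg, d, num_labels, bytes_per_value=4, bytes_per_index=4):
    """Rough model size in bytes.

    ASSUMPTIONS (not stated in the text above): W is d_hat x d, B is d_hat x m,
    Z is num_labels x m; dense matrices store values only, sparse matrices
    store a value and an index for every non-zero entry.
    """
    matrices = [
        (cfg.s_W, cfg.d_hat * d),        # W
        (cfg.s_B, cfg.d_hat * cfg.m),    # B
        (cfg.s_Z, num_labels * cfg.m),   # Z
    ]
    total = 0
    for sparsity, num_entries in matrices:
        nonzeros = int(round(sparsity * num_entries))
        if sparsity >= 1.0:
            per_entry = bytes_per_value                    # dense: value only
        else:
            per_entry = bytes_per_value + bytes_per_index  # sparse: value + index
        total += nonzeros * per_entry
    return total


# Binary setting from the table: d_hat = 10, s_Z = s_B = 0.8, s_W = 1.0, m = 40.
# d = 256 and num_labels = 2 are hypothetical inputs chosen for illustration.
binary_cfg = ProtoNNConfig(d_hat=10, m=40, s_W=1.0, s_B=0.8, s_Z=0.8)
size_bytes = estimated_model_bytes(binary_cfg, d=256, num_labels=2)
fits_budget = size_bytes <= 16 * 1024  # 16 kB budget quoted for binary datasets
print(size_bytes, fits_budget)
```

Under these assumptions the same function could be called in a loop that increases m until the estimate reaches the budget, which mirrors the "m is increased to reach 16 kB model" rule in the quoted setup.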