ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices

Authors: Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, Prateek Jain

ICML 2017

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct systematic empirical evaluation of ProtoNN on a variety of supervised learning tasks (binary, multi-class, multi-label classification) and show that it gives nearly state-of-the-art prediction accuracy on resource-scarce devices while consuming several orders of magnitude lower storage and using minimal working memory. Finally, we conduct extensive experiments to benchmark ProtoNN against existing state-of-the-art methods for various learning tasks.
Researcher Affiliation Collaboration Microsoft Research, India; Carnegie Mellon University, Pittsburgh; University of Michigan, Ann Arbor; IIT Delhi, India.
Pseudocode Yes Algorithm 1: ProtoNN Train Algorithm
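The paper's Algorithm 1 trains the projection W, prototypes B, and prototype labels Z by alternating minimization, taking (stochastic) gradient steps on one parameter at a time and then hard-thresholding it back to its sparsity budget (s_W, s_B, s_Z). A minimal sketch of such a hard-thresholding projection, assuming NumPy and with the function name and shapes chosen for illustration (not from the paper):

```python
import numpy as np

def hard_threshold(M, sparsity):
    """Keep only the s-fraction of entries of M with largest magnitude,
    zeroing the rest -- the projection step used after each gradient update.

    M: parameter matrix (e.g., W, B, or Z); sparsity: fraction in (0, 1].
    """
    k = int(np.ceil(sparsity * M.size))  # number of entries to keep
    if k >= M.size:
        return M.copy()                  # sparsity 1.0 means no projection
    out = np.zeros_like(M)
    idx = np.argpartition(np.abs(M).ravel(), -k)[-k:]  # top-k by magnitude
    out.ravel()[idx] = M.ravel()[idx]
    return out
```

In the alternating scheme, each of W, B, and Z would be passed through this projection with its own budget after its block of epochs.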
Open Source Code Yes We have implemented ProtoNN as part of an open source embedded device ML library and it can be downloaded online: https://github.com/Microsoft/ELL
Open Datasets Yes Experimental Settings: Datasets: Table 3 in Appendix 9.1 lists the binary, multiclass and multilabel datasets used in our experiments. (Mentions common benchmark datasets like mnist, usps, cifar, letter-26, curet-61 in subsequent tables).
Dataset Splits Yes For baselines, the optimal hyper-parameters are selected through cross-validation. We use the above parameter settings for all binary and multiclass datasets except for the binary versions of usps, character and eye, which require 5-fold cross-validation. ProtoNN values obtained with the above hyper-parameters are reported for all datasets, except usps and character recognition, which require 5-fold cross-validation.
Hardware Specification Yes The Arduino Uno has an 8-bit, 16 MHz ATmega328P microcontroller, with 2 kB of SRAM and 32 kB of read-only flash.
Software Dependencies No The paper mentions implementing an 'integer version' of ProtoNN and notes properties of a microcontroller, but does not specify software components (e.g., Python, TensorFlow, PyTorch) with their version numbers.
Experiment Setup Yes Hyperparameters: In all our experiments, we fix the number of alternating minimization iterations (T) to 150. Each such iteration does e epochs over each of the 3 parameters W, B, and Z. For small binary and multiclass datasets we do GD with e set to 20. For multilabel and large multiclass (aloi) datasets, we do SGD with e set to 5 and batch size 512. The kernel parameter γ is computed after initializing B, W as 2.5 median(D)... Binary: d̂ = 10, s_Z = s_B = 0.8. m = 40 if s_W = 1.0 gives a model larger than 16 kB; else, s_W = 1.0 and m is increased to reach a 16 kB model. Multiclass: d̂ = 15, s_Z = s_B = 0.8. m = 5/class if s_W = 1.0 gives a model larger than 64 kB; else, m is increased to reach a 64 kB model.