Distributed Distillation for On-Device Learning

Authors: Ilai Bistritz, Ariana Mann, Nicholas Bambos

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulations support our theoretical findings and show that even a naive implementation of our algorithm significantly reduces the communication overhead while achieving an overall comparable accuracy to the state-of-the-art." and, from Section 5 (Simulation Results): "We conduct DNN simulations to evaluate the performance of Distributed Distillation (D-Distillation) compared to two baselines: Distributed-SGD (D-SGD) and Silo-SGD (where each device trains its DNN with only its private data and no communication)."
Researcher Affiliation | Academia | "Ilai Bistritz, Ariana J. Mann, Nicholas Bambos, Stanford University, {bistritz,ajmann,bambos}@stanford.edu"
Pseudocode | Yes | "Algorithm 1 Distributed Distillation" (see the illustrative sketch after this table).
Open Source Code | No | The paper does not provide an unambiguous statement or a direct link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "training LeNet-5 on MNIST [40] and ResNet-8 on CIFAR-10 [41]." Citations [40] and [41] refer to the papers introducing these standard public datasets.
Dataset Splits | No | The paper mentions distributing "MNIST training data" to devices and reports a "test accuracy", but does not give explicit train/validation/test splits, percentages, or per-split sample counts.
Hardware Specification | No | The paper discusses edge devices such as smartphones and IoT devices as the target environment for on-device learning, but does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the reported simulations.
Software Dependencies | No | The paper mentions machine learning models (DNNs, LeNet-5, ResNet) and datasets (MNIST, CIFAR-10) but does not list specific software dependencies with version numbers, such as deep learning frameworks or libraries.
Experiment Setup | Yes | "We selected the best hyperparameters for each algorithm from a limited search as detailed in Appendix 12.1." From Appendix 12.1: "For the MNIST dataset, we used a LeNet-5 architecture with a learning rate of 0.01 for the SGD optimizer. The batch size was 32. For the CIFAR-10 dataset, we used a ResNet-8 architecture with a learning rate of 0.01 for the SGD optimizer. The batch size was 128." (See the configuration sketch after this table.)
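
The Pseudocode row above points to the paper's Algorithm 1, which is not reproduced in this summary. As a rough illustration of the general idea behind distillation-based distributed training (devices exchange soft predictions rather than model weights, which is where the quoted communication savings would come from), the sketch below shows what one device's local update could look like. It is a hedged illustration, not the authors' algorithm: the mixing weight `alpha`, the shared unlabeled batch `public_x`, and the averaged neighbor predictions `neighbor_soft` are assumptions introduced here, and PyTorch itself is an assumption since the paper does not name its framework.

```python
import torch
import torch.nn.functional as F

def local_distillation_step(model, optimizer, private_batch, public_x,
                            neighbor_soft, alpha=0.5):
    """One hypothetical device update mixing a private-data loss with a
    distillation loss toward the neighbors' averaged soft predictions.

    Illustrative sketch only, NOT the paper's Algorithm 1: `public_x`
    (a shared unlabeled batch), `neighbor_soft` (averaged soft labels
    received from neighbors), and `alpha` are assumptions.
    """
    x, y = private_batch
    optimizer.zero_grad()

    # Standard supervised loss on the device's private data.
    ce_loss = F.cross_entropy(model(x), y)

    # Distillation loss: match the neighbors' averaged soft predictions
    # on the shared unlabeled inputs (soft targets are detached).
    log_p = F.log_softmax(model(public_x), dim=1)
    distill_loss = F.kl_div(log_p, neighbor_soft.detach(),
                            reduction="batchmean")

    loss = ce_loss + alpha * distill_loss
    loss.backward()
    optimizer.step()

    # The device would then broadcast its own updated soft predictions
    # (one small probability vector per shared sample) to its neighbors
    # instead of full model weights.
    with torch.no_grad():
        return F.softmax(model(public_x), dim=1)
```

Exchanging per-sample probability vectors over `public_x` is far cheaper than exchanging full DNN parameter tensors, which is consistent with the communication-overhead claim quoted in the Research Type row.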
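
The Experiment Setup row quotes concrete hyperparameters from Appendix 12.1. One minimal way to encode those quoted values for a reproduction attempt is sketched below; only the architectures, learning rates, batch sizes, and the SGD optimizer choice come from the paper, while the dictionary layout, the model names as strings, and the `torch.optim.SGD` call are assumptions for illustration (the paper does not name its framework, per the Software Dependencies row).

```python
import torch

# Values quoted from Appendix 12.1 of the paper; the layout and the
# choice of PyTorch are assumptions made for this illustrative sketch.
EXPERIMENTS = {
    "mnist":   {"architecture": "LeNet-5",  "lr": 0.01, "batch_size": 32},
    "cifar10": {"architecture": "ResNet-8", "lr": 0.01, "batch_size": 128},
}

def make_optimizer(model: torch.nn.Module, dataset: str) -> torch.optim.Optimizer:
    # Plain SGD at the quoted learning rate; no momentum or weight decay
    # is stated in the quoted text, so none is assumed here.
    return torch.optim.SGD(model.parameters(), lr=EXPERIMENTS[dataset]["lr"])
```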