Distributed Distillation for On-Device Learning
Authors: Ilai Bistritz, Ariana Mann, Nicholas Bambos
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Simulations support our theoretical findings and show that even a naive implementation of our algorithm significantly reduces the communication overhead while achieving an overall comparable accuracy to the state-of-the-art." From Section 5 (Simulation Results): "We conduct DNN simulations to evaluate the performance of Distributed Distillation (D-Distillation) compared to two baselines: Distributed-SGD (D-SGD) and Silo-SGD (where each device trains its DNN with only its private data and no communication)." |
| Researcher Affiliation | Academia | Ilai Bistritz, Ariana J. Mann, Nicholas Bambos; Stanford University; {bistritz,ajmann,bambos}@stanford.edu |
| Pseudocode | Yes | Algorithm 1 Distributed Distillation (an illustrative sketch follows the table). |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | "training LeNet-5 on MNIST [40] and ResNet-8 on CIFAR-10 [41]". Citations [40] and [41] refer to the papers introducing these standard public datasets. |
| Dataset Splits | No | The paper mentions distributing "MNIST training data" to devices and reports a "test accuracy", but does not provide explicit details about train/validation/test splits, specific percentages, or sample counts for each split. |
| Hardware Specification | No | The paper discusses 'edge devices' like 'smartphone or IoT' as the target environment for on-device learning but does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the simulations or experiments presented in the paper. |
| Software Dependencies | No | The paper mentions machine learning models (DNNs, LeNet-5, ResNet) and datasets (MNIST, CIFAR-10) but does not provide specific software dependencies with version numbers, such as deep learning frameworks or libraries. |
| Experiment Setup | Yes | "We selected the best hyperparameters for each algorithm from a limited search as detailed in Appendix 12.1." From Appendix 12.1: "For the MNIST dataset, we used a LeNet-5 architecture with a learning rate of 0.01 for the SGD optimizer. The batch size was 32. For the CIFAR-10 dataset, we used a ResNet-8 architecture with a learning rate of 0.01 for the SGD optimizer. The batch size was 128." A configuration sketch follows the table. |
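
The Pseudocode row points to Algorithm 1 (Distributed Distillation). The sketch below is purely illustrative: it assumes a generic distillation-style exchange in which devices train on their private data and take a distillation step toward the average of neighbors' soft predictions on shared inputs, which is one way such a scheme can reduce communication relative to gradient exchange. The function names, loss weighting, and communication pattern here are assumptions for illustration, not the paper's Algorithm 1.

```python
# Hypothetical, simplified sketch of a distillation-style peer exchange.
# This is NOT the paper's Algorithm 1; structure and names are assumed.
import torch
import torch.nn.functional as F

def local_step(model, optimizer, x_private, y_private):
    # Ordinary supervised SGD step on the device's private data.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_private), y_private)
    loss.backward()
    optimizer.step()

def distillation_step(model, optimizer, x_shared, neighbor_soft_preds, temperature=1.0):
    # neighbor_soft_preds: list of softmax outputs on x_shared received from neighbors.
    target = torch.stack(neighbor_soft_preds).mean(dim=0)  # consensus soft target
    optimizer.zero_grad()
    log_probs = F.log_softmax(model(x_shared) / temperature, dim=1)
    loss = F.kl_div(log_probs, target, reduction="batchmean")
    loss.backward()
    optimizer.step()
```

In each round a device would run `local_step` on its private batches, broadcast its own soft predictions on the shared inputs, and then run `distillation_step` with whatever predictions it received from neighbors.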
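The Experiment Setup row quotes the hyperparameters from Appendix 12.1. The sketch below wires those reported settings (LeNet-5 on MNIST with SGD, learning rate 0.01, batch size 32; ResNet-8 on CIFAR-10 with SGD, learning rate 0.01, batch size 128) into training configurations, assuming PyTorch and torchvision; the paper does not name a framework, and the model instance is passed in as a placeholder for the cited architectures.

```python
# Configuration sketch under the assumption of PyTorch/torchvision.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_mnist_setup(model: nn.Module):
    # MNIST: LeNet-5, SGD optimizer with learning rate 0.01, batch size 32 (Appendix 12.1).
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=32, shuffle=True)
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    return loader, optimizer, nn.CrossEntropyLoss()

def make_cifar10_setup(model: nn.Module):
    # CIFAR-10: ResNet-8, SGD optimizer with learning rate 0.01, batch size 128 (Appendix 12.1).
    train_set = datasets.CIFAR10("data", train=True, download=True,
                                 transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    return loader, optimizer, nn.CrossEntropyLoss()
```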