Accelerated On-Device Forward Neural Network Training with Module-Wise Descending Asynchronism
Authors: Xiaohan Zhao, Hualin Zhang, Zhouyuan Huo, Bin Gu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations conducted on NVIDIA's AGX Orin, a popular embedded device, show that AsyncFGD reduces memory consumption and enhances hardware efficiency, offering a novel approach to on-device learning. |
| Researcher Affiliation | Academia | (1) Nanjing University of Information Science and Technology, China; (2) Mohamed bin Zayed University of Artificial Intelligence, UAE; (3) School of Artificial Intelligence, Jilin University, China. Author Zhouyuan Huo is currently at Google; no work was performed at Google. |
| Pseudocode | Yes | Algorithm 1 (AsyncFGD-SGD); see the forward-gradient sketch after the table. |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Table 1 mentions 'CIFAR-10' and 'SVHN' datasets. Section 6.3 states 'The models are fine-tuned with weights pre-trained on Imagenet'. |
| Dataset Splits | No | The paper mentions 'validation performance' for learning rate selection but does not provide specific data split information (percentages, counts, or explicit citation for predefined splits) for training, validation, or test sets. |
| Hardware Specification | Yes | Experiments utilize Python 3.8 and Pytorch, primarily on NVIDIA's AGX Orin. Additional results on alternate platforms are in the appendix. In Section I.2, specific hardware includes: NVIDIA AGX Orin, four NVIDIA 1080 Ti GPUs, NVIDIA A100, and Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz. |
| Software Dependencies | No | The paper states 'Experiments utilize Python 3.8 and Pytorch', but it does not give a specific PyTorch version or list other key software dependencies with version numbers. |
| Experiment Setup | Yes | Batch size is 64 unless noted. The optimal learning rate (chosen from {1e-5, 1e-4, 1e-3, 1e-2} with the Adam optimizer [18]) is based on validation performance. The parameter α is initially set to 1 for the classifier, with others at 0 for the first 10 epochs. Subsequently, α is gradually increased to 0.15 for specific layers. The seeds for all experiments are fixed to 0. Details on network architecture and model splitting are provided in Appendix H. A sketch of this tuning protocol appears after the table. |
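
The Pseudocode row points to Algorithm 1 (AsyncFGD-SGD). As a rough illustration only, the following is a minimal sketch of the underlying forward-gradient update (a forward-mode JVP estimator) that such an algorithm builds on, written with PyTorch's `torch.func`. The module-wise asynchronous pipelining that is the paper's actual contribution is not reproduced here, and the model, loss, and learning rate are illustrative assumptions.

```python
# Hedged sketch: one forward-gradient (FGD) update via torch.func.jvp.
# This is NOT the paper's AsyncFGD-SGD; it only illustrates the forward-mode
# gradient estimator that module-wise asynchronous training would build on.
import torch
from torch import nn
from torch.func import functional_call, jvp

torch.manual_seed(0)

# Small stand-in model; the paper's architectures are described in its Appendix H.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # paper's lr grid: {1e-5, ..., 1e-2}

def fgd_step(x, y):
    params = dict(model.named_parameters())
    # Random tangent direction with the same structure as the parameters.
    tangents = {k: torch.randn_like(v) for k, v in params.items()}

    def loss_of(p):
        return loss_fn(functional_call(model, p, (x,)), y)

    # Forward-mode pass: loss value and its directional derivative along `tangents`.
    loss, dir_deriv = jvp(loss_of, (params,), (tangents,))

    # Forward-gradient estimate: (grad(L) . v) * v, applied through the optimizer.
    scale = dir_deriv.detach()
    opt.zero_grad()
    with torch.no_grad():
        for k, p in params.items():
            p.grad = scale * tangents[k]
    opt.step()
    return loss.item()

x = torch.randn(64, 3, 32, 32)   # batch size 64, CIFAR-10-shaped inputs
y = torch.randint(0, 10, (64,))
print(fgd_step(x, y))
```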
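
The Experiment Setup row describes a simple tuning protocol: seed fixed to 0, batch size 64, Adam with the learning rate chosen from a small grid by validation performance, and a staged α schedule. Below is a minimal sketch of how that selection loop could be wired up; `build_model` and `train_and_validate` are hypothetical helpers, and the α schedule is only indicated in comments.

```python
# Hedged sketch of the learning-rate selection protocol from the Experiment
# Setup row. `build_model` and `train_and_validate` are hypothetical helpers.
import torch

torch.manual_seed(0)  # "The seeds for all experiments are fixed to 0."

LR_GRID = [1e-5, 1e-4, 1e-3, 1e-2]
BATCH_SIZE = 64

def select_learning_rate(build_model, train_and_validate):
    """Pick the Adam learning rate with the best validation performance."""
    best_lr, best_score = None, float("-inf")
    for lr in LR_GRID:
        model = build_model()
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        # train_and_validate is assumed to train with batch size 64, keeping
        # alpha = 1 for the classifier and 0 for the other modules during the
        # first 10 epochs, then gradually raising alpha to 0.15 for the
        # selected layers, and to return a validation score.
        score = train_and_validate(model, opt, batch_size=BATCH_SIZE)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr
```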