AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Authors: Jiong Zhang, Hsiang-Fu Yu, Inderjit S. Dhillon

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of AutoAssist in training large-scale DNNs, we conduct experiments on two applications where DNNs have been successful: image classification and neural machine translation.
Researcher Affiliation | Collaboration | Jiong Zhang (zhangjiong724@utexas.edu), Hsiang-Fu Yu (rofu.yu@gmail.com), Inderjit S. Dhillon (inderjit@cs.utexas.edu); The University of Texas at Austin and Amazon
Pseudocode | Yes | Algorithm 1, Assistant: Utility Aware Batch Generator (a hedged sketch of such a generator appears after the table).
Open Source Code | Yes | The code is available at https://github.com/zhangjiong724/autoassist-exp
Open Datasets | Yes | We consider MNIST [21], rotated MNIST, CIFAR10 [18] and raw ImageNet [7] datasets.
Dataset Splits | No | The paper states training and test set sizes (e.g., '50k images for training, and 10k for test' for some datasets) but does not explicitly mention a separate validation split or the methodology for creating one in the main text or appendix.
Hardware Specification | Yes | We use 8 Nvidia V100 GPUs for Boss training and stop after training on 6 billion tokens.
Software Dependencies | No | The paper mentions using the 'Fairseq [25] codebase' but does not provide specific version numbers for Fairseq or any other software dependencies.
Experiment Setup | Yes | Following [11], we use SGD with momentum as the optimizer. The detailed parameter settings are listed in the Appendix. For MNIST, rotated MNIST and CIFAR10, we used 50k images for training and 10k for test. We used an initial learning rate of 0.1 for 18-layer and 34-layer ResNets and 0.05 for 101-layer ResNets. The learning rate is divided by 10 at the 80th and 120th epochs. We trained for 160 epochs with batch size 128. A weight decay of 0.0001 and momentum of 0.9 are applied to the optimizer. For ImageNet, we used an initial learning rate of 0.1 for the 18-layer ResNet, which is divided by 10 at the 30th, 60th and 90th epochs. We trained for 100 epochs with batch size 256. A weight decay of 0.0001 and momentum of 0.9 are applied. (A hedged reconstruction of this optimizer and schedule also appears after the table.)
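
To make the "Utility Aware Batch Generator" entry above more concrete, here is a minimal sketch of what such a generator could look like. It is not the authors' implementation: the assistant object, its predict_utility and update methods, and the min_keep floor are all hypothetical placeholders standing in for whatever lightweight model the paper's Assistant uses.

```python
import numpy as np

def utility_aware_batches(dataset, assistant, batch_size, min_keep=0.1, rng=None):
    """Sketch of an Assistant-driven batch generator (assumptions noted above).

    `assistant` is assumed to expose:
      - predict_utility(x): estimated probability that the Boss still finds
        instance x hard (a float in [0, 1])
      - update(x, boss_loss): online update from the Boss's observed loss
    Each candidate instance is kept with probability max(utility, min_keep),
    so easy examples are down-sampled rather than discarded forever.
    """
    rng = rng or np.random.default_rng()
    batch = []
    while True:
        for idx in rng.permutation(len(dataset)):
            x, y = dataset[idx]
            keep_prob = max(assistant.predict_utility(x), min_keep)
            if rng.random() < keep_prob:    # Assistant accepts this instance
                batch.append((x, y))
            if len(batch) == batch_size:
                yield batch                 # Boss trains on this batch and
                batch = []                  # reports losses back via
                                            # assistant.update(x, boss_loss)
```

In this reading, the Boss computes losses on each yielded batch and feeds them back through assistant.update, so the Assistant's notion of an "easy" example tracks the Boss's current state, while min_keep prevents any instance from being filtered out permanently.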
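The CIFAR-10 portion of the Experiment Setup row translates directly into a standard PyTorch optimizer and learning-rate schedule. The following is a hedged reconstruction from the numbers quoted above (the actual scripts are in the linked repository); the torchvision ResNet-18 and the skeletal training loop are placeholders.

```python
import torch
import torchvision

# Assumption: a standard torchvision ResNet-18 stands in for the paper's 18-layer ResNet.
model = torchvision.models.resnet18(num_classes=10)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # 0.05 for the 101-layer ResNet, per the table
    momentum=0.9,
    weight_decay=1e-4,
)

# Divide the learning rate by 10 at the 80th and 120th epochs
# (milestones would be [30, 60, 90] for the ImageNet runs).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1
)

for epoch in range(160):   # 160 epochs, batch size 128 for CIFAR-10
    # ... one pass over the 50k-image training split goes here ...
    scheduler.step()
```

The ImageNet run follows the same pattern with an initial learning rate of 0.1, milestones [30, 60, 90], 100 epochs, and batch size 256.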