AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Authors: Jiong Zhang, Hsiang-Fu Yu, Inderjit S. Dhillon

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of AutoAssist in training large-scale DNNs, we conduct experiments on two applications where DNNs have been successful: image classification and neural machine translation.
Researcher Affiliation | Collaboration | Jiong Zhang (zhangjiong724@utexas.edu), Hsiang-Fu Yu (rofu.yu@gmail.com), Inderjit S. Dhillon (inderjit@cs.utexas.edu); The University of Texas at Austin and Amazon
Pseudocode | Yes | Algorithm 1, Assistant: Utility Aware Batch Generator (a hedged sketch of such a generator appears after the table).
Open Source Code | Yes | The code is available at https://github.com/zhangjiong724/autoassist-exp
Open Datasets | Yes | We consider MNIST [21], rotated MNIST, CIFAR10 [18] and raw ImageNet [7] datasets.
Dataset Splits | No | The paper states training and test set sizes (e.g., '50k images for training, and 10k for test' for some datasets) but does not explicitly mention a separate validation split or the methodology for creating one in the main text or appendix.
Hardware Specification | Yes | We use 8 Nvidia V100 GPUs for Boss training and stop after training on 6 billion tokens.
Software Dependencies | No | The paper mentions using the 'Fairseq [25] codebase' but does not provide specific version numbers for Fairseq or any other software dependencies.
Experiment Setup | Yes | Following [11], we use SGD with momentum as the optimizer. The detailed parameter settings are listed in the Appendix. For MNIST, rotated MNIST and CIFAR10, we used 50k images for training and 10k for test. We used an initial learning rate of 0.1 for 18-layer and 34-layer ResNets and 0.05 for 101-layer ResNets. The learning rate is divided by 10 at the 80th and 120th epochs. We trained for 160 epochs with batch size 128. A weight decay of 0.0001 and momentum of 0.9 are applied to the optimizer. For ImageNet, we used an initial learning rate of 0.1 for the 18-layer ResNet, which is divided by 10 at the 30th, 60th and 90th epochs. We trained for 100 epochs with batch size 256. A weight decay of 0.0001 and momentum of 0.9 are applied. (A hedged reconstruction of this optimizer and schedule also appears after the table.)
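
To make the "Utility Aware Batch Generator" entry above more concrete, here is a minimal sketch of what such a generator could look like. It is not the authors' implementation: the assistant object, its predict_utility and update methods, and the min_keep floor are all hypothetical placeholders standing in for whatever lightweight model the paper's Assistant uses.

```python
import numpy as np

def utility_aware_batches(dataset, assistant, batch_size, min_keep=0.1, rng=None):
    """Sketch of an Assistant-driven batch generator (assumptions noted above).

    `assistant` is assumed to expose:
      - predict_utility(x): estimated probability that the Boss still finds
        instance x hard (a float in [0, 1])
      - update(x, boss_loss): online update from the Boss's observed loss
    Each candidate instance is kept with probability max(utility, min_keep),
    so easy examples are down-sampled rather than discarded forever.
    """
    rng = rng or np.random.default_rng()
    batch = []
    while True:
        for idx in rng.permutation(len(dataset)):
            x, y = dataset[idx]
            keep_prob = max(assistant.predict_utility(x), min_keep)
            if rng.random() < keep_prob:    # Assistant accepts this instance
                batch.append((x, y))
            if len(batch) == batch_size:
                yield batch                 # Boss trains on this batch and
                batch = []                  # reports losses back via
                                            # assistant.update(x, boss_loss)
```

In this reading, the Boss computes losses on each yielded batch and feeds them back through assistant.update, so the Assistant's notion of an "easy" example tracks the Boss's current state, while min_keep prevents any instance from being filtered out permanently.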
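The CIFAR-10 portion of the Experiment Setup row translates directly into a standard PyTorch optimizer and learning-rate schedule. The following is a hedged reconstruction from the numbers quoted above (the actual scripts are in the linked repository); the torchvision ResNet-18 and the skeletal training loop are placeholders.

```python
import torch
import torchvision

# Assumption: a standard torchvision ResNet-18 stands in for the paper's 18-layer ResNet.
model = torchvision.models.resnet18(num_classes=10)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # 0.05 for the 101-layer ResNet, per the table
    momentum=0.9,
    weight_decay=1e-4,
)

# Divide the learning rate by 10 at the 80th and 120th epochs
# (milestones would be [30, 60, 90] for the ImageNet runs).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1
)

for epoch in range(160):   # 160 epochs, batch size 128 for CIFAR-10
    # ... one pass over the 50k-image training split goes here ...
    scheduler.step()
```

The ImageNet run follows the same pattern with an initial learning rate of 0.1, milestones [30, 60, 90], 100 epochs, and batch size 256.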