AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
Authors: Jiong Zhang, Hsiang-Fu Yu, Inderjit S. Dhillon
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of AutoAssist in training large-scale DNNs, we conduct experiments on two applications where DNNs have been successful: image classification and neural machine translation. |
| Researcher Affiliation | Collaboration | Jiong Zhang (zhangjiong724@utexas.edu), Hsiang-Fu Yu (rofu.yu@gmail.com), Inderjit S. Dhillon (inderjit@cs.utexas.edu); The University of Texas at Austin; Amazon |
| Pseudocode | Yes | Algorithm 1 Assistant: Utility Aware Batch Generator |
| Open Source Code | Yes | The code is available at https://github.com/zhangjiong724/autoassist-exp |
| Open Datasets | Yes | We consider MNIST [21], rotated MNIST, CIFAR10 [18] and raw ImageNet [7] datasets. |
| Dataset Splits | No | The paper states training and test set sizes (e.g., '50k images for training, and 10k for test' for some datasets) but does not explicitly mention a separate validation split or the methodology for creating one in the main text or appendix. |
| Hardware Specification | Yes | We use 8 Nvidia V100 GPUs for Boss training and stop after training on 6 billion tokens. |
| Software Dependencies | No | The paper mentions using 'Fairseq [25] codebase' but does not provide specific version numbers for Fairseq or any other software dependencies. |
| Experiment Setup | Yes | Following [11], we use SGD with momentum as the optimizer. The detailed parameter settings are listed in the Appendix. For MNIST, rotated MNIST and CIFAR10, we used 50k images for training, and 10k for test. We used an initial learning rate of 0.1 for 18-layer and 34-layer ResNets and 0.05 for 101-layer ResNets. The learning rate is divided by 10 at the 80th and 120th epochs. We trained for 160 epochs with batch size 128. A weight decay of 0.0001 and momentum of 0.9 are applied to the optimizer. For ImageNet, we used an initial learning rate of 0.1 for the 18-layer ResNet, which is divided by 10 at the 30th, 60th and 90th epochs. We trained for 100 epochs with batch size 256. A weight decay of 0.0001 and momentum of 0.9 are applied. (A hedged PyTorch sketch of this schedule follows the table.) |
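
The "Pseudocode" row points to Algorithm 1 (Assistant: Utility Aware Batch Generator). That algorithm is not reproduced in this table, so the following is only a hypothetical Python sketch of what a utility-aware batch generator can look like: a lightweight assistant scores each candidate instance, and the generator keeps it with probability proportional to that score, with a floor so that low-utility examples are still seen occasionally. The function name, the `assistant_score` callback, the acceptance rule, and the `min_keep_prob` floor are assumptions for illustration, not the paper's Algorithm 1.

```python
import random
from typing import Callable, Iterable, Iterator, List


def utility_aware_batches(
    dataset: Iterable,
    assistant_score: Callable[[object], float],  # hypothetical: maps an instance to a utility in [0, 1]
    batch_size: int = 128,
    min_keep_prob: float = 0.1,  # assumed floor so no example is filtered out permanently
) -> Iterator[List[object]]:
    """Yield batches in which each instance is kept with probability
    max(min_keep_prob, assistant_score(instance))."""
    batch: List[object] = []
    for instance in dataset:
        keep_prob = max(min_keep_prob, assistant_score(instance))
        if random.random() < keep_prob:
            batch.append(instance)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield batch
```

With a constant `assistant_score` of 1.0 this degenerates to plain sequential batching; the interesting behavior comes from an assistant that assigns low scores to instances the main model already handles well.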
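
The "Experiment Setup" row maps onto a standard SGD-with-momentum recipe. Below is a minimal PyTorch sketch of the quoted CIFAR10 schedule (initial learning rate 0.1, momentum 0.9, weight decay 1e-4, batch size 128, 160 epochs, learning rate divided by 10 at epochs 80 and 120). The `torchvision` ResNet-18 and the bare `ToTensor` pipeline are placeholders and may differ from the paper's CIFAR-specific ResNet variant and data augmentation; only the optimizer and schedule values come from the quoted text.

```python
import torch
import torchvision

# Placeholder model; the paper's CIFAR ResNet variant may differ.
model = torchvision.models.resnet18(num_classes=10)

# Optimizer and schedule values quoted in the paper's setup for CIFAR10.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1
)

# Minimal data pipeline for illustration (no augmentation or normalization).
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(160):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step once per epoch so milestones are in epoch units
```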