Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization
Authors: Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD. |
| Researcher Affiliation | Collaboration | Tianyi Liu, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, tliu341@gatech.edu; Shiyang Li, Harbin Institute of Technology, lsydevin@gmail.com; Jianping Shi, SenseTime Group Limited, shijianping@sensetime.com; Enlu Zhou, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, enlu.zhou@isye.gatech.edu; Tuo Zhao, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, tuo.zhao@isye.gatech.edu |
| Pseudocode | No | The paper describes the Async-MSGD algorithm using mathematical equations (e.g., Equations 3 and 5) but does not provide a structured pseudocode block; a hedged sketch of such an update rule appears after this table. |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology described is publicly available or provide a link to a repository. |
| Open Datasets | Yes | training a 32-layer hyperspherical residual neural network (SphereResNet34) using the CIFAR-100 dataset for a 100-class image classification task. |
| Dataset Splits | Yes | 50k images are used for training, and the remaining 10k are used for testing. |
| Hardware Specification | Yes | We use a computer workstation with 8 Titan XP GPUs. |
| Software Dependencies | No | The paper does not name the specific libraries, frameworks, or solvers used for its experiments, nor does it provide version numbers (e.g., for TensorFlow, PyTorch, or scikit-learn). |
| Experiment Setup | Yes | We choose a batch size of 128. We choose the initial step size as 0.2. We decrease the step size by a factor of 0.2 after 60, 120, and 160 epochs. The momentum parameter is tuned over {0.1, 0.3, 0.5, 0.7, 0.9}. |
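
The Pseudocode row above notes that the paper specifies Async-MSGD only through update equations (its Equations 3 and 5). The snippet below is a minimal sketch of that style of update, assuming the standard heavy-ball form with the stochastic gradient replaced by a deterministic one and with a fixed staleness `delay`; the function name, toy objective, and default values are illustrative and not taken from the paper.

```python
import numpy as np

def async_msgd(grad, x0, lr=0.05, momentum=0.5, delay=2, n_steps=200):
    """Momentum SGD with delayed gradients (a sketch of Async-MSGD):
    the gradient is evaluated at the stale iterate x_{k - delay},
    mimicking an asynchronous worker reading old parameters."""
    iterates = [np.asarray(x0, dtype=float)]
    v = np.zeros_like(iterates[0])
    for k in range(n_steps):
        stale = iterates[max(k - delay, 0)]   # stale parameter read
        v = momentum * v - lr * grad(stale)   # heavy-ball velocity update
        iterates.append(iterates[-1] + v)     # apply update to the latest iterate
    return iterates[-1]

# Toy usage on a smooth quadratic (illustrative only; the paper's experiments
# use streaming PCA and deep networks, not this objective).
if __name__ == "__main__":
    grad = lambda x: 2.0 * x                  # gradient of f(x) = ||x||^2
    print(async_msgd(grad, x0=[1.0, -2.0]))
```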
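The Experiment Setup row reports the CIFAR-100 hyperparameters but not the tooling. Below is a hedged sketch of the implied step-size schedule and momentum tuning grid, written in plain Python since the paper does not name a framework; the helper `lr_at_epoch` is a hypothetical name introduced here.

```python
# Hyperparameter values as reported in the Experiment Setup row;
# the surrounding code structure is an assumption, not from the paper.
batch_size = 128
initial_lr = 0.2
decay_factor = 0.2            # step size multiplied by 0.2 at each milestone
milestones = [60, 120, 160]   # epochs after which the step size is decreased
momentum_grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # values the momentum parameter is tuned over

def lr_at_epoch(epoch: int) -> float:
    """Piecewise-constant step size implied by the reported schedule."""
    drops = sum(epoch >= m for m in milestones)
    return initial_lr * (decay_factor ** drops)
```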