Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Authors: Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.
Researcher Affiliation | Collaboration | Tianyi Liu, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, tliu341@gatech.edu; Shiyang Li, Harbin Institute of Technology, lsydevin@gmail.com; Jianping Shi, SenseTime Group Limited, shijianping@sensetime.com; Enlu Zhou, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, enlu.zhou@isye.gatech.edu; Tuo Zhao, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, tuo.zhao@isye.gatech.edu
Pseudocode | No | The paper describes the Async-MSGD algorithm through its update equations (e.g., Equations 3 and 5) but does not provide a structured pseudocode block (an illustrative sketch of the update appears after this table).
Open Source Code | No | The paper does not explicitly state that the source code for the methodology described is publicly available, nor does it provide a link to a repository.
Open Datasets | Yes | training a 32-layer hyperspherical residual neural network (SphereResNet34) using the CIFAR-100 dataset for a 100-class image classification task.
Dataset Splits | Yes | 50k images are used for training, and the remaining 10k are used for testing.
Hardware Specification | Yes | We use a computer workstation with 8 Titan XP GPUs.
Software Dependencies | No | The paper does not name the specific libraries, frameworks, or solvers used, and provides no version numbers (e.g., for TensorFlow, PyTorch, or scikit-learn).
Experiment Setup | Yes | We choose a batch size of 128 and an initial step size of 0.2, decreased by a factor of 0.2 after 60, 120, and 160 epochs. The momentum parameter is tuned over {0.1, 0.3, 0.5, 0.7, 0.9}. (An illustrative configuration sketch appears after this table.)
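
Since the paper specifies Async-MSGD only through update equations, the following is a minimal Python sketch of the generic delayed-gradient momentum update, theta_{k+1} = theta_k + mu * (theta_k - theta_{k-1}) - lr * grad(theta_{k - tau_k}). The `grad` callback, the randomly simulated delay, and all default values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def async_msgd(grad, theta0, lr=0.2, mu=0.9, max_delay=4, n_iters=1000, seed=0):
    """Sketch of Async-MSGD: momentum SGD where each stochastic gradient
    is evaluated at a stale iterate theta_{k - tau_k}, with tau_k a
    (simulated) asynchronous delay."""
    rng = np.random.default_rng(seed)
    history = [np.array(theta0, dtype=float)]  # past iterates, for stale reads
    theta_prev = history[0]
    theta = history[0].copy()
    for k in range(n_iters):
        tau = int(rng.integers(0, min(max_delay, k) + 1))  # simulated staleness
        stale = history[-1 - tau]        # delayed iterate seen by a worker
        g = grad(stale, rng)             # stochastic gradient at the stale point
        theta_next = theta + mu * (theta - theta_prev) - lr * g
        theta_prev, theta = theta, theta_next
        history.append(theta.copy())
    return theta
```

With `max_delay=0` the update reduces to synchronous MSGD, the baseline against which the paper studies the momentum/asynchrony tradeoff; `grad` could, for instance, return a minibatch gradient of a streaming-PCA objective.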
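
The reported experiment setup maps onto a standard PyTorch configuration. The sketch below is a hypothetical reconstruction: the CIFAR-100 loaders reflect the stated 50k/10k train/test split and batch size of 128, and the optimizer and schedule encode the reported step size, decay factor, and milestones; the model is a plain torchvision ResNet-34 stand-in, since the paper's SphereResNet34 is not a stock model.

```python
import torch
import torchvision
import torchvision.transforms as T

# Reported hyperparameters; the data pipeline and model are illustrative stand-ins.
batch_size = 128
momentum = 0.9  # the paper tunes this over {0.1, 0.3, 0.5, 0.7, 0.9}

transform = T.ToTensor()
train_set = torchvision.datasets.CIFAR100("./data", train=True,
                                          download=True, transform=transform)
test_set = torchvision.datasets.CIFAR100("./data", train=False,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                           shuffle=True)      # 50k training images
test_loader = torch.utils.data.DataLoader(test_set,
                                          batch_size=batch_size)  # 10k test images

model = torchvision.models.resnet34(num_classes=100)  # stand-in for SphereResNet34
optimizer = torch.optim.SGD(model.parameters(), lr=0.2, momentum=momentum)
# Initial step size 0.2, decreased by a factor of 0.2 after 60, 120, and 160 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[60, 120, 160],
                                                 gamma=0.2)
```

In a standard training loop, `scheduler.step()` would be called once per epoch so that the step-size decay occurs after epochs 60, 120, and 160 as reported.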