Accelerating SGD with momentum for over-parameterized learning

Authors: Chaoyue Liu, Mikhail Belkin

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental evaluation of MaSS for several standard architectures of deep networks, including ResNet and convolutional networks, shows improved performance over SGD, SGD+Nesterov and Adam.
Researcher Affiliation | Academia | Chaoyue Liu, Department of Computer Science, The Ohio State University, Columbus, OH 43210, liu.2656@osu.edu; Mikhail Belkin, Department of Computer Science, The Ohio State University, Columbus, OH 43210, mbelkin@cse.ohio-state.edu
Pseudocode | Yes | Appendix A provides pseudocode for MaSS (Algorithm 1: MaSS, Momentum-added Stochastic Solver); a hedged sketch of the update appears after this table.
Open Source Code | Yes | Code URL: https://github.com/ts66395/MaSS
Open Datasets | Yes | Real data: MNIST and CIFAR-10. We compare the optimization performance of SGD, SGD+Nesterov and MaSS on the following tasks: classification of MNIST with a fully-connected network (FCN), classification of CIFAR-10 with a convolutional neural network (CNN), and Gaussian kernel regression on MNIST.
Dataset Splits | No | The paper mentions training and testing but does not specify a validation split or how one was used.
Hardware Specification | No | The paper only acknowledges GPUs donated by Nvidia; no specific GPU models or counts are given.
Software Dependencies | No | The paper does not specify versions for any software components used in the experiments (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | All algorithms are run with mini-batches of size 64 for neural-network training. In each task, the same initial learning rate is used for MaSS, SGD and SGD+Nesterov, and the same number of epochs is run (150 for the CNN, 300 for ResNet-32). CNN: η = 0.01 (initial), α = 0.05, κm = 3; and η = 0.3 (initial), α = 0.05, κm = 6. ResNet-32: η = 0.1 (initial), α = 0.05, κm = 2; and η = 0.3 (initial), α = 0.05, κm = 24.
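
The Pseudocode row above refers to Algorithm 1 in the paper's appendix. As a rough illustration only, below is a minimal sketch of a MaSS-style step, assuming the commonly described three-variable form: a descent step taken from an auxiliary point, followed by a Nesterov-style extrapolation plus a compensation term proportional to the same stochastic gradient. The step sizes lr1, lr2 and momentum gamma are generic stand-ins; the paper parameterizes them through η, α and κm in Algorithm 1, and that mapping (and the exact sign conventions) should be taken from the paper, not from this sketch. The toy least-squares objective and all function names below are illustrative.

```python
import numpy as np

def mass_step(w, u, grad_u, lr1, lr2, gamma):
    """One MaSS-style update (sketch; see Algorithm 1 in the paper for the authoritative version).

    w, u   : current weight and auxiliary iterates
    grad_u : stochastic gradient evaluated at the auxiliary point u
    lr1    : primary step size
    lr2    : secondary (compensation) step size
    gamma  : momentum coefficient
    """
    w_next = u - lr1 * grad_u                              # descent step from the auxiliary point
    u_next = w_next + gamma * (w_next - w) + lr2 * grad_u  # extrapolation plus compensation term
    return w_next, u_next

# Toy usage on a least-squares objective f(w) = 0.5 * ||A w - b||^2 (illustrative only).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(64, 8)), rng.normal(size=64)
w = u = np.zeros(8)
for _ in range(200):
    idx = rng.choice(64, size=16, replace=False)        # mini-batch of rows
    grad = A[idx].T @ (A[idx] @ u - b[idx]) / len(idx)  # stochastic gradient at u
    w, u = mass_step(w, u, grad, lr1=0.05, lr2=0.005, gamma=0.9)
```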
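
To make the reported setup easier to scan, the sketch below collects the hyperparameters quoted in the Experiment Setup row into a single Python configuration object for a hypothetical training script. Only the numeric values come from the row above; the key names, the grouping, and the assumption that the (η, α, κm) triples are MaSS settings are illustrative.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered into one place.
# Structure and key names are illustrative; only the numeric values are from the report.
EXPERIMENT_SETUP = {
    "batch_size": 64,            # mini-batch size for all neural-network training
    "cnn_cifar10": {
        "epochs": 150,
        "mass_settings": [       # two reported (lr, alpha, kappa_m) combinations
            {"lr": 0.01, "alpha": 0.05, "kappa_m": 3},
            {"lr": 0.30, "alpha": 0.05, "kappa_m": 6},
        ],
    },
    "resnet32": {
        "epochs": 300,
        "mass_settings": [
            {"lr": 0.10, "alpha": 0.05, "kappa_m": 2},
            {"lr": 0.30, "alpha": 0.05, "kappa_m": 24},
        ],
    },
}
```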