Accelerating SGD with momentum for over-parameterized learning
Authors: Chaoyue Liu, Mikhail Belkin
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation of MaSS for several standard architectures of deep networks, including ResNet and convolutional networks, shows improved performance over SGD, SGD+Nesterov and Adam. |
| Researcher Affiliation | Academia | Chaoyue Liu, Department of Computer Science, The Ohio State University, Columbus, OH 43210, liu.2656@osu.edu; Mikhail Belkin, Department of Computer Science, The Ohio State University, Columbus, OH 43210, mbelkin@cse.ohio-state.edu |
| Pseudocode | Yes | Appendix A ("Pseudocode for MaSS") gives Algorithm 1: MaSS (Momentum-added Stochastic Solver). A hedged sketch of this update appears after the table. |
| Open Source Code | Yes | Code URL: https://github.com/ts66395/MaSS |
| Open Datasets | Yes | Real data: MNIST and CIFAR-10. We compare the optimization performance of SGD, SGD+Nesterov and MaSS on the following tasks: classification of MNIST with a fully-connected network (FCN), classification of CIFAR-10 with a convolutional neural network (CNN), and Gaussian kernel regression on MNIST. |
| Dataset Splits | No | The paper mentions training and testing but does not specify details about a validation dataset split or how it was used. |
| Hardware Specification | No | GPUs donated by Nvidia were used for the experiments, but no specific GPU models, counts, or other hardware details are given. |
| Software Dependencies | No | The paper does not specify versions for any software components used in the experiments (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | All algorithms are implemented with mini-batches of size 64 for neural network training. In each task, the same initial learning rate is used for MaSS, SGD and SGD+Nesterov, and the same number of epochs is run (150 epochs for the CNN and 300 epochs for ResNet-32). Two MaSS hyperparameter settings are reported per architecture. CNN: η = 0.01 (initial), α = 0.05, κm = 3; and η = 0.3 (initial), α = 0.05, κm = 6. ResNet-32: η = 0.1 (initial), α = 0.05, κm = 2; and η = 0.3 (initial), α = 0.05, κm = 24. These settings are collected in the configuration sketch after the table. |
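
For context on the Pseudocode row: Algorithm 1 updates two weight sequences in a Nesterov-style pair, with an additional compensation term on the stochastic gradient. The sketch below is a minimal rendering of that structure, not the authors' implementation; the names `grad_fn`, `eta1`, `eta2`, and `gamma` are illustrative, and the exact sign convention of the compensation term and the mapping from the reported (η, α, κ) settings to (η1, η2, γ) should be taken from Algorithm 1 in the paper rather than from this sketch.

```python
import numpy as np

def mass_step(u, w_prev, grad_fn, eta1, eta2, gamma):
    """One MaSS-style two-sequence update (sketch).

    grad_fn(u) returns a mini-batch stochastic gradient at u.
    Setting eta2 = 0 recovers the usual SGD+Nesterov two-step form.
    """
    g = grad_fn(u)                                           # stochastic gradient at u_t
    w_next = u - eta1 * g                                    # descent step from u_t
    u_next = w_next + gamma * (w_next - w_prev) + eta2 * g   # extrapolation + compensation term
    return u_next, w_next

def mass_sgd(grad_fn, w0, eta1, eta2, gamma, num_steps):
    """Run the sketch for num_steps iterations, starting from u_0 = w_0."""
    w_prev = np.asarray(w0, dtype=float)
    u = w_prev.copy()
    for _ in range(num_steps):
        u, w_prev = mass_step(u, w_prev, grad_fn, eta1, eta2, gamma)
    return w_prev
```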
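
The settings reported in the Experiment Setup row can be summarized as plain data; the snippet below is purely illustrative (the key names and structure are assumptions, not the configuration format used in the authors' repository).

```python
# MaSS hyperparameter settings as reported in the table above.
# Key names ("lr", "alpha", "kappa", ...) are illustrative only.
MASS_SETTINGS = {
    "cnn": [            # CIFAR-10 CNN, trained for 150 epochs
        {"lr": 0.01, "alpha": 0.05, "kappa": 3},
        {"lr": 0.30, "alpha": 0.05, "kappa": 6},
    ],
    "resnet32": [       # trained for 300 epochs
        {"lr": 0.10, "alpha": 0.05, "kappa": 2},
        {"lr": 0.30, "alpha": 0.05, "kappa": 24},
    ],
}
MINI_BATCH_SIZE = 64    # used for all neural-network training runs
```

Within each task, MaSS, SGD and SGD+Nesterov share the same initial learning rate and epoch budget, as stated in the row above.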