An Improved Analysis of Stochastic Gradient Descent with Momentum
Authors: Yanli Liu, Yuan Gao, Wotao Yin
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6, "Experiments": "In this section, we verify our theoretical claims by numerical experiments." |
| Researcher Affiliation | Academia | Department of Mathematics, University of California, Los Angeles; Department of IEOR, Columbia University |
| Pseudocode | Yes | Algorithm 1 (Multistage SGDM). Input: problem data f(x) as in (1), number of stages n, momentum weights {β_i}_{i=1}^n ⊂ [0, 1), step sizes {α_i}_{i=1}^n, stage lengths {T_i}_{i=1}^n, initialization x_1 ∈ R^d and m_0 = 0, iteration counter k = 1. For i = 1, 2, ..., n: set α ← α_i, β ← β_i; then for j = 1, 2, ..., T_i: sample a minibatch ζ_k uniformly from the training data; g_k ← ∇_x ℓ(x_k, ζ_k); m_k ← β m_{k−1} + (1 − β) g_k; x_{k+1} ← x_k − α m_k; k ← k + 1. Return x̃, generated by first choosing a stage l ∈ {1, 2, ..., n} uniformly at random, and then choosing x̃ ∈ {x_{T_1+...+T_{l−1}+1}, x_{T_1+...+T_{l−1}+2}, ..., x_{T_1+...+T_l}} uniformly at random. (A runnable sketch of this procedure follows the table.) |
| Open Source Code | Yes | Our implementation is available on GitHub: https://github.com/gao-yuan-hangzhou/improved-analysis-sgdm |
| Open Datasets | Yes | The MNIST dataset consists of n = 60000 labeled examples of 28 × 28 gray-scale images of handwritten digits in K = 10 classes {0, 1, ..., 9}. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets, which have standard splits, but it does not explicitly state the train/validation/test split percentages or sample counts in the text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions PyTorch [19] and TensorFlow [1] as frameworks where SGDM is implemented, but it does not specify the version numbers of these or any other software dependencies used for the experiments. |
| Experiment Setup | Yes | For all algorithms, we use batch size s = 64 (and hence the number of batches per epoch is m = 1874) and number of epochs T = 50. The regularization parameter is λ = 5 × 10⁻⁴. (A hedged PyTorch sketch of this setup follows the table.) |
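The pseudocode above translates directly into code. Below is a minimal NumPy sketch of Algorithm 1; the function name `multistage_sgdm`, the `grad_fn` oracle interface, and the noisy quadratic in the usage example are illustrative assumptions, not names or data from the paper.

```python
import numpy as np

def multistage_sgdm(grad_fn, x1, betas, alphas, stage_lengths, rng=None):
    """Minimal sketch of Algorithm 1 (Multistage SGDM).

    Runs SGDM in n stages, stage i using momentum weight betas[i], step size
    alphas[i], and length stage_lengths[i], then returns the iterate picked by
    the paper's two-level uniform sampling rule. grad_fn(x) stands in for the
    minibatch gradient g_k = grad_x l(x_k, zeta_k).
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x1, dtype=float).copy()
    m = np.zeros_like(x)                      # m_0 = 0
    stages = []                               # iterates, grouped by stage
    for beta, alpha, T in zip(betas, alphas, stage_lengths):
        iterates = []
        for _ in range(T):
            g = grad_fn(x)                    # stochastic gradient g_k
            m = beta * m + (1.0 - beta) * g   # m_k = beta m_{k-1} + (1 - beta) g_k
            x = x - alpha * m                 # x_{k+1} = x_k - alpha m_k
            iterates.append(x.copy())
        stages.append(iterates)
    # Output rule: pick a stage l uniformly, then an iterate within it uniformly.
    l = rng.integers(len(stages))
    return stages[l][rng.integers(len(stages[l]))]

# Toy usage on f(x) = ||x||^2 / 2 with additive gradient noise (illustrative only).
rng = np.random.default_rng(0)
x_out = multistage_sgdm(lambda x: x + 0.1 * rng.standard_normal(x.shape),
                        x1=np.ones(10), betas=[0.9, 0.9],
                        alphas=[0.1, 0.01], stage_lengths=[500, 500], rng=rng)
```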
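For the quoted experiment setup, a hedged PyTorch sketch follows. Only the batch size, epoch count, and λ come from the paper's text; the linear model and the values α = 0.1, β = 0.9 are placeholder assumptions, and λ is applied as PyTorch weight decay. Setting `dampening=beta` makes `torch.optim.SGD` use the (1 − β)-normalized momentum update of Algorithm 1 rather than PyTorch's default unnormalized one.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

batch_size, epochs, lam = 64, 50, 5e-4   # values quoted from the paper
alpha, beta = 0.1, 0.9                   # placeholder stage values, not from the quote

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
# dampening=beta gives m_k = beta m_{k-1} + (1 - beta) g_k, matching Algorithm 1.
opt = torch.optim.SGD(model.parameters(), lr=alpha, momentum=beta,
                      dampening=beta, weight_decay=lam)
loss_fn = nn.CrossEntropyLoss()

for _ in range(epochs):
    for images, labels in loader:
        opt.zero_grad()
        loss_fn(model(images), labels).backward()
        opt.step()
```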