Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Authors: Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with step decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. |
| Researcher Affiliation | Academia | ¹The Hong Kong University of Science and Technology, ²Fudan University, ³University of Illinois Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1 Multistage Stochastic Heavy Ball with minibatch |
| Open Source Code | No | The paper does not explicitly provide a link to its source code or state that it has been made open source for the methodology described. |
| Open Datasets | Yes | We use a4a dataset (Chang and Lin, 2011; Dua and Graff, 2017) to realize this setting... In this experiment, CIFAR-10 (Krizhevsky et al., 2009) dataset is adopted... |
| Dataset Splits | Yes | We use 5,000 randomly chosen samples in the training set to form a validation set, then conduct grid searches by training on the remaining 45,000 samples and selecting the hyperparameter with the best validation accuracy. |
| Hardware Specification | No | The paper mentions simulating distributed learning with '16 nodes' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as library or solver names, needed to replicate the experiment. |
| Experiment Setup | Yes | In all of our experiments, we set the number of epochs to 100... we set different batch sizes $M \in \{2048, 512, 128\}$... For all schedulers, we set $\eta_0 \in \{10^{0}, 10^{-1}, 10^{-2}, 10^{-3}\}$. As for the choice of momentum factor $\beta$, we set $\beta = 0.9$ for stochastic heavy ball methods. |
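
Since the table reports that no source code is released, the following is a minimal illustrative sketch of the kind of method the rows above describe: a minibatch stochastic heavy ball iteration with a step decay scheduler on a quadratic objective, using β = 0.9 as in the reported setup. It is not the authors' Algorithm 1. The function names, the decay factor of 0.25, the number of stages, the diagonal Hessian, and the Gaussian noise model (covariance proportional to the Hessian, as one simple form of anisotropic noise) are all assumptions made for illustration.

```python
import numpy as np


def step_decay_lr(eta0, t, total_steps, num_stages, decay):
    """Piecewise-constant step size: multiply by `decay` at each new stage.

    The stage count and decay factor are hypothetical choices, not values
    taken from the paper.
    """
    stage_len = max(1, total_steps // num_stages)
    stage = min(t // stage_len, num_stages - 1)
    return eta0 * decay ** stage


def quadratic_loss(H, w, w_star):
    """f(w) = 0.5 * (w - w*)^T H (w - w*)."""
    diff = w - w_star
    return 0.5 * diff @ H @ diff


def shb_quadratic(H, w_star, eta0, beta=0.9, total_steps=400, num_stages=4,
                  decay=0.25, batch_size=16, noise_scale=0.1, seed=0):
    """Minibatch stochastic heavy ball with step decay on a diagonal quadratic.

    Update: w_{t+1} = w_t - eta_t * g_t + beta * (w_t - w_{t-1}),
    where g_t is the exact gradient plus averaged minibatch noise.
    """
    rng = np.random.default_rng(seed)
    d = H.shape[0]
    # Assumption: H is diagonal, so sqrt(diag(H)) scales the noise per axis,
    # giving noise covariance proportional to H (anisotropic).
    sqrt_eigs = np.sqrt(np.diag(H))
    w_prev = np.zeros(d)
    w = np.zeros(d)
    for t in range(total_steps):
        eta = step_decay_lr(eta0, t, total_steps, num_stages, decay)
        # One noise sample per example in the minibatch, then average.
        noise = noise_scale * sqrt_eigs * rng.standard_normal((batch_size, d))
        grad = H @ (w - w_star) + noise.mean(axis=0)
        w_next = w - eta * grad + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w


if __name__ == "__main__":
    H = np.diag([1.0, 0.1, 0.01])  # ill-conditioned toy Hessian
    w_star = np.ones(3)
    w_final = shb_quadratic(H, w_star, eta0=1.0)
    print("initial loss:", quadratic_loss(H, np.zeros(3), w_star))
    print("final loss:  ", quadratic_loss(H, w_final, w_star))
```

A grid search like the one described in the Experiment Setup row could be sketched by calling `shb_quadratic` once per candidate `eta0` and keeping the value with the lowest final loss. The paper itself selects hyperparameters by validation accuracy on held-out data rather than by training loss.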