SP2: A Second Order Stochastic Polyak Method
Authors: Shuang Li, William Joseph Swartworth, Martin Takáč, Deanna Needell, Robert M. Gower
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. We also provide a convergence theory on sums-of-quadratics. ... We test the proposed methods with a series of experiments in Section 6. |
| Researcher Affiliation | Academia | Shuang Li¹, William J. Swartworth¹, Martin Takáč², Deanna Needell¹, and Robert M. Gower³. ¹Department of Mathematics, University of California, Los Angeles, USA; ²Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; ³Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, USA |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. Methods are described mathematically and textually. |
| Open Source Code | No | The paper mentions using a third-party Python package (`pybenchfunction`) and `torch.autograd.functional.hvp` in their code (a usage sketch of `hvp` is given below the table), but does not state that their own implementation of SP2 or of the other methods described in the paper is open source, nor does it provide a link. |
| Open Datasets | Yes | We used two data sets: colon-cancer ((n, d) = (62, 2000)) Alon et al. (1999) and mushrooms ((n, d) = (8124, 112)) West et al. (2001), both of which interpolate when σ = 0. |
| Dataset Splits | No | The paper mentions a grid search over hyperparameters and reports results by effective passes, but does not specify exact training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction, nor does it describe a splitting methodology or cite predefined splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using `torch.autograd.functional.hvp` and the "Python Package pybenchfunction" but does not specify version numbers for PyTorch, Python, or the `pybenchfunction` package, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | For the implementation of Newton's method, we found that the regularized Newton method was best suited for these non-convex experiments, where $w^{t+1} = w^t - \gamma\,(\epsilon I + \nabla^2 f(w^t))^{-1} \nabla f(w^t)$, $\epsilon$ is the regularization parameter, which we set to $\epsilon = 10^{-8}$, and $\gamma$ is a learning rate which we tuned (a code sketch of this update is given below the table). Specifically, for all the experiments in this section we set the learning rate for all methods to 0.25, except for SGD and ADAM, both of which had to be tuned by a grid search on the first 100 epochs over the grid [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.25, 1]. For the SGD method, we use a learning rate of $L_{\max}/t$ in the $t$-th iteration, where $L_{\max} = \frac{1}{4}\max_{i=1,\dots,n}\|x_i\|^2$ denotes the smoothness constant of the loss function. We chose $\lambda$ for SP2L2+, SP2L1+, and SP2max+ using a grid search over $\lambda \in \{0.1, 0.2, \dots, 0.9\}$, with momentum set to 0.3; the details are in Section D. |
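
The Open Source Code and Software Dependencies rows point to `torch.autograd.functional.hvp` as the one concrete implementation detail the paper discloses. Since the authors' code is not public, the following is only a minimal sketch of how that API is used, assuming a single-sample logistic loss; the sample `x, y`, the dimension, and the loss itself are illustrative, not taken from the paper:

```python
import torch
from torch.autograd.functional import hvp

# Hypothetical single training sample (x_i, y_i); the paper's datasets
# are colon-cancer and mushrooms, which are not reproduced here.
x, y = torch.randn(10), torch.tensor(1.0)

def loss_i(w):
    # Logistic loss of one sample; SP2-style updates only need
    # Hessian-vector products of such per-sample losses.
    return torch.log1p(torch.exp(-y * (x @ w)))

w = torch.zeros(10)   # current iterate
v = torch.randn(10)   # direction to multiply the Hessian against

# hvp returns (loss value, Hessian-vector product) without ever
# materializing the full d x d Hessian.
value, Hv = hvp(loss_i, w, v)
```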
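
The regularized Newton update quoted in the Experiment Setup row can likewise be sketched. The snippet below is a hedged reconstruction of one step of $w^{t+1} = w^t - \gamma(\epsilon I + \nabla^2 f(w^t))^{-1}\nabla f(w^t)$; the test function `f` is a stand-in (the paper draws its non-convex benchmarks from `pybenchfunction`), and only $\gamma = 0.25$ and $\epsilon = 10^{-8}$ come from the paper:

```python
import torch
from torch.autograd.functional import hessian

def f(w):
    # Illustrative smooth non-convex test function, standing in for the
    # pybenchfunction benchmarks used in the paper.
    return torch.sum(w ** 2) + torch.sin(w).sum()

def regularized_newton_step(f, w, gamma=0.25, eps=1e-8):
    """One step of w <- w - gamma * (eps*I + Hess f(w))^{-1} grad f(w)."""
    w = w.detach().requires_grad_(True)
    grad = torch.autograd.grad(f(w), w)[0]    # gradient of f at w
    H = hessian(f, w)                         # full Hessian (small d only)
    A = eps * torch.eye(w.numel()) + H        # regularized Hessian
    return (w - gamma * torch.linalg.solve(A, grad)).detach()

w = torch.full((5,), 2.0)
for _ in range(10):
    w = regularized_newton_step(f, w)
```

Forming the full Hessian is only reasonable for the low-dimensional test problems the row describes; for larger problems one would replace the direct solve with an iterative method driven by Hessian-vector products, as in the `hvp` sketch above.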