SP2: A Second Order Stochastic Polyak Method

Authors: Shuang Li, William Joseph Swartworth, Martin Takáč, Deanna Needell, Robert M. Gower

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. We also provide a convergence theory on sums-of-quadratics. ... We test the proposed methods with a series of experiments in Section 6."
Researcher Affiliation | Academia | Shuang Li¹, William J. Swartworth¹, Martin Takáč², Deanna Needell¹, and Robert M. Gower³. ¹Department of Mathematics, University of California, Los Angeles, USA; ²Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; ³Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, USA.
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks; methods are described mathematically and in prose.
Open Source Code | No | The paper mentions using a third-party Python package (`pybenchfunction`) and `torch.autograd.functional.hvp` in its code, but does not state that its own implementation of SP2 or the other methods is open source, nor does it provide a link. (A usage sketch of `hvp` follows the table.)
Open Datasets | Yes | "We used two data sets: colon-cancer ((n, d) = (62, 2000)) Alon et al. (1999) and mushrooms ((n, d) = (8124, 112)) West et al. (2001), both of which interpolate when σ = 0."
Dataset Splits | No | The paper mentions a hyperparameter grid search and effective passes, but does not specify training/validation/test splits (e.g., percentages or sample counts), describe a splitting methodology, or cite predefined splits.
Hardware Specification | No | The paper does not report any hardware details, such as GPU models, CPU types, or memory, for the experiments.
Software Dependencies | No | The paper mentions `torch.autograd.functional.hvp` and the Python package `pybenchfunction`, but gives no version numbers for PyTorch, Python, or `pybenchfunction`, which are needed to pin reproducible software dependencies.
Experiment Setup | Yes | "For the implementation of Newton's method, we found that the regularized Newton method was best suited for these non-convex experiments, where w_{t+1} = w_t − γ(εI + ∇²f(w_t))⁻¹∇f(w_t), where ε is the regularization parameter, which we set to ε = 10⁻⁸, and γ is a learning rate which we tuned. Specifically, for all the experiments in this section we set the learning rate for all methods to 0.25, except for SGD and ADAM, both of which were tuned by a grid search on the first 100 epochs over the grid [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.25, 1]. For the SGD method, we use a learning rate of 1/(L_max · t) in the t-th iteration, where L_max = (1/4) max_{i=1,…,n} ‖x_i‖² denotes the smoothness constant of the loss function. We chose λ for SP2L2+, SP2L1+, and SP2max+ using a grid search over λ ∈ {0.1, 0.2, …, 0.9} with momentum set to 0.3; the details are in Section D." (The regularized Newton update is sketched after the table.)
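
The regularized Newton update quoted in the Experiment Setup row is easy to misread in extracted text, so here is a minimal PyTorch sketch of it. The toy logistic-regression data and loss are illustrative stand-ins, not the authors' code; only the update rule, ε = 10⁻⁸, and γ = 0.25 come from the quoted setup.

```python
# Minimal sketch of the regularized Newton update quoted above:
#   w_{t+1} = w_t - gamma * (eps*I + H(w_t))^{-1} grad(w_t)
# Data and loss are illustrative stand-ins; eps = 1e-8 and gamma = 0.25
# follow the quoted experiment setup.
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
n, d = 32, 5
X = torch.randn(n, d)
y = (torch.rand(n) > 0.5).float()

def loss(w):
    # Toy logistic loss standing in for the paper's objectives.
    return torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)

w = torch.zeros(d)
gamma, eps = 0.25, 1e-8

for t in range(50):
    w_t = w.clone().requires_grad_(True)
    grad = torch.autograd.grad(loss(w_t), w_t)[0]
    H = hessian(loss, w)  # forming the full Hessian is fine at this small d
    w = w - gamma * torch.linalg.solve(eps * torch.eye(d) + H, grad)

print(f"final loss: {loss(w).item():.4f}")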
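
Several rows also reference `torch.autograd.functional.hvp`, PyTorch's Hessian-vector-product primitive, which lets a second-order method use curvature without materializing the full Hessian. A self-contained sketch on a toy quadratic (the function and matrix below are illustrative assumptions, not from the paper):

```python
# Minimal sketch of torch.autograd.functional.hvp, the primitive the paper's
# implementation reportedly relies on. The quadratic is an illustrative
# stand-in for the paper's objectives.
import torch
from torch.autograd.functional import hvp

A = torch.randn(4, 4)
A = A @ A.T  # symmetric PSD matrix defining a toy quadratic

def f(w):
    return 0.5 * w @ A @ w

w = torch.randn(4)
v = torch.randn(4)

# hvp returns (f(w), H(w) @ v) without building the Hessian, which is what
# makes second-order updates affordable in high dimensions.
fw, Hv = hvp(f, w, v)
print(torch.allclose(Hv, A @ v, atol=1e-5))
```

For a quadratic f(w) = ½ wᵀAw with symmetric A, the Hessian is exactly A, so the check should print True.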