Stochastic Newton Proximal Extragradient Method

Authors: Ruichen Jiang, Michał Dereziński, Aryan Mokhtari

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 7 (Numerical experiments): While our focus is on improving theoretical guarantees, we also provide simple experiments to showcase our method's improvements. All simulations are implemented on a Windows PC with an AMD processor and 16GB memory. We consider minimizing the regularized log-sum-exp objective f(x) = ρ log(∑_{i=1}^n exp((a_i^T x − b_i)/ρ)) + (λ/2)‖x‖^2, a common test function for second-order methods [23, 24] due to its high ill-conditioning. Here, λ > 0 is a regularization parameter, ρ > 0 is a smoothing parameter, and the entries of the vectors a_1, …, a_n ∈ R^d and b ∈ R^n are randomly generated from the standard normal distribution and the uniform distribution over [0, 1], respectively. In our experiments, the regularization parameter λ is 10^-3, the dimension d is 500, and the number of samples n is chosen as 50,000, 10,000, and 150,000, respectively. We compare our SNPE method with the stochastic Newton method in [1], using both uniform Hessian averaging (Section 4) and weighted averaging (Section 5). In addition, we evaluate it against accelerated gradient descent (AGD), damped Newton's method, and Newton Proximal Extragradient (NPE), which corresponds to our SNPE method with the exact Hessian. For the stochastic Hessian estimate, we use a subsampling strategy with a subsampling size of s = 500.
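To make the quoted setup concrete, the following is a minimal NumPy sketch of the test problem (Gaussian a_i, uniform b, d = 500, λ = 10^-3). The /ρ scaling inside the exponent and the default ρ = 1 are assumptions on our part, since the quoted formula is only partially legible, and all names are illustrative rather than taken from the released code.

```python
import numpy as np

def make_problem(n=50_000, d=500, seed=0):
    """Generate the random instance described above.

    a_1, ..., a_n are standard normal rows of A; b is uniform over [0, 1].
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = rng.uniform(0.0, 1.0, size=n)
    return A, b

def f(x, A, b, rho=1.0, lam=1e-3):
    # Regularized log-sum-exp objective:
    #   f(x) = rho * log( sum_i exp((a_i^T x - b_i) / rho) ) + (lam / 2) * ||x||^2
    # rho = 1.0 is an assumed default; the quoted text does not give a value.
    z = (A @ x - b) / rho
    m = z.max()                                 # stabilize the log-sum-exp
    lse = m + np.log(np.exp(z - m).sum())
    return rho * lse + 0.5 * lam * (x @ x)
```

With n = 50,000 and d = 500 this builds a 50,000 x 500 data matrix, matching the smallest sample size quoted above.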
Researcher Affiliation | Academia | Ruichen Jiang, ECE Department, UT Austin, rjiang@utexas.edu; Michał Dereziński, EECS Department, University of Michigan, derezin@umich.edu; Aryan Mokhtari, ECE Department, UT Austin, mokhtari@austin.utexas.edu
Pseudocode | Yes | Algorithm 1: Stochastic NPE; Subroutine 1: (η, x̂) = BLS(x, g, H, α, β, σ)
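The paper's Subroutine 1 is not reproduced here; the sketch below shows one generic way a backtracking line search for a Newton proximal extragradient step can be organized, assuming a Monteiro-Svaiter-type acceptance test and assumed roles for α (initial trial step size), β (backtracking factor), and σ (relative-error tolerance). The acceptance condition and parameter roles in the actual BLS subroutine may differ.

```python
import numpy as np

def bls(x, g, H, alpha, beta, sigma, grad_f, max_iter=50):
    """Backtracking line search for a Newton proximal extragradient step (sketch).

    Assumed roles (not taken from the paper): alpha > 0 is the initial trial
    step size, beta in (0, 1) is the backtracking factor, sigma in (0, 1) is
    the tolerance of a Monteiro-Svaiter-type test, and grad_f evaluates the
    exact gradient at a trial point. g and H are the (possibly stochastic)
    gradient and Hessian estimate at x.
    """
    d = x.shape[0]
    eta = alpha
    for _ in range(max_iter):
        # Damped Newton-type trial point: x_hat = x - (I + eta * H)^{-1} (eta * g).
        x_hat = x - np.linalg.solve(np.eye(d) + eta * H, eta * g)
        # Accept if the implicit proximal-point equation holds up to relative error sigma.
        if np.linalg.norm(x_hat - x + eta * grad_f(x_hat)) <= sigma * np.linalg.norm(x_hat - x):
            return eta, x_hat
        eta *= beta  # shrink the step size and try again
    return eta, x_hat
```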
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes]. Justification: The code and data are attached in the supplementary material.
Open Datasets | No | We consider minimizing the regularized log-sum-exp objective f(x) = ρ log(∑_{i=1}^n exp((a_i^T x − b_i)/ρ)) + (λ/2)‖x‖^2, a common test function for second-order methods [23, 24] due to its high ill-conditioning. Here, λ > 0 is a regularization parameter, ρ > 0 is a smoothing parameter, and the entries of the vectors a_1, …, a_n ∈ R^d and b ∈ R^n are randomly generated from the standard normal distribution and the uniform distribution over [0, 1], respectively.
Dataset Splits | No | No explicit details on training, validation, or test dataset splits (percentages, counts, or predefined splits) are provided. The paper mentions varying the number of samples 'n' but not how these samples are divided for training, validation, and testing.
Hardware Specification | Yes | All simulations are implemented on a Windows PC with an AMD processor and 16GB memory.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) are mentioned.
Experiment Setup | Yes | In our experiments, the regularization parameter λ is 10^-3, the dimension d is 500, and the number of samples n is chosen as 50,000, 10,000, and 150,000, respectively. For the stochastic Hessian estimate, we use a subsampling strategy with a subsampling size of s = 500.
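The quoted text says a subsampled Hessian with s = 500 is used but does not spell out the estimator. Below is one plausible, hedged sketch for the log-sum-exp objective: the curvature term is built from a uniform subsample of s rows and rescaled so it stays unbiased in expectation, while the cheap rank-one term and the ridge term are kept exact. The sampling scheme and function names are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax_weights(x, A, b, rho=1.0):
    # Softmax weights p_i proportional to exp((a_i^T x - b_i) / rho), computed stably.
    z = (A @ x - b) / rho
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

def subsampled_hessian(x, A, b, rho=1.0, lam=1e-3, s=500, rng=None):
    """Stochastic Hessian estimate for the regularized log-sum-exp objective (sketch).

    The exact Hessian is (1/rho) * (A^T diag(p) A - (A^T p)(A^T p)^T) + lam * I,
    where p = softmax((A x - b) / rho). The uniform subsampling scheme below is
    an assumption, not the paper's released implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    p = softmax_weights(x, A, b, rho)
    idx = rng.choice(n, size=s, replace=False)      # uniform subsample of s rows
    A_s, p_s = A[idx], p[idx]
    # (n / s) * sum_{i in S} p_i a_i a_i^T is an unbiased estimate of A^T diag(p) A.
    curvature = (n / s) * (A_s.T * p_s) @ A_s
    mean_dir = A.T @ p                              # exact rank-one component
    return (curvature - np.outer(mean_dir, mean_dir)) / rho + lam * np.eye(d)
```

Plugging an estimate of this kind into the trial-point computation of the line-search sketch above yields the kind of stochastic Newton-type step the quoted comparison describes.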