Stochastic Newton Proximal Extragradient Method
Authors: Ruichen Jiang, Michał Dereziński, Aryan Mokhtari
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7 Numerical experiments: While our focus is on improving theoretical guarantees, we also provide simple experiments to showcase our method's improvements. All simulations are implemented on a Windows PC with an AMD processor and 16GB memory. We consider minimizing the regularized log-sum-exp objective f(x) = ρ log(∑ᵢ₌₁ⁿ exp((aᵢᵀx − bᵢ)/ρ)) + (λ/2)‖x‖², a common test function for second-order methods [23, 24] due to its high ill-conditioning. Here, λ > 0 is a regularization parameter, ρ > 0 is a smoothing parameter, and the entries of the vectors a₁, …, aₙ ∈ ℝᵈ and b ∈ ℝᵈ are randomly generated from the standard normal distribution and the uniform distribution over [0, 1], respectively. In our experiments, the regularization parameter λ is 10⁻³, the dimension d is 500, and the number of samples n is chosen from 50,000, 10,000, and 150,000, respectively. We compare our SNPE method with the stochastic Newton method in [1], using both uniform Hessian averaging (Section 4) and weighted averaging (Section 5). In addition, we evaluate it against accelerated gradient descent (AGD), damped Newton's method, and Newton Proximal Extragradient (NPE), which corresponds to our SNPE method with the exact Hessian. For the stochastic Hessian estimate, we use a subsampling strategy with a subsampling size of s = 500. |
| Researcher Affiliation | Academia | Ruichen Jiang, ECE Department, UT Austin (rjiang@utexas.edu); Michał Dereziński, EECS Department, University of Michigan (derezin@umich.edu); Aryan Mokhtari, ECE Department, UT Austin (mokhtari@austin.utexas.edu) |
| Pseudocode | Yes | Algorithm 1: Stochastic NPE; Subroutine 1: (η, x̂) = BLS(x, g, H, α, β, σ) |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code and data are attached in the supplementary material. |
| Open Datasets | No | We consider minimizing the regularized log-sum-exp objective f(x) = ρ log(∑ᵢ₌₁ⁿ exp((aᵢᵀx − bᵢ)/ρ)) + (λ/2)‖x‖², a common test function for second-order methods [23, 24] due to its high ill-conditioning. Here, λ > 0 is a regularization parameter, ρ > 0 is a smoothing parameter, and the entries of the vectors a₁, …, aₙ ∈ ℝᵈ and b ∈ ℝᵈ are randomly generated from the standard normal distribution and the uniform distribution over [0, 1], respectively. |
| Dataset Splits | No | No explicit details on training, validation, or test dataset splits (percentages, counts, or predefined splits) are provided. The paper mentions varying the number of samples 'n' but not how these samples are divided for training, validation, and testing. |
| Hardware Specification | Yes | All simulations are implemented on a Windows PC with an AMD processor and 16GB Memory. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) are mentioned. |
| Experiment Setup | Yes | In our experiments, the regularization parameter λ is 10⁻³, the dimension d is 500, and the number of samples n is chosen from 50,000, 10,000, and 150,000, respectively. For the stochastic Hessian estimate, we use a subsampling strategy with a subsampling size of s = 500. |
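
The Open Datasets and Experiment Setup rows describe a synthetic regularized log-sum-exp problem with a subsampled Hessian estimate. The sketch below is a minimal, hedged NumPy rendering of that setup. It assumes the standard smoothed log-sum-exp form (division by ρ inside the exponential) and a uniform-subsampling Hessian estimator; the value of ρ and the exact estimator used in the paper are not given in the quoted text, so those choices are illustrative rather than a reproduction of the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes quoted from the paper's setup; rho is a smoothing parameter
# whose value is not stated in the quoted text, so the choice is illustrative.
n, d = 50_000, 500
lam, rho, s = 1e-3, 0.1, 500

A = rng.standard_normal((n, d))        # rows a_1, ..., a_n ~ N(0, 1)
b = rng.uniform(0.0, 1.0, size=n)      # entries of b ~ Uniform[0, 1]

def f(x):
    """Regularized log-sum-exp: rho * log(sum_i exp((a_i^T x - b_i)/rho)) + (lam/2)||x||^2."""
    z = (A @ x - b) / rho
    zmax = z.max()                      # max-shift for numerical stability
    return rho * (zmax + np.log(np.exp(z - zmax).sum())) + 0.5 * lam * (x @ x)

def grad(x):
    z = (A @ x - b) / rho
    p = np.exp(z - z.max())
    p /= p.sum()                        # softmax weights
    return A.T @ p + lam * x

def subsampled_hessian(x, s=s):
    """Illustrative subsampled Hessian estimate: the weighted-sum term of the exact
    Hessian, (1/rho) * sum_i p_i a_i a_i^T, is replaced by a uniform subsample of s
    rows rescaled to stay unbiased. The estimator used in the paper may differ."""
    z = (A @ x - b) / rho
    p = np.exp(z - z.max())
    p /= p.sum()
    idx = rng.choice(n, size=s, replace=False)
    As, ps = A[idx], p[idx]
    H = (n / s) * (As.T * ps) @ As / rho   # unbiased estimate of (1/rho) sum_i p_i a_i a_i^T
    m = A.T @ p                            # rank-one term kept exact (cheap to form)
    return H - np.outer(m, m) / rho + lam * np.eye(d)

x0 = np.zeros(d)
print(f(x0), np.linalg.norm(grad(x0)))
H_hat = subsampled_hessian(x0)
```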
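
The Pseudocode row references a line-search subroutine (η, x̂) = BLS(x, g, H, α, β, σ). The sketch below only illustrates the general shape of such a backtracking search over the step size η of a proximal-Newton trial step x̂ = x − η(I + ηH)⁻¹g; the trial-step form, the default parameter values, and the acceptance test (a plain sufficient-decrease check on the local quadratic model) are stand-ins and not the conditions from the paper.

```python
import numpy as np

def bls_sketch(x, g, H, alpha=0.5, beta=0.5, sigma=1.0, max_iter=50):
    """Backtracking line search in the spirit of BLS(x, g, H, alpha, beta, sigma):
    start from a trial step size sigma, shrink it by beta until the candidate point
    is accepted, and return (eta, x_hat). The acceptance test below is an Armijo-style
    sufficient-decrease check on the quadratic model, used only as a stand-in."""
    d = x.shape[0]
    eta = sigma
    for _ in range(max_iter):
        # Candidate proximal-Newton point: x_hat = x - eta * (I + eta*H)^{-1} g.
        x_hat = x - eta * np.linalg.solve(np.eye(d) + eta * H, g)
        step = x_hat - x
        # Stand-in test: the quadratic model predicts enough decrease for this step.
        model_decrease = -(g @ step + 0.5 * step @ (H @ step))
        if model_decrease >= alpha * (step @ step) / (2 * eta):
            return eta, x_hat
        eta *= beta                     # shrink the step size and retry
    return eta, x_hat
```

For a quick smoke test, this can be paired with the `grad` and `subsampled_hessian` helpers from the sketch above, e.g. `eta, x_hat = bls_sketch(x0, grad(x0), subsampled_hessian(x0))`.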